MySQL: How to get an excerpt of a string if you only know how the excerpt start and ends
The problem
Let's say we have the texts of some old books in a database table.
We want to find the part of a book that begins with "Marley was dead" and ends with "whatever about that". We don't know anything else. We don't know what the book is or the author.
The solution
The solution is to use this query:
SELECT
CONCAT(
'start of string',
SUBSTRING_INDEX(
SUBSTRING_INDEX(`Text`, 'start of string', -1),
'end of string',
1
),
'end of string'
)
FROM
Table
Here's the query if we put in the values from the scenario.
SELECT
CONCAT(
'Marley was dead',
SUBSTRING_INDEX(
SUBSTRING_INDEX(`Text`, 'Marley was dead', -1),
'whatever about that',
1
),
'whatever about that'
)
FROM
Table
If we had A Christmas Carol in our database, then this query would be expected to return:
Marley was dead: to begin with. There is no doubt whatever about that
The query looks a bit complicated but I'll explain it bit by bit.
SUBSTRING_INDEX
First, what does the SUBSTRING_INDEX part do? What this function does is return a substring of a string before a specified number of delimiter occurs. Take this line:
SUBSTRING_INDEX(`Text`, 'Marley was dead', -1),
Here, the specified delimiter is 'Marley was dead".
Okay, but what about the -1? With SUBSTRING_INDEX, -1 means that everything to the right of the delimiter will be returned instead of the left.
As w3schools says:
If it is a positive number, this function returns all to the left of the delimiter. If it is a negative number, this function returns all to the right of the delimiter. https://www.w3schools.com/sql/func_MySQL_substring_index.asp
So the line above will retrieve everything to the right of Marley was dead.
The second SUBSTRING_INDEX takes the result of the first SUBSTRING_INDEX and trims off everything before 'whatever about that':
SUBSTRING_INDEX(
SUBSTRING_INDEX(`Text`, 'Marley was dead', -1),
'whatever about that',
1
),
There is a problem however. If we run the above as a query, we would get
: to begin with. There is no doubt
It's missing the parts we searched for: "Marley was dead" and "whatever about that".
CONCAT
The solution is to use CONCAT to add these parts back on:
CONCAT(
'Marley was dead',
SUBSTRING_INDEX(
SUBSTRING_INDEX(`Text`, 'Marley was dead', -1),
'whatever about that',
1
),
'whatever about that'
)
And that explains the query.
In practice
In practice, I've used the query to find substrings within strings. So say I have a long string - a book chapter, for example - and I want to save an excerpt, I only need to save the first few words of the start and end of the excerpt. Then later I can find the full excerpt again with:
SELECT
CONCAT(
'start of excerpt',
SUBSTRING_INDEX(
SUBSTRING_INDEX(`Text`, 'start of excerpt', -1),
'end of excerpt',
1
),
'end of excerpt'
)
FROM
Table
Leave a comment