Paul Chris Jones's coding blog
PHP
JS

MySQL: How to get an excerpt of a string if you only know how the excerpt start and ends

25th October 2019

The problem

Let's say we have the texts of some old books in a database table.

We want to find the part of a book that begins with "Marley was dead" and ends with "whatever about that". We don't know anything else. We don't know what the book is or the author.

The solution

The solution is to use this query:

SELECT
    CONCAT(
        'start of string',
        SUBSTRING_INDEX(
            SUBSTRING_INDEX(`Text`, 'start of string', -1),
            'end of string',
            1
        ),
        'end of string'
    )
FROM
    Table

Here's the query if we put in the values from the scenario.

SELECT
    CONCAT(
        'Marley was dead',
        SUBSTRING_INDEX(
            SUBSTRING_INDEX(`Text`, 'Marley was dead', -1),
            'whatever about that',
            1
        ),
        'whatever about that'
    )
FROM
    Table

If we had A Christmas Carol in our database, then this query would be expected to return:

Marley was dead: to begin with. There is no doubt whatever about that

The query looks a bit complicated but I'll explain it bit by bit.

SUBSTRING_INDEX

First, what does the SUBSTRING_INDEX part do? What this function does is return a substring of a string before a specified number of delimiter occurs. Take this line:

SUBSTRING_INDEX(`Text`, 'Marley was dead', -1),

Here, the specified delimiter is 'Marley was dead".

Okay, but what about the -1? With SUBSTRING_INDEX, -1 means that everything to the right of the delimiter will be returned instead of the left.

As w3schools says:

If it is a positive number, this function returns all to the left of the delimiter. If it is a negative number, this function returns all to the right of the delimiter. https://www.w3schools.com/sql/func_MySQL_substring_index.asp

So the line above will retrieve everything to the right of Marley was dead.

The second SUBSTRING_INDEX takes the result of the first SUBSTRING_INDEX and trims off everything before 'whatever about that':

SUBSTRING_INDEX(
    SUBSTRING_INDEX(`Text`, 'Marley was dead', -1),
       'whatever about that',
    1
),

There is a problem however. If we run the above as a query, we would get

: to begin with. There is no doubt

It's missing the parts we searched for: "Marley was dead" and "whatever about that".

CONCAT

The solution is to use CONCAT to add these parts back on:

CONCAT(
    'Marley was dead',
    SUBSTRING_INDEX(
        SUBSTRING_INDEX(`Text`, 'Marley was dead', -1),
        'whatever about that',
        1
    ),
    'whatever about that'
)

And that explains the query.

In practice

In practice, I've used the query to find substrings within strings. So say I have a long string - a book chapter, for example - and I want to save an excerpt, I only need to save the first few words of the start and end of the excerpt. Then later I can find the full excerpt again with:

SELECT
    CONCAT(
        'start of excerpt',
        SUBSTRING_INDEX(
            SUBSTRING_INDEX(`Text`, 'start of excerpt', -1),
            'end of excerpt',
            1
        ),
        'end of excerpt'
    )
FROM
    Table

Leave a comment