How to extract the nth word and count word occurrences in a MySQL string

Question

I would like to have a MySQL query like this:

    select <second word in text> word, count(*) from table group by word;

All the regex examples in MySQL are used to query whether the text matches the expression, but not to extract text out of an expression. Is there such a syntax?

User · Answer

I used Brendan Bullen's answer as a starting point for a similar issue I had, which was to retrieve the value of a specific field in a JSON string. However, as I commented on his answer, it is not entirely accurate. If your left boundary isn't just a space, as in the original question, the discrepancy increases.

Corrected solution:

    SUBSTRING(
        sentence,
        LOCATE(' ', sentence) + 1,
        LOCATE(' ', sentence, (LOCATE(' ', sentence) + 1)) - LOCATE(' ', sentence) - 1
    )

The two differences are the +1 in the SUBSTRING index (start position) parameter and the -1 in the length parameter.

For a more general solution to "find the first occurrence of a string between two provided boundaries":

    SUBSTRING(
        haystack,
        LOCATE('<leftBoundary>', haystack) + CHAR_LENGTH('<leftBoundary>'),
        LOCATE(
            '<rightBoundary>',
            haystack,
            LOCATE('<leftBoundary>', haystack) + CHAR_LENGTH('<leftBoundary>')
        )
        - (LOCATE('<leftBoundary>', haystack) + CHAR_LENGTH('<leftBoundary>'))
    )
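You can check the boundary arithmetic outside MySQL with the minimal Python sketch below. It emulates LOCATE and SUBSTRING, which use 1-based positions; SUBSTRING returns an empty string when the length is below 1. It then evaluates both formulas above. The helper names (mysql_locate, mysql_substring, second_word, between) and the JSON-like sample are hypothetical. The emulation only covers positions of 1 or more, which is all these formulas use.

```python
def mysql_locate(substr: str, s: str, pos: int = 1) -> int:
    """Emulate LOCATE(substr, s, pos): 1-based index of the first match at or after pos, else 0."""
    if pos < 1:
        raise ValueError("this sketch only emulates pos >= 1")
    return s.find(substr, pos - 1) + 1  # str.find returns -1 when absent, which becomes 0


def mysql_substring(s: str, pos: int, length: int) -> str:
    """Emulate SUBSTRING(s, pos, length) for pos >= 1; a length below 1 yields ''."""
    if pos < 1:
        raise ValueError("this sketch only emulates pos >= 1")
    if length < 1:
        return ""
    return s[pos - 1 : pos - 1 + length]


def second_word(sentence: str) -> str:
    """The corrected formula: start one past the first space, stop before the next one."""
    first = mysql_locate(" ", sentence)
    return mysql_substring(
        sentence,
        first + 1,
        mysql_locate(" ", sentence, first + 1) - first - 1,
    )


def between(haystack: str, left: str, right: str) -> str:
    """The general formula: text between the first `left` and the next `right` after it."""
    start = mysql_locate(left, haystack) + len(left)  # len() plays the role of CHAR_LENGTH()
    return mysql_substring(haystack, start, mysql_locate(right, haystack, start) - start)


print(second_word("THIS IS A TEST"))                           # IS
print(between('{"id": 7, "name": "Ann"}', '"name": "', '"'))   # Ann
```

The formula does not fail if the left boundary is missing. LOCATE returns 0, so the start position silently becomes CHAR_LENGTH('<leftBoundary>'). Checking that LOCATE(...) > 0 first may therefore be worthwhile.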

User · Answer

As others have said, MySQL does not provide regex tools for extracting substrings. That's not to say you can't have them, though, if you're prepared to extend MySQL with user-defined functions: https://github.com/mysqludf/lib_mysqludf_preg

That may not be much help if you want to distribute your software, since it becomes an impediment to installing it, but for an in-house solution it may be appropriate.

User · Answer

According to http://dev.mysql.com/, the SUBSTRING function takes the start position and then the length, so surely the function for the second word would be:

    SUBSTRING(sentence, LOCATE(' ', sentence), (LOCATE(' ', LOCATE(' ', sentence)) - LOCATE(' ', sentence)))

User · Answer

The following is a proposed solution for the OP's specific problem (extracting the second word of a string). As mc0e's answer states, though, actually extracting regex matches is not supported out of the box in MySQL. If you really need this, your choices are basically to 1) do it in post-processing on the client, or 2) install a MySQL extension to support it.

BenWells has it very nearly correct. Working from his code, here's a slightly adjusted version:

    SUBSTRING(
        sentence,
        LOCATE(' ', sentence) + CHAR_LENGTH(' '),
        LOCATE(' ', sentence, (LOCATE(' ', sentence) + 1)) - (LOCATE(' ', sentence) + CHAR_LENGTH(' '))
    )

As a working example, I used:

    SELECT SUBSTRING(
        sentence,
        LOCATE(' ', sentence) + CHAR_LENGTH(' '),
        LOCATE(' ', sentence, (LOCATE(' ', sentence) + 1)) - (LOCATE(' ', sentence) + CHAR_LENGTH(' '))
    ) AS string
    FROM (SELECT 'THIS IS A TEST' AS sentence) temp

This successfully extracts the word IS.
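This expression fails when the second word is also the last word. There is no following space, so the inner LOCATE returns 0, the computed length goes negative, and SUBSTRING returns an empty string. The Python sketch below shows this by emulating LOCATE and SUBSTRING (hypothetical helper names, positions of 1 or more only). Appending a trailing space first is one possible workaround, written as CONCAT(sentence, ' ') in SQL. That workaround is this sketch's assumption, not part of the answer.

```python
def mysql_locate(substr: str, s: str, pos: int = 1) -> int:
    """Emulate LOCATE(substr, s, pos) for pos >= 1: 1-based index of the match, or 0."""
    return s.find(substr, pos - 1) + 1


def mysql_substring(s: str, pos: int, length: int) -> str:
    """Emulate SUBSTRING(s, pos, length) for pos >= 1: MySQL returns '' when length < 1."""
    return s[pos - 1 : pos - 1 + length] if length >= 1 else ""


def adjusted_second_word(sentence: str) -> str:
    """Mirror the adjusted expression term by term."""
    start = mysql_locate(" ", sentence) + len(" ")  # LOCATE(' ', s) + CHAR_LENGTH(' ')
    end = mysql_locate(" ", sentence, mysql_locate(" ", sentence) + 1)
    return mysql_substring(sentence, start, end - start)


print(adjusted_second_word("THIS IS A TEST"))  # IS
print(repr(adjusted_second_word("THIS IS")))   # '' (no space after the second word)
print(adjusted_second_word("THIS IS" + " "))   # IS (trailing space added, like CONCAT(sentence, ' '))
```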

User · Answer

The field's value is:

    - DE-HEB 20 - DTopTen 1.2

    SELECT
        SUBSTRING_INDEX(SUBSTRING_INDEX(DesctosAplicados, 'DE-HEB', -1), '-', 1) `DE-HEB`,
        SUBSTRING_INDEX(SUBSTRING_INDEX(DesctosAplicados, 'DTopTen', -1), '-', 1) DTopTen
    FROM TABLA;

Result is:

    DE-HEB    DTopTen
    20        1.2
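You can trace the nested SUBSTRING_INDEX calls with a small Python emulation. The helper substring_index is illustrative, and it assumes a non-empty delimiter. A positive count keeps everything to the left of that many delimiters, and a negative count keeps everything to the right. If the delimiter occurs fewer times than |count|, the whole string comes back. The extracted values keep their surrounding spaces, which wrapping the expression in TRIM() would remove.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Emulate MySQL SUBSTRING_INDEX(s, delim, count) (case-sensitive, non-empty delim)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    # count > 0: keep the first `count` pieces; count < 0: keep the last |count| pieces
    return delim.join(parts[:count] if count > 0 else parts[count:])


value = "- DE-HEB 20 - DTopTen 1.2"

# Everything after the key, then everything before the next '-'
de_heb = substring_index(substring_index(value, "DE-HEB", -1), "-", 1)
dtopten = substring_index(substring_index(value, "DTopTen", -1), "-", 1)

print(repr(de_heb))   # ' 20 '
print(repr(dtopten))  # ' 1.2'
```

Because the outer call cuts at the next '-', this relies on every entry being introduced by "- ". A value that itself contains a '-' (a negative number, for instance) would be truncated.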

User · Answer

No, there isn't a syntax for extracting text using regular expressions. You have to use the ordinary string manipulation functions.

Alternatively, select the entire value from the database (or the first n characters if you are worried about too much data transfer) and then use a regular expression on the client.
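Here is a minimal sketch of the client-side approach. It assumes the text values have already been fetched into a Python list; the sample rows and the word pattern [A-Za-z'-]+ are illustrative assumptions. It extracts the second word of each value with a regular expression and then counts them, which is what the question's GROUP BY word / COUNT(*) was meant to produce. Rows without a second word are skipped.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, apostrophes or hyphens
WORD = re.compile(r"[A-Za-z'-]+")


def second_word(text: str):
    """Return the second word of `text`, or None if it has fewer than two words."""
    words = WORD.findall(text)
    return words[1] if len(words) > 1 else None


def count_second_words(rows):
    """Client-side equivalent of: SELECT <second word> AS word, COUNT(*) ... GROUP BY word."""
    return Counter(w for w in map(second_word, rows) if w is not None)


rows = ["the quick fox", "a quick dog", "hello, world!", "solo"]  # e.g. fetched with a DB driver
print(count_second_words(rows))
```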

User · Answer

I don't think such a thing is possible. You can use the SUBSTRING function to extract the part you want.

User · Answer

My home-grown regular expression replace function (reg_replace, a user-defined function rather than a MySQL built-in) can be used for this.

Demo

See this DB-Fiddle demo, which returns the second word ("I") from a famous sonnet and the number of occurrences of it (1).

SQL

Assuming MySQL 8 or later is being used (to allow use of a Common Table Expression), the following will return the second word and the number of occurrences of it:

    WITH cte AS (
      SELECT digits.idx,
             SUBSTRING_INDEX(SUBSTRING_INDEX(words, '~', digits.idx + 1), '~', -1) word
      FROM
      (SELECT reg_replace(UPPER(txt),
                          '[^a-zA-Z-]+',
                          '~',
                          TRUE,
                          1,
                          0) AS words
       FROM tbl) delimited
      INNER JOIN
      (SELECT @row := @row + 1 AS idx FROM
        (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t1,
        (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t2,
        (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t3,
        (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t4,
        (SELECT @row := -1) t5) digits
      ON LENGTH(REPLACE(words, '~', '')) <= LENGTH(words) - digits.idx
    )
    SELECT c.word,
           subq.occurrences
    FROM cte c
    LEFT JOIN (
      SELECT word,
             COUNT(*) AS occurrences
      FROM cte
      GROUP BY word
    ) subq
    ON c.word = subq.word
    WHERE idx = 1; /* idx is zero-based, so 1 here gets the second word */

Explanation

A few tricks are used in the SQL above, and some accreditation is needed. Firstly, the regular expression replacer is used to replace every continuous block of non-word characters with a single tilde (~) character. Note: a different character could be chosen instead if there is any possibility of a tilde appearing in the text.

The technique from this answer is then used for transforming a string with delimited values into separate row values. It's combined with the clever technique from this answer for generating a table consisting of a sequence of incrementing numbers: 0 to 9,999 in this case.
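The three steps are easier to follow in a Python sketch of the same pipeline:

1. Collapse runs of non-word characters to '~'.
2. Split the result into (idx, word) rows with nested SUBSTRING_INDEX calls, for idx from 0 up to the number of tildes (which is what the ON condition expresses).
3. Count every word and keep the rows where idx = 1.

The sketch uses re.sub in place of the home-grown reg_replace function. This assumes that reg_replace, with the arguments shown, replaces each maximal run of matching characters. The helper names are hypothetical.

```python
import re
from collections import Counter


def substring_index(s: str, delim: str, count: int) -> str:
    """Emulate MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def split_words(txt: str):
    """Steps 1 and 2: one (idx, word) row per '~'-delimited piece of UPPER(txt)."""
    words = re.sub(r"[^a-zA-Z-]+", "~", txt.upper())  # stand-in for reg_replace
    # ON LENGTH(REPLACE(words, '~', '')) <= LENGTH(words) - idx  <=>  idx <= number of '~'
    return [
        (idx, substring_index(substring_index(words, "~", idx + 1), "~", -1))
        for idx in range(words.count("~") + 1)
    ]


def second_word_occurrences(texts):
    """Step 3: count every word across all rows, then keep the words at idx = 1."""
    rows = [row for txt in texts for row in split_words(txt)]
    counts = Counter(word for _, word in rows)
    return [(word, counts[word]) for idx, word in rows if idx == 1]


print(second_word_occurrences(["It is what it is."]))
```

A trailing non-word character (the full stop here) produces a trailing empty "word", just as it does in the SQL. This does not affect the idx = 1 row.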

User · Answer

Shorter option to extract the second word in a sentence:

    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('THIS IS A TEST', ' ', 2), ' ', -1) AS FoundText

MySQL docs for SUBSTRING_INDEX
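The same nested pattern generalises to the nth word. The inner call keeps the first n space-separated pieces and the outer call keeps the last of those. The Python emulation below uses hypothetical helper names and assumes a single space as the separator. It also shows two edge cases. Asking for a word past the end returns the last word rather than an empty string, and doubled spaces produce empty "words".

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Emulate MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def nth_word(sentence: str, n: int) -> str:
    """SUBSTRING_INDEX(SUBSTRING_INDEX(sentence, ' ', n), ' ', -1) for n >= 1."""
    return substring_index(substring_index(sentence, " ", n), " ", -1)


print(nth_word("THIS IS A TEST", 2))  # IS
print(nth_word("THIS IS A TEST", 9))  # TEST (past the end: the last word)
print(repr(nth_word("THIS  IS", 2)))  # '' (two consecutive spaces)
```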
