RegEx: Grabbing values between quotation marks

Question

I have a value like this:

    "Foo Bar" "Another Value" something else

What regex will return the values enclosed in the quotation marks (e.g. Foo Bar and Another Value)?

User · Answer

If you're trying to find only strings that have a certain suffix, such as dot syntax, you can try this:

    \"([^\"]*?[^\"]*?)\".localized

where .localized is the suffix.

Example:

    print("this is something I need to return".localized + "so is this".localized + "but this is not")

It will capture "this is something I need to return".localized and "so is this".localized, but not "but this is not".

User · Answer

I would go for:

    "([^"]*)"

The [^"] is regex for any character except '"'. The reason I use this over the non-greedy many operator is that I have to keep looking that up just to make sure I get it correct.
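
To make the comparison concrete, here is a minimal Python sketch that runs the negated character class, the non-greedy form, and (for contrast) a greedy dot on the string from the question. The variable names are only illustrative.

```python
import re

text = '"Foo Bar" "Another Value" something else'

# Negated character class: the match cannot run past a closing quote.
negated = re.findall(r'"([^"]*)"', text)

# Non-greedy quantifier: gives the same result on this input.
lazy = re.findall(r'"(.*?)"', text)

# Greedy dot, for contrast: runs from the first quote to the last one.
greedy = re.findall(r'"(.*)"', text)

print(negated)
print(lazy)
print(greedy)
```

On this input the first two agree, while the greedy version swallows everything between the first and the last quote.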

User · Answer

From Greg H.'s answer I was able to create this regex to suit my needs.

I needed to match a specific value that was qualified by being inside quotes. It must be a full match; no partial match should trigger a hit. E.g. "test" must not match "test2".

    import re

    # needle and haystack are supplied by the caller
    reg = r"""(['"])(%s)\1"""
    if re.search(reg % (needle), haystack, re.IGNORECASE):
        print("winning")

Hunter

User · Answer

I've been using the following with great success:

    (["'])(?:(?=(\\?))\2.)*?\1

It supports nested quotes as well.

For those who want a deeper explanation of how this works, here's an explanation from user ephemient:

    (["'])          match a quote
    ((?=(\\?))\2.)  if a backslash exists, gobble it, and whether or not that happens, match a character
    *?              match many times (non-greedily, so as not to eat the closing quote)
    \1              match the same quote that was used for opening
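
As a rough illustration of that explanation, here is a sketch of the pattern (["'])(?:(?=(\\?))\2.)*?\1 in Python. The sample text is made up, and stripping the outer quotes with a slice is my own addition, not part of the answer.

```python
import re

# Group 1 is the opening quote; the lookahead captures an optional
# backslash into group 2 so that an escaped quote is consumed as a pair.
QUOTED = re.compile(r'''(["'])(?:(?=(\\?))\2.)*?\1''')

# Hypothetical input: double quotes, single quotes wrapping double quotes,
# and a backslash-escaped quote.
text = r'''"Foo Bar" 'say "hi"' and "an \" escaped quote"'''

# The full match includes the surrounding quotes.
matches = [m.group(0) for m in QUOTED.finditer(text)]

# Drop the first and last character to keep only the content.
contents = [m[1:-1] for m in matches]

print(matches)
print(contents)
```

Note that the whole match still includes the quotes, so the content has to be extracted in a second step.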

User · Answer

Here is the solution I used:

    "([^"]*?icon[^"]*?)"

TL;DR: replace the word icon with what you're looking for inside the quotes and voilà!

The way this works is that it looks for the keyword and doesn't care what else is between the quotes, e.g.:

    id="fb-icon" id="icon-close" id="large-icon-close"

The regex looks for a quote mark ", then for any run of characters that are not " until it finds icon, then for any run of characters that are not ", and finally for a closing ".
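
A small Python sketch of the same idea, with the keyword passed in as a parameter. The helper name quoted_containing and the sample string are hypothetical, and re.escape is added on the assumption that the keyword should be matched literally.

```python
import re

def quoted_containing(keyword, text):
    """Return the contents of double-quoted strings that contain keyword."""
    # [^"]*? on both sides keeps the match inside a single pair of quotes.
    pattern = r'"([^"]*?' + re.escape(keyword) + r'[^"]*?)"'
    return re.findall(pattern, text)

sample = 'id="fb-icon" id="icon-close" id="large-icon-close" id="header"'
found = quoted_containing("icon", sample)
print(found)
```

The last attribute is skipped because its value does not contain the keyword.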

User · Answer

Let's see two efficient ways that deal with escaped quotes. These patterns are not designed to be concise or aesthetic, but to be efficient.

These ways use first-character discrimination to quickly find quotes in the string without the cost of an alternation. (The idea is to quickly discard characters that are not quotes without testing the two branches of the alternation.)

Content between quotes is described with an unrolled loop (instead of a repeated alternation) to be more efficient too:

    [^"\\]*(?:\\.[^"\\]*)*

Obviously, to deal with strings that don't have balanced quotes, you can use possessive quantifiers instead:

    [^"\\]*+(?:\\.[^"\\]*)*+

or a workaround to emulate them, to prevent too much backtracking. You can also decide that a quoted part runs from an opening quote to the next (non-escaped) quote or the end of the string. In this case there is no need for possessive quantifiers; you only need to make the last quote optional.

Notice: sometimes quotes are not escaped with a backslash but by repeating the quote. In this case the content subpattern looks like this:

    [^"]*(?:""[^"]*)*

The patterns avoid the use of a capture group and a backreference (I mean something like (["']).....\1) and use a simple alternation, but with ["'] factored out at the beginning.

Perl like:

    ["'](?:(?<=")[^"\\]*(?s:\\.[^"\\]*)*"|(?<=')[^'\\]*(?s:\\.[^'\\]*)*')

(Note that (?s:...) is syntactic sugar to switch on the dotall/singleline mode inside the non-capturing group. If this syntax is not supported, you can easily switch this mode on for the whole pattern or replace the dot with [\s\S].)

(The way this pattern is written is totally "hand-driven" and doesn't take into account possible engine-internal optimizations.)

ECMAScript:

    (?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')

POSIX extended:

    "[^"\\]*(\\(.|\n)[^"\\]*)*"|'[^'\\]*(\\(.|\n)[^'\\]*)*'

or simply:

    "([^"\\]|\\.|\\\n)*"|'([^'\\]|\\.|\\\n)*'
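
For readers who want to try the unrolled-loop idea, here is a minimal Python sketch restricted to double quotes. It uses the plain (non-possessive) quantifiers, since possessive ones are only available in newer versions of Python's re module. The constant names and sample strings are my own assumptions.

```python
import re

# Unrolled loop: a run of ordinary characters, then any number of
# (escaped character + run of ordinary characters) groups.
# DOTALL lets the dot after the backslash match a newline as well.
BACKSLASH_ESCAPED = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"', re.DOTALL)

# Variant for quotes that are escaped by doubling them.
DOUBLED = re.compile(r'"[^"]*(?:""[^"]*)*"')

# Hypothetical sample inputs.
escaped_text = r'say "a \"quoted\" word" and "plain"'
doubled_text = 'say "a ""quoted"" word" and "plain"'

# With no capture groups, findall returns the whole matches.
escaped_matches = BACKSLASH_ESCAPED.findall(escaped_text)
doubled_matches = DOUBLED.findall(doubled_text)

print(escaped_matches)
print(doubled_matches)
```

Both patterns return the quoted parts including their quotes; add a capture group around the content subpattern if only the inside is wanted.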

User · Answer

    echo 'junk "Foo Bar" not empty one "" this "but this" and this neither' | sed 's/[^\"]*\"\([^\"]*\)\"[^\"]*/>\1</g'

This will result in:

    >Foo Bar<><>but this<

Here I show each result string between > and < for clarity. Also, using the non-greedy version with this sed command, we first throw out the junk before and after the quoted parts, then replace each match with the part between the quotes, surrounded by > and <.

User · Answer

I liked Axeman's more expansive version, but had some trouble with it (it didn't match, for example,

    foo "string \\ string" bar

or

    foo "string1"   bar   "string2"

correctly), so I tried to fix it:

    # opening quote
    (["'])
       (
         # repeat (non-greedy, so we don't span multiple strings)
         (?:
           # anything, except not the opening quote, and not
           # a backslash, which are handled separately.
           (?!\1)[^\\]
           |
           # consume any double backslash (unnecessary?)
           (?:\\\\)*
           |
           # Allow backslash to escape characters
           \\.
         )*?
       )
    # same character as opening quote
    \1

User · Answer

Unlike Adam's answer, I have a simple one that works:

    (["'])(?:\\\1|.)*?\1

And just add parentheses if you want to get the content inside the quotes, like this:

    (["'])((?:\\\1|.)*?)\1

Then $1 matches the quote character and $2 matches the content string.

User · Answer

In general, the following regular expression fragment is what you are looking for:

    "(.*?)"

This uses the non-greedy *? operator to capture everything up to but not including the next double quote. Then, you use a language-specific mechanism to extract the matched text.

In Python, you could do:

    >>> import re
    >>> string = '"Foo Bar" "Another Value"'
    >>> print(re.findall(r'"(.*?)"', string))
    ['Foo Bar', 'Another Value']

User · Answer

This version accounts for escaped quotes and controls backtracking:

    /(["'])((?:(?!\1)[^\\]|(?:\\\\)*\\[^\\])*)\1/

User · Answer

    import re

    string = "\" foo bar\" \"loloo\""
    print(re.findall(r'"(.*?)"', string))

Just try this out; it works like a charm! The \ is the escape (skip) character.

User · Answer

The regex of the accepted answer returns the values including their surrounding quotation marks: "Foo Bar" and "Another Value" as matches.

Here are regexes which return only the values between the quotation marks (as the questioner was asking for):

Double quotes only (use the value of capture group #1):

    "(.*?[^\\])"

Single quotes only (use the value of capture group #1):

    '(.*?[^\\])'

Both (use the value of capture group #2):

    (["'])(.*?[^\\])\1

All support escaped and nested quotes.
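
Here is a short Python sketch of reading the capture groups from the patterns "(.*?[^\\])" and (["'])(.*?[^\\])\1. The sample strings are made up. The last two lines show one caveat I noticed while trying it: because [^\\] must consume at least one character, an empty quoted string is not matched as empty, whereas a negated character class handles it.

```python
import re

DOUBLE_ONLY = re.compile(r'"(.*?[^\\])"')
EITHER = re.compile(r'''(["'])(.*?[^\\])\1''')

# Hypothetical input with a backslash-escaped quote in the second value.
text = r'"Foo Bar" "Another \"Value\""'

# Group 1 holds the content for the double-quote-only pattern.
double_values = DOUBLE_ONLY.findall(text)

# Group 2 holds the content when group 1 is the opening quote.
either_values = [m.group(2) for m in EITHER.finditer(text)]

# Caveat: "" has no character for [^\\] to match, so the match
# stretches to the next quote instead.
empty_case = DOUBLE_ONLY.findall('"" "x"')
negated_case = re.findall(r'"([^"]*)"', '"" "x"')

print(double_values)
print(either_values)
print(empty_case, negated_case)
```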

User · Answer

I liked Eugen Mihailescu's solution to match the content between quotes whilst allowing quotes to be escaped. However, I discovered some problems with escaping and came up with the following regex to fix them:

    (['"])(?:(?!\1|\\).|\\.)*\1

It does the trick and is still pretty simple and easy to maintain.

Demo (with some more test cases; feel free to use it and expand on it).

PS: If you just want the content between quotes in the full match ($0), and are not afraid of the performance penalty, use:

    (?<=(['"])\b)(?:(?!\1|\\).|\\.)*(?=\1)

Unfortunately, without the quotes as anchors, I had to add a boundary \b, which does not play well with spaces and non-word boundary characters after the starting quote.

Alternatively, modify the initial version by simply adding a group and extract the string from $2:

    (['"])((?:(?!\1|\\).|\\.)*)\1

PPS: If your focus is solely on efficiency, go with Casimir et Hippolyte's solution; it's a good one.
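
A minimal Python sketch of the grouped variant, (['"])((?:(?!\1|\\).|\\.)*)\1, reading the content from group 2. The sample text is hypothetical and includes an escaped quote and an escaped backslash right before a closing quote.

```python
import re

# Group 1: opening quote. Group 2: content, where each step is either a
# character that is neither the quote nor a backslash, or an escaped pair.
QUOTED = re.compile(r'''(['"])((?:(?!\1|\\).|\\.)*)\1''')

# Hypothetical input: plain string, escaped single quote, trailing
# escaped backslash.
text = r'''"Foo Bar" 'it\'s' "back\\" tail'''

values = [m.group(2) for m in QUOTED.finditer(text)]
print(values)
```

Because a backslash is always consumed together with the character after it, the escaped backslash in the last string does not hide the closing quote.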

User · Answer

All the answers above are good, except that they DO NOT support all Unicode characters in ECMAScript (JavaScript).

If you are a Node user, you might want this modified version of the accepted answer, which supports all Unicode characters:

    /(?<=((?<=[\s,.:;"']|^)["']))(?:(?=(\\?))\2.)*?(?=\1)/gmu

User · Answer

A supplementary answer for the subset of Microsoft VBA coders only: one uses the library Microsoft VBScript Regular Expressions 5.5, and this gives the following code:

    Sub TestRegularExpression()

        Dim oRE As VBScript_RegExp_55.RegExp    '* Tools->References: Microsoft VBScript Regular Expressions 5.5
        Set oRE = New VBScript_RegExp_55.RegExp

        oRE.Pattern = """([^""]*)"""

        oRE.Global = True

        Dim sTest As String
        sTest = """Foo Bar"" ""Another Value"" something else"

        Debug.Assert oRE.test(sTest)

        Dim oMatchCol As VBScript_RegExp_55.MatchCollection
        Set oMatchCol = oRE.Execute(sTest)
        Debug.Assert oMatchCol.Count = 2

        Dim oMatch As Match
        For Each oMatch In oMatchCol
            Debug.Print oMatch.SubMatches(0)
        Next oMatch

    End Sub

User · Answer

The pattern (["'])(?:(?=(\\?))\2.)*?\1 above does the job, but I am concerned about its performance (it's not bad but could be better). Mine below is ~20% faster.

The pattern "(.*?)" is just incomplete. My advice for everyone reading this is just DON'T USE IT!!!

For instance, it cannot capture many strings (if needed I can provide an exhaustive test case), like the one below:

    $string = 'How are you? I\'m fine, thank you';

The rest of them are just as "good" as the one above.

If you really care about both performance and precision, then start with the one below:

    /(['"])((\\\1|.)*?)\1/gm

In my tests it covered every string I met, but if you find something that doesn't work I would gladly update it for you.

Check my pattern in an online regex tester.

User · Answer

A very late answer, but I'd like to answer:

    (\"[\w\s]+\")

http://regex101.com/r/cB0kB8/1

User · Answer

This one worked for me:

    |([\'"])(.*?)\1|i

I've used it in a statement like this one:

    preg_match_all('|([\'"])(.*?)\1|i', $cont, $matches);

and it worked great.

User · Answer

Peculiarly, none of these answers produce a regex where the returned match is the text inside the quotes, which is what is asked for. MA-Madden tries, but only gets the inside match as a captured group rather than the whole match. One way to actually do it would be:

    (?<=(["']\b))(?:(?=(\\?))\2.)*?(?=\1)

Examples for this can be seen in this demo: https://regex101.com/r/Hbj8aP/1

The key here is the positive lookbehind at the start (the ?<=) and the positive lookahead at the end (the ?=). The lookbehind looks behind the current character to check for a quote and, if one is found, starts from there; the lookahead then checks the character ahead for a quote and, if one is found, stops on that character. The lookbehind group (the ["']) is wrapped in brackets to create a group for whichever quote was found at the start; this is then used in the end lookahead (the ?=\1) to make sure it only stops when it finds the corresponding quote.

The only other complication is that, because the lookahead doesn't actually consume the end quote, it will be found again by the starting lookbehind, which causes text between ending and starting quotes on the same line to be matched. Putting a word boundary on the opening quote (["']\b) helps with this, though ideally I'd like to move past the lookahead, but I don't think that is possible. The bit allowing escaped characters in the middle I've taken directly from Adam's answer.
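
The complication described above is easy to reproduce. The Python sketch below uses a deliberately simplified variant that handles double quotes only and no escapes, not the full pattern from this answer, and runs it on the string from the question with and without the word boundary.

```python
import re

text = '"Foo Bar" "Another Value" something else'

# Lookbehind and lookahead only: the closing quote is not consumed, so it
# also serves as an "opening" quote for the gap that follows it.
without_boundary = re.findall(r'(?<=")[^"]*(?=")', text)

# Requiring a word boundary right after the opening quote rejects the gap,
# because the closing quote is followed by a space.
with_boundary = re.findall(r'(?<="\b)[^"]*(?=")', text)

print(without_boundary)
print(with_boundary)
```

As the answer notes, the boundary trick assumes the quoted text starts with a word character, so a value that begins with a space or punctuation would be missed.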
