[regex] Regex for quoted string with escaping quotes

How do I get the substring " It's big \"problem " using a regular expression?

s = ' function(){  return " It\'s big \"problem  ";  }';     

This question is related to regex escaping quotes

The answer is


If your IDE is IntelliJ Idea, you can forget all these headaches and store your regex into a String variable and as you copy-paste it inside the double-quote it will automatically change to a regex acceptable format.

example in Java:

String s = "\"en_usa\":[^\\,\\}]+";

now you can use this variable in your regexp or anywhere.


/"(?:[^"\\]|\\.)*"/

Works in The Regex Coach and PCRE Workbench.

Example of test in JavaScript:

_x000D_
_x000D_
    var s = ' function(){ return " Is big \\"problem\\", \\no? "; }';_x000D_
    var m = s.match(/"(?:[^"\\]|\\.)*"/);_x000D_
    if (m != null)_x000D_
        alert(m);
_x000D_
_x000D_
_x000D_


Messed around at regexpal and ended up with this regex: (Don't ask me how it works, I barely understand even tho I wrote it lol)

"(([^"\\]?(\\\\)?)|(\\")+)+"

One has to remember that regexps aren't a silver bullet for everything string-y. Some stuff are simpler to do with a cursor and linear, manual, seeking. A CFL would do the trick pretty trivially, but there aren't many CFL implementations (afaik).


If your IDE is IntelliJ Idea, you can forget all these headaches and store your regex into a String variable and as you copy-paste it inside the double-quote it will automatically change to a regex acceptable format.

example in Java:

String s = "\"en_usa\":[^\\,\\}]+";

now you can use this variable in your regexp or anywhere.


If it is searched from the beginning, maybe this can work?

\"((\\\")|[^\\])*\"

/"(?:[^"\\]++|\\.)*+"/

Taken straight from man perlre on a Linux system with Perl 5.22.0 installed. As an optimization, this regex uses the 'posessive' form of both + and * to prevent backtracking, for it is known beforehand that a string without a closing quote wouldn't match in any case.


"(?:\\"|.)*?"

Alternating the \" and the . passes over escaped quotes while the lazy quantifier *? ensures that you don't go past the end of the quoted string. Works with .NET Framework RE classes


One has to remember that regexps aren't a silver bullet for everything string-y. Some stuff are simpler to do with a cursor and linear, manual, seeking. A CFL would do the trick pretty trivially, but there aren't many CFL implementations (afaik).


here is one that work with both " and ' and you easily add others at the start.

("|')(?:\\\1|[^\1])*?\1

it uses the backreference (\1) match exactley what is in the first group (" or ').

http://www.regular-expressions.info/backref.html


As provided by ePharaoh, the answer is

/"([^"\\]*(\\.[^"\\]*)*)"/

To have the above apply to either single quoted or double quoted strings, use

/"([^"\\]*(\\.[^"\\]*)*)"|\'([^\'\\]*(\\.[^\'\\]*)*)\'/

/"(?:[^"\\]|\\.)*"/

Works in The Regex Coach and PCRE Workbench.

Example of test in JavaScript:

_x000D_
_x000D_
    var s = ' function(){ return " Is big \\"problem\\", \\no? "; }';_x000D_
    var m = s.match(/"(?:[^"\\]|\\.)*"/);_x000D_
    if (m != null)_x000D_
        alert(m);
_x000D_
_x000D_
_x000D_


I faced a similar problem trying to remove quoted strings that may interfere with parsing of some files.

I ended up with a two-step solution that beats any convoluted regex you can come up with:

 line = line.replace("\\\"","\'"); // Replace escaped quotes with something easier to handle
 line = line.replaceAll("\"([^\"]*)\"","\"x\""); // Simple is beautiful

Easier to read and probably more efficient.


Most of the solutions provided here use alternative repetition paths i.e. (A|B)*.

You may encounter stack overflows on large inputs since some pattern compiler implements this using recursion.

Java for instance: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6337993

Something like this: "(?:[^"\\]*(?:\\.)?)*", or the one provided by Guy Bedford will reduce the amount of parsing steps avoiding most stack overflows.


/"(?:[^"\\]|\\.)*"/

Works in The Regex Coach and PCRE Workbench.

Example of test in JavaScript:

_x000D_
_x000D_
    var s = ' function(){ return " Is big \\"problem\\", \\no? "; }';_x000D_
    var m = s.match(/"(?:[^"\\]|\\.)*"/);_x000D_
    if (m != null)_x000D_
        alert(m);
_x000D_
_x000D_
_x000D_


This one works perfect on PCRE and does not fall with StackOverflow.

"(.*?[^\\])??((\\\\)+)?+"

Explanation:

  1. Every quoted string starts with Char: " ;
  2. It may contain any number of any characters: .*? {Lazy match}; ending with non escape character [^\\];
  3. Statement (2) is Lazy(!) optional because string can be empty(""). So: (.*?[^\\])??
  4. Finally, every quoted string ends with Char("), but it can be preceded with even number of escape sign pairs (\\\\)+; and it is Greedy(!) optional: ((\\\\)+)?+ {Greedy matching}, bacause string can be empty or without ending pairs!

/"(?:[^"\\]++|\\.)*+"/

Taken straight from man perlre on a Linux system with Perl 5.22.0 installed. As an optimization, this regex uses the 'posessive' form of both + and * to prevent backtracking, for it is known beforehand that a string without a closing quote wouldn't match in any case.


One has to remember that regexps aren't a silver bullet for everything string-y. Some stuff are simpler to do with a cursor and linear, manual, seeking. A CFL would do the trick pretty trivially, but there aren't many CFL implementations (afaik).


Most of the solutions provided here use alternative repetition paths i.e. (A|B)*.

You may encounter stack overflows on large inputs since some pattern compiler implements this using recursion.

Java for instance: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6337993

Something like this: "(?:[^"\\]*(?:\\.)?)*", or the one provided by Guy Bedford will reduce the amount of parsing steps avoiding most stack overflows.


An option that has not been touched on before is:

  1. Reverse the string.
  2. Perform the matching on the reversed string.
  3. Re-reverse the matched strings.

This has the added bonus of being able to correctly match escaped open tags.

Lets say you had the following string; String \"this "should" NOT match\" and "this \"should\" match" Here, \"this "should" NOT match\" should not be matched and "should" should be. On top of that this \"should\" match should be matched and \"should\" should not.

First an example.

// The input string.
const myString = 'String \\"this "should" NOT match\\" and "this \\"should\\" match"';

// The RegExp.
const regExp = new RegExp(
    // Match close
    '([\'"])(?!(?:[\\\\]{2})*[\\\\](?![\\\\]))' +
    '((?:' +
        // Match escaped close quote
        '(?:\\1(?=(?:[\\\\]{2})*[\\\\](?![\\\\])))|' +
        // Match everything thats not the close quote
        '(?:(?!\\1).)' +
    '){0,})' +
    // Match open
    '(\\1)(?!(?:[\\\\]{2})*[\\\\](?![\\\\]))',
    'g'
);

// Reverse the matched strings.
matches = myString
    // Reverse the string.
    .split('').reverse().join('')
    // '"hctam "\dluohs"\ siht" dna "\hctam TON "dluohs" siht"\ gnirtS'

    // Match the quoted
    .match(regExp)
    // ['"hctam "\dluohs"\ siht"', '"dluohs"']

    // Reverse the matches
    .map(x => x.split('').reverse().join(''))
    // ['"this \"should\" match"', '"should"']

    // Re order the matches
    .reverse();
    // ['"should"', '"this \"should\" match"']

Okay, now to explain the RegExp. This is the regexp can be easily broken into three pieces. As follows:

# Part 1
(['"])         # Match a closing quotation mark " or '
(?!            # As long as it's not followed by
  (?:[\\]{2})* # A pair of escape characters
  [\\]         # and a single escape
  (?![\\])     # As long as that's not followed by an escape
)
# Part 2
((?:          # Match inside the quotes
(?:           # Match option 1:
  \1          # Match the closing quote
  (?=         # As long as it's followed by
    (?:\\\\)* # A pair of escape characters
    \\        # 
    (?![\\])  # As long as that's not followed by an escape
  )           # and a single escape
)|            # OR
(?:           # Match option 2:
  (?!\1).     # Any character that isn't the closing quote
)
)*)           # Match the group 0 or more times
# Part 3
(\1)           # Match an open quotation mark that is the same as the closing one
(?!            # As long as it's not followed by
  (?:[\\]{2})* # A pair of escape characters
  [\\]         # and a single escape
  (?![\\])     # As long as that's not followed by an escape
)

This is probably a lot clearer in image form: generated using Jex's Regulex

Image on github (JavaScript Regular Expression Visualizer.) Sorry, I don't have a high enough reputation to include images, so, it's just a link for now.

Here is a gist of an example function using this concept that's a little more advanced: https://gist.github.com/scagood/bd99371c072d49a4fee29d193252f5fc#file-matchquotes-js


/(["\']).*?(?<!\\)(\\\\)*\1/is

should work with any quoted string


If it is searched from the beginning, maybe this can work?

\"((\\\")|[^\\])*\"

This one comes from nanorc.sample available in many linux distros. It is used for syntax highlighting of C style strings

\"(\\.|[^\"])*\"

A more extensive version of https://stackoverflow.com/a/10786066/1794894

/"([^"\\]{50,}(\\.[^"\\]*)*)"|\'[^\'\\]{50,}(\\.[^\'\\]*)*\'|“[^”\\]{50,}(\\.[^“\\]*)*”/   

This version also contains

  1. Minimum quote length of 50
  2. Extra type of quotes (open and close )

I faced a similar problem trying to remove quoted strings that may interfere with parsing of some files.

I ended up with a two-step solution that beats any convoluted regex you can come up with:

 line = line.replace("\\\"","\'"); // Replace escaped quotes with something easier to handle
 line = line.replaceAll("\"([^\"]*)\"","\"x\""); // Simple is beautiful

Easier to read and probably more efficient.


This one works perfect on PCRE and does not fall with StackOverflow.

"(.*?[^\\])??((\\\\)+)?+"

Explanation:

  1. Every quoted string starts with Char: " ;
  2. It may contain any number of any characters: .*? {Lazy match}; ending with non escape character [^\\];
  3. Statement (2) is Lazy(!) optional because string can be empty(""). So: (.*?[^\\])??
  4. Finally, every quoted string ends with Char("), but it can be preceded with even number of escape sign pairs (\\\\)+; and it is Greedy(!) optional: ((\\\\)+)?+ {Greedy matching}, bacause string can be empty or without ending pairs!

An option that has not been touched on before is:

  1. Reverse the string.
  2. Perform the matching on the reversed string.
  3. Re-reverse the matched strings.

This has the added bonus of being able to correctly match escaped open tags.

Lets say you had the following string; String \"this "should" NOT match\" and "this \"should\" match" Here, \"this "should" NOT match\" should not be matched and "should" should be. On top of that this \"should\" match should be matched and \"should\" should not.

First an example.

// The input string.
const myString = 'String \\"this "should" NOT match\\" and "this \\"should\\" match"';

// The RegExp.
const regExp = new RegExp(
    // Match close
    '([\'"])(?!(?:[\\\\]{2})*[\\\\](?![\\\\]))' +
    '((?:' +
        // Match escaped close quote
        '(?:\\1(?=(?:[\\\\]{2})*[\\\\](?![\\\\])))|' +
        // Match everything thats not the close quote
        '(?:(?!\\1).)' +
    '){0,})' +
    // Match open
    '(\\1)(?!(?:[\\\\]{2})*[\\\\](?![\\\\]))',
    'g'
);

// Reverse the matched strings.
matches = myString
    // Reverse the string.
    .split('').reverse().join('')
    // '"hctam "\dluohs"\ siht" dna "\hctam TON "dluohs" siht"\ gnirtS'

    // Match the quoted
    .match(regExp)
    // ['"hctam "\dluohs"\ siht"', '"dluohs"']

    // Reverse the matches
    .map(x => x.split('').reverse().join(''))
    // ['"this \"should\" match"', '"should"']

    // Re order the matches
    .reverse();
    // ['"should"', '"this \"should\" match"']

Okay, now to explain the RegExp. This is the regexp can be easily broken into three pieces. As follows:

# Part 1
(['"])         # Match a closing quotation mark " or '
(?!            # As long as it's not followed by
  (?:[\\]{2})* # A pair of escape characters
  [\\]         # and a single escape
  (?![\\])     # As long as that's not followed by an escape
)
# Part 2
((?:          # Match inside the quotes
(?:           # Match option 1:
  \1          # Match the closing quote
  (?=         # As long as it's followed by
    (?:\\\\)* # A pair of escape characters
    \\        # 
    (?![\\])  # As long as that's not followed by an escape
  )           # and a single escape
)|            # OR
(?:           # Match option 2:
  (?!\1).     # Any character that isn't the closing quote
)
)*)           # Match the group 0 or more times
# Part 3
(\1)           # Match an open quotation mark that is the same as the closing one
(?!            # As long as it's not followed by
  (?:[\\]{2})* # A pair of escape characters
  [\\]         # and a single escape
  (?![\\])     # As long as that's not followed by an escape
)

This is probably a lot clearer in image form: generated using Jex's Regulex

Image on github (JavaScript Regular Expression Visualizer.) Sorry, I don't have a high enough reputation to include images, so, it's just a link for now.

Here is a gist of an example function using this concept that's a little more advanced: https://gist.github.com/scagood/bd99371c072d49a4fee29d193252f5fc#file-matchquotes-js


"(?:\\"|.)*?"

Alternating the \" and the . passes over escaped quotes while the lazy quantifier *? ensures that you don't go past the end of the quoted string. Works with .NET Framework RE classes


/(["\']).*?(?<!\\)(\\\\)*\1/is

should work with any quoted string


This one comes from nanorc.sample available in many linux distros. It is used for syntax highlighting of C style strings

\"(\\.|[^\"])*\"

Messed around at regexpal and ended up with this regex: (Don't ask me how it works, I barely understand even tho I wrote it lol)

"(([^"\\]?(\\\\)?)|(\\")+)+"

A more extensive version of https://stackoverflow.com/a/10786066/1794894

/"([^"\\]{50,}(\\.[^"\\]*)*)"|\'[^\'\\]{50,}(\\.[^\'\\]*)*\'|“[^”\\]{50,}(\\.[^“\\]*)*”/   

This version also contains

  1. Minimum quote length of 50
  2. Extra type of quotes (open and close )

One has to remember that regexps aren't a silver bullet for everything string-y. Some stuff are simpler to do with a cursor and linear, manual, seeking. A CFL would do the trick pretty trivially, but there aren't many CFL implementations (afaik).


here is one that work with both " and ' and you easily add others at the start.

("|')(?:\\\1|[^\1])*?\1

it uses the backreference (\1) match exactley what is in the first group (" or ').

http://www.regular-expressions.info/backref.html


/(["\']).*?(?<!\\)(\\\\)*\1/is

should work with any quoted string


As provided by ePharaoh, the answer is

/"([^"\\]*(\\.[^"\\]*)*)"/

To have the above apply to either single quoted or double quoted strings, use

/"([^"\\]*(\\.[^"\\]*)*)"|\'([^\'\\]*(\\.[^\'\\]*)*)\'/

/"(?:[^"\\]|\\.)*"/

Works in The Regex Coach and PCRE Workbench.

Example of test in JavaScript:

_x000D_
_x000D_
    var s = ' function(){ return " Is big \\"problem\\", \\no? "; }';_x000D_
    var m = s.match(/"(?:[^"\\]|\\.)*"/);_x000D_
    if (m != null)_x000D_
        alert(m);
_x000D_
_x000D_
_x000D_


Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to escaping

Uses for the '&quot;' entity in HTML Javascript - How to show escape characters in a string? How to print a single backslash? How to escape special characters of a string with single backslashes Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence Properly escape a double quote in CSV How to Git stash pop specific stash in 1.8.3? In Java, should I escape a single quotation mark (') in String (double quoted)? How do I escape a single quote ( ' ) in JavaScript? Which characters need to be escaped when using Bash?

Examples related to quotes

YAML: Do I need quotes for strings in YAML? Regex to split a CSV Expansion of variables inside single quotes in a command in Bash How to call execl() in C with the proper arguments? Insert text with single quotes in PostgreSQL When to use single quotes, double quotes, and backticks in MySQL Difference between single and double quotes in Bash Remove quotes from a character vector in R How do I remove quotes from a string? How can I escape a double quote inside double quotes?