[regex] Match linebreaks - \n or \r\n?

While writing this answer, I had to match exclusively on linebreaks instead of using the s-flag (dotall - dot matches linebreaks).

The sites usually used to test regular expressions behave differently when trying to match on \n or \r\n.

I noticed

  • Regex101 matches linebreaks only on \n
    (example - delete \r and it matches)

  • RegExr matches linebreaks neither on \n nor on \r\n
    and I can't find something to make it match a linebreak, except for the m-flag and \s
    (example)

  • Debuggex behaves even more different:
    in this example it matches only on \r\n, while
    here it only matches on \n, with the same flags and engine specified

I'm fully aware of the m-flag (multiline - makes ^ match the start and $ the end of a line), but sometimes this is not an option. Same with \s, as it matches tabs and spaces, too.

My thought to use the unicode newline character (\u0085) wasn't successful, so:

  1. Is there a failsafe way to integrate the match on a linebreak (preferably regardless of the language used) into a regular expression?
  2. Why do the above mentioned sites behave differently (especially Debuggex, matching once only on \n and once only on \r\n)?

This question is related to regex language-agnostic line-breaks

The answer is


This only applies to question 1.

I have an app that runs on Windows and uses a multi-line MFC editor box.
The editor box expects CRLF linebreaks, but I need to parse the text enterred
with some really big/nasty regexs'.

I didn't want to be stressing about this while writing the regex, so
I ended up normalizing back and forth between the parser and editor so that
the regexs' just use \n. I also trap paste operations and convert them for the boxes.

This does not take much time.
This is what I use.

 boost::regex  CRLFCRtoLF (
     " \\r\\n | \\r(?!\\n) "
     , MODx);

 boost::regex  CRLFCRtoCRLF (
     " \\r\\n?+ | \\n "
     , MODx);


 // Convert (All style) linebreaks to linefeeds 
 // ---------------------------------------
 void ReplaceCRLFCRtoLF( string& strSrc, string& strDest )
 {
    strDest  = boost::regex_replace ( strSrc, CRLFCRtoLF, "\\n" );
 }

 // Convert linefeeds to linebreaks (Windows) 
 // ---------------------------------------
 void ReplaceCRLFCRtoCRLF( string& strSrc, string& strDest )
 {
    strDest  = boost::regex_replace ( strSrc, CRLFCRtoCRLF, "\\r\\n" );
 }

In PCRE \R matches \n, \r and \r\n.


You have different line endings in the example texts in Debuggex. What is especially interesting is that Debuggex seems to have identified which line ending style you used first, and it converts all additional line endings entered to that style.

I used Notepad++ to paste sample text in Unix and Windows format into Debuggex, and whichever I pasted first is what that session of Debuggex stuck with.

So, you should wash your text through your text editor before pasting it into Debuggex. Ensure that you're pasting the style you want. Debuggex defaults to Unix style (\n).

Also, NEL (\u0085) is something different entirely: https://en.wikipedia.org/wiki/Newline#Unicode

(\r?\n) will cover Unix and Windows. You'll need something more complex, like (\r\n|\r|\n), if you want to match old Mac too.


In Python:

# as Peter van der Wal's answer
re.split(r'\r\n|\r|\n', text, flags=re.M) 

or more rigorous:

# https://docs.python.org/3/library/stdtypes.html#str.splitlines
str.splitlines()

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to language-agnostic

IOException: The process cannot access the file 'file path' because it is being used by another process Peak signal detection in realtime timeseries data Match linebreaks - \n or \r\n? Simple way to understand Encapsulation and Abstraction How can I pair socks from a pile efficiently? How do I determine whether my calculation of pi is accurate? What is ADT? (Abstract Data Type) How to explain callbacks in plain english? How are they different from calling one function from another function? Ukkonen's suffix tree algorithm in plain English Private vs Protected - Visibility Good-Practice Concern

Examples related to line-breaks

HTML5 tag for horizontal line break Split string in JavaScript and detect line break Match linebreaks - \n or \r\n? How to linebreak an svg text within javascript? What is the difference between a "line feed" and a "carriage return"? Bash: Strip trailing linebreak from output How to read a file without newlines? How to break lines in PowerShell? How do I specify new lines on Python, when writing on files? How to remove line breaks (no characters!) from the string?