[python] How do I split a multi-line string into multiple lines?

I have a multi-line string literal that I want to do an operation on each line, like so:

inputString = """Line 1
Line 2
Line 3"""

I want to do something like the following:

for line in inputString:
    doStuff()

This question is related to python string

The answer is


Use str.splitlines().

splitlines() handles newlines properly, unlike split("\n").

It also has the the advantage mentioned by @efotinis of optionally including the newline character in the split result when called with a True argument.


Why you shouldn't use split("\n"):

\n, in Python, represents a Unix line-break (ASCII decimal code 10), independently from the platform where you run it. However, the linebreak representation is platform-dependent. On Windows, \n is two characters, CR and LF (ASCII decimal codes 13 and 10, AKA \r and \n), while on any modern Unix (including OS X), it's the single character LF.

print, for example, works correctly even if you have a string with line endings that don't match your platform:

>>> print " a \n b \r\n c "
 a 
 b 
 c

However, explicitly splitting on "\n", will yield platform-dependent behaviour:

>>> " a \n b \r\n c ".split("\n")
[' a ', ' b \r', ' c ']

Even if you use os.linesep, it will only split according to the newline separator on your platform, and will fail if you're processing text created in other platforms, or with a bare \n:

>>> " a \n b \r\n c ".split(os.linesep)
[' a \n b ', ' c ']

splitlines solves all these problems:

>>> " a \n b \r\n c ".splitlines()
[' a ', ' b ', ' c ']

Reading files in text mode partially mitigates the newline representation problem, as it converts Python's \n into the platform's newline representation. However, text mode only exists on Windows. On Unix systems, all files are opened in binary mode, so using split('\n') in a UNIX system with a Windows file will lead to undesired behavior. Also, it's not unusual to process strings with potentially different newlines from other sources, such as from a socket.


I wish comments had proper code text formatting, because I think @1_CR 's answer needs more bumps, and I would like to augment his answer. Anyway, He led me to the following technique; it will use cStringIO if available (BUT NOTE: cStringIO and StringIO are not the same, because you cannot subclass cStringIO... it is a built-in... but for basic operations the syntax will be identical, so you can do this):

try:
    import cStringIO
    StringIO = cStringIO
except ImportError:
    import StringIO

for line in StringIO.StringIO(variable_with_multiline_string):
    pass
print line.strip()

Use str.splitlines().

splitlines() handles newlines properly, unlike split("\n").

It also has the the advantage mentioned by @efotinis of optionally including the newline character in the split result when called with a True argument.


Why you shouldn't use split("\n"):

\n, in Python, represents a Unix line-break (ASCII decimal code 10), independently from the platform where you run it. However, the linebreak representation is platform-dependent. On Windows, \n is two characters, CR and LF (ASCII decimal codes 13 and 10, AKA \r and \n), while on any modern Unix (including OS X), it's the single character LF.

print, for example, works correctly even if you have a string with line endings that don't match your platform:

>>> print " a \n b \r\n c "
 a 
 b 
 c

However, explicitly splitting on "\n", will yield platform-dependent behaviour:

>>> " a \n b \r\n c ".split("\n")
[' a ', ' b \r', ' c ']

Even if you use os.linesep, it will only split according to the newline separator on your platform, and will fail if you're processing text created in other platforms, or with a bare \n:

>>> " a \n b \r\n c ".split(os.linesep)
[' a \n b ', ' c ']

splitlines solves all these problems:

>>> " a \n b \r\n c ".splitlines()
[' a ', ' b ', ' c ']

Reading files in text mode partially mitigates the newline representation problem, as it converts Python's \n into the platform's newline representation. However, text mode only exists on Windows. On Unix systems, all files are opened in binary mode, so using split('\n') in a UNIX system with a Windows file will lead to undesired behavior. Also, it's not unusual to process strings with potentially different newlines from other sources, such as from a socket.


The original post requested for code which prints some rows (if they are true for some condition) plus the following row. My implementation would be this:

text = """1 sfasdf
asdfasdf
2 sfasdf
asdfgadfg
1 asfasdf
sdfasdgf
"""

text = text.splitlines()
rows_to_print = {}

for line in range(len(text)):
    if text[line][0] == '1':
        rows_to_print = rows_to_print | {line, line + 1}

rows_to_print = sorted(list(rows_to_print))

for i in rows_to_print:
    print(text[i])

Like the others said:

inputString.split('\n')  # --> ['Line 1', 'Line 2', 'Line 3']

This is identical to the above, but the string module's functions are deprecated and should be avoided:

import string
string.split(inputString, '\n')  # --> ['Line 1', 'Line 2', 'Line 3']

Alternatively, if you want each line to include the break sequence (CR,LF,CRLF), use the splitlines method with a True argument:

inputString.splitlines(True)  # --> ['Line 1\n', 'Line 2\n', 'Line 3']

Like the others said:

inputString.split('\n')  # --> ['Line 1', 'Line 2', 'Line 3']

This is identical to the above, but the string module's functions are deprecated and should be avoided:

import string
string.split(inputString, '\n')  # --> ['Line 1', 'Line 2', 'Line 3']

Alternatively, if you want each line to include the break sequence (CR,LF,CRLF), use the splitlines method with a True argument:

inputString.splitlines(True)  # --> ['Line 1\n', 'Line 2\n', 'Line 3']

I wish comments had proper code text formatting, because I think @1_CR 's answer needs more bumps, and I would like to augment his answer. Anyway, He led me to the following technique; it will use cStringIO if available (BUT NOTE: cStringIO and StringIO are not the same, because you cannot subclass cStringIO... it is a built-in... but for basic operations the syntax will be identical, so you can do this):

try:
    import cStringIO
    StringIO = cStringIO
except ImportError:
    import StringIO

for line in StringIO.StringIO(variable_with_multiline_string):
    pass
print line.strip()

The original post requested for code which prints some rows (if they are true for some condition) plus the following row. My implementation would be this:

text = """1 sfasdf
asdfasdf
2 sfasdf
asdfgadfg
1 asfasdf
sdfasdgf
"""

text = text.splitlines()
rows_to_print = {}

for line in range(len(text)):
    if text[line][0] == '1':
        rows_to_print = rows_to_print | {line, line + 1}

rows_to_print = sorted(list(rows_to_print))

for i in rows_to_print:
    print(text[i])

Like the others said:

inputString.split('\n')  # --> ['Line 1', 'Line 2', 'Line 3']

This is identical to the above, but the string module's functions are deprecated and should be avoided:

import string
string.split(inputString, '\n')  # --> ['Line 1', 'Line 2', 'Line 3']

Alternatively, if you want each line to include the break sequence (CR,LF,CRLF), use the splitlines method with a True argument:

inputString.splitlines(True)  # --> ['Line 1\n', 'Line 2\n', 'Line 3']

Might be overkill in this particular case but another option involves using StringIO to create a file-like object

for line in StringIO.StringIO(inputString):
    doStuff()

Like the others said:

inputString.split('\n')  # --> ['Line 1', 'Line 2', 'Line 3']

This is identical to the above, but the string module's functions are deprecated and should be avoided:

import string
string.split(inputString, '\n')  # --> ['Line 1', 'Line 2', 'Line 3']

Alternatively, if you want each line to include the break sequence (CR,LF,CRLF), use the splitlines method with a True argument:

inputString.splitlines(True)  # --> ['Line 1\n', 'Line 2\n', 'Line 3']