[python] How to split a python string on new line characters

In python3 in Win7 I read a web page into a string.

I then want to split the string into a list at newline characters.

I can't enter the newline into my code as the argument in split(), because I get a syntax error 'EOL while scanning string literal'

If I type in the characters \ and n, I get a Unicode error.

Is there any way to do it?

This question is related to python string split

The answer is


? Splitting line in Python:

Have you tried using str.splitlines() method?:

From the docs:

str.splitlines([keepends])

Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.

For example:

>>> 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines()
['Line 1', '', 'Line 3', 'Line 4']

>>> 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines(True)
['Line 1\n', '\n', 'Line 3\r', 'Line 4\r\n']

Which delimiters are considered?

This method uses the universal newlines approach to splitting lines.

The main difference between Python 2.X and Python 3.X is that the former uses the universal newlines approach to splitting lines, so "\r", "\n", and "\r\n" are considered line boundaries for 8-bit strings, while the latter uses a superset of it that also includes:

  • \v or \x0b: Line Tabulation (added in Python 3.2).
  • \f or \x0c: Form Feed (added in Python 3.2).
  • \x1c: File Separator.
  • \x1d: Group Separator.
  • \x1e: Record Separator.
  • \x85: Next Line (C1 Control Code).
  • \u2028: Line Separator.
  • \u2029: Paragraph Separator.

splitlines VS split:

Unlike str.split() when a delimiter string sep is given, this method returns an empty list for the empty string, and a terminal line break does not result in an extra line:

>>> ''.splitlines()
[]

>>> 'Line 1\n'.splitlines()
['Line 1']

While str.split('\n') returns:

>>> ''.split('\n')
['']

>>> 'Line 1\n'.split('\n')
['Line 1', '']

?? Removing additional whitespace:

If you also need to remove additional leading or trailing whitespace, like spaces, that are ignored by str.splitlines(), you could use str.splitlines() together with str.strip():

>>> [str.strip() for str in 'Line 1  \n  \nLine 3 \rLine 4 \r\n'.splitlines()]
['Line 1', '', 'Line 3', 'Line 4']

? Removing empty strings (''):

Lastly, if you want to filter out the empty strings from the resulting list, you could use filter():

>>> # Python 2.X:
>>> filter(bool, 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines())
['Line 1', 'Line 3', 'Line 4']

>>> # Python 3.X:
>>> list(filter(bool, 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines()))
['Line 1', 'Line 3', 'Line 4']

Additional comment regarding the original question:

As the error you posted indicates and Burhan suggested, the problem is from the print. There's a related question about that could be useful to you: UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function


a.txt

this is line 1
this is line 2

code:

Python 3.4.0 (default, Mar 20 2014, 22:43:40) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> file = open('a.txt').read()
>>> file
>>> file.split('\n')
['this is line 1', 'this is line 2', '']

I'm on Linux, but I guess you just use \r\n on Windows and it would also work


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to split

Parameter "stratify" from method "train_test_split" (scikit Learn) Pandas split DataFrame by column value How to split large text file in windows? Attribute Error: 'list' object has no attribute 'split' Split function in oracle to comma separated values with automatic sequence How would I get everything before a : in a string Python Split String by delimiter position using oracle SQL JavaScript split String with white space Split a String into an array in Swift? Split pandas dataframe in two if it has more than 10 rows