[python] sscanf in Python

I'm looking for an equivalent to sscanf() in Python. I want to parse /proc/net/* files, in C I could do something like this:

int matches = sscanf(
        buffer,
        "%*d: %64[0-9A-Fa-f]:%X %64[0-9A-Fa-f]:%X %*X %*X:%*X %*X:%*X %*X %*d %*d %ld %*512s\n",
        local_addr, &local_port, rem_addr, &rem_port, &inode);

I thought at first to use str.split, however it doesn't split on the given characters, but the sep string as a whole:

>>> lines = open("/proc/net/dev").readlines()
>>> for l in lines[2:]:
>>>     cols = l.split(string.whitespace + ":")
>>>     print len(cols)
1

Which should be returning 17, as explained above.

Is there a Python equivalent to sscanf (not RE), or a string splitting function in the standard library that splits on any of a range of characters that I'm not aware of?

This question is related to python parsing split scanf procfs

The answer is


You could install pandas and use pandas.read_fwf for fixed width format files. Example using /proc/net/arp:

In [230]: df = pandas.read_fwf("/proc/net/arp")

In [231]: print(df)
       IP address HW type Flags         HW address Mask Device
0   141.38.28.115     0x1   0x2  84:2b:2b:ad:e1:f4    *   eth0
1   141.38.28.203     0x1   0x2  c4:34:6b:5b:e4:7d    *   eth0
2   141.38.28.140     0x1   0x2  00:19:99:ce:00:19    *   eth0
3   141.38.28.202     0x1   0x2  90:1b:0e:14:a1:e3    *   eth0
4    141.38.28.17     0x1   0x2  90:1b:0e:1a:4b:41    *   eth0
5    141.38.28.60     0x1   0x2  00:19:99:cc:aa:58    *   eth0
6   141.38.28.233     0x1   0x2  90:1b:0e:8d:7a:c9    *   eth0
7    141.38.28.55     0x1   0x2  00:19:99:cc:ab:00    *   eth0
8   141.38.28.224     0x1   0x2  90:1b:0e:8d:7a:e2    *   eth0
9   141.38.28.148     0x1   0x0  4c:52:62:a8:08:2c    *   eth0
10  141.38.28.179     0x1   0x2  90:1b:0e:1a:4b:50    *   eth0

In [232]: df["HW address"]
Out[232]:
0     84:2b:2b:ad:e1:f4
1     c4:34:6b:5b:e4:7d
2     00:19:99:ce:00:19
3     90:1b:0e:14:a1:e3
4     90:1b:0e:1a:4b:41
5     00:19:99:cc:aa:58
6     90:1b:0e:8d:7a:c9
7     00:19:99:cc:ab:00
8     90:1b:0e:8d:7a:e2
9     4c:52:62:a8:08:2c
10    90:1b:0e:1a:4b:50

In [233]: df["HW address"][5]
Out[233]: '00:19:99:cc:aa:58'

By default it tries to figure out the format automagically, but there are options you can give for more explicit instructions (see documentation). There are also other IO routines in pandas that are powerful for other file formats.


If the separators are ':', you can split on ':', and then use x.strip() on the strings to get rid of any leading or trailing whitespace. int() will ignore the spaces.


There is an ActiveState recipe which implements a basic scanf http://code.activestate.com/recipes/502213-simple-scanf-implementation/


Upvoted orip's answer. I think it is sound advice to use re module. The Kodos application is helpful when approaching a complex regexp task with Python.

http://kodos.sourceforge.net/home.html



When I'm in a C mood, I usually use zip and list comprehensions for scanf-like behavior. Like this:

input = '1 3.0 false hello'
(a, b, c, d) = [t(s) for t,s in zip((int,float,strtobool,str),input.split())]
print (a, b, c, d)

Note that for more complex format strings, you do need to use regular expressions:

import re
input = '1:3.0 false,hello'
(a, b, c, d) = [t(s) for t,s in zip((int,float,strtobool,str),re.search('^(\d+):([\d.]+) (\w+),(\w+)$',input).groups())]
print (a, b, c, d)

Note also that you need conversion functions for all types you want to convert. For example, above I used something like:

strtobool = lambda s: {'true': True, 'false': False}[s]

you can turn the ":" to space, and do the split.eg

>>> f=open("/proc/net/dev")
>>> for line in f:
...     line=line.replace(":"," ").split()
...     print len(line)

no regex needed (for this case)


There is also the parse module.

parse() is designed to be the opposite of format() (the newer string formatting function in Python 2.6 and higher).

>>> from parse import parse
>>> parse('{} fish', '1')
>>> parse('{} fish', '1 fish')
<Result ('1',) {}>
>>> parse('{} fish', '2 fish')
<Result ('2',) {}>
>>> parse('{} fish', 'red fish')
<Result ('red',) {}>
>>> parse('{} fish', 'blue fish')
<Result ('blue',) {}>

You can parse with module re using named groups. It won't parse the substrings to their actual datatypes (e.g. int) but it's very convenient when parsing strings.

Given this sample line from /proc/net/tcp:

line="   0: 00000000:0203 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 335 1 c1674320 300 0 0 0"

An example mimicking your sscanf example with the variable could be:

import re
hex_digit_pattern = r"[\dA-Fa-f]"
pat = r"\d+: " + \
      r"(?P<local_addr>HEX+):(?P<local_port>HEX+) " + \
      r"(?P<rem_addr>HEX+):(?P<rem_port>HEX+) " + \
      r"HEX+ HEX+:HEX+ HEX+:HEX+ HEX+ +\d+ +\d+ " + \
      r"(?P<inode>\d+)"
pat = pat.replace("HEX", hex_digit_pattern)

values = re.search(pat, line).groupdict()

import pprint; pprint values
# prints:
# {'inode': '335',
#  'local_addr': '00000000',
#  'local_port': '0203',
#  'rem_addr': '00000000',
#  'rem_port': '0000'}

Update: The Python documentation for its regex module, re, includes a section on simulating scanf, which I found more useful than any of the answers above.

https://docs.python.org/2/library/re.html#simulating-scanf


There is an example in the official python docs about how to use sscanf from libc:

    # import libc
    from ctypes import CDLL
    if(os.name=="nt"):
        libc = cdll.msvcrt 
    else:
        # assuming Unix-like environment
        libc = cdll.LoadLibrary("libc.so.6")
        libc = CDLL("libc.so.6")  # alternative

    # allocate vars
    i = c_int()
    f = c_float()
    s = create_string_buffer(b'\000' * 32)

    # parse with sscanf
    libc.sscanf(b"1 3.14 Hello", "%d %f %s", byref(i), byref(f), s)

    # read the parsed values
    i.value  # 1
    f.value  # 3.14
    s.value # b'Hello'

You can split on a range of characters using the re module.

>>> import re
>>> r = re.compile('[ \t\n\r:]+')
>>> r.split("abc:def  ghi")
['abc', 'def', 'ghi']

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to parsing

Got a NumberFormatException while trying to parse a text file for objects Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) Python/Json:Expecting property name enclosed in double quotes Correctly Parsing JSON in Swift 3 How to get response as String using retrofit without using GSON or any other library in android UIButton action in table view cell "Expected BEGIN_OBJECT but was STRING at line 1 column 1" How to convert an XML file to nice pandas dataframe? How to extract multiple JSON objects from one file? How to sum digits of an integer in java?

Examples related to split

Parameter "stratify" from method "train_test_split" (scikit Learn) Pandas split DataFrame by column value How to split large text file in windows? Attribute Error: 'list' object has no attribute 'split' Split function in oracle to comma separated values with automatic sequence How would I get everything before a : in a string Python Split String by delimiter position using oracle SQL JavaScript split String with white space Split a String into an array in Swift? Split pandas dataframe in two if it has more than 10 rows

Examples related to scanf

How to scanf only integer? Reading numbers from a text file into an array in C How can I read an input string of unknown length? Reading in double values with scanf in c How to do scanf for single char in C printf not printing on console How to find EOF through fscanf? How to read numbers separated by space using scanf Going through a text file line by line in C What is the format specifier for unsigned short int?

Examples related to procfs

sscanf in Python