sscanf in Python

Question

I'm looking for an equivalent to sscanf() in Python. I want to parse /proc/net/* files. In C I could do something like this:

    int matches = sscanf(
            buffer,
            "%*d: %64[0-9A-Fa-f]:%X %64[0-9A-Fa-f]:%X %*X %*X:%*X %*X:%*X %*X %*d %*d %ld %*512s\n",
            local_addr, &local_port, rem_addr, &rem_port, &inode);

I thought at first to use str.split, however it doesn't split on the given characters, but on the sep string as a whole:

    >>> import string
    >>> lines = open("/proc/net/dev").readlines()
    >>> for l in lines[2:]:
    ...     cols = l.split(string.whitespace + ":")
    ...     print(len(cols))
    1

It prints 1 for every line, where I expected 17.

Is there a Python equivalent to sscanf (not RE), or a string splitting function in the standard library that splits on any of a range of characters that I'm not aware of?

User · Accepted Answer

Python doesn't have an sscanf equivalent built in, and most of the time it actually makes a whole lot more sense to parse the input by working with the string directly, using regexps, or using a parsing tool.

Probably mostly useful for translating C, people have implemented sscanf, such as in this module: http://hkn.eecs.berkeley.edu/~dyoo/python/scanf/

In this particular case, if you just want to split the data based on multiple split characters, re.split is really the right tool.
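As a minimal sketch of the re.split suggestion, the snippet below splits a line on any run of whitespace or colons. The sample line is made up to resemble a /proc/net/dev row (an interface name followed by 16 counters); it is not taken from a real system.

```python
import re

# Hypothetical line in the style of /proc/net/dev: name, then 16 counters.
sample_line = "  eth0: 1234 10 0 0 0 0 0 0 5678 20 0 0 0 0 0 0\n"

# A character class splits on ANY of the listed characters,
# unlike str.split(sep), which treats sep as one whole delimiter.
columns = re.split(r"[\s:]+", sample_line.strip())

print(len(columns))   # 17
print(columns[0])     # eth0
```

Stripping the line first avoids empty strings at the start and end of the result.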

User · Answer

Update: The Python documentation for its regex module, re, includes a section on simulating scanf, which I found more useful than any of the answers above.

https://docs.python.org/2/library/re.html#simulating-scanf
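To illustrate the idea of simulating scanf with re, here is a rough sketch that translates a handful of scanf tokens into regular expressions and converts the captured text. The helper names (`scanf_like`, `_literal_to_regex`) and the token table are hypothetical, only `%d`, `%x`, `%f` and `%s` are handled, and the patterns are in the spirit of the table in the re documentation rather than a copy of it.

```python
import re

# Hypothetical token table: scanf token -> (regex, converter).
_TOKENS = {
    "%d": (r"[-+]?\d+", int),
    "%x": (r"[-+]?(?:0[xX])?[\dA-Fa-f]+", lambda s: int(s, 16)),
    "%f": (r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?", float),
    "%s": (r"\S+", str),
}


def _literal_to_regex(literal):
    """Escape literal text; any whitespace run matches optional whitespace."""
    return r"\s*".join(re.escape(part) for part in re.split(r"\s+", literal))


def scanf_like(fmt, text):
    """Return a tuple of converted values, or None if text does not match."""
    parts, converters, pos = [], [], 0
    for token in re.finditer(r"%[dxfs]", fmt):
        parts.append(_literal_to_regex(fmt[pos:token.start()]))
        regex, converter = _TOKENS[token.group()]
        parts.append("(" + regex + ")")
        converters.append(converter)
        pos = token.end()
    parts.append(_literal_to_regex(fmt[pos:]))
    match = re.match("".join(parts), text)
    if match is None:
        return None
    return tuple(conv(group) for conv, group in zip(converters, match.groups()))


print(scanf_like("%d: %x:%x %s", "0: 0100007F:0203 hello"))
print(scanf_like("%d %f %s", "1 3.14 Hello"))
```

Unlike C's sscanf, this sketch returns None when the whole format cannot be matched instead of reporting how many fields were converted.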

User · Answer

Upvoted orip's answer. I think it is sound advice to use the re module. The Kodos application is helpful when approaching a complex regexp task with Python.

http://kodos.sourceforge.net/home.html

User · Answer

You could install pandas and use pandas.read_fwf for fixed-width format files.

Example using /proc/net/arp:

    In [230]: df = pandas.read_fwf("/proc/net/arp")

    In [231]: print(df)
           IP address HW type Flags         HW address Mask Device
    0   141.38.28.115     0x1   0x2  84:2b:2b:ad:e1:f4    *   eth0
    1   141.38.28.203     0x1   0x2  c4:34:6b:5b:e4:7d    *   eth0
    2   141.38.28.140     0x1   0x2  00:19:99:ce:00:19    *   eth0
    3   141.38.28.202     0x1   0x2  90:1b:0e:14:a1:e3    *   eth0
    4    141.38.28.17     0x1   0x2  90:1b:0e:1a:4b:41    *   eth0
    5    141.38.28.60     0x1   0x2  00:19:99:cc:aa:58    *   eth0
    6   141.38.28.233     0x1   0x2  90:1b:0e:8d:7a:c9    *   eth0
    7    141.38.28.55     0x1   0x2  00:19:99:cc:ab:00    *   eth0
    8   141.38.28.224     0x1   0x2  90:1b:0e:8d:7a:e2    *   eth0
    9   141.38.28.148     0x1   0x0  4c:52:62:a8:08:2c    *   eth0
    10  141.38.28.179     0x1   0x2  90:1b:0e:1a:4b:50    *   eth0

    In [232]: df['HW address']
    Out[232]:
    0     84:2b:2b:ad:e1:f4
    1     c4:34:6b:5b:e4:7d
    2     00:19:99:ce:00:19
    3     90:1b:0e:14:a1:e3
    4     90:1b:0e:1a:4b:41
    5     00:19:99:cc:aa:58
    6     90:1b:0e:8d:7a:c9
    7     00:19:99:cc:ab:00
    8     90:1b:0e:8d:7a:e2
    9     4c:52:62:a8:08:2c
    10    90:1b:0e:1a:4b:50

    In [233]: df['HW address'][5]
    Out[233]: '00:19:99:cc:aa:58'

By default it tries to figure out the format automagically, but there are options you can give for more explicit instructions (see the documentation). There are also other IO routines in pandas that are powerful for other file formats.

User · Answer

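There is an example in the official Python docs about how to use sscanf from libc:

    import os
    from ctypes import byref, c_float, c_int, cdll, create_string_buffer

    # load the C runtime library
    if os.name == "nt":
        libc = cdll.msvcrt
    else:
        # assuming a Unix-like environment with glibc
        libc = cdll.LoadLibrary("libc.so.6")
        # alternative: libc = ctypes.CDLL("libc.so.6")

    # allocate vars
    i = c_int()
    f = c_float()
    s = create_string_buffer(b'\000' * 32)

    # parse with sscanf (both the input and the format must be bytes in Python 3)
    libc.sscanf(b"1 3.14 Hello", b"%d %f %s", byref(i), byref(f), s)

    # read the parsed values
    i.value  # 1
    f.value  # approximately 3.14 (C float precision)
    s.value  # b'Hello'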

User · Answer

You can split on a range of characters using the re module.

    >>> import re
    >>> r = re.compile('[ \t\n\r:]+')
    >>> r.split("abc:def  ghi")
    ['abc', 'def', 'ghi']

User · Answer

When I'm in a C mood, I usually use zip and list comprehensions for scanf-like behavior. Like this:

    input = '1 3.0 false hello'
    (a, b, c, d) = [t(s) for t, s in zip((int, float, strtobool, str), input.split())]
    print(a, b, c, d)

Note that for more complex format strings, you do need to use regular expressions:

    import re
    input = '1:3.0 false,hello'
    (a, b, c, d) = [t(s) for t, s in zip((int, float, strtobool, str),
                    re.search(r'^(\d+):([\d.]+) (\w+),(\w+)$', input).groups())]
    print(a, b, c, d)

Note also that you need conversion functions for all types you want to convert. For example, above I used something like this, which has to be defined before the snippets run:

    strtobool = lambda s: {'true': True, 'false': False}[s]

User · Answer

You can turn the ":" into a space and then do the split, e.g.

    >>> f = open("/proc/net/dev")
    >>> for line in f:
    ...     line = line.replace(":", " ").split()
    ...     print(len(line))

No regex needed (for this case).

User · Answer

There is a Python 2 implementation by odiak

User · Answer

There is an ActiveState recipe which implements a basic scanf: http://code.activestate.com/recipes/502213-simple-scanf-implementation/

User · Answer

If the separators are ':', you can split on ':' and then use x.strip() on the strings to get rid of any leading or trailing whitespace. int() will ignore the spaces.
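A small sketch of this approach, using a made-up colon-separated record (the values are placeholders, not real /proc output):

```python
# Hypothetical record whose fields are separated by ':' and padded with spaces.
record = " 10 :  203 : 335 "

parts = record.split(":")

# strip() removes the padding when the fields are needed as strings.
stripped = [x.strip() for x in parts]

# int() accepts surrounding whitespace, so no strip() is needed here.
numbers = [int(x) for x in parts]

print(stripped)
print(numbers)
```

This only works when ':' is the sole separator; fields that are themselves separated by spaces still need a second split.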

User · Answer

There is also the parse module.

parse() is designed to be the opposite of format() (the newer string formatting function in Python 2.6 and higher).

    >>> from parse import parse
    >>> parse('{} fish', '1')
    >>> parse('{} fish', '1 fish')
    <Result ('1',) {}>
    >>> parse('{} fish', '2 fish')
    <Result ('2',) {}>
    >>> parse('{} fish', 'red fish')
    <Result ('red',) {}>
    >>> parse('{} fish', 'blue fish')
    <Result ('blue',) {}>

User · Answer

You can parse with the re module using named groups. It won't convert the substrings to their actual datatypes (e.g. int), but it's very convenient when parsing strings.

Given this sample line from /proc/net/tcp:

    line = "   0: 00000000:0203 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 335 1 c1674320 300 0 0 0"

An example mimicking your sscanf example with the variables could be:

    import pprint
    import re

    hex_digit_pattern = r"[\dA-Fa-f]"
    pat = r"\d+: " + \
          r"(?P<local_addr>HEX+):(?P<local_port>HEX+) " + \
          r"(?P<rem_addr>HEX+):(?P<rem_port>HEX+) " + \
          r"HEX+ HEX+:HEX+ HEX+:HEX+ HEX+ +\d+ +\d+ " + \
          r"(?P<inode>\d+)"
    # expand the HEX placeholder into the real character class
    pat = pat.replace("HEX", hex_digit_pattern)

    values = re.search(pat, line).groupdict()

    pprint.pprint(values)
    # prints:
    # {'inode': '335',
    #  'local_addr': '00000000',
    #  'local_port': '0203',
    #  'rem_addr': '00000000',
    #  'rem_port': '0000'}
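Since groupdict() only returns strings, a follow-up conversion step is needed to get what the `%X` and `%ld` conversions would give in C. The sketch below starts from a dictionary of the shape printed above. The helper `hex_ipv4` is hypothetical, and it assumes the hex address was written by a little-endian host, which may not hold on every machine.

```python
import socket
import struct

# Same shape as the groupdict() result printed above.
values = {
    "inode": "335",
    "local_addr": "00000000",
    "local_port": "0203",
    "rem_addr": "00000000",
    "rem_port": "0000",
}


def hex_ipv4(addr_hex):
    """Decode an 8-digit hex IPv4 address (assumes a little-endian host)."""
    packed = struct.pack("<I", int(addr_hex, 16))
    return socket.inet_ntoa(packed)


typed = {
    "local_addr": hex_ipv4(values["local_addr"]),
    "local_port": int(values["local_port"], 16),  # like %X in sscanf
    "rem_addr": hex_ipv4(values["rem_addr"]),
    "rem_port": int(values["rem_port"], 16),
    "inode": int(values["inode"]),                # like %ld in sscanf
}

print(typed)
```

For instance, under the same byte-order assumption, `hex_ipv4("0100007F")` decodes to `127.0.0.1`.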
