Python Regex - How to Get Positions and Values of Matches

Question

How can I get the start and end positions of all matches using the re module? For example, given the pattern r'[a-z]' and the string 'a1b2c3d4', I'd want to get the positions where it finds each letter. Ideally, I'd like to get the text of the match back too.

User · Answer

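Note that span() and group() take a group index when the regex has multiple capture groups (index 0 is the whole match):

    import re

    regex_with_3_groups = r"([a-z])([0-9]+)([A-Z])"
    string = "a1B c22D"  # hypothetical sample input, not from the original answer
    for match in re.finditer(regex_with_3_groups, string):
        for idx in range(0, 4):
            print(match.span(idx), match.group(idx))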
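The group indexing described above can be sketched a little more generally. This is only an illustration under assumptions: the optional third group, the sample text, and the helper name `group_spans` are made up here. It uses `match.re.groups` instead of a hard-coded `range(0, 4)`, and shows that a group which did not participate in the match reports a span of `(-1, -1)` and a value of `None`.

```python
import re

# Hypothetical pattern and input, chosen only for illustration.
# The third group is optional so that one match leaves it unset.
regex_with_3_groups = r"([a-z])([0-9]+)([A-Z])?"
text = "a12B c3"


def group_spans(pattern, string):
    """Return, for each match, a list of (group_index, span, text) entries."""
    results = []
    for match in re.finditer(pattern, string):
        entries = []
        # Group 0 is the whole match; match.re.groups is the number of
        # capture groups in the compiled pattern.
        for idx in range(match.re.groups + 1):
            entries.append((idx, match.span(idx), match.group(idx)))
        results.append(entries)
    return results


for entries in group_spans(regex_with_3_groups, text):
    print(entries)
```

Checking for `None` (or a span of `(-1, -1)`) before using a group's position avoids surprises when a pattern contains optional groups.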

User · Answer

Taken from Regular Expression HOWTO:

span() returns both start and end indexes in a single tuple. Since the match method only checks if the RE matches at the start of a string, start() will always be zero. However, the search method of RegexObject instances scans through the string, so the match may not start at zero in that case.

    >>> p = re.compile('[a-z]+')
    >>> print p.match('::: message')
    None
    >>> m = p.search('::: message') ; print m
    <re.MatchObject instance at 80c9650>
    >>> m.group()
    'message'
    >>> m.span()
    (4, 11)

Combine that with:

In Python 2.2, the finditer() method is also available, returning a sequence of MatchObject instances as an iterator.

    >>> p = re.compile( ... )
    >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
    >>> iterator
    <callable-iterator object at 0x401833ac>
    >>> for match in iterator:
    ...     print match.span()
    ...
    (0, 2)
    (22, 24)
    (29, 31)

You should be able to do something on the order of:

    for match in re.finditer(r'[a-z]', 'a1b2c3d4'):
        print(match.span())
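For readers on Python 3, where print is a function, the same contrast between match(), search() and finditer() might look like the sketch below. The variable names (`at_start`, `first`, `spans`) are illustrative, not taken from the HOWTO.

```python
import re

p = re.compile(r"[a-z]+")
text = "::: message"

# match() only tries at position 0, so it finds nothing in this string.
at_start = p.match(text)

# search() scans forward and reports where the first match sits.
first = p.search(text)

print(at_start)
print(first.group(), first.span())

# finditer() yields one match object per non-overlapping match,
# so every position and value can be collected in one pass.
spans = [(m.span(), m.group()) for m in re.finditer(r"[a-z]", "a1b2c3d4")]
print(spans)
```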

User · Answer

    import re

    p = re.compile("[a-z]")
    for m in p.finditer('a1b2c3d4'):
        # start index of the match, followed by the matched text
        print(m.start(), m.group())
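If the positions need to be kept rather than printed, one possible variation is to collect them into a list. The helper name `find_positions` and its `pos` parameter are assumptions made for this sketch; it relies on the optional start position that a compiled pattern's finditer() accepts, and the reported indices stay relative to the full string.

```python
import re

p = re.compile(r"[a-z]")


def find_positions(compiled, string, pos=0):
    """Return (start, end, text) for every match found at or after index pos."""
    return [(m.start(), m.end(), m.group()) for m in compiled.finditer(string, pos)]


all_hits = find_positions(p, "a1b2c3d4")
# Begin scanning at index 3; indices still refer to the original string.
later_hits = find_positions(p, "a1b2c3d4", pos=3)

print(all_hits)
print(later_hits)
```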

User · Answer

For Python 3.x:

    from re import finditer

    # "pattern" and "string" are placeholders; substitute your own
    for match in finditer("pattern", "string"):
        print(match.span(), match.group())

You will get newline-separated output: for each hit in the string, a tuple with the start and end indices of the match (the end index is exclusive), followed by the matched text itself.
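Because the end index from span() is exclusive, slicing the string with it gives back exactly the matched text. The sketch below illustrates this; the two-character pattern and the `inclusive` list are assumptions chosen here so that the first and last indices differ.

```python
from re import finditer

string = "a1b2c3d4"
pattern = r"[a-z][0-9]"  # hypothetical pattern: a letter followed by a digit

inclusive = []
for match in finditer(pattern, string):
    start, end = match.span()
    # The slice string[start:end] reproduces the matched text.
    assert string[start:end] == match.group()
    # Subtract 1 from end to get the index of the last matched character.
    inclusive.append((start, end - 1, match.group()))

print(inclusive)
```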
