How to extract the substring between two markers?

Question

Let's say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know the few characters directly before (AAA) and after (ZZZ) the part I am interested in (1234).

With sed it is possible to do something like this with a string:

    echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

And this will give me 1234 as a result.

How can I do the same thing in Python?

User · Answer

Using PyParsing:

    import pyparsing as pp

    word = pp.Word(pp.alphanums)

    s = 'gfgfdAAA1234ZZZuijjk'
    rule = pp.nestedExpr('AAA', 'ZZZ')
    for match in rule.searchString(s):
        print(match)

which yields:

    [['1234']]

User · Answer

You can find the first occurrence of a substring (by character index) with this function. It can also return what comes after the substring.

    def FindSubString(strText, strSubString, Offset=None):
        try:
            Start = strText.find(strSubString)
            if Start == -1:
                return -1  # Not found
            else:
                if Offset is None:
                    # Everything after the substring
                    Result = strText[Start + len(strSubString):]
                elif Offset == 0:
                    # Index where the substring starts
                    return Start
                else:
                    # The next Offset characters after the substring
                    AfterSubString = Start + len(strSubString)
                    Result = strText[AfterSubString:AfterSubString + int(Offset)]
                return Result
        except Exception:
            return -1

    # Example:

    Text = "Thanks for contributing an answer to Stack Overflow!"
    subText = "to"

    print("Start of first substring in a text:")
    start = FindSubString(Text, subText, 0)
    print(start); print("")

    print("Exact substring in a text:")
    print(Text[start:start + len(subText)]); print("")

    print('What is after substring "%s"?' % subText)
    print(FindSubString(Text, subText))

    # Your answer:

    Text = "gfgfdAAA1234ZZZuijjk"
    subText1 = "AAA"
    subText2 = "ZZZ"

    AfterText1 = FindSubString(Text, subText1, 0) + len(subText1)
    BeforText2 = FindSubString(Text, subText2, 0)

    print("\nYour answer:\n%s" % Text[AfterText1:BeforText2])

User · Answer

Another way of doing it is using lists (supposing the substring you are looking for is made of numbers only):

    string = 'gfgfdAAA1234ZZZuijjk'
    numbersList = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
    output = []

    for char in string:
        if char in numbersList:
            output.append(char)

    print(f"output: {''.join(output)}")
    # output: 1234
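One caveat with the digit-filtering approach is that it collects every digit in the string, not only those between the markers. The following is a minimal sketch of that effect; the `sample` value is a made-up input used purely for illustration.

```python
# Hypothetical input with digits outside the AAA...ZZZ markers.
sample = 'x9AAA1234ZZZ7'

# Keep every decimal digit, as the answer above does.
digits_only = ''.join(ch for ch in sample if ch in '0123456789')

# Restrict the result to the text between the markers instead.
start = sample.index('AAA') + len('AAA')
end = sample.index('ZZZ', start)
between = sample[start:end]

print(digits_only)  # digits from the whole string
print(between)      # only the part between AAA and ZZZ
```

So filtering by character class is probably only safe when the surrounding text is known to contain no digits.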

User · Answer

    >>> s = '/tmp/10508.constantstring'
    >>> s.split('/tmp/')[1].split('constantstring')[0].strip('.')

User · Answer

    import re
    print(re.search('AAA(.*?)ZZZ', 'gfgfdAAA1234ZZZuijjk').group(1))

User · Answer

Just in case somebody has to do the same thing that I did: I had to extract everything inside parentheses in a line. For example, if I have a line like 'US president (Barack Obama) met with ...' and I want to get only 'Barack Obama', this is the solution:

    regex = '.*\((.*?)\).*'
    matches = re.search(regex, line)
    line = matches.group(1) + '\n'

I.e. you need to escape the parentheses with a backslash (\). This is more a question about regular expressions than about Python, though.

Also, in some cases you may see an 'r' prefix before the regex definition. If there is no r prefix, you need to use escape characters like in C.
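To make the point about the r prefix concrete, here is a small sketch comparing a raw-string pattern with its non-raw equivalent. The variable names are illustrative, and the shorter pattern (without the leading and trailing `.*`) is a simplification rather than the exact regex from the answer.

```python
import re

line = 'US president (Barack Obama) met with ...'

# Raw string: backslashes reach the regex engine unchanged.
raw_pattern = r'\((.*?)\)'
# Non-raw equivalent: each backslash has to be doubled.
escaped_pattern = '\\((.*?)\\)'

# Both literals describe the same pattern text.
same = raw_pattern == escaped_pattern

match = re.search(raw_pattern, line)
name = match.group(1) if match else None
print(same, name)
```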

User · Answer

    >>> s = 'gfgfdAAA1234ZZZuijjk'
    >>> start = s.find('AAA') + 3
    >>> end = s.find('ZZZ', start)
    >>> s[start:end]
    '1234'

You can use regexps with the re module as well if you want, but that's not necessary in your case.
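Because `str.find` returns -1 when a marker is missing, the interactive session above would silently slice from the wrong position in that case. A minimal sketch of a helper that checks for this follows; the name `between` and the choice to return `None` are assumptions made for illustration.

```python
from typing import Optional

def between(s: str, left: str, right: str) -> Optional[str]:
    """Return the text between the first `left` and the next `right`, or None."""
    start = s.find(left)
    if start == -1:          # left marker missing
        return None
    start += len(left)
    end = s.find(right, start)
    if end == -1:            # right marker missing after the left one
        return None
    return s[start:end]

print(between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))
print(between('gfgfdAAA1234', 'AAA', 'ZZZ'))
```

Using `len(left)` instead of the hard-coded 3 keeps the helper usable with markers of any length.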

User · Answer

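    text = 'I want to find a string between two substrings'
    left = 'find a '
    right = 'between two'

    print(text[text.index(left) + len(left):text.index(right)])

Gives 'string' (with a trailing space, because right does not include the space before 'between').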

User · Answer

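Typescript. Gets the string in between two other strings. The method relies on a helper this.indexOf, which is not shown in this answer; from its use here, it returns an object with pos and sub fields and throws when nothing is found.

    /**
     * Gets string in between two other strings.
     * Searches shortest string between prefixes and postfixes.
     * prefixes - string | array of strings | null (means search from the start).
     * postfixes - string | array of strings | null (means search until the end).
     */
    public getStringInBetween(str: string, prefixes: string | string[] | null,
                              postfixes: string | string[] | null): string {

        if (typeof prefixes === 'string') {
            prefixes = [prefixes];
        }

        if (typeof postfixes === 'string') {
            postfixes = [postfixes];
        }

        if (!str || str.length < 1) {
            throw new Error(str + ' should contain ' + prefixes);
        }

        let start = prefixes === null ? { pos: 0, sub: '' } : this.indexOf(str, prefixes);
        const end = postfixes === null ? { pos: str.length, sub: '' } : this.indexOf(str, postfixes, start.pos + start.sub.length);

        let value = str.substring(start.pos + start.sub.length, end.pos);
        if (!value || value.length < 1) {
            throw new Error(str + ' should contain string in between ' + prefixes + ' and ' + postfixes);
        }

        // Keep trimming while another prefix occurs inside the value,
        // so that the shortest match is returned.
        while (true) {
            try {
                start = this.indexOf(value, prefixes);
            } catch (e) {
                break;
            }
            value = value.substring(start.pos + start.sub.length);
            if (!value || value.length < 1) {
                throw new Error(str + ' should contain string in between ' + prefixes + ' and ' + postfixes);
            }
        }

        return value;
    }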

User · Answer

Using regular expressions (see the documentation for further reference):

    import re

    text = 'gfgfdAAA1234ZZZuijjk'

    m = re.search('AAA(.+?)ZZZ', text)
    if m:
        found = m.group(1)

    # found: 1234

or:

    import re

    text = 'gfgfdAAA1234ZZZuijjk'

    try:
        found = re.search('AAA(.+?)ZZZ', text).group(1)
    except AttributeError:
        # AAA, ZZZ not found in the original string
        found = ''  # apply your error handling

    # found: 1234
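If the markers can contain regex metacharacters such as `(`, `*` or `.`, the literal patterns above would no longer mean what they appear to. One way to handle that, sketched here under the assumption that both markers should be matched literally, is to pass them through `re.escape`. The helper name `between_re` and the sample input are hypothetical.

```python
import re

def between_re(text, left, right):
    """Return the shortest text between literal markers, or None if absent."""
    # re.escape neutralises any regex metacharacters inside the markers.
    pattern = re.escape(left) + r'(.+?)' + re.escape(right)
    m = re.search(pattern, text)
    return m.group(1) if m else None

print(between_re('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))
print(between_re('a (*note*) b', '(*', '*)'))
```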

User · Answer

Surprised that nobody has mentioned this, which is my quick version for one-off scripts:

    >>> x = 'gfgfdAAA1234ZZZuijjk'
    >>> x.split('AAA')[1].split('ZZZ')[0]
    '1234'
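When a marker can occur more than once, the chained `split` keeps only the piece between the first two occurrences of the first marker. A small sketch of the difference that the `maxsplit` argument makes follows; the input string is made up for illustration.

```python
# Hypothetical input in which the first marker appears twice.
x = 'gfAAA12AAA34ZZZuij'

# Unlimited split: the text after the second 'AAA' is discarded.
naive = x.split('AAA')[1].split('ZZZ')[0]

# maxsplit=1: split only at the first occurrence of each marker.
limited = x.split('AAA', 1)[1].split('ZZZ', 1)[0]

print(naive)
print(limited)
```

Note that both variants raise IndexError when the first marker is missing, so they fit best in one-off scripts where the input is known.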

User · Answer

You can use the re module for that:

    >>> import re
    >>> re.compile(".*AAA(.*)ZZZ.*").match("gfgfdAAA1234ZZZuijjk").groups()
    ('1234',)

User · Answer

Here's a solution without regex that also accounts for scenarios where the first substring contains the second substring. This function will only find a substring if the second marker is after the first marker.

    def find_substring(string, start, end):
        len_until_end_of_first_match = string.find(start) + len(start)
        after_start = string[len_until_end_of_first_match:]
        return string[string.find(start) + len(start):len_until_end_of_first_match + after_start.find(end)]
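The case described above, where the first marker contains the second, can be illustrated with a short sketch. The function below is a condensed restatement of the one in this answer, and the input string and markers are made up for the example.

```python
def find_substring(string, start, end):
    # Index just past the first occurrence of `start`.
    begin = string.find(start) + len(start)
    # Search for `end` only in the text that follows `start`.
    return string[begin:begin + string[begin:].find(end)]

# Hypothetical input: the start marker '--' contains the end marker '-'.
text = 'a--b-c'

# Naive slicing searches for `end` from the beginning and finds it inside `start`.
naive = text[text.find('--') + len('--'):text.find('-')]
result = find_substring(text, '--', '-')

print(repr(naive))
print(repr(result))
```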

User · Answer

You can do it using just one line of code:

    >>> import re
    >>> re.findall(r'\d{1,5}', 'gfgfdAAA1234ZZZuijjk')
    ['1234']

The result is returned as a list. Note that this pattern matches any run of one to five digits, regardless of the markers.

User · Answer

Quoting the question:

"With sed it is possible to do something like this with a string:

    echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

And this will give me 1234 as a result."

You could do the same with the re.sub function, using the same regex:

    >>> re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'gfgfdAAA1234ZZZuijjk')
    '1234'

In basic sed, a capturing group is written as \(..\), but in Python it is written as (..).
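As with sed, a substitution that does not match leaves the input unchanged, so the `re.sub` approach cannot by itself tell "no markers" apart from a real result. The sketch below shows that behaviour, along with one possible guard using `re.fullmatch`; the variable names are illustrative.

```python
import re

pattern = r'.*AAA(.*)ZZZ.*'

# Match: the whole string is replaced by the captured group.
hit = re.sub(pattern, r'\1', 'gfgfdAAA1234ZZZuijjk')

# No match: re.sub returns the input string unchanged.
miss = re.sub(pattern, r'\1', 'no markers here')

# Possible guard: check for a match explicitly before using the group.
m = re.fullmatch(pattern, 'no markers here')
guarded = m.group(1) if m else None

print(hit, repr(miss), guarded)
```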

User · Answer

Regular expression:

    import re

    re.search(r"(?<=AAA).*?(?=ZZZ)", your_text).group(0)

The above as-is will fail with an AttributeError if there are no "AAA" and "ZZZ" in your_text.

String methods:

    your_text.partition("AAA")[2].partition("ZZZ")[0]

The above will return an empty string if "AAA" doesn't exist in your_text. If only "ZZZ" is missing, it returns everything after "AAA".

PS: Python Challenge?
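With the chained `partition` call, a missing "ZZZ" does not produce an empty string: `partition` then returns the whole remainder as its first element. A minimal sketch of a variant that checks the separator returned by each `partition` call follows; the helper name `between_partition` and the `None` return value are assumptions made for illustration.

```python
def between_partition(text, left, right):
    """Return the text between the markers, or None if either is missing."""
    _, sep1, rest = text.partition(left)
    if not sep1:             # left marker not found
        return None
    middle, sep2, _ = rest.partition(right)
    if not sep2:             # right marker not found after the left one
        return None
    return middle

# Chained call on an input without 'ZZZ': returns the whole remainder.
chained = 'gfgfdAAA1234uijjk'.partition('AAA')[2].partition('ZZZ')[0]

print(repr(chained))
print(between_partition('gfgfdAAA1234uijjk', 'AAA', 'ZZZ'))
print(between_partition('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))
```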

User · Answer

One-liners that return another string if there was no match.

Edit: the improved version uses the next function; replace "not-found" with something else if needed:

    import re
    res = next((m.group(1) for m in [re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk")] if m), "not-found")

My other method to do this is less optimal, since it uses a regex a second time; I still haven't found a shorter way:

    import re
    res = (re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk") or re.search("()", "")).group(1)

User · Answer

In Python, extracting a substring from a string can be done using the findall method of the regular expression (re) module.

    >>> import re
    >>> s = 'gfgfdAAA1234ZZZuijjk'
    >>> ss = re.findall('AAA(.+)ZZZ', s)
    >>> print(ss)
    ['1234']
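When the markers occur more than once, the greedy `(.+)` spans from the first "AAA" to the last "ZZZ". A short sketch comparing it with the non-greedy `(.+?)` follows; the input string is made up for illustration.

```python
import re

# Hypothetical input with two AAA...ZZZ pairs.
s = 'xAAA12ZZZyAAA34ZZZz'

# Greedy: one match stretching from the first AAA to the last ZZZ.
greedy = re.findall('AAA(.+)ZZZ', s)

# Non-greedy: one match per AAA...ZZZ pair.
lazy = re.findall('AAA(.+?)ZZZ', s)

print(greedy)
print(lazy)
```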
