In Python, how do I split a string and keep the separators?

Question

Here's the simplest way to explain this. Here's what I'm using:

    re.split('\W', 'foo/bar spam\neggs')
    -> ['foo', 'bar', 'spam', 'eggs']

Here's what I want:

    someMethod('\W', 'foo/bar spam\neggs')
    -> ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

The reason is that I want to split a string into tokens, manipulate it, then put it back together again.

User · Answer

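Replace every separator (\W) with separator + new separator (\W;), then split on the new separator (;). This assumes the new separator does not otherwise occur in the string.

    import re

    def split_and_keep(seperator, s):
        # Append ';' after every match, then split on ';'
        return re.split(';', re.sub(seperator, lambda match: match.group() + ';', s))

    print(split_and_keep(r'\W', 'foo/bar spam\neggs'))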
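Inserting a marker character and splitting on it only works if the marker never appears in the input. A possible alternative, sketched here on the assumption of Python 3.7 or later (where re.split can split on empty matches), is to split on a zero-width lookbehind so that each separator stays attached to the token before it. The name split_after is made up for this illustration, and the pattern must be fixed-width because it is placed inside a lookbehind.

```python
import re

def split_after(pattern, s):
    # Hypothetical helper: split at the empty position right after each
    # match of a fixed-width pattern (requires Python 3.7+).
    return re.split('(?<=' + pattern + ')', s)

print(split_after(r'\W', 'foo/bar spam\neggs'))
# ['foo/', 'bar ', 'spam\n', 'eggs']
```

No marker character is needed, so there is nothing that can collide with the input.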

User · Answer

If one wants to split a string while keeping the separators, using a regex without a capturing group:

    def finditer_with_separators(regex, s):
        matches = []
        prev_end = 0
        for match in regex.finditer(s):
            match_start = match.start()
            if (prev_end != 0 or match_start > 0) and match_start != prev_end:
                matches.append(s[prev_end:match.start()])
            matches.append(match.group())
            prev_end = match.end()
        if prev_end < len(s):
            matches.append(s[prev_end:])
        return matches

    regex = re.compile(r"[\(\)]")
    matches = finditer_with_separators(regex, s)

If one assumes that the regex is wrapped in a capturing group:

    def split_with_separators(regex, s):
        matches = list(filter(None, regex.split(s)))
        return matches

    regex = re.compile(r"([\(\)])")
    matches = split_with_separators(regex, s)

Both ways also remove empty groups, which are useless and annoying in most cases.
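To see where the empty groups come from, here is a small sketch with a made-up input in which separators are adjacent and also sit at both ends of the string. The pattern and sample string are assumptions chosen for illustration.

```python
import re

regex = re.compile(r"([()])")   # capturing group keeps the separators
s = "(a)(b)"                    # made-up sample input

raw = regex.split(s)
print(raw)
# ['', '(', 'a', ')', '', '(', 'b', ')', '']

cleaned = list(filter(None, raw))  # drop the empty strings
print(cleaned)
# ['(', 'a', ')', '(', 'b', ')']
```

An empty string appears wherever two separators touch or a separator sits at the start or end, which is why both functions filter them out.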

User · Answer

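This keeps all separators in the result:

    import re

    st = "%%(c+dd+e+f-1523)%%7"
    sh = re.compile(r'[\+\-//\*\<\>\%\(\)]')

    def splitStringFull(sh, st):
        ls = sh.split(st)
        lo = []
        start = 0
        for l in ls:
            if not l:
                continue
            k = st.find(l, start)  # search from the current position, not from 0
            llen = len(l)
            if k > start:
                lo.append(st[start:k])  # run of separators before this token
            lo.append(l)
            start = k + llen
        if start < len(st):
            lo.append(st[start:])  # trailing separators, if any
        return lo

    li = splitStringFull(sh, st)
    # ['%%(', 'c', '+', 'dd', '+', 'e', '+', 'f', '-', '1523', ')%%', '7']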

User · Answer

Another no-regex solution that works well on Python 3:

    # Split strings and keep separator
    test_strings = ['<Hello>', 'Hi', '<Hi> <Planet>', '<', '']

    def split_and_keep(s, sep):
        if not s:
            return ['']  # consistent with string.split()

        # Find replacement character that is not used in string,
        # i.e. just use the highest available character plus one.
        # Note: This fails if ord(max(s)) = 0x10FFFF (ValueError)
        p = chr(ord(max(s)) + 1)

        return s.replace(sep, sep + p).split(p)

    for s in test_strings:
        print(split_and_keep(s, '<'))

    # If the unicode limit is reached it will fail explicitly
    unicode_max_char = chr(1114111)
    ridiculous_string = '<Hello>' + unicode_max_char + '<World>'
    print(split_and_keep(ridiculous_string, '<'))  # raises ValueError

User · Answer

I found this generator-based approach more satisfying:

    def split_keep(string, sep):
        """Usage:
        >>> list(split_keep("a.b.c.d", "."))
        ['a.', 'b.', 'c.', 'd']
        """
        start = 0
        while True:
            end = string.find(sep, start) + 1
            if end == 0:
                break
            yield string[start:end]
            start = end
        yield string[start:]

It avoids the need to figure out the correct regex, and should in theory be fairly cheap. It creates no string objects other than the yielded slices, and delegates most of the iteration work to the efficient find method.

... and in Python 3.8 it can be as short as:

    def split_keep(string, sep):
        start = 0
        while (end := string.find(sep, start) + 1) > 0:
            yield string[start:end]
            start = end
        yield string[start:]
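Because the end index is computed as the position found plus one, the generator above implicitly assumes a single-character separator. A minimal sketch of a variant for longer separators follows; the name split_keep_multi is hypothetical and not part of the original answer.

```python
def split_keep_multi(string, sep):
    # Hypothetical variant: advance by len(sep) instead of 1 so that
    # multi-character separators stay intact.
    if not sep:
        raise ValueError("empty separator")
    start = 0
    while True:
        idx = string.find(sep, start)
        if idx == -1:
            break
        end = idx + len(sep)
        yield string[start:end]
        start = end
    yield string[start:]

print(list(split_keep_multi("a--b--c", "--")))
# ['a--', 'b--', 'c']
```

As in the original generator, a trailing separator produces a final empty string, so joining the pieces always gives back the input.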

User · Answer

Here is a simple .split solution that works without regex. This is an answer for "Python split() without removing the delimiter", so it is not exactly what the original post asks, but the other question was closed as a duplicate of this one.

    def splitkeep(s, delimiter):
        split = s.split(delimiter)
        return [substr + delimiter for substr in split[:-1]] + [split[-1]]

Random tests:

    import random

    CHARS = [".", "a", "b", "c"]
    assert splitkeep("", "X") == [""]  # 0 length test
    for delimiter in ('.', '..'):
        for _ in range(100000):
            length = random.randint(1, 50)
            s = "".join(random.choice(CHARS) for _ in range(length))
            assert "".join(splitkeep(s, delimiter)) == s

User · Answer

If you have only 1 separator, you can employ list comprehensions:

    text = 'foo,bar,baz,qux'
    sep = ','

Appending/prepending the separator:

    result = [x + sep for x in text.split(sep)]
    # ['foo,', 'bar,', 'baz,', 'qux,']
    # to get rid of the trailing separator
    result[-1] = result[-1].strip(sep)
    # ['foo,', 'bar,', 'baz,', 'qux']

    result = [sep + x for x in text.split(sep)]
    # [',foo', ',bar', ',baz', ',qux']
    # to get rid of the leading separator
    result[0] = result[0].strip(sep)
    # ['foo', ',bar', ',baz', ',qux']

Separator as its own element:

    result = [u for x in text.split(sep) for u in (x, sep)]
    # ['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
    result = result[:-1]  # to get rid of the trailing separator

User · Answer

Another example: split on non-alphanumeric characters and keep the separators.

    import re
    a = "foo,bar@candy*ice%cream"
    re.split('([^a-zA-Z0-9])', a)

Output:

    ['foo', ',', 'bar', '@', 'candy', '*', 'ice', '%', 'cream']

Explanation of re.split('([^a-zA-Z0-9])', a):

    ()          <- keep the separators
    []          <- match any one character from the set inside
    ^a-zA-Z0-9  <- except letters (upper and lower case) and digits

User · Answer

One Lazy and Simple Solution

Assume your regex pattern is split_pattern = r'(!|\?)'

First, you add some character sequence after each match as the new separator, like '[cut]':

    new_string = re.sub(split_pattern, '\\1[cut]', your_string)

Then you split on the new separator:

    new_string.split('[cut]')
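A worked run of these two steps may help. The pattern, the '[cut]' marker and the sample sentence below are assumptions chosen for illustration; the marker must not already occur in the input.

```python
import re

split_pattern = r'(!|\?)'                 # assumed pattern: split after ! or ?
your_string = 'Hi! How are you? Fine'     # made-up sample input

# Step 1: re-insert each matched separator followed by the marker.
new_string = re.sub(split_pattern, r'\1[cut]', your_string)
print(new_string)
# Hi![cut] How are you?[cut] Fine

# Step 2: split on the marker; separators stay attached to the left piece.
parts = new_string.split('[cut]')
print(parts)
# ['Hi!', ' How are you?', ' Fine']
```

Joining the parts with an empty string returns the original sentence.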

User · Answer

If you are splitting on newline, use splitlines(True).

    >>> 'line 1\nline 2\nline without newline'.splitlines(True)
    ['line 1\n', 'line 2\n', 'line without newline']

(Not a general solution, but adding this here in case someone comes here not realizing this method existed.)

User · Answer

You can also split a string with an array of strings instead of a regular expression, like this:

    def tokenizeString(aString, separators):
        # separators is an array of strings that are used to split the string.
        # Sort separators by ascending length: the loop below keeps the last
        # match, so the longest matching separator wins.
        separators.sort(key=len)
        listToReturn = []
        i = 0
        while i < len(aString):
            theSeparator = ""
            for current in separators:
                if current == aString[i:i+len(current)]:
                    theSeparator = current
            if theSeparator != "":
                listToReturn += [theSeparator]
                i = i + len(theSeparator)
            else:
                if listToReturn == []:
                    listToReturn = [""]
                if listToReturn[-1] in separators:
                    listToReturn += [""]
                listToReturn[-1] += aString[i]
                i += 1
        return listToReturn

    print(tokenizeString(aString = "\"\"\"hi\"\"\" hello + world += (1*2+3/5) '''hi'''", separators = ["'''", '+=', '+', "/", "*", "\\'", '\\"', "-=", "-", " ", '"""', "(", ")"]))
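For comparison, a list of literal separators can also be turned into a regex alternation and handed to re.split with a capturing group. This is only a sketch of that alternative, not the answer's method: the name tokenize_with_regex is hypothetical, and a non-empty separator list is assumed.

```python
import re

def tokenize_with_regex(a_string, separators):
    # Try longer separators first so that '+=' is matched before '+'.
    ordered = sorted(separators, key=len, reverse=True)
    # re.escape makes each separator match literally.
    pattern = '(' + '|'.join(re.escape(sep) for sep in ordered) + ')'
    # The capturing group keeps the separators; empty strings between
    # adjacent separators are dropped.
    return [tok for tok in re.split(pattern, a_string) if tok]

print(tokenize_with_regex('a += b+c', ['+', '+=', ' ']))
# ['a', ' ', '+=', ' ', 'b', '+', 'c']
```

Unlike the loop-based version, this does not sort the caller's list in place.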

User · Answer

May I just leave it here:

    s = 'foo/bar spam\neggs'
    print(s.replace('/', '+++/+++').replace(' ', '+++ +++').replace('\n', '+++\n+++').split('+++'))
    # ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

User · Answer

    >>> re.split(r'(\W)', 'foo/bar spam\neggs')
    ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
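When the pattern contains a capturing group, re.split returns the matched separators along with the pieces between them, so the tokens can be edited and joined back into a string. The sketch below shows that round trip; upper-casing the words is an arbitrary manipulation picked for illustration.

```python
import re

s = 'foo/bar spam\neggs'
tokens = re.split(r'(\W)', s)
print(tokens)
# ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

# Example manipulation (arbitrary): upper-case the word tokens only.
changed = [t.upper() if t.isalnum() else t for t in tokens]
result = ''.join(changed)
print(repr(result))
# 'FOO/BAR SPAM\nEGGS'

# Joining the untouched tokens reproduces the input exactly.
assert ''.join(tokens) == s
```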

User · Answer

I had a similar issue trying to split a file path and struggled to find a simple answer. This worked for me and didn't involve having to substitute delimiters back into the split text:

    my_path = 'folder1/folder2/folder3/file1'

    import re

    re.findall('[^/]+/|[^/]+', my_path)

returns:

    ['folder1/', 'folder2/', 'folder3/', 'file1']
