How to split a string, but also keep the delimiters?

Question

I have a multiline string which is delimited by a set of different delimiters:

    (Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)

I can split this string into its parts using String.split, but it seems that I can't get the actual string which matched the delimiter regex.

In other words, this is what I get:

    Text1
    Text2
    Text3
    Text4

This is what I want:

    Text1
    DelimiterA
    Text2
    DelimiterC
    Text3
    DelimiterB
    Text4

Is there any JDK way to split the string using a delimiter regex but also keep the delimiters?
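For reference, a minimal sketch of the behaviour described above. It assumes the sample input flattened onto one line without the parentheses, and a regex Delimiter[ABC] standing in for the real delimiter set; the class name SplitDropsDelimiters is hypothetical.

```java
import java.util.Arrays;

class SplitDropsDelimiters {
    // Hypothetical one-line version of the question's input
    static final String INPUT = "Text1DelimiterAText2DelimiterCText3DelimiterBText4";

    // String.split consumes whatever the regex matches, so the delimiters are lost
    static String[] plainSplit(String s) {
        return s.split("Delimiter[ABC]");
    }

    public static void main(String[] args) {
        // Prints [Text1, Text2, Text3, Text4]
        System.out.println(Arrays.toString(plainSplit(INPUT)));
    }
}
```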

User · Answer

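You want to use lookarounds, and split on zero-width matches. Here are some examples:

    public class SplitNDump {
        static void dump(String[] arr) {
            for (String s : arr) {
                System.out.format("[%s]", s);
            }
            System.out.println();
        }
        public static void main(String[] args) {
            dump("1,234,567,890".split(","));
            // "[1][234][567][890]"
            dump("1,234,567,890".split("(?=,)"));
            // "[1][,234][,567][,890]"
            dump("1,234,567,890".split("(?<=,)"));
            // "[1,][234,][567,][890]"
            dump("1,234,567,890".split("(?<=,)|(?=,)"));
            // "[1][,][234][,][567][,][890]"

            dump(":a:bb::c:".split("(?=:)|(?<=:)"));
            // "[][:][a][:][bb][:][:][c][:]" before Java 8; Java 8+ drops
            // the leading empty string produced by a zero-width match at index 0
            dump(":a:bb::c:".split("(?=(?!^):)|(?<=:)"));
            // "[:][a][:][bb][:][:][c][:]"
            dump(":::a::::b  b::c:".split("(?=(?!^):)(?<!:)|(?!:)(?<=:)"));
            // "[:::][a][::::][b  b][::][c][:]"
            dump("a,bb:::c  d..e".split("(?!^)\\b"));
            // "[a][,][bb][:::][c][  ][d][..][e]"

            dump("ArrayIndexOutOfBoundsException".split("(?<=[a-z])(?=[A-Z])"));
            // "[Array][Index][Out][Of][Bounds][Exception]"
            dump("1234567890".split("(?<=\\G.{4})"));
            // "[1234][5678][90]"

            // Split at the end of each run of letters
            dump("Boooyaaaah! Yippieeee!!".split("(?<=(?=(.)\\1(?!\\1))..)"));
            // "[Booo][yaaaa][h! Yipp][ieeee][!!]"
        }
    }

And yes, that is a triply-nested assertion there in the last pattern.

Related questions:
- Java split is eating my characters.
- Can you use zero-width matching regex in String split?
- How do I convert CamelCase into human-readable names in Java?
- Backreferences in lookbehind

See also:
- regular-expressions.info/Lookarounds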
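Applied to the question's own input, the same idea might look like the sketch below. It again assumes the delimiters can be written as Delimiter[ABC]; Java only accepts a lookbehind whose length has an obvious maximum, which holds here because every delimiter is exactly ten characters. LookaroundSplit is a hypothetical name.

```java
import java.util.Arrays;

class LookaroundSplit {
    // Hypothetical regex standing in for the question's set of delimiters
    static final String DELIM = "Delimiter[ABC]";

    // Split at the zero-width position just before a delimiter (?=...)
    // and just after one (?<=...); nothing is consumed, so nothing is lost.
    static String[] splitKeepingDelimiters(String s) {
        return s.split("(?=" + DELIM + ")|(?<=" + DELIM + ")");
    }

    public static void main(String[] args) {
        String input = "Text1DelimiterAText2DelimiterCText3DelimiterBText4";
        // Prints [Text1, DelimiterA, Text2, DelimiterC, Text3, DelimiterB, Text4]
        System.out.println(Arrays.toString(splitKeepingDelimiters(input)));
    }
}
```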

User · Answer

Fast answer: use non-physical (zero-width) bounds like \b to split. I will try and experiment to see if it works (I used that in PHP and JS).

It is possible, and kind of works, but might split too much. Actually, it depends on the string you want to split and the result you need. Give more details and we will help you better.

Another way is to do your own split, capturing the delimiter (supposing it is variable) and adding it afterward to the result.

My quick test:

    String str = "'ab','cd','eg'";
    String[] stra = str.split("\\b");
    for (String s : stra) System.out.print(s + "|");
    System.out.println();

Result:

    '|ab|','|cd|','|eg|'|

A bit too much... :-)
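The "do your own split" alternative mentioned above could be sketched like this, assuming java.util.regex and a delimiter regex that always matches at least one character (a zero-width pattern such as \b would add empty "delimiters"). CapturingSplit and its split method are hypothetical names, and the comma with optional surrounding whitespace is just one example of a variable delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturingSplit {
    // Hypothetical helper: split on delimiterRegex, re-inserting each matched delimiter
    static List<String> split(String text, String delimiterRegex) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(delimiterRegex).matcher(text);
        int last = 0;
        while (m.find()) {
            parts.add(text.substring(last, m.start())); // text before this delimiter
            parts.add(m.group());                       // the delimiter exactly as matched
            last = m.end();
        }
        parts.add(text.substring(last));                // whatever follows the last delimiter
        return parts;
    }

    public static void main(String[] args) {
        // The delimiter varies: a comma with optional surrounding whitespace
        System.out.println(split("ab , cd,eg", "\\s*,\\s*"));
    }
}
```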

User · Answer

Pass true as the 3rd argument (returnDelims) of the StringTokenizer constructor. It will then return the delimiters as tokens as well:

    StringTokenizer(String str, String delim, boolean returnDelims)
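A small usage sketch of that constructor (the helper name TokenizerKeepsDelims is hypothetical). StringTokenizer treats each character of the delimiter string as a separate delimiter, takes no regex, and never returns empty tokens between adjacent delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerKeepsDelims {
    // Every character of delims is a single-character delimiter (no regex).
    // With returnDelims = true, each delimiter character comes back as its own token.
    static List<String> tokens(String str, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(str, delims, true);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Hello, world. Hi!", ",.!"));
    }
}
```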

User · Answer

One of the subtleties in this question involves the "leading delimiter" question: if you are going to have a combined array of tokens and delimiters, you have to know whether it starts with a token or a delimiter. You could of course just assume that a leading delim should be discarded, but this seems an unjustified assumption. You might also want to know whether you have a trailing delim or not. This sets two boolean flags accordingly.

Written in Groovy, but a Java version should be fairly obvious:

    String tokenRegex = /[\p{L}\p{N}]+/ // a String in Groovy, Unicode alphanumeric
    def finder = phraseForTokenising =~ tokenRegex
    // NB in Groovy the variable 'finder' is then of class java.util.regex.Matcher
    def finderIt = finder.iterator() // extra method added to Matcher by Groovy magic
    int start = 0
    boolean leadingDelim, trailingDelim
    def combinedTokensAndDelims = [] // create an array in Groovy

    while (finderIt.hasNext()) {
        def token = finderIt.next()
        int finderStart = finder.start()
        String delim = phraseForTokenising[start..finderStart - 1]
        // Groovy: above gets slice of String/array
        if (start == 0) leadingDelim = finderStart != 0
        if (start > 0 || leadingDelim) combinedTokensAndDelims << delim
        combinedTokensAndDelims << token // add element to end of array
        start = finder.end()
    }
    // start == 0 indicates no tokens found
    if (start > 0) {
        // finish by seeing whether there is a trailing delim
        trailingDelim = start < phraseForTokenising.length()
        if (trailingDelim) combinedTokensAndDelims << phraseForTokenising[start..-1]

        println("leading delim? $leadingDelim, trailing delim? $trailingDelim, combined array:\n $combinedTokensAndDelims")
    }
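A Java port of the Groovy snippet above might look like this. It is a sketch, not the answerer's own code, and keeps the same assumption that tokens are runs of Unicode letters and digits; TokensAndDelims and its fields are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokensAndDelims {
    // Tokens are runs of Unicode letters and digits, as in the Groovy version;
    // whatever lies between two tokens is treated as a delimiter.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    boolean leadingDelim;
    boolean trailingDelim;
    final List<String> combined = new ArrayList<>();

    TokensAndDelims(String phrase) {
        Matcher m = TOKEN.matcher(phrase);
        int start = 0;
        while (m.find()) {
            if (start == 0) {
                // First token: is there any text in front of it?
                leadingDelim = m.start() > 0;
            }
            if (m.start() > start) {
                combined.add(phrase.substring(start, m.start())); // delimiter run
            }
            combined.add(m.group()); // the token itself
            start = m.end();
        }
        // start == 0 means no tokens were found (a token is never empty)
        if (start > 0) {
            trailingDelim = start < phrase.length();
            if (trailingDelim) {
                combined.add(phrase.substring(start));
            }
        }
    }

    public static void main(String[] args) {
        TokensAndDelims t = new TokensAndDelims("  hello, world!");
        System.out.println("leading delim? " + t.leadingDelim
                + ", trailing delim? " + t.trailingDelim
                + ", combined: " + t.combined);
    }
}
```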

User · Answer

I don't know Java too well, but if you can't find a Split method that does that, I suggest you just make your own.

    string[] mySplit(string s, string delimiter)
    {
        string[] result = s.Split(delimiter);
        for (int i = 0; i < result.Length - 1; i++)
        {
            result[i] += delimiter; // this adds the delimiter to the end of each item except the last one;
                                    // you can modify it however you want
        }
        return result;
    }

    string[] res = mySplit(myString, myDelimiter);

It's not too elegant, but it'll do.
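The snippet above is C#-style; a Java rendering of the same idea might look like this. It is a sketch with two choices that are not in the original: Pattern.quote, so the delimiter is matched literally rather than as a regex, and a limit of -1, so a delimiter at the very end is not silently dropped by String.split. AppendDelimiterSplit is a hypothetical class name.

```java
import java.util.regex.Pattern;

class AppendDelimiterSplit {
    // Split on a literal delimiter and glue the delimiter back onto
    // every piece except the last.
    static String[] mySplit(String s, String delimiter) {
        // Pattern.quote: treat the delimiter literally, not as a regex.
        // Limit -1: keep trailing empty strings.
        String[] result = s.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < result.length - 1; i++) {
            result[i] += delimiter;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", mySplit("a.b.c", "."))); // a.|b.|c
    }
}
```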

User · Answer

I will post my working versions also (the first is really similar to Markus's).

    import java.util.LinkedList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public static String[] splitIncludeDelimeter(String regex, String text) {
        List<String> list = new LinkedList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);

        int now, old = 0;
        while (matcher.find()) {
            now = matcher.end();
            list.add(text.substring(old, now));
            old = now;
        }

        if (list.size() == 0)
            return new String[] { text };

        // adding rest of a text as last element
        String finalElement = text.substring(old);
        list.add(finalElement);

        return list.toArray(new String[list.size()]);
    }

And here is the second solution; it's around 50% faster than the first one:

    public static String[] splitIncludeDelimeter2(String regex, String text) {
        List<String> list = new LinkedList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);

        StringBuffer stringBuffer = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement: a '$' or '\' in the delimiter must not be treated as a group reference
            matcher.appendReplacement(stringBuffer, Matcher.quoteReplacement(matcher.group()));
            list.add(stringBuffer.toString());
            stringBuffer.setLength(0); // clear buffer
        }

        matcher.appendTail(stringBuffer); // add the rest of the string
        list.add(stringBuffer.toString());

        return list.toArray(new String[list.size()]);
    }

User · Answer

I don't think it is possible with String.split, but you can use a StringTokenizer, though that won't allow you to define your delimiter as a regex, but only as a set of single-character delimiters:

    new StringTokenizer("Hello, world. Hi!", ",.!", true); // true for returnDelims

User · Answer

A very naive solution that doesn't involve regex would be to perform a string replace on your delimiter, along the lines of (assuming comma for delimiter):

    FullString.replace(",", "~,~")

where you can replace the tilde (~) with an appropriate unique delimiter.

Then if you do a split on your new delimiter, I believe you will get the desired result.
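In Java the whole trick could be sketched as follows, assuming a comma delimiter and that the marker character ~ never occurs in the input (MarkerSplit is a hypothetical name). Because String.replace is literal, no regex escaping is needed for the first step; note that two adjacent commas leave an empty string between them.

```java
import java.util.Arrays;

class MarkerSplit {
    // Assumes '~' never appears in the input: it is only a temporary marker.
    static String[] splitKeepingCommas(String fullString) {
        // "a,b" -> "a~,~b"; splitting on the marker leaves the comma as its own piece
        return fullString.replace(",", "~,~").split("~");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingCommas("a,b,c"))); // [a, ,, b, ,, c]
    }
}
```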

User · Answer

I suggest using Pattern and Matcher, which will almost certainly achieve what you want. Your regular expression will need to be somewhat more complicated than what you are using in String.split.
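One way to read "a more complicated regex" is to describe the tokens rather than the gaps: match either a delimiter or a run of non-delimiter text, and collect every match with Matcher.find(). A sketch, assuming the question's delimiters can be written as Delimiter[ABC] (MatcherTokenizer is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherTokenizer {
    // Hypothetical token grammar: EITHER one of the delimiters, OR the longest run
    // of characters that does not start a delimiter. DOTALL lets '.' cross newlines,
    // since the question's input is multiline.
    private static final Pattern TOKEN = Pattern.compile(
            "Delimiter[ABC]|(?:(?!Delimiter[ABC]).)+", Pattern.DOTALL);

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) { // every character belongs to exactly one match
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Text1DelimiterAText2\nDelimiterCText3"));
    }
}
```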

User · Answer

I know this is a very, very old question and an answer has already been accepted, but I would still like to submit a very simple answer to the original question. Consider this code:

    String str = "Hello-World:How\nAre You&doing";
    String[] inputs = str.split("(?!^)\\b");
    for (int i = 0; i < inputs.length; i++) {
        System.out.println("a[" + i + "] = \"" + inputs[i] + '"');
    }

OUTPUT:

    a[0] = "Hello"
    a[1] = "-"
    a[2] = "World"
    a[3] = ":"
    a[4] = "How"
    a[5] = "
    "
    a[6] = "Are"
    a[7] = " "
    a[8] = "You"
    a[9] = "&"
    a[10] = "doing"

I am just using the word boundary \b to delimit the words, except at the start of the text.

User · Answer

Another candidate solution using a regex. It retains token order and correctly matches multiple tokens of the same type in a row. The downside is that the regex is kind of nasty.

    package javaapplication2;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class JavaApplication2 {

        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) {
            String num = "58.5+variable-+98*78/96+a/78.7-3443*12-3";

            // Terrifying regex:
            //  (a)|(b)|(c) match a or b or c
            // where
            //   (a) is one or more digits optionally followed by a decimal point
            //       followed by one or more digits: (\d+(\.\d+)?)
            //   (b) is one of the set + * / - occurring once: ([+*/-])
            //   (c) is a sequence of one or more lowercase Latin letters: ([a-z]+)
            Pattern tokenPattern = Pattern.compile("(\\d+(\\.\\d+)?)|([+*/-])|([a-z]+)");
            Matcher tokenMatcher = tokenPattern.matcher(num);

            List<String> tokens = new ArrayList<>();

            while (!tokenMatcher.hitEnd()) {
                if (tokenMatcher.find()) {
                    tokens.add(tokenMatcher.group());
                } else {
                    // report error
                    break;
                }
            }

            System.out.println(tokens);
        }
    }

Sample output:

    [58.5, +, variable, -, +, 98, *, 78, /, 96, +, a, /, 78.7, -, 3443, *, 12, -, 3]

User · Answer

I like the idea of StringTokenizer because it is Enumerable.
But it is also obsolete, and replaced by String.split, which returns a boring String[] (and does not include the delimiters).

So I implemented a StringTokenizerEx which is an Iterable, and which takes a true regexp to split a string.

A true regexp means it is not a 'Character sequence' repeated to form the delimiter:
'o' will only match 'o', and split 'ooo' into three delimiters, with two empty strings inside:

    [o], '', [o], '', [o]

But the regexp o+ will return the expected result when splitting "aooob":

    [], 'a', [ooo], 'b', []

To use this StringTokenizerEx:

    final StringTokenizerEx aStringTokenizerEx = new StringTokenizerEx("boo:and:foo", "o+");
    final String firstDelimiter = aStringTokenizerEx.getDelimiter();
    for (String aString : aStringTokenizerEx) {
        // uses the split String detected and memorized in 'aString'
        final String nextDelimiter = aStringTokenizerEx.getDelimiter();
    }

The code of this class is available at DZone Snippets.

As usual for a code-challenge response (one self-contained class with test cases included), copy-paste it (into a 'src/test' directory) and run it. Its main() method illustrates the different usages.

Note (late 2009 edit):

The article "Final Thoughts: Java Puzzler: Splitting Hairs" does a good job explaining the bizarre behavior of String.split().
Josh Bloch even commented in response to that article:

    Yes, this is a pain. FWIW, it was done for a very good reason: compatibility with Perl.
    The guy who did it is Mike "madbot" McCloskey, who now works with us at Google. Mike made sure that Java's regular expressions passed virtually every one of the 30K Perl regular expression tests (and ran faster).

The Google common library Guava also contains a Splitter which is:

- simpler to use
- maintained by Google (and not by you)

So it may be worth checking out. From their initial rough documentation (pdf):

    JDK has this:

        String[] pieces = "foo.bar".split("\\.");

    It's fine to use this if you want exactly what it does:
    - regular expression
    - result as an array
    - its way of handling empty pieces

    Mini-puzzler: ",a,,b,".split(",") returns...

        (a) "", "a", "", "b", ""
        (b) null, "a", null, "b", null
        (c) "a", null, "b"
        (d) "a", "b"
        (e) None of the above

    Answer: (e) None of the above.

        ",a,,b,".split(",")
        returns
        "", "a", "", "b"

    Only trailing empties are skipped! (Who knows the workaround to prevent the skipping? It's a fun one...)

    In any case, our Splitter is simply more flexible: The default behavior is simplistic:

        Splitter.on(',').split(" foo, ,bar, quux,")
        --> [" foo", " ", "bar", " quux", ""]

    If you want extra features, ask for them!

        Splitter.on(',')
            .trimResults()
            .omitEmptyStrings()
            .split(" foo, ,bar, quux,")
        --> ["foo", "bar", "quux"]

    Order of config methods doesn't matter -- during splitting, trimming happens before checking for empties.
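One well-known workaround for the puzzler's "skipping" question (it is not spelled out in the quoted material) is a negative limit argument to String.split, which keeps trailing empty strings. A minimal sketch, with SplitLimitDemo as a hypothetical name:

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Default limit (0): trailing empty strings are discarded
    static String[] defaultSplit(String s) {
        return s.split(",");
    }

    // Negative limit: the pattern is applied as often as possible and nothing is discarded
    static String[] keepAll(String s) {
        return s.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(defaultSplit(",a,,b,"))); // [, a, , b]
        System.out.println(Arrays.toString(keepAll(",a,,b,")));      // [, a, , b, ]
    }
}
```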

User · Answer

I got here late, but returning to the original question, why not just use lookarounds?

    Pattern p = Pattern.compile("(?<=\\w)(?=\\W)|(?<=\\W)(?=\\w)");
    System.out.println(Arrays.toString(p.split("'ab','cd','eg'")));
    System.out.println(Arrays.toString(p.split("boo:and:foo")));

output:

    [', ab, ',', cd, ',', eg, ']
    [boo, :, and, :, foo]

EDIT: What you see above is what appears on the command line when I run that code, but I now see that it's a bit confusing. It's difficult to keep track of which commas are part of the result and which were added by Arrays.toString(). SO's syntax highlighting isn't helping either. In hopes of getting the highlighting to work with me instead of against me, here's how those arrays would look if I were declaring them in source code:

    { "'", "ab", "','", "cd", "','", "eg", "'" }
    { "boo", ":", "and", ":", "foo" }

I hope that's easier to read. Thanks for the heads-up, @finnw.

User · Answer

An extremely naive and inefficient solution which works nevertheless. Use split twice on the string and then concatenate the two arrays:

    String temp[] = str.split("\\W");
    String temp2[] = str.split("\\w||\\s");
    int i = 0;
    for (String string : temp)
        System.out.println(string);
    String temp3[] = new String[temp.length - 1];
    for (String string : temp2) {
        System.out.println(string);
        if ((string.equals("") != true) && (string.equals("\\s") != true)) {
            temp3[i] = string;
            i++;
        }
    }
    System.out.println(temp.length);
    System.out.println(temp2.length);
    System.out.println(temp3.length);
    String[] temp4 = new String[temp.length + temp3.length];
    int j = 0;
    for (i = 0; i < temp.length; i++) {
        temp4[j] = temp[i];
        j = j + 2;
    }
    j = 1;
    for (i = 0; i < temp3.length; i++) {
        temp4[j] = temp3[i];
        j += 2;
    }
    for (String s : temp4)
        System.out.println(s);

User · Answer

Here's a Groovy version based on some of the code above, in case it helps. It's short, anyway. It conditionally includes the head and tail (if they are not empty). The last part is a demo/test case.

    List splitWithTokens(str, pat) {
        def tokens = []
        def lastMatch = 0
        def m = str =~ pat
        while (m.find()) {
            if (m.start() > 0) tokens << str[lastMatch..<m.start()]
            tokens << m.group()
            lastMatch = m.end()
        }
        if (lastMatch < str.length()) tokens << str[lastMatch..<str.length()]
        tokens
    }

    [['<html><head><title>this is the title</title></head>', /<[^>]+>/],
     ['before<html><head><title>this is the title</title></head>after', /<[^>]+>/]
    ].each {
        println splitWithTokens(*it)
    }

User · Answer

    import java.util.regex.*;
    import java.util.LinkedList;

    public class Splitter {
        private static final Pattern DEFAULT_PATTERN = Pattern.compile("\\s+");

        private Pattern pattern;
        private boolean keep_delimiters;

        public Splitter(Pattern pattern, boolean keep_delimiters) {
            this.pattern = pattern;
            this.keep_delimiters = keep_delimiters;
        }
        public Splitter(String pattern, boolean keep_delimiters) {
            this(Pattern.compile(pattern == null ? "" : pattern), keep_delimiters);
        }
        public Splitter(Pattern pattern) { this(pattern, true); }
        public Splitter(String pattern) { this(pattern, true); }
        public Splitter(boolean keep_delimiters) { this(DEFAULT_PATTERN, keep_delimiters); }
        public Splitter() { this(DEFAULT_PATTERN); }

        public String[] split(String text) {
            if (text == null) {
                text = "";
            }

            int last_match = 0;
            LinkedList<String> splitted = new LinkedList<String>();

            Matcher m = this.pattern.matcher(text);

            while (m.find()) {
                splitted.add(text.substring(last_match, m.start()));

                if (this.keep_delimiters) {
                    splitted.add(m.group());
                }

                last_match = m.end();
            }

            splitted.add(text.substring(last_match));

            return splitted.toArray(new String[splitted.size()]);
        }

        public static void main(String[] argv) {
            if (argv.length != 2) {
                System.err.println("Syntax: java Splitter <pattern> <text>");
                return;
            }

            Pattern pattern = null;
            try {
                pattern = Pattern.compile(argv[0]);
            }
            catch (PatternSyntaxException e) {
                System.err.println(e);
                return;
            }

            Splitter splitter = new Splitter(pattern);

            String text = argv[1];
            int counter = 1;
            for (String part : splitter.split(text)) {
                System.out.printf("Part %d: \"%s\"\n", counter++, part);
            }
        }
    }

    /*
        Example:
        > java Splitter "\W+" "Hello World!"
        Part 1: "Hello"
        Part 2: " "
        Part 3: "World"
        Part 4: "!"
        Part 5: ""
    */

I don't really like the other way, where you get an empty element in front and back. A delimiter is usually not at the beginning or at the end of the string, thus you most often end up wasting two good array slots.

Edit: Fixed limit cases. Commented source with test cases can be found here: http://snippets.dzone.com/posts/show/6453

User · Answer

I don't know of an existing function in the Java API that does this (which is not to say it doesn't exist), but here's my own implementation (one or more delimiters will be returned as a single token; if you want each delimiter to be returned as a separate token, it will need a bit of adaptation):

    static String[] splitWithDelimiters(String s) {
        if (s == null || s.length() == 0) {
            return new String[0];
        }
        LinkedList<String> result = new LinkedList<String>();
        StringBuilder sb = null;
        boolean wasLetterOrDigit = !Character.isLetterOrDigit(s.charAt(0));
        for (char c : s.toCharArray()) {
            if (Character.isLetterOrDigit(c) ^ wasLetterOrDigit) {
                if (sb != null) {
                    result.add(sb.toString());
                }
                sb = new StringBuilder();
                wasLetterOrDigit = !wasLetterOrDigit;
            }
            sb.append(c);
        }
        result.add(sb.toString());
        return result.toArray(new String[0]);
    }

User · Answer

You can use lookahead and lookbehind, like this:

    System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
    System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
    System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));

And you will get:

    [a;, b;, c;, d]
    [a, ;b, ;c, ;d]
    [a, ;, b, ;, c, ;, d]

The last one is what you want.

((?<=;)|(?=;)) matches the empty position just before or just after each ;.

Hope this helps.

EDIT: Fabian Steeg's comment on readability is valid. Readability is always the problem with regexes. One thing I do to help ease this is to create a variable whose name represents what the regex does, and use Java's String.format to help with that. Like this:

    static public final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
    ...
    public void someMethod() {
        ...
        final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
        ...
    }
    ...

This helps a little bit. :-D

User · Answer

    String expression = "((A+B)*C-D)*E";
    expression = expression.replaceAll("\\+", "~+~");
    expression = expression.replaceAll("\\*", "~*~");
    expression = expression.replaceAll("-", "~-~");
    expression = expression.replaceAll("/", "~/~");
    expression = expression.replaceAll("\\(", "~(~"); // also you can use [(] instead of \\(
    expression = expression.replaceAll("\\)", "~)~"); // also you can use [)] instead of \\)
    expression = expression.replaceAll("~~", "~");
    if (expression.startsWith("~")) {
        expression = expression.substring(1);
    }

    String[] expressionArray = expression.split("~");
    System.out.println(Arrays.toString(expressionArray));

User · Answer

If you want to keep the character, you can use a loophole in the .split() method: pass a limit, so that trailing empty strings are not discarded. See this example:

    public class SplitExample {
        public static void main(String[] args) {
            String str = "Javathomettt";
            System.out.println("method 1");
            System.out.println("Returning words:");
            String[] arr = str.split("t", 40);
            for (String w : arr) {
                System.out.println(w + "t");
            }
            System.out.println("Split array length: " + arr.length);
            System.out.println("method 2");
            System.out.println(str.replaceAll("t", "\n" + "t"));
        }
    }

User · Answer

Here is a simple, clean implementation which is consistent with Pattern.split and works with unbounded variable-length patterns (which lookbehind cannot support), and it is easier to use. It is similar to the solution provided by @cletus.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Split {
        public static String[] split(CharSequence input, String pattern) {
            return split(input, Pattern.compile(pattern));
        }

        public static String[] split(CharSequence input, Pattern pattern) {
            Matcher matcher = pattern.matcher(input);
            int start = 0;
            List<String> result = new ArrayList<>();
            while (matcher.find()) {
                result.add(input.subSequence(start, matcher.start()).toString());
                result.add(matcher.group());
                start = matcher.end();
            }
            if (start != input.length()) result.add(input.subSequence(start, input.length()).toString());
            return result.toArray(new String[0]);
        }
    }

I don't do null checks here; Pattern.split doesn't, so why should I? I don't like the if at the end, but it is required for consistency with Pattern.split. Otherwise I would unconditionally append, resulting in an empty string as the last element of the result if the input string ends with the pattern.

I convert to String[] for consistency with Pattern.split. I use new String[0] rather than new String[result.size()]; see here for why.

Here are my tests:

    @Test
    public void splitsVariableLengthPattern() {
        String[] result = Split.split("/foo/$bar/bas", "\\$\\w+");
        Assert.assertArrayEquals(new String[] { "/foo/", "$bar", "/bas" }, result);
    }

    @Test
    public void splitsEndingWithPattern() {
        String[] result = Split.split("/foo/$bar", "\\$\\w+");
        Assert.assertArrayEquals(new String[] { "/foo/", "$bar" }, result);
    }

    @Test
    public void splitsStartingWithPattern() {
        String[] result = Split.split("$foo/bar", "\\$\\w+");
        Assert.assertArrayEquals(new String[] { "", "$foo", "/bar" }, result);
    }

    @Test
    public void splitsNoMatchesPattern() {
        String[] result = Split.split("/foo/bar", "\\$\\w+");
        Assert.assertArrayEquals(new String[] { "/foo/bar" }, result);
    }

User · Answer

I had a look at the above answers and honestly I find none of them satisfactory. What you want to do is essentially mimic the Perl split functionality. Why Java doesn't allow this and have a join() method somewhere is beyond me, but I digress. You don't even need a class for this really. It's just a function. Run this sample program.

Some of the earlier answers have excessive null-checking, which I recently discussed in a response to a question here:

https://stackoverflow.com/users/18393/cletus

Anyway, the code:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Split {
        public static List<String> split(String s, String pattern) {
            assert s != null;
            assert pattern != null;
            return split(s, Pattern.compile(pattern));
        }

        public static List<String> split(String s, Pattern pattern) {
            assert s != null;
            assert pattern != null;
            Matcher m = pattern.matcher(s);
            List<String> ret = new ArrayList<String>();
            int start = 0;
            while (m.find()) {
                ret.add(s.substring(start, m.start()));
                ret.add(m.group());
                start = m.end();
            }
            ret.add(start >= s.length() ? "" : s.substring(start));
            return ret;
        }

        private static void testSplit(String s, String pattern) {
            System.out.printf("Splitting '%s' with pattern '%s'%n", s, pattern);
            List<String> tokens = split(s, pattern);
            System.out.printf("Found %d matches%n", tokens.size());
            int i = 0;
            for (String token : tokens) {
                System.out.printf("  %d/%d: '%s'%n", ++i, tokens.size(), token);
            }
            System.out.println();
        }

        public static void main(String args[]) {
            testSplit("abcdefghij", "z"); // "abcdefghij"
            testSplit("abcdefghij", "f"); // "abcde", "f", "ghij"
            testSplit("abcdefghij", "j"); // "abcdefghi", "j", ""
            testSplit("abcdefghij", "a"); // "", "a", "bcdefghij"
            testSplit("abcdefghij", "[bdfh]"); // "a", "b", "c", "d", "e", "f", "g", "h", "ij"
        }
    }

User · Answer

If you can afford it, use Java's replace(CharSequence target, CharSequence replacement) method and fill in another delimiter to split with. Example: I want to split the string "boo:and:foo" and keep each ':' attached to the string on its right.

    String str = "boo:and:foo";
    str = str.replace(":", "newdelimiter:");
    String[] tokens = str.split("newdelimiter");

This gives "boo", ":and" and ":foo".

Important note: This only works if you have no further "newdelimiter" in your String! Thus, it is not a general solution. But if you know a CharSequence of which you can be sure that it will never appear in the String, this is a very simple solution.

User · Answer

Tweaked Pattern.split() to include the matched pattern in the list.

Added:

    // add match to the list
    matchList.add(input.subSequence(start, end).toString());

Full source:

    public static String[] inclusiveSplit(String input, String re, int limit) {
        int index = 0;
        boolean matchLimited = limit > 0;
        ArrayList<String> matchList = new ArrayList<String>();

        Pattern pattern = Pattern.compile(re);
        Matcher m = pattern.matcher(input);

        // Add segments before each match found
        while (m.find()) {
            int end = m.end();
            if (!matchLimited || matchList.size() < limit - 1) {
                int start = m.start();
                String match = input.subSequence(index, start).toString();
                matchList.add(match);
                // add match to the list
                matchList.add(input.subSequence(start, end).toString());

                index = end;
            } else if (matchList.size() == limit - 1) { // last one
                String match = input.subSequence(index,
                        input.length()).toString();
                matchList.add(match);
                index = end;
            }
        }

        // If no match was found, return this
        if (index == 0)
            return new String[] { input.toString() };

        // Add remaining segment
        if (!matchLimited || matchList.size() < limit)
            matchList.add(input.subSequence(index, input.length()).toString());

        // Construct result
        int resultSize = matchList.size();
        if (limit == 0)
            while (resultSize > 0 && matchList.get(resultSize - 1).equals(""))
                resultSize--;
        String[] result = new String[resultSize];
        return matchList.subList(0, resultSize).toArray(result);
    }

User · Answer

I had a look at the above answers and honestly none of them I find satisfactory. What you want to do is essentially mimic the Perl split functionality. Why Java doesn't allow this and have a join() method somewhere is beyond me, but I digress. You don't even need a class for this really. It's just a function. Run this sample program:

Some of the earlier answers have excessive null-checking, which I recently wrote about in response to a question here: https://stackoverflow.com/users/18393/cletus

Anyway, the code:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Split {
        public static List<String> split(String s, String pattern) {
            assert s != null;
            assert pattern != null;
            return split(s, Pattern.compile(pattern));
        }

        public static List<String> split(String s, Pattern pattern) {
            assert s != null;
            assert pattern != null;
            Matcher m = pattern.matcher(s);
            List<String> ret = new ArrayList<String>();
            int start = 0;
            while (m.find()) {
                ret.add(s.substring(start, m.start())); // text before the delimiter
                ret.add(m.group());                     // the delimiter itself
                start = m.end();
            }
            ret.add(start >= s.length() ? "" : s.substring(start)); // trailing text
            return ret;
        }

        private static void testSplit(String s, String pattern) {
            System.out.printf("Splitting '%s' with pattern '%s'%n", s, pattern);
            List<String> tokens = split(s, pattern);
            System.out.printf("Found %d tokens%n", tokens.size());
            int i = 0;
            for (String token : tokens) {
                System.out.printf("  %d/%d: '%s'%n", ++i, tokens.size(), token);
            }
            System.out.println();
        }

        public static void main(String[] args) {
            testSplit("abcdefghij", "z");      // "abcdefghij"
            testSplit("abcdefghij", "f");      // "abcde", "f", "ghij"
            testSplit("abcdefghij", "j");      // "abcdefghi", "j", ""
            testSplit("abcdefghij", "a");      // "", "a", "bcdefghij"
            testSplit("abcdefghij", "[bdfh]"); // "a", "b", "c", "d", "e", "f", "g", "h", "ij"
        }
    }

User · Answer

I like the idea of StringTokenizer because it is enumerable (it implements Enumeration). But it is also obsolete, and replaced by String.split, which returns a boring String[] (and does not include the delimiters).

So I implemented a StringTokenizerEx which is an Iterable, and which takes a true regexp to split a string.

A true regexp means it is not a 'character sequence' repeated to form the delimiter: 'o' will only match 'o', and split 'ooo' into three delimiters, with two empty strings inside:

    [o], '', [o], '', [o]

But the regexp o+ will return the expected result when splitting "aooob":

    [], 'a', [ooo], 'b', []

To use this StringTokenizerEx:

    final StringTokenizerEx aStringTokenizerEx = new StringTokenizerEx("boo:and:foo", "o+");
    final String firstDelimiter = aStringTokenizerEx.getDelimiter();
    for (String aString : aStringTokenizerEx) {
        // uses the split String detected and memorized in 'aString'
        final String nextDelimiter = aStringTokenizerEx.getDelimiter();
    }

The code of this class is available at DZone Snippets.

As usual for a code-challenge response, it is one self-contained class with test cases included: copy-paste it (in a 'src/test' directory) and run it. Its main() method illustrates the different usages.

Note: (late 2009 edit)

The article Final Thoughts: Java Puzzler: Splitting Hairs does a good job explaining the bizarre behavior in String.split(). Josh Bloch even commented in response to that article:

"Yes, this is a pain. FWIW, it was done for a very good reason: compatibility with Perl. The guy who did it is Mike 'madbot' McCloskey, who now works with us at Google. Mike made sure that Java's regular expressions passed virtually every one of the 30K Perl regular expression tests (and ran faster)."

The Google common library Guava also contains a Splitter which is:

- simpler to use
- maintained by Google (and not by you)

So it may be worth checking out. From their initial rough documentation (pdf):

JDK has this:

    String[] pieces = "foo.bar".split("\\.");
It's fine to use this if you want exactly what it does:

- regular expression
- result as an array
- its way of handling empty pieces

Mini-puzzler: ",a,,b,".split(",") returns...

(a) "", "a", "", "b", ""
(b) null, "a", null, "b", null
(c) "a", null, "b"
(d) "a", "b"
(e) None of the above

Answer: (e) None of the above.

    ",a,,b,".split(",") returns
    "", "a", "", "b"

Only trailing empties are skipped. (Who knows the workaround to prevent the skipping? It's a fun one...)

In any case, our Splitter is simply more flexible. The default behavior is simplistic:

    Splitter.on(',').split(" foo, ,bar, quux,") --> [" foo", " ", "bar", " quux", ""]

If you want extra features, ask for them!

    Splitter.on(',')
        .trimResults()
        .omitEmptyStrings()
        .split(" foo, ,bar, quux,") --> ["foo", "bar", "quux"]

Order of config methods doesn't matter -- during splitting, trimming happens before checking for empties.
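The StringTokenizerEx source is only linked from DZone Snippets, so its exact API is not reproduced here. As a rough stand-in for the same idea (an Iterable driven by a true regexp), here is a minimal lazy sketch that walks a Matcher and yields text and delimiters in order. The class name DelimiterKeepingTokenizer is hypothetical, and its output shape (text, delimiter, text, ..., text) is an assumption, not StringTokenizerEx's getDelimiter() design:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical lazy tokenizer: yields text, delimiter, text, ..., text. */
public class DelimiterKeepingTokenizer implements Iterable<String> {
    private final String input;
    private final Pattern delimiter;

    public DelimiterKeepingTokenizer(String input, String regex) {
        this.input = input;
        this.delimiter = Pattern.compile(regex);
    }

    @Override
    public Iterator<String> iterator() {
        final Matcher m = delimiter.matcher(input); // fresh matcher per iteration
        return new Iterator<String>() {
            private int pos = 0;             // start of the next text segment
            private String pendingDelimiter; // delimiter to hand out after its text
            private boolean done = false;

            @Override
            public boolean hasNext() {
                return !done;
            }

            @Override
            public String next() {
                if (done) throw new NoSuchElementException();
                if (pendingDelimiter != null) { // emit the delimiter found last time
                    String d = pendingDelimiter;
                    pendingDelimiter = null;
                    return d;
                }
                if (m.find()) { // text up to the next delimiter; remember the delimiter
                    String text = input.substring(pos, m.start());
                    pendingDelimiter = m.group();
                    pos = m.end();
                    return text;
                }
                done = true; // no more delimiters: emit the trailing text
                return input.substring(pos);
            }
        };
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>();
        for (String s : new DelimiterKeepingTokenizer("aooob", "o+")) {
            parts.add(s);
        }
        System.out.println(parts); // [a, ooo, b]
    }
}
```

Since each delimiter comes back as its own element, a set of different delimiters can be handled by passing an alternation such as DelimiterA|DelimiterB|DelimiterC as the regex.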

User · Answer

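I don't know Java too well, but if you can't find a split method that does that, I suggest you just make your own:

    import java.util.regex.Pattern;

    static String[] mySplit(String s, String delimiter) {
        // split() takes a regex, so quote the delimiter to match it literally;
        // the -1 limit keeps trailing empty strings
        String[] result = s.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < result.length - 1; i++) {
            // this one would add the delimiter to each item's end except the last item,
            // you can modify it however you want
            result[i] += delimiter;
        }
        return result;
    }

    String[] res = mySplit(myString, myDelimiter);

It's not too elegant, but it'll do.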
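For a set of different delimiters, a zero-width lookbehind gives the same "delimiter at the end of each piece" result in a single split call, with no loop. A minimal sketch; the delimiters ',' and ';' and the class name LookbehindSplit are chosen only for illustration:

```java
import java.util.Arrays;

public class LookbehindSplit {
    public static void main(String[] args) {
        // Split at the zero-width position right after each ',' or ';'
        // (a lookbehind), so each delimiter stays at the end of the piece on its left.
        String[] parts = "a,b;c".split("(?<=[,;])");
        System.out.println(Arrays.toString(parts)); // [a,, b;, c]
    }
}
```

Java lookbehind needs a bounded length, so multi-character delimiters work as an alternation of literal strings but not with unbounded quantifiers such as + or *.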
