[java] How to split a string between letters and digits (or between digits and letters)?

I'm trying to work out a way of splitting up a string in Java that follows a pattern like so:

String a = "123abc345def";

The results from this should be the following:

x[0] = "123";
x[1] = "abc";
x[2] = "345";
x[3] = "def";

However, I'm completely stumped as to how I can achieve this. Can someone please help me out? I have tried searching online for a similar problem, but it's very difficult to phrase it correctly in a search.

Please note: the number of letters and digits may vary (e.g. there could be a string like '1234a5bcdef').

This question is related to java regex string

The answer is


I needed this sort of thing for performance-critical code, where every fraction of a second counts because I have to process 180k entries without any noticeable delay. So I skipped regex and split altogether and processed each element inline (though adding them to an ArrayList<String> would work just as well). In the code below, processElement stands for whatever you want to do with each group. If you want to do this exact thing but need it to be something like 20x faster...

void parseGroups(String text) {
    int last = 0;
    int state = 0;
    for (int i = 0, s = text.length(); i < s; i++) {
        switch (text.charAt(i)) {
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                if (state == 2) {
                    processElement(text.substring(last, i));
                    last = i;
                }
                state = 1;
                break;
            default:
                if (state == 1) {
                    processElement(text.substring(last, i));
                    last = i;
                }
                state = 2;
                break;
        }
    }
    processElement(text.substring(last));
}
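As a rough sketch of the ArrayList variant mentioned above: the class name GroupCollector and the method collectGroups are made up for illustration, and the code assumes ASCII digits only, treating every other character as a "letter" just like the default branch does. Unlike parseGroups, it adds nothing for an empty input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the groups instead of calling processElement.
class GroupCollector {
    // Returns the alternating digit / non-digit runs of text, in order.
    static List<String> collectGroups(String text) {
        List<String> groups = new ArrayList<>();
        int last = 0;              // start index of the current run
        boolean inDigits = false;  // type of the current run (set on the first character)
        for (int i = 0, n = text.length(); i < n; i++) {
            char c = text.charAt(i);
            boolean digit = c >= '0' && c <= '9';
            // A change of character type closes the current run.
            if (i > 0 && digit != inDigits) {
                groups.add(text.substring(last, i));
                last = i;
            }
            inDigits = digit;
        }
        // Flush the final run, if there is one.
        if (last < text.length()) {
            groups.add(text.substring(last));
        }
        return groups;
    }
}
```

Calling GroupCollector.collectGroups("123abc345def") gives the list [123, abc, 345, def].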

You can try this:

Pattern p = Pattern.compile("[a-z]+|\\d+");
Matcher m = p.matcher("123abc345def");
ArrayList<String> allMatches = new ArrayList<>();
while (m.find()) {
    allMatches.add(m.group());
}

The result (allMatches) will be:

["123", "abc", "345", "def"]

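Use two different patterns, [0-9]+ and [a-zA-Z]+, and split the string once by each of them: splitting on the digit pattern leaves the letter groups, and splitting on the letter pattern leaves the digit groups. You then have to interleave the two results to restore the original order.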
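One way this two-split idea could look in practice is sketched below. The names SplitTwice, runs and split are illustrative, the patterns use + so that zero-width matches do not split between every character, and the interleaving step assumes the input contains only ASCII letters and digits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the "split twice" approach.
class SplitTwice {
    // Splits s around separatorRegex and drops empty pieces
    // (String.split yields a leading "" when s starts with a separator).
    static List<String> runs(String s, String separatorRegex) {
        List<String> out = new ArrayList<>();
        for (String piece : s.split(separatorRegex)) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    static List<String> split(String s) {
        List<String> letters = runs(s, "[0-9]+");    // digits act as separators
        List<String> digits = runs(s, "[a-zA-Z]+");  // letters act as separators
        boolean digitFirst = !s.isEmpty() && s.charAt(0) >= '0' && s.charAt(0) <= '9';
        List<String> first = digitFirst ? digits : letters;
        List<String> second = digitFirst ? letters : digits;
        // Runs alternate, so 'first' has the same size as 'second' or one more.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < first.size(); i++) {
            result.add(first.get(i));
            if (i < second.size()) {
                result.add(second.get(i));
            }
        }
        return result;
    }
}
```

The extra bookkeeping needed to restore the order is the main drawback compared with a single Matcher.find() loop.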


If you are looking for a solution that doesn't use Java's built-in String helpers (split, matches, etc.), the following should help:

import java.util.ArrayList;
import java.util.List;

List<String> splitString(String string) {
    List<String> list = new ArrayList<String>();
    StringBuilder token = new StringBuilder();
    for (int i = 0; i < string.length(); i++) {
        char curr = string.charAt(i);
        // A change between digit and non-digit closes the current token.
        if (i > 0 && isNumber(curr) != isNumber(string.charAt(i - 1))) {
            list.add(token.toString());
            token.setLength(0);
        }
        token.append(curr);
    }
    // Flush the last token; an empty input yields an empty list.
    if (token.length() > 0) {
        list.add(token.toString());
    }
    return list;
}

boolean isNumber(char c) {
    return c >= '0' && c <= '9';
}

This solution splits the input into numbers and 'words', where 'words' are runs of characters that contain no digits. If you want 'words' to contain only English letters, you can modify it by adding more conditions (along the lines of the isNumber method), depending on your requirements (for example, you may wish to skip words that contain non-English letters). Also note that the splitString method returns a List (an ArrayList), which can later be converted to a String array.
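A minimal sketch of both suggestions follows. It is one possible interpretation, not the only one: characters that are neither ASCII digits nor English letters are simply skipped (and therefore also act as separators), and the result is converted with toArray. The class LetterDigitSplitter and the helper isEnglishLetter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant: keeps only digit runs and English-letter runs.
class LetterDigitSplitter {
    static boolean isNumber(char c) {
        return c >= '0' && c <= '9';
    }

    static boolean isEnglishLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static String[] split(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            if (isNumber(s.charAt(i))) {
                while (i < s.length() && isNumber(s.charAt(i))) i++;
            } else if (isEnglishLetter(s.charAt(i))) {
                while (i < s.length() && isEnglishLetter(s.charAt(i))) i++;
            } else {
                i++;       // neither digit nor English letter: skip it
                continue;
            }
            tokens.add(s.substring(start, i));
        }
        // Convert the list to a String array, as the question asked for x[0], x[1], ...
        return tokens.toArray(new String[0]);
    }
}
```

With this sketch, LetterDigitSplitter.split("12-ab_34") returns the array {"12", "ab", "34"}.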


I haven't used Java in ages, so treat this as a rough sketch that should help get you started:

// needs java.util.ArrayList, java.util.List, java.util.regex.Matcher, java.util.regex.Pattern
String a = "123abc345def";
List<String> result = new ArrayList<>();
Pattern digits = Pattern.compile("\\d+");
Pattern letters = Pattern.compile("[a-zA-Z]+");
while (!a.isEmpty()) {
    Matcher m = digits.matcher(a);
    if (!m.lookingAt()) {          // no digits at the start of a
        m = letters.matcher(a);
        if (!m.lookingAt()) {      // no letters at the start either
            break;                 // something invalid - neither digit nor letter
        }
    }
    result.add(m.group());
    a = a.substring(m.end());      // remove the part we've found
}

How about:

private List<String> Parse(String str) {
    List<String> output = new ArrayList<String>();
    Matcher match = Pattern.compile("[0-9]+|[a-z]+|[A-Z]+").matcher(str);
    while (match.find()) {
        output.add(match.group());
    }
    return output;
}
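Note that [a-z]+ and [A-Z]+ are separate alternatives in that pattern, so a change of case also starts a new group. The sketch below makes the difference visible; CaseRuns and findAll are illustrative names, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for comparing patterns on the same input.
class CaseRuns {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }
}
```

For the input "12abCD", findAll("[0-9]+|[a-z]+|[A-Z]+", ...) yields [12, ab, CD], while findAll("[0-9]+|[a-zA-Z]+", ...) yields [12, abCD]. Which one is right depends on whether mixed-case letters should stay together.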

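Wouldn't "\\d+|\\D+" (used with Matcher.find() rather than String.split(), since as a split delimiter it would consume the whole string) do the job instead of the cumbersome "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"?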
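To compare the two patterns side by side, here is a small sketch; the class and method names are made up for illustration. The lookaround pattern matches the empty position between a digit and a non-digit, so it works as a split delimiter. The pattern \d+|\D+ matches the groups themselves, so it belongs in a find loop. Keep in mind that \D matches any non-digit, including punctuation and whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of the boundary-split and run-matching patterns.
class SplitVsFind {
    private static final Pattern RUNS = Pattern.compile("\\d+|\\D+");

    // Splits at the zero-width boundaries between digits and non-digits.
    static String[] splitOnBoundaries(String s) {
        return s.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    // Collects each run of digits or non-digits as a match.
    static List<String> findRuns(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = RUNS.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Using the run pattern as a split delimiter consumes every character,
    // leaving only empty pieces, which split() then discards.
    static String[] splitOnRuns(String s) {
        return s.split("\\d+|\\D+");
    }
}
```

For "123abc345def", splitOnBoundaries and findRuns both produce 123, abc, 345, def, whereas splitOnRuns returns an empty array.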

