[java] How to split a string with any whitespace chars as delimiters

What regex pattern would need I to pass to java.lang.String.split() to split a String into an Array of substrings using all whitespace characters (' ', '\t', '\n', etc.) as delimiters?

This question is related to java string whitespace split

The answer is


String str = "Hello   World";
String res[] = str.split("\\s+");

Apache Commons Lang has a method to split a string with whitespace characters as delimiters:

StringUtils.split("abc def")

http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#split(java.lang.String)

This might be easier to use than a regex pattern.


To get this working in Javascript, I had to do the following:

myString.split(/\s+/g)

"\\s+" should do the trick


Apache Commons Lang has a method to split a string with whitespace characters as delimiters:

StringUtils.split("abc def")

http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#split(java.lang.String)

This might be easier to use than a regex pattern.


String string = "Ram is going to school";
String[] arrayOfString = string.split("\\s+");

you can split a string by line break by using the following statement :

 String textStr[] = yourString.split("\\r?\\n");

you can split a string by Whitespace by using the following statement :

String textStr[] = yourString.split("\\s+");

Study this code.. good luck

    import java.util.*;
class Demo{
    public static void main(String args[]){
        Scanner input = new Scanner(System.in);
        System.out.print("Input String : ");
        String s1 = input.nextLine();   
        String[] tokens = s1.split("[\\s\\xA0]+");      
        System.out.println(tokens.length);      
        for(String s : tokens){
            System.out.println(s);

        } 
    }
}

To split a string with any Unicode whitespace, you need to use

s.split("(?U)\\s+")
         ^^^^

The (?U) inline embedded flag option is the equivalent of Pattern.UNICODE_CHARACTER_CLASS that enables \s shorthand character class to match any characters from the whitespace Unicode category.

If you want to split with whitespace and keep the whitespaces in the resulting array, use

s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")

See the regex demo. See Java demo:

String s = "Hello\t World\u00A0»";
System.out.println(Arrays.toString(s.split("(?U)\\s+"))); // => [Hello, World, »]
System.out.println(Arrays.toString(s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")));
// => [Hello,    , World,  , »]

In most regex dialects there are a set of convenient character summaries you can use for this kind of thing - these are good ones to remember:

\w - Matches any word character.

\W - Matches any nonword character.

\s - Matches any white-space character.

\S - Matches anything but white-space characters.

\d - Matches any digit.

\D - Matches anything except digits.

A search for "Regex Cheatsheets" should reward you with a whole lot of useful summaries.


String string = "Ram is going to school";
String[] arrayOfString = string.split("\\s+");

Also you may have a UniCode non-breaking space xA0...

String[] elements = s.split("[\\s\\xA0]+"); //include uniCode non-breaking

"\\s+" should do the trick


Study this code.. good luck

    import java.util.*;
class Demo{
    public static void main(String args[]){
        Scanner input = new Scanner(System.in);
        System.out.print("Input String : ");
        String s1 = input.nextLine();   
        String[] tokens = s1.split("[\\s\\xA0]+");      
        System.out.println(tokens.length);      
        for(String s : tokens){
            System.out.println(s);

        } 
    }
}

"\\s+" should do the trick


In most regex dialects there are a set of convenient character summaries you can use for this kind of thing - these are good ones to remember:

\w - Matches any word character.

\W - Matches any nonword character.

\s - Matches any white-space character.

\S - Matches anything but white-space characters.

\d - Matches any digit.

\D - Matches anything except digits.

A search for "Regex Cheatsheets" should reward you with a whole lot of useful summaries.


Also you may have a UniCode non-breaking space xA0...

String[] elements = s.split("[\\s\\xA0]+"); //include uniCode non-breaking

Since it is a regular expression, and i'm assuming u would also not want non-alphanumeric chars like commas, dots, etc that could be surrounded by blanks (e.g. "one , two" should give [one][two]), it should be:

myString.split(/[\s\W]+/)

In most regex dialects there are a set of convenient character summaries you can use for this kind of thing - these are good ones to remember:

\w - Matches any word character.

\W - Matches any nonword character.

\s - Matches any white-space character.

\S - Matches anything but white-space characters.

\d - Matches any digit.

\D - Matches anything except digits.

A search for "Regex Cheatsheets" should reward you with a whole lot of useful summaries.


To split a string with any Unicode whitespace, you need to use

s.split("(?U)\\s+")
         ^^^^

The (?U) inline embedded flag option is the equivalent of Pattern.UNICODE_CHARACTER_CLASS that enables \s shorthand character class to match any characters from the whitespace Unicode category.

If you want to split with whitespace and keep the whitespaces in the resulting array, use

s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")

See the regex demo. See Java demo:

String s = "Hello\t World\u00A0»";
System.out.println(Arrays.toString(s.split("(?U)\\s+"))); // => [Hello, World, »]
System.out.println(Arrays.toString(s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")));
// => [Hello,    , World,  , »]

To get this working in Javascript, I had to do the following:

myString.split(/\s+/g)

Since it is a regular expression, and i'm assuming u would also not want non-alphanumeric chars like commas, dots, etc that could be surrounded by blanks (e.g. "one , two" should give [one][two]), it should be:

myString.split(/[\s\W]+/)

All you need is to split using the one of the special character of Java Ragex Engine,

and that is- WhiteSpace Character

  • \d Represents a digit: [0-9]
  • \D Represents a non-digit: [^0-9]
  • \s Represents a whitespace character including [ \t\n\x0B\f\r]
  • \S Represents a non-whitespace character as [^\s]
  • \v Represents a vertical whitespace character as [\n\x0B\f\r\x85\u2028\u2029]
  • \V Represents a non-vertical whitespace character as [^\v]
  • \w Represents a word character as [a-zA-Z_0-9]
  • \W Represents a non-word character as [^\w]

Here, the key point to remember is that the small leter character \s represents all types of white spaces including a single space [ ] , tab characters [ ] or anything similar.

So, if you'll try will something like this-

String theString = "Java<a space><a tab>Programming"
String []allParts = theString.split("\\s+");

You will get the desired output.


Some Very Useful Links:


Hope, this might help you the best!!!


"\\s+" should do the trick


String str = "Hello   World";
String res[] = str.split("\\s+");

you can split a string by line break by using the following statement :

 String textStr[] = yourString.split("\\r?\\n");

you can split a string by Whitespace by using the following statement :

String textStr[] = yourString.split("\\s+");

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to whitespace

How to create string with multiple spaces in JavaScript git: fatal: I don't handle protocol '??http' Show whitespace characters in Visual Studio Code What is the symbol for whitespace in C? How to print variables without spaces between values Trim whitespace from a String How do I escape spaces in path for scp copy in Linux? Avoid line break between html elements Remove "whitespace" between div element How to remove all white spaces in java

Examples related to split

Parameter "stratify" from method "train_test_split" (scikit Learn) Pandas split DataFrame by column value How to split large text file in windows? Attribute Error: 'list' object has no attribute 'split' Split function in oracle to comma separated values with automatic sequence How would I get everything before a : in a string Python Split String by delimiter position using oracle SQL JavaScript split String with white space Split a String into an array in Swift? Split pandas dataframe in two if it has more than 10 rows