What regex pattern would I need to pass to java.lang.String.split() to split a String into an array of substrings using all whitespace characters (' ', '\t', '\n', etc.) as delimiters?
This question is related to: java, string, whitespace, split
String str = "Hello World";
String res[] = str.split("\\s+");
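To see what "\\s+" does with mixed delimiters, here is a minimal self-contained sketch. The class and method names are made up for illustration. It also shows one gotcha worth knowing: input that starts with whitespace yields an empty first element, so trimming first is a common precaution.

```java
import java.util.Arrays;

// Minimal sketch: splitting on runs of whitespace with String.split("\\s+").
class WhitespaceSplitDemo {

    // "\\s+" matches one or more of: space, \t, \n, \x0B, \f, \r.
    static String[] splitOnWhitespace(String input) {
        return input.split("\\s+");
    }

    public static void main(String[] args) {
        // A tab, a newline and a space all act as delimiters.
        System.out.println(Arrays.toString(splitOnWhitespace("Hello\tWorld\n foo"))); // [Hello, World, foo]

        // Leading whitespace produces an empty first element...
        System.out.println(Arrays.toString(splitOnWhitespace("  Hello World"))); // [, Hello, World]

        // ...which trimming the input first avoids.
        System.out.println(Arrays.toString(splitOnWhitespace("  Hello World".trim()))); // [Hello, World]
    }
}
```

Trailing whitespace, by contrast, does not produce empty elements, because split(String) discards trailing empty strings.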
Apache Commons Lang has a method to split a string with whitespace characters as delimiters:
StringUtils.split("abc def")
This might be easier to use than a regex pattern.
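If pulling in Commons Lang just for this is not worth it, the behaviour StringUtils.split(String) documents (whitespace decided by Character.isWhitespace, adjacent separators treated as one, so no empty tokens) can be approximated in plain Java. This is a hypothetical helper: WhitespaceTokenizer is not a library class, and this is not Commons Lang's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Dependency-free approximation of a whitespace split in the style of
// StringUtils.split(String): whitespace is decided by Character.isWhitespace,
// and runs of separators never yield empty tokens.
class WhitespaceTokenizer {

    static String[] split(String input) {
        if (input == null) {
            return null; // mirrors StringUtils.split(null) returning null
        }
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the current token began, or -1 if none
        for (int i = 0; i < input.length(); i++) {
            if (Character.isWhitespace(input.charAt(i))) {
                if (start >= 0) {
                    tokens.add(input.substring(start, i)); // close the current token
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // first non-whitespace character of a new token
            }
        }
        if (start >= 0) {
            tokens.add(input.substring(start)); // token running to the end of the input
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("  abc \t def\n"))); // [abc, def]
    }
}
```

Note that Character.isWhitespace deliberately excludes non-breaking spaces such as U+00A0, so those stay inside tokens.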
To get this working in JavaScript, I had to do the following:
myString.split(/\s+/g)
"\\s+" should do the trick
String string = "Ram is going to school";
String[] arrayOfString = string.split("\\s+");
You can split a string on line breaks with the following statement:
String textStr[] = yourString.split("\\r?\\n");
You can split a string on whitespace with the following statement:
String textStr[] = yourString.split("\\s+");
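Running both statements on the same input makes the difference visible. A minimal sketch with a made-up multi-line string; LineVsWordSplit and its methods are illustrative names only.

```java
import java.util.Arrays;

class LineVsWordSplit {

    // Splits on Unix (\n) or Windows (\r\n) line endings only.
    static String[] lines(String text) {
        return text.split("\\r?\\n");
    }

    // Splits on any run of whitespace, line endings included.
    static String[] words(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "line one\r\nline two\nline three";
        System.out.println(Arrays.toString(lines(text))); // [line one, line two, line three]
        System.out.println(Arrays.toString(words(text))); // [line, one, line, two, line, three]
    }
}
```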
Study this code. Good luck!
import java.util.*;

class Demo {
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.print("Input String : ");
        String s1 = input.nextLine();
        // Split on runs of whitespace and/or the Unicode non-breaking space (U+00A0)
        String[] tokens = s1.split("[\\s\\xA0]+");
        System.out.println(tokens.length);
        for (String s : tokens) {
            System.out.println(s);
        }
    }
}
To split a string on any Unicode whitespace, you need to use
s.split("(?U)\\s+")
The (?U) inline embedded flag is the equivalent of Pattern.UNICODE_CHARACTER_CLASS. It makes the \s shorthand character class match any character with the Unicode White_Space property (for example the no-break space U+00A0), instead of only [ \t\n\x0B\f\r].
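A small comparison, assuming Java 7 or later (where UNICODE_CHARACTER_CLASS and (?U) were introduced). The made-up input uses an em space (U+2003) and a no-break space (U+00A0), neither of which the default \s matches. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Compares the default \s with the Unicode-aware variant enabled by (?U).
class UnicodeWhitespaceSplit {

    // Default \s: only [ \t\n\x0B\f\r], so other Unicode spaces stay inside tokens.
    static String[] asciiSplit(String s) {
        return s.split("\\s+");
    }

    // (?U) turns on UNICODE_CHARACTER_CLASS inline.
    static String[] unicodeSplit(String s) {
        return s.split("(?U)\\s+");
    }

    // Same effect, passing the flag to Pattern.compile instead of embedding it.
    private static final Pattern UNICODE_WS =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);

    static String[] unicodeSplitWithFlag(String s) {
        return UNICODE_WS.split(s);
    }

    public static void main(String[] args) {
        // An em space and a no-break space act as the delimiters here.
        String s = "a\u2003b\u00A0c";
        System.out.println(asciiSplit(s).length);                        // 1
        System.out.println(Arrays.toString(unicodeSplit(s)));            // [a, b, c]
        System.out.println(Arrays.toString(unicodeSplitWithFlag(s)));    // [a, b, c]
    }
}
```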
If you want to split on whitespace and keep the whitespace runs in the resulting array, use
s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")
Java demo:
String s = "Hello\t World\u00A0»";
System.out.println(Arrays.toString(s.split("(?U)\\s+"))); // => [Hello, World, »]
System.out.println(Arrays.toString(s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")));
// => [Hello, <tab><space>, World, <no-break space>, »] (the whitespace runs are kept as elements)
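Because the kept whitespace elements are invisible when printed, this runnable variant of the demo replaces tabs and no-break spaces with visible markers. It uses the same input string; the class name and the <TAB>/<NBSP> markers are made up.

```java
class KeepWhitespaceSplit {

    // Zero-width split at every boundary between whitespace and non-whitespace,
    // so each whitespace run survives as an array element of its own.
    static String[] splitKeepingWhitespace(String s) {
        return s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)");
    }

    public static void main(String[] args) {
        // Same input as the demo: a tab+space run and a no-break space.
        String s = "Hello\t World\u00A0\u00BB";
        String[] parts = splitKeepingWhitespace(s);
        System.out.println(parts.length); // 5
        for (String part : parts) {
            // Replace invisible characters with markers so they show up when printed.
            String visible = part.replace("\t", "<TAB>").replace("\u00A0", "<NBSP>");
            System.out.println("\"" + visible + "\"");
        }
    }
}
```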
In most regex dialects there is a set of convenient shorthand character classes you can use for this kind of thing; these are good ones to remember:
\w - Matches any word character.
\W - Matches any nonword character.
\s - Matches any whitespace character.
\S - Matches anything but whitespace characters.
\d - Matches any digit.
\D - Matches anything except digits.
A search for "regex cheat sheet" should turn up plenty of useful summaries.
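The uppercase classes are handy in Java too. For example, instead of splitting on \s+, you can collect every run of \S+ with a Matcher, which never produces the empty leading element that split returns for input starting with whitespace. A minimal sketch; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonWhitespaceTokens {

    // One or more non-whitespace characters: each match is one token.
    private static final Pattern NON_WHITESPACE = Pattern.compile("\\S+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = NON_WHITESPACE.matcher(input);
        while (m.find()) {
            result.add(m.group()); // the text of the current match
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("  Hello \t World\n")); // [Hello, World]
    }
}
```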
You may also have a Unicode non-breaking space (U+00A0), which \s does not match by default:
String[] elements = s.split("[\\s\\xA0]+"); // also treat the Unicode non-breaking space as a delimiter
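To check that the non-breaking space really needs listing, this sketch splits the same made-up string with and without \xA0 in the character class (names are illustrative).

```java
import java.util.Arrays;

class NbspSplit {

    // Default \s: only [ \t\n\x0B\f\r], so a no-break space stays inside a token.
    static String[] plainSplit(String s) {
        return s.split("\\s+");
    }

    // \s plus the no-break space U+00A0, written as the regex hex escape \xA0.
    static String[] nbspAwareSplit(String s) {
        return s.split("[\\s\\xA0]+");
    }

    public static void main(String[] args) {
        // A no-break space between the first two words, a normal space before the last.
        String s = "Hello\u00A0World foo";
        System.out.println(plainSplit(s).length);               // 2
        System.out.println(Arrays.toString(nbspAwareSplit(s))); // [Hello, World, foo]
    }
}
```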
Because it is a regular expression, and assuming you also don't want non-alphanumeric characters such as commas and periods that may be surrounded by blanks (e.g. "one , two" should give [one][two]), it should be:
myString.split(/[\s\W]+/)
Note that \W already matches every whitespace character, so /\W+/ behaves the same.
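The answer above is JavaScript; in Java the same character class is written with doubled backslashes. This sketch is an illustrative translation (class and method names are made up). One Java-specific caveat: by default \W treats accented letters such as é as non-word characters, so words containing them get cut apart. The (?U) variant is an addition not taken from the answer; it keeps such words intact.

```java
import java.util.Arrays;

class WordSplit {

    // Java translation of the JavaScript /[\s\W]+/ split.
    static String[] splitOnNonWord(String s) {
        return s.split("[\\s\\W]+");
    }

    // Unicode-aware variant: with (?U), \w also covers accented letters.
    static String[] splitOnNonWordUnicode(String s) {
        return s.split("(?U)\\W+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnNonWord("one , two")));                   // [one, two]
        System.out.println(Arrays.toString(splitOnNonWord("caf\u00E9, cr\u00E8me")));        // [caf, cr, me]
        System.out.println(Arrays.toString(splitOnNonWordUnicode("caf\u00E9, cr\u00E8me")));
    }
}
```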
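All you need is to split using one of the predefined character classes of the Java regex engine, namely the whitespace character class \s. The predefined character classes are:
\d   A digit: [0-9]
\D   A non-digit: [^0-9]
\s   A whitespace character: [ \t\n\x0B\f\r]
\S   A non-whitespace character: [^\s]
\v   A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
\V   A non-vertical whitespace character: [^\v]
\w   A word character: [a-zA-Z_0-9]
\W   A non-word character: [^\w]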
The key point to remember is that the lowercase \s represents all types of whitespace, including a single space, tab characters, and the other characters listed above.
So if you try something like this:
String theString = "Java \tProgramming"; // a space followed by a tab
String[] allParts = theString.split("\\s+");
you will get the desired output, [Java, Programming].
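To confirm which characters the \s row of the table covers by default, here is a small sketch that tests each one, plus the no-break space (U+00A0) for contrast. The class and method names are hypothetical.

```java
class WhitespaceClassCheck {

    // True if the single character c is matched by the regex \s (default flags).
    static boolean isRegexWhitespace(char c) {
        return String.valueOf(c).matches("\\s");
    }

    public static void main(String[] args) {
        // The members of [ \t\n\x0B\f\r]; 0x0B is the vertical tab.
        char[] members = {' ', '\t', '\n', (char) 0x0B, '\f', '\r'};
        for (char c : members) {
            System.out.println((int) c + " -> " + isRegexWhitespace(c)); // each prints true
        }
        // The no-break space is not in the default \s class.
        System.out.println("0xA0 -> " + isRegexWhitespace((char) 0xA0)); // 0xA0 -> false

        // The example from the answer: a space followed by a tab.
        String[] allParts = "Java \tProgramming".split("\\s+");
        System.out.println(String.join(", ", allParts)); // Java, Programming
    }
}
```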
Hope this helps!
Source: Stackoverflow.com