[c#] Best way to specify whitespace in a String.Split operation

I am splitting a string based on whitespace as follows:

string myStr = "The quick brown fox jumps over the lazy dog";

char[] whitespace = new char[] { ' ', '\t' };
string[] ssizes = myStr.Split(whitespace);

It's irksome to define the char[] array everywhere in my code I want to do this. Is there more efficent way that doesn't require the creation of the character array (which is prone to error if copied in different places)?

This question is related to c# string

The answer is


you can use

var FirstString = YourString.Split().First();

to split string .


According to the documentation :

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.

So just call myStr.Split(); There's no need to pass in anything because separator is a params array.


Why dont you use?:

string[] ssizes = myStr.Split(' ', '\t');

Note that adjacent whitespace will NOT be treated as a single delimiter, even when using String.Split(null). If any of your tokens are separated with multiple spaces or tabs, you'll get empty strings returned in your array.

From the documentation:

Each element of separator defines a separate delimiter character. If two delimiters are adjacent, or a delimiter is found at the beginning or end of this instance, the corresponding array element contains Empty.


Why don't you just do this:

var ssizes = myStr.Split(" \t".ToCharArray());

It seems there is a method String.ToCharArray() in .NET 4.0!

EDIT: As VMAtm has pointed out, the method already existed in .NET 2.0!


So don't copy and paste! Extract a function to do your splitting and reuse it.

public static string[] SplitWhitespace (string input)
{
    char[] whitespace = new char[] { ' ', '\t' };
    return input.Split(whitespace);
}

Code reuse is your friend.


Yes, There is need for one more answer here!

All the solutions thus far address the rather limited domain of canonical input, to wit: a single whitespace character between elements (though tip of the hat to @cherno for at least mentioning the problem). But I submit that in all but the most obscure scenarios, splitting all of these should yield identical results:

string myStrA = "The quick brown fox jumps over the lazy dog";
string myStrB = "The  quick  brown  fox  jumps  over  the  lazy  dog";
string myStrC = "The quick brown fox      jumps over the lazy dog";
string myStrD = "   The quick brown fox jumps over the lazy dog";

String.Split (in any of the flavors shown throughout the other answers here) simply does not work well unless you attach the RemoveEmptyEntries option with either of these:

myStr.Split(new char[0], StringSplitOptions.RemoveEmptyEntries)
myStr.Split(new char[] {' ','\t'}, StringSplitOptions.RemoveEmptyEntries)

As the illustration reveals, omitting the option yields four different results (labeled A, B, C, and D) vs. the single result from all four inputs when you use RemoveEmptyEntries:

String.Split vs Regex.Split

Of course, if you don't like using options, just use the regex alternative :-)

Regex.Split(myStr, @"\s+").Where(s => s != string.Empty)

If repeating the same code is the issue, write an extension method on the String class that encapsulates the splitting logic.


You can just do:

string myStr = "The quick brown fox jumps over the lazy dog";
string[] ssizes = myStr.Split(' ');

MSDN has more examples and references:

http://msdn.microsoft.com/en-us/library/b873y76a.aspx


Can't you do it inline?

var sizes = subject.Split(new char[] { ' ', '\t' });

Otherwise, if you do this exact thing often, you could always create constant or something containing that char array.

As others have noted you can according to the documentation also use null or an empty array. When you do that it will use whitespace characters automatically.

var sizes = subject.Split(null);