[c#] Does C# have a String Tokenizer like Java's?

I'm doing simple string input parsing and I am in need of a string tokenizer. I am new to C# but have programmed Java, and it seems natural that C# should have a string tokenizer. Does it? Where is it? How do I use it?

This question is related to c# string parsing

The answer is


read this, split function has an overload takes an array consist of seperators http://msdn.microsoft.com/en-us/library/system.stringsplitoptions.aspx


If you are using C# 3.5 you could write an extension method to System.String that does the splitting you need. You then can then use syntax:

string.SplitByMyTokens();

More info and a useful example from MS here http://msdn.microsoft.com/en-us/library/bb383977.aspx


The split method of a string is what you need. In fact the tokenizer class in Java is deprecated in favor of Java's string split method.


If you're trying to do something like splitting command line arguments in a .NET Console app, you're going to have issues because .NET is either broken or is trying to be clever (which means it's as good as broken). I needed to be able to split arguments by the space character, preserving any literals that were quoted so they didn't get split in the middle. This is the code I wrote to do the job:

private static List<String> Tokenise(string value, char seperator)
{
    List<string> result = new List<string>();
    value = value.Replace("  ", " ").Replace("  ", " ").Trim();
    StringBuilder sb = new StringBuilder();
    bool insideQuote = false;
    foreach(char c in value.ToCharArray())
    {
        if(c == '"')
        {
            insideQuote = !insideQuote;
        }
        if((c == seperator) && !insideQuote)
        {
            if (sb.ToString().Trim().Length > 0)
            {
                result.Add(sb.ToString().Trim());
                sb.Clear();
            }
        }
        else
        {
            sb.Append(c);
        }
    }
    if (sb.ToString().Trim().Length > 0)
    {
        result.Add(sb.ToString().Trim());
    }

    return result;
}

The split method of a string is what you need. In fact the tokenizer class in Java is deprecated in favor of Java's string split method.


I just want to highlight the power of C#'s Split method and give a more detailed comparison, particularly from someone who comes from a Java background.

Whereas StringTokenizer in Java only allows a single delimiter, we can actually split on multiple delimiters making regular expressions less necessary (although if one needs regex, use regex by all means!) Take for example this:

str.Split(new char[] { ' ', '.', '?' })

This splits on three different delimiters returning an array of tokens. We can also remove empty arrays with what would be a second parameter for the above example:

str.Split(new char[] { ' ', '.', '?' }, StringSplitOptions.RemoveEmptyEntries)

One thing Java's String tokenizer does have that I believe C# is lacking (at least Java 7 has this feature) is the ability to keep the delimiter(s) as tokens. C#'s Split will discard the tokens. This could be important in say some NLP applications, but for more general purpose applications this might not be a problem.


For complex splitting you could use a regex creating a match collection.


I just want to highlight the power of C#'s Split method and give a more detailed comparison, particularly from someone who comes from a Java background.

Whereas StringTokenizer in Java only allows a single delimiter, we can actually split on multiple delimiters making regular expressions less necessary (although if one needs regex, use regex by all means!) Take for example this:

str.Split(new char[] { ' ', '.', '?' })

This splits on three different delimiters returning an array of tokens. We can also remove empty arrays with what would be a second parameter for the above example:

str.Split(new char[] { ' ', '.', '?' }, StringSplitOptions.RemoveEmptyEntries)

One thing Java's String tokenizer does have that I believe C# is lacking (at least Java 7 has this feature) is the ability to keep the delimiter(s) as tokens. C#'s Split will discard the tokens. This could be important in say some NLP applications, but for more general purpose applications this might not be a problem.


If you are using C# 3.5 you could write an extension method to System.String that does the splitting you need. You then can then use syntax:

string.SplitByMyTokens();

More info and a useful example from MS here http://msdn.microsoft.com/en-us/library/bb383977.aspx


The split method of a string is what you need. In fact the tokenizer class in Java is deprecated in favor of Java's string split method.


I think the nearest in the .NET Framework is

string.Split()

If you're trying to do something like splitting command line arguments in a .NET Console app, you're going to have issues because .NET is either broken or is trying to be clever (which means it's as good as broken). I needed to be able to split arguments by the space character, preserving any literals that were quoted so they didn't get split in the middle. This is the code I wrote to do the job:

private static List<String> Tokenise(string value, char seperator)
{
    List<string> result = new List<string>();
    value = value.Replace("  ", " ").Replace("  ", " ").Trim();
    StringBuilder sb = new StringBuilder();
    bool insideQuote = false;
    foreach(char c in value.ToCharArray())
    {
        if(c == '"')
        {
            insideQuote = !insideQuote;
        }
        if((c == seperator) && !insideQuote)
        {
            if (sb.ToString().Trim().Length > 0)
            {
                result.Add(sb.ToString().Trim());
                sb.Clear();
            }
        }
        else
        {
            sb.Append(c);
        }
    }
    if (sb.ToString().Trim().Length > 0)
    {
        result.Add(sb.ToString().Trim());
    }

    return result;
}

For complex splitting you could use a regex creating a match collection.


I think the nearest in the .NET Framework is

string.Split()

If you are using C# 3.5 you could write an extension method to System.String that does the splitting you need. You then can then use syntax:

string.SplitByMyTokens();

More info and a useful example from MS here http://msdn.microsoft.com/en-us/library/bb383977.aspx


For complex splitting you could use a regex creating a match collection.


I think the nearest in the .NET Framework is

string.Split()

For complex splitting you could use a regex creating a match collection.


read this, split function has an overload takes an array consist of seperators http://msdn.microsoft.com/en-us/library/system.stringsplitoptions.aspx


The similar to Java's method is:

Regex.Split(string, pattern);

where

  • string - the text you need to split
  • pattern - string type pattern, what is splitting the text

I think the nearest in the .NET Framework is

string.Split()

_words = new List<string>(YourText.ToLower().Trim('\n', '\r').Split(' ').
            Select(x => new string(x.Where(Char.IsLetter).ToArray()))); 

Or

_words = new List<string>(YourText.Trim('\n', '\r').Split(' ').
            Select(x => new string(x.Where(Char.IsLetterOrDigit).ToArray()))); 

use Regex.Split(string,"#|#");


The split method of a string is what you need. In fact the tokenizer class in Java is deprecated in favor of Java's string split method.


The similar to Java's method is:

Regex.Split(string, pattern);

where

  • string - the text you need to split
  • pattern - string type pattern, what is splitting the text

If you are using C# 3.5 you could write an extension method to System.String that does the splitting you need. You then can then use syntax:

string.SplitByMyTokens();

More info and a useful example from MS here http://msdn.microsoft.com/en-us/library/bb383977.aspx


use Regex.Split(string,"#|#");


_words = new List<string>(YourText.ToLower().Trim('\n', '\r').Split(' ').
            Select(x => new string(x.Where(Char.IsLetter).ToArray()))); 

Or

_words = new List<string>(YourText.Trim('\n', '\r').Split(' ').
            Select(x => new string(x.Where(Char.IsLetterOrDigit).ToArray()))); 

Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to parsing

Got a NumberFormatException while trying to parse a text file for objects Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) Python/Json:Expecting property name enclosed in double quotes Correctly Parsing JSON in Swift 3 How to get response as String using retrofit without using GSON or any other library in android UIButton action in table view cell "Expected BEGIN_OBJECT but was STRING at line 1 column 1" How to convert an XML file to nice pandas dataframe? How to extract multiple JSON objects from one file? How to sum digits of an integer in java?