[c#] Regular expression replace in C#

I'm fairly new to using regular expressions, and, based on a few tutorials I've read, I'm unable to get this step in my Regex.Replace formatted properly.

Here's the scenario I'm working on... When I pull my data from the listbox, I want to format it into a CSV like format, and then save the file. Is using the Replace option an ideal solution for this scenario?

Before the regular expression formatting example.

FirstName LastName Salary    Position
-------------------------------------
John      Smith    $100,000.00  M

Proposed format after regular expression replace

John Smith,100000,M

Current formatting status output:

John,Smith,100000,M

*Note - is there a way I can replace the first comma with a whitespace?

Snippet of my code

using(var fs = new FileStream(filepath, FileMode.OpenOrCreate, FileAccess.Write))
{
    using(var sw = new StreamWriter(fs))
    {
        foreach (string stw in listBox1.Items)
        {
            StringBuilder sb = new StringBuilder();
            sb.AppendLine(stw);

            //Piecing the list back to the original format
            sb_trim = Regex.Replace(stw, @"[$,]", "");
            sb_trim = Regex.Replace(sb_trim, @"[.][0-9]+", "");
            sb_trim = Regex.Replace(sb_trim, @"\s", ",");
            sw.WriteLine(sb_trim);
        }
    }
}

This question is related to c# regex

The answer is


Try this::

sb_trim = Regex.Replace(stw, @"(\D+)\s+\$([\d,]+)\.\d+\s+(.)",
    m => string.Format(
        "{0},{1},{2}",
        m.Groups[1].Value,
        m.Groups[2].Value.Replace(",", string.Empty),
        m.Groups[3].Value));

This is about as clean an answer as you'll get, at least with regexes.

  • (\D+): First capture group. One or more non-digit characters.
  • \s+\$: One or more spacing characters, then a literal dollar sign ($).
  • ([\d,]+): Second capture group. One or more digits and/or commas.
  • \.\d+: Decimal point, then at least one digit.
  • \s+: One or more spacing characters.
  • (.): Third capture group. Any non-line-breaking character.

The second capture group additionally needs to have its commas stripped. You could do this with another regex, but it's really unnecessary and bad for performance. This is why we need to use a lambda expression and string format to piece together the replacement. If it weren't for that, we could just use this as the replacement, in place of the lambda expression:

"$1,$2,$3"

Add the following 2 lines

var regex = new Regex(Regex.Escape(","));
sb_trim = regex.Replace(sb_trim, " ", 1);

If sb_trim= John,Smith,100000,M the above code will return "John Smith,100000,M"