[c#] Escape invalid XML characters in C#

I have a string that contains invalid XML characters. How can I escape (or remove) invalid XML characters before I parse the string?

This question is related to c# .net xml escaping

The answer is


// Replace invalid characters with empty strings.
   Regex.Replace(inputString, @"[^\w\.@-]", ""); 

The regular expression pattern [^\w.@-] matches any character that is not a word character, a period, an @ symbol, or a hyphen. A word character is any letter, decimal digit, or punctuation connector such as an underscore. Any character that matches this pattern is replaced by String.Empty, which is the string defined by the replacement pattern. To allow additional characters in user input, add those characters to the character class in the regular expression pattern. For example, the regular expression pattern [^\w.@-\%] also allows a percentage symbol and a backslash in an input string.

Regex.Replace(inputString, @"[!@#$%_]", "");

Refer this too :

Removing Invalid Characters from XML Name Tag - RegEx C#

Here is a function to remove the characters from a specified XML string:

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace XMLUtils
{
    class Standards
    {
        /// <summary>
        /// Strips non-printable ascii characters 
        /// Refer to http://www.w3.org/TR/xml11/#charsets for XML 1.1
        /// Refer to http://www.w3.org/TR/2006/REC-xml-20060816/#charsets for XML 1.0
        /// </summary>
        /// <param name="content">contents</param>
        /// <param name="XMLVersion">XML Specification to use. Can be 1.0 or 1.1</param>
        private void StripIllegalXMLChars(string tmpContents, string XMLVersion)
        {    
            string pattern = String.Empty;
            switch (XMLVersion)
            {
                case "1.0":
                    pattern = @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|7F|8[0-46-9A-F]9[0-9A-F])";
                    break;
                case "1.1":
                    pattern = @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|[19][0-9A-F]|7F|8[0-46-9A-F]|0?[1-8BCEF])";
                    break;
                default:
                    throw new Exception("Error: Invalid XML Version!");
            }

            Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
            if (regex.IsMatch(tmpContents))
            {
                tmpContents = regex.Replace(tmpContents, String.Empty);
            }
            tmpContents = string.Empty;
        }
    }
}

string XMLWriteStringWithoutIllegalCharacters(string UnfilteredString)
{
    if (UnfilteredString == null)
        return string.Empty;

    return XmlConvert.EncodeName(UnfilteredString);
}

string XMLReadStringWithoutIllegalCharacters(string FilteredString)
{
    if (UnfilteredString == null)
        return string.Empty;

    return XmlConvert.DecodeName(UnfilteredString);
}

This simple method replace the invalid characters with the same value but accepted in the XML context.


To write string use XMLWriteStringWithoutIllegalCharacters(string UnfilteredString).
To read string use XMLReadStringWithoutIllegalCharacters(string FilteredString).


If you are writing xml, just use the classes provided by the framework to create the xml. You won't have to bother with escaping or anything.

Console.Write(new XElement("Data", "< > &"));

Will output

<Data>&lt; &gt; &amp;</Data>

If you need to read an XML file that is malformed, do not use regular expression. Instead, use the Html Agility Pack.


The RemoveInvalidXmlChars method provided by Irishman does not support surrogate characters. To test it, use the following example:

static void Main()
{
    const string content = "\v\U00010330";

    string newContent = RemoveInvalidXmlChars(content);

    Console.WriteLine(newContent);
}

This returns an empty string but it shouldn't! It should return "\U00010330" because the character U+10330 is a valid XML character.

To support surrogate characters, I suggest using the following method:

public static string RemoveInvalidXmlChars(string text)
{
    if (string.IsNullOrEmpty(text))
        return text;

    int length = text.Length;
    StringBuilder stringBuilder = new StringBuilder(length);

    for (int i = 0; i < length; ++i)
    {
        if (XmlConvert.IsXmlChar(text[i]))
        {
            stringBuilder.Append(text[i]);
        }
        else if (i + 1 < length && XmlConvert.IsXmlSurrogatePair(text[i + 1], text[i]))
        {
            stringBuilder.Append(text[i]);
            stringBuilder.Append(text[i + 1]);
            ++i;
        }
    }

    return stringBuilder.ToString();
}

Here is an optimized version of the above method RemoveInvalidXmlChars which doesn't create a new array on every call, thus stressing the GC unnecessarily:

public static string RemoveInvalidXmlChars(string text)
{
    if (text == null)
        return text;
    if (text.Length == 0)
        return text;

    // a bit complicated, but avoids memory usage if not necessary
    StringBuilder result = null;
    for (int i = 0; i < text.Length; i++)
    {
        var ch = text[i];
        if (XmlConvert.IsXmlChar(ch))
        {
            result?.Append(ch);
        }
        else if (result == null)
        {
            result = new StringBuilder();
            result.Append(text.Substring(0, i));
        }
    }

    if (result == null)
        return text; // no invalid xml chars detected - return original text
    else
        return result.ToString();

}

Use SecurityElement.Escape

using System;
using System.Security;

class Sample {
  static void Main() {
    string text = "Escape characters : < > & \" \'";
    string xmlText = SecurityElement.Escape(text);
//output:
//Escape characters : &lt; &gt; &amp; &quot; &apos;
    Console.WriteLine(xmlText);
  }
}

Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to .net

You must add a reference to assembly 'netstandard, Version=2.0.0.0 How to use Bootstrap 4 in ASP.NET Core No authenticationScheme was specified, and there was no DefaultChallengeScheme found with default authentification and custom authorization .net Core 2.0 - Package was restored using .NetFramework 4.6.1 instead of target framework .netCore 2.0. The package may not be fully compatible Update .NET web service to use TLS 1.2 EF Core add-migration Build Failed What is the difference between .NET Core and .NET Standard Class Library project types? Visual Studio 2017 - Could not load file or assembly 'System.Runtime, Version=4.1.0.0' or one of its dependencies Nuget connection attempt failed "Unable to load the service index for source" Token based authentication in Web API without any user interface

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to escaping

Uses for the '&quot;' entity in HTML Javascript - How to show escape characters in a string? How to print a single backslash? How to escape special characters of a string with single backslashes Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence Properly escape a double quote in CSV How to Git stash pop specific stash in 1.8.3? In Java, should I escape a single quotation mark (') in String (double quoted)? How do I escape a single quote ( ' ) in JavaScript? Which characters need to be escaped when using Bash?