[.net] Best way to encode text data for XML

Depending on how much you know about the input, you may have to take into account that not all Unicode characters are valid XML characters.

Both Server.HtmlEncode and System.Security.SecurityElement.Escape seem to ignore illegal XML characters, while System.XML.XmlWriter.WriteString throws an ArgumentException when it encounters illegal characters (unless you disable that check in which case it ignores them). An overview of library functions is available here.

Edit 2011/8/14: seeing that at least a few people have consulted this answer in the last couple years, I decided to completely rewrite the original code, which had numerous issues, including horribly mishandling UTF-16.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

/// <summary>
/// Encodes data so that it can be safely embedded as text in XML documents.
/// </summary>
public class XmlTextEncoder : TextReader {
    public static string Encode(string s) {
        using (var stream = new StringReader(s))
        using (var encoder = new XmlTextEncoder(stream)) {
            return encoder.ReadToEnd();
        }
    }

    /// <param name="source">The data to be encoded in UTF-16 format.</param>
    /// <param name="filterIllegalChars">It is illegal to encode certain
    /// characters in XML. If true, silently omit these characters from the
    /// output; if false, throw an error when encountered.</param>
    public XmlTextEncoder(TextReader source, bool filterIllegalChars=true) {
        _source = source;
        _filterIllegalChars = filterIllegalChars;
    }

    readonly Queue<char> _buf = new Queue<char>();
    readonly bool _filterIllegalChars;
    readonly TextReader _source;

    public override int Peek() {
        PopulateBuffer();
        if (_buf.Count == 0) return -1;
        return _buf.Peek();
    }

    public override int Read() {
        PopulateBuffer();
        if (_buf.Count == 0) return -1;
        return _buf.Dequeue();
    }

    void PopulateBuffer() {
        const int endSentinel = -1;
        while (_buf.Count == 0 && _source.Peek() != endSentinel) {
            // Strings in .NET are assumed to be UTF-16 encoded [1].
            var c = (char) _source.Read();
            if (Entities.ContainsKey(c)) {
                // Encode all entities defined in the XML spec [2].
                foreach (var i in Entities[c]) _buf.Enqueue(i);
            } else if (!(0x0 <= c && c <= 0x8) &&
                       !new[] { 0xB, 0xC }.Contains(c) &&
                       !(0xE <= c && c <= 0x1F) &&
                       !(0x7F <= c && c <= 0x84) &&
                       !(0x86 <= c && c <= 0x9F) &&
                       !(0xD800 <= c && c <= 0xDFFF) &&
                       !new[] { 0xFFFE, 0xFFFF }.Contains(c)) {
                // Allow if the Unicode codepoint is legal in XML [3].
                _buf.Enqueue(c);
            } else if (char.IsHighSurrogate(c) &&
                       _source.Peek() != endSentinel &&
                       char.IsLowSurrogate((char) _source.Peek())) {
                // Allow well-formed surrogate pairs [1].
                _buf.Enqueue(c);
                _buf.Enqueue((char) _source.Read());
            } else if (!_filterIllegalChars) {
                // Note that we cannot encode illegal characters as entity
                // references due to the "Legal Character" constraint of
                // XML [4]. Nor are they allowed in CDATA sections [5].
                throw new ArgumentException(
                    String.Format("Illegal character: '{0:X}'", (int) c));
            }
        }
    }

    static readonly Dictionary<char,string> Entities =
        new Dictionary<char,string> {
            { '"', "&quot;" }, { '&', "&amp;"}, { '\'', "&apos;" },
            { '<', "&lt;" }, { '>', "&gt;" },
        };

    // References:
    // [1] http://en.wikipedia.org/wiki/UTF-16/UCS-2
    // [2] http://www.w3.org/TR/xml11/#sec-predefined-ent
    // [3] http://www.w3.org/TR/xml11/#charsets
    // [4] http://www.w3.org/TR/xml11/#sec-references
    // [5] http://www.w3.org/TR/xml11/#sec-cdata-sect
}

Unit tests and full code can be found here.

Examples related to .net

You must add a reference to assembly 'netstandard, Version=2.0.0.0 How to use Bootstrap 4 in ASP.NET Core No authenticationScheme was specified, and there was no DefaultChallengeScheme found with default authentification and custom authorization .net Core 2.0 - Package was restored using .NetFramework 4.6.1 instead of target framework .netCore 2.0. The package may not be fully compatible Update .NET web service to use TLS 1.2 EF Core add-migration Build Failed What is the difference between .NET Core and .NET Standard Class Library project types? Visual Studio 2017 - Could not load file or assembly 'System.Runtime, Version=4.1.0.0' or one of its dependencies Nuget connection attempt failed "Unable to load the service index for source" Token based authentication in Web API without any user interface

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space

Examples related to .net-2.0

"This assembly is built by a runtime newer than the currently loaded runtime and cannot be loaded" Debugging doesn't start How to show text in combobox when no item selected? Compression/Decompression string with C# Best way to Bulk Insert from a C# DataTable Get domain name How to open a new form from another form Maximize a window programmatically and prevent the user from changing the windows state How should you diagnose the error SEHException - External component has thrown an exception Editing dictionary values in a foreach loop