[c#] How to convert (transliterate) a string from utf8 to ASCII (single byte) in c#?

I have a string object

"with multiple characters and even special characters"

I am trying to use

UTF8Encoding utf8 = new UTF8Encoding();
ASCIIEncoding ascii = new ASCIIEncoding();

objects to convert that string to ASCII. Could someone shed some light on this simple task, which has been haunting my afternoon?

EDIT 1: What we are trying to accomplish is getting rid of special characters, such as some of the special Windows apostrophes. The code that I posted below as an answer will not take care of that. Basically,

O'Brian will become O?Brian, where ' is one of the special apostrophes.

This question is related to c# encoding utf-8 ascii transliteration

The answer is


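If you want an 8-bit (single byte per character) representation of the string in one of the many legacy encodings, this may help you.

Set the targetEncoding variable to whichever encoding you need.

// Name is the input string.
// On .NET Core / .NET 5+, legacy code pages such as 874 are only available after
// registering CodePagesEncodingProvider.Instance with Encoding.RegisterProvider.
Encoding targetEncoding = Encoding.GetEncoding(874); // Your target encoding
Encoding utf8 = Encoding.UTF8;

// Re-encode the UTF-8 bytes into the target code page.
var stringBytes = utf8.GetBytes(Name);
var stringTargetBytes = Encoding.Convert(utf8, targetEncoding, stringBytes);
// Decode as Latin-1 so that each byte maps to exactly one char in the result.
var ascii8BitRepresentAsCsString = Encoding.GetEncoding("Latin1").GetString(stringTargetBytes);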

Based on Mark's answer above (and Geo's comment), I created a two-line version that removes every non-ASCII character from a string. Provided for people searching for this answer (as I did).

using System.Text;

// Create an ASCII encoding whose encoder fallback replaces each
// unencodable character with an empty string (i.e. drops it)
var encoder = Encoding.GetEncoding("us-ascii",
    new EncoderReplacementFallback(string.Empty),
    new DecoderExceptionFallback());

string cleanString = encoder.GetString(encoder.GetBytes(dirtyString));
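
To make the effect of the replacement fallback concrete, here is a minimal sketch that wraps the two lines above in a helper. The class name FallbackDemo and the method Strip are made up for illustration; only the Encoding calls come from the answer.

```csharp
using System.Text;

public static class FallbackDemo
{
    // Hypothetical helper: round-trips the input through ASCII, substituting
    // `replacement` for every character that ASCII cannot represent.
    public static string Strip(string input, string replacement)
    {
        Encoding encoder = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(replacement),
            new DecoderExceptionFallback());

        return encoder.GetString(encoder.GetBytes(input));
    }
}
```

With the input "O\u2019Brian" (a typographic apostrophe), Strip(input, string.Empty) gives "OBrian" and Strip(input, "?") gives "O?Brian". Either way the character is removed or masked rather than transliterated, so this does not by itself solve the apostrophe case from EDIT 1.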

I was able to figure it out. In case someone wants to know, below is the code that worked for me:

ASCIIEncoding ascii = new ASCIIEncoding();
byte[] byteArray = Encoding.UTF8.GetBytes(sOriginal);
byte[] asciiArray = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, byteArray);
string finalString = ascii.GetString(asciiArray);

Let me know if there is a simpler way of doing it.
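
As EDIT 1 in the question notes, a plain ASCII conversion turns characters such as the typographic apostrophe into '?'. One possible way to handle that case is sketched below, under some assumptions: the class name AsciiTransliterator and the small replacement table are hypothetical and hand-picked, not part of .NET or of any answer here. The sketch maps a few common punctuation characters to ASCII look-alikes, strips accents using Unicode normalization, and falls back to '?' for everything else.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class AsciiTransliterator
{
    // Hypothetical, hand-picked table; extend it for the characters your data contains.
    private static readonly Dictionary<char, string> Replacements = new Dictionary<char, string>
    {
        { '\u2018', "'" },   // left single quotation mark
        { '\u2019', "'" },   // right single quotation mark (typographic apostrophe)
        { '\u201C', "\"" },  // left double quotation mark
        { '\u201D', "\"" },  // right double quotation mark
        { '\u2013', "-" },   // en dash
        { '\u2014', "-" },   // em dash
        { '\u2026', "..." }  // horizontal ellipsis
    };

    public static string ToAscii(string input)
    {
        // Canonical decomposition splits accented letters into a base letter
        // followed by combining marks, e.g. U+00E9 becomes 'e' + U+0301.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var result = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
            {
                continue; // drop the combining accent, keep the base letter
            }

            if (Replacements.TryGetValue(c, out string replacement))
            {
                result.Append(replacement); // known look-alike
            }
            else if (c <= '\u007F')
            {
                result.Append(c); // already ASCII
            }
            else
            {
                result.Append('?'); // no mapping known
            }
        }

        return result.ToString();
    }
}
```

For example, AsciiTransliterator.ToAscii("O\u2019Brian") returns "O'Brian" and AsciiTransliterator.ToAscii("caf\u00E9") returns "cafe". Characters that have no canonical decomposition and no table entry still become '?', so the table has to be grown to match the data.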


For anyone who likes extension methods, this one does the trick for us.

using System.Text;

namespace System
{
    public static class StringExtension
    {
        private static readonly ASCIIEncoding asciiEncoding = new ASCIIEncoding();

        public static string ToAscii(this string dirty)
        {
            byte[] bytes = asciiEncoding.GetBytes(dirty);
            string clean = asciiEncoding.GetString(bytes);
            return clean;
        }
    }
}

(It lives in the System namespace so it's available pretty much automatically for all of our strings. Characters that ASCII cannot represent come back as '?'.)



