[c#] How can I transform string to UTF-8 in C#?

I have a string that I receive from a third party app and I would like to display it correctly in any language using C# on my Windows Surface.

Due to incorrect encoding, a piece of my string looks like this in Spanish:

Acción

whereas it should look like this:

Acción

According to the answer on this question: How to know string encoding in C#, the encoding I am receiving should be coming on UTF-8 already, but it is read on Encoding.Default (probably ANSI?).

I am trying to transform this string into real UTF-8, but one of the problems is that I can only see a subset of the Encoding class (UTF8 and Unicode properties only), probably because I'm limited to the windows surface API.

I have tried some snippets I've found on the internet, but none of them have proved successful so far for eastern languages (i.e. korean). One example is as follows:

var utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(myString);
myString= utf8.GetString(utfBytes, 0, utfBytes.Length);     

I also tried extracting the string into a byte array and then using UTF8.GetString:

byte[] myByteArray = new byte[myString.Length];
for (int ix = 0; ix < myString.Length; ++ix)
{
    char ch = myString[ix];
    myByteArray[ix] = (byte) ch;
}

myString = Encoding.UTF8.GetString(myByteArray, 0, myString.Length);

Do you guys have any other ideas that I could try?

This question is related to c# string encoding utf-8 character-encoding

The answer is


 Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(mystring));

string utf8String = "Acción";
string propEncodeString = string.Empty;

byte[] utf8_Bytes = new byte[utf8String.Length];
for (int i = 0; i < utf8String.Length; ++i)
{
   utf8_Bytes[i] = (byte)utf8String[i];
}

propEncodeString = Encoding.UTF8.GetString(utf8_Bytes, 0, utf8_Bytes.Length);

Output should look like

Acción

day’s displays day's

call DecodeFromUtf8();

private static void DecodeFromUtf8()
{
    string utf8_String = "day’s";
    byte[] bytes = Encoding.Default.GetBytes(utf8_String);
    utf8_String = Encoding.UTF8.GetString(bytes);
}

As you know the string is coming in as Encoding.Default you could simply use:

byte[] bytes = Encoding.Default.GetBytes(myString);
myString = Encoding.UTF8.GetString(bytes);

Another thing you may have to remember: If you are using Console.WriteLine to output some strings, then you should also write Console.OutputEncoding = System.Text.Encoding.UTF8;!!! Or all utf8 strings will be outputed as gbk...


If you want to save any string to mysql database do this:->

Your database field structure i phpmyadmin [ or any other control panel] should set to utf8-gerneral-ci

2) you should change your string [Ex. textbox1.text] to byte, therefor

2-1) define byte[] st2;

2-2) convert your string [textbox1.text] to unicode [ mmultibyte string] by :

byte[] st2 = System.Text.Encoding.UTF8.GetBytes(textBox1.Text);

3) execute this sql command before any query:

string mysql_query2 = "SET NAMES 'utf8'";
cmd.CommandText = mysql_query2;
cmd.ExecuteNonQuery();

3-2) now you should insert this value in to for example name field by :

cmd.CommandText = "INSERT INTO customer (`name`) values (@name)";

4) the main job that many solution didn't attention to it is the below line: you should use addwithvalue instead of add in command parameter like below:

cmd.Parameters.AddWithValue("@name",ut);

++++++++++++++++++++++++++++++++++ enjoy real data in your database server instead of ????


Your code is reading a sequence of UTF8-encoded bytes, and decoding them using an 8-bit encoding.

You need to fix that code to decode the bytes as UTF8.

Alternatively (not ideal), you could convert the bad string back to the original byte array—by encoding it using the incorrect encoding—then re-decode the bytes as UTF8.


@anothershrubery answer worked for me. I've made an enhancement using StringEntensions Class so I can easily convert any string at all in my program.

Method:

public static class StringExtensions
{
    public static string ToUTF8(this string text)
    {
        return Encoding.UTF8.GetString(Encoding.Default.GetBytes(text));
    }
}

Usage:

string myString = "Acción";
string strConverted = myString.ToUTF8();

Or simply:

string strConverted = "Acción".ToUTF8();

Use the below code snippet to get bytes from csv file

protected byte[] GetCSVFileContent(string fileName)
    {
        StringBuilder sb = new StringBuilder();
        using (StreamReader sr = new StreamReader(fileName, Encoding.Default, true))
        {
            String line;
            // Read and display lines from the file until the end of 
            // the file is reached.
            while ((line = sr.ReadLine()) != null)
            {
                sb.AppendLine(line);
            }
        }
        string allines = sb.ToString();


        UTF8Encoding utf8 = new UTF8Encoding();


        var preamble = utf8.GetPreamble();

        var data = utf8.GetBytes(allines);


        return data;
    }

Call the below and save it as an attachment

           Encoding csvEncoding = Encoding.UTF8;
                   //byte[] csvFile = GetCSVFileContent(FileUpload1.PostedFile.FileName);
          byte[] csvFile = GetCSVFileContent("Your_CSV_File_NAme");


        string attachment = String.Format("attachment; filename={0}.csv", "uomEncoded");

        Response.Clear();
        Response.ClearHeaders();
        Response.ClearContent();
        Response.ContentType = "text/csv";
        Response.ContentEncoding = csvEncoding;
        Response.AppendHeader("Content-Disposition", attachment);
        //Response.BinaryWrite(csvEncoding.GetPreamble());
        Response.BinaryWrite(csvFile);
        Response.Flush();
        Response.End();

Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to character-encoding

Changing PowerShell's default output encoding to UTF-8 JsonParseException : Illegal unquoted character ((CTRL-CHAR, code 10) Change the encoding of a file in Visual Studio Code What is the difference between utf8mb4 and utf8 charsets in MySQL? How to open html file? All inclusive Charset to avoid "java.nio.charset.MalformedInputException: Input length = 1"? UTF-8 output from PowerShell ERROR 1115 (42000): Unknown character set: 'utf8mb4' "for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte How to make php display \t \n as tab and new line instead of characters