Convert String (UTF-16) to UTF-8 in C#

Question

I need to convert a string to UTF-8 in C#. I've already tried many ways, but none works as I wanted. I converted my string into a byte array and then tried to write it to an XML file (whose encoding is UTF-8), but either I got the same string (not encoded at all) or I got a list of bytes, which is useless.

Has anyone faced the same issue?

Edit: This is some of the code I used:

    str = "testé";
    byte[] utf8Bytes = Encoding.UTF8.GetBytes(str);
    return Encoding.UTF8.GetString(utf8Bytes);

The result is "testé", whereas I expected something like "testÃ©".

User · Answer

Does this example help?

    using System;
    using System.IO;
    using System.Text;

    class Test
    {
        public static void Main()
        {
            using (StreamWriter output = new StreamWriter("practice.txt"))
            {
                // Create and write a string containing the symbol for Pi.
                string srcString = "Area = \u03A0r^2";

                // Convert the UTF-16 encoded source string to UTF-8 and ASCII.
                byte[] utf8String = Encoding.UTF8.GetBytes(srcString);
                byte[] asciiString = Encoding.ASCII.GetBytes(srcString);

                // Write the UTF-8 and ASCII encoded byte arrays.
                output.WriteLine("UTF-8  Bytes: {0}", BitConverter.ToString(utf8String));
                output.WriteLine("ASCII  Bytes: {0}", BitConverter.ToString(asciiString));

                // Convert UTF-8 and ASCII encoded bytes back to UTF-16 encoded
                // string and write.
                output.WriteLine("UTF-8  Text : {0}", Encoding.UTF8.GetString(utf8String));
                output.WriteLine("ASCII  Text : {0}", Encoding.ASCII.GetString(asciiString));

                Console.WriteLine(Encoding.UTF8.GetString(utf8String));
                Console.WriteLine(Encoding.ASCII.GetString(asciiString));
            }
        }
    }

User · Answer

If you want a UTF8 string where every byte is correct ('Ö' -> [195, 0], [150, 0]), you can use the following:

    public static string Utf16ToUtf8(string utf16String)
    {
        /*
         * Every .NET string will store text with the UTF16 encoding,
         * known as Encoding.Unicode. Other encodings may exist as
         * Byte-Array or incorrectly stored with the UTF16 encoding.
         *
         * UTF8 = 1 byte per code unit
         *    [100] for the ANSI 'd'
         *    [206] and [186] for the Russian character
         *
         * UTF16 = 2 bytes per code unit
         *    [100, 0] for the ANSI 'd'
         *    [186, 3] for the Russian character
         *
         * UTF8 inside UTF16
         *    [100, 0] for the ANSI 'd'
         *    [206, 0] and [186, 0] for the Russian character
         *
         * We can use the convert encoding function to convert a
         * UTF16 Byte-Array to a UTF8 Byte-Array. When we use the UTF8
         * encoding-to-string method now, we will get a UTF16 string.
         *
         * So we imitate UTF16 by filling the second byte of a char
         * with a 0 byte (binary 0) while creating the string.
         */

        // Storage for the UTF8 string
        string utf8String = String.Empty;

        // Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
        byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

        // Fill UTF8 bytes inside UTF8 string
        for (int i = 0; i < utf8Bytes.Length; i++)
        {
            // Because char always saves 2 bytes, fill char with 0
            byte[] utf8Container = new byte[2] { utf8Bytes[i], 0 };
            utf8String += BitConverter.ToChar(utf8Container, 0);
        }

        // Return UTF8
        return utf8String;
    }

In my case the DLL requests a UTF8 string too, but unfortunately the UTF8 string must be interpreted with UTF16 encoding ('Ö' -> [195, 0], [19, 32]). So the ANSI '–', which is 150, has to be converted to the UTF16 '–', which is 8211. If you have this case too, you can use the following instead:

    public static string Utf16ToUtf8(string utf16String)
    {
        // Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
        byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

        // Return UTF8 bytes as ANSI string
        return Encoding.Default.GetString(utf8Bytes);
    }

Or the native method:

    [DllImport("kernel32.dll")]
    private static extern Int32 WideCharToMultiByte(
        UInt32 CodePage, UInt32 dwFlags,
        [MarshalAs(UnmanagedType.LPWStr)] String lpWideCharStr, Int32 cchWideChar,
        [Out, MarshalAs(UnmanagedType.LPStr)] StringBuilder lpMultiByteStr, Int32 cbMultiByte,
        IntPtr lpDefaultChar, IntPtr lpUsedDefaultChar);

    public static string Utf16ToUtf8(string utf16String)
    {
        Int32 iNewDataLen = WideCharToMultiByte(Convert.ToUInt32(Encoding.UTF8.CodePage), 0, utf16String, utf16String.Length, null, 0, IntPtr.Zero, IntPtr.Zero);
        if (iNewDataLen > 1)
        {
            StringBuilder utf8String = new StringBuilder(iNewDataLen);
            WideCharToMultiByte(Convert.ToUInt32(Encoding.UTF8.CodePage), 0, utf16String, -1, utf8String, utf8String.Capacity, IntPtr.Zero, IntPtr.Zero);

            return utf8String.ToString();
        }
        else
        {
            return String.Empty;
        }
    }

If you need it the other way around, see Utf8ToUtf16. Hope I could be of help.
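The answer only names Utf8ToUtf16 without showing it, so here is a rough sketch of what the reverse direction might look like for the first variant (one UTF-8 byte per char). The class name `Utf8InUtf16Sketch` and the body of `Utf8ToUtf16` are my own assumptions, not the answerer's code, and the forward method is a shortened stand-in that uses a plain cast instead of `BitConverter.ToChar`.

```csharp
using System;
using System.Text;

public static class Utf8InUtf16Sketch
{
    // Forward direction, same idea as the first variant: one char per UTF-8 byte.
    public static string Utf16ToUtf8(string utf16String)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(utf16String);
        char[] chars = new char[utf8Bytes.Length];
        for (int i = 0; i < utf8Bytes.Length; i++)
        {
            chars[i] = (char)utf8Bytes[i]; // the high byte of the char stays 0
        }
        return new string(chars);
    }

    // Hypothetical reverse: assumes every char carries exactly one UTF-8 byte.
    public static string Utf8ToUtf16(string utf8String)
    {
        byte[] utf8Bytes = new byte[utf8String.Length];
        for (int i = 0; i < utf8String.Length; i++)
        {
            if (utf8String[i] > 0xFF)
            {
                throw new ArgumentException("Not a byte-per-char string.", nameof(utf8String));
            }
            utf8Bytes[i] = (byte)utf8String[i];
        }
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        string packed = Utf16ToUtf8("\u00D6");                        // 'Ö'
        Console.WriteLine("{0} {1}", (int)packed[0], (int)packed[1]); // 195 150
        Console.WriteLine(Utf8ToUtf16(packed) == "\u00D6");           // True
    }
}
```

This only round-trips strings produced by the zero-filling variant; the Encoding.Default variant would need the matching ANSI encoding to get the bytes back.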

User · Answer

A string in C# is always UTF-16; there is no way to "convert" it. The encoding is irrelevant as long as you manipulate the string in memory; it only matters if you write the string to a stream (file, memory stream, network stream...).

If you want to write the string to an XML file, just specify the encoding when you create the XmlWriter.
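As a minimal sketch of that suggestion, the encoding can be passed through XmlWriterSettings. The class name `XmlUtf8Sketch`, the helper `WriteAsUtf8`, and the element name `value` are placeholders I made up for illustration; a MemoryStream stands in for the file so the resulting bytes can be inspected.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlUtf8Sketch
{
    // Writes a single <value> element (hypothetical name) and returns the encoded bytes.
    public static byte[] WriteAsUtf8(string text)
    {
        var settings = new XmlWriterSettings
        {
            // false = do not emit a byte order mark
            Encoding = new UTF8Encoding(false)
        };

        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartDocument();
                writer.WriteElementString("value", text);
                writer.WriteEndDocument();
            }
            // The writer has been flushed by Dispose; copy out what it wrote.
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        byte[] xmlBytes = WriteAsUtf8("test\u00E9");
        // The accented character shows up as the two bytes C3-A9 near the end.
        Console.WriteLine(BitConverter.ToString(xmlBytes));
    }
}
```

The string itself is never "converted"; only the bytes the writer produces are UTF-8. To write to disk, a FileStream or a file path can be passed to XmlWriter.Create in place of the MemoryStream.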

User · Answer

    using System;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            String unicodeString =
                "This Unicode string contains two characters " +
                "with codes outside the traditional ASCII code range, " +
                "Pi (\u03a0) and Sigma (\u03a3).";
            Console.WriteLine("Original string:");
            Console.WriteLine(unicodeString);

            UnicodeEncoding unicodeEncoding = new UnicodeEncoding();
            byte[] utf16Bytes = unicodeEncoding.GetBytes(unicodeString);

            // Decode everything except the first two bytes (the first UTF-16 code unit).
            char[] chars = unicodeEncoding.GetChars(utf16Bytes, 2, utf16Bytes.Length - 2);
            string s = new string(chars);

            Console.WriteLine();
            Console.WriteLine("Char Array:");
            foreach (char c in chars) Console.Write(c);
            Console.WriteLine();
            Console.WriteLine();
            Console.WriteLine("String from Char Array:");
            Console.WriteLine(s);

            Console.ReadKey();
        }
    }

User · Answer

    private static string Utf16ToUtf8(string utf16String)
    {
        /*
         * Every .NET string will store text with the UTF16 encoding,
         * known as Encoding.Unicode. Other encodings may exist as
         * Byte-Array or incorrectly stored with the UTF16 encoding.
         *
         * UTF8 = 1 byte per code unit
         *    [100] for the ANSI 'd'
         *    [206] and [186] for the Russian character
         *
         * UTF16 = 2 bytes per code unit
         *    [100, 0] for the ANSI 'd'
         *    [186, 3] for the Russian character
         *
         * UTF8 inside UTF16
         *    [100, 0] for the ANSI 'd'
         *    [206, 0] and [186, 0] for the Russian character
         *
         * We can use the convert encoding function to convert a
         * UTF16 Byte-Array to a UTF8 Byte-Array. When we use the UTF8
         * encoding-to-string method now, we will get a UTF16 string.
         *
         * So we imitate UTF16 by filling the second byte of a char
         * with a 0 byte (binary 0) while creating the string.
         */

        // Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
        byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);
        char[] chars = (char[])Array.CreateInstance(typeof(char), utf8Bytes.Length);

        for (int i = 0; i < utf8Bytes.Length; i++)
        {
            chars[i] = BitConverter.ToChar(new byte[2] { utf8Bytes[i], 0 }, 0);
        }

        // Return UTF8
        return new String(chars);
    }

In the original post the author concatenated strings. Strings in .NET are immutable, so every concatenation creates a new string and copies the old contents into it. As a result, the function provided will be visibly slow. Don't do that. Use an array of chars instead, write into it directly, and then convert the result to a string. In my case, processing 500 KB of text, the difference is almost 5 minutes.
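One possible shortcut, offered as a sketch and not as the answerer's method: .NET's ISO-8859-1 decoder maps each byte 0..255 to the char with the same numeric value, which is the same zero-filling the loop performs, so the whole conversion can be done in a single decoder call with no per-char allocation. The class name `Latin1Sketch` is made up for this example; the test input uses U+03BA because its UTF-8 bytes are the 206 and 186 quoted in the comment block.

```csharp
using System;
using System.Text;

public static class Latin1Sketch
{
    public static string Utf16ToUtf8(string utf16String)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(utf16String);

        // ISO-8859-1 turns byte N into char N, so each UTF-8 byte
        // ends up in its own char with a zero high byte.
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    public static void Main()
    {
        string packed = Utf16ToUtf8("d\u03BA");
        foreach (char c in packed)
        {
            Console.Write((int)c + " "); // 100 206 186
        }
        Console.WriteLine();
    }
}
```

This relies on the decoder being ISO-8859-1 specifically; Windows-1252 remaps the range 128..159 and would give different chars for some bytes.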

User · Answer

Check Jon Skeet's answer to this other question: UTF-16 to UTF-8 conversion (for scripting in Windows). It contains the source code that you need.

Hope it helps.
