std::wstring vs std::string

Question

I am not able to understand the differences between std::string and std::wstring. I know wstring supports wide characters such as Unicode characters. I have got the following questions:

1. When should I use std::wstring over std::string?
2. Can std::string hold the entire ASCII character set, including the special characters?
3. Is std::wstring supported by all popular C++ compilers?
4. What is exactly a "wide character"?

User · Answer

1. When you want to use Unicode strings and not just ASCII; helpful for internationalisation.
2. Yes, but it doesn't play well with 0.
3. Not aware of any that don't.
4. A wide character is the compiler-specific way of handling the fixed-length representation of a Unicode character; for MSVC it is a 2-byte character, for gcc I understand it is 4 bytes. And a +1 for http://www.joelonsoftware.com/articles/Unicode.html

User · Answer

string  wstring   std  string is a basic string templated on a char  and std  wstring on a wchar t   char vs  wchar t  char is supposed to hold a character  usually an 8-bit character  wchar t is supposed to hold a wide character  and then  things get tricky  On Linux  a wchar t is 4 bytes  while on Windows  it s 2 bytes   What about Unicode  then   The problem is that neither char nor wchar t is directly tied to unicode   On Linux   Let s take a Linux OS  My Ubuntu system is already unicode aware  When I work with a char string  it is natively encoded in UTF-8  i e  Unicode string of chars   The following code    include  lt cstring gt   include  lt iostream gt   int main int argc  char  argv         const char text      ol           std  cout  lt  lt   sizeof char          lt  lt  sizeof char   lt  lt  std  endl      std  cout  lt  lt   text                 lt  lt  text  lt  lt  std  endl      std  cout  lt  lt   sizeof text          lt  lt  sizeof text   lt  lt  std  endl      std  cout  lt  lt   strlen text          lt  lt  strlen text   lt  lt  std  endl       std  cout  lt  lt   text ordinals            for size t i   0  iMax   strlen text   i  lt  iMax    i             std  cout  lt  lt       lt  lt  static cast lt unsigned int gt                                 static cast lt unsigned char gt  text i                                         std  cout  lt  lt  std  endl  lt  lt  std  endl          - - -      const wchar t wtext     L ol          std  cout  lt  lt   sizeof wchar t       lt  lt  sizeof wchar t   lt  lt  std  endl        std  cout  lt  lt   wtext                lt  lt  wtext  lt  lt  std  endl    lt - error    std  cout  lt  lt   wtext             UNABLE TO CONVERT NATIVELY    lt  lt  std  endl      std  wcout  lt  lt  L wtext                lt  lt  wtext  lt  lt  std  endl      std  cout  lt  lt   sizeof wtext         lt  lt  sizeof wtext   lt  lt  std  endl      std  cout  lt  lt   wcslen wtext         lt  lt  wcslen wtext   lt  lt  std  endl  
     std  cout  lt  lt   wtext ordinals           for size t i   0  iMax   wcslen wtext   i  lt  iMax    i             std  cout  lt  lt       lt  lt  static cast lt unsigned int gt                                 static cast lt unsigned short gt  wtext i                                             std  cout  lt  lt  std  endl  lt  lt  std  endl       return 0      outputs the following text   sizeof char       1 text              ol   sizeof text       5 strlen text       4 text ordinals     111 108 195 169  sizeof wchar t    4 wtext             UNABLE TO CONVERT NATIVELY  wtext             ol  sizeof wtext      16 wcslen wtext      3 wtext ordinals    111 108 233   You ll see the  ol    text in char is really constructed by four chars  110  108  195 and 169  not counting the trailing zero    I ll let you study the wchar t code as an exercise   So  when working with a char on Linux  you should usually end up using Unicode without even knowing it  And as std  string works with char  so std  string is already unicode-ready   Note that std  string  like the C string API  will consider the  ol    string to have 4 characters  not three  So you should be cautious when truncating playing with unicode chars because some combination of chars is forbidden in UTF-8   On Windows   On Windows  this is a bit different  Win32 had to support a lot of application working with char and on different charsets codepages produced in all the world  before the advent of Unicode   So their solution was an interesting one  If an application works with char  then the char strings are encoded printed shown on GUI labels using the local charset codepage on the machine  For example   ol    would be  ol    in a French-localized Windows  but would be something different on an cyrillic-localized Windows   ol   if you use Windows-1251   Thus   historical apps  will usually still work the same old way   For Unicode based applications  Windows uses wchar t  which is 2-bytes wide  and is encoded in 
UTF-16  which is Unicode encoded on 2-bytes characters  or at the very least  the mostly compatible UCS-2  which is almost the same thing IIRC    Applications using char are said  multibyte   because each glyph is composed of one or more chars   while applications using wchar t are said  widechar   because each glyph is composed of one or two wchar t  See MultiByteToWideChar and WideCharToMultiByte Win32 conversion API for more info   Thus  if you work on Windows  you badly want to use wchar t  unless you use a framework hiding that  like GTK  or QT      The fact is that behind the scenes  Windows works with wchar t strings  so even historical applications will have their char strings converted in wchar t when using API like SetWindowText    low level API function to set the label on a Win32 GUI    Memory issues   UTF-32 is 4 bytes per characters  so there is no much to add  if only that a UTF-8 text and UTF-16 text will always use less or the same amount of memory than an UTF-32 text  and usually less    If there is a memory issue  then you should know than for most western languages  UTF-8 text will use less memory than the same UTF-16 one   Still  for other languages  chinese  japanese  etc    the memory used will be either the same  or slightly larger for UTF-8 than for UTF-16   All in all  UTF-16 will mostly use 2 and occassionally 4 bytes per characters  unless you re dealing with some kind of esoteric language glyphs  Klingon  Elvish    while UTF-8 will spend from 1 to 4 bytes   See http   en wikipedia org wiki UTF-8 Compared to UTF-16 for more info   Conclusion   When I should use std  wstring over std  string   On Linux  Almost never       On Windows  Almost always       On cross-platform code  Depends on your toolkit            unless you use a toolkit framework saying otherwise Can std  string hold all the ASCII character set including special characters   Notice  A std  string is suitable for holding a  binary  buffer  where a std  wstring is not   On 
Linux  Yes  On Windows  Only special characters available for the current locale of the Windows user   Edit  After a comment from Johann Gerell   a std  string will be enough to handle all char-based strings  each char being a number from 0 to 255   But    ASCII is supposed to go from 0 to 127  Higher chars are NOT ASCII  a char from 0 to 127 will be held correctly a char from 128 to 255 will have a signification depending on your encoding  unicode  non-unicode  etc    but it will be able to hold all Unicode glyphs as long as they are encoded in UTF-8   Is std  wstring supported by almost all popular C   compilers   Mostly  with the exception of GCC based compilers that are ported to Windows  It works on my g   4 3 2  under Linux   and I used Unicode API on Win32 since Visual C   6  What is exactly a wide character   On C C    it s a character type written wchar t which is larger than the simple char character type  It is supposed to be used to put inside characters whose indices  like Unicode glyphs  are larger than 255  or 127  depending

User · Answer

When should you NOT use wide-characters?

When you're writing code before the year 1990.

Obviously, I'm being flip, but really, it's the 21st century now. 127 characters have long since ceased to be sufficient. Yes, you can use UTF8, but why bother with the headaches?

User · Answer

There are some very good answers here, but I think there are a couple of things I can add regarding Windows/Visual Studio. This is based on my experience with VS2015. On Linux, basically the answer is to use UTF-8 encoded std::string everywhere. On Windows/VS it gets more complex. Here is why. Windows expects strings stored using chars to be encoded using the locale codepage. This is almost always the ASCII character set followed by 128 other special characters depending on your location. Let me just state that this is not just when using the Windows API; there are three other major places where these strings interact with standard C++. These are string literals, output to std::cout using << and passing a filename to std::fstream.

I will be up front here that I am a programmer, not a language specialist. I appreciate that UCS2 and UTF-16 are not the same, but for my purposes they are close enough to be interchangeable and I use them as such here. I'm not actually sure which one Windows uses, but I generally don't need to know either. I've stated UCS2 in this answer, so sorry in advance if I upset anyone with my ignorance of this matter, and I'm happy to change it if I have things wrong.

String literals

If you enter string literals that contain only characters that can be represented by your codepage, then VS stores them in your file with a 1-byte-per-character encoding based on your codepage. Note that if you change your codepage or give your source to another developer using a different codepage, then I think (but haven't tested) that the character will end up different. If you run your code on a computer using a different codepage, then I'm not sure if the character will change too.

If you enter any string literals that cannot be represented by your codepage, then VS will ask you to save the file as Unicode. The file will then be encoded as UTF-8. This means that all non-ASCII characters (including those which are on your codepage) will be represented by 2 or more bytes. This means if you give your source to someone else the source will look the same. However, before passing the source to the compiler, VS converts the UTF-8 encoded text to codepage-encoded text and any characters missing from the codepage are replaced with ?.

The only way to guarantee correctly representing a Unicode string literal in VS is to precede the string literal with an L, making it a wide string literal. In this case VS will convert the UTF-8 encoded text from the file into UCS2. You then need to pass this string literal into a std::wstring constructor, or you need to convert it to UTF-8 and put it in a std::string. Or if you want you can use the Windows API functions to encode it using your codepage to put it in a std::string, but then you may as well have not used a wide string literal.

std::cout

When outputting to the console using << you can only use std::string, not std::wstring, and the text must be encoded using your locale codepage. If you have a std::wstring then you must convert it using one of the Windows API functions, and any characters not on your codepage get replaced by ? (maybe you can change the character, I can't remember).

std::fstream filenames

Windows uses UCS2/UTF-16 for its filenames, so whatever your codepage, you can have files with any Unicode character. But this means that to access or create files with characters not on your codepage you must use std::wstring. There is no other way. This is a Microsoft-specific extension to std::fstream, so it probably won't compile on other systems. If you use std::string then you can only use filenames that only include characters on your codepage.

Your options

If you are just working on Linux then you probably didn't get this far. Just use UTF-8 std::string everywhere.

If you are just working on Windows, just use UCS2 std::wstring everywhere. Some purists may say use UTF8 then convert when needed, but why bother with the hassle?

If you are cross-platform then it's a mess, to be frank. If you try to use UTF-8 everywhere on Windows then you need to be really careful with your string literals and output to the console. You can easily corrupt your strings there. If you use std::wstring everywhere on Linux then you may not have access to the wide version of std::fstream, so you have to do the conversion, but there is no risk of corruption. So personally I think this is a better option. Many would disagree, but I'm not alone - it's the path taken by wxWidgets for example.

Another option could be to typedef unicodestring as std::string on Linux and std::wstring on Windows, and have a macro called UNI() which prefixes L on Windows and nothing on Linux, then the code

    #include <fstream>
    #include <string>
    #include <iostream>

    #ifdef _WIN32
    #include <Windows.h>   // only exists on Windows, so keep it inside the #ifdef
    typedef std::wstring unicodestring;
    #define UNI(text) L ## text
    std::string formatForConsole(const unicodestring &str)
    {
        std::string result;
        // Call WideCharToMultiByte to do the conversion
        return result;
    }
    #else
    typedef std::string unicodestring;
    #define UNI(text) text
    std::string formatForConsole(const unicodestring &str)
    {
        return str;
    }
    #endif

    int main()
    {
        unicodestring fileName(UNI("fileName"));
        std::ofstream fout;
        fout.open(fileName);
        std::cout << formatForConsole(fileName) << std::endl;
        return 0;
    }

would be fine on either platform, I think.

Answers

So to answer your questions:

1) If you are programming for Windows, then all the time; if cross-platform then maybe all the time, unless you want to deal with possible corruption issues on Windows or write some code with platform-specific #ifdefs to work around the differences; if just using Linux then never.

2) Yes. In addition, on Linux you can use it for all Unicode too. On Windows you can only use it for all Unicode if you choose to manually encode using UTF-8. But the Windows API and standard C++ classes will expect the std::string to be encoded using the locale codepage. This includes all ASCII plus another 128 characters which change depending on the codepage your computer is set up to use.

3) I believe so, but if not then it is just a simple typedef of a std::basic_string using wchar_t instead of char.

4) A wide character is a character type which is bigger than the 1-byte standard char type. On Windows it is 2 bytes, on Linux it is 4 bytes.

User · Answer

1. When you want to store 'wide' (Unicode) characters.
2. Yes: 255 of them (excluding 0).
3. Yes.
4. Here's an introductory article: http://www.joelonsoftware.com/articles/Unicode.html

User · Answer

1. When you want to have wide characters stored in your string. "Wide" depends on the implementation. Visual C++ defaults to 16 bit if I remember correctly, while GCC defaults depending on the target. It's 32 bits long here. Please note wchar_t (wide character type) has nothing to do with Unicode. It's merely guaranteed that it can store all the members of the largest character set that the implementation supports by its locales, and that it is at least as long as char. You can store Unicode strings fine into std::string using the UTF-8 encoding too. But it won't understand the meaning of Unicode code points. So str.size() won't give you the number of logical characters in your string, but merely the number of char or wchar_t elements stored in that string/wstring. For that reason, the gtk/glib C++ wrapper folks have developed a Glib::ustring class that can handle UTF-8.

If your wchar_t is 32 bits long, then you can use UTF-32 as a Unicode encoding, and you can store and handle Unicode strings using a fixed (UTF-32 is fixed length) encoding. This means your wstring's s.size() function will then return the right number of wchar_t elements and logical characters.

2. Yes, char is always at least 8 bits long, which means it can store all ASCII values.

3. Yes, all major compilers support it.

User · Answer

I frequently use std::string to hold utf-8 characters without any problems at all. I heartily recommend doing this when interfacing with APIs which use utf-8 as the native string type as well.

For example, I use utf-8 when interfacing my code with the Tcl interpreter.

The major caveat is that the length of the std::string is no longer the number of characters in the string.

User · Answer

Applications that are not satisfied with only 256 different characters have the options of either using wide characters (more than 8 bits) or a variable-length encoding (a multibyte encoding in C++ terminology) such as UTF-8. Wide characters generally require more space than a variable-length encoding, but are faster to process. Multi-language applications that process large amounts of text usually use wide characters when processing the text, but convert it to UTF-8 when storing it to disk.

The only difference between a string and a wstring is the data type of the characters they store. A string stores chars whose size is guaranteed to be at least 8 bits, so you can use strings for processing e.g. ASCII, ISO-8859-15, or UTF-8 text. The standard says nothing about the character set or encoding.

Practically every compiler uses a character set whose first 128 characters correspond with ASCII. This is also the case with compilers that use UTF-8 encoding. The important thing to be aware of when using strings in UTF-8 or some other variable-length encoding is that the indices and lengths are measured in bytes, not characters.

The data type of a wstring is wchar_t, whose size is not defined in the standard, except that it has to be at least as large as a char, usually 16 bits or 32 bits. wstring can be used for processing text in the implementation-defined wide-character encoding. Because the encoding is not defined in the standard, it is not straightforward to convert between strings and wstrings. One cannot assume wstrings to have a fixed-length encoding either.

If you don't need multi-language support, you might be fine with using only regular strings. On the other hand, if you're writing a graphical application, it is often the case that the API supports only wide characters. Then you probably want to use the same wide characters when processing the text. Keep in mind that UTF-16 is a variable-length encoding, meaning that you cannot assume length() to return the number of characters. If the API uses a fixed-length encoding, such as UCS-2, processing becomes easy. Converting between wide characters and UTF-8 is difficult to do in a portable way, but then again, your user interface API probably supports the conversion.

User · Answer

1) As mentioned by Greg, wstring is helpful for internationalization, that is, when you will be releasing your product in languages other than English.

4) Check this out for wide character: http://en.wikipedia.org/wiki/Wide_character

User · Answer

I recommend avoiding std::wstring on Windows or elsewhere, except when required by the interface, or anywhere near Windows API calls and respective encoding conversions as a syntactic sugar.

My view is summarized in http://utf8everywhere.org of which I am a co-author.

Unless your application is API-call-centric, e.g. mainly a UI application, the suggestion is to store Unicode strings in std::string, encoded in UTF-8, performing conversion near API calls. The benefits outlined in the article outweigh the apparent annoyance of conversion, especially in complex applications. This is doubly so for multi-platform and library development.

And now, answering your questions:

1. A few weak reasons. It exists for historical reasons, where widechars were believed to be the proper way of supporting Unicode. It is now used to interface APIs that prefer UTF-16 strings. I use them only in the direct vicinity of such API calls.
2. This has nothing to do with std::string. It can hold whatever encoding you put in it. The only question is how you treat its content. My recommendation is UTF-8, so it will be able to hold all Unicode characters correctly. It's a common practice on Linux, but I think Windows programs should do it also.
3. No.
4. Wide character is a confusing name. In the early days of Unicode, there was a belief that a character can be encoded in two bytes, hence the name. Today, it stands for "any part of the character that is two bytes long". UTF-16 is seen as a sequence of such byte pairs (aka wide characters). A character in UTF-16 takes either one or two pairs.

User · Answer

A good question! I think DATA ENCODING (sometimes a CHARSET is also involved) is a MEMORY EXPRESSION MECHANISM used to save data to a file or transfer data via a network, so I answer this question as:

1. When should I use std::wstring over std::string?

If the programming platform or API function is a single-byte one, and we want to process or parse some Unicode data, e.g. read from a Windows .REG file or a network 2-byte stream, we should declare a std::wstring variable to easily process them. E.g.: wstring ws = L"中国a" (6 octets of memory: 0x4E2D 0x56FD 0x0061); we can use ws[0] to get character '中', ws[1] to get character '国' and ws[2] to get character 'a', etc.

2. Can std::string hold the entire ASCII character set, including the special characters?

Yes. But notice: American ASCII means each 0x00~0xFF octet stands for one character, including printable text such as "123abc&", and the special ones you mentioned are mostly printed as a '.' to avoid confusing editors or terminals. And some other countries extend their own "ASCII" charset, e.g. Chinese, which uses 2 octets to stand for one character.

3. Is std::wstring supported by all popular C++ compilers?

Maybe, or mostly. I have used VC++6 and GCC 3.3: YES.

4. What is exactly a "wide character"?

A wide character mostly indicates using 2 octets or 4 octets to hold all countries' characters. 2-octet UCS2 is a representative sample, and further, e.g. English 'a', its memory is 2 octets of 0x0061 (vs. in ASCII, where 'a' is 1 octet, 0x61).

User · Answer

1. When you want to store 'wide' (Unicode) characters.
2. Yes: 255 of them (excluding 0).
3. Yes.
4. Here's an introductory article: http://www.joelonsoftware.com/articles/Unicode.html
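A small note on the "excluding 0" part: that restriction comes from C-style, NUL-terminated strings rather than from std::string itself, which tracks its own length. A minimal sketch (the function name with_embedded_nul is just for illustration):

```cpp
#include <cstring>
#include <string>

// Builds a 5-byte std::string that contains a NUL in the middle.
// The explicit length is required; without it the constructor
// would stop at the first '\0' like any C string function.
std::string with_embedded_nul() {
    return std::string("ab\0cd", 5);
}
```

The std::string reports all five bytes through size(), while std::strlen on its c_str() stops at the embedded NUL and sees only two.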

User · Answer

So, every reader here should now have a clear understanding of the facts and the situation. If not, then you must read paercebal's outstandingly comprehensive answer (btw: thanks!).

My pragmatic conclusion is shockingly simple: all that C++ (and STL) "character encoding" stuff is substantially broken and useless. Blame it on Microsoft or not, that will not help anyway.

My solution, after in-depth investigation, much frustration and the experience that followed, is this:

- Accept that you are responsible for the encoding and conversion stuff yourself (and you will see that much of it is rather trivial).
- Use std::string for any UTF-8 encoded strings (just a typedef std::string UTF8String).
- Accept that such a UTF8String object is just a dumb but cheap container. Never access or manipulate characters in it directly (no search, replace, and so on). You could, but you really, really do not want to waste your time writing text manipulation algorithms for multi-byte strings! Even if other people have already done such stupid things, don't do that! Let it be! (Well, there are scenarios where it makes sense... just use the ICU library for those.)
- Use std::wstring for UCS-2 encoded strings (typedef std::wstring UCS2String). This is a compromise, and a concession to the mess that the WIN32 API introduced. UCS-2 is sufficient for most of us (more on that later).
- Use UCS2String instances whenever character-by-character access is required (read, manipulate, and so on). Any character-based processing should be done in a NON-multibyte representation. It is simple, fast, easy.
- Add two utility functions to convert back and forth between UTF-8 and UCS-2:

    UCS2String ConvertToUCS2(const UTF8String &str);
    UTF8String ConvertToUTF8(const UCS2String &str);

The conversions are straightforward; google should help here.

That's it. Use UTF8String wherever memory is precious and for all UTF-8 I/O. Use UCS2String wherever the string must be parsed and/or manipulated. You can convert between those two representations at any time.

Alternatives & Improvements

- Conversions from and to single-byte character encodings (e.g. ISO-8859-1) can be done with plain translation tables, e.g. const wchar_t tt_iso88951[256] = {0,1,2,...}; plus the appropriate code for conversion to and from UCS2.
- If UCS-2 is not sufficient, then switch to UCS-4 (typedef std::basic_string<uint32_t> UCS4String).
- ICU or other Unicode libraries: for advanced stuff.
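For what it's worth, the two utility functions could look roughly like the sketch below. This is one possible implementation, not the author's: it assumes strict UCS-2 (Basic Multilingual Plane only, no surrogates), throws std::invalid_argument on anything else, and relies only on wchar_t being at least 16 bits wide.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

typedef std::string  UTF8String;
typedef std::wstring UCS2String;

// Encodes each UCS-2 code point (U+0000..U+FFFF, no surrogates)
// as a 1, 2 or 3 byte UTF-8 sequence.
UTF8String ConvertToUTF8(const UCS2String& str) {
    UTF8String out;
    for (wchar_t wc : str) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("not a UCS-2 code point");
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Decodes 1, 2 and 3 byte UTF-8 sequences into UCS-2.
// 4-byte sequences (code points above U+FFFF) are rejected.
UCS2String ConvertToUCS2(const UTF8String& str) {
    UCS2String out;
    std::size_t i = 0;
    while (i < str.size()) {
        unsigned char b0 = static_cast<unsigned char>(str[i]);
        unsigned long cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else throw std::invalid_argument("bad lead byte or outside the BMP");
        if (i + len > str.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(str[i + k]);
            if ((b & 0xC0) != 0x80) {
                throw std::invalid_argument("bad continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings and surrogate code points.
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("invalid UTF-8 sequence");
        }
        out += static_cast<wchar_t>(cp);
        i += len;
    }
    return out;
}
```

Throwing on input outside the BMP is a design choice made here to keep the sketch honest about the UCS-2 limitation; a real project might substitute U+FFFD instead.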

User · Answer

I frequently use std::string to hold UTF-8 characters without any problems at all. I heartily recommend doing this when interfacing with APIs that also use UTF-8 as their native string type.

For example, I use UTF-8 when interfacing my code with the Tcl interpreter.

The major caveat is that the length of the std::string is no longer the number of characters in the string.
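To illustrate the caveat, here is a minimal sketch that counts Unicode code points in a UTF-8 std::string by skipping continuation bytes (those of the form 10xxxxxx). The function name utf8_code_points is made up, and the sketch assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string: every byte that is NOT a
// continuation byte (10xxxxxx) starts a new code point.
// Assumes the input is well-formed UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For "olé" encoded as the bytes 6F 6C C3 A9, size() reports 4 while this function reports 3. Even then, a code point is not always what a user perceives as one character (combining marks, for instance), so treat the count as an approximation.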

User · Answer

string  wstring   std  string is a basic string templated on a char  and std  wstring on a wchar t   char vs  wchar t  char is supposed to hold a character  usually an 8-bit character  wchar t is supposed to hold a wide character  and then  things get tricky  On Linux  a wchar t is 4 bytes  while on Windows  it s 2 bytes   What about Unicode  then   The problem is that neither char nor wchar t is directly tied to unicode   On Linux   Let s take a Linux OS  My Ubuntu system is already unicode aware  When I work with a char string  it is natively encoded in UTF-8  i e  Unicode string of chars   The following code    include  lt cstring gt   include  lt iostream gt   int main int argc  char  argv         const char text      ol           std  cout  lt  lt   sizeof char          lt  lt  sizeof char   lt  lt  std  endl      std  cout  lt  lt   text                 lt  lt  text  lt  lt  std  endl      std  cout  lt  lt   sizeof text          lt  lt  sizeof text   lt  lt  std  endl      std  cout  lt  lt   strlen text          lt  lt  strlen text   lt  lt  std  endl       std  cout  lt  lt   text ordinals            for size t i   0  iMax   strlen text   i  lt  iMax    i             std  cout  lt  lt       lt  lt  static cast lt unsigned int gt                                 static cast lt unsigned char gt  text i                                         std  cout  lt  lt  std  endl  lt  lt  std  endl          - - -      const wchar t wtext     L ol          std  cout  lt  lt   sizeof wchar t       lt  lt  sizeof wchar t   lt  lt  std  endl        std  cout  lt  lt   wtext                lt  lt  wtext  lt  lt  std  endl    lt - error    std  cout  lt  lt   wtext             UNABLE TO CONVERT NATIVELY    lt  lt  std  endl      std  wcout  lt  lt  L wtext                lt  lt  wtext  lt  lt  std  endl      std  cout  lt  lt   sizeof wtext         lt  lt  sizeof wtext   lt  lt  std  endl      std  cout  lt  lt   wcslen wtext         lt  lt  wcslen wtext   lt  lt  std  endl  
     std  cout  lt  lt   wtext ordinals           for size t i   0  iMax   wcslen wtext   i  lt  iMax    i             std  cout  lt  lt       lt  lt  static cast lt unsigned int gt                                 static cast lt unsigned short gt  wtext i                                             std  cout  lt  lt  std  endl  lt  lt  std  endl       return 0      outputs the following text   sizeof char       1 text              ol   sizeof text       5 strlen text       4 text ordinals     111 108 195 169  sizeof wchar t    4 wtext             UNABLE TO CONVERT NATIVELY  wtext             ol  sizeof wtext      16 wcslen wtext      3 wtext ordinals    111 108 233   You ll see the  ol    text in char is really constructed by four chars  110  108  195 and 169  not counting the trailing zero    I ll let you study the wchar t code as an exercise   So  when working with a char on Linux  you should usually end up using Unicode without even knowing it  And as std  string works with char  so std  string is already unicode-ready   Note that std  string  like the C string API  will consider the  ol    string to have 4 characters  not three  So you should be cautious when truncating playing with unicode chars because some combination of chars is forbidden in UTF-8   On Windows   On Windows  this is a bit different  Win32 had to support a lot of application working with char and on different charsets codepages produced in all the world  before the advent of Unicode   So their solution was an interesting one  If an application works with char  then the char strings are encoded printed shown on GUI labels using the local charset codepage on the machine  For example   ol    would be  ol    in a French-localized Windows  but would be something different on an cyrillic-localized Windows   ol   if you use Windows-1251   Thus   historical apps  will usually still work the same old way   For Unicode based applications  Windows uses wchar t  which is 2-bytes wide  and is encoded in 
UTF-16  which is Unicode encoded on 2-bytes characters  or at the very least  the mostly compatible UCS-2  which is almost the same thing IIRC    Applications using char are said  multibyte   because each glyph is composed of one or more chars   while applications using wchar t are said  widechar   because each glyph is composed of one or two wchar t  See MultiByteToWideChar and WideCharToMultiByte Win32 conversion API for more info   Thus  if you work on Windows  you badly want to use wchar t  unless you use a framework hiding that  like GTK  or QT      The fact is that behind the scenes  Windows works with wchar t strings  so even historical applications will have their char strings converted in wchar t when using API like SetWindowText    low level API function to set the label on a Win32 GUI    Memory issues   UTF-32 is 4 bytes per characters  so there is no much to add  if only that a UTF-8 text and UTF-16 text will always use less or the same amount of memory than an UTF-32 text  and usually less    If there is a memory issue  then you should know than for most western languages  UTF-8 text will use less memory than the same UTF-16 one   Still  for other languages  chinese  japanese  etc    the memory used will be either the same  or slightly larger for UTF-8 than for UTF-16   All in all  UTF-16 will mostly use 2 and occassionally 4 bytes per characters  unless you re dealing with some kind of esoteric language glyphs  Klingon  Elvish    while UTF-8 will spend from 1 to 4 bytes   See http   en wikipedia org wiki UTF-8 Compared to UTF-16 for more info   Conclusion   When I should use std  wstring over std  string   On Linux  Almost never       On Windows  Almost always       On cross-platform code  Depends on your toolkit            unless you use a toolkit framework saying otherwise Can std  string hold all the ASCII character set including special characters   Notice  A std  string is suitable for holding a  binary  buffer  where a std  wstring is not   On 
Linux  Yes  On Windows  Only special characters available for the current locale of the Windows user   Edit  After a comment from Johann Gerell   a std  string will be enough to handle all char-based strings  each char being a number from 0 to 255   But    ASCII is supposed to go from 0 to 127  Higher chars are NOT ASCII  a char from 0 to 127 will be held correctly a char from 128 to 255 will have a signification depending on your encoding  unicode  non-unicode  etc    but it will be able to hold all Unicode glyphs as long as they are encoded in UTF-8   Is std  wstring supported by almost all popular C   compilers   Mostly  with the exception of GCC based compilers that are ported to Windows  It works on my g   4 3 2  under Linux   and I used Unicode API on Win32 since Visual C   6  What is exactly a wide character   On C C    it s a character type written wchar t which is larger than the simple char character type  It is supposed to be used to put inside characters whose indices  like Unicode glyphs  are larger than 255  or 127  depending
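On the warning about truncating UTF-8 strings: cutting a std::string at an arbitrary byte offset can split a multi-byte sequence and leave invalid UTF-8 behind. A minimal sketch of a boundary-aware cut follows; the name utf8_truncate is made up, and the function assumes its input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Returns at most max_bytes bytes of s without splitting a UTF-8
// sequence: if the first dropped byte is a continuation byte
// (10xxxxxx), the cut moves back to the start of that sequence.
// Assumes s is well-formed UTF-8.
std::string utf8_truncate(const std::string& s, std::size_t max_bytes) {
    if (s.size() <= max_bytes) {
        return s;
    }
    std::size_t cut = max_bytes;
    while (cut > 0 &&
           (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80) {
        --cut;
    }
    return s.substr(0, cut);
}
```

With "olé" stored as 6F 6C C3 A9, asking for 3 bytes yields "ol" rather than "ol" followed by a dangling C3 byte. This only protects code point boundaries; it does not keep combining sequences together.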

User · Answer

There are some very good answers here, but I think there are a couple of things I can add regarding Windows/Visual Studio. This is based on my experience with VS2015. On Linux, basically the answer is to use UTF-8 encoded std::string everywhere. On Windows/VS it gets more complex. Here is why. Windows expects strings stored using chars to be encoded using the locale codepage. This is almost always the ASCII character set followed by 128 other special characters depending on your location. Let me just state that this is not only the case when using the Windows API; there are three other major places where these strings interact with standard C++. These are string literals, output to std::cout using <<, and passing a filename to std::fstream.

I will be up front here that I am a programmer, not a language specialist. I appreciate that UCS2 and UTF-16 are not the same, but for my purposes they are close enough to be interchangeable and I use them as such here. I'm not actually sure which one Windows uses, but I generally don't need to know either. I've stated UCS2 in this answer, so sorry in advance if I upset anyone with my ignorance of this matter, and I'm happy to change it if I have things wrong.

String literals

If you enter string literals that contain only characters that can be represented by your codepage, then VS stores them in your file with a 1-byte-per-character encoding based on your codepage. Note that if you change your codepage or give your source to another developer using a different codepage, then I think (but haven't tested) that the character will end up different. If you run your code on a computer using a different codepage, then I'm not sure if the character will change too.

If you enter any string literals that cannot be represented by your codepage, then VS will ask you to save the file as Unicode. The file will then be encoded as UTF-8. This means that all non-ASCII characters (including those which are on your codepage) will be represented by 2 or more bytes. This means that if you give your source to someone else, the source will look the same. However, before passing the source to the compiler, VS converts the UTF-8 encoded text to codepage-encoded text, and any characters missing from the codepage are replaced with ?.

The only way to guarantee that a Unicode string literal is represented correctly in VS is to precede the string literal with an L, making it a wide string literal. In this case VS will convert the UTF-8 encoded text from the file into UCS2. You then need to pass this string literal into a std::wstring constructor, or you need to convert it to UTF-8 and put it in a std::string. Or, if you want, you can use the Windows API functions to encode it using your codepage and put it in a std::string, but then you may as well not have used a wide string literal.

std::cout

When outputting to the console using << you can only use std::string, not std::wstring, and the text must be encoded using your locale codepage. If you have a std::wstring then you must convert it using one of the Windows API functions, and any characters not on your codepage get replaced by ? (maybe you can change the character, I can't remember).

std::fstream filenames

Windows uses UCS2/UTF-16 for its filenames, so whatever your codepage, you can have files with any Unicode character. But this means that to access or create files with characters not on your codepage you must use std::wstring. There is no other way. This is a Microsoft-specific extension to std::fstream, so it probably won't compile on other systems. If you use std::string then you can only use filenames made up of characters on your codepage.

Your options

If you are just working on Linux then you probably didn't get this far. Just use UTF-8 std::string everywhere.

If you are just working on Windows, just use UCS2 std::wstring everywhere. Some purists may say use UTF-8 and then convert when needed, but why bother with the hassle?

If you are cross-platform then it's a mess, to be frank. If you try to use UTF-8 everywhere on Windows then you need to be really careful with your string literals and output to the console. You can easily corrupt your strings there. If you use std::wstring everywhere on Linux then you may not have access to the wide version of std::fstream, so you have to do the conversion, but there is no risk of corruption. So personally I think this is a better option. Many would disagree, but I'm not alone: it's the path taken by wxWidgets, for example.

Another option could be to typedef unicodestring as std::string on Linux and std::wstring on Windows, and have a macro called UNI() which prefixes L on Windows and nothing on Linux. Then the code

    #include <fstream>
    #include <string>
    #include <iostream>

    #ifdef _WIN32
    #include <Windows.h>   // only available (and only needed) on Windows
    typedef std::wstring unicodestring;
    #define UNI(text) L ## text
    std::string formatForConsole(const unicodestring &str)
    {
        std::string result;
        //Call WideCharToMultiByte to do the conversion
        return result;
    }
    #else
    typedef std::string unicodestring;
    #define UNI(text) text
    std::string formatForConsole(const unicodestring &str)
    {
        return str;
    }
    #endif

    int main()
    {
        unicodestring fileName(UNI("fileName"));
        std::ofstream fout;
        fout.open(fileName);
        std::cout << formatForConsole(fileName) << std::endl;
        return 0;
    }

would be fine on either platform, I think.

Answers

So, to answer your questions:

1) If you are programming for Windows, then all the time. If cross-platform, then maybe all the time, unless you want to deal with possible corruption issues on Windows or write some code with platform-specific #ifdefs to work around the differences. If just using Linux, then never.

2) Yes. In addition, on Linux you can use it for all Unicode too. On Windows you can only use it for all Unicode if you choose to manually encode using UTF-8. But the Windows API and standard C++ classes will expect the std::string to be encoded using the locale codepage. This includes all ASCII plus another 128 characters which change depending on the codepage your computer is set up to use.

3) I believe so, but if not then it is just a simple typedef of a std::basic_string using wchar_t instead of char.

4) A wide character is a character type which is bigger than the 1-byte standard char type. On Windows it is 2 bytes, on Linux it is 4 bytes.

User · Answer

So, every reader here should now have a clear understanding of the facts and the situation. If not, then you must read paercebal's outstandingly comprehensive answer (btw: thanks!).

My pragmatic conclusion is shockingly simple: all that C++ (and STL) "character encoding" stuff is substantially broken and useless. Blame it on Microsoft or not, that will not help anyway.

My solution, after in-depth investigation, much frustration and the consequential experiences, is the following:

- Accept that you have to be responsible on your own for the encoding and conversion stuff (and you will see that much of it is rather trivial).
- Use std::string for any UTF-8 encoded strings (just a typedef std::string UTF8String).
- Accept that such a UTF8String object is just a dumb but cheap container. Never access and/or manipulate characters in it directly (no search, replace, and so on). You could, but you really, really do not want to waste your time writing text manipulation algorithms for multi-byte strings! Even if other people already did such stupid things, don't do that. Let it be! (Well, there are scenarios where it makes sense... just use the ICU library for those.)
- Use std::wstring for UCS-2 encoded strings (typedef std::wstring UCS2String). This is a compromise, and a concession to the mess that the WIN32 API introduced. UCS-2 is sufficient for most of us (more on that later...).
- Use UCS2String instances whenever character-by-character access is required (read, manipulate, and so on). Any character-based processing should be done in a NON-multibyte representation. It is simple, fast, easy.
- Add two utility functions to convert back and forth between UTF-8 and UCS-2:

    UCS2String ConvertToUCS2(const UTF8String &str);
    UTF8String ConvertToUTF8(const UCS2String &str);

The conversions are straightforward; google should help here...

That's it. Use UTF8String wherever memory is precious and for all UTF-8 I/O. Use UCS2String wherever the string must be parsed and/or manipulated. You can convert between those two representations any time.

Alternatives & Improvements

- Conversions from and to single-byte character encodings (e.g. ISO-8859-1) can be realized with the help of plain translation tables, e.g. const wchar_t tt_iso88951[256] = {0,1,2,...}; and appropriate code for conversion to and from UCS2.
- If UCS-2 is not sufficient, then switch to UCS-4 (typedef std::basic_string<uint32_t> UCS2String).

ICU or other Unicode libraries? For advanced stuff.
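For illustration, here is one way the two conversion functions named above could look. This is only a sketch under stated assumptions, not the answer author's implementation: the typedef and function names come from the answer, but the error handling (throwing std::range_error), the decision to reject anything above U+FFFF, and the RoundTripDemo helper are my own choices. It does not reject overlong encodings or surrogate values.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

typedef std::string  UTF8String;   // typedef names come from the answer
typedef std::wstring UCS2String;

// UTF-8 -> UCS-2. Throws std::range_error on malformed or truncated input,
// and on 4-byte sequences (code points above U+FFFF do not fit in UCS-2).
// Assumption: overlong encodings and surrogate values are not rejected here.
UCS2String ConvertToUCS2(const UTF8String &str)
{
    UCS2String out;
    std::size_t i = 0;
    while (i < str.size()) {
        const unsigned char lead = static_cast<unsigned char>(str[i]);
        unsigned long cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else throw std::range_error("invalid lead byte or code point above U+FFFF");

        if (extra > str.size() - i - 1)
            throw std::range_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(str[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::range_error("invalid continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return out;
}

// UCS-2 -> UTF-8. Each 16-bit value becomes 1, 2 or 3 bytes.
UTF8String ConvertToUTF8(const UCS2String &str)
{
    UTF8String out;
    for (std::size_t i = 0; i < str.size(); ++i) {
        const unsigned long cp = static_cast<unsigned long>(str[i]);
        if (cp > 0xFFFF)  // possible where wchar_t is wider than 16 bits
            throw std::range_error("value does not fit in UCS-2");
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Hypothetical usage: "olé" is 4 bytes in UTF-8 but 3 characters in UCS-2.
void RoundTripDemo()
{
    const UTF8String utf8 = "ol\xC3\xA9";
    const UCS2String wide = ConvertToUCS2(utf8);
    assert(utf8.size() == 4);
    assert(wide.size() == 3);
    assert(wide[2] == 0xE9);  // U+00E9
    assert(ConvertToUTF8(wide) == utf8);
}
```

Note that on platforms where wchar_t is 4 bytes, std::wstring is wider than UCS-2 needs; the sketch simply refuses values above 0xFFFF so that it behaves the same everywhere.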

User · Answer

1. When you want to store 'wide' (Unicode) characters.
2. Yes: 255 of them (excluding 0).
3. Yes.
4. Here's an introductory article: http://www.joelonsoftware.com/articles/Unicode.html
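As a side note on the zero byte: a std::string tracks its own length, so it can store a 0 byte if the length is passed explicitly; it is the C-style functions that stop at the first 0. A small sketch of that, where MakeStringWithZero and DemoEmbeddedZero are hypothetical names used only for this illustration:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: builds a 3-byte std::string with a zero byte in the middle.
std::string MakeStringWithZero()
{
    const char raw[] = { 'a', '\0', 'b' };
    return std::string(raw, sizeof raw);  // length is given explicitly
}

void DemoEmbeddedZero()
{
    const std::string s = MakeStringWithZero();
    assert(s.size() == 3);                // std::string keeps all three bytes
    assert(std::strlen(s.c_str()) == 1);  // C functions stop at the first 0
}
```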

User · Answer

1. As mentioned by Greg, wstring is helpful for internationalization; that's when you will be releasing your product in languages other than English.
4. Check this out for wide character: http://en.wikipedia.org/wiki/Wide_character

User · Answer

I recommend avoiding std::wstring on Windows or elsewhere, except when required by the interface, or anywhere near Windows API calls and respective encoding conversions as a syntactic sugar.

My view is summarized in http://utf8everywhere.org, of which I am a co-author.

Unless your application is API-call-centric, e.g. mainly a UI application, the suggestion is to store Unicode strings in std::string, encoded in UTF-8, performing conversion near API calls. The benefits outlined in the article outweigh the apparent annoyance of conversion, especially in complex applications. This is doubly so for multi-platform and library development.

And now, answering your questions:

1. A few weak reasons. It exists for historical reasons, where widechars were believed to be the proper way of supporting Unicode. It is now used to interface APIs that prefer UTF-16 strings. I use them only in the direct vicinity of such API calls.
2. This has nothing to do with std::string. It can hold whatever encoding you put in it. The only question is how you treat its content. My recommendation is UTF-8, so it will be able to hold all Unicode characters correctly. It's a common practice on Linux, but I think Windows programs should do it also.
3. No.
4. Wide character is a confusing name. In the early days of Unicode, there was a belief that a character can be encoded in two bytes, hence the name. Today, it stands for "any part of the character that is two bytes long". UTF-16 is seen as a sequence of such byte pairs (aka wide characters). A character in UTF-16 takes either one or two pairs.
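To make the "one or two pairs" point concrete, here is a rough sketch of how a single Unicode code point maps to UTF-16 code units. EncodeUtf16 is a hypothetical helper written for this illustration (it is not from utf8everywhere.org or any library), and it assumes a C++11 compiler for char16_t and char32_t.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Precondition: cp is a valid scalar value (at most U+10FFFF, not a surrogate).
std::u16string EncodeUtf16(char32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Everything else: a surrogate pair (two 16-bit units).
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

With this sketch, U+00E9 (é) comes out as the single unit 0x00E9, while U+1F600 comes out as the two units 0xD83D 0xDE00. That is why code that indexes a UTF-16 string by unit cannot assume one unit per character.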

