What encoding/code page is cmd.exe using?

Question

When I open cmd.exe in Windows, what encoding is it using? How can I check which encoding it is currently using? Does it depend on my regional setting, or are there any environment variables to check?

What happens when you type a file with a certain encoding? Sometimes I get garbled characters (incorrect encoding used) and sometimes it kind of works. However, I don't trust anything as long as I don't know what's going on. Can anyone explain?

User · Accepted Answer

Yes, it's frustrating: sometimes type and other programs print gibberish, and sometimes they do not.

First of all, Unicode characters will only display if the current console font contains the characters. So use a TrueType font like Lucida Console instead of the default Raster Font.

But if the console font doesn't contain the character you're trying to display, you'll see question marks instead of gibberish. When you get gibberish, there's more going on than just font settings.

When programs use standard C-library I/O functions like printf, the program's output encoding must match the console's output encoding, or you will get gibberish. chcp shows and sets the current codepage. All output using standard C-library I/O functions is treated as if it is in the codepage displayed by chcp.

Matching the program's output encoding with the console's output encoding can be accomplished in two different ways:

- A program can get the console's current codepage using chcp or GetConsoleOutputCP, and configure itself to output in that encoding, or
- You or a program can set the console's current codepage using chcp or SetConsoleOutputCP to match the default output encoding of the program.

However, programs that use Win32 APIs can write UTF-16LE strings directly to the console with WriteConsoleW. This is the only way to get correct output without setting codepages. And even when using that function, if a string is not in the UTF-16LE encoding to begin with, a Win32 program must pass the correct codepage to MultiByteToWideChar. Also, WriteConsoleW will not work if the program's output is redirected; more fiddling is needed in that case.

type works some of the time because it checks the start of each file for a UTF-16LE Byte Order Mark (BOM), i.e. the bytes 0xFF 0xFE. If it finds such a mark, it displays the Unicode characters in the file using WriteConsoleW regardless of the current codepage. But when type-ing any file without a UTF-16LE BOM, or for
using non-ASCII characters with any command that doesn't call WriteConsoleW, you will need to set the console codepage and program output encoding to match each other.

How can we find this out?

Here's a test file containing Unicode characters:

    ASCII     abcde xyz
    German    äöü ÄÖÜ ß
    Polish    ąęźżńł
    Russian   абвгдеж эюя
    CJK       你好

Here's a Java program to print out the test file in a bunch of different Unicode encodings. It could be in any programming language; it only prints ASCII characters or encoded bytes to stdout.

    import java.io.*;

    public class Foo {

        private static final String BOM = "\ufeff";
        private static final String TEST_STRING
            = "ASCII     abcde xyz\n"
            + "German    äöü ÄÖÜ ß\n"
            + "Polish    ąęźżńł\n"
            + "Russian   абвгдеж эюя\n"
            + "CJK       你好\n";

        public static void main(String[] args)
            throws Exception
        {
            String[] encodings = new String[] {
                "UTF-8", "UTF-16LE", "UTF-16BE", "UTF-32LE", "UTF-32BE" };

            for (String encoding: encodings) {
                System.out.println("== " + encoding);

                for (boolean writeBom: new Boolean[] {false, true}) {
                    System.out.println(writeBom ? "= bom" : "= no bom");

                    String output = (writeBom ? BOM : "") + TEST_STRING;
                    byte[] bytes = output.getBytes(encoding);
                    System.out.write(bytes);
                    FileOutputStream out = new FileOutputStream("uc-test-"
                        + encoding + (writeBom ? "-bom.txt" : "-nobom.txt"));
                    out.write(bytes);
                    out.close();
                }
            }
        }
    }

The output in the default codepage? Total garbage!

    Z:\andrew\projects\sx\1259084>chcp
    Active code page: 850

    Z:\andrew\projects\sx\1259084>java Foo
    == UTF-8
    = no bom
    ASCII     abcde xyz
    German
    Polish    -  -
    Russian
    CJK
    = bom
    ASCII     abcde xyz
    German
    Polish    -  -
    Russian
    CJK
    == UTF-16LE
    = no bom
    A S C I I           a b c d e   x y z
    G e r m a n                    -
    P o l i s h             z   D B
    R u s s i a n       0 1 2 3 4 5 6   M N O
    C J K                O Y
    = bom
    A S C I I           a b c d e   x y z
    G e r m a n                    -
    P o l i s h             z   D B
    R u s s i a n       0 1 2 3 4 5 6   M N O
    C J K                O Y
    == UTF-16BE
    = no bom
     A S C I I           a b c d e   x y z
     G e r m a n                    -
     P o l i s h             z   D B
     R u s s i a n       0 1 2 3 4 5 6   M N O
     C J K              O Y
    = bom
     A S C I I           a b c d e   x y z
     G e r m a n                    -
     P o l i s h             z   D B
     R u s s i a n       0 1 2 3 4 5 6   M N O
     C J K              O Y
    == UTF-32LE
    = no bom
    A   S   C   I   I                       a   b   c   d   e       x   y   z
    G   e   r   m   a   n                                      -
    P   o   l   i   s   h                           z       D   B
    R   u   s   s   i   a   n               0   1   2   3   4   5   6       M   N    O
    C   J   K                                O   Y
    = bom
    A   S   C   I   I                       a   b   c   d   e       x   y   z
    G   e   r   m   a   n                                      -
    P   o   l   i   s   h                           z       D   B
    R   u   s   s   i   a   n               0   1   2   3   4   5   6       M   N    O
    C   J   K                                O   Y
    == UTF-32BE
    = no bom
       A   S   C   I   I                       a   b   c   d   e       x   y   z
       G   e   r   m   a   n                                      -
       P   o   l   i   s   h                           z       D   B
       R   u   s   s   i   a   n               0   1   2   3   4   5   6       M   N    O
       C   J   K                              O   Y
    = bom
       A   S   C   I   I                       a   b   c   d   e       x   y   z
       G   e   r   m   a   n                                      -
       P   o   l   i   s   h                           z       D   B
       R   u   s   s   i   a   n               0   1   2   3   4   5   6       M   N    O
       C   J   K                              O   Y

However, what if we type the files that got saved? They contain the exact same bytes that were printed to the console.

    Z:\andrew\projects\sx\1259084>type *.txt

    uc-test-UTF-16BE-bom.txt

     A S C I I           a b c d e   x y z
     G e r m a n                    -
     P o l i s h             z   D B
     R u s s i a n       0 1 2 3 4 5 6   M N O
     C J K              O Y

    uc-test-UTF-16BE-nobom.txt

     A S C I I           a b c d e   x y z
     G e r m a n                    -
     P o l i s h             z   D B
     R u s s i a n       0 1 2 3 4 5 6   M N O
     C J K              O Y

    uc-test-UTF-16LE-bom.txt

    ASCII     abcde xyz
    German    äöü ÄÖÜ ß
    Polish    ąęźżńł
    Russian   абвгдеж эюя
    CJK       你好

    uc-test-UTF-16LE-nobom.txt

    A S C I I           a b c d e   x y z
    G e r m a n                    -
    P o l i s h             z   D B
    R u s s i a n       0 1 2 3 4 5 6   M N O
    C J K                O Y

    uc-test-UTF-32BE-bom.txt

       A   S   C   I   I                       a   b   c   d   e       x   y   z
       G   e   r   m   a   n                                      -
       P   o   l   i   s   h                           z       D   B
       R   u   s   s   i   a   n               0   1   2   3   4   5   6       M   N    O
       C   J   K                              O   Y

    uc-test-UTF-32BE-nobom.txt

       A   S   C   I   I                       a   b   c   d   e       x   y   z
       G   e   r   m   a   n                                      -
       P   o   l   i   s   h                           z       D   B
       R   u   s   s   i   a   n               0   1   2   3   4   5   6       M   N    O
       C   J   K                              O   Y

    uc-test-UTF-32LE-bom.txt

    A S C I I           a b c d e   x y z
    G e r m a n         ä ö ü   Ä Ö Ü   ß
    P o l i s h         ą ę ź ż ń ł
    R u s s i a n       а б в г д е ж   э ю я
    C J K               你 好

    uc-test-UTF-32LE-nobom.txt

    A   S   C   I   I                       a   b   c   d   e       x   y   z
    G   e   r   m   a   n                                      -
    P   o   l   i   s   h                           z       D   B
    R   u   s   s   i   a   n               0   1   2   3   4   5   6       M   N    O
    C   J   K                                O   Y

    uc-test-UTF-8-bom.txt

    ASCII     abcde xyz
    German
    Polish    -  -
    Russian
    CJK

    uc-test-UTF-8-nobom.txt

    ASCII     abcde xyz
    German
    Polish    -  -
    Russian
    CJK

The only thing that works is the UTF-16LE file, with a BOM, printed to the console via type.

If we use anything other than type to print the file, we get garbage:

    Z:\andrew\projects\sx\1259084>copy uc-test-UTF-16LE-bom.txt CON
    A S C I I           a b c d e   x y z
    G e r m a n                    -
    P o l i s h             z   D B
    R u s s i a n       0 1 2 3 4 5 6   M N O
    C J K                O Y
            1 file(s) copied.

From the fact that copy CON does not display Unicode correctly, we can conclude that the type command has logic to detect a UTF-16LE BOM at the start of the file, and use special Windows APIs to print it.

We can see this by opening cmd.exe in a debugger when it goes to type out a file.

After type opens a file, it checks for a BOM of 0xFEFF (i.e., the bytes
0xFF 0xFE in little-endian), and if there is such a BOM, type sets an internal fOutputUnicode flag. This flag is checked later to decide whether to call WriteConsoleW.

But that's the only way to get type to output Unicode, and only for files that have BOMs and are in UTF-16LE. For all other files, and for programs that don't have special code to handle console output, your files will be interpreted according to the current codepage, and will likely show up as gibberish.

You can emulate how type outputs Unicode to the console in your own programs like so:

    #include <stdio.h>
    #include <string.h>   /* strlen */
    #define UNICODE       /* makes WriteConsole resolve to WriteConsoleW */
    #include <windows.h>

    /* The source file must be saved as UTF-8 for this literal to hold UTF-8 bytes. */
    static LPCSTR lpcsTest =
        "ASCII     abcde xyz\n"
        "German    äöü ÄÖÜ ß\n"
        "Polish    ąęźżńł\n"
        "Russian   абвгдеж эюя\n"
        "CJK       你好\n";

    int main() {
        int n;
        DWORD written;
        wchar_t buf[1024];

        HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);

        /* Convert UTF-8 to UTF-16LE; the last argument is a count of
           wide characters, not bytes. */
        n = MultiByteToWideChar(CP_UTF8, 0,
                lpcsTest, (int)strlen(lpcsTest),
                buf, sizeof(buf) / sizeof(buf[0]));

        WriteConsole(hConsole, buf, (DWORD)n, &written, NULL);

        return 0;
    }

This program works for printing Unicode on the Windows console using the default codepage.

For the sample Java program, we can get a little bit of correct output by setting the codepage manually, though the output gets messed up in weird ways:

    Z:\andrew\projects\sx\1259084>chcp 65001
    Active code page: 65001

    Z:\andrew\projects\sx\1259084>java Foo
    == UTF-8
    = no bom
    ASCII     abcde xyz
    German    äöü ÄÖÜ ß
    Polish    ąęźżńł
    Russian   абвгдеж эюя
    CJK       你好
    CJK       你好
    = bom
    ASCII     abcde xyz
    German    äöü ÄÖÜ ß
    Polish    ąęźżńł
    Russian   абвгдеж эюя
    CJK       你好
    CJK       你好
    == UTF-16LE
    = no bom
    A S C I I           a b c d e   x y z

However, a C program that sets a Unicode UTF-8 codepage:

    #include <stdio.h>
    #include <windows.h>

    int main() {
        size_t n;
        UINT oldCodePage;
        char buf[1024];

        /* Remember the current codepage so it can be restored afterwards. */
        oldCodePage = GetConsoleOutputCP();
        if (!SetConsoleOutputCP(65001)) {
            printf("error\n");
        }

        if (freopen("uc-test-UTF-8-nobom.txt", "rb", stdin) == NULL) {
            printf("error\n");
            return 1;
        }
        n = fread(buf, sizeof(buf[0]), sizeof(buf), stdin);
        fwrite(buf, sizeof(buf[0]), n, stdout);
        fflush(stdout);   /* flush before the codepage is switched back */

        SetConsoleOutputCP(oldCodePage);

        return 0;
    }

does have correct output:

    Z:\andrew\projects\sx\1259084>.\test
    ASCII     abcde xyz
    German    äöü ÄÖÜ ß
    Polish    ąęźżńł
    Russian   абвгдеж эюя
    CJK       你好

The moral of the story?

- type can print UTF-16LE files with a BOM regardless of your current codepage.
- Win32 programs can be programmed to output Unicode to the console, using WriteConsoleW.
- Other programs which set the codepage and adjust their output encoding accordingly can print Unicode on the console regardless of what the codepage was when the program started.
- For everything else you will have to mess around with chcp, and will probably still get weird output.
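
The BOM check described above can be sketched in Java. This is only an illustration of the idea, not the code inside cmd.exe: the class name BomSniffer and both method names are made up for this example, and IBM850 stands in for whatever codepage chcp reports on your machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    /** Returns true if the data starts with the UTF-16LE byte order mark FF FE. */
    public static boolean hasUtf16LeBom(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xFE;
    }

    /**
     * Decodes as UTF-16LE when the BOM is present, otherwise falls back to
     * the given codepage (an assumption standing in for the console codepage).
     */
    public static String decode(byte[] data, Charset fallbackCodePage) {
        if (hasUtf16LeBom(data)) {
            // Skip the two BOM bytes and decode the rest as UTF-16LE.
            return new String(data, 2, data.length - 2, StandardCharsets.UTF_16LE);
        }
        return new String(data, fallbackCodePage);
    }

    public static void main(String[] args) {
        Charset cp850 = Charset.forName("IBM850");
        String text = "German \u00E4";
        byte[] withBom = ("\uFEFF" + text).getBytes(StandardCharsets.UTF_16LE);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Booleans are printed so the result does not depend on the console encoding.
        System.out.println(decode(withBom, cp850).equals(text)); // true
        System.out.println(decode(utf8, cp850).equals(text));    // false
    }
}
```

Note that a UTF-32LE BOM (FF FE 00 00) also begins with FF FE, so a two-byte check like this one treats such a file as UTF-16LE, which is consistent with the spaced-out but readable output for uc-test-UTF-32LE-bom.txt above.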

User · Answer

To answer your second query re. how encoding works: Joel Spolsky wrote a great introductory article on this. Strongly recommended.

User · Answer

Type "chcp" to see your current code page (as Dewfy already said).

Use "nlsinfo" to see all installed code pages and find out what your code page number means.

You need to have the Windows Server 2003 Resource Kit installed (works on Windows XP) to use nlsinfo.
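
If a program needs the number that chcp prints, one possible approach is to take the last run of digits in its output, since the surrounding wording is localized. The sketch below assumes that output shape. ChcpParser, parseCodePage and javaCharsetName are hypothetical names, and the mapping to Java charset names is a rough guess that only covers UTF-8, the windows-125x pages and IBM-style OEM pages.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChcpParser {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Returns the last number found in the chcp output, or -1 if there is none. */
    public static int parseCodePage(String chcpOutput) {
        Matcher m = DIGITS.matcher(chcpOutput);
        int codePage = -1;
        while (m.find()) {
            codePage = Integer.parseInt(m.group());
        }
        return codePage;
    }

    /** Assumed mapping from a Windows codepage number to a Java charset name. */
    public static String javaCharsetName(int codePage) {
        if (codePage == 65001) {
            return "UTF-8";
        }
        if (codePage >= 1250 && codePage <= 1258) {
            return "windows-" + codePage;
        }
        return "IBM" + codePage; // assumption: OEM pages such as 437, 850, 866
    }

    public static void main(String[] args) {
        int cp = parseCodePage("Active code page: 850");
        System.out.println(cp + " -> " + javaCharsetName(cp)); // 850 -> IBM850
    }
}
```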

User · Answer

In Java I used encoding "IBM850" to write the file. That solved the problem.
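
A minimal sketch of what writing with an explicit encoding might look like, assuming the console codepage is 850. The class Cp850Writer and its encode helper are invented for this illustration; the same OutputStreamWriter could wrap a FileOutputStream instead of the in-memory stream used here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class Cp850Writer {
    /** Writes text through a Writer bound to the given charset and returns the bytes produced. */
    public static byte[] encode(String text, Charset consoleCharset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, consoleCharset)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        Charset cp850 = Charset.forName("IBM850");
        // U+00FC (u with diaeresis) is the single byte 0x81 in codepage 850.
        for (byte b : encode("\u00FC", cp850)) {
            System.out.printf("%02X%n", b & 0xFF); // 81
        }
        // U+0431 (a Cyrillic letter) has no slot in codepage 850 and becomes '?'.
        for (byte b : encode("\u0431", cp850)) {
            System.out.printf("%02X%n", b & 0xFF); // 3F
        }
    }
}
```

This only helps for characters the target codepage actually contains; anything else is replaced, as the second loop shows.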

User · Answer

I've been frustrated for long by Windows code page issues, and the C programs portability and localisation issues they cause. The previous posts have detailed the issues at length, so I'm not going to add anything in this respect.

To make a long story short, eventually I ended up writing my own UTF-8 compatibility library layer over the Visual C++ standard C library. Basically this library ensures that a standard C program works right, in any code page, using UTF-8 internally.

This library, called MsvcLibX, is available as open source at https://github.com/JFLarvoire/SysToolsLib. Main features:

- C sources encoded in UTF-8, using normal char[] C strings, and standard C library APIs.
- In any code page, everything is processed internally as UTF-8 in your code, including the main() routine argv[], with standard input and output automatically converted to the right code page.
- All stdio.h file functions support UTF-8 pathnames > 260 characters, up to 64 KBytes actually.
- The same sources can compile and link successfully in Windows using Visual C++ and MsvcLibX and the Visual C++ C library, and in Linux using gcc and the Linux standard C library, with no need for #ifdef ... #endif blocks.
- Adds include files common in Linux, but missing in Visual C++. Ex: unistd.h
- Adds missing functions, like those for directory I/O, symbolic link management, etc., all with UTF-8 support of course :-).

More details in the MsvcLibX README on GitHub, including how to build the library and use it in your own programs.

The release section in the above GitHub repository provides several programs using this MsvcLibX library, that will show its capabilities. Ex: Try my which.exe tool with directories with non-ASCII names in the PATH, searching for programs with non-ASCII names, and changing code pages.

Another useful tool there is the conv.exe program. This program can easily convert a data stream from any code page to any other. Its default is input in the Windows code page, and output in the current console code page. This allows you to correctly view data generated by Windows GUI apps (ex: Notepad) in a command console, with a simple command like: type WINFILE.txt | conv

This MsvcLibX library is by no means complete, and contributions for improving it are welcome!
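
To illustrate the kind of conversion a tool like conv.exe performs, here is a small Java sketch. It is not MsvcLibX code and does not use its API: CodePageConv and convert are hypothetical names, and windows-1252 and IBM850 are assumed to be the Windows and console code pages.

```java
import java.nio.charset.Charset;

public class CodePageConv {
    /** Decodes the input bytes from one code page and re-encodes them in another. */
    public static byte[] convert(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        Charset windows = Charset.forName("windows-1252"); // assumed Windows code page
        Charset console = Charset.forName("IBM850");       // assumed console code page
        byte[] ansi = {(byte) 0xE4};                       // a with diaeresis in windows-1252
        byte[] oem = convert(ansi, windows, console);
        System.out.printf("%02X%n", oem[0] & 0xFF);        // 84
    }
}
```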

User · Answer

The CHCP command shows the current codepage. It is typically a three-digit OEM codepage (8xx), which differs from the Windows 12xx codepages. So when typing an English-only text you wouldn't see any difference, but text in an extended codepage (like Cyrillic) will be printed wrongly.
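
The effect can be reproduced without a console. The sketch below assumes text written in windows-1251 and displayed as IBM866, one possible 12xx/8xx pair for Cyrillic. OemAnsiMismatch and showAs are names invented for this example.

```java
import java.nio.charset.Charset;

public class OemAnsiMismatch {
    /** Encodes text in one code page and reads the same bytes back in another. */
    public static String showAs(String text, Charset writtenIn, Charset displayedIn) {
        return new String(text.getBytes(writtenIn), displayedIn);
    }

    public static void main(String[] args) {
        Charset ansi = Charset.forName("windows-1251"); // assumed Windows code page
        Charset oem = Charset.forName("IBM866");        // assumed console code page
        // ASCII occupies the same byte values in both, so English text survives.
        System.out.println(showAs("abc", ansi, oem).equals("abc"));       // true
        // A Cyrillic letter lands on a different character in the other code page.
        System.out.println(showAs("\u0431", ansi, oem).equals("\u0431")); // false
    }
}
```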
