[unicode] How to use unicode characters in Windows command line?

We have a project in Team Foundation Server (TFS) that has a non-English character (š) in it. When trying to script a few build-related things we've stumbled upon a problem - we can't pass the š letter to the command-line tools. The command prompt or what not else messes it up, and the tf.exe utility can't find the specified project.

I've tried different formats for the .bat file (ANSI, UTF-8 with and without BOM) as well as scripting it in JavaScript (which is Unicode inherently) - but no luck. How do I execute a program and pass it a Unicode command line?

This question is related to unicode command-line input windows-console

The answer is


For a similar problem, (my problem was to show UTF-8 characters from MySQL on a command prompt),

I solved it like this:

  1. I changed the font of command prompt to Lucida Console. (This step must be irrelevant for your situation. It has to do only with what you see on the screen and not with what is really the character).

  2. I changed the codepage to Windows-1253. You do this on the command prompt by "chcp 1253". It worked for my case where I wanted to see UTF-8.


Actually, the trick is that the command prompt actually understands these non-english characters, just can't display them correctly.

When I enter a path in the command prompt that contains some non-english chracters it is displayed as "?? ?????? ?????". When you submit your command (cd "??? ?????? ?????" in my case), everything is working as expected.


Check the language for non-Unicode programs. If you have problems with Russian in the Windows console, then you should set Russian here:

Changing language for non-Unicode programs


Actually, the trick is that the command prompt actually understands these non-english characters, just can't display them correctly.

When I enter a path in the command prompt that contains some non-english chracters it is displayed as "?? ?????? ?????". When you submit your command (cd "??? ?????? ?????" in my case), everything is working as expected.


I had same problem (I'm from the Czech Republic). I have an English installation of Windows, and I have to work with files on a shared drive. Paths to the files include Czech-specific characters.

The solution that works for me is:

In the batch file, change the charset page

My batch file:

chcp 1250
copy "O:\VEREJNÉ\ŽŽŽŽŽŽ\Ž.xls" c:\temp

The batch file has to be saved in CP 1250.

Note that the console will not show characters correctly, but it will understand them...


On a Windows 10 x64 machine, I made the command prompt display non-English characters by:

Open an elevated command prompt (run CMD.EXE as administrator). Query your registry for available TrueType fonts to the console by:

    REG query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"

You'll see an output like:

    0    REG_SZ    Lucida Console
    00    REG_SZ    Consolas
    936    REG_SZ    *???
    932    REG_SZ    *MS ????

Now we need to add a TrueType font that supports the characters you need like Courier New. We do this by adding zeros to the string name, so in this case the next one would be "000":

    REG ADD "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont" /v 000 /t REG_SZ /d "Courier New"

Now we implement UTF-8 support:

    REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 65001 /f

Set default font to "Courier New":

    REG ADD HKCU\Console /v FaceName /t REG_SZ /d "Courier New" /f

Set font size to 20:

    REG ADD HKCU\Console /v FontSize /t REG_DWORD /d 20 /f

Enable quick edit if you like:

    REG ADD HKCU\Console /v QuickEdit /t REG_DWORD /d 1 /f

Try:

chcp 65001

which will change the code page to UTF-8. Also, you need to use Lucida console fonts.


As I haven't seen any full answers for Python 2.7, I'll outline the two important steps and an optional step that is quite useful.

  1. You need a font with Unicode support. Windows comes with Lucida Console which may be selected by right-clicking the title bar of command prompt and clicking the Defaults option. This also gives access to colours. Note that you can also change settings for command windows invoked in certain ways (e.g, open here, Visual Studio) by choosing Properties instead.
  2. You need to set the code page to cp65001, which appears to be Microsoft's attempt to offer UTF-7 and UTF-8 support to command prompt. Do this by running chcp 65001 in command prompt. Once set, it remains this way until the window is closed. You'll need to redo this every time you launch cmd.exe.

For a more permanent solution, refer to this answer on Super User. In short, create a REG_SZ (String) entry using regedit at HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor and name it AutoRun. Change the value of it to chcp 65001. If you don't want to see the output message from the command, use @chcp 65001>nul instead.

Some programs have trouble interacting with this encoding, MinGW being a notable one that fails while compiling with a nonsensical error message. Nonetheless, this works very well and doesn't cause bugs with the majority of programs.


I got around a similar issue deleting Unicode-named files by referring to them in the batch file by their short (8 dot 3) names.

The short names can be viewed by doing dir /x. Obviously, this only works with Unicode file names that are already known.


A quick decision for .bat files if you computer displays your path/file name correct when you typing it in DOS-window:

  1. copy con temp.txt [press Enter]
  2. Type the path/file name [press Enter]
  3. Press Ctrl-Z [press Enter]

This way you create a .txt file - temp.txt. Open it in Notepad, copy the text (don't worry it will look unreadable) and paste it in your .bat file. Executing the .bat created this way in DOS-window worked for m? (Cyrillic, Bulgarian).


On a Windows 10 x64 machine, I made the command prompt display non-English characters by:

Open an elevated command prompt (run CMD.EXE as administrator). Query your registry for available TrueType fonts to the console by:

    REG query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"

You'll see an output like:

    0    REG_SZ    Lucida Console
    00    REG_SZ    Consolas
    936    REG_SZ    *???
    932    REG_SZ    *MS ????

Now we need to add a TrueType font that supports the characters you need like Courier New. We do this by adding zeros to the string name, so in this case the next one would be "000":

    REG ADD "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont" /v 000 /t REG_SZ /d "Courier New"

Now we implement UTF-8 support:

    REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 65001 /f

Set default font to "Courier New":

    REG ADD HKCU\Console /v FaceName /t REG_SZ /d "Courier New" /f

Set font size to 20:

    REG ADD HKCU\Console /v FontSize /t REG_DWORD /d 20 /f

Enable quick edit if you like:

    REG ADD HKCU\Console /v QuickEdit /t REG_DWORD /d 1 /f

I found this method as useful in new versions of Windows 10:

Turn on this feature: "Beta: Use Unicode UTF-8 for worldwide language support"

Control panel -> Regional settings -> Administrative tab-> Change system locale...

Region Settings


Starting June 2019, with Windows 10, you won't have to change the codepage.

See "Introducing Windows Terminal" (from Kayla Cinnamon) and the Microsoft/Terminal.
Through the use of the Consolas font, partial Unicode support will be provided.

As documented in Microsoft/Terminal issue 387:

There are 87,887 ideographs currently in Unicode. You need all of them too?
We need a boundary, and characters beyond that boundary should be handled by font fallback / font linking / whatever.

What Consolas should cover:

  • Characters that used as symbols that used by modern OSS programs in CLI.
  • These characters should follow Consolas' design and metrics, and properly aligned with existing Consolas characters.

What Consolas should NOT cover:

  • Characters and punctuation of scripts that beyond Latin, Greek and Cyrillic, especially characters need complex shaping (like Arabic).
  • These characters should be handled with font fallback.

This problem is quite annoying. I usually have Chinese character in my filename and file content. Please note that I am using Windows 10, here is my solution:

To display the file name, such as dir or ls if you installed Ubuntu bash on Windows 10

  1. Set the region to support non-utf 8 character.

  2. After that, console's font will be changed to the font of that locale, and it also changes the encoding of the console.

After you have done previous steps, in order to display the file content of a UTF-8 file using command line tool

  1. Change the page to utf-8 by chcp 65001
  2. Change to the font that supports utf-8, such as Lucida Console
  3. Use type command to peek the file content, or cat if you installed Ubuntu bash on Windows 10
  4. Please note that, after setting the encoding of the console to utf-8, I can't type Chinese character in the cmd using Chinese input method.

The laziest solution: Just use a console emulator such as http://cmder.net/


A better cleaner thing to do: Just install the available, free, Microsoft Japanese language pack. (Other oriental language packs will also work, but I have tested the Japanese one.)

This gives you the fonts with the larger sets of glyphs, makes them the default behavior, changes the various Windows tools like cmd, WordPad, etc.


Starting June 2019, with Windows 10, you won't have to change the codepage.

See "Introducing Windows Terminal" (from Kayla Cinnamon) and the Microsoft/Terminal.
Through the use of the Consolas font, partial Unicode support will be provided.

As documented in Microsoft/Terminal issue 387:

There are 87,887 ideographs currently in Unicode. You need all of them too?
We need a boundary, and characters beyond that boundary should be handled by font fallback / font linking / whatever.

What Consolas should cover:

  • Characters that used as symbols that used by modern OSS programs in CLI.
  • These characters should follow Consolas' design and metrics, and properly aligned with existing Consolas characters.

What Consolas should NOT cover:

  • Characters and punctuation of scripts that beyond Latin, Greek and Cyrillic, especially characters need complex shaping (like Arabic).
  • These characters should be handled with font fallback.

A quick decision for .bat files if you computer displays your path/file name correct when you typing it in DOS-window:

  1. copy con temp.txt [press Enter]
  2. Type the path/file name [press Enter]
  3. Press Ctrl-Z [press Enter]

This way you create a .txt file - temp.txt. Open it in Notepad, copy the text (don't worry it will look unreadable) and paste it in your .bat file. Executing the .bat created this way in DOS-window worked for m? (Cyrillic, Bulgarian).


I had same problem (I'm from the Czech Republic). I have an English installation of Windows, and I have to work with files on a shared drive. Paths to the files include Czech-specific characters.

The solution that works for me is:

In the batch file, change the charset page

My batch file:

chcp 1250
copy "O:\VEREJNÉ\ŽŽŽŽŽŽ\Ž.xls" c:\temp

The batch file has to be saved in CP 1250.

Note that the console will not show characters correctly, but it will understand them...


I see several answers here, but they don't seem to address the question - the user wants to get Unicode input from the command line.

Windows uses UTF-16 for encoding in two byte strings, so you need to get these from the OS in your program. There are two ways to do this -

1) Microsoft has an extension that allows main to take a wide character array: int wmain(int argc, wchar_t *argv[]); https://msdn.microsoft.com/en-us/library/6wd819wh.aspx

2) Call the windows api to get the unicode version of the command line wchar_t win_argv = (wchar_t)CommandLineToArgvW(GetCommandLineW(), &nargs); https://docs.microsoft.com/en-us/windows/desktop/api/shellapi/nf-shellapi-commandlinetoargvw

Read this: http://utf8everywhere.org for detailed info, particularly if you are supporting other operating systems.


Changing code page to 1252 is working for me. The problem for me is the symbol double doller § is converting to another symbol by DOS on Windows Server 2008.

I have used CHCP 1252 and a cap before it in my BCP statement ^§.


I got around a similar issue deleting Unicode-named files by referring to them in the batch file by their short (8 dot 3) names.

The short names can be viewed by doing dir /x. Obviously, this only works with Unicode file names that are already known.


Try:

chcp 65001

which will change the code page to UTF-8. Also, you need to use Lucida console fonts.


It's is quite difficult to change the default Codepage of Windows console. When you search the web you find different proposals, however some of them may break your Windows entirely, i.e. your PC does not boot anymore.

The most secure solution is this one: Go to your Registry key HKEY_CURRENT_USER\Software\Microsoft\Command Processor and add String value Autorun = chcp 65001.

Or you can use this small Batch-Script for the most common code pages.

@ECHO off

SET ROOT_KEY="HKEY_CURRENT_USER"


FOR /f "skip=2 tokens=3" %%i in ('reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v OEMCP') do set OEMCP=%%i

ECHO System default values:

ECHO.
ECHO ...............................................
ECHO Select Codepage 
ECHO ...............................................
ECHO.
ECHO 1 - CP1252
ECHO 2 - UTF-8
ECHO 3 - CP850
ECHO 4 - ISO-8859-1
ECHO 5 - ISO-8859-15
ECHO 6 - US-ASCII
ECHO.
ECHO 9 - Reset to System Default (CP%OEMCP%)
ECHO 0 - EXIT
ECHO.


SET /P  CP="Select a Codepage: "

if %CP%==1 (
    echo Set default Codepage to CP1252
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 1252>nul" /f
) else if %CP%==2 (
    echo Set default Codepage to UTF-8
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 65001>nul" /f
) else if %CP%==3 (
    echo Set default Codepage to CP850
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 850>nul" /f
) else if %CP%==4 (
    echo Set default Codepage to ISO-8859-1
    add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 28591>nul" /f
) else if %CP%==5 (
    echo Set default Codepage to ISO-8859-15
    add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 28605>nul" /f
) else if %CP%==6 (
    echo Set default Codepage to ASCII
    add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 20127>nul" /f
) else if %CP%==9 (
    echo Reset Codepage to System Default
    reg delete "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /f
) else if %CP%==0 (
    echo Bye
) else (
    echo Invalid choice
    pause
)

Using @chcp 65001>nul instead of chcp 65001 suppresses the output "Active code page: 65001" you would get every time you start a new command line windows.

A full list of all available number you can get from Code Page Identifiers

Note, the settings will apply only for the current user. If you like to set it for all users, replace line SET ROOT_KEY="HKEY_CURRENT_USER" by SET ROOT_KEY="HKEY_LOCAL_MACHINE"


Try:

chcp 65001

which will change the code page to UTF-8. Also, you need to use Lucida console fonts.


This problem is quite annoying. I usually have Chinese character in my filename and file content. Please note that I am using Windows 10, here is my solution:

To display the file name, such as dir or ls if you installed Ubuntu bash on Windows 10

  1. Set the region to support non-utf 8 character.

  2. After that, console's font will be changed to the font of that locale, and it also changes the encoding of the console.

After you have done previous steps, in order to display the file content of a UTF-8 file using command line tool

  1. Change the page to utf-8 by chcp 65001
  2. Change to the font that supports utf-8, such as Lucida Console
  3. Use type command to peek the file content, or cat if you installed Ubuntu bash on Windows 10
  4. Please note that, after setting the encoding of the console to utf-8, I can't type Chinese character in the cmd using Chinese input method.

The laziest solution: Just use a console emulator such as http://cmder.net/


Changing code page to 1252 is working for me. The problem for me is the symbol double doller § is converting to another symbol by DOS on Windows Server 2008.

I have used CHCP 1252 and a cap before it in my BCP statement ^§.


A better cleaner thing to do: Just install the available, free, Microsoft Japanese language pack. (Other oriental language packs will also work, but I have tested the Japanese one.)

This gives you the fonts with the larger sets of glyphs, makes them the default behavior, changes the various Windows tools like cmd, WordPad, etc.


Check the language for non-Unicode programs. If you have problems with Russian in the Windows console, then you should set Russian here:

Changing language for non-Unicode programs


Try:

chcp 65001

which will change the code page to UTF-8. Also, you need to use Lucida console fonts.


I found this method as useful in new versions of Windows 10:

Turn on this feature: "Beta: Use Unicode UTF-8 for worldwide language support"

Control panel -> Regional settings -> Administrative tab-> Change system locale...

Region Settings


One really simple option is to install a Windows bash shell such as MinGW and use that:

Enter image description here

There is a little bit of a learning curve as you will need to use Unix command line functionality, but you will love the power of it and you can set the console character set to UTF-8.

Enter image description here

Of course you also get all the usual *nix goodies like grep, find, less, etc.


I see several answers here, but they don't seem to address the question - the user wants to get Unicode input from the command line.

Windows uses UTF-16 for encoding in two byte strings, so you need to get these from the OS in your program. There are two ways to do this -

1) Microsoft has an extension that allows main to take a wide character array: int wmain(int argc, wchar_t *argv[]); https://msdn.microsoft.com/en-us/library/6wd819wh.aspx

2) Call the windows api to get the unicode version of the command line wchar_t win_argv = (wchar_t)CommandLineToArgvW(GetCommandLineW(), &nargs); https://docs.microsoft.com/en-us/windows/desktop/api/shellapi/nf-shellapi-commandlinetoargvw

Read this: http://utf8everywhere.org for detailed info, particularly if you are supporting other operating systems.


As I haven't seen any full answers for Python 2.7, I'll outline the two important steps and an optional step that is quite useful.

  1. You need a font with Unicode support. Windows comes with Lucida Console which may be selected by right-clicking the title bar of command prompt and clicking the Defaults option. This also gives access to colours. Note that you can also change settings for command windows invoked in certain ways (e.g, open here, Visual Studio) by choosing Properties instead.
  2. You need to set the code page to cp65001, which appears to be Microsoft's attempt to offer UTF-7 and UTF-8 support to command prompt. Do this by running chcp 65001 in command prompt. Once set, it remains this way until the window is closed. You'll need to redo this every time you launch cmd.exe.

For a more permanent solution, refer to this answer on Super User. In short, create a REG_SZ (String) entry using regedit at HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor and name it AutoRun. Change the value of it to chcp 65001. If you don't want to see the output message from the command, use @chcp 65001>nul instead.

Some programs have trouble interacting with this encoding, MinGW being a notable one that fails while compiling with a nonsensical error message. Nonetheless, this works very well and doesn't cause bugs with the majority of programs.


For a similar problem, (my problem was to show UTF-8 characters from MySQL on a command prompt),

I solved it like this:

  1. I changed the font of command prompt to Lucida Console. (This step must be irrelevant for your situation. It has to do only with what you see on the screen and not with what is really the character).

  2. I changed the codepage to Windows-1253. You do this on the command prompt by "chcp 1253". It worked for my case where I wanted to see UTF-8.


One really simple option is to install a Windows bash shell such as MinGW and use that:

Enter image description here

There is a little bit of a learning curve as you will need to use Unix command line functionality, but you will love the power of it and you can set the console character set to UTF-8.

Enter image description here

Of course you also get all the usual *nix goodies like grep, find, less, etc.


It's is quite difficult to change the default Codepage of Windows console. When you search the web you find different proposals, however some of them may break your Windows entirely, i.e. your PC does not boot anymore.

The most secure solution is this one: Go to your Registry key HKEY_CURRENT_USER\Software\Microsoft\Command Processor and add String value Autorun = chcp 65001.

Or you can use this small Batch-Script for the most common code pages.

@ECHO off

SET ROOT_KEY="HKEY_CURRENT_USER"


FOR /f "skip=2 tokens=3" %%i in ('reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v OEMCP') do set OEMCP=%%i

ECHO System default values:

ECHO.
ECHO ...............................................
ECHO Select Codepage 
ECHO ...............................................
ECHO.
ECHO 1 - CP1252
ECHO 2 - UTF-8
ECHO 3 - CP850
ECHO 4 - ISO-8859-1
ECHO 5 - ISO-8859-15
ECHO 6 - US-ASCII
ECHO.
ECHO 9 - Reset to System Default (CP%OEMCP%)
ECHO 0 - EXIT
ECHO.


SET /P  CP="Select a Codepage: "

if %CP%==1 (
    echo Set default Codepage to CP1252
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 1252>nul" /f
) else if %CP%==2 (
    echo Set default Codepage to UTF-8
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 65001>nul" /f
) else if %CP%==3 (
    echo Set default Codepage to CP850
    reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 850>nul" /f
) else if %CP%==4 (
    echo Set default Codepage to ISO-8859-1
    add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 28591>nul" /f
) else if %CP%==5 (
    echo Set default Codepage to ISO-8859-15
    add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 28605>nul" /f
) else if %CP%==6 (
    echo Set default Codepage to ASCII
    add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 20127>nul" /f
) else if %CP%==9 (
    echo Reset Codepage to System Default
    reg delete "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /f
) else if %CP%==0 (
    echo Bye
) else (
    echo Invalid choice
    pause
)

Using @chcp 65001>nul instead of chcp 65001 suppresses the output "Active code page: 65001" you would get every time you start a new command line windows.

A full list of all available number you can get from Code Page Identifiers

Note, the settings will apply only for the current user. If you like to set it for all users, replace line SET ROOT_KEY="HKEY_CURRENT_USER" by SET ROOT_KEY="HKEY_LOCAL_MACHINE"


Examples related to unicode

How to resolve TypeError: can only concatenate str (not "int") to str (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape UnicodeEncodeError: 'ascii' codec can't encode character at special name Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) HTML for the Pause symbol in audio and video control Javascript: Unicode string to hex Concrete Javascript Regex for Accented Characters (Diacritics) Replace non-ASCII characters with a single space UTF-8 in Windows 7 CMD NameError: global name 'unicode' is not defined - in Python 3

Examples related to command-line

Git is not working after macOS Update (xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools) Flutter command not found Angular - ng: command not found how to run python files in windows command prompt? How to run .NET Core console app from the command line Copy Paste in Bash on Ubuntu on Windows How to find which version of TensorFlow is installed in my system? How to install JQ on Mac by command-line? Python not working in the command line of git bash Run function in script from command line (Node JS)

Examples related to input

Angular 4 - get input value React - clearing an input value after form submit Min and max value of input in angular4 application Disable Button in Angular 2 Angular2 - Input Field To Accept Only Numbers How to validate white spaces/empty spaces? [Angular 2] Can't bind to 'ngModel' since it isn't a known property of 'input' Mask for an Input to allow phone numbers? File upload from <input type="file"> Why does the html input with type "number" allow the letter 'e' to be entered in the field?

Examples related to windows-console

Assign output of a program to a variable using a MS batch file Logical operators ("and", "or") in DOS batch How to use unicode characters in Windows command line?