[powershell] UTF-8 output from PowerShell

I'm trying to use Process.Start with redirected I/O to call PowerShell.exe with a string, and to get the output back, all in UTF-8. But I don't seem to be able to make this work.

What I've tried:

  • Passing the command to run via the -Command parameter
  • Writing the PowerShell script as a file to disk with UTF-8 encoding
  • Writing the PowerShell script as a file to disk with UTF-8 with BOM encoding
  • Writing the PowerShell script as a file to disk with UTF-16
  • Setting Console.OutputEncoding in both my console application and in the PowerShell script
  • Setting $OutputEncoding in PowerShell
  • Setting Process.StartInfo.StandardOutputEncoding
  • Doing it all with Encoding.Unicode instead of Encoding.UTF8

In every case, when I inspect the bytes I'm given, I get different values to my original string. I'd really love an explanation as to why this doesn't work.

Here is my code:

static void Main(string[] args)
{
    DumpBytes("Héllo");

    ExecuteCommand("PowerShell.exe", "-Command \"$OutputEncoding = [System.Text.Encoding]::UTF8 ; Write-Output 'Héllo';\"",
        Environment.CurrentDirectory, DumpBytes, DumpBytes);

    Console.ReadLine();
}

static void DumpBytes(string text)
{
    Console.Write(text + " " + string.Join(",", Encoding.UTF8.GetBytes(text).Select(b => b.ToString("X"))));
    Console.WriteLine();
}

static int ExecuteCommand(string executable, string arguments, string workingDirectory, Action<string> output, Action<string> error)
{
    try
    {
        using (var process = new Process())
        {
            process.StartInfo.FileName = executable;
            process.StartInfo.Arguments = arguments;
            process.StartInfo.WorkingDirectory = workingDirectory;
            process.StartInfo.UseShellExecute = false;
            process.StartInfo.CreateNoWindow = true;
            process.StartInfo.RedirectStandardOutput = true;
            process.StartInfo.RedirectStandardError = true;
            process.StartInfo.StandardOutputEncoding = Encoding.UTF8;
            process.StartInfo.StandardErrorEncoding = Encoding.UTF8;

            using (var outputWaitHandle = new AutoResetEvent(false))
            using (var errorWaitHandle = new AutoResetEvent(false))
            {
                process.OutputDataReceived += (sender, e) =>
                {
                    if (e.Data == null)
                    {
                        outputWaitHandle.Set();
                    }
                    else
                    {
                        output(e.Data);
                    }
                };

                process.ErrorDataReceived += (sender, e) =>
                {
                    if (e.Data == null)
                    {
                        errorWaitHandle.Set();
                    }
                    else
                    {
                        error(e.Data);
                    }
                };

                process.Start();

                process.BeginOutputReadLine();
                process.BeginErrorReadLine();

                process.WaitForExit();
                outputWaitHandle.WaitOne();
                errorWaitHandle.WaitOne();

                return process.ExitCode;
            }
        }
    }
    catch (Exception ex)
    {
        throw new Exception(string.Format("Error when attempting to execute {0}: {1}", executable, ex.Message),
            ex);
    }
}

Update 1

I found that if I make this script:

[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
Write-Host "Héllo!"
[Console]::WriteLine("Héllo")

Then invoke it via:

ExecuteCommand("PowerShell.exe", "-File C:\\Users\\Paul\\Desktop\\Foo.ps1",
  Environment.CurrentDirectory, DumpBytes, DumpBytes);

The first line is corrupted, but the second isn't:

H?llo! 48,EF,BF,BD,6C,6C,6F,21
Héllo 48,C3,A9,6C,6C,6F

This suggests to me that my redirection code is all working fine; when I use Console.WriteLine in PowerShell I get UTF-8 as I expect.

This means that PowerShell's Write-Output and Write-Host commands must be doing something different with the output, and not simply calling Console.WriteLine.

Update 2

I've even tried the following to force the PowerShell console code page to UTF-8, but Write-Host and Write-Output continue to produce broken results while [Console]::WriteLine works.

$sig = @'
[DllImport("kernel32.dll")]
public static extern bool SetConsoleCP(uint wCodePageID);

[DllImport("kernel32.dll")]
public static extern bool SetConsoleOutputCP(uint wCodePageID);
'@

$type = Add-Type -MemberDefinition $sig -Name Win32Utils -Namespace Foo -PassThru

$type::SetConsoleCP(65001)
$type::SetConsoleOutputCP(65001)

Write-Host "Héllo!"

& chcp    # Tells us 65001 (UTF-8) is being used

The answer is


Set the [Console]::OuputEncoding as encoding whatever you want, and print out with [Console]::WriteLine.

If powershell ouput method has a problem, then don't use it. It feels bit bad, but works like a charm :)


Not an expert on encoding, but after reading these...

... it seems fairly clear that the $OutputEncoding variable only affects data piped to native applications.

If sending to a file from withing PowerShell, the encoding can be controlled by the -encoding parameter on the out-file cmdlet e.g.

write-output "hello" | out-file "enctest.txt" -encoding utf8

Nothing else you can do on the PowerShell front then, but the following post may well help you:.


Spent some time working on a solution to my issue and thought it may be of interest. I ran into a problem trying to automate code generation using PowerShell 3.0 on Windows 8. The target IDE was the Keil compiler using MDK-ARM Essential Toolchain 5.24.1. A bit different from OP, as I am using PowerShell natively during the pre-build step. When I tried to #include the generated file, I received the error

fatal error: UTF-16 (LE) byte order mark detected '..\GITVersion.h' but encoding is not supported

I solved the problem by changing the line that generated the output file from:

out-file -FilePath GITVersion.h -InputObject $result

to:

out-file -FilePath GITVersion.h -Encoding ascii -InputObject $result

Examples related to powershell

Why powershell does not run Angular commands? How do I install the Nuget provider for PowerShell on a unconnected machine so I can install a nuget package from the PS command line? How to print environment variables to the console in PowerShell? Check if a string is not NULL or EMPTY The term 'ng' is not recognized as the name of a cmdlet VSCode Change Default Terminal 'Connect-MsolService' is not recognized as the name of a cmdlet Powershell Invoke-WebRequest Fails with SSL/TLS Secure Channel Install-Module : The term 'Install-Module' is not recognized as the name of a cmdlet Change directory in PowerShell

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to character-encoding

Changing PowerShell's default output encoding to UTF-8 JsonParseException : Illegal unquoted character ((CTRL-CHAR, code 10) Change the encoding of a file in Visual Studio Code What is the difference between utf8mb4 and utf8 charsets in MySQL? How to open html file? All inclusive Charset to avoid "java.nio.charset.MalformedInputException: Input length = 1"? UTF-8 output from PowerShell ERROR 1115 (42000): Unknown character set: 'utf8mb4' "for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte How to make php display \t \n as tab and new line instead of characters

Examples related to io-redirection

UTF-8 output from PowerShell Redirecting Output from within Batch file Redirect stderr to stdout in C shell Bash write to file without echo? What is /dev/null 2>&1? How to redirect both stdout and stderr to a file Redirect all output to file in Bash Send string to stdin Find and replace in file and overwrite file doesn't work, it empties the file How do I capture all of my compiler's output to a file?