Using PowerShell to write a file in UTF-8 without the BOM

Question

Out-File seems to force the BOM when using UTF-8:

    $MyFile = Get-Content $MyPath
    $MyFile | Out-File -Encoding "UTF8" $MyPath

How can I write a file in UTF-8 with no BOM using PowerShell?

User · Answer

When using Set-Content instead of Out-File, you can specify the encoding Byte, which can be used to write a byte array to a file. This, in combination with a custom UTF8 encoding which does not emit the BOM, gives the desired result:

    # This variable can be reused
    $utf8 = New-Object System.Text.UTF8Encoding $false

    $MyFile = Get-Content $MyPath -Raw
    Set-Content -Value $utf8.GetBytes($MyFile) -Encoding Byte -Path $MyPath

The difference from using [IO.File]::WriteAllLines() or similar is that it should work fine with any type of item and path, not only actual file paths.

User · Answer

One technique I utilize is to redirect output to an ASCII file using the Out-File cmdlet.

For example, I often run SQL scripts that create another SQL script to execute in Oracle. With simple redirection (">"), the output will be in UTF-16, which is not recognized by SQLPlus. To work around this:

    sqlplus -s / as sysdba "@create_sql_script.sql" |
    Out-File -FilePath new_script.sql -Encoding ASCII -Force

The generated script can then be executed via another SQLPlus session without any Unicode worries:

    sqlplus / as sysdba "@new_script.sql" |
    tee new_script.log

User · Answer

This script will convert, to UTF-8 without BOM, all .txt files in DIRECTORY1 and output them to DIRECTORY2:

    foreach ($i in ls -name DIRECTORY1\*.txt)
    {
        $file_content = Get-Content "DIRECTORY1\$i";
        [System.IO.File]::WriteAllLines("DIRECTORY2\$i", $file_content);
    }

User · Answer

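You could use the following to get UTF-8 without a BOM:

    $MyFile | Out-File -Encoding ASCII

Note that this only produces correct output when the content is pure ASCII (which is byte-identical to UTF-8); any non-ASCII character is written as "?".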
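To make the limits of the ASCII approach concrete, here is a minimal sketch that compares the bytes produced by .NET's ASCII encoding with those of a BOM-less UTF-8 encoding. It calls the encoding classes directly rather than Out-File, on the assumption that Out-File -Encoding ASCII uses the same ASCII encoder; the sample string and variable names are made up for illustration.

```powershell
# Sample text "café", built with [char] so the script file's own encoding does not matter.
$text = 'caf' + [char]0xE9

# ASCII cannot represent U+00E9, so the encoder substitutes '?' (byte 63).
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)

# BOM-less UTF-8 encodes U+00E9 as the two bytes 0xC3 0xA9 (195 169).
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
$utf8Bytes = $utf8NoBom.GetBytes($text)

$asciiBytes -join ' '   # 99 97 102 63
$utf8Bytes -join ' '    # 99 97 102 195 169
```

For content that only contains characters in the 7-bit ASCII range, both byte sequences are identical, which is why the ASCII trick appears to work for many files.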

User · Answer

Change multiple files by extension to UTF-8 without BOM:

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
    foreach ($i in ls -recurse -filter "*.java") {
        $MyFile = Get-Content $i.fullname
        [System.IO.File]::WriteAllLines($i.fullname, $MyFile, $Utf8NoBomEncoding)
    }

User · Answer

    [System.IO.FileInfo] $file = Get-Item -Path $FilePath
    $sequenceBOM = New-Object System.Byte[] 3
    $reader = $file.OpenRead()
    $bytesRead = $reader.Read($sequenceBOM, 0, 3)
    $reader.Dispose()
    # A UTF-8+BOM string will start with the three following bytes. Hex: 0xEF 0xBB 0xBF, Decimal: 239 187 191
    if ($bytesRead -eq 3 -and $sequenceBOM[0] -eq 239 -and $sequenceBOM[1] -eq 187 -and $sequenceBOM[2] -eq 191)
    {
        $utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
        [System.IO.File]::WriteAllLines($FilePath, (Get-Content $FilePath), $utf8NoBomEncoding)
        Write-Host "Remove UTF-8 BOM successfully"
    }
    Else
    {
        Write-Warning "Not UTF-8 BOM file"
    }

Source: How to remove UTF8 Byte Order Mark (BOM) from a file using PowerShell
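As a variation on the BOM check above, the following sketch strips the three BOM bytes at the byte level instead of re-reading the file as text, so line endings and the rest of the content are left untouched. This is an alternative technique, not the method from the linked source, and the function name Remove-Utf8Bom is hypothetical.

```powershell
# Hypothetical helper: removes a leading UTF-8 BOM (EF BB BF) without decoding the file.
# Returns $true if a BOM was removed, $false if the file had none.
function Remove-Utf8Bom {
    param([Parameter(Mandatory = $true)] [string] $Path)

    # Resolve to a full path, since .NET's working directory can differ from PowerShell's.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    $hasBom = $bytes.Length -ge 3 -and
              $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
    if (-not $hasBom) { return $false }

    # Copy everything after the first three bytes and write it back.
    $rest = New-Object byte[] ($bytes.Length - 3)
    [System.Array]::Copy($bytes, 3, $rest, 0, $rest.Length)
    [System.IO.File]::WriteAllBytes($fullPath, $rest)
    return $true
}
```

Like the script above, this reads the whole file into memory, so it is best suited to files of moderate size.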

User · Answer

I figured this wouldn't be UTF, but I just found a pretty simple solution that seems to work:

    Get-Content path/to/file.ext | out-file -encoding ASCII targetFile.ext

For me this results in a UTF-8 without BOM file regardless of the source format.

User · Answer

If you want to use [System.IO.File]::WriteAllLines(), you should cast the second parameter to String[] (if the type of $MyFile is Object[]), and also specify an absolute path with $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), like:

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    Get-ChildItem | ConvertTo-Csv | Set-Variable MyFile
    [System.IO.File]::WriteAllLines($ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), [String[]]$MyFile, $Utf8NoBomEncoding)

If you want to use [System.IO.File]::WriteAllText(), sometimes you should pipe the second parameter into | Out-String | to add CRLFs to the end of each line explicitly (especially when you use them with ConvertTo-Csv):

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    Get-ChildItem | ConvertTo-Csv | Out-String | Set-Variable tmp
    [System.IO.File]::WriteAllText("/absolute/path/to/foobar.csv", $tmp, $Utf8NoBomEncoding)

Or you can use [Text.Encoding]::UTF8.GetBytes() with Set-Content -Encoding Byte:

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    Get-ChildItem | ConvertTo-Csv | Out-String | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "/absolute/path/to/foobar.csv"

See: How to write result of ConvertTo-Csv to a file in UTF-8 without BOM

User · Answer

The proper way as of now is to use a solution recommended by @Roman Kuzmin in comments to @M. Dudley's answer:

    [IO.File]::WriteAllLines($filename, $content)

I've also shortened it a bit by stripping the unnecessary System namespace clarification - it will be substituted automatically by default.

User · Answer

Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:

    $MyRawString = Get-Content -Raw $MyPath
    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    [System.IO.File]::WriteAllLines($MyPath, $MyRawString, $Utf8NoBomEncoding)
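One detail worth knowing: WriteAllLines appends a newline after each element it writes, so passing a single raw string that already ends in a newline adds one more. A minimal sketch of the WriteAllText alternative, which writes the string exactly as given, follows; the temp-file path and sample text are assumptions for illustration.

```powershell
# Hypothetical demo file in the temp directory (a full path, as .NET methods expect).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-demo.txt'
$text = "first line`r`nsecond line`r`n"

# $false means: do not emit the UTF-8 byte-order mark.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

# WriteAllText writes the string as-is, without adding a trailing newline.
[System.IO.File]::WriteAllText($path, $text, $utf8NoBom)

# The file holds exactly the encoded text: no BOM prefix, no extra newline.
$bytes = [System.IO.File]::ReadAllBytes($path)
$bytes.Length -eq $utf8NoBom.GetByteCount($text)   # True
```

With Get-Content -Raw as the source, this keeps the output byte-for-byte equal to the decoded input re-encoded as BOM-less UTF-8.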

User · Answer

Important: this only works if an extra space or newline at the start is no problem for your use case of the file (e.g. if it is an SQL file, Java file or human-readable text file).

One could use a combination of creating an empty (non-UTF8 or ASCII (UTF8-compatible)) file and appending to it (replace $str with gc $src if the source is a file):

    " "   |  out-file  -encoding ASCII  -noNewline  $dest
    $str  |  out-file  -encoding UTF8   -append     $dest

As a one-liner (replace $dest and $str according to your use case):

    $_ofdst = $dest ; " " | out-file -encoding ASCII -noNewline $_ofdst ; $str | out-file -encoding UTF8 -append $_ofdst

As a simple function:

    function Out-File-UTF8-noBOM { param( $str, $dest )
      " "   |  out-file  -encoding ASCII  -noNewline  $dest
      $str  |  out-file  -encoding UTF8   -append     $dest
    }

Using it with a source file:

    Out-File-UTF8-noBOM  (gc $src)  $dest

Using it with a string:

    Out-File-UTF8-noBOM  $str  $dest

Optionally, continue appending with Out-File:

    "more foo bar"  |  Out-File -encoding UTF8 -append  $dest

User · Answer

For PowerShell 5.1, enable this setting:

Control Panel, Region, Administrative, Change system locale, Use Unicode UTF-8 for worldwide language support

Then enter this into PowerShell:

    $PSDefaultParameterValues['*:Encoding'] = 'Default'

Alternatively, you can upgrade to PowerShell 6 or higher: https://github.com/PowerShell/PowerShell

User · Answer

Starting from version 6, PowerShell supports the UTF8NoBOM encoding for both Set-Content and Out-File, and even uses it as the default encoding.

So in the above example it should simply be like this:

    $MyFile | Out-File -Encoding UTF8NoBOM $MyPath
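For scripts that have to run on both Windows PowerShell 5.1 and PowerShell 6+, the version-dependent behavior can be wrapped in a small helper. The sketch below is one possible approach, not an established API: the function name Write-Utf8NoBom and its parameters are hypothetical, and it falls back to [System.IO.File]::WriteAllLines() on versions that lack the utf8NoBOM encoding name.

```powershell
# Hypothetical helper: writes lines as UTF-8 without a BOM on any PowerShell version.
function Write-Utf8NoBom {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [string[]] $Lines = @()
    )

    # Resolve to a full path, since .NET's working directory can differ from PowerShell's.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)

    if ($PSVersionTable.PSVersion.Major -ge 6) {
        # PowerShell 6+ understands the utf8NoBOM encoding name.
        $Lines | Out-File -Encoding utf8NoBOM -LiteralPath $fullPath
    }
    else {
        # Windows PowerShell: go through .NET with a BOM-less UTF8Encoding instance.
        $utf8NoBom = New-Object System.Text.UTF8Encoding $false
        [System.IO.File]::WriteAllLines($fullPath, $Lines, $utf8NoBom)
    }
}
```

Usage, with a made-up file name:

```powershell
Write-Utf8NoBom -Path (Join-Path ([System.IO.Path]::GetTempPath()) 'example.txt') -Lines 'alpha', 'beta'
```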

User · Answer

Note: This answer applies to Windows PowerShell; by contrast, in the cross-platform PowerShell Core edition (v6+), UTF-8 without BOM is the default encoding, across all cmdlets.

In other words: If you're using PowerShell [Core] version 6 or higher, you get BOM-less UTF-8 files by default (which you can also explicitly request with -Encoding utf8 / -Encoding utf8NoBOM, whereas you get with-BOM encoding with -Encoding utf8BOM).

If you're running Windows 10 and you're willing to switch to BOM-less UTF-8 encoding system-wide - which can have side effects - even Windows PowerShell can be made to use BOM-less UTF-8 consistently - see this answer.

To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):

For convenience, here's advanced function Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:

- you can use it just like Out-File in a pipeline.
- input objects that aren't strings are formatted as they would be if you sent them to the console, just like with Out-File.
- an additional -UseLF switch allows you to transform Windows-style CRLF newlines to Unix-style LF-only newlines.

Example:

    (Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath # Add -UseLF for Unix newlines

Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before sending the result through the pipeline. This is necessary in order to be able to write back to the same file (update it in place).
Generally, though, this technique is not advisable for 2 reasons: (a) the whole file must fit into memory and (b) if the command is interrupted, data will be lost.

A note on memory use:

M. Dudley's own answer requires that the entire file contents be built up in memory first, which can be problematic with large files.
The function below improves on this only slightly: all input objects are still buffered first, but their string representations are then generated and written to the output file one by one.

Source code of function Out-FileUtf8NoBom:

Note: The function is also available as an MIT-licensed Gist, and only it will be maintained going forward.

You can install it directly with the following command (while I can personally assure you that doing so is safe, you should always check the content of a script before directly executing it this way):

    # Download and define the function.
    irm https://gist.github.com/mklement0/8689b9b5123a9ba11df7214f82a673be/raw/Out-FileUtf8NoBom.ps1 | iex

    function Out-FileUtf8NoBom {
    <#
    .SYNOPSIS
      Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).

    .DESCRIPTION
      Mimics the most important aspects of Out-File:
        * Input objects are sent to Out-String first.
        * -Append allows you to append to an existing file, -NoClobber prevents
          overwriting of an existing file.
        * -Width allows you to specify the line width for the text representations
          of input objects that aren't strings.
      However, it is not a complete implementation of all Out-File parameters:
        * Only a literal output path is supported, and only as a parameter.
        * -Force is not supported.
        * Conversely, an extra -UseLF switch is supported for using LF-only newlines.

      Caveat: *All* pipeline input is buffered before writing output starts,
              but the string representations are generated and written to the target
              file one by one.

    .NOTES
      The raison d'être for this advanced function is that Windows PowerShell
      lacks the ability to write UTF-8 files without a BOM: using -Encoding UTF8
      invariably prepends a BOM.

      Copyright (c) 2017, 2020 Michael Klement <mklement0@gmail.com> (http://same2u.net),
      released under the [MIT license](https://spdx.org/licenses/MIT#licenseText).

    #>

      [CmdletBinding()]
      param(
        [Parameter(Mandatory, Position=0)] [string] $LiteralPath,
        [switch] $Append,
        [switch] $NoClobber,
        [AllowNull()] [int] $Width,
        [switch] $UseLF,
        [Parameter(ValueFromPipeline)] $InputObject
      )

      #requires -version 3

      # Convert the input path to a full one, since .NET's working dir. usually
      # differs from PowerShell's.
      $dir = Split-Path -LiteralPath $LiteralPath
      if ($dir) { $dir = Convert-Path -ErrorAction Stop -LiteralPath $dir } else { $dir = $pwd.ProviderPath }
      $LiteralPath = [IO.Path]::Combine($dir, [IO.Path]::GetFileName($LiteralPath))

      # If -NoClobber was specified, throw an exception if the target file already
      # exists.
      if ($NoClobber -and (Test-Path $LiteralPath)) {
        Throw [IO.IOException] "The file '$LiteralPath' already exists."
      }

      # Create a StreamWriter object.
      # Note that we take advantage of the fact that the StreamWriter class by default:
      # - uses UTF-8 encoding
      # - without a BOM.
      $sw = New-Object System.IO.StreamWriter $LiteralPath, $Append

      $htOutStringArgs = @{}
      if ($Width) {
        $htOutStringArgs += @{ Width = $Width }
      }

      # Note: By not using begin / process / end blocks, we're effectively running
      #       in the end block, which means that all pipeline input has already
      #       been collected in automatic variable $Input.
      #       We must use this approach, because using | Out-String individually
      #       in each iteration of a process block would format each input object
      #       with an individual header.
      try {
        $Input | Out-String -Stream @htOutStringArgs | % {
          if ($UseLf) {
            $sw.Write($_ + "`n")
          }
          else {
            $sw.WriteLine($_)
          }
        }
      } finally {
        $sw.Dispose()
      }

    }

User · Answer

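This one works for me (use "Default" instead of "UTF8"):

    $MyFile = Get-Content $MyPath
    $MyFile | Out-File -Encoding "Default" $MyPath

The result is ASCII without BOM. (Strictly speaking, in Windows PowerShell "Default" is the system's active ANSI code page, so the output is only valid UTF-8 when the content is pure ASCII.)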
