Best way to convert text files between character sets?

Question

What is the fastest, easiest tool or method to convert text files between character sets?

Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa.

Everything goes: one-liners in your favorite scripting language, command-line tools or other utilities for the OS, web sites, etc.

Best solutions so far:

On Linux/UNIX/OS X/cygwin:

GNU iconv, suggested by Troels Arvin, is best used as a filter. It seems to be universally available. Example:

    iconv -f UTF-8 -t ISO-8859-15 in.txt > out.txt

As pointed out by Ben, there is an online converter using iconv.

GNU recode (manual), suggested by Cheekysoft, will convert one or several files in-place. Example:

    recode UTF8..ISO-8859-15 in.txt

This one uses shorter aliases:

    recode utf8..l9 in.txt

Recode also supports surfaces, which can be used to convert between different line ending types and encodings.

Convert newlines from LF (Unix) to CR-LF (DOS):

    recode ../CR-LF in.txt

Base64 encode file:

    recode ../Base64 in.txt

You can also combine them. Convert a Base64 encoded UTF8 file with Unix line endings to a Base64 encoded Latin 1 file with DOS line endings:

    recode utf8/Base64..l1/CR-LF/Base64 file.txt

On Windows with PowerShell (Jay Bazuzi):

    PS C:\> gc -en utf8 in.txt | Out-File -en ascii out.txt

(No ISO-8859-15 support though; it says that supported charsets are unicode, utf7, utf8, utf32, ascii, bigendianunicode, default, and oem.)

Edit: Do you mean iso-8859-1 support? Using "String" does this, e.g. for vice versa:

    gc -en string in.txt | Out-File -en utf8 out.txt

Note: The possible enumeration values are "Unknown, String, Unicode, Byte, BigEndianUnicode, UTF8, UTF7, Ascii".

CsCvt - Kalytta's Character Set Converter is another great command line based conversion tool for Windows.
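For a scripting-language one-liner, the same UTF-8 to ISO-8859-15 round trip can be sketched in Python using only the standard library. The function name `transcode` is made up for this illustration; the codec names are ones Python's `codecs` registry accepts.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from the src charset and re-encode them in dst."""
    return data.decode(src).encode(dst)


# The euro sign is three bytes in UTF-8 and the single byte 0xA4 in ISO-8859-15.
utf8_bytes = "10 €".encode("utf-8")
latin9_bytes = transcode(utf8_bytes, "utf-8", "iso-8859-15")
round_trip = transcode(latin9_bytes, "iso-8859-15", "utf-8")
```

A character that does not exist in the target charset raises `UnicodeEncodeError` here, which is usually what you want to know about before overwriting anything.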

User · Answer

To write a properties file (Java), I normally use this on Linux (Mint and Ubuntu distributions):

    $ native2ascii filename.properties

For example:

    $ cat test.properties
    first=Execução número um
    second=Execução número dois

    $ native2ascii test.properties
    first=Execu\u00e7\u00e3o n\u00famero um
    second=Execu\u00e7\u00e3o n\u00famero dois

PS: I wrote "Execution number one/two" in Portuguese to force special characters.

In my case, on the first execution I received this message:

    $ native2ascii teste.txt
    The program 'native2ascii' can be found in the following packages:
     * gcj-5-jdk
     * openjdk-8-jdk-headless
     * gcj-4.8-jdk
     * gcj-4.9-jdk
    Try: sudo apt install <selected package>

When I installed the first option (gcj-5-jdk), the problem was solved.

I hope this helps someone.
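If native2ascii is not installed, the escaping it performs can be approximated in a few lines of Python. This is only a sketch of the same idea, not the JDK tool's implementation; `to_ascii_escapes` is a hypothetical helper name.

```python
def to_ascii_escapes(text: str) -> str:
    """Replace every non-ASCII character with \\uXXXX escapes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Go through UTF-16 so that characters outside the Basic
        # Multilingual Plane are written as a surrogate pair.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


escaped = to_ascii_escapes("first=Execução número um")
```

Python's built-in `backslashreplace` error handler is not a drop-in substitute, because it writes characters below U+0100 as `\xNN` rather than `\u00NN`.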

User · Answer

    Get-Content -Encoding UTF8 FILE-UTF8.TXT | Out-File -Encoding UTF7 FILE-UTF7.TXT

The shortest version, if you can assume that the input BOM is correct:

    gc FILE.TXT | Out-File -en utf7 file-utf7.txt
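Relying on the input BOM can be made explicit. The sketch below, in Python, picks a codec from the leading byte-order mark and falls back to a default when there is none. The helper name `codec_from_bom` and the UTF-8 default are assumptions for this example.

```python
import codecs

# Check the 4-byte UTF-32 marks first: the UTF-32 LE BOM begins with
# the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def codec_from_bom(data: bytes, default: str = "utf-8") -> str:
    """Return a codec name that consumes the BOM found at the start of data."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return default


sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
decoded = sample.decode(codec_from_bom(sample))
```

The returned codecs (`utf-16`, `utf-32`, `utf-8-sig`) all strip the mark while decoding, so the resulting string does not start with U+FEFF.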

User · Answer

With ruby:

    ruby -e "File.write('output.txt', File.read('input.txt').encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: ''))"

Source: https://robots.thoughtbot.com/fight-back-utf-8-invalid-byte-sequences
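The same "replace what cannot be decoded" idea exists in Python through the `errors` argument of `bytes.decode`. This is a sketch of the general technique, not a translation of the Ruby line; `to_utf8_lossy` is a made-up name.

```python
def to_utf8_lossy(data: bytes, src: str = "utf-8", errors: str = "replace") -> bytes:
    """Decode data as src, substituting or dropping bad bytes, and emit UTF-8.

    errors="replace" inserts U+FFFD for each undecodable byte,
    errors="ignore" silently drops it.
    """
    return data.decode(src, errors=errors).encode("utf-8")


# 0xE9 is "é" in Latin-1 but is not valid on its own in UTF-8.
marked = to_utf8_lossy(b"caf\xe9 ok")
dropped = to_utf8_lossy(b"caf\xe9 ok", errors="ignore")
```

Either mode loses information, so it is best kept for files that are known to be damaged rather than used as a general converter.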

User · Answer

Under Linux you can use the very powerful recode command to try and convert between the different charsets as well as any line ending issues. recode -l will show you all of the formats and encodings that the tool can convert between. It is likely to be a VERY long list.

User · Answer

DOS/Windows: use the code page

    chcp 65001>NUL
    type ascii.txt > unicode.txt

The chcp command changes the active code page. Code page 65001 is Microsoft's name for UTF-8. Once the code page is set, the output generated by the commands that follow is written in that code page.

User · Answer

Stand-alone utility approach

    iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt

    -f ENCODING  the encoding of the input
    -t ENCODING  the encoding of the output

You don't have to specify either of these arguments. They will default to your current locale, which is usually UTF-8.
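The "default to the current locale" behaviour can be mirrored in a script. The Python sketch below uses `locale.getpreferredencoding(False)` as the stand-in for the locale charset; the function names are hypothetical and this is not how iconv itself is implemented.

```python
import locale


def default_charset() -> str:
    """Charset of the current locale as reported by the standard library."""
    return locale.getpreferredencoding(False)


def convert_with_defaults(data: bytes, src: str = None, dst: str = None) -> bytes:
    """Convert data, using the locale charset for any side left unspecified."""
    src = src or default_charset()
    dst = dst or default_charset()
    return data.decode(src).encode(dst)


latin1_bytes = convert_with_defaults("é".encode("utf-8"), src="utf-8", dst="iso-8859-1")
```

Passing both encodings explicitly keeps a script reproducible across machines, since the locale default differs between systems.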

User · Answer

My favorite tool for this is jEdit (a Java-based text editor), which has two very convenient features:

One which enables the user to reload a text with a different encoding (and, as such, to check the result visually).

Another one which enables the user to explicitly choose the encoding (and end-of-line character) before saving.

User · Answer

Try Notepad++

On Windows I was able to use Notepad++ to do the conversion from ISO-8859-1 to UTF-8. Click "Encoding" and then "Convert to UTF-8".

User · Answer

As described on "How do I correct the character encoding of a file?", Synalyze It! lets you easily convert on OS X between all encodings supported by the ICU library.

Additionally you can display some bytes of a file translated to Unicode from all the encodings to see quickly which is the right one for your file.

User · Answer

iconv(1)

    iconv -f FROM-ENCODING -t TO-ENCODING file.txt

Also there are iconv-based tools in many languages.

User · Answer

Simply change the encoding of the loaded file in IntelliJ IDEA, on the right of the status bar (bottom), where the current charset is indicated. It prompts to Reload or Convert; use Convert. Make sure you have backed up the original file in advance.

User · Answer

Assuming you don't know the input encoding and still wish to automate most of the conversion, I put together this one-liner from the previous answers:

    iconv -f $(chardetect input.text | awk '{print $2}') -t utf-8 -o output.text
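Where chardetect is not available, a much cruder substitute is to try a short list of candidate encodings in order and keep the first one that decodes cleanly. The Python sketch below shows that simpler technique; it is not statistical detection, and the name `guess_and_decode` and the candidate list are assumptions for this example.

```python
def guess_and_decode(data: bytes, candidates=("utf-8", "iso-8859-15")):
    """Return (codec, text) for the first candidate that decodes without error.

    Strict multi-byte encodings such as UTF-8 must come first: a
    single-byte charset like ISO-8859-15 accepts any byte sequence and
    would otherwise always win.
    """
    for codec in candidates:
        try:
            return codec, data.decode(codec)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


codec_a, text_a = guess_and_decode("é".encode("utf-8"))
codec_b, text_b = guess_and_decode(b"\xa4")
```

This only distinguishes "valid UTF-8" from "something else"; telling two single-byte charsets apart needs a real detector.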

User · Answer

In PowerShell:

    function Recode($InCharset, $InFile, $OutCharset, $OutFile)  {
        # Read input file in the source encoding
        $Encoding = [System.Text.Encoding]::GetEncoding($InCharset)
        $Text = [System.IO.File]::ReadAllText($InFile, $Encoding)

        # Write output file in the destination encoding
        $Encoding = [System.Text.Encoding]::GetEncoding($OutCharset)
        [System.IO.File]::WriteAllText($OutFile, $Text, $Encoding)
    }

    Recode Windows-1252 "$pwd\in.txt" utf8 "$pwd\out.txt"

For a list of supported encoding names:

https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding
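For comparison, a file-to-file function with the same argument order can be sketched in Python. `recode_file` is a hypothetical name, and the demo writes to a throwaway temporary directory rather than real paths.

```python
import os
import tempfile


def recode_file(in_charset, in_file, out_charset, out_file):
    """Read in_file as in_charset and write the text to out_file as out_charset."""
    # newline="" turns off newline translation so line endings pass through as-is.
    with open(in_file, "r", encoding=in_charset, newline="") as src:
        text = src.read()
    with open(out_file, "w", encoding=out_charset, newline="") as dst:
        dst.write(text)


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "in.txt")
    dst_path = os.path.join(tmp, "out.txt")
    with open(src_path, "wb") as f:
        f.write(b"na\xefve\r\n")  # "naïve" plus CR-LF in Windows-1252
    recode_file("windows-1252", src_path, "utf-8", dst_path)
    with open(dst_path, "rb") as f:
        recoded = f.read()
```

Like the PowerShell version, this reads the whole file into memory, which is fine for typical text files.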

User · Answer

Try EncodingChecker

EncodingChecker on GitHub

File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify.

File Encoding Checker requires .NET 4 or above to run.

For encoding detection, File Encoding Checker uses the UtfUnknown Charset Detector library. UTF-16 text files without byte-order-mark (BOM) can be detected by heuristics.
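To give a feel for what a BOM-less UTF-16 heuristic can look like, here is a deliberately simple Python sketch. It is not the UtfUnknown algorithm: it assumes mostly Latin-script text, where one byte of each 16-bit unit is zero, and the name `guess_utf16_without_bom` and the thresholds are made up for illustration.

```python
def guess_utf16_without_bom(data: bytes, threshold: float = 0.4):
    """Guess "utf-16-le" or "utf-16-be" from where the zero bytes fall.

    Returns None when the data does not look like BOM-less UTF-16.
    """
    if len(data) < 2 or len(data) % 2:
        return None
    units = len(data) // 2
    even_nulls = data[0::2].count(0)  # high bytes if the text is big-endian
    odd_nulls = data[1::2].count(0)   # high bytes if the text is little-endian
    if odd_nulls / units >= threshold and even_nulls / units < 0.1:
        return "utf-16-le"
    if even_nulls / units >= threshold and odd_nulls / units < 0.1:
        return "utf-16-be"
    return None


guess_le = guess_utf16_without_bom("hello".encode("utf-16-le"))
guess_be = guess_utf16_without_bom("hello".encode("utf-16-be"))
```

Text in scripts such as CJK has few zero bytes, so a heuristic this simple returns None there and a real detector is needed.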

User · Answer

If macOS GUI applications are your bread and butter, SubEthaEdit is the text editor I usually go to for encoding-wrangling: its "conversion preview" allows you to see all invalid characters in the output encoding, and fix/remove them.

And it's open-source now, so yay for them.

User · Answer

Try iconv Bash function

I've put this into .bashrc:

    utf8()
    {
        # Replace the original only if iconv succeeded
        iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    }

to be able to convert files like so:

    utf8 MyClass.java
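The same write-to-a-temp-file-then-rename pattern can be sketched in Python, where `os.replace` swaps the file in one step and the original is left alone if decoding fails. `convert_in_place` is a hypothetical helper and the demo uses a temporary directory.

```python
import os
import tempfile


def convert_in_place(path, src="iso-8859-1", dst="utf-8"):
    """Rewrite path in the dst encoding, replacing it only on success."""
    with open(path, "rb") as f:
        data = f.read()
    converted = data.decode(src).encode(dst)  # raises before anything is written

    # Create the temp file next to the target so the rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(converted)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as d:
    demo = os.path.join(d, "MyClass.java")
    with open(demo, "wb") as f:
        f.write(b"// caf\xe9\n")  # Latin-1 bytes
    convert_in_place(demo)
    with open(demo, "rb") as f:
        in_place_result = f.read()
```

One limitation of this sketch: the replacement file gets the temp file's permissions, so copy the mode across if that matters. Running it twice on the same file double-encodes the text, which is also true of the Bash function.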

User · Answer

PHP iconv()

    iconv("UTF-8", "ISO-8859-15", $input);

User · Answer

Try VIM

If you have vim you can use this. Not tested for every encoding. The cool part about this is that you don't have to know the source encoding:

    vim +"set nobomb | set fenc=utf8 | x" filename.txt

Be aware that this command modifies the file directly.

Explanation:

    +              Used by vim to run a command directly when opening a file. Usually used to open a file at a specific line: vim +14 file.txt
    |              Separator of multiple commands (like ; in bash)
    set nobomb     No UTF-8 BOM
    set fenc=utf8  Set the new encoding to UTF-8 (doc link)
    x              Save and close the file
    filename.txt   Path to the file
    "              The quotes are here because of the pipes (otherwise bash would treat them as shell pipes)
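The "no BOM" part of that command is easy to reproduce on its own. In Python, decoding with the `utf-8-sig` codec drops a leading UTF-8 byte-order mark if one is present, and re-encoding as plain `utf-8` writes none. `strip_utf8_bom` is a made-up helper name for this sketch.

```python
def strip_utf8_bom(data: bytes) -> bytes:
    """Return UTF-8 data without a leading byte-order mark."""
    # "utf-8-sig" accepts input with or without the EF BB BF prefix.
    return data.decode("utf-8-sig").encode("utf-8")


without_bom = strip_utf8_bom(b"\xef\xbb\xbfhello")
unchanged = strip_utf8_bom(b"hello")
```

Unlike the vim command, this does not guess the source encoding; it assumes the input already is UTF-8.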

User · Answer

There is also a web tool to convert file encoding:

https://webtool.cloud/change-file-encoding

It supports a wide range of encodings, including some rare ones, like IBM code page 37.

User · Answer

One-liner using find, with automatic character set detection

The character encoding of all matching text files gets detected automatically and all matching text files are converted to utf-8 encoding:

    find . -type f -iname '*.txt' -exec sh -c 'iconv -f $(file -bi "$1" | sed -e "s/.*[ ]charset=//") -t utf-8 -o converted "$1" && mv converted "$1"' -- {} \;

To perform these steps, a sub shell sh is used with -exec, running a one-liner with the -c flag, and passing the filename as the positional argument "$1" with -- {}. In between, the utf-8 output file is temporarily named converted.

Whereby file -bi means:

    -b, --brief  Do not prepend filenames to output lines (brief mode).
    -i, --mime   Causes the file command to output mime type strings rather than the more traditional human readable ones. Thus it may say, for example, text/plain; charset=us-ascii rather than ASCII text.

The sed command cuts this to only us-ascii, as is required by iconv.

The find command is very useful for such file management automation.
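A portable variant of the batch conversion can be sketched in Python. Note the swap: instead of asking `file` for the charset, this version assumes one known source encoding and skips files that already decode as the target. `convert_tree` is a hypothetical name and the demo runs in a temporary directory.

```python
import tempfile
from pathlib import Path


def convert_tree(root, src="iso-8859-15", dst="utf-8", pattern="*.txt"):
    """Convert matching files under root in place and return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        data = path.read_bytes()
        try:
            data.decode(dst)
            continue  # already valid in the target encoding, leave it alone
        except UnicodeDecodeError:
            pass
        path.write_bytes(data.decode(src).encode(dst))
        changed.append(path)
    return changed


with tempfile.TemporaryDirectory() as d:
    (Path(d) / "sub").mkdir()
    (Path(d) / "sub" / "a.txt").write_bytes(b"10 \xa4")        # ISO-8859-15 euro
    (Path(d) / "b.txt").write_bytes("déjà".encode("utf-8"))    # already UTF-8
    changed_names = [p.name for p in convert_tree(d)]
    a_after = (Path(d) / "sub" / "a.txt").read_bytes()
```

The "already valid" check is only meaningful when the target is UTF-8, and unlike `-iname` the glob pattern is case-sensitive on most Unix systems.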

User · Answer

Use this Python script:

https://github.com/goerz/convert_encoding.py

Works on any platform. Requires Python 2.7.
