How to write file in UTF-8 format

Question

I have a bunch of files that are not in UTF-8 encoding and I'm converting a site to UTF-8 encoding.

I'm using a simple script for the files that I want to save in UTF-8, but the files are saved in the old encoding:

    header('Content-type: text/html; charset=utf-8');
    mb_internal_encoding('UTF-8');
    $fpath = "folder";
    $d = dir($fpath);
    while (False !== ($a = $d->read()))
    {
        if ($a != '.' and $a != '..')
        {
            $npath = $fpath . '/' . $a;

            $data = file_get_contents($npath);

            file_put_contents('tempfolder/' . $a, $data);
        }
    }

How can I save files in UTF-8 encoding?

User · Answer

This works for me:

    $f = fopen($filename, "w");
    # Now UTF-8 - Add byte order mark
    fwrite($f, pack("CCC", 0xef, 0xbb, 0xbf));
    fwrite($f, $content);
    fclose($f);

User · Answer

This is quite a useful question. I think that my solution on Windows 10 PHP7 is rather useful for people who still have some UTF-8 conversion trouble.

Here are my steps. The PHP script calling the following function, here named utfsave.php, must have UTF-8 encoding itself; this can be easily done by conversion in UltraEdit.

In utfsave.php, we define a function calling PHP fopen($filename, "wb"), i.e. the file is opened in w (write) mode and, especially, in b (binary) mode.

    <?php
    //
    // UTF-8
    //
    // fnc001: save string as a file in UTF-8:
    // The resulting file is UTF-8 only if $strContent is,
    // with French accents, chinese ideograms, etc.
    //
    function entSaveAsUtf8($strContent, $filename) {
        $fp = fopen($filename, "wb");
        fwrite($fp, $strContent);
        fclose($fp);
        return True;
    }

    //
    // 0. write UTF-8 string in fly into UTF-8 file:
    //
    $strContent = "My string contains UTF-8 chars ie [...] for un été en France";
    $filename = "utf8text.txt";
    entSaveAsUtf8($strContent, $filename);

    //
    // 2. convert CP936 ANSI/OEM - chinese simplified GBK file into UTF-8 file:
    //
    $strContent = file_get_contents("cp936gbktext.txt");
    $strContent = mb_convert_encoding($strContent, "UTF-8", "CP936");
    $filename = "utf8text2.txt";
    entSaveAsUtf8($strContent, $filename);
    ?>

The content of the source file cp936gbktext.txt:

    >>Get-Content cp936gbktext.txt
    My string contains UTF-8 chars ie [...] for un été en France 936 (ANSI/OEM - chinois simplifié GBK)

After running utfsave.php on Windows 10 PHP, the created files utf8text.txt and utf8text2.txt are automatically saved in UTF-8 format.

With this method, a BOM char is not required. The BOM solution is bad because it causes trouble when, for example, we source an SQL file for MySQL.

It's worth noting that I failed to make file_put_contents($filename, utf8_encode($mystring)); work for this purpose.

If you don't know the encoding of the source file, you can list the supported encodings with PHP:

    print_r(mb_list_encodings());

This gives a list like this:

    Array
    (
      [0] => pass
      [1] => wchar
      [2] => byte2be
      [3] => byte2le
      [4] => byte4be
      [5] => byte4le
      [6] => BASE64
      [7] => UUENCODE
      [8] => HTML-ENTITIES
      [9] => Quoted-Printable
      [10] => 7bit
      [11] => 8bit
      [12] => UCS-4
      [13] => UCS-4BE
      [14] => UCS-4LE
      [15] => UCS-2
      [16] => UCS-2BE
      [17] => UCS-2LE
      [18] => UTF-32
      [19] => UTF-32BE
      [20] => UTF-32LE
      [21] => UTF-16
      [22] => UTF-16BE
      [23] => UTF-16LE
      [24] => UTF-8
      [25] => UTF-7
      [26] => UTF7-IMAP
      [27] => ASCII
      [28] => EUC-JP
      [29] => SJIS
      [30] => eucJP-win
      [31] => EUC-JP-2004
      [32] => SJIS-win
      [33] => SJIS-Mobile#DOCOMO
      [34] => SJIS-Mobile#KDDI
      [35] => SJIS-Mobile#SOFTBANK
      [36] => SJIS-mac
      [37] => SJIS-2004
      [38] => UTF-8-Mobile#DOCOMO
      [39] => UTF-8-Mobile#KDDI-A
      [40] => UTF-8-Mobile#KDDI-B
      [41] => UTF-8-Mobile#SOFTBANK
      [42] => CP932
      [43] => CP51932
      [44] => JIS
      [45] => ISO-2022-JP
      [46] => ISO-2022-JP-MS
      [47] => GB18030
      [48] => Windows-1252
      [49] => Windows-1254
      [50] => ISO-8859-1
      [51] => ISO-8859-2
      [52] => ISO-8859-3
      [53] => ISO-8859-4
      [54] => ISO-8859-5
      [55] => ISO-8859-6
      [56] => ISO-8859-7
      [57] => ISO-8859-8
      [58] => ISO-8859-9
      [59] => ISO-8859-10
      [60] => ISO-8859-13
      [61] => ISO-8859-14
      [62] => ISO-8859-15
      [63] => ISO-8859-16
      [64] => EUC-CN
      [65] => CP936
      [66] => HZ
      [67] => EUC-TW
      [68] => BIG-5
      [69] => CP950
      [70] => EUC-KR
      [71] => UHC
      [72] => ISO-2022-KR
      [73] => Windows-1251
      [74] => CP866
      [75] => KOI8-R
      [76] => KOI8-U
      [77] => ArmSCII-8
      [78] => CP850
      [79] => JIS-ms
      [80] => ISO-2022-JP-2004
      [81] => ISO-2022-JP-MOBILE#KDDI
      [82] => CP50220
      [83] => CP50220raw
      [84] => CP50221
      [85] => CP50222
    )

If you cannot guess the encoding, try them one by one, as mb_detect_encoding() cannot do the job easily.
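Before trying source encodings one by one, it can help to rule out files that are already valid UTF-8, since converting those a second time would corrupt them. The following is only a minimal sketch: the helper name toUtf8IfNeeded and its parameters are made up for illustration, and it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper (not from the answer above): returns the input
// unchanged if it is already valid UTF-8, otherwise converts it from the
// encoding the caller assumes the data is in.
function toUtf8IfNeeded(string $bytes, string $assumedSource): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8 (pure ASCII also passes)
    }
    return mb_convert_encoding($bytes, 'UTF-8', $assumedSource);
}
```

Note that this check cannot tell which legacy encoding the data is in; the caller still has to supply a plausible value such as 'CP936' or 'ISO-8859-1'.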

User · Answer

On Unix/Linux, a simple shell command could alternatively be used to convert all files in a given directory:

    recode L1..UTF8 dir/*

It could be started via PHP's exec() as well.

User · Answer

Open your files in Windows Notepad. Change the encoding to UTF-8. Save your file. Try again.

User · Answer

I put it all together and got an easy way to convert ANSI text files to "UTF-8 No Mark":

    function filesToUTF8($searchdir, $convdir, $filetypes) {
      $get_files = glob($searchdir . '*{' . $filetypes . '}', GLOB_BRACE);
      foreach ($get_files as $file) {
        $expl_path = explode('/', $file);
        $filename = end($expl_path);
        $get_file_content = file_get_contents($file);
        $new_file_content = iconv(mb_detect_encoding($get_file_content, mb_detect_order(), true), "UTF-8", $get_file_content);
        $put_new_file = file_put_contents($convdir . $filename, $new_file_content);
      }
    }

Usage: filesToUTF8('C:/Temp/', 'C:/Temp/conv_files/', 'php,txt');

User · Answer

Add a BOM to fix UTF-8 in Excel:

    fputs($fp, $bom = (chr(0xEF) . chr(0xBB) . chr(0xBF)));

I got this line from Cool
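For context, here is a sketch of how that line might sit inside a complete CSV export. The function name writeCsvForExcel and the row data are hypothetical, and whether Excel needs the BOM depends on the Excel version and how the file is opened, so treat this as an illustration rather than a guaranteed fix.

```php
<?php
// Hypothetical helper: writes rows as CSV, prefixed with a UTF-8 BOM.
// The row values are assumed to be UTF-8 strings already.
function writeCsvForExcel(string $path, array $rows): void
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    fwrite($fp, "\xEF\xBB\xBF"); // UTF-8 byte order mark, written once
    foreach ($rows as $row) {
        // Separator, enclosure and escape are passed explicitly.
        fputcsv($fp, $row, ',', '"', '\\');
    }
    fclose($fp);
}
```

The BOM goes in once, before the first row; writing it per row would put stray bytes into the data.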

User · Answer

file_get_contents() / file_put_contents() will not magically convert the encoding.

You have to convert the string explicitly, for example with iconv() or mb_convert_encoding().

Try this:

    $data = file_get_contents($npath);
    $data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
    file_put_contents('tempfolder/' . $a, $data);

Or alternatively, with PHP's stream filters (the filter name is convert.iconv.<input-encoding>/<output-encoding>):

    $fd = fopen($file, 'r');
    stream_filter_append($fd, 'convert.iconv.OLD-ENCODING/UTF-8');
    stream_copy_to_stream($fd, fopen($output, 'w'));
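Applied to the directory loop from the question, the explicit conversion could look roughly like the sketch below. The function name convertDirToUtf8 is made up for illustration, the source encoding must be supplied by the caller, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: converts every regular file in $srcDir from
// $fromEncoding to UTF-8 and writes the result under $dstDir.
// Returns the number of files written.
function convertDirToUtf8(string $srcDir, string $dstDir, string $fromEncoding): int
{
    $count = 0;
    foreach (scandir($srcDir) as $name) {
        $src = $srcDir . '/' . $name;
        if (!is_file($src)) {
            continue; // skips '.', '..' and subdirectories
        }
        $data = file_get_contents($src);
        // Leave data alone if it is already valid UTF-8,
        // so the conversion is not applied twice.
        if (!mb_check_encoding($data, 'UTF-8')) {
            $data = mb_convert_encoding($data, 'UTF-8', $fromEncoding);
        }
        file_put_contents($dstDir . '/' . $name, $data);
        $count++;
    }
    return $count;
}
```

A call such as convertDirToUtf8('folder', 'tempfolder', 'ISO-8859-1') would mirror the question's layout, with 'ISO-8859-1' standing in for whatever the old encoding actually is.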

User · Answer

If you want to use recode recursively, and filter by file type, try this:

    find . -name "*.html" -exec recode L1..UTF8 {} \;

User · Answer

    <?php
    function writeUTF8File($filename, $content) {
        $f = fopen($filename, "w");
        # Now UTF-8 - Add byte order mark
        fwrite($f, pack("CCC", 0xef, 0xbb, 0xbf));
        fwrite($f, $content);
        fclose($f);
    }
    ?>

User · Answer

Add BOM: UTF-8

    file_put_contents($myFile, "\xEF\xBB\xBF" . $content);
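Since the answers on this page disagree about whether a BOM is desirable, a small helper for removing one when reading a file back may be useful. This is a sketch only; the name stripUtf8Bom is hypothetical.

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF)
// if present, and returns the input unchanged otherwise.
function stripUtf8Bom(string $bytes): string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return substr($bytes, 3);
    }
    return $bytes;
}
```

For example, stripUtf8Bom(file_get_contents($myFile)) would give the content without the marker, which may help with tools that do not expect a BOM.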

User · Answer

Iconv to the rescue
