[excel] Microsoft Excel mangles Diacritics in .csv files?

I am programmatically exporting data (using PHP 5.2) into a .csv test file.
Example data: Numéro 1 (note the accented e). The data is utf-8 (no prepended BOM).

When I open this file in MS Excel, it displays as Numéro 1.

I am able to open this in a text editor (UltraEdit) which displays it correctly. UE reports the character is decimal 233.

How can I export text data in a .csv file so that MS Excel will correctly render it, preferably without forcing the use of the import wizard, or non-default wizard settings?

This question is related to: excel, encoding, csv, diacritics

The answers are:


Excel 2007 properly reads CSV encoded as UTF-8 with a BOM (EF BB BF).

Excel 2003 (and maybe earlier) reads CSV encoded as UTF-16LE with a BOM (FF FE), but with tabs instead of commas or semicolons as the separator.


You can save an HTML file with the extension .xls and accents will work (pre-2007, at least).

Example: save this (using Save As with UTF-8 encoding in Notepad) as test.xls:

<html>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<table>
<tr>
  <th>id</th>
  <th>name</th>
</tr>
<tr>
 <td>4</td>
 <td>Hélène</td>
</tr>
</table>
</html>

Echo a UTF-8 BOM before outputting the CSV data. This fixes all character issues on Windows but doesn't work for Mac.

echo "\xEF\xBB\xBF";

It works for me because I need to generate a file which will be used on Windows PCs only.
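For completeness, a minimal PHP sketch of this approach (the column names, data, and filename are invented for illustration):

<?php
// Example data; replace with your own rows.
$rows = array(
    array(1, 'Numéro 1'),
    array(2, 'Hélène'),
);

header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="export.csv"');

// Emit the UTF-8 BOM first so Excel (2007+ on Windows) detects the encoding.
echo "\xEF\xBB\xBF";

$out = fopen('php://output', 'w');
fputcsv($out, array('id', 'name'));   // header row
foreach ($rows as $row) {
    fputcsv($out, $row);              // data rows stay UTF-8
}
fclose($out);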


Select UTF-8 encoding when importing. If you use Office 2007, this is where you choose it: right after you open the file.


Prepending a BOM (\uFEFF) worked for me (Excel 2007), in that Excel recognised the file as UTF-8. Otherwise, saving it and using the import wizard works, but is less ideal.


Excel implements the CSV format as ASCII, not Unicode, which is what mangles the diacritics. We experienced the same issue, which is how I tracked down that Excel's CSV handling is defined as ASCII-based.


Note that including the UTF-8 BOM is not necessarily a good idea - Mac versions of Excel ignore it and will actually display the BOM as ASCII… three nasty characters at the start of the first field in your spreadsheet…


Another solution I found was just to encode the result as Windows Code Page 1252 (Windows-1252 or CP1252). This would be done, for example, by setting the Content-Type to something like text/csv; charset=Windows-1252 and setting the character encoding of the response stream similarly.
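A rough PHP sketch of that idea, assuming the source text is UTF-8 and contains only characters that exist in CP1252 (iconv's //TRANSLIT substitutes approximations for anything else):

<?php
$csv = "id;name\n4;Hélène\n"; // UTF-8 source text (example data)

// Re-encode to Windows-1252 before sending it to the client.
$cp1252 = iconv('UTF-8', 'Windows-1252//TRANSLIT', $csv);

header('Content-Type: text/csv; charset=Windows-1252');
header('Content-Disposition: attachment; filename="export.csv"');
echo $cp1252;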


I can only get CSV to parse properly in Excel 2007 as tab-separated little-endian UTF-16 starting with the proper byte order mark.


This is just a question of character encodings. It looks like you're exporting your data as UTF-8: é in UTF-8 is the two-byte sequence 0xC3 0xA9, which when interpreted as Windows-1252 is é. When you import your data into Excel, make sure to tell it that the character encoding you're using is UTF-8.
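A tiny PHP illustration of that mismatch (not from the original answer):

<?php
// "é" encoded as UTF-8 is the two-byte sequence 0xC3 0xA9.
$utf8 = "\xC3\xA9";

// Reading those same bytes as Windows-1252 gives "Ã" (0xC3) followed by "©" (0xA9),
// which is exactly the "é" Excel displays.
echo mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252'); // prints "é"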


I've found a way to solve the problem. It is a nasty hack, but it works: open the document with OpenOffice, then save it into any Excel format; the resulting .xls or .xlsx will display the accented characters.


The answer for all combinations of Excel versions (2003 + 2007) and file types

Most other answers here concern only the author's own Excel version and will not necessarily help you, because their advice might simply not hold for your version of Excel.

For example, adding the BOM character introduces problems with automatic column separator recognition, but not with every Excel version.

There are three variables that determine whether it works in most Excel versions:

  • Encoding
  • BOM character presence
  • Cell separator

Somebody stoic at SAP tried every combination and reported the outcome. End result? Use UTF-16LE with a BOM and the tab character as separator to have it work in most Excel versions.

You don't believe me? I wouldn't either, but read here and weep: http://wiki.sdn.sap.com/wiki/display/ABAP/CSV+tests+of+encoding+and+column+separator
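As a rough PHP sketch of that recommendation when writing a file to disk (an HTTP-download variant appears in a later answer; the field values here are placeholders):

<?php
$rows = array(
    array('id', 'name'),
    array('4', 'Hélène'),
);

// Build tab-separated lines in UTF-8 first...
$lines = array();
foreach ($rows as $row) {
    $lines[] = implode("\t", $row);
}
$csv = implode("\r\n", $lines) . "\r\n";

// ...then convert to UTF-16LE and prepend the FF FE byte order mark.
file_put_contents('export.csv', "\xFF\xFE" . mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8'));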


If you have legacy code in vb.net like I have, the following code worked for me:

    Response.Clear()
    Response.ClearHeaders()
    Response.ContentType = "text/csv"
    Response.Expires = 0
    Response.AddHeader("Content-Disposition", "attachment; filename=export.csv;")
    ' Encoding.Unicode is UTF-16LE; the StreamWriter writes its BOM (FF FE) automatically.
    Using sw As StreamWriter = New StreamWriter(Context.Response.OutputStream, System.Text.Encoding.Unicode)
        sw.Write(csv)
        sw.Close()
    End Using
    Response.End()

As Fregal said, \uFEFF is the way to go.

<%@LANGUAGE="JAVASCRIPT" CODEPAGE="65001"%>
<%
Response.Clear();
Response.ContentType = "text/csv";
Response.Charset = "utf-8";
Response.AddHeader("Content-Disposition", "attachment; filename=excelTest.csv");
Response.Write("\uFEFF");
// csv text here
%>

Writing a BOM to the output CSV file actually did work for me in Django:

from django.http import HttpResponse
from django.template import Context, loader

def handlePersoonListExport(request):
    # Retrieve a query_set
    ...

    template = loader.get_template("export.csv")
    context = Context({
        'data': query_set,
    })

    response = HttpResponse()
    response['Content-Disposition'] = 'attachment; filename=export.csv'
    response['Content-Type'] = 'text/csv; charset=utf-8'
    response.write("\xEF\xBB\xBF")
    response.write(template.render(context))

    return response

For more info http://crashcoursing.blogspot.com/2011/05/exporting-csv-with-special-characters.html Thanks guys!


UTF-8 doesn't work for me in Office 2007 without any service pack, with or without BOM (U+FEFF or 0xEF,0xBB,0xBF; neither works). Installing SP3 makes UTF-8 work when the 0xEF,0xBB,0xBF BOM is prepended.

UTF-16 works when encoding in Python using "utf-16-le" with a 0xFF 0xFE BOM prepended, and using tab as the separator. I had to manually write out the BOM and then use "utf-16-le" rather than "utf-16", otherwise each encode() prepended the BOM to every row written out, which appeared as garbage in the first column from the second line onward.

I can't tell whether UTF-16 would work without any service pack installed, since I can't go back now. Sigh.

This is on Windows; I don't know about Office for Mac.

In both working cases, the import works when launching a download directly from the browser: the text import wizard doesn't intervene, and it works as you would expect.


Below is the PHP code I use in my project when sending Microsoft Excel to user:

  /**
   * Export an array as downloadable Excel CSV
   * @param array   $header
   * @param array   $data
   * @param string  $filename
   */
  function toCSV($header, $data, $filename) {
    $sep  = "\t";
    $eol  = "\n";
    $csv  =  count($header) ? '"'. implode('"'.$sep.'"', $header).'"'.$eol : '';
    foreach($data as $line) {
      $csv .= '"'. implode('"'.$sep.'"', $line).'"'.$eol;
    }
    $encoded_csv = mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');
    header('Content-Description: File Transfer');
    header('Content-Type: application/vnd.ms-excel');
    header('Content-Disposition: attachment; filename="'.$filename.'.csv"');
    header('Content-Transfer-Encoding: binary');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Pragma: public');
    header('Content-Length: '. (strlen($encoded_csv) + 2)); // + 2 bytes for the BOM echoed below
    echo chr(255) . chr(254) . $encoded_csv; // FF FE BOM, then the UTF-16LE data
    exit;
  }

UPDATED: filename improvement and a bug fix for the length calculation. Thanks to TRiG and @ivanhoe011.


Check the encoding in which you are generating the file; to make Excel display the file correctly, you must use the system default code page.

Which language are you using? If it's .NET, you only need to use Encoding.Default when generating the file.


Open the CSV file with Notepad++, click on Encoding, select "Convert to UTF-8" (not "Convert to UTF-8 without BOM"), save, then open it by double-clicking it in Excel. Hope that helps. Christophe GRISON


With Ruby 1.8.7 I encode every field to UTF-16 and discard BOM (maybe).

The following code is extracted from active_scaffold_export:

<%
      require 'fastercsv'
      fcsv_options = {
        :row_sep => "\n",
        :col_sep => params[:delimiter],
        :force_quotes => @export_config.force_quotes,
        :headers => @export_columns.collect { |column| format_export_column_header_name(column) }
      }

      data = FasterCSV.generate(fcsv_options) do |csv|
        csv << fcsv_options[:headers] unless params[:skip_header] == 'true'
        @records.each do |record|
          csv << @export_columns.collect { |column|
            # Convert to UTF-16 discarding the BOM, required for Excel (> 2003 ?)
            Iconv.conv('UTF-16', 'UTF-8', get_export_column_value(record, column))[2..-1]
          }
        end
      end
    -%><%= data -%>

The important line is:

Iconv.conv('UTF-16', 'UTF-8', get_export_column_value(record, column))[2..-1]

I've also noticed that the question was "answered" some time ago, but I don't understand the stories that say you can't successfully open a UTF-8-encoded CSV file in Excel without using the text import wizard.

My reproducible experience: Type Old MacDonald had a farm,ÈÌÉÍØ into Notepad, hit Enter, then Save As (using the UTF-8 option).

Using Python to show what's actually in there:

>>> open('oldmac.csv', 'rb').read()
'\xef\xbb\xbfOld MacDonald had a farm,\xc3\x88\xc3\x8c\xc3\x89\xc3\x8d\xc3\x98\r\n'
>>> ^Z

Good. Notepad has put a BOM at the front.

Now go into Windows Explorer, double click on the file name, or right click and use "Open with ...", and up pops Excel (2003) with display as expected.

