[php] UTF-8 problems while reading CSV file with fgetcsv

I try to read a CSV and echo the content. But the content displays the characters wrong.

Mäx Müstermänn -> Mäx Müstermänn

Encoding of the CSV file is UTF-8 without BOM (checked with Notepad++).

This is the content of the CSV file:

"Mäx";"Müstermänn"

My PHP script

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<?php
$handle = fopen ("specialchars.csv","r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv ($handle, 1000, ";")) {
        $num = count ($data);
        for ($c=0; $c < $num; $c++) {
            // output data
            echo "<td>$data[$c]</td>";
        }
        echo "</tr><tr>";
}
?>
</body>
</html>

I tried to use setlocale(LC_ALL, 'de_DE.utf8'); as suggested here without success. The content is still wrong displayed.

What I'm missing?

Edit:

An echo mb_detect_encoding($data[$c],'UTF-8'); gives me UTF-8 UTF-8.

echo file_get_contents("specialchars.csv"); gives me "Mäx";"Müstermänn".

And

print_r(str_getcsv(reset(explode("\n", file_get_contents("specialchars.csv"))), ';'))

gives me

Array ( [0] => Mäx [1] => Müstermänn )

What does it mean?

This question is related to php csv utf-8 fgetcsv

The answer is


Try this:

<?php
$handle = fopen ("specialchars.csv","r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv ($handle, 1000, ";")) {
        $data = array_map("utf8_encode", $data); //added
        $num = count ($data);
        for ($c=0; $c < $num; $c++) {
            // output data
            echo "<td>$data[$c]</td>";
        }
        echo "</tr><tr>";
}
?>

Encountered similar problem: parsing CSV file with special characters like é, è, ö etc ...

The following worked fine for me:

To represent the characters correctly on the html page, the header was needed :

header('Content-Type: text/html; charset=UTF-8');

In order to parse every character correctly, I used:

utf8_encode(fgets($file));

Dont forget to use in all following string operations the 'Multibyte String Functions', like:

mb_strtolower($value, 'UTF-8');

The problem is that the function returns UTF-8 (it can check using mb_detect_encoding), but do not convert, and these characters takes as UTF-8. ?herefore, it's necessary to do the reverse-convert to initial encoding (Windows-1251 or CP1251) using iconv. But since by the fgetcsv returns an array, I suggest to write a custom function: [Sorry for my english]

function customfgetcsv(&$handle, $length, $separator = ';'){
    if (($buffer = fgets($handle, $length)) !== false) {
        return explode($separator, iconv("CP1251", "UTF-8", $buffer));
    }
    return false;
}

In my case the source file has windows-1250 encoding and iconv prints tons of notices about illegal characters in input string...

So this solution helped me a lot:

/**
 * getting CSV array with UTF-8 encoding
 *
 * @param   resource    &$handle
 * @param   integer     $length
 * @param   string      $separator
 *
 * @return  array|false
 */
private function fgetcsvUTF8(&$handle, $length, $separator = ';')
{
    if (($buffer = fgets($handle, $length)) !== false)
    {
        $buffer = $this->autoUTF($buffer);
        return str_getcsv($buffer, $separator);
    }
    return false;
}

/**
 * automatic convertion windows-1250 and iso-8859-2 info utf-8 string
 *
 * @param   string  $s
 *
 * @return  string
 */
private function autoUTF($s)
{
    // detect UTF-8
    if (preg_match('#[\x80-\x{1FF}\x{2000}-\x{3FFF}]#u', $s))
        return $s;

    // detect WINDOWS-1250
    if (preg_match('#[\x7F-\x9F\xBC]#', $s))
        return iconv('WINDOWS-1250', 'UTF-8', $s);

    // assume ISO-8859-2
    return iconv('ISO-8859-2', 'UTF-8', $s);
}

Response to @manvel's answer - use str_getcsv instead of explode - because of cases like this:

some;nice;value;"and;here;comes;combinated;value";and;some;others

explode will explode string into parts:

some
nice
value
"and
here
comes
combinated
value"
and
some
others

but str_getcsv will explode string into parts:

some
nice
value
and;here;comes;combinated;value
and
some
others

Try putting this into the top of your file (before any other output):

<?php

header('Content-Type: text/html; charset=UTF-8');

?>

Examples related to php

I am receiving warning in Facebook Application using PHP SDK Pass PDO prepared statement to variables Parse error: syntax error, unexpected [ Preg_match backtrack error Removing "http://" from a string How do I hide the PHP explode delimiter from submitted form results? Problems with installation of Google App Engine SDK for php in OS X Laravel 4 with Sentry 2 add user to a group on Registration php & mysql query not echoing in html with tags? How do I show a message in the foreach loop?

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to fgetcsv

How to parse a CSV file using PHP UTF-8 problems while reading CSV file with fgetcsv