[php] Convert utf8-characters to iso-88591 and back in PHP

Some of my script are using different encoding, and when I try to combine them, this has becom an issue.

But I can't change the encoding they use, instead I want to change the encodig of the result from script A, and use it as parameter in script B.

So: is there any simple way to change a string from UTF-8 to ISO-88591 in PHP? I have looked at utf_encode and _decode, but they doesn't do what i want. Why doesn't there exsist any "utf2iso()"-function, or similar?

I don't think I have characters that can't be written in ISO-format, so that shouldn't be an huge issue.

This question is related to php encoding utf-8 iso-8859-1

The answer is


First of all, don't use different encodings. It leads to a mess, and UTF-8 is definitely the one you should be using everywhere.

Chances are your input is not ISO-8859-1, but something else (ISO-8859-15, Windows-1252). To convert from those, use iconv or mb_convert_encoding.

Nevertheless, utf8_encode and utf8_decode should work for ISO-8859-1. It would be nice if you could post a link to a file or a uuencoded or base64 example string for which the conversion fails or yields unexpected results.


I used:

function utf8_to_html ($data) {
    return preg_replace(
        array (
            '/ä/',
            '/ö/',
            '/ü/',
            '/é/',
            '/à/',
            '/è/'
        ),
        array (
            'ä',
            'ö',
            'ü',
            'é',
            'à',
            'è'
        ),
        $data 
    );
}

Use html_entity_decode() and htmlentities().

$html = html_entity_decode(htmlentities($html, ENT_QUOTES, 'UTF-8'), ENT_QUOTES , 'ISO-8859-1');

htmlentities() formats your input into UTF8 and html_entity_decode() formats it back to ISO-8859-1.


You need to use the iconv package, specifically its iconv function.


set meta tag in head as

 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> 

use the link http://www.i18nqa.com/debug/utf8-debug.html to replace the symbols character you want.

then use str_replace like

    $find = array('“', '’', '…', '—', '–', '‘', 'é', 'Â', '•', 'Ëœ', 'â€'); // en dash
                        $replace = array('“', '’', '…', '—', '–', '‘', 'é', '', '•', '˜', '”');
$content = str_replace($find, $replace, $content);

Its the method i use and help alot. Thanks!


First of all, don't use different encodings. It leads to a mess, and UTF-8 is definitely the one you should be using everywhere.

Chances are your input is not ISO-8859-1, but something else (ISO-8859-15, Windows-1252). To convert from those, use iconv or mb_convert_encoding.

Nevertheless, utf8_encode and utf8_decode should work for ISO-8859-1. It would be nice if you could post a link to a file or a uuencoded or base64 example string for which the conversion fails or yields unexpected results.


You need to use the iconv package, specifically its iconv function.


function parseUtf8ToIso88591(&$string){
     if(!is_null($string)){
            $iso88591_1 = utf8_decode($string);
            $iso88591_2 = iconv('UTF-8', 'ISO-8859-1', $string);
            $string = mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8');       
     }
}

First of all, don't use different encodings. It leads to a mess, and UTF-8 is definitely the one you should be using everywhere.

Chances are your input is not ISO-8859-1, but something else (ISO-8859-15, Windows-1252). To convert from those, use iconv or mb_convert_encoding.

Nevertheless, utf8_encode and utf8_decode should work for ISO-8859-1. It would be nice if you could post a link to a file or a uuencoded or base64 example string for which the conversion fails or yields unexpected results.


In my case after files with names containing those characters were uploaded, they were not even visible with Filezilla! In Cpanel filemanager they were shown with ? (under black background). And this combination made it shown correctly on the browser (HTML document is Western-encoded):

$dspFileName = utf8_decode(htmlspecialchars(iconv(mb_internal_encoding(), 'utf-8', basename($thisFile['path']))) );

function parseUtf8ToIso88591(&$string){
     if(!is_null($string)){
            $iso88591_1 = utf8_decode($string);
            $iso88591_2 = iconv('UTF-8', 'ISO-8859-1', $string);
            $string = mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8');       
     }
}

I used:

function utf8_to_html ($data) {
    return preg_replace(
        array (
            '/ä/',
            '/ö/',
            '/ü/',
            '/é/',
            '/à/',
            '/è/'
        ),
        array (
            '&auml;',
            '&ouml;',
            '&uuml;',
            '&eacute;',
            '&agrave;',
            '&egrave;'
        ),
        $data 
    );
}

You need to use the iconv package, specifically its iconv function.


It is much better to use

$value = mb_convert_encode($value,'HTML-ENTITIES','UTF-8');

Specially when you are using AJAX call for submitting 'ISO-8859-1' characters. It works for Chinese, Japanese, Czech, German and many more languages.


I use this function:

function formatcell($data, $num, $fill=" ") {
    $data = trim($data);
    $data=str_replace(chr(13),' ',$data);
    $data=str_replace(chr(10),' ',$data);
    // translate UTF8 to English characters
    $data = iconv('UTF-8', 'ASCII//TRANSLIT', $data);
    $data = preg_replace("/[\'\"\^\~\`]/i", '', $data);


    // fill it up with spaces
    for ($i = strlen($data); $i < $num; $i++) {
        $data .= $fill;
    }
    // limit string to num characters
   $data = substr($data, 0, $num);

    return $data;
}


echo formatcell("YES UTF8 String Zürich", 25, 'x'); //YES UTF8 String Zürichxxx
echo formatcell("NON UTF8 String Zurich", 25, 'x'); //NON UTF8 String Zurichxxx

Check out my function in my blog http://www.unexpectedit.com/php/php-handling-non-english-characters-utf8


set meta tag in head as

 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> 

use the link http://www.i18nqa.com/debug/utf8-debug.html to replace the symbols character you want.

then use str_replace like

    $find = array('“', '’', '…', '—', '–', '‘', 'é', 'Â', '•', 'Ëœ', 'â€'); // en dash
                        $replace = array('“', '’', '…', '—', '–', '‘', 'é', '', '•', '˜', '”');
$content = str_replace($find, $replace, $content);

Its the method i use and help alot. Thanks!


It is much better to use

$value = mb_convert_encode($value,'HTML-ENTITIES','UTF-8');

Specially when you are using AJAX call for submitting 'ISO-8859-1' characters. It works for Chinese, Japanese, Czech, German and many more languages.


In my case after files with names containing those characters were uploaded, they were not even visible with Filezilla! In Cpanel filemanager they were shown with ? (under black background). And this combination made it shown correctly on the browser (HTML document is Western-encoded):

$dspFileName = utf8_decode(htmlspecialchars(iconv(mb_internal_encoding(), 'utf-8', basename($thisFile['path']))) );

First of all, don't use different encodings. It leads to a mess, and UTF-8 is definitely the one you should be using everywhere.

Chances are your input is not ISO-8859-1, but something else (ISO-8859-15, Windows-1252). To convert from those, use iconv or mb_convert_encoding.

Nevertheless, utf8_encode and utf8_decode should work for ISO-8859-1. It would be nice if you could post a link to a file or a uuencoded or base64 example string for which the conversion fails or yields unexpected results.


You need to use the iconv package, specifically its iconv function.


I use this function:

function formatcell($data, $num, $fill=" ") {
    $data = trim($data);
    $data=str_replace(chr(13),' ',$data);
    $data=str_replace(chr(10),' ',$data);
    // translate UTF8 to English characters
    $data = iconv('UTF-8', 'ASCII//TRANSLIT', $data);
    $data = preg_replace("/[\'\"\^\~\`]/i", '', $data);


    // fill it up with spaces
    for ($i = strlen($data); $i < $num; $i++) {
        $data .= $fill;
    }
    // limit string to num characters
   $data = substr($data, 0, $num);

    return $data;
}


echo formatcell("YES UTF8 String Zürich", 25, 'x'); //YES UTF8 String Zürichxxx
echo formatcell("NON UTF8 String Zurich", 25, 'x'); //NON UTF8 String Zurichxxx

Check out my function in my blog http://www.unexpectedit.com/php/php-handling-non-english-characters-utf8


Use html_entity_decode() and htmlentities().

$html = html_entity_decode(htmlentities($html, ENT_QUOTES, 'UTF-8'), ENT_QUOTES , 'ISO-8859-1');

htmlentities() formats your input into UTF8 and html_entity_decode() formats it back to ISO-8859-1.


Examples related to php

I am receiving warning in Facebook Application using PHP SDK Pass PDO prepared statement to variables Parse error: syntax error, unexpected [ Preg_match backtrack error Removing "http://" from a string How do I hide the PHP explode delimiter from submitted form results? Problems with installation of Google App Engine SDK for php in OS X Laravel 4 with Sentry 2 add user to a group on Registration php & mysql query not echoing in html with tags? How do I show a message in the foreach loop?

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to iso-8859-1

What is the difference between UTF-8 and ISO-8859-1? C# Convert string from UTF-8 to ISO-8859-1 (Latin1) H HTML encoding issues - "Â" character showing up instead of "&nbsp;" Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte How do I convert between ISO-8859-1 and UTF-8 in Java? Convert utf8-characters to iso-88591 and back in PHP How do I write out a text file in C# with a code page other than UTF-8? Setting the character encoding in form submit for Internet Explorer