[php] PHP replacing special characters like à->a, è->e

I have php document signup.php which save the content from form (in form.php document) to MySQL base. The problem arises when I want to reformat the input content. I want do decode UTF-8 charachters like à->a.

  $first_name=$_POST['first_name'];
  $last_name=$_POST['last_name'];
  $course=$_POST['course'];

  $chain="prêt-à-porter";

$pattern = array("'é'", "'è'", "'ë'", "'ê'", "'É'", "'È'", "'Ë'", "'Ê'", "'á'", "'à'", "'ä'", "'â'", "'å'", "'Á'", "'À'", "'Ä'", "'Â'", "'Å'", "'ó'", "'ò'", "'ö'", "'ô'", "'Ó'", "'Ò'", "'Ö'", "'Ô'", "'í'", "'ì'", "'ï'", "'î'", "'Í'", "'Ì'", "'Ï'", "'Î'", "'ú'", "'ù'", "'ü'", "'û'", "'Ú'", "'Ù'", "'Ü'", "'Û'", "'ý'", "'ÿ'", "'Ý'", "'ø'", "'Ø'", "'œ'", "'Œ'", "'Æ'", "'ç'", "'Ç'");

$replace = array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E', 'a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A', 'A', 'o', 'o', 'o', 'o', 'O', 'O', 'O', 'O', 'i', 'i', 'i', 'I', 'I', 'I', 'I', 'I', 'u', 'u', 'u', 'u', 'U', 'U', 'U', 'U', 'y', 'y', 'Y', 'o', 'O', 'a', 'A', 'A', 'c', 'C'); 

$chain = preg_replace($pattern, $replace, $chain);

echo $chain; // print pret-a-porter

$first_name =  preg_replace($pattern, $replace, $first_name);

echo $first_name; // does not change the input!?!

Why it works perfectly for $chain, but for $first_name or $last_name doesnt work?

Also i try

echo $first_name; // print áááááábéééééébšššš
$trans = array("á" => "a", "é" => "e", "š" => "s");
echo strtr("áááááábéééééébšššš", $trans); // print aaaaaabeeeeeebssss
echo strtr($first_name,$trans);  // print áááááábéééééébšššš

but the problem, as you can see, is same!

This question is related to php utf-8 preg-replace decode

The answer is


Wish I found this thread sooner. The function I made (that took me way too long) is below:

function CheckLetters($field){
    $letters = [
        0 => "a à á â ä æ ã å a",
        1 => "c ç c c",
        2 => "e é è ê ë e e e",
        3 => "i i i í ì ï î",
        4 => "l l",
        5 => "n ñ n",
        6 => "o o ø œ õ ó ò ö ô",
        7 => "s ß s š",
        8 => "u u ú ù ü û",
        9 => "w w",
        10 => "y y ÿ",
        11 => "z z ž z",
    ];
    foreach ($letters as &$values){
        $newValue = substr($values, 0, 1);
        $values = substr($values, 2, strlen($values));
        $values = explode(" ", $values);
        foreach ($values as &$oldValue){
            while (strpos($field,$oldValue) !== false){
                $field = preg_replace("/" . $oldValue . '/', $newValue, $field, 1);
            }
        }
    }
    return $field;
}

Here is a way to have some flexibility in what should be discarded and what should be replaced. This is how I currently do it.

$string = 'À some string with junk I Ä ';

$replace = [
    '<' => '', '>' => '', ''' => '', '&' => '',
    '"' => '', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'Ae',
    'Ä' => 'A', 'Å' => 'A', 'A' => 'A', 'A' => 'A', 'A' => 'A', 'Æ' => 'Ae',
    'Ç' => 'C', 'C' => 'C', 'C' => 'C', 'C' => 'C', 'C' => 'C', 'D' => 'D', 'Ð' => 'D',
    'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'E' => 'E',
    'E' => 'E', 'E' => 'E', 'E' => 'E', 'E' => 'E', 'G' => 'G', 'G' => 'G',
    'G' => 'G', 'G' => 'G', 'H' => 'H', 'H' => 'H', 'Ì' => 'I', 'Í' => 'I',
    'Î' => 'I', 'Ï' => 'I', 'I' => 'I', 'I' => 'I', 'I' => 'I', 'I' => 'I',
    'I' => 'I', '?' => 'IJ', 'J' => 'J', 'K' => 'K', 'L' => 'K', 'L' => 'K',
    'L' => 'K', 'L' => 'K', '?' => 'K', 'Ñ' => 'N', 'N' => 'N', 'N' => 'N',
    'N' => 'N', '?' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O',
    'Ö' => 'Oe', 'Ö' => 'Oe', 'Ø' => 'O', 'O' => 'O', 'O' => 'O', 'O' => 'O',
    'Œ' => 'OE', 'R' => 'R', 'R' => 'R', 'R' => 'R', 'S' => 'S', 'Š' => 'S',
    'S' => 'S', 'S' => 'S', '?' => 'S', 'T' => 'T', 'T' => 'T', 'T' => 'T',
    '?' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'U' => 'U',
    'Ü' => 'Ue', 'U' => 'U', 'U' => 'U', 'U' => 'U', 'U' => 'U', 'U' => 'U',
    'W' => 'W', 'Ý' => 'Y', 'Y' => 'Y', 'Ÿ' => 'Y', 'Z' => 'Z', 'Ž' => 'Z',
    'Z' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
    'ä' => 'ae', 'ä' => 'ae', 'å' => 'a', 'a' => 'a', 'a' => 'a', 'a' => 'a',
    'æ' => 'ae', 'ç' => 'c', 'c' => 'c', 'c' => 'c', 'c' => 'c', 'c' => 'c',
    'd' => 'd', 'd' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e',
    'ë' => 'e', 'e' => 'e', 'e' => 'e', 'e' => 'e', 'e' => 'e', 'e' => 'e',
    'ƒ' => 'f', 'g' => 'g', 'g' => 'g', 'g' => 'g', 'g' => 'g', 'h' => 'h',
    'h' => 'h', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'i' => 'i',
    'i' => 'i', 'i' => 'i', 'i' => 'i', 'i' => 'i', '?' => 'ij', 'j' => 'j',
    'k' => 'k', '?' => 'k', 'l' => 'l', 'l' => 'l', 'l' => 'l', 'l' => 'l',
    '?' => 'l', 'ñ' => 'n', 'n' => 'n', 'n' => 'n', 'n' => 'n', '?' => 'n',
    '?' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe',
    'ö' => 'oe', 'ø' => 'o', 'o' => 'o', 'o' => 'o', 'o' => 'o', 'œ' => 'oe',
    'r' => 'r', 'r' => 'r', 'r' => 'r', 'š' => 's', 'ù' => 'u', 'ú' => 'u',
    'û' => 'u', 'ü' => 'ue', 'u' => 'u', 'ü' => 'ue', 'u' => 'u', 'u' => 'u',
    'u' => 'u', 'u' => 'u', 'u' => 'u', 'w' => 'w', 'ý' => 'y', 'ÿ' => 'y',
    'y' => 'y', 'ž' => 'z', 'z' => 'z', 'z' => 'z', 'þ' => 't', 'ß' => 'ss',
    '?' => 'ss', '??' => 'iy', '?' => 'A', '?' => 'B', '?' => 'V', '?' => 'G',
    '?' => 'D', '?' => 'E', '?' => 'YO', '?' => 'ZH', '?' => 'Z', '?' => 'I',
    '?' => 'Y', '?' => 'K', '?' => 'L', '?' => 'M', '?' => 'N', '?' => 'O',
    '?' => 'P', '?' => 'R', '?' => 'S', '?' => 'T', '?' => 'U', '?' => 'F',
    '?' => 'H', '?' => 'C', '?' => 'CH', '?' => 'SH', '?' => 'SCH', '?' => '',
    '?' => 'Y', '?' => '', '?' => 'E', '?' => 'YU', '?' => 'YA', '?' => 'a',
    '?' => 'b', '?' => 'v', '?' => 'g', '?' => 'd', '?' => 'e', '?' => 'yo',
    '?' => 'zh', '?' => 'z', '?' => 'i', '?' => 'y', '?' => 'k', '?' => 'l',
    '?' => 'm', '?' => 'n', '?' => 'o', '?' => 'p', '?' => 'r', '?' => 's',
    '?' => 't', '?' => 'u', '?' => 'f', '?' => 'h', '?' => 'c', '?' => 'ch',
    '?' => 'sh', '?' => 'sch', '?' => '', '?' => 'y', '?' => '', '?' => 'e',
    '?' => 'yu', '?' => 'ya'
];

echo str_replace(array_keys($replace), $replace, $string);  

CodeIgniter way:

$this->load->helper('text');

$string = convert_accented_characters($string);

This function uses a companion config file application/config/foreign_chars.php to define the to and from array for transliteration.

https://www.codeigniter.com/user_guide/helpers/text_helper.html#ascii_to_entities


The string $chain is in the same character encoding as the characters in the array - it's possible, even likely, that the $first_name string is in a different encoding, and so those characters don't match. You might want to try using the multibyte string functions instead.

Try mb_convert_encoding. You might also want to try using HTML_ENTITIES as the to_encoding parameter, then you don't need to worry about how the characters will get converted - it will be very predictable.

Assuming your input to this script is in UTF-8, probably not a bad place to start...

$first_name = mb_convert_encoding($first_name, "HTML-ENTITIES", "UTF-8"); 

As of PHP >= 5.4.0

$translatedString = transliterator_transliterate('Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove', $string);

function correctedText($txt=''){
  $ss = str_split($txt);
    
  for($i=0; $i<count($ss); $i++){
    $asciiNumber = ord($ss[$i]);// get the ascii dec of a single character
    
    // asciiNumber will be from the DEC column showing at https://www.ascii-code.com
    
    // capital letters only checked 
    if($asciiNumber >= 192 && $asciiNumber <= 197)$ss[$i] = 'A';
    elseif($asciiNumber == 198)$ss[$i] = 'AE';
    elseif($asciiNumber == 199)$ss[$i] = 'C';
    elseif($asciiNumber >= 200 && $asciiNumber <= 203)$ss[$i] = 'E';
    elseif($asciiNumber >= 204 && $asciiNumber <= 207)$ss[$i] = 'I';
    elseif($asciiNumber == 209)$ss[$i] = 'N';
    elseif($asciiNumber >= 210 && $asciiNumber <= 214)$ss[$i] = 'O';
    elseif($asciiNumber == 216)$ss[$i] = 'O';
    elseif($asciiNumber >= 217 && $asciiNumber <= 220)$ss[$i] = 'U';
    elseif($asciiNumber == 221)$ss[$i] = 'Y';
  }
    
  $txt = implode('', $ss);
    
  return $txt;
}

Simple function. Transform strings like 'Ábç Éfg' to 'abc_efg'

/**
 * @param $str
 * @return mixed
 */
function sanitizeString($str) {
    $str = preg_replace('/[áàãâä]/ui', 'a', $str);
    $str = preg_replace('/[éèêë]/ui', 'e', $str);
    $str = preg_replace('/[íìîï]/ui', 'i', $str);
    $str = preg_replace('/[óòõôö]/ui', 'o', $str);
    $str = preg_replace('/[úùûü]/ui', 'u', $str);
    $str = preg_replace('/[ç]/ui', 'c', $str);
    $str = preg_replace('/[^a-z0-9]/i', '_', $str);
    $str = preg_replace('/_+/', '_', $str);

    return $str;
}

Examples related to php

I am receiving warning in Facebook Application using PHP SDK Pass PDO prepared statement to variables Parse error: syntax error, unexpected [ Preg_match backtrack error Removing "http://" from a string How do I hide the PHP explode delimiter from submitted form results? Problems with installation of Google App Engine SDK for php in OS X Laravel 4 with Sentry 2 add user to a group on Registration php & mysql query not echoing in html with tags? How do I show a message in the foreach loop?

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to preg-replace

Replace deprecated preg_replace /e with preg_replace_callback Replace preg_replace() e modifier with preg_replace_callback PHP replacing special characters like à->a, è->e PHP preg replace only allow numbers PHP remove special character from string Replacing accented characters php How to replace <span style="font-weight: bold;">foo</span> by <strong>foo</strong> using PHP and regex? Remove multiple whitespaces

Examples related to decode

How to decode encrypted wordpress admin password? How to decode a QR-code image in (preferably pure) Python? Write Base64-encoded image to file Base64 Java encode and decode a string Android - How to decode and decompile any APK file? UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function PHP replacing special characters like à->a, è->e UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined> How do I decode a string with escaped unicode? UnicodeDecodeError, invalid continuation byte