I have a PHP script, signup.php, which saves the content of a form (in form.php) to a MySQL database. The problem arises when I want to reformat the input. I want to transliterate accented UTF-8 characters to plain ASCII, e.g. à -> a.
$first_name=$_POST['first_name'];
$last_name=$_POST['last_name'];
$course=$_POST['course'];
$chain="prêt-à-porter";
$pattern = array("'é'", "'è'", "'ë'", "'ê'", "'É'", "'È'", "'Ë'", "'Ê'", "'á'", "'à'", "'ä'", "'â'", "'å'", "'Á'", "'À'", "'Ä'", "'Â'", "'Å'", "'ó'", "'ò'", "'ö'", "'ô'", "'Ó'", "'Ò'", "'Ö'", "'Ô'", "'í'", "'ì'", "'ï'", "'î'", "'Í'", "'Ì'", "'Ï'", "'Î'", "'ú'", "'ù'", "'ü'", "'û'", "'Ú'", "'Ù'", "'Ü'", "'Û'", "'ý'", "'ÿ'", "'Ý'", "'ø'", "'Ø'", "'œ'", "'Œ'", "'Æ'", "'ç'", "'Ç'");
$replace = array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E', 'a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A', 'A', 'o', 'o', 'o', 'o', 'O', 'O', 'O', 'O', 'i', 'i', 'i', 'i', 'I', 'I', 'I', 'I', 'u', 'u', 'u', 'u', 'U', 'U', 'U', 'U', 'y', 'y', 'Y', 'o', 'O', 'oe', 'OE', 'AE', 'c', 'C');
$chain = preg_replace($pattern, $replace, $chain);
echo $chain; // prints pret-a-porter
$first_name = preg_replace($pattern, $replace, $first_name);
echo $first_name; // does not change the input!?!
Why does it work perfectly for $chain, but not for $first_name or $last_name?
I also tried:
echo $first_name; // prints áááááábéééééébšššš
$trans = array("á" => "a", "é" => "e", "š" => "s");
echo strtr("áááááábéééééébšššš", $trans); // prints aaaaaabeeeeeebssss
echo strtr($first_name,$trans); // prints áááááábéééééébšššš
but the problem, as you can see, is the same!
This question is related to: php, utf-8, preg-replace, decode
Wish I found this thread sooner. The function I made (that took me way too long) is below:
function CheckLetters($field){
    // Each entry lists the ASCII replacement first, followed by the accented variants it replaces.
    $letters = [
        0 => "a à á â ä æ ã å ā",
        1 => "c ç ć č",
        2 => "e é è ê ë ę ė ē",
        3 => "i ī į í ì ï î",
        4 => "l ł",
        5 => "n ñ ń",
        6 => "o ō ø œ õ ó ò ö ô",
        7 => "s ß ś š",
        8 => "u ū ú ù ü û",
        9 => "w ŵ",
        10 => "y ŷ ÿ",
        11 => "z ź ž ż",
    ];
    foreach ($letters as $values){
        $variants = explode(" ", $values);
        $newValue = array_shift($variants); // first item is the replacement letter
        // str_replace() replaces every occurrence of every variant in one pass
        $field = str_replace($variants, $newValue, $field);
    }
    return $field;
}
Here is a way to have some flexibility in what should be discarded and what should be replaced. This is how I currently do it.
$string = 'À some string with junk I Ä ';
$replace = [
    '&lt;' => '', '&gt;' => '', '&#039;' => '', '&amp;' => '',
    '&quot;' => '', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'Ae',
    '&Auml;' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae',
    'Ç' => 'C', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D',
    'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E',
    'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G',
    'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I',
    'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I',
    'İ' => 'I', 'Ĳ' => 'IJ', 'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'L', 'Ľ' => 'L',
    'Ĺ' => 'L', 'Ļ' => 'L', 'Ŀ' => 'L', 'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N',
    'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O',
    'Ö' => 'Oe', '&Ouml;' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O',
    'Œ' => 'OE', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S',
    'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T',
    'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U',
    '&Uuml;' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U',
    'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z',
    'Ż' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
    'ä' => 'ae', '&auml;' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a',
    'æ' => 'ae', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
    'ď' => 'd', 'đ' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e',
    'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e',
    'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h',
    'ħ' => 'h', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i',
    'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĳ' => 'ij', 'ĵ' => 'j',
    'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l',
    'ŀ' => 'l', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ŉ' => 'n',
    'ŋ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe',
    '&ouml;' => 'oe', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe',
    'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's', 'ù' => 'u', 'ú' => 'u',
    'û' => 'u', 'ü' => 'ue', 'ū' => 'u', '&uuml;' => 'ue', 'ů' => 'u', 'ű' => 'u',
    'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y',
    'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'ß' => 'ss',
    'ſ' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G',
    'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I',
    'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O',
    'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
    'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '',
    'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a',
    'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo',
    'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l',
    'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
    'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
    'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e',
    'ю' => 'yu', 'я' => 'ya'
];
echo str_replace(array_keys($replace), $replace, $string);
CodeIgniter way:
$this->load->helper('text');
$string = convert_accented_characters($string);
This function uses a companion config file, application/config/foreign_chars.php, to define the "to" and "from" arrays for transliteration.
https://www.codeigniter.com/user_guide/helpers/text_helper.html#ascii_to_entities
The string $chain is in the same character encoding as the characters in the array - it's possible, even likely, that the $first_name string is in a different encoding, and so those characters don't match. You might want to try using the multibyte string functions instead.
Try mb_convert_encoding. You might also want to try using HTML-ENTITIES as the to_encoding parameter; then you don't need to worry about how the characters will be converted, because the result is very predictable. (Note that the HTML-ENTITIES pseudo-encoding is deprecated as of PHP 8.2.)
Assuming your input to this script is in UTF-8, probably not a bad place to start...
$first_name = mb_convert_encoding($first_name, "HTML-ENTITIES", "UTF-8");
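To illustrate the encoding mismatch described above, here is a minimal sketch. It assumes the mbstring extension is loaded and that any input which is not valid UTF-8 is ISO-8859-1; that second assumption is a guess you would have to verify for your own form. The helper name toUtf8 is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): return $input as UTF-8.
// Assumption: anything that is not valid UTF-8 is encoded as ISO-8859-1.
function toUtf8(string $input, string $assumedEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($input, 'UTF-8', $assumedEncoding);
}

// Replacement table; this source file is assumed to be saved as UTF-8.
$trans = ["á" => "a", "é" => "e", "š" => "s"];

// "áéb" encoded as ISO-8859-1: one byte per accented letter.
$latin1Name = "\xE1\xE9b";

// The Latin-1 bytes do not match the UTF-8 keys, so nothing is replaced here.
$unchanged = strtr($latin1Name, $trans);

// After normalizing to UTF-8 the keys match.
echo strtr(toUtf8($latin1Name), $trans), "\n"; // aeb
```

If this turns out to be the cause, it is probably cleaner to make the form page, the script and the database connection all use UTF-8, so that no conversion is needed at all.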
As of PHP >= 5.4.0 (with the intl extension enabled):
$translatedString = transliterator_transliterate('Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove', $string);
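A short usage sketch for the call above, assuming the intl extension is installed (the function does not exist otherwise). It uses only the two transliteration steps and leaves out the trailing remove rule; the sample string is the one from the question.

```php
<?php
// Assumes the intl extension is installed; guard so the script still runs without it.
if (function_exists('transliterator_transliterate')) {
    // 'Any-Latin' converts other scripts to Latin, 'Latin-ASCII' strips the accents.
    $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII', 'prêt-à-porter');
    echo $ascii, "\n"; // pret-a-porter
} else {
    echo "intl extension not available\n";
}
```

The input has to be valid UTF-8 for this to work, so the encoding issue from the question still needs to be sorted out first.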
function correctedText($txt=''){
    // str_split() splits by byte, so this assumes a single-byte encoding such as ISO-8859-1
    $ss = str_split($txt);
    for($i=0; $i<count($ss); $i++){
        $asciiNumber = ord($ss[$i]); // numeric value (0-255) of a single byte
        // $asciiNumber corresponds to the DEC column shown at https://www.ascii-code.com
        // only capital letters are checked
        if($asciiNumber >= 192 && $asciiNumber <= 197)$ss[$i] = 'A';
        elseif($asciiNumber == 198)$ss[$i] = 'AE';
        elseif($asciiNumber == 199)$ss[$i] = 'C';
        elseif($asciiNumber >= 200 && $asciiNumber <= 203)$ss[$i] = 'E';
        elseif($asciiNumber >= 204 && $asciiNumber <= 207)$ss[$i] = 'I';
        elseif($asciiNumber == 209)$ss[$i] = 'N';
        elseif($asciiNumber >= 210 && $asciiNumber <= 214)$ss[$i] = 'O';
        elseif($asciiNumber == 216)$ss[$i] = 'O';
        elseif($asciiNumber >= 217 && $asciiNumber <= 220)$ss[$i] = 'U';
        elseif($asciiNumber == 221)$ss[$i] = 'Y';
    }
    $txt = implode('', $ss);
    return $txt;
}
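Since the function above looks at one byte at a time, it only sees those codes when the input is in a single-byte encoding. A small sketch of the difference, assuming PHP 7.4+ and the mbstring extension (the variable names are just for illustration):

```php
<?php
$utf8 = "\u{00C0}"; // "À" encoded as UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // same letter, single byte

echo strlen($utf8), ' ', bin2hex($utf8), "\n";     // 2 c380
echo strlen($latin1), ' ', bin2hex($latin1), "\n"; // 1 c0
echo ord($latin1), "\n";                           // 192

// mb_str_split() (PHP 7.4+) splits by character instead of by byte,
// so a UTF-8 "À" stays in one piece.
$chars = mb_str_split("À b", 1, 'UTF-8');
echo count($chars), "\n"; // 3
```

So for UTF-8 input you would either convert to ISO-8859-1 before calling a byte-based function, or use one of the string-based replacement approaches instead.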
Simple function. Transforms strings like 'Ábç Éfg' to 'abc_efg'.
/**
 * Replaces accented vowels and ç with their ASCII letters (case-insensitively),
 * then turns every run of other non-alphanumeric characters into one underscore.
 *
 * @param string $str UTF-8 encoded input
 * @return string
 */
function sanitizeString($str) {
    $str = preg_replace('/[áàãâä]/ui', 'a', $str);
    $str = preg_replace('/[éèêë]/ui', 'e', $str);
    $str = preg_replace('/[íìîï]/ui', 'i', $str);
    $str = preg_replace('/[óòõôö]/ui', 'o', $str);
    $str = preg_replace('/[úùûü]/ui', 'u', $str);
    $str = preg_replace('/[ç]/ui', 'c', $str);
    $str = preg_replace('/[^a-z0-9]/i', '_', $str);
    $str = preg_replace('/_+/', '_', $str);
    return $str;
}
Source: Stackoverflow.com