[java] Converting Symbols, Accent Letters to English Alphabet

The problem with "converting" arbitrary Unicode to ASCII is that the meaning of a character is culture-dependent. For example, “ß” to a German-speaking person should be converted to "ss" while an English-speaker would probably convert it to “B”.

Add to that the fact that Unicode has multiple code points for the same glyphs.

The upshot is that the only way to do this is create a massive table with each Unicode character and the ASCII character you want to convert it to. You can take a shortcut by normalizing characters with accents to normalization form KD, but not all characters normalize to ASCII. In addition, Unicode does not define which parts of a glyph are "accents".

Here is a tiny excerpt from an app that does this:

switch (c)
{
    case 'A':
    case '\u00C0':  //  À LATIN CAPITAL LETTER A WITH GRAVE
    case '\u00C1':  //  Á LATIN CAPITAL LETTER A WITH ACUTE
    case '\u00C2':  //  Â LATIN CAPITAL LETTER A WITH CIRCUMFLEX
    // and so on for about 20 lines...
        return "A";
        break;

    case '\u00C6'://  Æ LATIN CAPITAL LIGATURE AE
        return "AE";
        break;

    // And so on for pages...
}

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to unicode

How to resolve TypeError: can only concatenate str (not "int") to str (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape UnicodeEncodeError: 'ascii' codec can't encode character at special name Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) HTML for the Pause symbol in audio and video control Javascript: Unicode string to hex Concrete Javascript Regex for Accented Characters (Diacritics) Replace non-ASCII characters with a single space UTF-8 in Windows 7 CMD NameError: global name 'unicode' is not defined - in Python 3

Examples related to special-characters

How to use tick / checkmark symbol (?) instead of bullets in unordered list? HTML for the Pause symbol in audio and video control How to run mysql command on bash? Which characters need to be escaped when using Bash? Matching special characters and letters in regex jQuery: Check if special characters exists in string Checking if a character is a special character in Java How to display special characters in PHP How should I escape commas and speech marks in CSV files so they work in Excel? grep for special characters in Unix

Examples related to diacritics

Is there a way to get rid of accents and convert a whole string to regular letters? Converting Symbols, Accent Letters to English Alphabet Remove accents/diacritics in a string in JavaScript What is the best way to remove accents (normalize) in a Python unicode string? How do I remove diacritics (accents) from a string in .NET? Microsoft Excel mangles Diacritics in .csv files?