[java] Convert International String to \u Codes in Java

There are three parts to the answer:

  1. Get the Unicode value of each character.
  2. Determine whether it is in one of the Cyrillic blocks.
  3. Convert it to hexadecimal.

To get each character you can iterate through the String using the charAt() or toCharArray() methods.

for( char c : s.toCharArray() )

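The numeric value of each char is its Unicode value. Strictly speaking it is a UTF-16 code unit, but that equals the code point for every character in the Basic Multilingual Plane, which includes all of the Cyrillic ranges below.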
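As a quick way to see those values, here is a minimal sketch that prints each char of a string in U+XXXX notation. The class name CharValues and the method describe are made up for this illustration.

```java
class CharValues {
    // Returns the value of each char (UTF-16 code unit) in s, formatted as U+XXXX.
    static String describe(String s) {
        StringBuilder b = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (b.length() > 0) {
                b.append(' ');
            }
            // Widening the char to int exposes its numeric value.
            b.append(String.format("U+%04X", (int) c));
        }
        return b.toString();
    }

    public static void main(String[] args) {
        // "\u0416" is the Cyrillic capital letter Zhe, written as an escape
        // so the example does not depend on the source file encoding.
        System.out.println(describe("a\u0416")); // prints "U+0061 U+0416"
    }
}
```

Characters outside the Basic Multilingual Plane would show up here as two values (a surrogate pair), which does not matter for the Cyrillic ranges discussed in this answer.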

Cyrillic characters are those in the following Unicode ranges:

Cyrillic:            U+0400–U+04FF ( 1024 -  1279)
Cyrillic Supplement: U+0500–U+052F ( 1280 -  1327)
Cyrillic Extended-A: U+2DE0–U+2DFF (11744 - 11775)
Cyrillic Extended-B: U+A640–U+A69F (42560 - 42655)
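If you would rather not hard-code the numbers, one alternative (not what the code further down does) is to ask the JDK which block a character belongs to via Character.UnicodeBlock. This is only a sketch: the class name CyrillicCheck and the method isCyrillic are invented here, the JDK calls the "Cyrillic Supplement" block CYRILLIC_SUPPLEMENTARY, and the two Extended constants assume Java 7 or later.

```java
class CyrillicCheck {
    // True if c lies in one of the four Cyrillic blocks listed above.
    static boolean isCyrillic(char c) {
        // UnicodeBlock.of returns null for values that belong to no block.
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        return block == Character.UnicodeBlock.CYRILLIC
            || block == Character.UnicodeBlock.CYRILLIC_SUPPLEMENTARY
            || block == Character.UnicodeBlock.CYRILLIC_EXTENDED_A
            || block == Character.UnicodeBlock.CYRILLIC_EXTENDED_B;
    }

    public static void main(String[] args) {
        System.out.println(isCyrillic('\u0416')); // Cyrillic capital letter Zhe
        System.out.println(isCyrillic('a'));      // plain ASCII letter
    }
}
```

The explicit range table has the advantage of working on any Java version and making the boundaries visible, so either approach seems reasonable.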

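If a character falls in one of these ranges, it is Cyrillic, so a simple if check is enough. For a match, append "\\u" followed by the character's value as four hexadecimal digits. Note that Integer.toHexString() does not zero-pad (it returns "416" for U+0416), and a \u escape needs exactly four digits, so String.format() with %04x is the safer choice. Put together, it should look something like this:

// Escapes Cyrillic characters in s as backslash-u sequences and leaves
// everything else untouched. The method name is just illustrative.
static String escapeCyrillic( String s ){
    // Inclusive [first, last] bounds of the four Cyrillic ranges listed above.
    final int[][] ranges = new int[][]{ 
            {  1024,  1279 }, 
            {  1280,  1327 }, 
            { 11744, 11775 }, 
            { 42560, 42655 },
        };
    StringBuilder b = new StringBuilder();

    for( char c : s.toCharArray() ){
        boolean insideRange = false;
        for( int[] range : ranges ){
            if( range[0] <= c && c <= range[1] ){
                insideRange = true;
                break;
            }
        }

        if( insideRange ){
            // %04x zero-pads to the four hex digits a backslash-u escape requires.
            b.append( String.format( "\\u%04x", (int) c ) );
        }else{
            b.append( c );
        }
    }

    return b.toString();
}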

Edit: it is probably better to make the check c < 128 and swap the if and else bodies, so that everything that isn't ASCII gets escaped. I was probably too literal in my reading of your question.
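A minimal sketch of that variant, assuming you want every non-ASCII char escaped and ASCII passed through unchanged. The names AsciiEscaper and escapeNonAscii are made up for this example.

```java
class AsciiEscaper {
    // Escapes every char outside 7-bit ASCII as a backslash, 'u', and four hex digits.
    static String escapeNonAscii(String s) {
        StringBuilder b = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                // Plain ASCII is copied through as-is.
                b.append(c);
            } else {
                // %04x zero-pads the value to four hex digits.
                b.append(String.format("\\u%04x", (int) c));
            }
        }
        return b.toString();
    }

    public static void main(String[] args) {
        // Input is 'a', Cyrillic capital Zhe (U+0416), 'b'.
        // Output is 'a', then a backslash followed by u0416, then 'b'.
        System.out.println(escapeNonAscii("a\u0416b"));
    }
}
```

Because the loop works char by char, a character outside the Basic Multilingual Plane comes out as two consecutive escapes (its surrogate pair), which is also how Java source code represents such characters.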
