There are three parts to the answer
To get each character you can iterate through the String using the charAt()
or toCharArray()
methods.
for( char c : s.toCharArray() )
The value of the char is the Unicode value.
The Cyrillic Unicode characters are any character in the following ranges:
Cyrillic: U+0400–U+04FF ( 1024 - 1279)
Cyrillic Supplement: U+0500–U+052F ( 1280 - 1327)
Cyrillic Extended-A: U+2DE0–U+2DFF (11744 - 11775)
Cyrillic Extended-B: U+A640–U+A69F (42560 - 42655)
If it is in this range it is Cyrillic. Just perform an if check. If it is in the range use Integer.toHexString()
and prepend the "\\u"
. Put together it should look something like this:
final int[][] ranges = new int[][]{
{ 1024, 1279 },
{ 1280, 1327 },
{ 11744, 11775 },
{ 42560, 42655 },
};
StringBuilder b = new StringBuilder();
for( char c : s.toCharArray() ){
int[] insideRange = null;
for( int[] range : ranges ){
if( range[0] <= c && c <= range[1] ){
insideRange = range;
break;
}
}
if( insideRange != null ){
b.append( "\\u" ).append( Integer.toHexString(c) );
}else{
b.append( c );
}
}
return b.toString();
Edit: probably should make the check c < 128
and reverse the if
and the else
bodies; you probably should escape everything that isn't ASCII. I was probably too literal in my reading of your question.