Convert International String to \u Codes in Java

Question

How can I convert an international (e.g. Russian) String to \u numbers (Unicode numbers), e.g. \u041e\u041a for OK written in Cyrillic letters?

User · Answer

There is an open source Java library, MgntUtils, that has a utility that converts Strings to a Unicode sequence and vice versa:

String result = "Hello World";
result = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(result);
System.out.println(result);
result = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString(result);
System.out.println(result);

The output of this code is:

\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064
Hello World

The library can be found at Maven Central or on GitHub. It comes as a Maven artifact, with sources and javadoc.

Here is the javadoc for the class StringUnicodeEncoderDecoder.
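For readers who cannot add a dependency, here is a minimal JDK-only sketch that produces the same kind of output as the example above (every char, ASCII included, written as a lowercase four-digit escape) and reverses it. This is not the MgntUtils implementation; the class and method names are hypothetical, and the decoder assumes its input consists only of such escapes.

```java
final class UnicodeSequenceSketch {

    // Encode every UTF-16 char (ASCII included) as a zero-padded, lowercase
    // escape: backslash, 'u', then four hex digits.
    static String encode(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 6);
        for (char c : text.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    // Decode a string made only of such six-char escapes back into text.
    static String decode(String escaped) {
        if (escaped.length() % 6 != 0) {
            throw new IllegalArgumentException("expected a multiple of six chars");
        }
        StringBuilder sb = new StringBuilder(escaped.length() / 6);
        for (int i = 0; i < escaped.length(); i += 6) {
            if (escaped.charAt(i) != '\\' || escaped.charAt(i + 1) != 'u') {
                throw new IllegalArgumentException("bad escape at index " + i);
            }
            // The four hex digits after the prefix are the char's UTF-16 value
            sb.append((char) Integer.parseInt(escaped.substring(i + 2, i + 6), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("Hello World");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

Running main should print the same two lines as the MgntUtils example above.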

User · Answer

I also had this problem. I had some Portuguese text with some special characters, but these characters were already in Unicode escape format (e.g. \u00e3). So I wanted to convert S\u00e3o to São.

I did it using the Apache Commons StringEscapeUtils, as sorin-sbarnea said (it can be downloaded from the Apache Commons site). Use the method unescapeJava, like this:

String text = "S\\u00e3o"; // literal backslash-u, as it would be read from a file
text = StringEscapeUtils.unescapeJava(text);
System.out.println(text); // São

There is also the method escapeJava, which does the reverse: it puts the \u escapes into the string.

If anyone knows a pure Java solution, please tell us.
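As for a pure Java solution: a minimal JDK-only sketch that replaces every \uXXXX escape embedded in otherwise ordinary text, using java.util.regex. The class and method names are hypothetical. Unlike unescapeJava, it handles only \u escapes (not \n, \t, etc.) and does not treat a doubled backslash specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnescapeSketch {

    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each matched escape with the char it denotes;
    // all other characters are copied unchanged.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder sb = new StringBuilder(text.length());
        int last = 0;
        while (m.find()) {
            sb.append(text, last, m.start());
            sb.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        sb.append(text, last, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        // Literal backslash-u in the data, as in the Portuguese example
        String text = "S\\u00e3o";
        System.out.println(unescape(text));
    }
}
```

With the input above, unescape returns "São".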

User · Answer

You could use escapeJavaStyleString from org.apache.commons.lang.StringEscapeUtils. (In Commons Lang 2.x that method is private; the public method that calls it is escapeJava.)

User · Answer

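There's a command-line tool that ships with the JDK called native2ascii. It converts Unicode files to ASCII-escaped files. I've found that this is a necessary step for generating .properties files for localization. (The tool was removed in JDK 9, where PropertyResourceBundle reads .properties files as UTF-8 by default.)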

User · Answer

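Just some basic methods for that, inspired by the native2ascii tool:

/**
 * Encode a String like äöü to \u00e4\u00f6\u00fc
 *
 * @param text the text to encode, may be null
 * @return the encoded text
 */
public String native2ascii(String text) {
    if (text == null)
        return text;
    StringBuilder sb = new StringBuilder();
    for (char ch : text.toCharArray()) {
        sb.append(native2ascii(ch));
    }
    return sb.toString();
}

/**
 * Encode a Character like ä to \u00e4
 *
 * @param ch the character to encode
 * @return the escape for non-ASCII chars, otherwise the char itself
 */
public String native2ascii(char ch) {
    if (ch > '\u007f') {
        StringBuilder sb = new StringBuilder();
        // write backslash, 'u' and four hex digits
        sb.append("\\u");
        StringBuffer hex = new StringBuffer(Integer.toHexString(ch));
        // reverse, pad with zeros, then read back in reverse: left-pads to four digits
        hex.reverse();
        int length = 4 - hex.length();
        for (int j = 0; j < length; j++) {
            hex.append('0');
        }
        for (int j = 0; j < 4; j++) {
            sb.append(hex.charAt(3 - j));
        }
        return sb.toString();
    } else {
        return Character.toString(ch);
    }
}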

User · Answer

This kind of conversion is called Unicode decode / unescape; this site has an online converter.

User · Answer

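In case you need this to write a .properties file, you can just add the Strings to a Properties object and then save it to a file with store(OutputStream, String). It will take care of the conversion. (The store(Writer, String) overload writes non-ASCII characters as-is, so use the OutputStream one.)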
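A minimal sketch of that approach, writing to an in-memory stream instead of a file so the escaped text can be inspected; the class and method names are hypothetical. Properties.store(OutputStream, String) writes ISO 8859-1 and renders key/value characters above \u007E as \uXXXX (with uppercase hex digits).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class PropertiesEscapeSketch {

    // Stores one key/value pair and returns the text that store() produced.
    static String storeToString(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // null comment: only the timestamp comment line is written
            props.store(out, null);
        } catch (IOException e) {
            // not expected for an in-memory stream
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The value is the Cyrillic word, written here with Java source escapes
        System.out.print(storeToString("ok", "\u041e\u041a"));
    }
}
```

After the timestamp comment, the output contains the line ok=\u041E\u041A. To write a real file, pass a FileOutputStream instead of the ByteArrayOutputStream.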

User · Answer

There are three parts to the answer:

1. Get the Unicode value of each character.
2. Determine whether it is in a Cyrillic block.
3. Convert it to hexadecimal.

To get each character, you can iterate through the String using the charAt() or toCharArray() methods:

for (char c : s.toCharArray())

The value of the char is its UTF-16 code unit, which equals the Unicode code point for every character in the Basic Multilingual Plane (including all the Cyrillic ranges below).

The Cyrillic Unicode characters are any characters in the following ranges:

Cyrillic             (U+0400 - U+04FF)   1024 -  1279
Cyrillic Supplement  (U+0500 - U+052F)   1280 -  1327
Cyrillic Extended-A  (U+2DE0 - U+2DFF)  11744 - 11775
Cyrillic Extended-B  (U+A640 - U+A69F)  42560 - 42655

If a character is in one of these ranges, it is Cyrillic; just perform an if check. If it is in range, format it as four hex digits and prepend "\u". (Integer.toHexString() alone is not enough: it returns "41e" for U+041E, so the value has to be zero-padded.) Put together, it should look something like this:

final int[][] ranges = new int[][] {
        { 1024, 1279 },
        { 1280, 1327 },
        { 11744, 11775 },
        { 42560, 42655 },
};
StringBuilder b = new StringBuilder();

for (char c : s.toCharArray()) {
    int[] insideRange = null;
    for (int[] range : ranges) {
        if (range[0] <= c && c <= range[1]) {
            insideRange = range;
            break;
        }
    }

    if (insideRange != null) {
        // zero-pad to four hex digits
        b.append("\\u").append(String.format("%04x", (int) c));
    } else {
        b.append(c);
    }
}

return b.toString();

Edit: I probably should make the check c < 128 and reverse the if and else bodies; you probably should escape everything that isn't ASCII. I was probably too literal in my reading of your question.
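The JDK already knows which Unicode block each char belongs to, so the hand-written range table above can be replaced by Character.UnicodeBlock lookups. A minimal sketch under that assumption (class and method names are hypothetical); it checks exactly the four blocks listed above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class CyrillicEscapeSketch {

    // The four blocks from the table above, as JDK constants.
    private static final Set<Character.UnicodeBlock> CYRILLIC_BLOCKS = new HashSet<>(Arrays.asList(
            Character.UnicodeBlock.CYRILLIC,               // U+0400..U+04FF
            Character.UnicodeBlock.CYRILLIC_SUPPLEMENTARY, // U+0500..U+052F
            Character.UnicodeBlock.CYRILLIC_EXTENDED_A,    // U+2DE0..U+2DFF
            Character.UnicodeBlock.CYRILLIC_EXTENDED_B));  // U+A640..U+A69F

    // Escapes Cyrillic chars as zero-padded lowercase hex escapes; keeps everything else.
    static String escapeCyrillic(String s) {
        StringBuilder b = new StringBuilder();
        for (char c : s.toCharArray()) {
            // UnicodeBlock.of returns null for unassigned chars; contains(null) is simply false
            if (CYRILLIC_BLOCKS.contains(Character.UnicodeBlock.of(c))) {
                b.append(String.format("\\u%04x", (int) c));
            } else {
                b.append(c);
            }
        }
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeCyrillic("OK = \u041e\u041a"));
    }
}
```

The main method should print OK = \u041e\u041a, with the Latin letters left alone.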

User · Answer

You could probably hack it from this JavaScript code:

// convert 🙌 to \uD83D\uDE4C
function text_to_unicode(string) {
  'use strict';

  function is_whitespace(c) { return 9 === c || 10 === c || 13 === c || 32 === c; }
  function left_pad(string) { return Array(4).concat(string).join('0').slice(-1 * Math.max(4, string.length)); }

  string = string.split('').map(function (c) {
    return "\\u" + left_pad(c.charCodeAt(0).toString(16).toUpperCase());
  }).join('');

  return string;
}

// convert \uD83D\uDE4C to 🙌
function unicode_to_text(string) {
  var prefix = "\\\\u",
      regex = new RegExp(prefix + '([\\da-f]{4})', "ig");

  string = string.replace(regex, function (match, backtrace1) {
    return String.fromCharCode(parseInt(backtrace1, 16));
  });

  return string;
}

Source: iCompile - Yet Another JavaScript Unicode Encode/Decode

User · Answer

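Apache Commons StringEscapeUtils.escapeEcmaScript(String) returns a string with Unicode characters escaped using the \u notation:

"Art of Beer 🎨 🍺" -> "Art of Beer \u1F3A8 \u1F37A"

Note that \u1F3A8 is not a valid Java or JavaScript escape: a \u escape takes exactly four hex digits, so it would be read as \u1F3A followed by 8. A correct \u rendering of these characters uses their UTF-16 surrogate pairs: "Art of Beer \uD83C\uDFA8 \uD83C\uDF7A".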
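escapeEcmaScript needs Commons Lang on the classpath. Below is a JDK-only sketch of the same idea (not the Commons implementation; the class and method names are hypothetical). Because it iterates over chars, i.e. UTF-16 code units, a character above U+FFFF such as an emoji automatically comes out as its two surrogate escapes.

```java
final class SurrogateEscapeSketch {

    // Escapes every char outside printable ASCII (0x20..0x7E) as an uppercase,
    // four-digit escape. Supplementary characters are stored as two chars
    // (high and low surrogate), so each half gets its own escape.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F3A8 (artist palette) built from its code point, keeping this source ASCII-only
        String palette = new String(Character.toChars(0x1F3A8));
        System.out.println(escapeNonAscii("Art of Beer " + palette));
    }
}
```

The main method should print Art of Beer \uD83C\uDFA8.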

User · Answer

There is a JDK tool that is executed via the command line as follows:

native2ascii -encoding utf8 src.txt output.txt

Example:

src.txt
بسم الله الرحمن الرحيم

output.txt
\u0628\u0633\u0645 \u0627\u0644\u0644\u0647 \u0627\u0644\u0631\u062d\u0645\u0646 \u0627\u0644\u0631\u062d\u064a\u0645

If you want to use it in your Java application, you can wrap this command line:

String pathSrc = "./tmp/src.txt";
String pathOut = "./tmp/output.txt";
String cmdLine = "native2ascii -encoding utf8 " + new File(pathSrc).getAbsolutePath() + " " + new File(pathOut).getAbsolutePath();
Process process = Runtime.getRuntime().exec(cmdLine);
process.waitFor(); // wait until native2ascii has finished writing the output file
System.out.println("THE END");

Then read the content of the new file.
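Starting an external process is not strictly required; the same file-to-file conversion can be done in-process, which also works on JDKs that do not ship native2ascii. A minimal sketch, assuming the input file is UTF-8 and that lowercase escapes (as in output.txt above) are wanted; the class name and paths are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class Native2AsciiSketch {

    // Reads a UTF-8 file, replaces every non-ASCII char with a lowercase
    // four-digit escape and writes the result as pure ASCII.
    static void convert(Path src, Path out) throws IOException {
        String text = new String(Files.readAllBytes(src), StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c > 0x7F) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        Files.write(out, sb.toString().getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths mirroring the command-line example
        convert(Paths.get("tmp/src.txt"), Paths.get("tmp/output.txt"));
    }
}
```

Because convert returns only after the file has been written, the output can be read immediately afterwards.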

User · Answer

Here's an improved version of ArtB's answer:

StringBuilder b = new StringBuilder();

for (char c : input.toCharArray()) {
    if (c >= 128)
        b.append("\\u").append(String.format("%04X", (int) c));
    else
        b.append(c);
}

return b.toString();

This version escapes all non-ASCII chars and works correctly for low Unicode code points (below U+1000), which String.format zero-pads to four hex digits while Integer.toHexString() would not.
