Java String encoding (UTF-8)

Question

I have come across this line of legacy code, which I am trying to figure out:

    String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");

As far as I can understand, it is encoding and decoding using the same charset. How is this different from the following?

    String newString = oldString;

Is there any scenario in which the two lines will have different outputs?

P.S. Just to clarify: yes, I am aware of the excellent article on encoding by Joel Spolsky.

User · Accepted Answer

This could be a complicated way of doing

    String newString = new String(oldString);

This shortens the String if the underlying char[] used is much longer.

More specifically, however, the round trip replaces every character that cannot be encoded in UTF-8. There are some "characters" you can have in a String which cannot be encoded, and these are turned into ?. Any unpaired surrogate, that is, a char between \uD800 and \uDFFF that is not part of a valid surrogate pair, cannot be encoded and will be turned into '?'.

    String oldString = "\uD800";
    String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
    System.out.println(newString.equals(oldString)); // prints false
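To see where the round trip is lossless and where it is not, here is a minimal sketch. The class name RoundTripDemo and the helper roundTrip are made up for illustration, and it uses StandardCharsets.UTF_8 instead of the "UTF-8" string from the legacy line so that no checked UnsupportedEncodingException has to be handled.

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Hypothetical helper: encode to UTF-8 bytes and decode them again,
    // mirroring the legacy line from the question.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String plain = "h\u00E9llo";   // ordinary text with a non-ASCII letter
        String pair = "\uD83D\uDE00";  // valid surrogate pair (U+1F600)
        String lone = "\uD800";        // unpaired high surrogate

        System.out.println(roundTrip(plain).equals(plain)); // true
        System.out.println(roundTrip(pair).equals(pair));   // true
        System.out.println(roundTrip(lone).equals(lone));   // false
        System.out.println(roundTrip(lone));                // ?
    }
}
```

Only the unpaired surrogate is changed. A surrogate pair that forms a valid code point survives the round trip intact.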

User · Answer

> How is this different from the following?

This line of code here:

    String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");

constructs a new String object (i.e. a copy of oldString), while this line of code:

    String newString = oldString;

declares a new variable of type java.lang.String and initializes it to refer to the same String object as the variable oldString.

> Is there any scenario in which the two lines will have different outputs?

Absolutely:

    String newString = oldString;
    boolean isSameInstance = newString == oldString; // isSameInstance == true

vs.

    String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
    // isSameInstance == false (in most cases)
    boolean isSameInstance = newString == oldString;

a_horse_with_no_name (see comment) is right, of course. The equivalent of

    String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");

is

    String newString = new String(oldString);

minus the subtle difference with respect to the encoding that Peter Lawrey explains in his answer.
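The identity-versus-content distinction can be checked with a small runnable sketch. The names IdentityDemo and copyViaBytes are hypothetical, and StandardCharsets.UTF_8 stands in for the "UTF-8" string only to avoid the checked exception.

```java
import java.nio.charset.StandardCharsets;

class IdentityDemo {
    // Hypothetical helper: builds a new String from the UTF-8 bytes of s.
    static String copyViaBytes(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String oldString = "hello";
        String alias = oldString;               // second reference to the same object
        String copy = copyViaBytes(oldString);  // distinct object, same characters

        System.out.println(alias == oldString);     // true
        System.out.println(copy == oldString);      // false
        System.out.println(copy.equals(oldString)); // true
    }
}
```

For text that is encodable in UTF-8, the two lines therefore differ only under ==; equals() still reports the same content.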
