[java] How to convert Strings to and from UTF8 byte arrays in Java

In Java, I have a String and I want to encode it as a byte array (in UTF8, or some other encoding). Alternately, I have a byte array (in some known encoding) and I want to convert it into a Java String. How do I do these conversions?

This question is related to java string encoding character-encoding

The answer is


Convert from String to byte[]:

String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);

Convert from byte[] to String:

byte[] b = {(byte) 99, (byte)97, (byte)116};
String s = new String(b, StandardCharsets.US_ASCII);

You should, of course, use the correct encoding name. My examples used US-ASCII and UTF-8, the two most common encodings.
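As a quick check of the two conversions above, here is a minimal round-trip sketch. The class and method names (Utf8RoundTrip, encode, decode) are made up for illustration; only String.getBytes(Charset), new String(byte[], Charset) and StandardCharsets come from the JDK. The non-ASCII character is written as a Unicode escape so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8RoundTrip {
    // Encode a String as UTF-8 bytes.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";             // "cafe" with an acute accent on the e
        byte[] utf8 = encode(original);
        System.out.println(Arrays.toString(utf8)); // [99, 97, 102, -61, -87]
        System.out.println(utf8.length);           // 5 bytes for 4 characters
        System.out.println(decode(utf8).equals(original)); // true
    }
}
```

Note that the byte count and the character count differ as soon as the text leaves the ASCII range.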


Here's a solution that avoids performing the Charset lookup for every conversion:

import java.nio.charset.Charset;

private final Charset UTF8_CHARSET = Charset.forName("UTF-8");

String decodeUTF8(byte[] bytes) {
    return new String(bytes, UTF8_CHARSET);
}

byte[] encodeUTF8(String string) {
    return string.getBytes(UTF8_CHARSET);
}

String original = "hello world";
byte[] utf8Bytes = original.getBytes("UTF-8");

You can convert directly via the String(byte[], String) constructor and getBytes(String) method. Java exposes available character sets via the Charset class. The JDK documentation lists supported encodings.
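If the encoding name only arrives at run time (from a config file, say), one option is to check it with Charset.isSupported before converting. The sketch below assumes that situation; CharsetLookup and encode are hypothetical names, and "x-no-such-charset" is just a made-up name that no JDK is expected to provide.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class CharsetLookup {
    // Encode text with a charset given by name, failing with a clear message
    // if this JVM does not provide that charset.
    static byte[] encode(String text, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName);
        }
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("hi", "UTF-16BE")));  // [0, 104, 0, 105]
        System.out.println(Charset.isSupported("x-no-such-charset"));  // false
    }
}
```

Going through a Charset object also avoids the checked UnsupportedEncodingException that getBytes(String) declares.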

Most of the time, such conversions are performed on streams, so you would use the Reader/Writer classes. Do not incrementally decode arbitrary byte streams with the String methods: a multibyte character can be split across two chunks, and each fragment then decodes incorrectly.
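To illustrate the multibyte problem, here is a small sketch that decodes the same bytes twice: once through an InputStreamReader and once chunk by chunk with the String constructor. StreamDecoding, readAll and decodeChunks are hypothetical names, and the one-byte chunk size is chosen deliberately to force a split in the middle of a character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamDecoding {
    // Decode a whole UTF-8 stream through a Reader. The Reader keeps incomplete
    // multibyte sequences buffered until the remaining bytes arrive.
    static String readAll(InputStream in) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // The error-prone approach: decode every fixed-size chunk on its own.
    static String decodeChunks(byte[] bytes, int chunkSize) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bytes.length; i += chunkSize) {
            int len = Math.min(chunkSize, bytes.length - i);
            out.append(new String(bytes, i, len, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8); // two bytes: C3 A9
        System.out.println(readAll(new ByteArrayInputStream(utf8)).equals("\u00e9")); // true
        System.out.println(decodeChunks(utf8, 1).equals("\u00e9"));                    // false
    }
}
```

In the chunked version each half of the two-byte sequence is malformed on its own, so the decoder substitutes U+FFFD for it.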


My Tomcat 7 installation was decoding request strings as ISO-8859-1, regardless of the content type of the HTTP request. The following solution worked for me when trying to correctly interpret characters like 'é'.

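byte[] b1 = szP1.getBytes("ISO-8859-1");
// Arrays.toString prints the byte values; b1.toString() would only print the array's identity.
System.out.println(java.util.Arrays.toString(b1));

String szUT8 = new String(b1, "UTF-8");
System.out.println(szUT8);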

When trying to interpret the string as US-ASCII, the byte info wasn't correctly interpreted.

b1 = szP1.getBytes("US-ASCII");
// Print the byte values rather than the array's identity.
System.out.println(java.util.Arrays.toString(b1));
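To see why re-encoding as ISO-8859-1 recovers the text while US-ASCII does not, here is a self-contained sketch. It simulates the wrong decoding instead of involving Tomcat, and the names MojibakeRepair and repair are hypothetical. The repair works because ISO-8859-1 maps every byte value to exactly one char and back, so the original bytes survive the wrong decoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MojibakeRepair {
    // Recover text whose UTF-8 bytes were wrongly decoded as ISO-8859-1:
    // get the original bytes back, then decode them as UTF-8.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wire = "\u00e9".getBytes(StandardCharsets.UTF_8);          // C3 A9
        String misdecoded = new String(wire, StandardCharsets.ISO_8859_1); // two chars: U+00C3, U+00A9
        System.out.println(misdecoded.length());                           // 2
        System.out.println(repair(misdecoded).equals("\u00e9"));           // true

        // US-ASCII cannot represent either char, so both become '?' (63)
        // and the original bytes are lost.
        byte[] ascii = misdecoded.getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(ascii));                        // [63, 63]
    }
}
```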

As an alternative, StringUtils from Apache Commons Codec (org.apache.commons.codec.binary.StringUtils) can be used.

 byte[] bytes = {(byte) 1};
 String convertedString = StringUtils.newStringUtf8(bytes);

or

 String myString = "example";
 byte[] convertedBytes = StringUtils.getBytesUtf8(myString);

If you have a non-standard charset, you can use getBytesUnchecked() or newString() accordingly.


I can't comment, and I don't want to start a new thread, but this isn't working for me. A simple round trip:

byte[] b = new byte[]{ 0, 0, 0, -127 };  // 0x00000081
String s = new String(b, StandardCharsets.UTF_8); // chars: 0x0000, 0x0000, 0x0000, 0xfffd
b = s.getBytes(StandardCharsets.UTF_8); // [0, 0, 0, -17, -65, -67] 0x000000efbfbd != 0x00000081

I need b[] to be the same array before and after encoding, which it isn't (this refers to the first answer).
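The round trip fails because the byte 0x81 on its own is not valid UTF-8, so the decoder replaces it with U+FFFD and the original value is gone. If the goal is to carry arbitrary bytes inside a String, two options that do round-trip are ISO-8859-1 (every byte maps to one char) and Base64. The sketch below compares them; BinaryRoundTrip and its method names are hypothetical, and java.util.Base64 requires Java 8 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BinaryRoundTrip {
    // Decode bytes to a String and encode them again with the same charset.
    static byte[] viaCharset(byte[] in, Charset cs) {
        return new String(in, cs).getBytes(cs);
    }

    // Base64 turns arbitrary bytes into plain ASCII text and back.
    static String toBase64(byte[] in) {
        return Base64.getEncoder().encodeToString(in);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] b = { 0, 0, 0, -127 };
        System.out.println(Arrays.equals(viaCharset(b, StandardCharsets.UTF_8), b));      // false
        System.out.println(Arrays.equals(viaCharset(b, StandardCharsets.ISO_8859_1), b)); // true
        System.out.println(toBase64(b));                                                  // AAAAgQ==
        System.out.println(Arrays.equals(fromBase64(toBase64(b)), b));                    // true
    }
}
```

Base64 is usually the safer choice when the String has to pass through systems that may re-encode text.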


For decoding a series of bytes to a normal string message I finally got it working with UTF-8 encoding with this code:

/* Convert a list of UTF-8 byte values (given as decimal strings) to a normal String.
 * Useful for decoding a JMS message that is delivered as a sequence of bytes instead of plain text.
 */
public String convertUtf8NumbersToString(String[] numbers){
    int length = numbers.length;
    byte[] data = new byte[length];

    for(int i = 0; i< length; i++){
        data[i] = Byte.parseByte(numbers[i]);
    }
    return new String(data, Charset.forName("UTF-8"));
}

If you are using 7-bit ASCII or ISO-8859-1 (an amazingly common format), you don't have to create a new java.lang.String at all. It is much faster to simply cast each byte to a char:

Full working example:

for (byte b : new byte[] { 43, 45, (byte) 215, (byte) 247 }) {
    // Mask with 0xFF first: bytes are signed, so a plain cast would turn 215 into U+FFD7.
    char c = (char) (b & 0xFF);
    System.out.print(c);
}

If you are not using extended characters like Ä, Æ, Å, Ç, Ï, Ê and can be sure that only the first 128 Unicode characters are transmitted, then this code will also work for UTF-8 and extended ASCII (such as windows-1252).
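One detail worth checking with the cast approach: Java bytes are signed, so values of 128 and above are negative and a plain (char) cast sign-extends them. The sketch below masks each byte with 0xFF and compares the result with the JDK's own ISO-8859-1 decoder. Latin1Cast and decodeLatin1 are hypothetical names, and a String is built here only to make the comparison easy.

```java
import java.nio.charset.StandardCharsets;

class Latin1Cast {
    // Map each byte to the char with the same unsigned value (0..255),
    // which is what ISO-8859-1 decoding does.
    static String decodeLatin1(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 0xFF);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        byte[] data = { 43, 45, (byte) 215, (byte) 247 };
        String viaCast = decodeLatin1(data);
        String viaCharset = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(viaCast.equals(viaCharset)); // true

        // Without the mask, the negative byte is sign-extended.
        System.out.println((int) (char) (byte) 215);    // 65495, i.e. U+FFD7
    }
}
```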


Reader reader = new BufferedReader(
    new InputStreamReader(
        new ByteArrayInputStream(
            string.getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8));

Charset UTF8_CHARSET = Charset.forName("UTF-8");
String strISO = "{\"name\":\"?\"}";
System.out.println(strISO);
byte[] b = strISO.getBytes();
for (byte c: b) {
    System.out.print("[" + c + "]");
}
String str = new String(b, UTF8_CHARSET);
System.out.println(str);

// query is your JSON string

 DefaultHttpClient httpClient = new DefaultHttpClient();
 HttpPost postRequest = new HttpPost("http://my.site/test/v1/product/search?qy=");

 StringEntity input = new StringEntity(query, "UTF-8");
 input.setContentType("application/json");
 postRequest.setEntity(input);   
 HttpResponse response = httpClient.execute(postRequest);

Terribly late, but I just encountered this issue and this is my fix:

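private static String removeNonUtf8CompliantCharacters( final String inString ) {
    if (null == inString ) return null;
    // Use an explicit charset so the result does not depend on the platform default.
    byte[] byteArr = inString.getBytes( java.nio.charset.StandardCharsets.UTF_8 );
    for ( int i=0; i < byteArr.length; i++ ) {
        // Java bytes are signed; mask to 0..255 so the range check below behaves as written.
        int ch = byteArr[i] & 0xFF;
        // replace byte values above 252 as well as all control characters
        // except tabs and new lines with a space
        if ( !( (ch > 31 && ch < 253 ) || ch == '\t' || ch == '\n' || ch == '\r') ) {
            byteArr[i]=' ';
        }
    }
    return new String( byteArr, java.nio.charset.StandardCharsets.UTF_8 );
}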
