Bytes of a string in Java

Question

In Java, if I have a String x, how can I calculate the number of bytes in that string?

User · Answer

If you're running with 64-bit references:

    sizeof(string) =
        8 +                        // object header used by the VM
        8 +                        // 64-bit reference to char array (value)
        8 + string.length() * 2 +  // character array itself (object header + 16-bit chars)
        4 +                        // offset integer
        4 +                        // count integer
        4                          // cached hash code

In other words:

    sizeof(string) = 36 + string.length() * 2

On a 32-bit VM or a 64-bit VM with compressed OOPs (-XX:+UseCompressedOops), the references are 4 bytes. So the total would be:

    sizeof(string) = 32 + string.length() * 2

This does not take into account the references to the string object.
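As a rough illustration, the arithmetic above can be wrapped in a small helper. This is only a sketch of the answer's formula, which assumes a String layout with offset and count fields; the class and method names are made up for this example, and the result is an estimate, not a measurement of what a given JVM actually allocates.

```java
// Hypothetical helper that mirrors the estimate in the answer above.
// It assumes the String layout described there (offset and count fields
// present) and does not measure real allocations.
final class StringSizeEstimate {

    // Fixed overhead with 8-byte references: 8 + 8 + 8 + 4 + 4 + 4 = 36.
    static final int OVERHEAD_64_BIT = 36;

    // Fixed overhead with 4-byte references, as stated in the answer.
    static final int OVERHEAD_COMPRESSED = 32;

    static long estimateBytes(String s, boolean compressedReferences) {
        int overhead = compressedReferences ? OVERHEAD_COMPRESSED : OVERHEAD_64_BIT;
        // Two bytes per 16-bit char, plus the fixed overhead.
        return overhead + 2L * s.length();
    }

    public static void main(String[] args) {
        String x = "Hello World";
        System.out.println(estimateBytes(x, false)); // 36 + 2 * 11
        System.out.println(estimateBytes(x, true));  // 32 + 2 * 11
    }
}
```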

User · Answer

A String instance allocates a certain amount of bytes in memory. Maybe you're looking at something like sizeof("Hello World"), which would return the number of bytes allocated by the data structure itself?

In Java, there's usually no need for a sizeof function, because we never allocate memory to store a data structure. We can have a look at the String.java file for a rough estimation, and we see some int fields, some references and a char[]. The Java language specification defines that a char ranges from 0 to 65535, so two bytes are sufficient to keep a single char in memory. But a JVM does not have to store one char in 2 bytes; it only has to guarantee that the implementation of char can hold values of the defined range.

So sizeof really does not make any sense in Java. But, assuming that we have a large String and one char allocates two bytes, then the memory footprint of a String object is at least 2 * str.length() in bytes.
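The char range and the two-byte figure mentioned above can be checked against constants in java.lang.Character. The sketch below assumes Java 8 or later (for Character.BYTES); the class and method names are hypothetical, and the estimate follows the answer's assumption of two bytes per char, which, as the answer itself points out, a JVM is not obliged to honour internally.

```java
// Hypothetical illustration of the "2 * str.length()" estimate.
final class CharRange {

    // Estimate for the character data only, assuming two bytes per char.
    static long estimateCharBytes(String str) {
        return (long) Character.BYTES * str.length();
    }

    public static void main(String[] args) {
        // Character.MAX_VALUE is '\uFFFF', the top of the char range.
        System.out.println((int) Character.MAX_VALUE);
        // Character.BYTES is the number of bytes needed to hold a char value.
        System.out.println(Character.BYTES);
        System.out.println(estimateCharBytes("Hello World"));
    }
}
```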

User · Answer

A string is a list of characters (i.e. code points). The number of bytes taken to represent the string depends entirely on which encoding you use to turn it into bytes.

That said, you can turn the string into a byte array and then look at its size as follows:

    // The input string for this test
    final String string = "Hello World";

    // Check length, in characters
    System.out.println(string.length()); // prints "11"

    // Check encoded sizes
    final byte[] utf8Bytes = string.getBytes("UTF-8");
    System.out.println(utf8Bytes.length); // prints "11"

    final byte[] utf16Bytes = string.getBytes("UTF-16");
    System.out.println(utf16Bytes.length); // prints "24"

    final byte[] utf32Bytes = string.getBytes("UTF-32");
    System.out.println(utf32Bytes.length); // prints "44"

    final byte[] isoBytes = string.getBytes("ISO-8859-1");
    System.out.println(isoBytes.length); // prints "11"

    final byte[] winBytes = string.getBytes("CP1252");
    System.out.println(winBytes.length); // prints "11"

So you see, even a simple "ASCII" string can have a different number of bytes in its representation, depending on which encoding is used. Use whichever character set you're interested in for your case as the argument to getBytes(). And don't fall into the trap of assuming that UTF-8 represents every character as a single byte, as that's not true either:

    final String interesting = "\uF93D\uF936\uF949\uF942"; // Chinese ideograms

    // Check length, in characters
    System.out.println(interesting.length()); // prints "4"

    // Check encoded sizes
    final byte[] utf8Bytes = interesting.getBytes("UTF-8");
    System.out.println(utf8Bytes.length); // prints "12"

    final byte[] utf16Bytes = interesting.getBytes("UTF-16");
    System.out.println(utf16Bytes.length); // prints "10"

    final byte[] utf32Bytes = interesting.getBytes("UTF-32");
    System.out.println(utf32Bytes.length); // prints "16"

    final byte[] isoBytes = interesting.getBytes("ISO-8859-1");
    System.out.println(isoBytes.length); // prints "4" (probably encoded "????")

    final byte[] winBytes = interesting.getBytes("CP1252");
    System.out.println(winBytes.length); // prints "4" (probably encoded "????")

Note that if you don't provide a character set argument, the platform's default character set is used. This might be useful in some contexts, but in general you should avoid depending on defaults, and always use an explicit character set when encoding/decoding is required.
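To make the point about defaults concrete, here is a minimal sketch that contrasts an explicit charset with the no-argument getBytes(). The encodedLength helper and the class name are invented for this example; it uses the Charset overload of getBytes, which does not throw a checked exception.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: byte count of a string in a caller-chosen encoding.
final class EncodedLength {

    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "Hello World";

        // Explicit charsets give the same answer on every machine.
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));
        System.out.println(encodedLength(s, StandardCharsets.UTF_16));

        // The no-argument form depends on the platform default charset,
        // so this figure can differ between machines for non-ASCII text.
        System.out.println(Charset.defaultCharset() + ": " + s.getBytes().length);
    }
}
```

Passing the charset as a parameter keeps the choice of encoding visible at every call site, which is the habit the answer recommends.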

User · Answer

There's a method called getBytes(). Use it wisely.

User · Answer

To avoid try/catch, use:

    String s = "some text here";
    byte[] b = s.getBytes(StandardCharsets.UTF_8);
    System.out.println(b.length);

User · Answer

Try this using Apache Commons:

    String src = "Hello";
    // This will work with any serialisable object
    System.out.println("Object Size:" + SerializationUtils.serialize((Serializable) src).length);
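For readers without Apache Commons on the classpath, the same kind of measurement can be sketched with plain java.io. This is a stand-in for illustration, not the library's implementation, and the class and method names are made up. Note that it counts the bytes of the Java serialization stream, which carries framing in addition to the text, so the figure comes out larger than the encoded length of the string.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: size of an object's Java-serialized form.
final class SerializedSize {

    static int serializedSize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes everything into the buffer.
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        String src = "Hello";
        System.out.println("Serialized size: " + serializedSize(src));
        System.out.println("UTF-8 size: " + src.getBytes(StandardCharsets.UTF_8).length);
    }
}
```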

User · Answer

The pedantic answer (though not necessarily the most useful one, depending on what you want to do with the result) is:

    string.length() * 2

Java strings are physically stored in UTF-16BE encoding, which uses 2 bytes per code unit, and String.length() measures the length in UTF-16 code units, so this is equivalent to:

    final byte[] utf16Bytes = string.getBytes("UTF-16BE");
    System.out.println(utf16Bytes.length);

And this will tell you the size of the internal char array, in bytes.

Note: "UTF-16" will give a different result from "UTF-16BE", as the former encoding will insert a BOM, adding 2 bytes to the length of the array.
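A short sketch of what "code units" means in practice: a character outside the Basic Multilingual Plane occupies two code units, so length() reports 2 and the UTF-16BE encoding takes 4 bytes. The class name, the helper, and the sample strings are chosen only for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of length() versus UTF-16BE byte count.
final class Utf16Units {

    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String plain = "Hello";
        // U+1F600 written as a surrogate pair: one code point, two code units.
        String emoji = "\uD83D\uDE00";

        System.out.println(plain.length() * 2 == utf16Bytes(plain)); // true
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
        System.out.println(utf16Bytes(emoji));                       // 4

        // "UTF-16" without an explicit byte order adds a 2-byte BOM.
        System.out.println(plain.getBytes(StandardCharsets.UTF_16).length); // 12
    }
}
```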

User · Answer

Try this:

    Bytes.toBytes(x).length

Assuming you declared and initialized x before.

User · Answer

According to "How to convert Strings to and from UTF8 byte arrays in Java":

    String s = "some text here";
    byte[] b = s.getBytes("UTF-8");
    System.out.println(b.length);
