How many bytes in a JavaScript string?

Question

I have a JavaScript string which is about 500K when being sent from the server in UTF-8. How can I tell its size in JavaScript? I know that JavaScript uses UCS-2, so does that mean 2 bytes per character? However, does it depend on the JavaScript implementation? Or on the page encoding, or maybe the content-type?

User · Answer

The size of a JavaScript string is:

Pre-ES6: 2 bytes per character
ES6 and later: 2 bytes per character, or 5 or more bytes per character

Pre-ES6:

Always 2 bytes per character. UTF-16 is not allowed because the spec says "values must be 16-bit unsigned integers". Since UTF-16 strings can use 3 or 4 byte characters, it would violate the 2 byte requirement. Crucially, while UTF-16 cannot be fully supported, the standard does require that the two byte characters used are valid UTF-16 characters. In other words, pre-ES6 JavaScript strings support a subset of UTF-16 characters.

ES6 and later:

2 bytes per character, or 5 or more bytes per character. The additional sizes come into play because ES6 (ECMAScript 6) adds support for Unicode code point escapes. Using a Unicode escape looks like this: \u{1D306}

Practical notes:

This doesn't relate to the internal implementation of a particular engine. For example, some engines use data structures and libraries with full UTF-16 support, but what they provide externally doesn't have to be full UTF-16 support. An engine may also provide external UTF-16 support, but it is not mandated to do so.

For ES6, practically speaking, characters will never be more than 5 bytes long (2 bytes for the escape point + 3 bytes for the Unicode code point), because the latest version of Unicode only has 136,755 possible characters, which fits easily into 3 bytes. However, this is technically not limited by the standard, so in principle a single character could use, say, 4 bytes for the code point and 6 bytes total.

Most of the code examples here for calculating byte size don't seem to take into account ES6 Unicode code point escapes, so the results could be incorrect in some cases.
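
As a quick check on the ES6 case, the sketch below inspects the string value that a code point escape produces at runtime. It only counts the 16-bit code units the language exposes; it says nothing about how a particular engine stores the string internally, and the variable names are illustrative.

```javascript
// A code point escape (ES6) for U+1D306, a character outside the Basic Multilingual Plane.
const tetragram = '\u{1D306}';

// .length counts UTF-16 code units in the resulting string value.
const codeUnits = tetragram.length;

// Array.from iterates by code point, so a surrogate pair counts once.
const codePoints = Array.from(tetragram).length;

// Two bytes per exposed 16-bit code unit.
const utf16Bytes = codeUnits * 2;

console.log({ codeUnits, codePoints, utf16Bytes });
```

At the language level the escape yields one code point represented by two code units.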

User · Answer

Try this combination using the unescape JS function:

    const byteAmount = unescape(encodeURIComponent(yourString)).length

Full encode process example:

    const s  = "1 a ф № @ ®";          // length is 11
    const s2 = encodeURIComponent(s);  // length is 41
    const s3 = unescape(s2);           // length is 15 [1-1, a-1, ф-2, №-3, @-1, ®-2]
    const s4 = escape(s3);             // length is 39
    const s5 = decodeURIComponent(s4); // length is 11

User · Answer

If you're using node.js, there is a simpler solution using buffers:

    function getBinarySize(string) {
        return Buffer.byteLength(string, 'utf8');
    }

There is an npm lib for that: https://www.npmjs.org/package/utf8-binary-cutter (from yours faithfully)

User · Answer

This function will return the byte size of any UTF-8 string you pass to it:

    function byteCount(s) {
        return encodeURI(s).split(/%..|./).length - 1;
    }

On the encoding JavaScript uses, the quoted source says:

"JavaScript engines are free to use UCS-2 or UTF-16 internally. Most engines that I know of use UTF-16, but whatever choice they made, it's just an implementation detail that won't affect the language's characteristics.

The ECMAScript/JavaScript language itself, however, exposes characters according to UCS-2, not UTF-16."
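
To see why the split trick counts bytes, here is a small walk-through. It assumes the regular expression in byteCount is /%..|./ and uses a made-up sample string: encodeURI turns every non-ASCII UTF-8 byte into a three-character %XX sequence, so splitting on "either a %XX sequence or any single character" produces one separator per byte.

```javascript
// Copy of the byteCount function from the answer, repeated so this sketch runs on its own.
function byteCount(s) {
  return encodeURI(s).split(/%..|./).length - 1;
}

// Illustrative sample: "a" (1 byte), "é" (2 bytes), "€" (3 bytes) in UTF-8.
const sample = 'a\u00E9\u20AC';

// Each non-ASCII byte becomes %XX; ASCII letters pass through unchanged.
const encoded = encodeURI(sample);
console.log(encoded); // a%C3%A9%E2%82%AC

// Six separators (one per byte) give seven pieces, hence the "- 1".
console.log(byteCount(sample)); // 6
```

Note that encodeURI throws a URIError for strings containing lone surrogates, so this approach only works for well-formed strings.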

User · Answer

String values are not implementation dependent. According to the ECMA-262 3rd Edition Specification, each character represents a single 16-bit unit of UTF-16 text:

4.3.16 String Value

A string value is a member of the type String and is a finite ordered sequence of zero or more 16-bit unsigned integer values.

NOTE: Although each value usually represents a single 16-bit unit of UTF-16 text, the language does not place any restrictions or requirements on the values except that they be 16-bit unsigned integers.
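
A short sketch of what "16-bit unsigned integers with no further restrictions" means in practice. The variable names are illustrative; the point is that a string may hold a value that is not valid UTF-16 on its own, and that String.fromCharCode keeps only the low 16 bits of its argument.

```javascript
// A lone high surrogate is not valid UTF-16 text, yet it is a legal string element.
const lone = String.fromCharCode(0xD800);
console.log(lone.length);                      // 1
console.log(lone.charCodeAt(0).toString(16));  // d800

// Values above 0xFFFF are truncated to 16 bits by String.fromCharCode.
const truncated = String.fromCharCode(0x1F600);
console.log(truncated.charCodeAt(0).toString(16)); // f600

// Counting two bytes per element follows directly from the definition.
const sixteenBitBytes = 'abc'.length * 2;
console.log(sixteenBitBytes); // 6
```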

User · Answer

UTF-8 encodes characters using 1 to 4 bytes per code point. As CMS pointed out in the accepted answer, JavaScript will store each character internally using 16 bits (2 bytes). If you parse each character in the string via a loop and count the number of bytes used per code point, and then multiply the total count by 2, you should have JavaScript's memory usage in bytes for that UTF-8 encoded string. Perhaps something like this:

    var getStringMemorySize = function( _string ) {
        "use strict";

        var codePoint
            , accum = 0
        ;

        for( var stringIndex = 0, endOfString = _string.length; stringIndex < endOfString; stringIndex++ ) {
            // charCodeAt returns one 16-bit code unit (0 to 0xFFFF),
            // so only the first two branches below are ever taken.
            codePoint = _string.charCodeAt( stringIndex );

            if( codePoint < 0x100 ) {
                accum += 1;
                continue;
            }

            if( codePoint < 0x10000 ) {
                accum += 2;
                continue;
            }

            if( codePoint < 0x1000000 ) {
                accum += 3;
            } else {
                accum += 4;
            }
        }

        return accum * 2;
    };

Examples:

    getStringMemorySize( 'I' );     //  2
    getStringMemorySize( '❤' );     //  4
    getStringMemorySize( '𠀰' );    //  8
    getStringMemorySize( 'I❤𠀰' );  // 14

User · Answer

These are 3 ways I use:

TextEncoder:

    new TextEncoder().encode("myString").length

Blob:

    new Blob(["myString"]).size

Buffer:

    Buffer.byteLength("myString", 'utf8')
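
The three options can be combined into one helper that uses whichever API the environment provides. This is only a sketch: the function name utf8ByteLength is made up for illustration, and the order of the checks is an arbitrary choice.

```javascript
// Hypothetical helper: returns the UTF-8 byte length of a string using
// the first of the three APIs that exists in the current environment.
function utf8ByteLength(text) {
  if (typeof TextEncoder !== 'undefined') {
    // Browsers and recent Node.js versions.
    return new TextEncoder().encode(text).length;
  }
  if (typeof Blob !== 'undefined') {
    // Browsers; a Blob built from a string stores it as UTF-8.
    return new Blob([text]).size;
  }
  if (typeof Buffer !== 'undefined') {
    // Node.js.
    return Buffer.byteLength(text, 'utf8');
  }
  throw new Error('No UTF-8 encoding API available');
}

console.log(utf8ByteLength('myString')); // 8
console.log(utf8ByteLength('\u2620'));   // 3
```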

User · Answer

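You can try this:

    // Counts 1 byte for each character in the range \x00-\xff and 2 for any other.
    var b = str.match(/[^\x00-\xff]/g);
    return (str.length + (!b ? 0 : b.length));

It worked for me.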

User · Answer

The answer from Lauri Oherd works well for most strings seen in the wild, but will fail if the string contains lone characters in the surrogate pair range, 0xD800 to 0xDFFF. E.g.

    byteCount(String.fromCharCode(55555))
    // URIError: URI malformed

This longer function should handle all strings:

    function bytes (str) {
      var bytes = 0, len = str.length, codePoint, next, i;

      for (i = 0; i < len; i++) {
        codePoint = str.charCodeAt(i);

        // Lone surrogates cannot be passed to encodeURI
        if (codePoint >= 0xD800 && codePoint < 0xE000) {
          if (codePoint < 0xDC00 && i + 1 < len) {
            next = str.charCodeAt(i + 1);

            if (next >= 0xDC00 && next < 0xE000) {
              bytes += 4;
              i++;
              continue;
            }
          }
        }

        bytes += (codePoint < 0x80 ? 1 : (codePoint < 0x800 ? 2 : 3));
      }

      return bytes;
    }

E.g.

    bytes(String.fromCharCode(55555))
    // 3

It will correctly calculate the size for strings containing surrogate pairs:

    bytes(String.fromCharCode(55555, 57000))
    // 4 (not 6)

The results can be compared with Node's built-in function Buffer.byteLength:

    Buffer.byteLength(String.fromCharCode(55555), 'utf8')
    // 3

    Buffer.byteLength(String.fromCharCode(55555, 57000), 'utf8')
    // 4 (not 6)
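
Outside Node, the same two cases can be cross-checked with TextEncoder. This is a sketch with illustrative variable names: TextEncoder replaces a lone surrogate with U+FFFD (bytes EF BF BD), which is why the lone-surrogate case comes out as 3 bytes there as well.

```javascript
const encoder = new TextEncoder();

// 55555 is 0xD903, a high surrogate with no partner.
const loneSurrogate = String.fromCharCode(55555);
// 55555 followed by 57000 (0xDEA8) forms a valid surrogate pair.
const surrogatePair = String.fromCharCode(55555, 57000);

// The lone surrogate is encoded as the replacement character U+FFFD.
const loneBytes = Array.from(encoder.encode(loneSurrogate));
console.log(loneBytes.map((b) => b.toString(16))); // [ 'ef', 'bf', 'bd' ]

// A valid pair encodes to a single 4-byte UTF-8 sequence.
const pairLength = encoder.encode(surrogatePair).length;
console.log(pairLength); // 4
```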

User · Answer

I'm working with an embedded version of the V8 engine. I tested a single string, adding 1000 characters (UTF-8) at each step.

The first test used the single-byte (8-bit, ANSI) character "A" (hex: 41), the second test the two-byte (16-bit) character "Ω" (hex: CE A9), and the third test the three-byte (24-bit) character "☺" (hex: E2 98 BA).

In all three cases the device reports out of memory at 888,000 characters, using ca. 26,348 KB of RAM.

Result: the characters are not stored dynamically, and not with only 16 bits. OK, perhaps this holds only for my case (embedded device with 128 MB RAM, V8 engine, C++/Qt). The character encoding has nothing to do with the size in RAM inside the JavaScript engine. E.g. encodeURI etc. is only useful for high-level data transmission and storage.

Embedded or not, the fact is that the characters are not stored in only 16 bits. Unfortunately I have no 100% answer for what JavaScript does at the low level. Btw, I ran the same test (the first test above) with an array of the character "A", pushing 1000 items at every step. (Exactly the same test, just with the string replaced by an array.) The system reports out of memory (as intended) after using 10,416 KB, at an array length of 1,337,000. So the JavaScript engine is not simply restricted. It's somewhat more complex.

User · Answer

You can use the Blob to get the string size in bytes.

Examples:

    console.info(
      new Blob(['😂']).size,                             // 4
      new Blob(['👍']).size,                             // 4
      new Blob(['😂👍']).size,                           // 8
      new Blob(['👍😂']).size,                           // 8
      new Blob(['I\'m a string']).size,                  // 12

      // from Premasagar correction of Lauri's answer for
      // strings containing lone characters in the surrogate pair range:
      // https://stackoverflow.com/a/39488643/6225838
      new Blob([String.fromCharCode(55555)]).size,       // 3
      new Blob([String.fromCharCode(55555, 57000)]).size // 4 (not 6)
    );

User · Answer

A single element in a JavaScript String is considered to be a single UTF-16 code unit. That is to say, String characters are stored in 16 bits (1 code unit), and 16 bits equal 2 bytes (8 bits = 1 byte).

The charCodeAt() method can be used to return an integer between 0 and 65535 representing the UTF-16 code unit at the given index.

The codePointAt() method can be used to return the entire code point value for Unicode characters, e.g. UTF-32.

When a UTF-16 character can't be represented in a single 16-bit code unit, it will have a surrogate pair and therefore use two code units (2 x 16-bit = 4 bytes).

See Unicode encodings for different encodings and their code ranges.
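
A minimal sketch of the difference between the two methods, using an illustrative string that contains one character outside the 16-bit range (U+1F600). The byte total assumes two bytes per code unit, as described above.

```javascript
// "a" followed by U+1F600, which needs a surrogate pair in UTF-16.
const text = 'a\u{1F600}';

// Three code units: one for "a", two for the surrogate pair.
console.log(text.length); // 3

// charCodeAt returns a single 16-bit code unit (here the high surrogate).
const unit = text.charCodeAt(1);
console.log(unit.toString(16)); // d83d

// codePointAt combines the surrogate pair into the full code point.
const point = text.codePointAt(1);
console.log(point.toString(16)); // 1f600

// Two bytes per code unit.
const utf16ByteLength = text.length * 2;
console.log(utf16ByteLength); // 6
```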

User · Answer

Note that if you're targeting node.js you can use Buffer.from(string).length:

    var str = "\u2620"; // => "☠"
    str.length; // => 1 (character)
    Buffer.from(str).length // => 3 (bytes)
