String length in bytes in JavaScript

Question

In my JavaScript code I need to compose a message to the server in this format:

    <size in bytes>CRLF
    <data>CRLF

Example:

    3
    foo

The data may contain Unicode characters. I need to send them as UTF-8.

I'm looking for the most cross-browser way to calculate the length of the string in bytes in JavaScript.

I've tried this to compose my payload:

    return unescape(encodeURIComponent(str)).length + "\n" + str + "\n";

But it does not give me accurate results for older browsers (or maybe the strings in those browsers are in UTF-16?).

Any clues?

Update:

Example: the length in bytes of the string "ЭЭХ! Naïve?" in UTF-8 is 15 bytes, but some browsers report 23 bytes instead.
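A minimal sketch of the framing described above, assuming a runtime with `TextEncoder` (modern browsers and Node.js 11+). `frameMessage` is a hypothetical helper name; it measures the UTF-8 byte length of the data instead of `str.length`, which counts UTF-16 code units.

```javascript
// Hypothetical helper: builds "<size in bytes>\r\n<data>\r\n" for a UTF-8 wire format.
function frameMessage(data) {
  // TextEncoder always encodes to UTF-8; the Uint8Array's length is the byte count.
  const size = new TextEncoder().encode(data).length;
  return size + "\r\n" + data + "\r\n";
}

// "foo" is 3 bytes, so the frame starts with "3".
console.log(JSON.stringify(frameMessage("foo"))); // "3\r\nfoo\r\n"
```

The resulting string still has to be transmitted as UTF-8 for the declared size to match the bytes on the wire.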

User · Answer

For simple UTF-8 encoding, with slightly better compatibility than TextEncoder, Blob does the trick. It won't work in very old browsers, though.

    new Blob(["😀"]).size; // -> 4
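Since the question asks for the most cross-browser option, one way to combine the approaches on this page is to feature-detect: `TextEncoder` first, then `Blob`, then the `unescape(encodeURIComponent(...))` trick from the question. This is a sketch; `utf8ByteLength` is a hypothetical name, and the last fallback throws a `URIError` on strings containing lone surrogates.

```javascript
// Hypothetical helper: UTF-8 byte length with feature detection.
function utf8ByteLength(str) {
  if (typeof TextEncoder !== "undefined") {
    // Modern browsers and Node.js 11+.
    return new TextEncoder().encode(str).length;
  }
  if (typeof Blob !== "undefined") {
    // String parts of a Blob are stored as UTF-8, so size is the byte count.
    return new Blob([str]).size;
  }
  // Oldest fallback: encodeURIComponent emits UTF-8 %XX escapes and unescape turns
  // each escape back into one character, leaving one character per byte.
  // Throws URIError on lone surrogates.
  return unescape(encodeURIComponent(str)).length;
}
```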

User · Answer

Years passed and nowadays you can do it natively:

    (new TextEncoder().encode('foo')).length

Note that it's not supported by IE (you may use a polyfill for that).

See also: MDN documentation, Standard specifications.
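To see why `str.length` is not the answer, compare it with the `TextEncoder` byte count: `length` counts UTF-16 code units, so a character outside the BMP counts as 2 even though it takes 4 bytes in UTF-8. A small comparison (the sample strings are arbitrary):

```javascript
const encoder = new TextEncoder();

// Arbitrary samples: ASCII, Latin-1 range, BMP symbol, astral emoji.
for (const s of ["foo", "é", "€", "😀"]) {
  // s.length = UTF-16 code units; encoder.encode(s).length = UTF-8 bytes.
  console.log(JSON.stringify(s), s.length, encoder.encode(s).length);
}
// "foo" 3 3
// "é" 1 2
// "€" 1 3
// "😀" 2 4
```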

User · Answer

Here is a much faster version, which uses neither regular expressions nor encodeURIComponent():

    function byteLength(str) {
      // returns the UTF-8 byte length of a string
      var s = str.length;
      for (var i = str.length - 1; i >= 0; i--) {
        var code = str.charCodeAt(i);
        if (code > 0x7f && code <= 0x7ff) s++;
        else if (code > 0x7ff && code <= 0xffff) s += 2;
        if (code >= 0xDC00 && code <= 0xDFFF) i--; // trail surrogate
      }
      return s;
    }

Here is a performance comparison.

It just computes the UTF-8 length of each UTF-16 code unit returned by charCodeAt() (based on Wikipedia's descriptions of UTF-8 and of UTF-16 surrogate characters). A surrogate pair already contributes 2 to str.length; the trail surrogate adds 2 more and the loop then skips the lead surrogate, giving 4 bytes.

It follows RFC 3629 (where UTF-8 characters are at most 4 bytes long).
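The same per-code-point reasoning can be written with ES2015 string iteration, which yields whole code points and so handles surrogate pairs without manual index juggling. A sketch assuming an ES2015+ engine (so not old IE); `utf8LengthByCodePoint` is a hypothetical name. A lone surrogate is counted as 3 bytes, which matches what `TextEncoder` produces, since it substitutes U+FFFD.

```javascript
// Hypothetical helper: UTF-8 byte length computed from code points (ES2015+).
function utf8LengthByCodePoint(str) {
  let bytes = 0;
  // for...of iterates by code point, so a surrogate pair arrives as one string.
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) bytes += 1;         // U+0000..U+007F
    else if (cp < 0x800) bytes += 2;   // U+0080..U+07FF
    else if (cp < 0x10000) bytes += 3; // U+0800..U+FFFF (lone surrogates land here too)
    else bytes += 4;                   // U+10000..U+10FFFF
  }
  return bytes;
}
```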

User · Answer

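This function will return the UTF-8 byte size of any string you pass to it.

    function byteCount(s) {
        return encodeURI(s).split(/%..|./).length - 1;
    }

encodeURI turns each byte of a non-ASCII character into a %XX escape; splitting on /%..|./ treats each escape, and each remaining single character, as one separator, so the number of pieces minus one is the byte count.

Source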
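One caveat with `encodeURI`: it throws a `URIError` when the string contains a lone (unpaired) surrogate. A sketch of a guarded variant, assuming an engine with ES2018 regex lookbehind (not available in every older browser): it first replaces lone surrogates with U+FFFD, as `TextEncoder` does, then applies the same split trick. `byteCountSafe` is a hypothetical name.

```javascript
// Matches a high surrogate not followed by a low one, or a low surrogate not preceded by a high one.
const LONE_SURROGATE = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical helper: the encodeURI split trick, made safe for lone surrogates.
function byteCountSafe(s) {
  const wellFormed = s.replace(LONE_SURROGATE, "\uFFFD");
  // Each "%XX" escape is one byte; every other remaining character is one ASCII byte.
  return encodeURI(wellFormed).split(/%..|./).length - 1;
}
```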

User · Answer

This would work for BMP and SIP/SMP characters.

    String.prototype.lengthInUtf8 = function() {
        // ASCII characters (U+0000..U+007F) take one byte each.
        var asciiMatches = this.match(/[\u0000-\u007f]/g);
        var asciiLength = asciiMatches ? asciiMatches.length : 0;
        // Percent-encode only the non-ASCII characters; each "%" stands for one byte.
        var percentMatches = encodeURI(this.replace(/[\u0000-\u007f]/g, '')).match(/%/g);
        var multiByteLength = percentMatches ? percentMatches.length : 0;
        return asciiLength + multiByteLength;
    };

    'test'.lengthInUtf8();
    // returns 4
    '\u{2f894}'.lengthInUtf8();
    // returns 4

Each Arabic/Persian letter takes 2 bytes (an Arabic/Persian phrase returned 19), and each Chinese character or punctuation mark takes 3 bytes (a mixed Chinese/"JavaScript" string returned 26).
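Extending `String.prototype` is generally discouraged because it can clash with other code. A sketch of a standalone variant of the same ASCII-plus-percent-escape counting (`lengthInUtf8Plain` is a hypothetical name), checked against the byte ranges behind the Arabic and Chinese figures above; the sample characters are written as escapes.

```javascript
// Hypothetical standalone variant of the ASCII + percent-escape counting above.
function lengthInUtf8Plain(str) {
  // Each ASCII character is exactly one UTF-8 byte.
  const ascii = str.match(/[\u0000-\u007f]/g);
  // In the non-ASCII remainder, every UTF-8 byte becomes one "%XX" escape.
  const escapes = encodeURI(str.replace(/[\u0000-\u007f]/g, "")).match(/%/g);
  return (ascii ? ascii.length : 0) + (escapes ? escapes.length : 0);
}

// Arabic letters (U+0600..U+06FF) sit in the 2-byte range U+0080..U+07FF.
console.log(lengthInUtf8Plain("\u0633\u0644\u0627\u0645")); // 8
// CJK ideographs and the fullwidth comma U+FF0C sit in the 3-byte range.
console.log(lengthInUtf8Plain("\u4F60\u597D\uFF0C")); // 9
```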

User · Answer

It took me a while to find a solution for React Native, so I'll put it here.

First install the buffer package:

    npm install --save buffer

Then use the Node method:

    const { Buffer } = require('buffer');
    const length = Buffer.byteLength(string, 'utf-8');
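Building on that, a sketch of assembling the question's `<size>CRLF<data>CRLF` frame as raw bytes with `Buffer` (Node.js, or React Native with the `buffer` package installed as above). `framePayload` is a hypothetical name; working on bytes guarantees the declared size matches what is sent.

```javascript
const { Buffer } = require("buffer");

// Hypothetical helper: returns the whole frame as bytes.
function framePayload(data) {
  const body = Buffer.from(data, "utf8");                     // UTF-8 bytes of the data
  const header = Buffer.from(body.length + "\r\n", "utf8");   // "<size>\r\n"
  const trailer = Buffer.from("\r\n", "utf8");
  return Buffer.concat([header, body, trailer]);
}
```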

User · Answer

You can try this:

    function getLengthInBytes(str) {
      var b = str.match(/[^\x00-\xff]/g);
      return (str.length + (!b ? 0 : b.length));
    }

It works for me.
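Be aware that this counts every character above U+00FF as 2 bytes and everything else as 1, which resembles some legacy double-byte encodings rather than UTF-8. A quick comparison against `TextEncoder` (the sample characters are arbitrary) shows where the two disagree:

```javascript
function getLengthInBytes(str) {
  var b = str.match(/[^\x00-\xff]/g);
  return str.length + (!b ? 0 : b.length);
}

const te = new TextEncoder();
for (const s of ["foo", "é", "€", "😀"]) {
  // heuristic count vs. real UTF-8 byte count
  console.log(JSON.stringify(s), getLengthInBytes(s), te.encode(s).length);
}
// "foo" 3 3
// "é" 1 2
// "€" 2 3
// "😀" 4 4
```

The emoji matches only by coincidence: its two surrogate code units are each counted as 2.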

User · Answer

Actually, I figured out what's wrong. For the code to work, the page <head> should have this tag:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Or, as suggested in comments, if the server sends an HTTP Content-Type header with charset=utf-8, it should work as well.

Then results from different browsers are consistent.

Here is an example:

    <html>
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
      <title>mini string length test</title>
    </head>
    <body>
    <script type="text/javascript">
    document.write('<div style="font-size:100px">'
        + (unescape(encodeURIComponent("ЭЭХ! Naïve?")).length) + '</div>'
      );
    </script>
    </body>
    </html>

Note: I suspect that specifying any (accurate) encoding would fix the encoding problem. It is just a coincidence that I need UTF-8.
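The inflated count is what happens when a UTF-8 page is decoded as a single-byte encoding such as Latin-1: each byte of a multi-byte character becomes a separate character, and each of those then re-encodes to 2 UTF-8 bytes. A sketch that simulates the misreading; `misreadAsLatin1` is a hypothetical helper that maps each byte to the code point with the same value, which is what Latin-1 decoding does.

```javascript
// Hypothetical helper: encode as UTF-8, then decode each byte as one Latin-1 character.
function misreadAsLatin1(str) {
  const bytes = new TextEncoder().encode(str);
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// The question's counting method: one character per UTF-8 byte.
const utf8Length = (s) => unescape(encodeURIComponent(s)).length;

const original = "Naïve";
console.log(utf8Length(original));                  // 6
console.log(utf8Length(misreadAsLatin1(original))); // 8
```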

User · Answer

Another very simple approach using Buffer (only for Node.js):

    Buffer.byteLength(string, 'utf8')

    Buffer.from(string).length
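A quick sanity check that the two Node.js forms agree with each other and with `TextEncoder` on a few arbitrary strings. `Buffer.from(...).length` creates a Buffer just to read its length, so `Buffer.byteLength` is the lighter choice when only the count is needed.

```javascript
const { Buffer } = require("buffer");

// Arbitrary samples covering 1-, 2-, 3- and 4-byte characters.
const samples = ["foo", "é", "€", "😀", "a😀b€"];

for (const s of samples) {
  const viaByteLength = Buffer.byteLength(s, "utf8");
  const viaFrom = Buffer.from(s).length; // Buffer.from defaults to UTF-8
  const viaEncoder = new TextEncoder().encode(s).length;
  console.log(JSON.stringify(s), viaByteLength, viaFrom, viaEncoder);
}
```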

User · Answer

Originally there was no native way to do this in JavaScript (see Riccardo Galli's answer for the modern TextEncoder approach). The rest of this answer is kept for historical reference, or for environments where TextEncoder APIs are still unavailable.

If you know the character encoding, you can calculate it yourself, though.

encodeURIComponent assumes UTF-8 as the character encoding, so if you need that encoding, you can do:

    function lengthInUtf8Bytes(str) {
      // Matches only the 10xxxxxx bytes that are non-initial bytes in a multi-byte sequence.
      var m = encodeURIComponent(str).match(/%[89ABab]/g);
      // A surrogate pair adds 2 to str.length but starts only one UTF-8 sequence.
      var pairs = str.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
      return str.length + (m ? m.length : 0) - (pairs ? pairs.length : 0);
    }

This should work because of the way UTF-8 encodes multi-byte sequences. The first encoded byte always starts with either a high bit of zero for a single-byte sequence, or a byte whose first hex digit is C, D, E, or F. The second and subsequent bytes are the ones whose first two bits are 10. Those are the extra bytes you want to count in UTF-8. str.length supplies one initial byte per UTF-16 code unit, except that a character above U+FFFF is a surrogate pair of two code units, hence the correction for pairs.

The table in Wikipedia makes it clearer:

    Bits  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
     7    U+007F           0xxxxxxx
    11    U+07FF           110xxxxx  10xxxxxx
    16    U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
    21    U+10FFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx

If instead you need to understand the page encoding, you can use this trick:

    function lengthInPageEncoding(s) {
      var a = document.createElement('A');
      a.href = '#' + s;
      var sEncoded = a.href;
      sEncoded = sEncoded.substring(sEncoded.indexOf('#') + 1);
      // Percent-escapes may use upper-case hex digits, so match case-insensitively.
      var m = sEncoded.match(/%[0-9a-f]{2}/gi);
      return sEncoded.length - (m ? m.length * 2 : 0);
    }
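The counting rule can be checked directly on encoded bytes: a byte is a continuation byte exactly when its top two bits are `10`, i.e. `(b & 0xC0) === 0x80`, and every other byte starts a character. A sketch using `TextEncoder` output (`describeUtf8` is a hypothetical name):

```javascript
// Hypothetical helper: splits a UTF-8 encoding into lead bytes and 10xxxxxx continuation bytes.
function describeUtf8(str) {
  const bytes = new TextEncoder().encode(str);
  let lead = 0;
  let continuation = 0;
  for (const b of bytes) {
    if ((b & 0xc0) === 0x80) continuation += 1; // 10xxxxxx
    else lead += 1;                             // 0xxxxxxx, 110xxxxx, 1110xxxx or 11110xxx
  }
  return { lead, continuation, total: bytes.length };
}

console.log(describeUtf8("€")); // lead 1, continuation 2, total 3
```

For a string without lone surrogates, `lead` equals the number of code points, which is why `str.length` (code units) over-counts by one for every surrogate pair.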

User · Answer

Here is an independent and efficient method to count the UTF-8 bytes of a string.

    // count UTF-8 bytes of a string
    function byteLengthOf(s) {
      // assuming the String is UCS-2 (aka UTF-16) encoded
      var n = 0;
      for (var i = 0, l = s.length; i < l; i++) {
        var hi = s.charCodeAt(i);
        if (hi < 0x0080) {          // [0x0000, 0x007F]
          n += 1;
        } else if (hi < 0x0800) {   // [0x0080, 0x07FF]
          n += 2;
        } else if (hi < 0xD800) {   // [0x0800, 0xD7FF]
          n += 3;
        } else if (hi < 0xDC00) {   // [0xD800, 0xDBFF]
          var lo = s.charCodeAt(++i);
          if (i < l && lo >= 0xDC00 && lo <= 0xDFFF) { // followed by [0xDC00, 0xDFFF]
            n += 4;
          } else {
            throw new Error("UCS-2 String malformed");
          }
        } else if (hi < 0xE000) {   // [0xDC00, 0xDFFF]
          throw new Error("UCS-2 String malformed");
        } else {                    // [0xE000, 0xFFFF]
          n += 3;
        }
      }
      return n;
    }

    var s = "\u0000\u007F\u07FF\uD7FF\uDBFF\uDFFF\uFFFF";
    console.log("expect byteLengthOf(s) to be 14, actually it is %s.", byteLengthOf(s));

Note that the method may throw an error if an input string is UCS-2 malformed.
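Because the method above throws on malformed input, callers may want to locate the problem first. A sketch of a hypothetical `findLoneSurrogate` helper that returns the index of the first unpaired surrogate, or -1 if the string is well formed (the same condition under which `byteLengthOf` does not throw):

```javascript
// Hypothetical helper: index of the first unpaired UTF-16 surrogate, or -1.
function findLoneSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = s.charCodeAt(i + 1); // NaN past the end, so the check below fails
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a valid pair
      } else {
        return i; // high surrogate without a following low surrogate
      }
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return i; // low surrogate without a preceding high surrogate
    }
  }
  return -1;
}
```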

User · Answer

In Node.js, Buffer.byteLength is a method specifically for this purpose:

    let strLengthInBytes = Buffer.byteLength(str); // str is measured as UTF-8 by default

Note that by default the method assumes the string is in UTF-8 encoding. If a different encoding is required, pass it as the second argument.
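For example, the same text has different byte lengths in different encodings, and for `'base64'` and `'hex'` the input is treated as encoded data, so the result is the decoded size. A small Node.js sketch (sample strings are arbitrary):

```javascript
const { Buffer } = require("buffer");

const text = "héllo";
console.log(Buffer.byteLength(text));            // 6  (UTF-8 default: "é" is 2 bytes)
console.log(Buffer.byteLength(text, "utf16le")); // 10 (2 bytes per code unit)
console.log(Buffer.byteLength(text, "latin1"));  // 5  (1 byte per character)

// For encoded input, the result is the size of the decoded bytes.
console.log(Buffer.byteLength("aGVsbG8=", "base64")); // 5 ("hello")
console.log(Buffer.byteLength("abcd", "hex"));        // 2
```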

User · Answer

I compared some of the methods suggested here in Firefox for speed.

The string I used contained a mix of ASCII and non-ASCII characters.

All results are averages of 3 runs each. Times are in milliseconds.

Note that all URIEncoding methods behaved similarly and had extreme results, so I only included one.

While there are some fluctuations based on the size of the string, the charCode methods (lovasoa and fuweichin) both perform similarly and are the fastest overall, with fuweichin's charCode method the fastest. The Blob and TextEncoder methods performed similarly to each other. Generally the charCode methods were about 75% faster than the Blob and TextEncoder methods. The URIEncoding method was basically unacceptable.

Here are the results I got (elapsed time in ms; in every row, all methods that completed reported a byte count equal to the size):

    Size (bytes)  URIEncoding    charCode   charCode2    Blob     TextEncoder
                  (Lauri Oherd)  (lovasoa)  (fuweichin)  (simap)  (Riccardo Galli)
    6.4 × 10^6         796          15          16         26        23
    19.2 × 10^6       2322          42          45        169        70
    64 × 10^6        12565         138         133        231       211
    192 × 10^6       freezes       754         480        701       654
    640 × 10^6          -         2417        1602       2492      2338
    1280 × 10^6         -         4780        3177       6588      5074
    1920 × 10^6         -         7465        4968         -         -

At 19.2 × 10^6 bytes, Blob does kind of a weird thing. At 192 × 10^6 bytes, the URIEncoding method freezes the browser, so it is omitted from the larger runs. At 1280 × 10^6 bytes, the Blob and TextEncoder methods are starting to hit the wall. At 1920 × 10^6 bytes, after the two charCode methods the run failed with: JavaScript error: file:///Users/xxx/Desktop/test.html, line 74: NS_ERROR_OUT_OF_MEMORY

Here is the code:

    function byteLengthURIEncoding(str) {
      return encodeURI(str).split(/%..|./).length - 1;
    }

    function byteLengthCharCode(str) {
      // returns the UTF-8 byte length of a string
      var s = str.length;
      for (var i = str.length - 1; i >= 0; i--) {
        var code = str.charCodeAt(i);
        if (code > 0x7f && code <= 0x7ff) s++;
        else if (code > 0x7ff && code <= 0xffff) s += 2;
        if (code >= 0xDC00 && code <= 0xDFFF) i--; // trail surrogate
      }
      return s;
    }

    function byteLengthCharCode2(s) {
      // assuming the String is UCS-2 (aka UTF-16) encoded
      var n = 0;
      for (var i = 0, l = s.length; i < l; i++) {
        var hi = s.charCodeAt(i);
        if (hi < 0x0080) {          // [0x0000, 0x007F]
          n += 1;
        } else if (hi < 0x0800) {   // [0x0080, 0x07FF]
          n += 2;
        } else if (hi < 0xD800) {   // [0x0800, 0xD7FF]
          n += 3;
        } else if (hi < 0xDC00) {   // [0xD800, 0xDBFF]
          var lo = s.charCodeAt(++i);
          if (i < l && lo >= 0xDC00 && lo <= 0xDFFF) { // followed by [0xDC00, 0xDFFF]
            n += 4;
          } else {
            throw new Error("UCS-2 String malformed");
          }
        } else if (hi < 0xE000) {   // [0xDC00, 0xDFFF]
          throw new Error("UCS-2 String malformed");
        } else {                    // [0xE000, 0xFFFF]
          n += 3;
        }
      }
      return n;
    }

    function byteLengthBlob(str) {
      return new Blob([str]).size;
    }

    function byteLengthTE(str) {
      return (new TextEncoder().encode(str)).length;
    }

    var sample = "...";  // replace with a test string mixing ASCII and non-ASCII characters
    var string = "";

    // Adjust multiplier to change length of string.
    let mult = 1000000;

    for (var i = 0; i < mult; i++) {
      string += sample;
    }

    let t0;

    try {
      t0 = Date.now();
      console.log("Lauri Oherd -- URIEncoding:    " + byteLengthURIEncoding(string) + ", et: " + (Date.now() - t0));
    } catch (e) {}

    t0 = Date.now();
    console.log("lovasoa -- charCode:           " + byteLengthCharCode(string) + ", et: " + (Date.now() - t0));
    t0 = Date.now();
    console.log("fuweichin -- charCode2:        " + byteLengthCharCode2(string) + ", et: " + (Date.now() - t0));
    t0 = Date.now();
    console.log("simap -- Blob:                 " + byteLengthBlob(string) + ", et: " + (Date.now() - t0));
    t0 = Date.now();
    console.log("Riccardo Galli -- TextEncoder: " + byteLengthTE(string) + ", et: " + (Date.now() - t0));
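The results above are averages of 3 runs. A hedged sketch of a small timing helper that does that averaging with `performance.now()` (a global in browsers and in Node.js 16+); `averageMs` is a hypothetical name, the input string is arbitrary, and timings will vary by engine, string content, and size.

```javascript
// Hypothetical helper: runs fn(arg) `runs` times; returns the last result and the mean time in ms.
function averageMs(fn, arg, runs = 3) {
  let result;
  let totalMs = 0;
  for (let r = 0; r < runs; r++) {
    const t0 = performance.now();
    result = fn(arg);
    totalMs += performance.now() - t0;
  }
  return { result, ms: totalMs / runs };
}

// Reuse one encoder instead of constructing a new one per call.
const te = new TextEncoder();
const byteLengthTE = (s) => te.encode(s).length;

// Arbitrary input: 100,000 repetitions of a 10-byte mixed string.
const input = "aé€😀".repeat(100000);
const { result, ms } = averageMs(byteLengthTE, input);
console.log(result, ms.toFixed(2) + " ms");
```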
