[javascript] How many bytes in a JavaScript string?

I have a javascript string which is about 500K when being sent from the server in UTF-8. How can I tell its size in JavaScript?

I know that JavaScript uses UCS-2, so does that mean 2 bytes per character. However, does it depend on the JavaScript implementation? Or on the page encoding or maybe content-type?

This question is related to javascript string size byte

The answer is


Note that if you're targeting node.js you can use Buffer.from(string).length:

var str = "\u2620"; // => "?"
str.length; // => 1 (character)
Buffer.from(str).length // => 3 (bytes)

A single element in a JavaScript String is considered to be a single UTF-16 code unit. That is to say, Strings characters are stored in 16-bit (1 code unit), and 16-bit is equal to 2 bytes (8-bit = 1 byte).

The charCodeAt() method can be used to return an integer between 0 and 65535 representing the UTF-16 code unit at the given index.

The codePointAt() can be used to return the entire code point value for Unicode characters, e.g. UTF-32.

When a UTF-16 character can't be represented in a single 16-bit code unit, it will have a surrogate pair and therefore use two code units( 2 x 16-bit = 4 bytes)

See Unicode encodings for different encodings and their code ranges.


I'm working with an embedded version of the V8 Engine. I've tested a single string. Pushing each step 1000 characters. UTF-8.

First test with single byte (8bit, ANSI) Character "A" (hex: 41). Second test with two byte character (16bit) "O" (hex: CE A9) and the third test with three byte character (24bit) "?" (hex: E2 98 BA).

In all three cases the device prints out of memory at 888 000 characters and using ca. 26 348 kb in RAM.

Result: The characters are not dynamically stored. And not with only 16bit. - Ok, perhaps only for my case (Embedded 128 MB RAM Device, V8 Engine C++/QT) - The character encoding has nothing to do with the size in ram of the javascript engine. E.g. encodingURI, etc. is only useful for highlevel data transmission and storage.

Embedded or not, fact is that the characters are not only stored in 16bit. Unfortunally I've no 100% answer, what Javascript do at low level area. Btw. I've tested the same (first test above) with an array of character "A". Pushed 1000 items every step. (Exactly the same test. Just replaced string to array) And the system bringt out of memory (wanted) after 10 416 KB using and array length of 1 337 000. So, the javascript engine is not simple restricted. It's a kind more complex.


Try this combination with using unescape js function:

const byteAmount = unescape(encodeURIComponent(yourString)).length

Full encode proccess example:

const s  = "1 a ? ? @ ®"; // length is 11
const s2 = encodeURIComponent(s); // length is 41
const s3 = unescape(s2); // length is 15 [1-1,a-1,?-2,?-3,@-1,®-2]
const s4 = escape(s3); // length is 39
const s5 = decodeURIComponent(s4); // length is 11

The answer from Lauri Oherd works well for most strings seen in the wild, but will fail if the string contains lone characters in the surrogate pair range, 0xD800 to 0xDFFF. E.g.

byteCount(String.fromCharCode(55555))
// URIError: URI malformed

This longer function should handle all strings:

function bytes (str) {
  var bytes=0, len=str.length, codePoint, next, i;

  for (i=0; i < len; i++) {
    codePoint = str.charCodeAt(i);

    // Lone surrogates cannot be passed to encodeURI
    if (codePoint >= 0xD800 && codePoint < 0xE000) {
      if (codePoint < 0xDC00 && i + 1 < len) {
        next = str.charCodeAt(i + 1);

        if (next >= 0xDC00 && next < 0xE000) {
          bytes += 4;
          i++;
          continue;
        }
      }
    }

    bytes += (codePoint < 0x80 ? 1 : (codePoint < 0x800 ? 2 : 3));
  }

  return bytes;
}

E.g.

bytes(String.fromCharCode(55555))
// 3

It will correctly calculate the size for strings containing surrogate pairs:

bytes(String.fromCharCode(55555, 57000))
// 4 (not 6)

The results can be compared with Node's built-in function Buffer.byteLength:

Buffer.byteLength(String.fromCharCode(55555), 'utf8')
// 3

Buffer.byteLength(String.fromCharCode(55555, 57000), 'utf8')
// 4 (not 6)

If you're using node.js, there is a simpler solution using buffers :

function getBinarySize(string) {
    return Buffer.byteLength(string, 'utf8');
}

There is a npm lib for that : https://www.npmjs.org/package/utf8-binary-cutter (from yours faithfully)


These are 3 ways I use:

  1. TextEncoder
new TextEncoder().encode("myString").length
  1. Blob
new Blob(["myString"]).size
  1. Buffer
Buffer.byteLength("myString", 'utf8')

You can try this:

  var b = str.match(/[^\x00-\xff]/g);
  return (str.length + (!b ? 0: b.length)); 

It worked for me.


This function will return the byte size of any UTF-8 string you pass to it.

function byteCount(s) {
    return encodeURI(s).split(/%..|./).length - 1;
}

Source

JavaScript engines are free to use UCS-2 or UTF-16 internally. Most engines that I know of use UTF-16, but whatever choice they made, it’s just an implementation detail that won’t affect the language’s characteristics.

The ECMAScript/JavaScript language itself, however, exposes characters according to UCS-2, not UTF-16.

Source


UTF-8 encodes characters using 1 to 4 bytes per code point. As CMS pointed out in the accepted answer, JavaScript will store each character internally using 16 bits (2 bytes).

If you parse each character in the string via a loop and count the number of bytes used per code point, and then multiply the total count by 2, you should have JavaScript's memory usage in bytes for that UTF-8 encoded string. Perhaps something like this:

      getStringMemorySize = function( _string ) {
        "use strict";

        var codePoint
            , accum = 0
        ;

        for( var stringIndex = 0, endOfString = _string.length; stringIndex < endOfString; stringIndex++ ) {
            codePoint = _string.charCodeAt( stringIndex );

            if( codePoint < 0x100 ) {
                accum += 1;
                continue;
            }

            if( codePoint < 0x10000 ) {
                accum += 2;
                continue;
            }

            if( codePoint < 0x1000000 ) {
                accum += 3;
            } else {
                accum += 4;
            }
        }

        return accum * 2;
    }

Examples:

getStringMemorySize( 'I'    );     //  2
getStringMemorySize( '?'    );     //  4
getStringMemorySize( ''   );     //  8
getStringMemorySize( 'I?' );     // 14

The size of a JavaScript string is

  • Pre-ES6: 2 bytes per character
  • ES6 and later: 2 bytes per character, or 5 or more bytes per character

Pre-ES6
Always 2 bytes per character. UTF-16 is not allowed because the spec says "values must be 16-bit unsigned integers". Since UTF-16 strings can use 3 or 4 byte characters, it would violate 2 byte requirement. Crucially, while UTF-16 cannot be fully supported, the standard does require that the two byte characters used are valid UTF-16 characters. In other words, Pre-ES6 JavaScript strings support a subset of UTF-16 characters.

ES6 and later
2 bytes per character, or 5 or more bytes per character. The additional sizes come into play because ES6 (ECMAScript 6) adds support for Unicode code point escapes. Using a unicode escape looks like this: \u{1D306}

Practical notes

  • This doesn't relate to the internal implemention of a particular engine. For example, some engines use data structures and libraries with full UTF-16 support, but what they provide externally doesn't have to be full UTF-16 support. Also an engine may provide external UTF-16 support as well but is not mandated to do so.

  • For ES6, practically speaking characters will never be more than 5 bytes long (2 bytes for the escape point + 3 bytes for the Unicode code point) because the latest version of Unicode only has 136,755 possible characters, which fits easily into 3 bytes. However this is technically not limited by the standard so in principal a single character could use say, 4 bytes for the code point and 6 bytes total.

  • Most of the code examples here for calculating byte size don't seem to take into account ES6 Unicode code point escapes, so the results could be incorrect in some cases.


You can use the Blob to get the string size in bytes.

Examples:

_x000D_
_x000D_
console.info(_x000D_
  new Blob(['']).size,                             // 4_x000D_
  new Blob(['']).size,                             // 4_x000D_
  new Blob(['']).size,                           // 8_x000D_
  new Blob(['']).size,                           // 8_x000D_
  new Blob(['I\'m a string']).size,                  // 12_x000D_
_x000D_
  // from Premasagar correction of Lauri's answer for_x000D_
  // strings containing lone characters in the surrogate pair range:_x000D_
  // https://stackoverflow.com/a/39488643/6225838_x000D_
  new Blob([String.fromCharCode(55555)]).size,       // 3_x000D_
  new Blob([String.fromCharCode(55555, 57000)]).size // 4 (not 6)_x000D_
);
_x000D_
_x000D_
_x000D_


Examples related to javascript

need to add a class to an element How to make a variable accessible outside a function? Hide Signs that Meteor.js was Used How to create a showdown.js markdown extension Please help me convert this script to a simple image slider Highlight Anchor Links when user manually scrolls? Summing radio input values How to execute an action before close metro app WinJS javascript, for loop defines a dynamic variable name Getting all files in directory with ajax

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to size

PySpark 2.0 The size or shape of a DataFrame How to set label size in Bootstrap How to create a DataFrame of random integers with Pandas? How to split large text file in windows? How can I get the size of an std::vector as an int? How to load specific image from assets with Swift How to find integer array size in java Fit website background image to screen size How to set text size in a button in html How to change font size in html?

Examples related to byte

Convert bytes to int? TypeError: a bytes-like object is required, not 'str' when writing to a file in Python3 How many bits is a "word"? How many characters can you store with 1 byte? Save byte array to file Convert dictionary to bytes and back again python? How do I get total physical memory size using PowerShell without WMI? Hashing with SHA1 Algorithm in C# How to create a byte array in C++? Converting string to byte array in C#