[javascript] How to convert a String to Bytearray

How can I convert a string into a byte array using JavaScript? The output should be equivalent to that of the following C# code:

UnicodeEncoding encoding = new UnicodeEncoding();
byte[] bytes = encoding.GetBytes(AnyString);

UnicodeEncoding defaults to UTF-16 with little-endian byte order.
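
For example, the string "Hi" should therefore come out as the bytes 72, 0, 105, 0, since each 16-bit code unit is written low byte first:

// 'H' = U+0048 -> 0x48, 0x00 -> 72, 0
// 'i' = U+0069 -> 0x69, 0x00 -> 105, 0
var expected = [72, 0, 105, 0];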

Edit: I need the byte array generated on the client side to match the one generated on the server side by the above C# code.

This question is related to javascript

The answer is


I suppose C# and Java produce equal byte arrays. If you have non-ASCII characters, it's not enough to just add an additional 0 byte per character. My example contains a few special characters:

var str = "Hell ö € O ";
var bytes = [];
var charCode;

for (var i = 0; i < str.length; ++i)
{
    charCode = str.charCodeAt(i);
    bytes.push((charCode & 0xFF00) >> 8);
    bytes.push(charCode & 0xFF);
}

alert(bytes.join(' '));
// 0 72 0 101 0 108 0 108 0 32 0 246 0 32 32 172 0 32 3 169 0 32 216 52 221 30

I don't know whether C# places a BOM (byte order mark), but when using UTF-16, Java's String.getBytes adds the following bytes: 254 255.

String s = "Hell ö € O ";
// now add a character outside the BMP (Basic Multilingual Plane)
// we take the violin-symbol (U+1D11E) MUSICAL SYMBOL G CLEF
s += new String(Character.toChars(0x1D11E));
// surrogate codepoints are: d834, dd1e, so one could also write "\ud834\udd1e"

byte[] bytes = s.getBytes("UTF-16");
for (byte aByte : bytes) {
    System.out.print((0xFF & aByte) + " ");
}
// 254 255 0 72 0 101 0 108 0 108 0 32 0 246 0 32 32 172 0 32 3 169 0 32 216 52 221 30

Edit:

Added a special character, (U+1D11E) MUSICAL SYMBOL G CLEF (outside the BMP, so it takes not 2 bytes in UTF-16, but 4).

Current JavaScript versions use "UCS-2" internally, so this symbol takes the space of 2 normal characters.

I'm not sure, but when using charCodeAt it seems we get exactly the surrogate code units also used in UTF-16, so non-BMP characters are handled correctly.
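
A quick console check with the G clef confirms this:

var clef = '\uD834\uDD1E';                     // U+1D11E MUSICAL SYMBOL G CLEF
console.log(clef.charCodeAt(0).toString(16));  // "d834" - the high surrogate
console.log(clef.charCodeAt(1).toString(16));  // "dd1e" - the low surrogate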

This problem is absolutely non-trivial. It might depend on the JavaScript version and engine used, so if you want a reliable solution, you should have a look at a dedicated encoding library.


If you are looking for a solution that works in node.js, you can use this:

var myBuffer = [];
var str = 'Stack Overflow';
var buffer = Buffer.from(str, 'utf16le'); // Buffer.from replaces the deprecated new Buffer()
for (var i = 0; i < buffer.length; i++) {
    myBuffer.push(buffer[i]);
}

console.log(myBuffer);

Here is the same function that @BrunoLM posted, converted to a String prototype method:

String.prototype.getBytes = function () {
  var bytes = [];
  for (var i = 0; i < this.length; ++i) {
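    // note: charCodeAt returns a UTF-16 code unit (0-65535), not a single byte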
    bytes.push(this.charCodeAt(i));
  }
  return bytes;
};

If you define the function as such, then you can call the .getBytes() method on any string:

var str = "Hello World!";
var bytes = str.getBytes();

The best solution I've come up with on the spot (though most likely crude) would be:

String.prototype.getBytes = function() {
    var bytes = [];
    for (var i = 0; i < this.length; i++) {
        var charCode = this.charCodeAt(i);
        // number of bytes needed for this code unit (at least 1; charCode + 1
        // avoids log(0) and handles exact powers of 256)
        var cLen = Math.max(1, Math.ceil(Math.log(charCode + 1) / Math.log(256)));
        for (var j = cLen - 1; j >= 0; j--) {
            // shift right (not left) to extract each byte, most significant first
            bytes.push((charCode >> (j * 8)) & 0xFF);
        }
    }
    return bytes;
};
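
With this version, code units above 255 come out as more than one byte; for example:

"Hi€".getBytes(); // [72, 105, 32, 172] - '€' is U+20AC -> 0x20, 0xAC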

Though I notice this question has been here for over a year.


String.prototype.encodeHex = function () {
    return this.split('').map(e => e.charCodeAt());
};

// the encoded result is an array, so the decoder belongs on Array.prototype
Array.prototype.decodeHex = function () {
    return this.map(e => String.fromCharCode(e)).join('');
};
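
For example:

'Hi'.encodeHex();      // [72, 105]
[72, 105].decodeHex(); // "Hi"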

You don't need Underscore, just use the built-in map:

var string = 'Hello World!';

document.write(string.split('').map(function(c) { return c.charCodeAt(); }));


The easiest way in 2018 should be TextEncoder, but note that it encodes to UTF-8, and the returned value is not a plain byte array but a Uint8Array. (And not all browsers support it.)

let utf8Encode = new TextEncoder();
utf8Encode.encode("eee")
> Uint8Array [ 101, 101, 101 ]
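
If you need a plain array rather than a Uint8Array, Array.from (or the spread operator) converts it:

Array.from(new TextEncoder().encode("eee")); // [ 101, 101, 101 ]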

Since I cannot comment on the answer, I'll build on Jin Izzraeel's answer

var myBuffer = [];
var str = 'Stack Overflow';
var buffer = Buffer.from(str, 'utf16le');
for (var i = 0; i < buffer.length; i++) {
    myBuffer.push(buffer[i]);
}

console.log(myBuffer);

by saying that you could use this if you want to use a Node.js buffer in your browser.

https://github.com/feross/buffer

Therefore, Tom Stickel's objection does not hold, and the answer is still valid.
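
A minimal sketch of that approach, assuming the buffer package from the link above is installed (npm install buffer) and the code is bundled for the browser:

// the trailing slash makes the module loader pick the npm package
// instead of the Node.js core module of the same name
var Buffer = require('buffer/').Buffer;

var bytes = Array.from(Buffer.from('Stack Overflow', 'utf16le'));
console.log(bytes); // [83, 0, 116, 0, ...]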


I know the question is almost 4 years old, but this is what worked smoothly for me:

String.prototype.encodeHex = function () {
  var bytes = [];
  for (var i = 0; i < this.length; ++i) {
    bytes.push(this.charCodeAt(i));
  }
  return bytes;
};

Array.prototype.decodeHex = function () {
  var str = [];
  var hex = this.toString().split(',');
  for (var i = 0; i < hex.length; i++) {
    str.push(String.fromCharCode(hex[i]));
  }
  return str.toString().replace(/,/g, "");
};

var str = "Hello World!";
var bytes = str.encodeHex();

alert('The Hexa Code is: '+bytes+' The original string is:  '+bytes.decodeHex());

Or, if you want to work with strings only and no arrays, you can use:

String.prototype.encodeHex = function () {
  var bytes = [];
  for (var i = 0; i < this.length; ++i) {
    bytes.push(this.charCodeAt(i));
  }
  return bytes.toString();
};

String.prototype.decodeHex = function () {
  var str = [];
  var hex = this.split(',');
  for (var i = 0; i < hex.length; i++) {
    str.push(String.fromCharCode(hex[i]));
  }
  return str.toString().replace(/,/g, "");
};

var str = "Hello World!";
var bytes = str.encodeHex();

alert('The Hexa Code is: '+bytes+' The original string is:  '+bytes.decodeHex());


UTF-16 Byte Array

JavaScript encodes strings as UTF-16, just like C#'s UnicodeEncoding, so you can produce a matching byte array with charCodeAt() by splitting each returned 16-bit code unit into 2 separate bytes, as in:

function strToUtf16Bytes(str) {
  const bytes = [];
  for (let ii = 0; ii < str.length; ii++) {
    const code = str.charCodeAt(ii); // x00-xFFFF
    bytes.push(code & 255, code >> 8); // low, high
  }
  return bytes;
}

For example:

strToUtf16Bytes('\uD83C\uDF35'); // U+1F335 CACTUS
// [ 60, 216, 53, 223 ]

However, if you want to get a UTF-8 byte array, you must transcode the bytes.

UTF-8 Byte Array

The solution feels somewhat non-trivial, but I used the code below in a high-traffic production environment with great success (original source).

Also, for the interested reader, I published my unicode helpers that help me work with string lengths reported by other languages such as PHP.

/**
 * Convert a string to a unicode byte array
 * @param {string} str
 * @return {Array} of bytes
 */
export function strToUtf8Bytes(str) {
  const utf8 = [];
  for (let ii = 0; ii < str.length; ii++) {
    let charCode = str.charCodeAt(ii);
    if (charCode < 0x80) utf8.push(charCode);
    else if (charCode < 0x800) {
      utf8.push(0xc0 | (charCode >> 6), 0x80 | (charCode & 0x3f));
    } else if (charCode < 0xd800 || charCode >= 0xe000) {
      utf8.push(0xe0 | (charCode >> 12), 0x80 | ((charCode >> 6) & 0x3f), 0x80 | (charCode & 0x3f));
    } else {
      ii++;
      // Surrogate pair:
      // UTF-16 encodes 0x10000-0x10FFFF by subtracting 0x10000 and
      // splitting the 20 bits of 0x0-0xFFFFF into two halves
      charCode = 0x10000 + (((charCode & 0x3ff) << 10) | (str.charCodeAt(ii) & 0x3ff));
      utf8.push(
        0xf0 | (charCode >> 18),
        0x80 | ((charCode >> 12) & 0x3f),
        0x80 | ((charCode >> 6) & 0x3f),
        0x80 | (charCode & 0x3f),
      );
    }
  }
  return utf8;
}
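
For example:

strToUtf8Bytes("Hi€"); // [72, 105, 226, 130, 172] - '€' (U+20AC) becomes 0xE2 0x82 0xAC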

Inspired by @hgoebl's answer. His code is for UTF-16 and I needed something for US-ASCII. So here's a more complete answer covering US-ASCII, UTF-16, and UTF-32.

/**@returns {Array} bytes of US-ASCII*/
function stringToAsciiByteArray(str)
{
    var bytes = [];
    for (var i = 0; i < str.length; ++i)
    {
        var charCode = str.charCodeAt(i);
        if (charCode > 0x7F)  // US-ASCII is 7-bit, so anything above 0x7F can't be represented
        {
            throw new Error('Character ' + String.fromCharCode(charCode) + ' can\'t be represented by a US-ASCII byte.');
        }
        bytes.push(charCode);
    }
    return bytes;
}
/**@returns {Array} bytes of UTF-16 Big Endian without BOM*/
function stringToUtf16ByteArray(str)
{
    var bytes = [];
    //currently the function returns without BOM. Uncomment the next line to change that.
    //bytes.push(254, 255);  //Big Endian Byte Order Marks
    for (var i = 0; i < str.length; ++i)
    {
        var charCode = str.charCodeAt(i);
        //char > 2 bytes is impossible since charCodeAt can only return 2 bytes
        bytes.push((charCode & 0xFF00) >>> 8);  //high byte (might be 0)
        bytes.push(charCode & 0xFF);  //low byte
    }
    return bytes;
}
/**@returns {Array} bytes of UTF-32 Big Endian without BOM*/
function stringToUtf32ByteArray(str)
{
    var bytes = [];
    //currently the function returns without BOM. Uncomment the next line to change that.
    //bytes.push(0, 0, 254, 255);  //Big Endian Byte Order Marks
    for (var i = 0; i < str.length; ++i)
    {
        var charPoint = str.codePointAt(i);
        //char > 4 bytes is impossible since codePointAt can only return 4 bytes
        if (charPoint > 0xFFFF) ++i;  //code points above the BMP occupy 2 code units (a surrogate pair), so skip the low surrogate
        bytes.push((charPoint & 0xFF000000) >>> 24);
        bytes.push((charPoint & 0xFF0000) >>> 16);
        bytes.push((charPoint & 0xFF00) >>> 8);
        bytes.push(charPoint & 0xFF);
    }
    return bytes;
}

UTF-8 isn't included because I would have to write the encoding myself. UTF-8 and UTF-16 are variable length; UTF-8, UTF-16, and UTF-32 have a minimum number of bits as their names indicate. If a UTF-32 character has a code point of 65, that means there are 3 leading 0 bytes, while the same character in UTF-16 has only 1 leading 0 byte. US-ASCII, on the other hand, is a fixed-width encoding where each character fits in a single byte, which means it can be translated directly to bytes.
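
For example, with the functions above:

stringToAsciiByteArray('A'); // [65]
stringToUtf16ByteArray('A'); // [0, 65]
stringToUtf32ByteArray('A'); // [0, 0, 0, 65]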

String.prototype.charCodeAt returns at most 2 bytes and matches UTF-16 exactly. For UTF-32, however, String.prototype.codePointAt is needed, which was introduced in ECMAScript 6 (Harmony). Because charCodeAt can return values for many more characters than US-ASCII can represent, the function stringToAsciiByteArray throws in those cases instead of splitting the character in half and taking either or both bytes.

Note that this answer is non-trivial because character encoding is non-trivial. What kind of byte array you want depends on what character encoding you want those bytes to represent.

JavaScript has the option of internally using either UTF-16 or UCS-2, but since it has methods that act as if it were UTF-16, I don't see why any browser would use UCS-2. Also see: https://mathiasbynens.be/notes/javascript-encoding
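
A quick illustration of those UTF-16-aware methods:

var clef = '\uD834\uDD1E';                      // U+1D11E MUSICAL SYMBOL G CLEF
console.log(clef.length);                       // 2 (UTF-16 code units)
console.log(clef.codePointAt(0).toString(16));  // "1d11e" (the full code point)
console.log([...clef].length);                  // 1 (ES6 iteration is code-point aware)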

Yes, I know the question is 4 years old, but I needed this answer for myself.