Conversion between UTF-8 ArrayBuffer and String

Question

I have an ArrayBuffer which contains a string encoded using UTF-8, and I can't find a standard way of converting such an ArrayBuffer into a JS String (which I understand is encoded using UTF-16).

I've seen this code in numerous places, but I fail to see how it would work with any UTF-8 code points that are longer than 1 byte:

    return String.fromCharCode.apply(null, new Uint8Array(data));

Similarly, I can't find a standard way of converting from a String to a UTF-8 encoded ArrayBuffer.
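To make the concern concrete, here is a small sketch comparing the quoted one-liner (wrapped in a hypothetical `naiveDecode` function) with `TextDecoder` on the two UTF-8 bytes of "é". The one-liner turns every byte into its own UTF-16 code unit, so it is only correct for ASCII.

```javascript
// Hypothetical wrapper around the one-liner from the question.
// Each byte becomes the UTF-16 code unit with the same value, which is
// only correct for ASCII bytes (0x00-0x7F).
function naiveDecode(buffer) {
  return String.fromCharCode.apply(null, new Uint8Array(buffer));
}

// "é" (U+00E9) is encoded in UTF-8 as the two bytes C3 A9.
const bytes = new Uint8Array([0xc3, 0xa9]);

console.log(naiveDecode(bytes.buffer));              // "Ã©": one character per byte
console.log(new TextDecoder("utf-8").decode(bytes)); // "é": bytes decoded as UTF-8
```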

User · Answer

Using TextEncoder and TextDecoder:

    // TextEncoder always encodes to UTF-8; its constructor takes no encoding argument.
    var uint8array = new TextEncoder().encode("Plain Text");
    var string = new TextDecoder().decode(uint8array);
    console.log(uint8array, string);
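Since the question is specifically about ArrayBuffers, a minimal sketch of a pair of helpers built on the same API might look like this. The names `utf8BufferToString` and `stringToUtf8Buffer` are hypothetical. `TextDecoder.decode` accepts an ArrayBuffer directly, and the `fatal: true` option makes it throw a `TypeError` on malformed input instead of substituting U+FFFD.

```javascript
// Decode a UTF-8 ArrayBuffer into a JS (UTF-16) string.
// fatal: true makes decode() throw on malformed UTF-8 instead of
// silently inserting U+FFFD replacement characters.
function utf8BufferToString(buffer) {
  return new TextDecoder("utf-8", { fatal: true }).decode(buffer);
}

// Encode a JS string as UTF-8 and return a standalone ArrayBuffer.
function stringToUtf8Buffer(str) {
  const bytes = new TextEncoder().encode(str);
  // Copy exactly the bytes the view covers, in case it does not span its whole buffer.
  return bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength);
}

const buf = stringToUtf8Buffer("h\u00e9llo \u2661");
console.log(buf.byteLength);          // 10
console.log(utf8BufferToString(buf)); // "héllo ♡"
```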

User · Answer

If you are doing this in the browser, there are no character encoding libraries built in, but you can get by with:

    function pad(n) {
        return n.length < 2 ? "0" + n : n;
    }

    var array = new Uint8Array(data);
    var str = "";
    for (var i = 0, len = array.length; i < len; ++i) {
        str += "%" + pad(array[i].toString(16));
    }
    str = decodeURIComponent(str);

Here's a demo that decodes a 3-byte UTF-8 unit: http://jsfiddle.net/Z9pQE
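The trick works because `decodeURIComponent` interprets consecutive `%XX` escapes as UTF-8 bytes. A compact sketch of the same idea (the name `percentDecodeUtf8` is hypothetical), using `padStart` in place of the `pad` helper; note that it throws a `URIError` when the bytes are not valid UTF-8.

```javascript
// Turn each byte into a "%XX" escape and let decodeURIComponent
// reassemble the UTF-8 sequences into a JS string.
// Throws URIError if the byte sequence is not valid UTF-8.
function percentDecodeUtf8(bytes) {
  let escaped = "";
  for (const byte of bytes) {
    escaped += "%" + byte.toString(16).padStart(2, "0");
  }
  return decodeURIComponent(escaped);
}

// E2 82 AC is the three-byte UTF-8 encoding of "€" (U+20AC).
console.log(percentDecodeUtf8(new Uint8Array([0x48, 0x69, 0xe2, 0x82, 0xac]))); // "Hi€"
```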

User · Answer

The methods readAsArrayBuffer and readAsText of a FileReader object convert a Blob object to an ArrayBuffer or to a DOMString asynchronously. A Blob can be created from raw text or from a byte array, for example:

    let blob = new Blob([text], { type: "text/plain" });
    let reader = new FileReader();
    reader.onload = event => {
        let buffer = event.target.result;
    };
    reader.readAsArrayBuffer(blob);

I think it's better to pack this up in a small promise-like wrapper (an object with a done(callback) method):

    function textToByteArray(text) {
        let blob = new Blob([text], { type: "text/plain" });
        let reader = new FileReader();
        let done = function () {};

        reader.onload = event => {
            done(new Uint8Array(event.target.result));
        };
        reader.readAsArrayBuffer(blob);

        return { done: function (callback) { done = callback; } };
    }

    function byteArrayToText(bytes, encoding) {
        let blob = new Blob([bytes], { type: "application/octet-stream" });
        let reader = new FileReader();
        let done = function () {};

        reader.onload = event => {
            done(event.target.result);
        };

        if (encoding) { reader.readAsText(blob, encoding); } else { reader.readAsText(blob); }

        return { done: function (callback) { done = callback; } };
    }

    let text = "\uD83D\uDCA9 = \u2661";
    textToByteArray(text).done(bytes => {
        console.log(bytes);
        byteArrayToText(bytes, "UTF-8").done(text => {
            console.log(text);
        });
    });
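In current browsers (and in Node.js 18+, where `Blob` is a global), `Blob` itself has promise-returning `text()` and `arrayBuffer()` methods, so a genuinely Promise-based version needs no `FileReader`. A sketch with hypothetical names `textToBytes` and `bytesToText`; note that `Blob.text()` always decodes as UTF-8, so the `encoding` parameter of `byteArrayToText` has no equivalent here.

```javascript
// Encode a string as UTF-8 bytes via a Blob (string parts are stored as UTF-8).
async function textToBytes(text) {
  const blob = new Blob([text], { type: "text/plain" });
  return new Uint8Array(await blob.arrayBuffer());
}

// Decode UTF-8 bytes into a string via a Blob; Blob.text() always uses UTF-8.
async function bytesToText(bytes) {
  const blob = new Blob([bytes], { type: "application/octet-stream" });
  return blob.text();
}

textToBytes("\uD83D\uDCA9 = \u2661").then(async (bytes) => {
  console.log(bytes.length);             // 10
  console.log(await bytesToText(bytes)); // "💩 = ♡"
});
```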

User · Answer

The most up-to-date answer to this kind of question, using modern methods, is here: Converting between strings and ArrayBuffers

User · Answer

    function stringToUint(string) {
        var string = btoa(unescape(encodeURIComponent(string))),
            charList = string.split(''),
            uintArray = [];
        for (var i = 0; i < charList.length; i++) {
            uintArray.push(charList[i].charCodeAt(0));
        }
        return new Uint8Array(uintArray);
    }

    function uintToString(uintArray) {
        var encodedString = String.fromCharCode.apply(null, uintArray),
            decodedString = decodeURIComponent(escape(atob(encodedString)));
        return decodedString;
    }

I put these little functions together (with some help from the internet); they should solve your problem. Here is the working JSFiddle.

EDIT: Since the source of your Uint8Array is external (raw UTF-8 bytes rather than the Base64 text that btoa produces above), you can't use atob; just remove it (working fiddle):

    function uintToString(uintArray) {
        var encodedString = String.fromCharCode.apply(null, uintArray),
            decodedString = decodeURIComponent(escape(encodedString));
        return decodedString;
    }

Warning: escape and unescape are deprecated legacy functions. See this.
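`unescape(encodeURIComponent(s))` works because `encodeURIComponent` percent-encodes each UTF-8 byte and `unescape` turns every `%XX` into one character whose code unit equals that byte; `decodeURIComponent(escape(b))` reverses it. Below is a sketch without the `btoa`/`atob` step, so the bytes are raw UTF-8 rather than Base64 text. It also builds the intermediate string in chunks, because passing a very large array to `String.fromCharCode.apply` can exceed the engine's argument limit. The function names and the chunk size are assumptions, and since `escape`/`unescape` are deprecated, `TextEncoder`/`TextDecoder` remain preferable where available.

```javascript
// String -> raw UTF-8 bytes (no Base64 step).
function stringToUtf8Bytes(str) {
  // encodeURIComponent emits %XX per UTF-8 byte; unescape maps each %XX
  // to one character whose code unit is that byte value (0-255).
  const binary = unescape(encodeURIComponent(str));
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Raw UTF-8 bytes -> string.
function utf8BytesToString(bytes) {
  const CHUNK = 0x8000; // assumed chunk size, well below typical argument limits
  let binary = "";
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  // escape turns byte-valued characters back into %XX (letters, digits and a
  // few symbols are left as-is); decodeURIComponent then decodes the UTF-8.
  return decodeURIComponent(escape(binary));
}

const encoded = stringToUtf8Bytes("caf\u00e9");
console.log(Array.from(encoded));          // [99, 97, 102, 195, 169]
console.log(utf8BytesToString(encoded));   // "café"
```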

User · Answer

There's a polyfill for Encoding over on GitHub: text-encoding. It's easy to use in Node or the browser, and the Readme advises the following:

    var uint8array = new TextEncoder(encoding).encode(string);
    var string = new TextDecoder(encoding).decode(uint8array);

If I recall, 'utf-8' is the encoding you need, and of course you'll need to wrap your buffer:

    var uint8array = new Uint8Array(utf8buffer);

Hope it works as well for you as it has for me.
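Wrapping the buffer in a `Uint8Array` also lets you decode only part of it, which helps when UTF-8 text sits at some offset inside a larger binary ArrayBuffer. A sketch using the native `TextDecoder` (whose interface the polyfill mirrors); the length-prefixed layout and the name `decodeUtf8Field` are made up for illustration.

```javascript
// Decode `length` bytes of UTF-8 starting at `offset` inside a larger ArrayBuffer.
// new Uint8Array(buffer, offset, length) creates a view without copying.
function decodeUtf8Field(buffer, offset, length) {
  const view = new Uint8Array(buffer, offset, length);
  return new TextDecoder("utf-8").decode(view);
}

// Hypothetical layout: a 1-byte length prefix followed by UTF-8 text.
const payload = new TextEncoder().encode("\u00fcber");
const buffer = new ArrayBuffer(1 + payload.length);
const bytes = new Uint8Array(buffer);
bytes[0] = payload.length; // 5: "ü" takes 2 bytes, "ber" takes 3
bytes.set(payload, 1);

console.log(decodeUtf8Field(buffer, 1, bytes[0])); // "über"
```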

User · Answer

The main difficulty when converting a byte array into a string is that UTF-8 stores each Unicode character in a variable number of bytes (one to four). This code will help you:

    var getString = function (strBytes) {
        var MAX_SIZE = 0x4000;
        var codeUnits = [];
        var highSurrogate;
        var lowSurrogate;
        var index = -1;

        var result = '';

        while (++index < strBytes.length) {
            var codePoint = Number(strBytes[index]);

            if (codePoint === (codePoint & 0x7F)) {
                // one byte (ASCII): use as-is
            } else if (0xF0 === (codePoint & 0xF0)) {
                // four bytes
                codePoint ^= 0xF0;
                codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
                codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
                codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            } else if (0xE0 === (codePoint & 0xE0)) {
                // three bytes
                codePoint ^= 0xE0;
                codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
                codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            } else if (0xC0 === (codePoint & 0xC0)) {
                // two bytes
                codePoint ^= 0xC0;
                codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            }

            if (!isFinite(codePoint) || codePoint < 0 || codePoint > 0x10FFFF || Math.floor(codePoint) != codePoint)
                throw RangeError('Invalid code point: ' + codePoint);

            if (codePoint <= 0xFFFF)
                codeUnits.push(codePoint);
            else {
                // split into a UTF-16 surrogate pair
                codePoint -= 0x10000;
                highSurrogate = (codePoint >> 10) | 0xD800;
                lowSurrogate = (codePoint % 0x400) | 0xDC00;
                codeUnits.push(highSurrogate, lowSurrogate);
            }

            // flush in chunks to stay under fromCharCode.apply's argument limit
            if (index + 1 == strBytes.length || codeUnits.length > MAX_SIZE) {
                result += String.fromCharCode.apply(null, codeUnits);
                codeUnits.length = 0;
            }
        }

        return result;
    }

All the best!
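The surrogate-pair step in `getString` is what maps code points above U+FFFF onto two UTF-16 code units. As a worked check (the helper name `toSurrogatePair` is hypothetical), the same arithmetic applied to U+1F4A9 yields the pair `\uD83D\uDCA9`, which `String.fromCodePoint` (ES2015) produces directly.

```javascript
// Same arithmetic as getString: subtract 0x10000, then split the remaining
// 20 bits into a high half (top 10 bits) and a low half (bottom 10 bits).
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;     // 0x1F4A9 -> 0xF4A9
  const high = (offset >> 10) | 0xD800;   // 0x3D  | 0xD800 = 0xD83D
  const low = (offset % 0x400) | 0xDC00;  // 0x0A9 | 0xDC00 = 0xDCA9
  return [high, low];
}

const [hi, lo] = toSurrogatePair(0x1F4A9);
console.log(hi.toString(16), lo.toString(16));                              // d83d dca9
console.log(String.fromCharCode(hi, lo) === String.fromCodePoint(0x1F4A9)); // true
```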

User · Answer

This should work:

    // http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt

    /* utf.js - UTF-8 <=> UTF-16 conversion
     *
     * Copyright (C) 1999 Masanao Izumo <iz@onicos.co.jp>
     * Version: 1.0
     * LastModified: Dec 25 1999
     * This library is free.  You can redistribute it and/or modify it.
     */

    function Utf8ArrayToStr(array) {
        var out, i, len, c;
        var char2, char3;

        out = "";
        len = array.length;
        i = 0;
        while (i < len) {
            c = array[i++];
            switch (c >> 4) {
                case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                    // 0xxxxxxx
                    out += String.fromCharCode(c);
                    break;
                case 12: case 13:
                    // 110x xxxx   10xx xxxx
                    char2 = array[i++];
                    out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
                    break;
                case 14:
                    // 1110 xxxx  10xx xxxx  10xx xxxx
                    char2 = array[i++];
                    char3 = array[i++];
                    out += String.fromCharCode(((c & 0x0F) << 12) |
                                               ((char2 & 0x3F) << 6) |
                                               ((char3 & 0x3F) << 0));
                    break;
            }
        }

        return out;
    }

It's somewhat cleaner than the other solutions because it doesn't use any hacks or depend on browser JS functions, so it also works in other JS environments.

Check out the JSFiddle demo.

Also see the related questions: here and here.
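`Utf8ArrayToStr` has no branch for `c >> 4 === 15`, so four-byte sequences (code points above U+FFFF, such as most emoji) are silently dropped: the lead byte matches no `case`, and neither do its continuation bytes (`c >> 4` from 8 to 11). Below is a sketch of the missing case. It assumes an ES2015 environment for `String.fromCodePoint`, which emits the surrogate pair. The function name is hypothetical, and like the original it does not validate continuation bytes.

```javascript
// Variant of Utf8ArrayToStr with a branch for 4-byte sequences.
// Like the original, it assumes well-formed input: continuation bytes are not
// validated, and fromCodePoint throws a RangeError for values above U+10FFFF.
function utf8ArrayToStr4(array) {
  let out = "";
  let i = 0;
  while (i < array.length) {
    const c = array[i++];
    switch (c >> 4) {
      case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
        // 0xxxxxxx
        out += String.fromCharCode(c);
        break;
      case 12: case 13:
        // 110x xxxx  10xx xxxx
        out += String.fromCharCode(((c & 0x1f) << 6) | (array[i++] & 0x3f));
        break;
      case 14: {
        // 1110 xxxx  10xx xxxx  10xx xxxx
        const c2 = array[i++];
        const c3 = array[i++];
        out += String.fromCharCode(((c & 0x0f) << 12) | ((c2 & 0x3f) << 6) | (c3 & 0x3f));
        break;
      }
      case 15: {
        // 1111 0xxx  10xx xxxx  10xx xxxx  10xx xxxx
        const c2 = array[i++];
        const c3 = array[i++];
        const c4 = array[i++];
        const cp = ((c & 0x07) << 18) | ((c2 & 0x3f) << 12) | ((c3 & 0x3f) << 6) | (c4 & 0x3f);
        // fromCodePoint produces the UTF-16 surrogate pair for cp > 0xFFFF.
        out += String.fromCodePoint(cp);
        break;
      }
      // Stray continuation bytes (c >> 4 of 8..11) are skipped, as in the original.
    }
  }
  return out;
}

console.log(utf8ArrayToStr4(new Uint8Array([0x41, 0xf0, 0x9f, 0x92, 0xa9]))); // "A💩"
```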

User · Answer

If you don't want to use any external polyfill library, you can use this function provided by the Mozilla Developer Network website:

    function utf8ArrayToString(aBytes) {
        var sView = "";

        for (var nPart, nLen = aBytes.length, nIdx = 0; nIdx < nLen; nIdx++) {
            nPart = aBytes[nIdx];

            sView += String.fromCharCode(
                nPart > 251 && nPart < 254 && nIdx + 5 < nLen ? /* six bytes */
                    /* (nPart - 252 << 30) may be not so safe in ECMAScript! So...: */
                    (nPart - 252) * 1073741824 + (aBytes[++nIdx] - 128 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
                : nPart > 247 && nPart < 252 && nIdx + 4 < nLen ? /* five bytes */
                    (nPart - 248 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
                : nPart > 239 && nPart < 248 && nIdx + 3 < nLen ? /* four bytes */
                    (nPart - 240 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
                : nPart > 223 && nPart < 240 && nIdx + 2 < nLen ? /* three bytes */
                    (nPart - 224 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
                : nPart > 191 && nPart < 224 && nIdx + 1 < nLen ? /* two bytes */
                    (nPart - 192 << 6) + aBytes[++nIdx] - 128
                : /* nPart < 127 ? */ /* one byte */
                    nPart
            );
        }

        return sView;
    }

    let str = utf8ArrayToString([50,72,226,130,130,32,43,32,79,226,130,130,32,226,135,140,32,50,72,226,130,130,79]);

    // Must show 2H₂ + O₂ ⇌ 2H₂O
    console.log(str);
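One caveat: the MDN function passes each decoded value to `String.fromCharCode`, which truncates its argument to 16 bits, so a four-byte sequence such as the emoji U+1F4A9 comes out as the unrelated character U+F4A9. The snippet below demonstrates the truncation and shows that `String.fromCodePoint` (ES2015) emits the correct surrogate pair. Substituting it in the four-byte branch is one possible fix, although `fromCodePoint` throws a `RangeError` for the values above U+10FFFF that the obsolete five- and six-byte branches can produce.

```javascript
const codePoint = 0x1F4A9; // what the four-byte branch computes from F0 9F 92 A9

// fromCharCode applies ToUint16, i.e. keeps only the low 16 bits.
const truncated = String.fromCharCode(codePoint);
console.log(truncated === "\uF4A9", truncated.length); // true 1

// fromCodePoint splits the code point into a UTF-16 surrogate pair.
const correct = String.fromCodePoint(codePoint);
console.log(correct === "\uD83D\uDCA9", correct.length); // true 2
```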
