Uint8Array to string in Javascript

Question

I have some UTF-8 encoded data living in a range of Uint8Array elements in JavaScript. Is there an efficient way to decode these into a regular JavaScript string (I believe JavaScript uses 16-bit Unicode)? I don't want to add one character at a time, as the string concatenation would become too CPU intensive.


User · Answer

Try these functions:

var JsonToArray = function (json) {
    var str = JSON.stringify(json, null, 0);
    var ret = new Uint8Array(str.length);
    for (var i = 0; i < str.length; i++) {
        ret[i] = str.charCodeAt(i);
    }
    return ret;
};

var binArrayToJson = function (binArray) {
    var str = "";
    for (var i = 0; i < binArray.length; i++) {
        str += String.fromCharCode(parseInt(binArray[i]));
    }
    return JSON.parse(str);
};

Source: https://gist.github.com/tomfa/706d10fed78c497731ac (kudos to Tomfa)

User · Answer

If you can't use the TextDecoder API because it is not supported on IE:

• You can use the FastestSmallestTextEncoderDecoder polyfill recommended by the Mozilla Developer Network website.

• You can use this function, also provided at the MDN website:

function utf8ArrayToString(aBytes) {
    var sView = "";

    for (var nPart, nLen = aBytes.length, nIdx = 0; nIdx < nLen; nIdx++) {
        nPart = aBytes[nIdx];

        sView += String.fromCharCode(
            nPart > 251 && nPart < 254 && nIdx + 5 < nLen ? /* six bytes */
                /* (nPart - 252 << 30) may be not so safe in ECMAScript! So...: */
                (nPart - 252) * 1073741824 + (aBytes[++nIdx] - 128 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 247 && nPart < 252 && nIdx + 4 < nLen ? /* five bytes */
                (nPart - 248 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 239 && nPart < 248 && nIdx + 3 < nLen ? /* four bytes */
                (nPart - 240 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 223 && nPart < 240 && nIdx + 2 < nLen ? /* three bytes */
                (nPart - 224 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 191 && nPart < 224 && nIdx + 1 < nLen ? /* two bytes */
                (nPart - 192 << 6) + aBytes[++nIdx] - 128
            : /* nPart < 127 ? */ /* one byte */
                nPart
        );
    }

    return sView;
}

let str = utf8ArrayToString([50, 72, 226, 130, 130, 32, 43, 32, 79, 226, 130, 130, 32, 226, 135, 140, 32, 50, 72, 226, 130, 130, 79]);

// Must show 2H₂ + O₂ ⇌ 2H₂O
console.log(str);

User · Answer

The solution given by Albert works well as long as the provided function is invoked infrequently and is only used for arrays of modest size, otherwise it is egregiously inefficient. Here is an enhanced vanilla JavaScript solution that works for both Node and browsers and has the following advantages:

• Works efficiently for all octet array sizes

• Generates no intermediate throw-away strings

• Supports 4-byte characters on modern JS engines (otherwise "?" is substituted)

var utf8ArrayToStr = (function () {
    var charCache = new Array(128);  // Preallocate the cache for the common single byte chars
    var charFromCodePt = String.fromCodePoint || String.fromCharCode;
    var result = [];

    return function (array) {
        var codePt, byte1;
        var buffLen = array.length;

        result.length = 0;

        for (var i = 0; i < buffLen;) {
            byte1 = array[i++];

            if (byte1 <= 0x7F) {
                codePt = byte1;
            } else if (byte1 <= 0xDF) {
                codePt = ((byte1 & 0x1F) << 6) | (array[i++] & 0x3F);
            } else if (byte1 <= 0xEF) {
                codePt = ((byte1 & 0x0F) << 12) | ((array[i++] & 0x3F) << 6) | (array[i++] & 0x3F);
            } else if (String.fromCodePoint) {
                codePt = ((byte1 & 0x07) << 18) | ((array[i++] & 0x3F) << 12) | ((array[i++] & 0x3F) << 6) | (array[i++] & 0x3F);
            } else {
                codePt = 63;    // Cannot convert four byte code points, so use "?" instead
                i += 3;
            }

            result.push(charCache[codePt] || (charCache[codePt] = charFromCodePt(codePt)));
        }

        return result.join('');
    };
})();

User · Answer

This should work:

// http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt

/* utf.js - UTF-8 <=> UTF-16 convertion
 *
 * Copyright (C) 1999 Masanao Izumo <iz@onicos.co.jp>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free.  You can redistribute it and/or modify it.
 */

function Utf8ArrayToStr(array) {
    var out, i, len, c;
    var char2, char3;

    out = "";
    len = array.length;
    i = 0;
    while (i < len) {
        c = array[i++];
        switch (c >> 4) {
        case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
            // 0xxxxxxx
            out += String.fromCharCode(c);
            break;
        case 12: case 13:
            // 110x xxxx   10xx xxxx
            char2 = array[i++];
            out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
            break;
        case 14:
            // 1110 xxxx  10xx xxxx  10xx xxxx
            char2 = array[i++];
            char3 = array[i++];
            out += String.fromCharCode(((c & 0x0F) << 12) |
                                       ((char2 & 0x3F) << 6) |
                                       ((char3 & 0x3F) << 0));
            break;
        }
    }

    return out;
}

It's somewhat cleaner than the other solutions because it doesn't use any hacks and doesn't depend on browser JS functions, so it also works in other JS environments. Check out the JSFiddle demo. Also see the related questions: here and here.

User · Answer

In Node, Buffer instances are also Uint8Array instances, so buf.toString() works in this case.
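As a rough illustration of the point above (Node.js only; the variable names are made up for this sketch): a Buffer decodes itself with toString(), while a plain Uint8Array inherits the generic typed-array toString(), which just joins the byte values with commas. Copying the plain array into a Buffer first is one way to get the decoding behaviour.

```javascript
// Node.js only: Buffer is a global there. All names are illustrative.
const buf = Buffer.from("Gr\u00fc\u00dfe", "utf8"); // "Grüße" as UTF-8 bytes

// A Buffer is a Uint8Array subclass, and its toString() decodes the bytes.
const isTypedArray = buf instanceof Uint8Array;
const fromBuffer = buf.toString("utf8");

// A plain Uint8Array is not a Buffer: toString() joins the byte values.
const plain = new Uint8Array(buf);
const joined = plain.toString();

// Copying the bytes into a Buffer restores the decoding toString().
const fromPlain = Buffer.from(plain).toString("utf8");

console.log(isTypedArray); // true
console.log(fromBuffer);   // "Grüße"
console.log(joined);       // "71,114,195,188,195,159,101"
console.log(fromPlain);    // "Grüße"
```

Buffer.from(plain) copies the bytes, which is fine for small inputs; for large arrays a zero-copy wrapper over the same memory may be preferable.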

User · Answer

I was frustrated to see that people were not showing how to go both ways or showing that things work on non-trivial UTF-8 strings. I found a post on codereview.stackexchange.com that has some code that works well. I used it to turn ancient runes into bytes, to test some crypto on the bytes, then convert things back into a string. The working code is on GitHub here. I renamed the methods for clarity:

// https://codereview.stackexchange.com/a/3589/75693
function bytesToString(bytes) {
    var chars = [];
    for(var i = 0, n = bytes.length; i < n;) {
        chars.push(((bytes[i++] & 0xff) << 8) | (bytes[i++] & 0xff));
    }
    return String.fromCharCode.apply(null, chars);
}

// https://codereview.stackexchange.com/a/3589/75693
function stringToBytes(str) {
    var bytes = [];
    for(var i = 0, n = str.length; i < n; i++) {
        var char = str.charCodeAt(i);
        bytes.push(char >>> 8, char & 0xFF);
    }
    return bytes;
}

The unit test uses this UTF-8 string:

    // http://kermitproject.org/utf8.html
    // From the Anglo-Saxon Rune Poem (Rune version) 
    const secretUtf8 = `ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ
ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾ
ᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬`;

Note that the string length is only 117 characters but the byte length, when encoded, is 234.
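The exact 2:1 ratio comes from stringToBytes writing two bytes (high byte first) for every UTF-16 code unit, so the byte layout it produces is effectively UTF-16 big-endian rather than UTF-8. The sketch below copies stringToBytes from this answer and compares it with the standard TextEncoder, which does produce UTF-8; the sample string is an arbitrary choice for illustration.

```javascript
// Copy of stringToBytes from the answer above: two bytes per UTF-16 code unit.
function stringToBytes(str) {
    var bytes = [];
    for (var i = 0, n = str.length; i < n; i++) {
        var char = str.charCodeAt(i);
        bytes.push(char >>> 8, char & 0xFF);
    }
    return bytes;
}

// Arbitrary sample: "héllo €" (7 UTF-16 code units).
const sample = "h\u00e9llo \u20ac";

const utf16Bytes = stringToBytes(sample);            // 2 bytes per code unit
const utf8Bytes = new TextEncoder().encode(sample);  // 1 to 4 bytes per code point

console.log(sample.length);     // 7
console.log(utf16Bytes.length); // 14
console.log(utf8Bytes.length);  // 10
```

So these two functions round-trip strings reliably, but they are not a decoder for UTF-8 data that arrives in a Uint8Array from elsewhere.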

If I uncomment the console.log lines I can see that the string that is decoded is the same string that was encoded (with the bytes passed through Shamir's secret sharing algorithm!):

User · Answer

Do what @Sudhir said, and then to get a String out of the comma-separated list of numbers use:

for (var i = 0; i < unitArr.byteLength; i++) {
    myString += String.fromCharCode(unitArr[i]);
}

This will give you the string you want, if it's still relevant.

User · Answer

Found in one of the Chrome sample applications, although this is meant for larger blocks of data where you're okay with an asynchronous conversion:

/**
 * Converts an array buffer to a string
 *
 * @private
 * @param {ArrayBuffer} buf The buffer to convert
 * @param {Function} callback The function to call when conversion is complete
 */
function _arrayBufferToString(buf, callback) {
    var bb = new Blob([new Uint8Array(buf)]);
    var f = new FileReader();
    f.onload = function (e) {
        callback(e.target.result);
    };
    f.readAsText(bb);
}

User · Answer

class UTF8 {
    static encode(str: string) { return new UTF8().encode(str); }
    static decode(data: Uint8Array) { return new UTF8().decode(data); }

    private EOF_byte: number = -1;
    private EOF_code_point: number = -1;

    private encoderError(code_point) {
        console.error("UTF8 encoderError", code_point);
    }

    private decoderError(fatal, opt_code_point?): number {
        if (fatal) console.error("UTF8 decoderError", opt_code_point);
        return opt_code_point || 0xFFFD;
    }

    private inRange(a: number, min: number, max: number) {
        return min <= a && a <= max;
    }

    private div(n: number, d: number) {
        return Math.floor(n / d);
    }

    private stringToCodePoints(string: string) {
        /** @type {Array.<number>} */
        let cps = [];
        // Based on http://www.w3.org/TR/WebIDL/#idl-DOMString
        let i = 0, n = string.length;
        while (i < string.length) {
            let c = string.charCodeAt(i);
            if (!this.inRange(c, 0xD800, 0xDFFF)) {
                cps.push(c);
            } else if (this.inRange(c, 0xDC00, 0xDFFF)) {
                cps.push(0xFFFD);
            } else { // (inRange(c, 0xD800, 0xDBFF))
                if (i == n - 1) {
                    cps.push(0xFFFD);
                } else {
                    let d = string.charCodeAt(i + 1);
                    if (this.inRange(d, 0xDC00, 0xDFFF)) {
                        let a = c & 0x3FF;
                        let b = d & 0x3FF;
                        i += 1;
                        cps.push(0x10000 + (a << 10) + b);
                    } else {
                        cps.push(0xFFFD);
                    }
                }
            }
            i += 1;
        }
        return cps;
    }

    private encode(str: string): Uint8Array {
        let pos: number = 0;
        let codePoints = this.stringToCodePoints(str);
        let outputBytes = [];

        while (codePoints.length > pos) {
            let code_point: number = codePoints[pos++];

            if (this.inRange(code_point, 0xD800, 0xDFFF)) {
                this.encoderError(code_point);
            } else if (this.inRange(code_point, 0x0000, 0x007f)) {
                outputBytes.push(code_point);
            } else {
                let count = 0, offset = 0;
                if (this.inRange(code_point, 0x0080, 0x07FF)) {
                    count = 1;
                    offset = 0xC0;
                } else if (this.inRange(code_point, 0x0800, 0xFFFF)) {
                    count = 2;
                    offset = 0xE0;
                } else if (this.inRange(code_point, 0x10000, 0x10FFFF)) {
                    count = 3;
                    offset = 0xF0;
                }

                outputBytes.push(this.div(code_point, Math.pow(64, count)) + offset);

                while (count > 0) {
                    let temp = this.div(code_point, Math.pow(64, count - 1));
                    outputBytes.push(0x80 + (temp % 64));
                    count -= 1;
                }
            }
        }
        return new Uint8Array(outputBytes);
    }

    private decode(data: Uint8Array): string {
        let fatal: boolean = false;
        let pos: number = 0;
        let result: string = "";
        let code_point: number;
        let utf8_code_point = 0;
        let utf8_bytes_needed = 0;
        let utf8_bytes_seen = 0;
        let utf8_lower_boundary = 0;

        while (data.length > pos) {
            let _byte = data[pos++];

            if (_byte == this.EOF_byte) {
                if (utf8_bytes_needed != 0) {
                    code_point = this.decoderError(fatal);
                } else {
                    code_point = this.EOF_code_point;
                }
            } else {
                if (utf8_bytes_needed == 0) {
                    if (this.inRange(_byte, 0x00, 0x7F)) {
                        code_point = _byte;
                    } else {
                        if (this.inRange(_byte, 0xC2, 0xDF)) {
                            utf8_bytes_needed = 1;
                            utf8_lower_boundary = 0x80;
                            utf8_code_point = _byte - 0xC0;
                        } else if (this.inRange(_byte, 0xE0, 0xEF)) {
                            utf8_bytes_needed = 2;
                            utf8_lower_boundary = 0x800;
                            utf8_code_point = _byte - 0xE0;
                        } else if (this.inRange(_byte, 0xF0, 0xF4)) {
                            utf8_bytes_needed = 3;
                            utf8_lower_boundary = 0x10000;
                            utf8_code_point = _byte - 0xF0;
                        } else {
                            this.decoderError(fatal);
                        }
                        utf8_code_point = utf8_code_point * Math.pow(64, utf8_bytes_needed);
                        code_point = null;
                    }
                } else if (!this.inRange(_byte, 0x80, 0xBF)) {
                    utf8_code_point = 0;
                    utf8_bytes_needed = 0;
                    utf8_bytes_seen = 0;
                    utf8_lower_boundary = 0;
                    pos--;
                    code_point = this.decoderError(fatal, _byte);
                } else {
                    utf8_bytes_seen += 1;
                    utf8_code_point = utf8_code_point + (_byte - 0x80) * Math.pow(64, utf8_bytes_needed - utf8_bytes_seen);

                    if (utf8_bytes_seen !== utf8_bytes_needed) {
                        code_point = null;
                    } else {
                        let cp = utf8_code_point;
                        let lower_boundary = utf8_lower_boundary;
                        utf8_code_point = 0;
                        utf8_bytes_needed = 0;
                        utf8_bytes_seen = 0;
                        utf8_lower_boundary = 0;
                        if (this.inRange(cp, lower_boundary, 0x10FFFF) && !this.inRange(cp, 0xD800, 0xDFFF)) {
                            code_point = cp;
                        } else {
                            code_point = this.decoderError(fatal, _byte);
                        }
                    }
                }
            }

            // Decode string
            if (code_point !== null && code_point !== this.EOF_code_point) {
                if (code_point <= 0xFFFF) {
                    if (code_point > 0) result += String.fromCharCode(code_point);
                } else {
                    code_point -= 0x10000;
                    result += String.fromCharCode(0xD800 + ((code_point >> 10) & 0x3ff));
                    result += String.fromCharCode(0xDC00 + (code_point & 0x3ff));
                }
            }
        }
        return result;
    }
}

User · Answer

Here's what I use:

var str = String.fromCharCode.apply(null, uint8Arr);
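Two caveats seem worth spelling out for this one-liner. It maps each byte to one character, so it only reproduces the text when every byte is ASCII (or is meant as Latin-1), and passing a very large array through apply can exceed the engine's argument limit. A minimal sketch of a chunked variant follows; bytesToBinaryString and CHUNK_SIZE are names invented for this example, and the chunk size is a conservative guess rather than a documented limit.

```javascript
// Illustrative names; 0x8000 is a conservative guess, not a documented limit.
const CHUNK_SIZE = 0x8000;

function bytesToBinaryString(bytes) {
    const parts = [];
    for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
        // subarray() is a view, so each chunk is passed without copying.
        parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK_SIZE)));
    }
    return parts.join("");
}

// ASCII bytes come out as expected.
const ascii = bytesToBinaryString(new Uint8Array([72, 105, 33]));
console.log(ascii); // "Hi!"

// The three UTF-8 bytes of the euro sign become three separate characters.
const euro = bytesToBinaryString(new Uint8Array([0xE2, 0x82, 0xAC]));
console.log(euro.length); // 3

// A large input is processed in chunks instead of one huge apply() call.
const big = bytesToBinaryString(new Uint8Array(200000).fill(65));
console.log(big.length); // 200000
```

For real UTF-8 input this still yields one character per byte, so it is a fit for ASCII or binary-string use cases only.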

User · Answer

TextEncoder and TextDecoder from the Encoding standard, which is polyfilled by the stringencoding library, convert between strings and ArrayBuffers:

var uint8array = new TextEncoder("utf-8").encode("¢");
var string = new TextDecoder("utf-8").decode(uint8array);
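Since the question asks about a range of elements, here is a small sketch of how TextDecoder could be applied to part of a larger array. decodeRange is a hypothetical helper name, and the "id:...;" framing of the sample data is invented purely for the demonstration.

```javascript
// Reuse one decoder; decodeRange is an illustrative helper, not a standard API.
const decoder = new TextDecoder("utf-8");

function decodeRange(bytes, start, end) {
    // subarray() returns a view on the same memory, so no bytes are copied.
    return decoder.decode(bytes.subarray(start, end));
}

// Sample data: a 3-byte ASCII prefix, some UTF-8 text, and a 1-byte suffix.
const bytes = new TextEncoder().encode("id:héllo wörld;");

const prefix = decodeRange(bytes, 0, 3);
const text = decodeRange(bytes, 3, bytes.length - 1);

console.log(prefix); // "id:"
console.log(text);   // "héllo wörld"
```

The range boundaries have to fall on character boundaries; if a multi-byte sequence is cut in half, the decoder substitutes U+FFFD for the incomplete part by default.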

User · Answer

I am using this TypeScript snippet:

function UInt8ArrayToString(uInt8Array: Uint8Array): string {
    var s: string = "[";
    for (var i: number = 0; i < uInt8Array.byteLength; i++) {
        if (i > 0)
            s += ", ";
        s += uInt8Array[i];
    }
    s += "]";
    return s;
}

Remove the type annotations if you need the JavaScript version. Hope this helps!

User · Answer

In NodeJS, we have Buffers available, and string conversion with them is really easy. Better, it's easy to convert a Uint8Array to a Buffer. Try this code, it's worked for me in Node for basically any conversion involving Uint8Arrays:

let str = Buffer.from(uint8arr.buffer).toString();

We're just extracting the ArrayBuffer from the Uint8Array and then converting that to a proper NodeJS Buffer. Then we convert the Buffer to a string (you can throw in a hex or base64 encoding if you want).

If we want to convert back to a Uint8Array from a string, then we'd do this:

let uint8arr = new Uint8Array(Buffer.from(str));

Be aware that if you declared an encoding like base64 when converting to a string, then you'd have to use Buffer.from(str, "base64") if you used base64, or whatever other encoding you used.

This will not work in the browser without a module! NodeJS Buffers just don't exist in the browser, so this method won't work unless you add Buffer functionality to the browser. That's actually pretty easy to do though, just use a module like this, which is both small and fast!
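One thing to watch with the uint8arr.buffer approach: when the Uint8Array is only a view onto part of a larger ArrayBuffer (for example the result of subarray()), .buffer still refers to the whole underlying buffer. A hedged sketch of the difference, with made-up sample data and variable names, assuming Node.js:

```javascript
// Node.js only. The sample data and names are illustrative.
// Build a 14-byte array from ASCII text, then take a view of the last 7 bytes.
const whole = Uint8Array.from("header|payload", (ch) => ch.charCodeAt(0));
const view = whole.subarray(7); // covers only the bytes of "payload"

// view.buffer is the entire ArrayBuffer, so this decodes all 14 bytes.
const naive = Buffer.from(view.buffer).toString();

// Passing byteOffset and byteLength restricts the Buffer to the view's range.
const exact = Buffer.from(view.buffer, view.byteOffset, view.byteLength).toString();

console.log(naive); // "header|payload"
console.log(exact); // "payload"
```

The three-argument form shares memory with the original array rather than copying it, so later writes to either one are visible through the other.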
