[javascript] How to convert UTF8 string to byte array?

Assuming the question is about a DOMString as input and the goal is to get an Array, that when interpreted as string (e.g. written to a file on disk), would be UTF-8 encoded:

Now that nearly all modern browsers support Typed Arrays, it'd be ashamed if this approach is not listed:

  • According to the W3C, software supporting the File API should accept DOMStrings in their Blob constructor (see also: String encoding when constructing a Blob)
  • Blobs can be converted to an ArrayBuffer using the .readAsArrayBuffer() function of a File Reader
  • Using a DataView or constructing a Typed Array with the buffer read by the File Reader, one can access every single byte of the ArrayBuffer

Example:

// Create a Blob with an Euro-char (U+20AC)
var b = new Blob(['€']);
var fr = new FileReader();

fr.onload = function() {
    ua = new Uint8Array(fr.result);
    // This will log "3|226|130|172"
    //                  E2  82  AC
    // In UTF-16, it would be only 2 bytes long
    console.log(
        fr.result.byteLength + '|' + 
        ua[0]  + '|' + 
        ua[1] + '|' + 
        ua[2] + ''
    );
};
fr.readAsArrayBuffer(b);

Play with that on JSFiddle. I haven't benchmarked this yet but I can imagine this being efficient for large DOMStrings as input.