JavaScript: Unicode string to hex

Question

I'm trying to convert a Unicode string to a hexadecimal representation in JavaScript. This is what I have:

    function convertFromHex(hex) {
        var hex = hex.toString(); // force conversion
        var str = '';
        for (var i = 0; i < hex.length; i += 2)
            str += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
        return str;
    }

    function convertToHex(str) {
        var hex = '';
        for (var i = 0; i < str.length; i++) {
            hex += '' + str.charCodeAt(i).toString(16);
        }
        return hex;
    }

But it fails on Unicode characters, like Chinese.

Input: 漢字

Output: o"[W

Any ideas? Can this be done in JavaScript?

User · Answer

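It depends on what encoding you use. If you want to convert UTF-8 encoded hex to a string, use this:

    function fromHex(hex, str) {
      try {
        // Prefix each byte pair with "%" so it can be decoded as UTF-8.
        str = decodeURIComponent(hex.replace(/(..)/g, '%$1'));
      } catch (e) {
        str = hex;
        console.log('invalid hex input: ' + hex);
      }
      return str;
    }

For the other direction use this:

    function toHex(str, hex) {
      try {
        // unescape(encodeURIComponent(str)) yields one character per UTF-8 byte.
        hex = unescape(encodeURIComponent(str))
          .split('').map(function (v) {
            // Pad to two digits so bytes below 0x10 survive the round trip.
            return ('0' + v.charCodeAt(0).toString(16)).slice(-2);
          }).join('');
      } catch (e) {
        hex = str;
        console.log('invalid text input: ' + str);
      }
      return hex;
    }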
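The answer above relies on `unescape`, which is a deprecated legacy function. As a minimal sketch of the same idea without it, the percent-encoded output of `encodeURIComponent` can be rewritten directly into hex. The helper names `utf8ToHexNoUnescape` and `hexToUtf8String` are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: UTF-8 hex of a string without using unescape().
function utf8ToHexNoUnescape(str) {
  // encodeURIComponent emits %XX for every UTF-8 byte it escapes and
  // leaves unreserved ASCII characters untouched, so handle both cases.
  return encodeURIComponent(str).replace(/%([0-9A-F]{2})|[\s\S]/g, (match, byteHex) =>
    byteHex !== undefined
      ? byteHex.toLowerCase()
      : match.charCodeAt(0).toString(16).padStart(2, "0")
  );
}

// Hypothetical helper: the reverse direction, same approach as fromHex above.
function hexToUtf8String(hex) {
  return decodeURIComponent(hex.replace(/(..)/g, "%$1"));
}

const hex = utf8ToHexNoUnescape("\u6f22\u5b57"); // "漢字"
console.log(hex);                  // "e6bca2e5ad97"
console.log(hexToUtf8String(hex)); // "漢字"
```

Both directions throw a `URIError` on malformed input (a lone surrogate when encoding, invalid UTF-8 when decoding), so callers may still want a `try`/`catch` like the one in the answer.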

User · Answer

A more up-to-date solution, for encoding:

    // This is the same for all of the below, and
    // you probably won't need it except for debugging
    // in most cases.
    function bytesToHex(bytes) {
      return Array.from(
        bytes,
        byte => byte.toString(16).padStart(2, "0")
      ).join("");
    }

    // You almost certainly want UTF-8, which is
    // now natively supported:
    function stringToUTF8Bytes(string) {
      return new TextEncoder().encode(string);
    }

    // But you might want UTF-16 for some reason.
    // .charCodeAt(index) will return the underlying
    // UTF-16 code units (not code points!), so you
    // just need to format them in whichever endian order you want.
    function stringToUTF16Bytes(string, littleEndian) {
      const bytes = new Uint8Array(string.length * 2);
      // Using DataView is the only way to get a specific
      // endianness.
      const view = new DataView(bytes.buffer);
      for (let i = 0; i < string.length; i++) {
        // The first argument is a byte offset: two bytes per code unit.
        view.setUint16(i * 2, string.charCodeAt(i), littleEndian);
      }
      return bytes;
    }

    // And you might want UTF-32 in even weirder cases.
    // Fortunately, iterating a string gives the code
    // points, which are identical to the UTF-32 encoding,
    // though you still have the endianness issue.
    function stringToUTF32Bytes(string, littleEndian) {
      const codepoints = Array.from(string, c => c.codePointAt(0));
      const bytes = new Uint8Array(codepoints.length * 4);
      // Using DataView is the only way to get a specific
      // endianness.
      const view = new DataView(bytes.buffer);
      for (let i = 0; i < codepoints.length; i++) {
        // The first argument is a byte offset: four bytes per code point.
        view.setUint32(i * 4, codepoints[i], littleEndian);
      }
      return bytes;
    }

Examples:

    bytesToHex(stringToUTF8Bytes("hello 漢字 👍"))
    // "68656c6c6f20e6bca2e5ad9720f09f918d"
    bytesToHex(stringToUTF16Bytes("hello 漢字 👍", false))
    // "00680065006c006c006f00206f225b570020d83ddc4d"
    bytesToHex(stringToUTF16Bytes("hello 漢字 👍", true))
    // "680065006c006c006f002000226f575b20003dd84ddc"
    bytesToHex(stringToUTF32Bytes("hello 漢字 👍", false))
    // "00000068000000650000006c0000006c0000006f0000002000006f2200005b57000000200001f44d"
    bytesToHex(stringToUTF32Bytes("hello 漢字 👍", true))
    // "68000000650000006c0000006c0000006f00000020000000226f0000575b0000200000004df40100"

For decoding, it's generally a lot simpler; you just need:

    function hexToBytes(hex) {
      const bytes = new Uint8Array(hex.length / 2);
      for (let i = 0; i < bytes.length; i++) {
        bytes[i] = parseInt(hex.substr(i * 2, 2), 16);
      }
      return bytes;
    }

then use the encoding parameter of TextDecoder:

    // UTF-8 is default
    new TextDecoder().decode(hexToBytes("68656c6c6f20e6bca2e5ad9720f09f918d"));
    // but you can also use:
    new TextDecoder("UTF-16LE").decode(hexToBytes("680065006c006c006f002000226f575b20003dd84ddc"));
    new TextDecoder("UTF-16BE").decode(hexToBytes("00680065006c006c006f00206f225b570020d83ddc4d"));
    // "hello 漢字 👍"

Here's the list of allowed encoding names: https://www.w3.org/TR/encoding/#names-and-labels

You might notice UTF-32 is not on that list, which is a pain, so:

    function bytesToStringUTF32(bytes, littleEndian) {
      const view = new DataView(bytes.buffer);
      const codepoints = new Uint32Array(view.byteLength / 4);
      for (let i = 0; i < codepoints.length; i++) {
        codepoints[i] = view.getUint32(i * 4, littleEndian);
      }
      return String.fromCodePoint(...codepoints);
    }

Then:

    bytesToStringUTF32(hexToBytes("00000068000000650000006c0000006c0000006f0000002000006f2200005b57000000200001f44d"), false)
    bytesToStringUTF32(hexToBytes("68000000650000006c0000006c0000006f00000020000000226f0000575b0000200000004df40100"), true)
    // "hello 漢字 👍"
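The hex decoder in this answer does not reject malformed input: `parseInt` returns `NaN` for a non-hex pair, and a typed array stores that as 0. If that matters, a stricter variant could validate first. The sketch below is one possible approach; `hexToBytesStrict` is a hypothetical name, and the `fatal` option of `TextDecoder` is used so that invalid UTF-8 raises an error instead of being replaced with U+FFFD.

```javascript
// Hypothetical stricter variant of a hex-to-bytes helper.
function hexToBytesStrict(hex) {
  if (hex.length % 2 !== 0) {
    throw new RangeError("hex string must have an even number of digits");
  }
  if (!/^[0-9a-fA-F]*$/.test(hex)) {
    throw new SyntaxError("hex string contains a non-hex character");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// With fatal: true, decode() throws a TypeError on malformed UTF-8.
const strictUtf8 = new TextDecoder("utf-8", { fatal: true });
console.log(strictUtf8.decode(hexToBytesStrict("e6bca2e5ad97"))); // "漢字"
```

Whether to throw or to fall back to the raw input, as an earlier answer does, is a design choice that depends on where the hex comes from.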

User · Answer

Here is my take: these functions convert a UTF-8 string to proper hex without the extra zero padding. A real UTF-8 string has characters that are 1, 2, 3 or 4 bytes long.

While working on this I found a couple of key things that solved my problems:

- str.split('') doesn't handle multi-byte characters like emojis correctly. The proper modern way to handle this is with Array.from(str).
- encodeURIComponent() and decodeURIComponent() are great tools to convert between string and hex. They are pretty standard, and they handle UTF-8 correctly.
- (Most) ASCII characters (codes 0 - 127) don't get URI encoded, so they need to be handled separately. But c.charCodeAt(0).toString(16) works perfectly for those.

    function utf8ToHex(str) {
      return Array.from(str).map(c =>
        c.charCodeAt(0) < 128
          // ASCII: one byte, padded to two hex digits so codes below 0x10 decode correctly
          ? c.charCodeAt(0).toString(16).padStart(2, '0')
          // non-ASCII: take the UTF-8 bytes from the percent-encoding
          : encodeURIComponent(c).replace(/%/g, '').toLowerCase()
      ).join('');
    }

    function hexToUtf8(hex) {
      return decodeURIComponent('%' + hex.match(/.{1,2}/g).join('%'));
    }

Demo: https://jsfiddle.net/lyquix/k2tjbrvq/
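The first point, about `split('')` versus `Array.from`, can be checked directly. The snippet below is a small illustration with made-up variable names: splitting cuts an emoji into two surrogate halves, and `encodeURIComponent` refuses a lone surrogate, while `Array.from` keeps the code point whole.

```javascript
const thumbsUp = "\u{1F44D}"; // 👍, one code point stored as two UTF-16 code units

const codeUnits = thumbsUp.split("");    // splits between the surrogate halves
const codePoints = Array.from(thumbsUp); // iterates by code point

console.log(codeUnits.length);  // 2
console.log(codePoints.length); // 1

// A whole code point percent-encodes to its four UTF-8 bytes.
console.log(encodeURIComponent(codePoints[0])); // "%F0%9F%91%8D"

// A lone surrogate half cannot be encoded and throws URIError.
let failed = false;
try {
  encodeURIComponent(codeUnits[0]);
} catch (e) {
  failed = e instanceof URIError;
}
console.log(failed); // true
```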

User · Answer

Remember that a JavaScript code unit is 16 bits wide. Therefore the hex string form will be 4 digits per code unit.

Usage:

    var str = "\u6f22\u5b57"; // "\u6f22\u5b57" === "漢字"
    alert(str.hexEncode().hexDecode());

String to hex form:

    String.prototype.hexEncode = function () {
        var hex, i;

        var result = "";
        for (i = 0; i < this.length; i++) {
            hex = this.charCodeAt(i).toString(16);
            result += ("000" + hex).slice(-4);
        }

        return result;
    };

Back again:

    String.prototype.hexDecode = function () {
        var j;
        var hexes = this.match(/.{1,4}/g) || [];
        var back = "";
        for (j = 0; j < hexes.length; j++) {
            back += String.fromCharCode(parseInt(hexes[j], 16));
        }

        return back;
    };
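If extending `String.prototype` is not desirable, the same four-digits-per-code-unit scheme can be written as plain functions. This is a sketch under that assumption; `utf16ToHex` and `hexToUtf16` are hypothetical names, not part of the answer. Because it works on code units, a character outside the Basic Multilingual Plane simply becomes two four-digit groups, one per surrogate.

```javascript
// Hypothetical standalone equivalents of hexEncode / hexDecode.
function utf16ToHex(str) {
  let result = "";
  for (let i = 0; i < str.length; i++) {
    // One UTF-16 code unit -> exactly four hex digits.
    result += str.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return result;
}

function hexToUtf16(hex) {
  const units = hex.match(/.{1,4}/g) || [];
  return units.map(u => String.fromCharCode(parseInt(u, 16))).join("");
}

console.log(utf16ToHex("\u6f22\u5b57")); // "6f225b57"
console.log(utf16ToHex("\u{1F44D}"));    // "d83ddc4d" (surrogate pair)
console.log(hexToUtf16("6f225b57"));     // "漢字"
```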

User · Answer

Here is a tweak of McDowell's algorithm that doesn't pad the result:

    function toHex(str) {
        var result = '';
        for (var i = 0; i < str.length; i++) {
            result += str.charCodeAt(i).toString(16);
        }
        return result;
    }
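One consequence of dropping the padding is worth spelling out: the output can no longer be decoded unambiguously, because code units produce a varying number of digits. The sketch below (with a hypothetical function name) shows two different strings that yield the same unpadded hex, which is also why the round trip in the question produced `o"[W`.

```javascript
// Hypothetical copy of the unpadded encoder, for illustration only.
function toHexUnpadded(str) {
  let result = "";
  for (let i = 0; i < str.length; i++) {
    result += str.charCodeAt(i).toString(16);
  }
  return result;
}

const a = toHexUnpadded("\u6f22\u5b57"); // "漢字": "6f22" + "5b57"
const b = toHexUnpadded("o\"[W");        // 'o"[W': "6f" + "22" + "5b" + "57"

console.log(a);       // "6f225b57"
console.log(b);       // "6f225b57"
console.log(a === b); // true
```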

User · Answer

how do you get "\u6f22\u5b57" from 漢字 in JavaScript?

These are JavaScript Unicode escape sequences, e.g. \u12AB. To convert them, you could iterate over every code unit in the string, call .toString(16) on it, and go from there.

However, it is more efficient to also use hexadecimal escape sequences, e.g. \xAA, in the output wherever possible.

Also note that ASCII symbols such as A, b, and - probably don't need to be escaped.

I've written a small JavaScript library that does all this for you, called jsesc. It has lots of options to control the output.

Here's an online demo of the tool in action: http://mothereff.in/js-escapes#1%E6%BC%A2%E5%AD%97

Your question was tagged as utf-8. Reading the rest of your question, UTF-8 encoding/decoding didn't seem to be what you wanted here, but in case you ever need it, use utf8.js (online demo).
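The manual approach described above (iterate over code units, call `.toString(16)`, prefer `\xAA` where possible, leave plain ASCII alone) might look roughly like the following. This is only an illustrative sketch and not how jsesc is implemented; `escapeNonAscii` is a hypothetical name, and it deliberately does not escape quotes or backslashes, so its output is not guaranteed to be a safe string literal.

```javascript
// Hypothetical sketch: escape everything outside printable ASCII.
function escapeNonAscii(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0x20 && unit <= 0x7e) {
      out += str[i]; // printable ASCII stays as-is
    } else if (unit <= 0xff) {
      // Fits in one byte: shorter \xXX form.
      out += "\\x" + unit.toString(16).toUpperCase().padStart(2, "0");
    } else {
      // Any other UTF-16 code unit: \uXXXX form.
      out += "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
    }
  }
  return out;
}

console.log(escapeNonAscii("\u6f22\u5b57")); // \u6F22\u5B57
console.log(escapeNonAscii("caf\u00e9"));    // caf\xE9
```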

User · Answer

Here you go :D

    "漢字".split("").reduce((hex,c)=>hex+=c.charCodeAt(0).toString(16).padStart(4,"0"),"")
    // "6f225b57"

For non-Unicode:

    "hi".split("").reduce((hex,c)=>hex+=c.charCodeAt(0).toString(16).padStart(2,"0"),"")
    // "6869"

ASCII (utf-8) binary HEX string to string:

    "68656c6c6f20776f726c6421".match(/.{1,2}/g).reduce((acc,char)=>acc+String.fromCharCode(parseInt(char, 16)),"")

String to ASCII (utf-8) binary HEX string:

    "hello world!".split("").reduce((hex,c)=>hex+=c.charCodeAt(0).toString(16).padStart(2,"0"),"")

--- unicode ---

String to UNICODE (utf-16) binary HEX string:

    "hello world!".split("").reduce((hex,c)=>hex+=c.charCodeAt(0).toString(16).padStart(4,"0"),"")

UNICODE (utf-16) binary HEX string to string:

    "00680065006c006c006f00200077006f0072006c00640021".match(/.{1,4}/g).reduce((acc,char)=>acc+String.fromCharCode(parseInt(char, 16)),"")
