Convert array of indices to 1-hot encoded numpy array

Question

Let's say I have a 1d numpy array

a = array([1,0,3])

I would like to encode this as a 2D one-hot array

b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

Is there a quick way to do this? Quicker than just looping over a to set elements of b, that is.

User · Answer

Just to elaborate on the excellent answer from K3---rnc, here is a more generic version:

def onehottify(x, n=None, dtype=float):
    """1-hot encode x with the max value n (computed from data if n is None)."""
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    return np.eye(n, dtype=dtype)[x]

Also, here is a quick-and-dirty benchmark of this method and a method from the currently accepted answer by YXD (slightly changed, so that they offer the same API except that the latter works only with 1D ndarrays):

def onehottify_only_1d(x, n=None, dtype=float):
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    b = np.zeros((len(x), n), dtype=dtype)
    b[np.arange(len(x)), x] = 1
    return b

The latter method is ~35% faster (MacBook Pro 13 2015), but the former is more general:

>>> import numpy as np
>>> np.random.seed(42)
>>> a = np.random.randint(0, 9, size=(10_000,))
>>> a
array([6, 3, 7, ..., 5, 8, 6])
>>> %timeit onehottify(a, 10)
188 µs ± 5.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit onehottify_only_1d(a, 10)
139 µs ± 2.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
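As a small sanity check of the "more general" claim, the sketch below compares the two approaches on a 1-D input and then feeds a 2-D index array to the identity-matrix version. The helper names `onehot_eye` and `onehot_zeros_1d` are made up for this illustration and are simplified stand-ins for the two functions above.

```python
import numpy as np

def onehot_eye(x, n):
    """Index rows of an identity matrix; works for any input shape."""
    return np.eye(n, dtype=int)[np.asarray(x)]

def onehot_zeros_1d(x, n):
    """Fill a zero matrix with fancy indexing; 1-D input only."""
    x = np.asarray(x)
    b = np.zeros((len(x), n), dtype=int)
    b[np.arange(len(x)), x] = 1
    return b

labels_1d = np.array([1, 0, 3])
labels_2d = np.array([[1, 0], [3, 2]])

# Both approaches agree on 1-D input.
same_result = np.array_equal(onehot_eye(labels_1d, 4), onehot_zeros_1d(labels_1d, 4))

# Only the identity-matrix version accepts the 2-D input; a class axis is appended.
encoded_2d = onehot_eye(labels_2d, 4)

print(same_result)        # True
print(encoded_2d.shape)   # (2, 2, 4)
```

The extra generality comes from integer-array indexing: each element of the index array selects one row of the identity matrix, so the result has the input's shape plus one trailing class axis.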

User · Answer

p will be a 2d ndarray. We want to know which value is the highest in each row, to put a 1 there and 0 everywhere else.

A clean and easy solution:

max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)
one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)
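To make the argmax approach concrete, here is a minimal run on a small matrix. The values in `p` are invented for illustration and are not from the original answer.

```python
import numpy as np

# Hypothetical 3x3 score matrix; each row has a single largest entry.
p = np.array([[0.1, 0.7, 0.2],
              [0.5, 0.3, 0.2],
              [0.2, 0.2, 0.6]])

# Column index of the row maximum, kept as a (3, 1) column so that it
# lines up with the rows of `one_hot`.
max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)

one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)

print(one_hot.tolist())
# [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

Note that `np.argmax` returns the first maximum when a row contains ties, so each row still ends up with exactly one 1.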

User · Answer

I recently ran into a problem of the same kind and found that the solutions above are only satisfying if your numbers follow a certain pattern. For example, if you want to one-hot encode the following list:

all_good_list = [0,1,2,3,4]

go ahead, the posted solutions already cover it. But what about this data:

problematic_list = [0,23,12,89,10]

If you do it with the methods mentioned above, you will likely end up with 90 one-hot columns. This is because all answers include something like n = np.max(a)+1. I found a more generic solution that worked for me and wanted to share it with you:

import numpy as np
import sklearn.preprocessing

sklb = sklearn.preprocessing.LabelBinarizer()
a = np.asarray([1,2,44,3,2])
n = np.unique(a)
sklb.fit(n)
b = sklb.transform(a)

I hope this comes in handy for anyone who ran into the same restriction with the solutions above.
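The same compact encoding can be sketched with numpy alone. This is an alternative to the scikit-learn version, not what that answer uses: `np.unique` with `return_inverse=True` maps each label to its position among the sorted distinct labels, and those positions then index an identity matrix.

```python
import numpy as np

a = np.asarray([1, 2, 44, 3, 2])

# classes: sorted distinct labels
# inverse: for each element of a, its position within classes
classes, inverse = np.unique(a, return_inverse=True)

# One column per distinct label instead of one per integer up to max(a).
b = np.eye(len(classes), dtype=int)[inverse]

print(classes.tolist())  # [1, 2, 3, 44]
print(b.shape)           # (5, 4)
```

Keeping `classes` around lets you map a column index back to the original label, for example `classes[b.argmax(axis=1)]`.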

User · Answer

Here is an example function that I wrote to do this based upon the answers above and my own use case:

def label_vector_to_one_hot_vector(vector, one_hot_size=10):
    """
    Use to convert a column vector to a 'one-hot' matrix

    Example:
        vector: [[2], [0], [1]]
        one_hot_size: 3
        returns:
            [[ 0.,  0.,  1.],
             [ 1.,  0.,  0.],
             [ 0.,  1.,  0.]]

    Parameters:
        vector (np.array): of size (n, 1) to be converted
        one_hot_size (int) optional: size of 'one-hot' row vector

    Returns:
        np.array size (vector.size, one_hot_size): converted to a 'one-hot' matrix
    """
    squeezed_vector = np.squeeze(vector, axis=-1)

    one_hot = np.zeros((squeezed_vector.size, one_hot_size))

    one_hot[np.arange(squeezed_vector.size), squeezed_vector] = 1

    return one_hot

label_vector_to_one_hot_vector(vector=[[2], [0], [1]], one_hot_size=3)

User · Answer

Using a Neuraxle pipeline step:

Set up your example:

import numpy as np
a = np.array([1,0,3])
b = np.array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

Do the actual conversion:

from neuraxle.steps.numpy import OneHotEncoder
encoder = OneHotEncoder(nb_columns=4)
b_pred = encoder.transform(a)

Assert it works:

assert np.array_equal(b_pred, b)

Link to documentation: neuraxle.steps.numpy.OneHotEncoder

User · Answer

This type of encoding usually operates on a numpy array. If you are using a numpy array like this:

a = np.array([1,0,3])

then there is a very simple way to convert it to a 1-hot encoding:

out = (np.arange(4) == a[:,None]).astype(np.float32)

That's it.
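The one-liner relies on broadcasting, which is easier to see when the intermediate shapes are spelled out. The variable names `classes`, `column`, and `mask` below are introduced only for this illustration.

```python
import numpy as np

a = np.array([1, 0, 3])

classes = np.arange(4)   # shape (4,):   [0, 1, 2, 3]
column = a[:, None]      # shape (3, 1): each label in its own row

# Comparing (4,) against (3, 1) broadcasts to (3, 4):
# mask[i, j] is True exactly when a[i] == j.
mask = classes == column

out = mask.astype(np.float32)
print(out.tolist())
# [[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
```

A label outside `range(4)` matches no column, so its row stays all zeros rather than raising an error.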

User · Answer

For 1-hot encoding:

one_hot_encode = pandas.get_dummies(array)

Enjoy coding.

User · Answer

I think the short answer is no. For a more generic case in n dimensions, I came up with this:

# For 2-dimensional data, 4 values
a = np.array([[0, 1, 2], [3, 2, 1]])
z = np.zeros(list(a.shape) + [4])
# The index must be a tuple; recent NumPy versions reject a plain list here
z[tuple(list(np.indices(z.shape[:-1])) + [a])] = 1

I am wondering if there is a better solution, since I don't like that I have to create those lists in the last two lines. Anyway, I did some measurements with timeit and it seems that the numpy-based (indices/arange) and the iterative versions perform about the same.
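One possible way to avoid building the index lists by hand, offered as a sketch rather than as part of the original answer, is `np.put_along_axis`. It takes the class indices with one extra trailing axis and writes along that axis.

```python
import numpy as np

# Same 2-dimensional data with 4 possible values as above.
a = np.array([[0, 1, 2], [3, 2, 1]])
n_values = 4

z = np.zeros(a.shape + (n_values,))

# a[..., None] has shape (2, 3, 1): for every position it holds the index
# along the last axis that should be set to 1.
np.put_along_axis(z, a[..., None], 1, axis=-1)

print(z.shape)                             # (2, 3, 4)
print(np.array_equal(z.argmax(-1), a))     # True
```

This works for any number of input dimensions because the index array only needs one extra trailing axis.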

User · Answer

In case you are using keras, there is a built-in utility for that:

from keras.utils.np_utils import to_categorical

categorical_labels = to_categorical(int_labels, num_classes=3)

And it does pretty much the same as @YXD's answer (see source-code).

User · Answer

You can use sklearn.preprocessing.LabelBinarizer:

Example:

import sklearn.preprocessing
a = [1,0,3]
label_binarizer = sklearn.preprocessing.LabelBinarizer()
label_binarizer.fit(range(max(a)+1))
b = label_binarizer.transform(a)
print('{0}'.format(b))

output:

[[0 1 0 0]
 [1 0 0 0]
 [0 0 0 1]]

Amongst other things, you may initialize sklearn.preprocessing.LabelBinarizer() so that the output of transform is sparse.

User · Answer

>>> values = [1, 0, 3]
>>> n_values = np.max(values) + 1
>>> np.eye(n_values)[values]
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])

User · Answer

Here's a dimensionality-independent standalone solution.

This will convert any N-dimensional array arr of nonnegative integers to a one-hot N+1-dimensional array one_hot, where one_hot[i_1,...,i_N,c] = 1 means arr[i_1,...,i_N] = c. You can recover the input via np.argmax(one_hot, -1).

def expand_integer_grid(arr, n_classes):
    """

    :param arr: N dim array of size i_1, ..., i_N
    :param n_classes: C
    :returns: one-hot N+1 dim array of size i_1, ..., i_N, C
    :rtype: ndarray

    """
    one_hot = np.zeros(arr.shape + (n_classes,))
    axes_ranges = [range(arr.shape[i]) for i in range(arr.ndim)]
    flat_grids = [_.ravel() for _ in np.meshgrid(*axes_ranges, indexing='ij')]
    # Index with a tuple; recent NumPy versions reject a plain list here
    one_hot[tuple(flat_grids + [arr.ravel()])] = 1
    assert((one_hot.sum(-1) == 1).all())
    assert(np.allclose(np.argmax(one_hot, -1), arr))
    return one_hot

User · Answer

Use the following code. It works best.

def one_hot_encode(x):
    """
        argument
            - x: a list of labels
        return
            - one hot encoding matrix (number of labels, number of class)
    """
    encoded = np.zeros((len(x), 10))

    for idx, val in enumerate(x):
        encoded[idx][val] = 1

    return encoded

Found it here. P.S. You don't need to go into the link.

User · Answer

Here is a function that converts a 1-D vector to a 2-D one-hot array.

#!/usr/bin/env python
import numpy as np

def convertToOneHot(vector, num_classes=None):
    """
    Converts an input 1-D vector of integers into an output
    2-D array of one-hot vectors, where an i'th input value
    of j will set a '1' in the i'th row, j'th column of the
    output array.

    Example:
        v = np.array((1, 0, 4))
        one_hot_v = convertToOneHot(v)
        print(one_hot_v)

        [[0 1 0 0 0]
         [1 0 0 0 0]
         [0 0 0 0 1]]
    """

    assert isinstance(vector, np.ndarray)
    assert len(vector) > 0

    if num_classes is None:
        num_classes = np.max(vector)+1
    else:
        assert num_classes > 0
        # Labels are zero-based, so the largest label must be below num_classes
        assert num_classes > np.max(vector)

    result = np.zeros(shape=(len(vector), num_classes))
    result[np.arange(len(vector)), vector] = 1
    return result.astype(int)

Below is some example usage:

>>> a = np.array([1, 0, 3])

>>> convertToOneHot(a)
array([[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1]])

>>> convertToOneHot(a, num_classes=10)
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])

User · Answer

If using tensorflow, there is one_hot():

import tensorflow as tf
import numpy as np

a = np.array([1, 0, 3])
depth = 4
b = tf.one_hot(a, depth)
# <tf.Tensor: shape=(3, 4), dtype=float32, numpy=
# array([[0., 1., 0., 0.],
#        [1., 0., 0., 0.],
#        [0., 0., 0., 1.]], dtype=float32)>

User · Answer

You can also use the eye function of numpy:

numpy.eye(number of classes)[vector containing the labels]

User · Answer

I am adding, for completeness, a simple function using only numpy operators:

    def probs_to_onehot(output_probabilities):
        argmax_indices_array = np.argmax(output_probabilities, axis=1)
        # One column per class, taken from the width of the probability matrix
        onehot_output_array = np.eye(output_probabilities.shape[1])[argmax_indices_array.reshape(-1)]
        return onehot_output_array

It takes as input a probability matrix, e.g.:

[[0.03038822 0.65810204 0.16549407 0.3797123 ] ... [0.02771272 0.2760752  0.3280924  0.33458805]]

And it will return:

[[0 1 0 0] ... [0 0 0 1]]

User · Answer

Here is what I find useful:

def one_hot(a, num_classes):
  return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

Here num_classes stands for the number of classes you have. So if you have a vector a with shape (10000,), this function transforms it to (10000, C). Note that a is zero-indexed, i.e. one_hot(np.array([0, 1]), 2) will give [[1, 0], [0, 1]].

Exactly what you wanted to have, I believe.

PS: the source is Sequence models - deeplearning.ai

User · Answer

You can use the following code to convert into a one-hot vector.

Let x be the normal class vector, a single column with classes 0 to some number:

import numpy as np
np.eye(x.max()+1)[x]

If 0 is not a class (labels start at 1), remove the +1 and index with x-1 instead.
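For labels that start at 1, a minimal sketch could look like the following. It assumes the classes are exactly 1..k with no class 0, and the sample vector `x` is invented for illustration. The labels are shifted down by one before indexing so that class 1 lands in column 0.

```python
import numpy as np

# Hypothetical labels 1..3 (assumption: there is no class 0).
x = np.array([1, 3, 2, 3])

# x.max() columns are enough; x - 1 converts labels to zero-based column indices.
one_hot = np.eye(x.max(), dtype=int)[x - 1]

print(one_hot.tolist())
# [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
```

Dropping the +1 without shifting the labels would make the largest label index one past the last row of the identity matrix and raise an IndexError.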

User · Answer

Your array a defines the columns of the nonzero elements in the output array. You need to also define the rows and then use fancy indexing:

>>> a = np.array([1, 0, 3])
>>> b = np.zeros((a.size, a.max()+1))
>>> b[np.arange(a.size),a] = 1
>>> b
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
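To see what the fancy-indexing assignment touches, the sketch below names the row and column index arrays explicitly and lists the (row, column) pairs that receive a 1. The names `rows`, `cols`, and `pairs` are introduced only for this illustration.

```python
import numpy as np

a = np.array([1, 0, 3])

rows = np.arange(a.size)   # [0, 1, 2]: one row per input element
cols = a                   # [1, 0, 3]: the column to switch on in each row

b = np.zeros((a.size, a.max() + 1), dtype=int)

# The two index arrays are paired element-wise, so this sets
# b[0, 1], b[1, 0] and b[2, 3].
b[rows, cols] = 1

pairs = list(zip(rows.tolist(), cols.tolist()))
print(pairs)        # [(0, 1), (1, 0), (2, 3)]
print(b.tolist())   # [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
```

Passing `dtype=int` to `np.zeros` is optional and only changes the element type of the result; the default is float.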
