Convert categorical data in pandas dataframe

Question

I have a dataframe with these column types (there are too many columns to list them all):

col1        int64
col2        int64
col3        category
col4        category
col5        category

A column looks like this:

Name: col3, dtype: category
Categories (8, object): [B, C, E, G, H, N, S, W]

I want to convert all values in these columns to integers, like this:

[1, 2, 3, 4, 5, 6, 7, 8]

I solved this for one column with:

dataframe['c'] = pandas.Categorical.from_array(dataframe.col3).codes

Now I have two columns in my dataframe, the old col3 and the new c, and I need to drop the old column.

That's bad practice. It works, but my dataframe has many columns and I don't want to do this manually.

How can I do this in a clean, pythonic way?

User · Answer

If your only concern is that you are creating an extra column and deleting it later, just don't create a new column in the first place.

dataframe = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':list('abcab'),  'col3':list('ababb')})
dataframe.col3 = pd.Categorical.from_array(dataframe.col3).codes

You are done. Since Categorical.from_array is deprecated (and has been removed from later pandas releases), use Categorical directly:

dataframe.col3 = pd.Categorical(dataframe.col3).codes

If you also need the mapping back from index to label, there is an even better way:

dataframe.col3, mapping_index = pd.Series(dataframe.col3).factorize()

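Check the result below:

print(dataframe)
# col3 only holds 'a' and 'b'; looking up a label that is absent (such as "c") raises KeyError
print(mapping_index.get_loc("b"))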
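As a rough, self-contained sketch of the factorize round trip described above (the variable names codes, labels and restored are only illustrative), the codes can be looked up by label and mapped back to the original labels like this:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': list('abcab'),
                   'col3': list('ababb')})

# factorize() returns integer codes plus an Index of the unique labels,
# in order of first appearance.
codes, labels = pd.factorize(df['col3'])
df['col3'] = codes

# Look up the code that was assigned to a given label.
code_of_b = labels.get_loc('b')

# Map the codes back to the original labels by indexing into the Index.
restored = labels[df['col3'].to_numpy()].tolist()

print(df)
print(code_of_b)
print(restored)
```

Keeping labels around is what makes the conversion reversible; once the column is overwritten, the codes alone no longer say which label they stood for.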

User · Answer

You can do it with less code, like below:

f = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':list('abcab'), 'col3':list('ababb')})

f['col1'] = f['col1'].astype('category').cat.codes
f['col2'] = f['col2'].astype('category').cat.codes
f['col3'] = f['col3'].astype('category').cat.codes

f

User · Answer

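@Quickbeam2k1, see below:

import sys
import numpy as np
import pandas as pd

dataset = pd.read_csv('Data2.csv')
np.set_printoptions(threshold=sys.maxsize)  # print full arrays; recent NumPy rejects threshold=np.nan
X = dataset.iloc[:, :].values

Using sklearn:

from sklearn.preprocessing import LabelEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])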

User · Answer

First, to convert a Categorical column to its numerical codes, you can do this more easily with dataframe['c'].cat.codes.

Further, it is possible to automatically select all columns with a certain dtype in a dataframe using select_dtypes. This way, you can apply the above operation to multiple, automatically selected columns.

First, making an example dataframe:

In [75]: df = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':list('abcab'),  'col3':list('ababb')})

In [76]: df['col2'] = df['col2'].astype('category')

In [77]: df['col3'] = df['col3'].astype('category')

In [78]: df.dtypes
Out[78]:
col1       int64
col2    category
col3    category
dtype: object

Then, by using select_dtypes to select the columns and applying .cat.codes to each of them, you get the following result:

In [80]: cat_columns = df.select_dtypes(['category']).columns

In [81]: cat_columns
Out[81]: Index([u'col2', u'col3'], dtype='object')

In [83]: df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes)

In [84]: df
Out[84]:
   col1  col2  col3
0     1     0     0
1     2     1     1
2     3     2     0
3     4     0     1
4     5     1     1
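A possible extension of the select_dtypes approach, sketched under the assumption that you may later want to decode the integers: record each column's categories before overwriting it. The name category_maps is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': list('abcab'),
                   'col3': list('ababb')})
df['col2'] = df['col2'].astype('category')
df['col3'] = df['col3'].astype('category')

# Pick up every column whose dtype is category.
cat_columns = df.select_dtypes(['category']).columns

# Remember code -> label for each column before the labels are replaced.
category_maps = {col: dict(enumerate(df[col].cat.categories))
                 for col in cat_columns}

# Replace each categorical column by its integer codes.
df[cat_columns] = df[cat_columns].apply(lambda s: s.cat.codes)

print(df)
print(category_maps)
```

Note that .cat.codes starts at 0 and uses -1 for missing values, so add 1 if you want codes starting at 1 as in the question.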

User · Answer

For a certain column, if you don't care about the ordering, use this:

df['col1_num'] = df['col1'].apply(lambda x: np.where(df['col1'].unique()==x)[0][0])

If you care about the ordering, specify the values as a list and use this:

df['col1_num'] = df['col1'].apply(lambda x: ['first', 'second', 'third'].index(x))
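An alternative sketch for the ordered case, swapping the per-row list lookup for pd.Categorical with an explicit categories list. The sample values and the name order are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['second', 'first', 'third', 'first']})

# The position of each label in this list becomes its integer code.
order = ['first', 'second', 'third']
df['col1_num'] = pd.Categorical(df['col1'], categories=order, ordered=True).codes

print(df)
```

Values that do not appear in order get the code -1 instead of raising an error, which is a difference from list.index.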

User · Answer

This works for me:

pandas.factorize(['B', 'C', 'D', 'B'])[0]

Output:

[0, 1, 2, 0]

User · Answer

One of the simplest ways to convert a categorical variable into dummy/indicator variables is to use get_dummies, provided by pandas. Say, for example, we have data in which sex is a categorical value (male and female) and you need to convert it into dummy/indicator columns; here is how to do it:

training_data = pd.read_csv("../titanic/train.csv")
features = ["Age", "Sex"]  # here Sex is a categorical value
X_train = pd.get_dummies(training_data[features])
print(X_train)

Age  Sex_female  Sex_male
20   0           1
33   1           0
40   1           0
22   1           0
54   0           1
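For comparison, here is a small sketch on made-up data (not the Titanic file) showing that get_dummies yields one indicator column per category, whereas .cat.codes yields a single integer column, which is closer to what the question asks for. Passing dtype=int is an assumption made here to get 0/1 values rather than booleans.

```python
import pandas as pd

# Hypothetical sample rows, only for illustration.
data = pd.DataFrame({'Age': [20, 33, 40],
                     'Sex': ['male', 'female', 'female']})

# One-hot encoding: one 0/1 column per category of Sex.
dummies = pd.get_dummies(data[['Age', 'Sex']], dtype=int)

# Integer codes: a single column, categories numbered in sorted order.
codes = data['Sex'].astype('category').cat.codes

print(dummies)
print(codes.tolist())
```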

User · Answer

What I do is replace the values, like this:

df['col'].replace(to_replace=['category_1', 'category_2', 'category_3'], value=[1, 2, 3], inplace=True)

In this way, if the col column has categorical values, they get replaced by the numerical values.
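A variant of the same idea, sketched with Series.map and an explicit dictionary instead of an in-place replace. The labels and the name mapping are placeholders; substitute your own categories.

```python
import pandas as pd

df = pd.DataFrame({'col': ['category_1', 'category_3', 'category_2', 'category_1']})

# Explicit label -> number mapping (placeholder labels).
mapping = {'category_1': 1, 'category_2': 2, 'category_3': 3}

# map() returns a new Series, which is assigned back to the column.
df['col'] = df['col'].map(mapping)

print(df)
```

Labels missing from the dictionary become NaN with map, which makes forgotten categories easy to spot.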

User · Answer

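Answers here seem outdated. Pandas now has a factorize() function, and you can create categories as:

df.col.factorize()

Function signature (as documented at the time; newer pandas releases replace na_sentinel with use_na_sentinel):

pandas.factorize(values, sort=False, na_sentinel=-1, size_hint=None)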
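A short sketch of how factorize behaves with missing values and with sort=True, using made-up data. By default, missing values receive the code -1 and are left out of the uniques.

```python
import numpy as np
import pandas as pd

s = pd.Series(['B', 'C', np.nan, 'B', 'D'])

# Codes follow order of first appearance; NaN gets the sentinel -1.
codes, uniques = s.factorize()

# With sort=True the uniques are sorted first, so codes follow sorted order.
sorted_codes, sorted_uniques = pd.factorize(pd.Series(['C', 'B', 'C']), sort=True)

print(codes, list(uniques))
print(sorted_codes, list(sorted_uniques))
```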

User · Answer

For converting categorical data in column C of dataset data, we need to do the following:

from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()  # initializing an object of class LabelEncoder
data['C'] = labelencoder.fit_transform(data['C'])  # fitting and transforming the desired categorical column

User · Answer

Here multiple columns need to be converted, so one approach I used is:

for col_name in df.columns:
    if(df[col_name].dtype == 'object'):
        df[col_name]= df[col_name].astype('category')
        df[col_name] = df[col_name].cat.codes

This converts all string/object columns to categorical, then replaces each column with its category codes.
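The loop above only matches columns with dtype object, so columns that are already category (as in the question) would be skipped. A minimal sketch that covers both cases and leaves the input untouched might look like this; encode_categoricals is a hypothetical helper name, not a pandas function.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def encode_categoricals(df):
    """Return a copy where string/object/category columns hold integer codes."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if (isinstance(dtype, pd.CategoricalDtype)
                or is_object_dtype(dtype)
                or is_string_dtype(dtype)):
            # astype('category') is a no-op for columns that already are one.
            out[col] = out[col].astype('category').cat.codes
    return out


df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': list('abcab'),
                   'col3': pd.Categorical(list('ababb'))})
encoded = encode_categoricals(df)
print(encoded)
```

Working on a copy keeps the original labels available in df in case the codes need to be interpreted later.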
