You may not pass str
to fit this kind of classifier.
For example, if you have a feature column named 'grade' which has 3 different grades:
A,B and C.
you have to transfer those str
"A","B","C" to matrix by encoder like the following:
A = [1,0,0]
B = [0,1,0]
C = [0,0,1]
because the str
does not have numerical meaning for the classifier.
In scikit-learn, OneHotEncoder
and LabelEncoder
are available in inpreprocessing
module.
However OneHotEncoder
does not support to fit_transform()
of string.
"ValueError: could not convert string to float" may happen during transform.
You may use LabelEncoder
to transfer from str
to continuous numerical values. Then you are able to transfer by OneHotEncoder
as you wish.
In the Pandas dataframe, I have to encode all the data which are categorized to dtype:object
. The following code works for me and I hope this will help you.
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
for column_name in train_data.columns:
if train_data[column_name].dtype == object:
train_data[column_name] = le.fit_transform(train_data[column_name])
else:
pass