I need to fit RandomForestRegressor from sklearn.ensemble.
forest = ensemble.RandomForestRegressor(**RF_tuned_parameters)
model = forest.fit(train_fold, train_y)
yhat = model.predict(test_fold)
This code always worked until I added some preprocessing of the data (train_y).
The error message says:
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
model = forest.fit(train_fold, train_y)
Previously train_y was a Series; now it is a NumPy array (a column vector). If I apply train_y.ravel(), it becomes a 1-D array of shape (n_samples,) and the warning no longer appears, though the prediction step takes a very long time (actually, it never finishes...).
In the docs of RandomForestRegressor I found that train_y should be defined as y : array-like, shape = [n_samples] or [n_samples, n_outputs]
Any idea how to solve this issue?
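For reference, the warning can be reproduced on synthetic data. This is only a minimal sketch: the arrays X and y_column and the helper fit_and_collect are made up for illustration and stand in for train_fold and train_y.

```python
import warnings

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import DataConversionWarning

# Synthetic stand-ins for train_fold and train_y (assumed shapes)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y_column = rng.normal(size=(50, 1))  # column vector, shape (50, 1)


def fit_and_collect(y):
    """Fit a small forest and return any DataConversionWarning raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y)
    return [w for w in caught if issubclass(w.category, DataConversionWarning)]


column_warnings = fit_and_collect(y_column)        # y has shape (50, 1)
flat_warnings = fit_and_collect(y_column.ravel())  # y has shape (50,)
print(len(column_warnings), len(flat_warnings))
```

The column-shaped target should trigger the warning, while the flattened one should not.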
This question is related to: python, pandas, numpy, scikit-learn
Change this line:
model = forest.fit(train_fold, train_y)
to:
model = forest.fit(train_fold, train_y.values.ravel())
Edit:
.values gives the values as a NumPy array (shape: (n, 1)).
.ravel() converts that array to shape (n,).
Another way of doing this is to use reshape:
model = forest.fit(train_fold, train_y.values.reshape(-1,))
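To make the shape change concrete, here is a small sketch. The DataFrame below is a hypothetical single-column target standing in for train_y, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column target, standing in for train_y
train_y = pd.DataFrame({"target": [1.5, 2.0, 3.5, 4.0]})

column = train_y.values            # 2-D array, shape (4, 1)
flat_ravel = column.ravel()        # 1-D array, shape (4,)
flat_reshape = column.reshape(-1)  # same result via reshape

print(column.shape, flat_ravel.shape, flat_reshape.shape)
```

Both calls yield a 1-D array with the same values in the same order, so either can be passed as y to fit().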
Y = y.values[:,0]
Here Y is the formatted train_y and y is the original train_y.
I had the same problem. The labels were in a column (2-D) format, while a 1-D array was expected.
Use np.ravel():
knn.score(training_set, np.ravel(training_labels))
Hope this solves it.
# Take the first (only) element of each row to build a flat list of labels
format_train_y = []
for n in train_y:
    format_train_y.append(n[0])
Use the code below:
model = forest.fit(train_fold, train_y.ravel())
If you are still getting an error like the one below:
Unknown label type: %r" % y
then use this code:
y = train_y.ravel()
train_y = np.array(y).astype(int)
model = forest.fit(train_fold, train_y)
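As far as I can tell, the "Unknown label type" error comes from scikit-learn classifiers that receive continuous float labels, not from regressors. The sketch below illustrates this with RandomForestClassifier on made-up data (X and y_float are assumptions for illustration). Note that astype(int) truncates the fractional part, so it only makes sense when the labels really are discrete classes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up data: 6 samples, 2 features, float labels in a column vector
X = np.arange(12).reshape(6, 2)
y_float = np.array([[0.3], [1.7], [0.2], [1.9], [0.4], [1.6]])

clf = RandomForestClassifier(n_estimators=5, random_state=0)

# Continuous float labels are rejected by the classifier
error_message = None
try:
    clf.fit(X, y_float.ravel())
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Casting to int turns the labels into discrete classes (0 and 1 here)
y_int = y_float.ravel().astype(int)
clf.fit(X, y_int)
predictions = clf.predict(X)
```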
With Neuraxle, you can easily solve this:
p = Pipeline([
    # expected outputs shape: (n, 1)
    OutputTransformerWrapper(NumpyRavel()),
    # expected outputs shape: (n, )
    RandomForestRegressor(**RF_tuned_parameters)
])
p, outputs = p.fit_transform(data_inputs, expected_outputs)
Neuraxle is a sklearn-like framework for hyperparameter tuning and AutoML in deep learning projects.
I also encountered this when I was trying to train a KNN classifier. The warning went away after I changed:
knn.fit(X_train,y_train)
to
knn.fit(X_train, np.ravel(y_train,order='C'))
Ahead of this line I used import numpy as np.
Source: Stackoverflow.com