Accuracy Score ValueError: Can't Handle mix of binary and continuous target

Question

I'm using linear_model.LinearRegression from scikit-learn as a predictive model. It works and it's perfect. My problem is evaluating the predicted results using the accuracy_score metric.

This is my true data:

    array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])

My predicted data:

    array([ 0.07094605,  0.1994941 ,  0.19270157,  0.13379635,  0.04654469,
            0.09212494,  0.19952108,  0.12884365,  0.15685076, -0.01274453,
            0.32167554,  0.32167554, -0.10023553,  0.09819648, -0.06755516,
            0.25390082,  0.17248324])

My code:

    from sklearn.metrics import accuracy_score
    accuracy_score(y_true, y_pred, normalize=False)

Error message:

    ValueError: Can't handle mix of binary and continuous target

User · Answer

The sklearn.metrics.accuracy_score(y_true, y_pred) method defines y_pred as:

    y_pred : 1d array-like, or label indicator array / sparse matrix.
    Predicted labels, as returned by a classifier.

This means y_pred has to be an array of 1's or 0's (predicted labels). They should not be probabilities.

The predicted labels (1's and 0's) and the predicted probabilities can be generated with the predict() and predict_proba() methods of a classifier such as LogisticRegression. Note that LinearRegression has no predict_proba() method, and its predict() returns continuous values rather than labels.

1. Generate predicted labels:

    LR = linear_model.LogisticRegression()
    LR.fit(X_train, y_train)  # assumes training data named X_train, y_train
    y_pred = LR.predict(X_test)
    print(y_pred)

output:

    [1 1 0 1]

y_pred can now be used for the accuracy_score() method: accuracy_score(y_true, y_pred)

2. Generate probabilities for labels:

Some metrics, such as precision_recall_curve(y_true, probas_pred), require probabilities, which can be generated as follows:

    LR = linear_model.LogisticRegression()
    LR.fit(X_train, y_train)  # assumes training data named X_train, y_train
    y_probas = LR.predict_proba(X_test)[:, 1]  # probability of the positive class
    print(y_probas)

output:

    [0.87812372 0.77490434 0.30319547 0.84999743]
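The two steps above can be sketched end to end as follows. This is a minimal illustration, not the answerer's code: it uses LogisticRegression because LinearRegression does not provide predict_proba(), and the synthetic data from make_classification and all variable names are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# Step 1: hard labels (0 or 1) are what accuracy_score expects.
y_pred = clf.predict(X_test)
n_correct = accuracy_score(y_test, y_pred, normalize=False)

# Step 2: predict_proba returns one column per class;
# column 1 holds the estimated probability of class 1.
y_proba = clf.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, y_proba)

print("correctly classified samples:", n_correct, "of", len(y_test))
```

With normalize=False, accuracy_score returns the number of correctly classified samples rather than the fraction.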

User · Answer

accuracy_score is a classification metric; you cannot use it for a regression problem. You can see the available regression metrics in the scikit-learn documentation.
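As a rough sketch of what using a regression metric looks like, the snippet below scores the arrays from the question with three standard functions from sklearn.metrics. Which metric is appropriate depends on the problem; the choice of these three is only an illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Arrays copied from the question.
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([
    0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
    0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
    0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
    0.25390082, 0.17248324,
])

# Regression metrics accept continuous predictions without complaint.
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

print(f"MSE: {mse:.4f}  MAE: {mae:.4f}  R^2: {r2:.4f}")
```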

User · Answer

The error is caused by a difference in the data types of y_pred and y_true: y_true might be a DataFrame while y_pred is an array or list. If you convert both to arrays, the issue will be resolved.

User · Answer

Despite the plethora of wrong answers here that attempt to circumvent the error by numerically manipulating the predictions, the root cause of your error is a theoretical and not a computational issue: you are trying to use a classification metric (accuracy) in a regression (i.e. numeric prediction) model (LinearRegression), which is meaningless.

Just like the majority of performance metrics, accuracy compares apples to apples (i.e. true labels of 0/1 with predictions again of 0/1); so, when you ask the function to compare binary true labels (apples) with continuous predictions (oranges), you get an expected error, where the message tells you exactly what the problem is from a computational point of view:

    Classification metrics can't handle a mix of binary and continuous target

Although the message doesn't tell you directly that you are trying to compute a metric that is invalid for your problem (and we shouldn't actually expect it to go that far), it is certainly a good thing that scikit-learn at least gives you a direct and explicit warning that you are attempting something wrong. This is not necessarily the case with other frameworks: see for example the behavior of Keras in a very similar situation, where you get no warning at all, and one just ends up complaining about low "accuracy" in a regression setting.

I am super-surprised by all the other answers here (including the accepted and highly upvoted one) effectively suggesting to manipulate the predictions in order to simply get rid of the error. It's true that, once we end up with a set of numbers, we can certainly start tinkering with them in various ways (rounding, thresholding etc.) in order to make our code behave, but this of course does not mean that our numeric manipulations are meaningful in the specific context of the ML problem we are trying to solve.

So, to wrap up: the problem is that you are applying a metric (accuracy) that is inappropriate for your model (LinearRegression). If you are in a classification setting, you should change your model (e.g. use LogisticRegression instead); if you are in a regression (i.e. numeric prediction) setting, you should change the metric. Check the list of metrics available in scikit-learn, where you can confirm that accuracy is used only in classification.

Compare also the situation with a recent SO question, where the OP is trying to get the accuracy of a list of models:

    models = []
    models.append(('SVM', svm.SVC()))
    models.append(('LR', LogisticRegression()))
    models.append(('LDA', LinearDiscriminantAnalysis()))
    models.append(('KNN', KNeighborsClassifier()))
    models.append(('CART', DecisionTreeClassifier()))
    models.append(('NB', GaussianNB()))
    #models.append(('SGDRegressor', linear_model.SGDRegressor())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
    #models.append(('BayesianRidge', linear_model.BayesianRidge())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
    #models.append(('LassoLars', linear_model.LassoLars())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
    #models.append(('ARDRegression', linear_model.ARDRegression())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
    #models.append(('PassiveAggressiveRegressor', linear_model.PassiveAggressiveRegressor())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
    #models.append(('TheilSenRegressor', linear_model.TheilSenRegressor())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
    #models.append(('LinearRegression', linear_model.LinearRegression())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets

where the first 6 models work OK, while all the rest (commented-out) ones give the same error. By now, you should be able to convince yourself that all the commented-out models are regression (and not classification) ones, hence the justified error.

A last important note: it may sound legitimate for someone to claim:

    "OK, but I want to use linear regression and then just round/threshold the outputs, effectively treating the predictions as "probabilities" and thus converting the model into a classifier."

Actually, this has already been suggested in several other answers here, implicitly or not. Again, this is an invalid approach, and the fact that you have negative predictions should have already alerted you that they cannot be interpreted as probabilities.

Andrew Ng, in his popular Machine Learning course at Coursera, explains why this is a bad idea: see his Lecture 6.1 - Logistic Regression | Classification at YouTube (explanation starts at ~3:00), as well as section 4.2 Why Not Linear Regression [for classification]? of the (highly recommended and freely available) textbook An Introduction to Statistical Learning by Hastie, Tibshirani and coworkers.
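One way to check which kind of estimator you are holding before choosing a metric is sketched below with is_classifier and is_regressor from sklearn.base. The shortened candidate list and the variable names are assumptions made for the example; they are not taken from the SO question quoted above.

```python
from sklearn.base import is_classifier, is_regressor
from sklearn.linear_model import LinearRegression, LogisticRegression, SGDRegressor
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# A shortened, illustrative list mixing classifiers and regressors.
candidates = [
    ("SVM", SVC()),
    ("LR", LogisticRegression()),
    ("NB", GaussianNB()),
    ("SGDRegressor", SGDRegressor()),
    ("LinearRegression", LinearRegression()),
]

# Accuracy only makes sense for the classifiers; the regressors
# need a regression metric such as mean squared error.
classifiers = [name for name, model in candidates if is_classifier(model)]
regressors = [name for name, model in candidates if is_regressor(model)]

print("accuracy applies to:", classifiers)
print("use regression metrics for:", regressors)
```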

User · Answer

The problem is that the true y is binary (zeros and ones), while your predictions are not. You probably generated probabilities and not predictions, hence the result. Try instead to generate class membership, and it should work.
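The words "binary" and "continuous" in the error message are the target types scikit-learn infers from each array. A small sketch with type_of_target from sklearn.utils.multiclass shows how different inputs are categorized; the sample values are made up for illustration, and the assumption that accuracy_score relies on this same inference internally is mine, not something stated in this thread.

```python
from sklearn.utils.multiclass import type_of_target

y_true = [1, 1, 0, 0, 1]
y_scores = [0.07, 0.32, -0.10, 0.25, 0.17]    # real-valued model outputs
y_labels = [0, 0, 0, 1, 0]                    # hard class labels
y_float_labels = [1.0, 0.0, 0.0, 1.0, 0.0]    # labels stored as floats

print(type_of_target(y_true))          # binary
print(type_of_target(y_scores))        # continuous
print(type_of_target(y_labels))        # binary
print(type_of_target(y_float_labels))  # binary
```

Floats that hold whole-number labels are still reported as binary here, so it is the non-integer values, not the float dtype alone, that produce the "continuous" type.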

User · Answer

I was facing the same issue. The dtypes of y_test and y_pred were different. Make sure that the dtypes are the same for both.

User · Answer

EDIT (after comment): the code below will solve the coding issue, but this approach is strongly discouraged, because a linear regression model is a very poor classifier and will very likely not separate the classes correctly.

Read the well-written answer by @desertnaut, which explains why this error is a hint that something is wrong in the machine learning approach rather than something you have to "fix".

    accuracy_score(y_true, y_pred.round(), normalize=False)

User · Answer

Just use:

    y_pred = (y_pred > 0.5)
    accuracy_score(y_true, y_pred, normalize=False)
