LogisticRegression: Unknown label type: 'continuous' using sklearn in python

Question

I have the following code to test some of the most popular ML algorithms of the sklearn Python library:

    import numpy as np
    from sklearn import metrics, svm
    from sklearn.linear_model import LinearRegression
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
    trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
    predictionData = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

    clf = LinearRegression()
    clf.fit(trainingData, trainingScores)
    print("LinearRegression")
    print(clf.predict(predictionData))

    clf = svm.SVR()
    clf.fit(trainingData, trainingScores)
    print("SVR")
    print(clf.predict(predictionData))

    clf = LogisticRegression()
    clf.fit(trainingData, trainingScores)
    print("LogisticRegression")
    print(clf.predict(predictionData))

    clf = DecisionTreeClassifier()
    clf.fit(trainingData, trainingScores)
    print("DecisionTreeClassifier")
    print(clf.predict(predictionData))

    clf = KNeighborsClassifier()
    clf.fit(trainingData, trainingScores)
    print("KNeighborsClassifier")
    print(clf.predict(predictionData))

    clf = LinearDiscriminantAnalysis()
    clf.fit(trainingData, trainingScores)
    print("LinearDiscriminantAnalysis")
    print(clf.predict(predictionData))

    clf = GaussianNB()
    clf.fit(trainingData, trainingScores)
    print("GaussianNB")
    print(clf.predict(predictionData))

    clf = SVC()
    clf.fit(trainingData, trainingScores)
    print("SVC")
    print(clf.predict(predictionData))

The first two work fine, but I get the following error at the LogisticRegression call:

    root@ubupc1:/home/ouhma# python stack.py
    LinearRegression
    [ 15.72023529   6.46666667]
    SVR
    [ 3.95570063  4.23426243]
    Traceback (most recent call last):
      File "stack.py", line 28, in <module>
        clf.fit(trainingData, trainingScores)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
        check_classification_targets(y)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
        raise ValueError("Unknown label type: %r" % y_type)
    ValueError: Unknown label type: 'continuous'

The input data is the same as in the previous calls, so what is going on here?

And by the way, why is there such a huge difference in the first prediction of the LinearRegression() and SVR() algorithms (15.72 vs 3.95)?

User · Answer

I struggled with the same issue when trying to feed floats to the classifiers. I wanted to keep floats rather than integers for accuracy. Try using regressor algorithms instead. For example:

    import numpy as np
    from sklearn import linear_model
    from sklearn import svm

    # Despite the variable name, every estimator in this list is a regressor,
    # so each one accepts continuous (float) targets.
    classifiers = [
        svm.SVR(),
        linear_model.SGDRegressor(),
        linear_model.BayesianRidge(),
        linear_model.LassoLars(),
        linear_model.ARDRegression(),
        linear_model.PassiveAggressiveRegressor(),
        linear_model.TheilSenRegressor(),
        linear_model.LinearRegression()]

    trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
    trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
    predictionData = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

    # Fit each regressor on the same data and print its predictions
    for item in classifiers:
        print(item)
        clf = item
        clf.fit(trainingData, trainingScores)
        print(clf.predict(predictionData), '\n')
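On the side question about why LinearRegression and SVR disagree so much, one thing worth checking is how each model fits the four training points themselves. The sketch below is only an illustration using the data from the question. It relies on the fact that a linear model with three coefficients plus an intercept has as many free parameters as there are samples here, so it can pass through every training point exactly and then extrapolate freely. SVR is used with its default settings, which is an assumption about what the question ran.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])

# Fit both regressors on the same four samples (default hyperparameters)
linear = LinearRegression().fit(trainingData, trainingScores)
svr = SVR().fit(trainingData, trainingScores)

# Residuals on the training set: how closely does each model reproduce its own targets?
linear_residuals = trainingScores - linear.predict(trainingData)
svr_residuals = trainingScores - svr.predict(trainingData)

print("LinearRegression training residuals:", linear_residuals)
print("SVR training residuals:", svr_residuals)
```

If the linear residuals are essentially zero while the SVR residuals are not, the linear model has memorised the four points and the regularised SVR has not, which is one plausible reason the two predictions differ. With so few samples neither number should be trusted much.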

User · Answer

LogisticRegression is not for regression but for classification!

The Y variable must be the classification class (for example 0 or 1), and not a continuous variable; that would be a regression problem.
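To make the distinction concrete, here is a minimal sketch using the data from the question. The cut-off of 4.0 that turns scores into "low" (0) and "high" (1) classes is purely hypothetical and chosen only for illustration; whether any such binning makes sense depends on what the scores mean.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
predictionData = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Continuous targets are rejected by the classifier
error_message = None
try:
    LogisticRegression().fit(trainingData, trainingScores)
except ValueError as err:
    error_message = str(err)
print(error_message)

# Hypothetical binning rule (an assumption, not from the question):
# scores >= 4.0 become class 1, everything else class 0
threshold = 4.0
trainingLabels = (trainingScores >= threshold).astype(int)

# With discrete class labels the same classifier fits without complaint
clf = LogisticRegression()
clf.fit(trainingData, trainingLabels)
predictedLabels = clf.predict(predictionData)
print(predictedLabels)
```

The predictions are now class labels (0 or 1), not scores. If the actual goal is to predict a score, a regressor is the right tool.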

User · Answer

You are passing floats to a classifier, which expects categorical values as the target vector. If you convert the target to int it will be accepted as input (although it is questionable whether that is the right way to do it).

It would be better to convert your training scores using scikit-learn's LabelEncoder class.

The same is true for your DecisionTree and KNeighbors classifiers.

    import numpy as np
    from sklearn import preprocessing
    from sklearn.utils.multiclass import type_of_target

    trainingScores = np.array([3.4, 7.5, 4.5, 1.6])

    # Map each distinct score to an integer class label
    lab_enc = preprocessing.LabelEncoder()
    encoded = lab_enc.fit_transform(trainingScores)
    # array([1, 3, 2, 0], dtype=int64)

    print(type_of_target(trainingScores))
    # continuous

    print(type_of_target(trainingScores.astype('int')))
    # multiclass

    print(type_of_target(encoded))
    # multiclass
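As a follow-up sketch, the encoded labels can be passed to one of the classifiers from the question and the predictions mapped back to the original scores with `inverse_transform`. This reuses the question's data. The choice of DecisionTreeClassifier and the `random_state=0` setting are assumptions made only to keep the example small and repeatable.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier

trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
predictionData = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Every distinct score becomes its own class label
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)

# The classifier now sees integer classes instead of continuous values
clf = DecisionTreeClassifier(random_state=0)
clf.fit(trainingData, encoded)
predicted_classes = clf.predict(predictionData)

# Translate predicted class labels back into the original score values
predicted_scores = lab_enc.inverse_transform(predicted_classes)
print(predicted_classes)
print(predicted_scores)
```

Note the limitation of this approach: the classifier can only ever return one of the scores it saw during training, and it treats the classes as unordered, so it does not know that 4.5 lies between 3.4 and 7.5.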
