sklearn: Found arrays with inconsistent numbers of samples when calling LinearRegression.fit()

Question

Just trying to do a simple linear regression, but I'm baffled by this error for:

    regr = LinearRegression()
    regr.fit(df2.iloc[1:1000, 5].values, df2.iloc[1:1000, 2].values)

which produces:

    ValueError: Found arrays with inconsistent numbers of samples: [  1 999]

These selections must have the same dimensions, and they should be numpy arrays, so what am I missing?

User · Answer

To analyze two arrays (array1 and array2), they need to meet the following two requirements:

1) They need to be a numpy.ndarray. Check with type(array1) and type(array2). If that is not the case for at least one of them, convert it with array1 = numpy.asarray(array1) or array2 = numpy.asarray(array2). (Calling numpy.ndarray(array1) directly does not convert the data; it treats its argument as a shape.)

2) The dimensions need to be as follows: array1.shape shall give (N, 1) and array2.shape shall give (N,), where N is the number of items in the array. To provide array1 with the right number of axes, perform:

    array1 = array1[:, numpy.newaxis]
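
As a rough illustration of the two checks above, here is a minimal sketch on made-up data. The values are hypothetical, and the names X and y are only used here for the reshaped feature array and the target.

```python
import numpy as np

# Hypothetical data: five samples of one feature and one target.
array1 = np.asarray([1.0, 2.0, 3.0, 4.0, 5.0])
array2 = np.asarray([2.0, 4.0, 6.0, 8.0, 10.0])

# Requirement 1: both are numpy arrays.
print(type(array1), type(array2))

# Requirement 2: the feature array needs a second axis, the target does not.
print(array1.shape)            # 1-D before the fix
X = array1[:, np.newaxis]      # one row per sample, one column per feature
y = array2
print(X.shape, y.shape)
```

Printing the shapes before calling fit is usually the quickest way to see which of the two arrays is off.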

User · Answer

I faced the same problem some days ago. The reason was that the arrays had different sizes.

User · Answer

It looks like sklearn requires the data to have the shape (row number, column number). If your data has the shape (row number,), like (999,), it does not work. Use numpy.reshape() to change the shape of the array to (999, 1), e.g. data = data.reshape((999, 1)). In my case, that worked.
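
A small sketch of the reshape approach, assuming synthetic data (the array contents and the names data, target and model are made up for illustration). Passing -1 lets numpy infer the row count, so the 999 does not have to be hard-coded.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical 1-D feature and a target that is exactly linear in it.
data = np.arange(10, dtype=float)   # shape (10,)
target = 3.0 * data + 1.0           # shape (10,)

# Turn the 1-D feature into a single-column 2-D array.
X = data.reshape(-1, 1)             # shape (10, 1)

model = LinearRegression()
model.fit(X, target)
print(X.shape, model.coef_, model.intercept_)
```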

User · Answer

Seen on the Udacity deep learning foundation course:

    df = pd.read_csv('my.csv')
    ...
    regr = LinearRegression()
    regr.fit(df[['column x']], df[['column y']])

User · Answer

I faced a similar problem. In my case, the number of rows in X was not equal to the number of rows in y, i.e. the number of entries in the feature columns was not equal to the number of entries in the target variable, because I had dropped some rows from the feature columns.
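
To make that failure mode concrete, here is a hedged sketch with a made-up DataFrame (the column names feature and target are hypothetical). It shows how dropping rows from the features alone leaves X and y with different lengths, and one possible way to keep them aligned.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing feature value.
df = pd.DataFrame({
    "feature": [1.0, 2.0, np.nan, 4.0, 5.0],
    "target": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# Dropping rows from the features only: X ends up shorter than y.
X_bad = df[["feature"]].dropna()
y_bad = df["target"]
print(len(X_bad), len(y_bad))

# Dropping rows on the whole frame first keeps X and y in step.
clean = df.dropna(subset=["feature", "target"])
X = clean[["feature"]]
y = clean["target"]
print(len(X), len(y))
```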

User · Answer

I think the "X" argument of regr.fit needs to be a matrix, so the following should work:

    regr = LinearRegression()
    regr.fit(df2.iloc[1:1000, [5]].values, df2.iloc[1:1000, 2].values)
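
The difference between selecting column 5 and selecting [5] can be checked on a small stand-in frame. This is only a sketch: the df2 below is made up and much smaller than the one in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: 5 rows, 6 integer-labelled columns.
df2 = pd.DataFrame(np.arange(30).reshape(5, 6))

# A scalar column index returns a Series, so .values is 1-D.
col_1d = df2.iloc[1:5, 5].values

# A list of column indices returns a DataFrame, so .values is 2-D.
col_2d = df2.iloc[1:5, [5]].values

print(col_1d.shape, col_2d.shape)
```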

User · Answer

You might have made a mistake when unpacking the result of train_test_split. This order is correct:

    x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, Y, test_size=...)

You might have written it like this instead, which is wrong:

    x_train, y_train, x_test, y_test = sklearn.model_selection.train_test_split(X, Y, test_size=...)
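
A quick way to confirm the unpacking order is to look at the shapes of the four returned arrays. The sketch below uses made-up data; test_size=0.2 and random_state=0 are illustrative choices, not values taken from the answer.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each, and 10 targets.
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

# Return order is: X train, X test, Y train, Y test.
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# The two train arrays share a length, as do the two test arrays.
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
```

With the wrong order, the variable named y_train would actually hold the test features, and its row count would not match x_train.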

User · Answer

As mentioned above, the X argument must be a matrix or a numpy array with known dimensions. So you could probably use this:

    df2.iloc[1:1000, 5:some_last_index].values

That way your dataframe is converted to an array with known dimensions and you won't need to reshape it.

User · Answer

fit expects X to be a feature matrix. Try putting your feature names in a list like this:

    features = ['TV', 'Radio', 'Newspaper']
    X = data[features]

User · Answer

Looks like you are using a pandas dataframe (from the name df2). You could also do the following:

    regr = LinearRegression()
    regr.fit(df2.iloc[1:1000, 5].to_frame(), df2.iloc[1:1000, 2].to_frame())

NOTE: I have removed "values", as that converts the pandas Series to a numpy.ndarray, and numpy.ndarray does not have the attribute to_frame().

User · Answer

I encountered this error because I converted my data to an np.array. I fixed the problem by converting my data to an np.matrix instead and taking the transpose.

ValueError:

    regr.fit(np.array(x_list), np.array(y_list))

Correct:

    regr.fit(np.transpose(np.matrix(x_list)), np.transpose(np.matrix(y_list)))
