[scikit-learn] sklearn: Found arrays with inconsistent numbers of samples when calling LinearRegression.fit()

I'm just trying to do a simple linear regression, but I'm baffled by the error from this code:

regr = LinearRegression()
regr.fit(df2.iloc[1:1000, 5].values, df2.iloc[1:1000, 2].values)

which produces:

ValueError: Found arrays with inconsistent numbers of samples: [  1 999]

These selections must have the same dimensions, and they should be numpy arrays, so what am I missing?


The answer is

I faced a similar problem. In my case, the number of rows in X was not equal to the number of rows in y.

That is, the number of entries in the feature columns was not equal to the number of entries in the target variable, because I had dropped some rows from the feature columns.
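A minimal sketch of how that mismatch can arise and one way to repair it. The DataFrame and its column names are made up for illustration; the idea is to re-align the target to the rows that survived in the features.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one feature column with a missing value, one target column.
df = pd.DataFrame({
    "feature": [1.0, 2.0, np.nan, 4.0, 5.0],
    "target": [2.0, 4.0, 6.0, 8.0, 10.0],
})

X = df[["feature"]].dropna()   # drops the NaN row, leaving 4 rows
y = df["target"]               # still 5 rows

print(len(X), len(y))          # 4 5: row counts differ, so fit(X, y) would fail

# Re-align the target to the index labels that are still present in X.
y_aligned = y.loc[X.index]
print(len(X), len(y_aligned))  # 4 4
```

Comparing len(X) and len(y) right before calling fit is a quick way to confirm whether this is the cause.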

The X argument must be a matrix, that is, a two-dimensional numpy array with known dimensions. So you could probably use this:

df2.iloc[1:1000, 5:some_last_index].values

That way the selection from your dataframe is converted to an array with known dimensions, and you won't need to reshape it.

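You might have made a mistake during the train/test split. train_test_split returns its results in the order X_train, X_test, y_train, y_test.

Unpacking them in that order is correct.

Unpacking them in a different order, for example X_train, y_train, X_test, y_test, is wrong, because it pairs the training features with an array of a different length.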
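As a rough illustration of the unpacking order, here is a sketch with made-up data. The variable names ending in _bad are hypothetical and only there to show the mistake.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features (made-up data)
y = np.arange(10)                 # 10 targets

# Correct: the outputs come back as X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, y_train.shape)          # (8, 2) (8,)

# Wrong: swapping the middle two names means "y_train_bad" is really X_test.
X_train_bad, y_train_bad, X_test_bad, y_test_bad = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train_bad.shape, y_train_bad.shape)  # (8, 2) (2, 2)
```

With the wrong order, fit(X_train_bad, y_train_bad) receives 8 samples in one array and 2 in the other, which is the kind of inconsistency the error message reports.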


It looks like you are using a pandas DataFrame (judging by the name df2).

You could also do the following:

regr = LinearRegression()
regr.fit(df2.iloc[1:1000, 5].to_frame(), df2.iloc[1:1000, 2].to_frame())

NOTE: I have removed "values" because it converts the pandas Series to a numpy.ndarray, and numpy.ndarray has no to_frame() method.

To analyze two arrays (array1 and array2) they need to meet the following two requirements:

1) They need to be a numpy.ndarray

Check with

type(array1)
# and
type(array2)

If that is not the case for at least one of them, convert it with

array1 = numpy.asarray(array1)
# or
array2 = numpy.asarray(array2)

2) The dimensions need to be as follows:

array1.shape #shall give (N, 1)
array2.shape #shall give (N,)

N is the number of items that are in the array. To provide array1 with the right number of axes perform:

array1 = array1[:, numpy.newaxis]
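Putting the two requirements together, a minimal sketch might look like this. The numbers are made up (they follow y = 2x + 1), and array1 is assumed to hold the single feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

array1 = np.asarray([1.0, 2.0, 3.0, 4.0])   # feature values, shape (4,)
array2 = np.asarray([3.0, 5.0, 7.0, 9.0])   # target values,  shape (4,)

# Add a second axis so the features become a column: shape (4, 1).
array1 = array1[:, np.newaxis]
# array1.reshape(-1, 1) on the 1-D array would give the same shape.

regr = LinearRegression()
regr.fit(array1, array2)

print(array1.shape, array2.shape)           # (4, 1) (4,)
print(regr.coef_, regr.intercept_)          # close to [2.] and 1.0
```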

I think the "X" argument of regr.fit needs to be a matrix, so the following should work.

regr = LinearRegression()
regr.fit(df2.iloc[1:1000, [5]].values, df2.iloc[1:1000, 2].values)
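The difference between 5 and [5] in the column position can be seen by printing the shapes. This sketch uses a small made-up stand-in for df2, so the row counts differ from the question.

```python
import numpy as np
import pandas as pd

# Stand-in for df2: 10 rows, 6 columns of made-up numbers.
df2 = pd.DataFrame(np.arange(60).reshape(10, 6))

one_d = df2.iloc[1:10, 5].values     # integer column -> Series -> 1-D array
two_d = df2.iloc[1:10, [5]].values   # list of columns -> DataFrame -> 2-D array

print(one_d.shape)  # (9,)
print(two_d.shape)  # (9, 1)
```

Older scikit-learn releases treated a 1-D X as a single sample with many features, which is presumably why the error in the question reports [1 999].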

Seen on the Udacity deep learning foundation course:

df = pd.read_csv('my.csv')
regr = LinearRegression()
regr.fit(df[['column x']], df[['column y']])

I encountered this error because I converted my data to an np.array. I fixed the problem by converting my data to an np.matrix instead and taking the transpose.

Raises ValueError: regr.fit(np.array(x_list), np.array(y_list))

Works: regr.fit(np.transpose(np.matrix(x_list)), np.transpose(np.matrix(y_list)))
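The NumPy documentation discourages np.matrix for new code, so here is a sketch of the same idea with a plain array reshaped into a column. The values in x_list and y_list are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x_list = [1.0, 2.0, 3.0, 4.0]   # made-up feature values
y_list = [2.0, 4.0, 6.0, 8.0]   # made-up targets (y = 2x)

X = np.array(x_list).reshape(-1, 1)   # column of features, shape (4, 1)
y = np.array(y_list)                  # a 1-D target is accepted as is

regr = LinearRegression()
regr.fit(X, y)
print(X.shape, regr.coef_)            # (4, 1) and a slope close to 2
```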

Some days ago I faced the same problem. The reason was arrays of different sizes.

fit() expects X to be a feature matrix.

Try putting your feature names in a list like this:

features = ['TV', 'Radio', 'Newspaper']
X = data[features]
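For completeness, a sketch of the whole flow. The numbers are invented, and the 'Sales' target column is an assumption that does not appear in the snippet above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data; 'Sales' is a hypothetical target column.
data = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9],
})

features = ['TV', 'Radio', 'Newspaper']
X = data[features]    # selecting with a list keeps a 2-D DataFrame, shape (5, 3)
y = data['Sales']     # selecting with a single name gives a 1-D Series, shape (5,)

regr = LinearRegression()
regr.fit(X, y)
print(X.shape, y.shape)   # (5, 3) (5,)
```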