[python] Pandas concat: ValueError: Shape of passed values is blah, indices imply blah2

I'm trying to merge a (Pandas 14.1) dataframe and a series. The series should form a new column, with some NAs (since the index values of the series are a subset of the index values of the dataframe).

This works for a toy example, but not with my data (detailed below).

Example:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6, 4), columns=['A', 'B', 'C', 'D'], index=pd.date_range('1/1/2011', periods=6, freq='D'))
df1

A   B   C   D
2011-01-01  -0.487926   0.439190    0.194810    0.333896
2011-01-02  1.708024    0.237587    -0.958100   1.418285
2011-01-03  -1.228805   1.266068    -1.755050   -1.476395
2011-01-04  -0.554705   1.342504    0.245934    0.955521
2011-01-05  -0.351260   -0.798270   0.820535    -0.597322
2011-01-06  0.132924    0.501027    -1.139487   1.107873

s1 = pd.Series(np.random.randn(3), name='foo', index=pd.date_range('1/1/2011', periods=3, freq='2D'))
s1

2011-01-01   -1.660578
2011-01-03   -0.209688
2011-01-05    0.546146
Freq: 2D, Name: foo, dtype: float64

pd.concat([df1, s1],axis=1)

A   B   C   D   foo
2011-01-01  -0.487926   0.439190    0.194810    0.333896    -1.660578
2011-01-02  1.708024    0.237587    -0.958100   1.418285    NaN
2011-01-03  -1.228805   1.266068    -1.755050   -1.476395   -0.209688
2011-01-04  -0.554705   1.342504    0.245934    0.955521    NaN
2011-01-05  -0.351260   -0.798270   0.820535    -0.597322   0.546146
2011-01-06  0.132924    0.501027    -1.139487   1.107873    NaN

The situation with the data (see below) seems basically identical - concatting a series with a DatetimeIndex whose values are a subset of the dataframe's. But it gives the ValueError in the title (blah1 = (5, 286) blah2 = (5, 276) ). Why doesn't it work?:

In[187]: df.head()
Out[188]:
high    low loc_h   loc_l
time                
2014-01-01 17:00:00 1.376235    1.375945    1.376235    1.375945
2014-01-01 17:01:00 1.376005    1.375775    NaN NaN
2014-01-01 17:02:00 1.375795    1.375445    NaN 1.375445
2014-01-01 17:03:00 1.375625    1.375515    NaN NaN
2014-01-01 17:04:00 1.375585    1.375585    NaN NaN
In [186]: df.index
Out[186]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 17:00:00, ..., 2014-01-01 21:30:00]
Length: 271, Freq: None, Timezone: None

In [189]: hl.head()
Out[189]:
2014-01-01 17:00:00    1.376090
2014-01-01 17:02:00    1.375445
2014-01-01 17:05:00    1.376195
2014-01-01 17:10:00    1.375385
2014-01-01 17:12:00    1.376115
dtype: float64

In [187]:hl.index
Out[187]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 17:00:00, ..., 2014-01-01 21:30:00]
Length: 89, Freq: None, Timezone: None

In: pd.concat([df, hl], axis=1)
Out: [stack trace] ValueError: Shape of passed values is (5, 286), indices imply (5, 276)

This question is related to python pandas

The answer is


I had a similar problem (join worked, but concat failed).

Check for duplicate index values in df1 and s1, (e.g. df1.index.is_unique)

Removing duplicate index values (e.g., df.drop_duplicates(inplace=True)) or one of the methods here https://stackoverflow.com/a/34297689/7163376 should resolve it.


To drop duplicate indices, use df = df.loc[df.index.drop_duplicates()]. C.f. pandas.pydata.org/pandas-docs/stable/generated/… – BallpointBen Apr 18 at 15:25

This is wrong but I can't reply directly to BallpointBen's comment due to low reputation. The reason its wrong is that df.index.drop_duplicates() returns a list of unique indices, but when you index back into the dataframe using those the unique indices it still returns all records. I think this is likely because indexing using one of the duplicated indices will return all instances of the index.

Instead, use df.index.duplicated(), which returns a boolean list (add the ~ to get the not-duplicated records):

df = df.loc[~df.index.duplicated()]

Aus_lacy's post gave me the idea of trying related methods, of which join does work:

In [196]:

hl.name = 'hl'
Out[196]:
'hl'
In [199]:

df.join(hl).head(4)
Out[199]:
high    low loc_h   loc_l   hl
2014-01-01 17:00:00 1.376235    1.375945    1.376235    1.375945    1.376090
2014-01-01 17:01:00 1.376005    1.375775    NaN NaN NaN
2014-01-01 17:02:00 1.375795    1.375445    NaN 1.375445    1.375445
2014-01-01 17:03:00 1.375625    1.375515    NaN NaN NaN

Some insight into why concat works on the example but not this data would be nice though!


Maybe it is simple, try this if you have a DataFrame. then make sure that both matrices or vectros that you're trying to combine have the same rows_name/index

I had the same issue. I changed the name indices of the rows to make them match each other here is an example for a matrix (principal component) and a vector(target) have the same row indicies (I circled them in the blue in the leftside of the pic)

Before, "when it was not working", I had the matrix with normal row indicies (0,1,2,3) while I had the vector with row indices (ID0, ID1, ID2, ID3) then I changed the vector's row indices to (0,1,2,3) and it worked for me.

enter image description here


Your indexes probably contains duplicated values.

import pandas as pd

T1_INDEX = [
    0,
    1,  # <= !!! if I write e.g.: "0" here then it fails
    0.2,
]
T1_COLUMNS = [
    'A', 'B', 'C', 'D'
]
T1 = [
    [1.0, 1.1, 1.2, 1.3],
    [2.0, 2.1, 2.2, 2.3],
    [3.0, 3.1, 3.2, 3.3],
]

T2_INDEX = [
    1.2,
    2.11,
]

T2_COLUMNS = [
    'D', 'E', 'F',
]
T2 = [
    [54.0, 5324.1, 3234.2],
    [55.0, 14.5324, 2324.2],
    # [3.0, 3.1, 3.2],
]
df1 = pd.DataFrame(T1, columns=T1_COLUMNS, index=T1_INDEX)
df2 = pd.DataFrame(T2, columns=T2_COLUMNS, index=T2_INDEX)


print(pd.concat([pd.DataFrame({})] + [df2, df1], axis=1))

My problem were different indices, the following code solved my problem.

df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)
df = pd.concat([df1, df2], axis=1)

Try sorting index after concatenating them

result=pd.concat([df1,df2]).sort_index()