[python] Pandas unstack problems: ValueError: Index contains duplicate entries, cannot reshape

I am trying to unstack a multi-index with pandas and I am keep getting:

ValueError: Index contains duplicate entries, cannot reshape

Given a dataset with four columns:

  • id (string)
  • date (string)
  • location (string)
  • value (float)

I first set a three-level multi-index:

In [37]: e.set_index(['id', 'date', 'location'], inplace=True)

In [38]: e
Out[38]: 
                                    value
id           date       location       
id1          2014-12-12 loc1        16.86
             2014-12-11 loc1        17.18
             2014-12-10 loc1        17.03
             2014-12-09 loc1        17.28

Then I try to unstack the location:

In [39]: e.unstack('location')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-bc1e237a0ed7> in <module>()
----> 1 e.unstack('location')
...
C:\Anaconda\envs\sandbox\lib\site-packages\pandas\core\reshape.pyc in _make_selectors(self)
    143 
    144         if mask.sum() < len(self.index):
--> 145             raise ValueError('Index contains duplicate entries, '
    146                              'cannot reshape')
    147 

ValueError: Index contains duplicate entries, cannot reshape

What is going on here?

This question is related to python pandas

The answer is


I had such problem. In my case problem was in data - my column 'information' contained 1 unique value and it caused error

UPDATE: to correct work 'pivot' pairs (id_user,information) cannot have duplicates

It works:

df2 = pd.DataFrame({'id_user':[1,2,3,4,4,5,5], 
'information':['phon','phon','phone','phone1','phone','phone1','phone'], 
'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')

it doesn't work:

df2 = pd.DataFrame({'id_user':[1,2,3,4,4,5,5], 
'information':['phone','phone','phone','phone','phone','phone','phone'], 
'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')

source: https://stackoverflow.com/a/37021196/6088984


There's a far more simpler solution to tackle this.

The reason why you get ValueError: Index contains duplicate entries, cannot reshape is because, once you unstack "Location", then the remaining index columns "id" and "date" combinations are no longer unique.

You can avoid this by retaining the default index column (row #) and while setting the index using "id", "date" and "location", add it in "append" mode instead of the default overwrite mode.

So use,

e.set_index(['id', 'date', 'location'], append=True)

Once this is done, your index columns will still have the default index along with the set indexes. And unstack will work.

Let me know how it works out.