I am trying to unstack a multi-index with pandas and I keep getting:
ValueError: Index contains duplicate entries, cannot reshape
Given a dataset with four columns:
I first set a three-level multi-index:
In [37]: e.set_index(['id', 'date', 'location'], inplace=True)
In [38]: e
Out[38]:
                          value
id  date       location
id1 2014-12-12 loc1       16.86
    2014-12-11 loc1       17.18
    2014-12-10 loc1       17.03
    2014-12-09 loc1       17.28
Then I try to unstack the location:
In [39]: e.unstack('location')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-39-bc1e237a0ed7> in <module>()
----> 1 e.unstack('location')
...
C:\Anaconda\envs\sandbox\lib\site-packages\pandas\core\reshape.pyc in _make_selectors(self)
143
144 if mask.sum() < len(self.index):
--> 145 raise ValueError('Index contains duplicate entries, '
146 'cannot reshape')
147
ValueError: Index contains duplicate entries, cannot reshape
What is going on here?
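I had the same problem. In my case the problem was in the data: my 'information' column contained only one unique value, so an id_user that appeared more than once produced the same (id_user, information) pair twice, and that caused the error.
UPDATE: for 'pivot' to work correctly, the (id_user, information) pairs must not contain duplicates.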
It works:
import pandas as pd

df2 = pd.DataFrame({'id_user': [1, 2, 3, 4, 4, 5, 5],
                    'information': ['phon', 'phon', 'phone', 'phone1', 'phone', 'phone1', 'phone'],
                    'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')
It doesn't work, because the pairs (4, 'phone') and (5, 'phone') each appear twice:
df2 = pd.DataFrame({'id_user':[1,2,3,4,4,5,5],
'information':['phone','phone','phone','phone','phone','phone','phone'],
'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')
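To see which rows are responsible, one option is to flag the repeated (id_user, information) pairs with `DataFrame.duplicated`. If collapsing them is acceptable, `pivot_table` with an explicit aggregation can be used instead of `pivot`. This is only a sketch on the failing data above; choosing `aggfunc='first'` is my assumption, so pick whatever aggregation suits your data:

```python
import pandas as pd

df2 = pd.DataFrame({
    'id_user': [1, 2, 3, 4, 4, 5, 5],
    'information': ['phone'] * 7,
    'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00'],
})

# Rows whose (id_user, information) pair occurs more than once
dupes = df2[df2.duplicated(subset=['id_user', 'information'], keep=False)]
print(dupes)

# pivot_table aggregates duplicate pairs instead of raising;
# 'first' keeps the first value seen per pair (an assumption, not a rule)
wide = df2.pivot_table(index='id_user', columns='information',
                       values='value', aggfunc='first')
print(wide)
```

Note that any aggregation silently discards or merges the extra values, so it is worth looking at `dupes` first.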
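There's a far simpler solution to tackle this.
The reason why you get ValueError: Index contains duplicate entries, cannot reshape
is that at least one combination of "id", "date" and "location" occurs more than once. Once you unstack "location", a single ("id", "date") row would have to hold more than one value for the same location, so the remaining index is not enough to identify each value uniquely.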
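A quick way to confirm this is to look for repeated index tuples before unstacking. The frame below is made-up data for illustration (the original dataset was not posted); only `Index.is_unique` and `Index.duplicated` are doing the work:

```python
import pandas as pd

# Hypothetical data: the last two rows share the same (id, date, location)
e = pd.DataFrame({
    'id': ['id1', 'id1', 'id1'],
    'date': ['2014-12-12', '2014-12-11', '2014-12-11'],
    'location': ['loc1', 'loc1', 'loc1'],
    'value': [16.86, 17.18, 17.03],
}).set_index(['id', 'date', 'location'])

# False as soon as any index tuple repeats
print(e.index.is_unique)

# keep=False marks every member of a duplicated group, not just the repeats
dup_rows = e[e.index.duplicated(keep=False)]
print(dup_rows)
```

If `dup_rows` is not empty, those are the rows that make the reshape fail.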
You can avoid this by retaining the default index (the row number): when setting the index with "id", "date" and "location", add them in "append" mode instead of the default overwrite mode.
So use,
e = e.set_index(['id', 'date', 'location'], append=True)
Once this is done, your index will still contain the default row index along with the newly set levels, and unstack will work.
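Here is a small sketch of the append approach on made-up data (the values and the duplicated row are my own, chosen so that the plain unstack fails):

```python
import pandas as pd

# Hypothetical data: rows 1 and 2 repeat (id1, 2014-12-11, loc1)
raw = pd.DataFrame({
    'id': ['id1', 'id1', 'id1', 'id1'],
    'date': ['2014-12-12', '2014-12-11', '2014-12-11', '2014-12-11'],
    'location': ['loc1', 'loc1', 'loc1', 'loc2'],
    'value': [16.86, 17.18, 17.03, 17.28],
})

# Without append, the duplicated index tuple makes unstack raise
try:
    raw.set_index(['id', 'date', 'location']).unstack('location')
    plain_unstack_failed = False
except ValueError as err:
    plain_unstack_failed = True
    print(err)

# With append=True the row number stays in the index, so every tuple is unique
e = raw.set_index(['id', 'date', 'location'], append=True)
wide = e.unstack('location')
print(wide)
```

Keep in mind that the result has one row per original row, with NaN in the location columns that row does not belong to. The duplicates are kept apart rather than merged, so if you want one row per (id, date) you will still need to aggregate or drop them.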
Let me know how it works out.
Source: Stackoverflow.com