What does `ValueError: cannot reindex from a duplicate axis` mean?

Question

I am getting `ValueError: cannot reindex from a duplicate axis` when I try to set a row of a DataFrame to a certain value. I tried to reproduce this with a simple example, but I could not.

Here is my session inside an `ipdb` trace. I have a DataFrame with a string index, integer columns, and float values. However, when I try to add a `sums` row holding the sum of all columns, I get the `ValueError: cannot reindex from a duplicate axis` error. I created a small DataFrame with the same characteristics but was not able to reproduce the problem. What could I be missing?

I don't really understand what `ValueError: cannot reindex from a duplicate axis` means. What does this error message mean? Knowing that may help me diagnose the problem, and it is the most answerable part of my question.

```
ipdb> type(affinity_matrix)
<class 'pandas.core.frame.DataFrame'>
ipdb> affinity_matrix.shape
(333, 10)
ipdb> affinity_matrix.columns
Int64Index([9315684, 9315597, 9316591, 9320520, 9321163, 9320615, 9321187, 9319487, 9319467, 9320484], dtype='int64')
ipdb> affinity_matrix.index
Index([u'001', u'002', u'003', u'004', u'005', u'008', u'009', u'010', u'011', u'014',
       u'015', u'016', u'018', u'020', u'021', u'022', u'024', u'025', u'026', u'027',
       u'028', u'029', u'030', u'032', u'033', u'034', u'035', u'036', u'039', u'040',
       u'041', u'042', u'043', u'044', u'045', u'047', u'047', u'048', u'050', u'053',
       u'054', u'055', u'056', u'057', u'058', u'059', u'060', u'061', u'062', u'063',
       u'065', u'067', u'068', u'069', u'070', u'071', u'072', u'073', u'074', u'075',
       u'076', u'077', u'078', u'080', u'082', u'083', u'084', u'085', u'086', u'089',
       u'090', u'091', u'092', u'093', u'094', u'095', u'096', u'097', u'098', u'100',
       u'101', u'103', u'104', u'105', u'106', u'107', u'108', u'109', u'110', u'111',
       u'112', u'113', u'114', u'115', u'116', u'117', u'118', u'119', u'121', u'122',
       ...],
      dtype='object')
ipdb> affinity_matrix.values.dtype
dtype('float64')
ipdb> 'sums' in affinity_matrix.index
False
```

Here is the error:

```
ipdb> affinity_matrix.loc['sums'] = affinity_matrix.sum(axis=0)
*** ValueError: cannot reindex from a duplicate axis
```

I tried to reproduce this with a simple example, but I failed:

```
In [32]: import pandas as pd

In [33]: import numpy as np

In [34]: a = np.arange(35).reshape(5,7)

In [35]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))

In [36]: df.values.dtype
Out[36]: dtype('int64')

In [37]: df.loc['sums'] = df.sum(axis=0)

In [38]: df
Out[38]:
      10  11  12  13  14  15   16
x      0   1   2   3   4   5    6
y      7   8   9  10  11  12   13
u     14  15  16  17  18  19   20
z     21  22  23  24  25  26   27
w     28  29  30  31  32  33   34
sums  70  75  80  85  90  95  100
```

User · Answer

I wasted a couple of hours on the same issue. In my case, I had to `reset_index()` the DataFrame before using `apply`. Before merging, or looking up values from another indexed dataset, you need to reset the index, because pandas matches the two datasets on their index labels and a label that appears more than once cannot be matched unambiguously.
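A minimal sketch of what this answer describes, assuming the duplicated labels sit on a frame whose `apply` result is then assigned into a second frame with a default index; the frames `df` and `out` and their column names are hypothetical. `reset_index(drop=True)` swaps the repeated labels for a fresh 0..n-1 index, so the label alignment during assignment succeeds:

```python
import pandas as pd

# Hypothetical frame: label 0 appears twice, as it can after stacking
# frames that each started counting at 0.
df = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 0, 1])

# Hypothetical target frame with a default RangeIndex (0, 1, 2).
out = pd.DataFrame({"id": [101, 102, 103]})

try:
    # The Series keeps df's index [0, 0, 1]; assigning it aligns by label,
    # and the repeated label 0 makes that alignment impossible.
    out["x2"] = df["x"].apply(lambda v: v * 2)
except ValueError as err:
    print("ValueError:", err)

# drop=True replaces the old labels with 0..n-1 instead of keeping them as a column.
df = df.reset_index(drop=True)
out["x2"] = df["x"].apply(lambda v: v * 2)
print(out)
```

Because the new index is purely positional, this only pairs the right rows if both frames are already in the same order.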

User · Answer

Indices with duplicate values often arise if you create a DataFrame by concatenating other DataFrames. If you don't care about preserving the values of your index, and you want them to be unique, set `ignore_index=True` when you concatenate the data. Alternatively, to overwrite your current index with a new one, set `df.index = new_index` instead of using `df.reindex()`.
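A short sketch of both options, with hypothetical frames `a` and `b` and a hypothetical `new_index` list. `reindex` tries to look up the existing labels, so it trips over the duplicates, whereas assigning to `.index` simply relabels the rows in order:

```python
import pandas as pd

a = pd.DataFrame({"value": [1, 2]})
b = pd.DataFrame({"value": [3, 4]})

# Plain concatenation keeps each input's index, so labels 0 and 1 repeat.
stacked = pd.concat([a, b])
print(stacked.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True discards the old labels and builds a new 0..n-1 index.
clean = pd.concat([a, b], ignore_index=True)
print(clean.index.tolist())  # [0, 1, 2, 3]

# Hypothetical replacement labels, one per row.
new_index = ["r1", "r2", "r3", "r4"]

try:
    # reindex looks up each existing label, which fails on the repeated ones.
    stacked.reindex(new_index)
except ValueError as err:
    print("ValueError:", err)

# Assigning to .index relabels the rows positionally, no lookup involved.
stacked.index = new_index
print(stacked.index.is_unique)  # True
```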

User · Answer

Sidestep the error by appending `.values`, so that a plain array is assigned instead of a Series:

```
affinity_matrix.loc['sums'] = affinity_matrix.sum(axis=0).values
```
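A sketch of why `.values` makes the error go away, using a hypothetical frame whose index repeats `"047"` the way the question's index does. `.values` is a bare NumPy array with no index, so pandas writes it by position and never has to align labels. The price is that nothing checks the labels any more: the length must match and the order is taken as-is, as the comments point out.

```python
import pandas as pd

# Hypothetical frame whose index repeats "047", like the question's index.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=["046", "047", "047"])

# Hypothetical Series with the same labels in a different order.
s = pd.Series([10.0, 20.0, 30.0], index=["047", "047", "046"])

try:
    df["b"] = s  # label alignment with a duplicated label -> ValueError
except ValueError as err:
    print("ValueError:", err)

# .values is a bare NumPy array, so the numbers are written by position.
# Row "046" now holds 10.0 even though s labelled that value "047".
df["b"] = s.values

# The answer's pattern: a new "sums" row built from a plain array.
df.loc["sums"] = df.sum(axis=0).values
print(df)
```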

User · Answer

In my case, this error popped up not because of duplicate values, but because I attempted to join a shorter Series to a DataFrame: both had the same index, but the Series had fewer rows (it was missing the top few). The following worked for my purposes:

```
df.head()
                          SensA
date
2018-04-03 13:54:47.274   -0.45
2018-04-03 13:55:46.484   -0.42
2018-04-03 13:56:56.235   -0.37
2018-04-03 13:57:57.207   -0.34
2018-04-03 13:59:34.636   -0.33

series.head()
date
2018-04-03 14:09:36.577    62.2
2018-04-03 14:10:28.138    63.5
2018-04-03 14:11:27.400    63.1
2018-04-03 14:12:39.623    62.6
2018-04-03 14:13:27.310    62.5
Name: SensA_rrT, dtype: float64

df = series.to_frame().combine_first(df)
df.head(10)
                          SensA  SensA_rrT
date
2018-04-03 13:54:47.274   -0.45        NaN
2018-04-03 13:55:46.484   -0.42        NaN
2018-04-03 13:56:56.235   -0.37        NaN
2018-04-03 13:57:57.207   -0.34        NaN
2018-04-03 13:59:34.636   -0.33        NaN
2018-04-03 14:00:34.565   -0.33        NaN
2018-04-03 14:01:19.994   -0.37        NaN
2018-04-03 14:02:29.636   -0.34        NaN
2018-04-03 14:03:31.599   -0.32        NaN
2018-04-03 14:04:30.779   -0.33        NaN
2018-04-03 14:05:31.733   -0.35        NaN
2018-04-03 14:06:33.290   -0.38        NaN
2018-04-03 14:07:37.459   -0.39        NaN
2018-04-03 14:08:36.361   -0.36        NaN
2018-04-03 14:09:36.577   -0.37       62.2
```
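A small sketch of the `combine_first` approach, assuming a DatetimeIndex named `date`. The readings at 13:54:47.274 and 14:09:36.577 and the two Series values come from the output above; the `SensA` value at 14:10:28.138 is made up for illustration:

```python
import pandas as pd

# Timestamps partly taken from the answer's output.
idx = pd.to_datetime([
    "2018-04-03 13:54:47.274",
    "2018-04-03 14:09:36.577",
    "2018-04-03 14:10:28.138",
])
df = pd.DataFrame({"SensA": [-0.45, -0.37, -0.40]}, index=pd.Index(idx, name="date"))

# A shorter Series that only covers the last two timestamps.
series = pd.Series([62.2, 63.5], index=pd.Index(idx[1:], name="date"), name="SensA_rrT")

# combine_first takes the union of both indexes and both sets of columns;
# timestamps the Series does not cover get NaN in its column.
df = series.to_frame().combine_first(df)
print(df)
```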

User · Answer

For people who are still struggling with this error: it can also happen if you accidentally create two columns with the same name. Remove duplicate columns like so:

```
df = df.loc[:, ~df.columns.duplicated()]
```
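A sketch showing the problem and the fix on a hypothetical frame with two columns named `"a"`. `columns.duplicated()` flags every repeat after the first occurrence, so the inverted mask keeps the first column of each name and drops the others (their data is discarded, so make sure the copies really are redundant):

```python
import pandas as pd

# Hypothetical frame that accidentally has two columns named "a".
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# Reindexing along the duplicated column axis is ambiguous and raises.
try:
    df.reindex(columns=["a", "b", "c"])
except ValueError as err:
    print("ValueError:", err)

# ~ inverts the "is a repeat" mask, so .loc keeps the first "a" and "b".
df = df.loc[:, ~df.columns.duplicated()]
print(df.columns.tolist())  # ['a', 'b']
```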

User · Answer

Simple fix that worked for me: run `df.reset_index(inplace=True)` before grouping. Thank you to this GitHub comment for the solution. Drop `inplace=True` if you want `reset_index` to return a new DataFrame instead of modifying `df`.
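A sketch of the two forms of `reset_index`, on a hypothetical frame with a repeated index label. Without `drop=True`, the old labels are kept in a column named `index`, so they remain available for grouping:

```python
import pandas as pd

# Hypothetical frame with a repeated index label "k".
df = pd.DataFrame({"group": ["x", "y", "x"], "v": [1, 2, 3]}, index=["k", "k", "m"])

# Returning form: builds a new frame with a 0..n-1 RangeIndex; the old
# labels move into a column named "index".
reset = df.reset_index()

# In-place form: modifies df itself and returns None.
result = df.reset_index(inplace=True)
print(result)  # None

# Grouping now works on a frame with unique index labels.
totals = df.groupby("group")["v"].sum()
print(totals)
```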

User · Answer

I came across this error today when I wanted to add a new column like this:

```
df_temp['REMARK_TYPE'] = df.REMARK.apply(lambda v: 1 if str(v)!='nan' else 0)
```

I wanted to process the `REMARK` column of `df_temp` to return 1 or 0. However, I typed the wrong variable, `df`, and it returned an error like this:

```
----> 1 df_temp['REMARK_TYPE'] = df.REMARK.apply(lambda v: 1 if str(v)!='nan' else 0)

/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   2417         else:
   2418             # set column
-> 2419             self._set_item(key, value)
   2420 
   2421     def _setitem_slice(self, key, value):

/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   2483 
   2484         self._ensure_valid_index(value)
-> 2485         value = self._sanitize_column(key, value)
   2486         NDFrame._set_item(self, key, value)
   2487 

/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in _sanitize_column(self, key, value, broadcast)
   2633 
   2634         if isinstance(value, Series):
-> 2635             value = reindexer(value)
   2636 
   2637         elif isinstance(value, DataFrame):

/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in reindexer(value)
   2625                     # duplicate axis
   2626                     if not value.index.is_unique:
-> 2627                         raise e
   2628 
   2629                     # other

ValueError: cannot reindex from a duplicate axis
```

As you can see, the correct code is:

```
df_temp['REMARK_TYPE'] = df_temp.REMARK.apply(lambda v: 1 if str(v)!='nan' else 0)
```

`df` and `df_temp` have different numbers of rows, so pandas tried to reindex the Series built from `df` to the index of `df_temp`. Because that Series' index was not unique (the `if not value.index.is_unique` check in the traceback), it raised `ValueError: cannot reindex from a duplicate axis`.

I hope this helps other people debug their code.
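A reduced reproduction of this typo, with hypothetical `df` and `df_temp` frames where `df` has more rows and repeats the index label 1; `remark_flag` is a named stand-in for the lambda above:

```python
import pandas as pd


def remark_flag(v):
    # 1 for a real remark, 0 for a missing one (str(float("nan")) is "nan").
    return 1 if str(v) != "nan" else 0


# Hypothetical frames: df repeats index label 1 and has more rows than df_temp.
df = pd.DataFrame({"REMARK": ["ok", float("nan"), "late", "ok"]}, index=[0, 1, 1, 2])
df_temp = pd.DataFrame({"REMARK": ["ok", float("nan")]}, index=[0, 1])

message = None
try:
    # Typo: the Series comes from df, so pandas must realign it to
    # df_temp.index, which fails because df's index repeats label 1.
    df_temp["REMARK_TYPE"] = df.REMARK.apply(remark_flag)
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# Fix: build the new column from the frame you are assigning into.
df_temp["REMARK_TYPE"] = df_temp.REMARK.apply(remark_flag)
print(df_temp)
```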

User · Answer

This error usually arises when you join or assign to a column while the index has duplicate values. Since you are assigning to a row, I suspect that there is a duplicate value in `affinity_matrix.columns`, perhaps not shown in your question.
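A sketch of what the message is complaining about, on hypothetical data. Reindexing means asking for exactly one value per target label; when a label appears twice on the axis being reindexed, there is no single value to return, so pandas raises. Newer pandas versions word the message as `cannot reindex on an axis with duplicate labels`, but the cause is the same.

```python
import pandas as pd

# Hypothetical Series whose index repeats the label "a".
s = pd.Series([1, 2, 3], index=["a", "a", "b"])

# Reindexing asks for one value per target label; with two rows
# labelled "a" there is no single value to pick, so pandas refuses.
message = None
try:
    s.reindex(["a", "b", "c"])
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# Assignments reindex behind the scenes, so check both axes of the frame.
df = pd.DataFrame([[1, 2], [3, 4]], index=["r", "r"], columns=["c1", "c2"])
print(df.index.is_unique, df.columns.is_unique)  # False True
```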

User · Answer

I got this error when I tried adding a column from a different table. Indeed, I had picked up duplicate index values along the way, but it turned out I was just doing it wrong: I actually needed to `df.join` the other table. This pointer might help someone in a similar situation.
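A sketch of the difference, assuming hypothetical `people` and `visits` tables where `visits` repeats a person's label once per visit. Assigning a column needs an unambiguous label match, while `join` matches on the index and is happy to repeat rows:

```python
import pandas as pd

# Hypothetical tables: one row per person, several visits per person.
people = pd.DataFrame({"name": ["Ann", "Bob"]}, index=["a", "b"])
visits = pd.DataFrame({"day": [1, 2, 3]}, index=["a", "a", "b"])

try:
    # Column assignment aligns visits["day"] to people.index; the repeated
    # label "a" makes that impossible.
    people["day"] = visits["day"]
except ValueError as err:
    print("ValueError:", err)

# join matches rows on the index and repeats a person once per visit.
joined = people.join(visits)
print(joined)
```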

User · Answer

This can also be a cause: the error may happen even when you are trying to insert a DataFrame-type column into a DataFrame. I solved my problem like this:

```
df['my_new'] = pd.Series(my_new_values)
```
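A sketch of the suggestion, assuming `my_new_values` is a plain list or 1-D array with one entry per row (the array and frames below are hypothetical). `pd.Series(my_new_values)` gets a fresh 0..n-1 index, so it only lines up with frames that also have the default index:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for my_new_values: a plain 1-D array, no index.
my_new_values = np.array([10, 20, 30])

df = pd.DataFrame({"a": [1, 2, 3]})  # default RangeIndex 0..2
# pd.Series gives the values its own default 0..n-1 index, which
# matches df's index here, so every row lines up.
df["my_new"] = pd.Series(my_new_values)

# With a non-default index, pd.Series(my_new_values) would match no labels
# and fill the column with NaN; passing index=df2.index avoids that.
df2 = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
df2["my_new"] = pd.Series(my_new_values, index=df2.index)
print(df, df2, sep="\n\n")
```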

User · Answer

As others have said, you've probably got duplicate values in your original index. To find them, do this:

```
df[df.index.duplicated()]
```
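A sketch of finding and, if appropriate, dropping the duplicates, on a hypothetical frame. In the question's `ipdb` listing, `u'047'` appears twice in `affinity_matrix.index`, which is exactly what this check would surface:

```python
import pandas as pd

# Hypothetical frame with "047" repeated in the index.
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=["045", "047", "047", "048"])

# duplicated() flags repeats after the first occurrence (keep="first").
repeats = df[df.index.duplicated()]
print(repeats)

# keep=False flags every copy, handy for inspecting all offending rows.
all_copies = df[df.index.duplicated(keep=False)]
print(all_copies)

# One way to drop the repeats, keeping the first row for each label.
deduped = df[~df.index.duplicated(keep="first")]
print(deduped)
```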
