For best performance I recommend DataFrame.drop_duplicates followed by aggfunc='count'.
Others are correct that aggfunc=pd.Series.nunique
will work. This can be slow, however, if the number of index
groups you have is large (>1000).
So instead of (to quote @Javier)
df2.pivot_table('X', 'Y', 'Z', aggfunc=pd.Series.nunique)
I suggest
df2.drop_duplicates(['X', 'Y', 'Z']).pivot_table('X', 'Y', 'Z', aggfunc='count')
This works because the drop_duplicates step guarantees that every subgroup (each combination of ('Y', 'Z')) contains only unique (non-duplicate) values of 'X', so a plain count of 'X' equals the number of distinct values.
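Here is a minimal, self-contained sketch of both approaches on a toy DataFrame (the example data and the keyword-argument form are my own additions, not from the original question); both tables report the number of distinct 'X' values for each ('Y', 'Z') combination:

    import pandas as pd

    # Toy data (hypothetical, not from the original question): the first
    # two rows are duplicate (X, Y, Z) combinations, so a plain count
    # would overstate the number of distinct 'X' values.
    df2 = pd.DataFrame({
        'X': ['a', 'a', 'b', 'b', 'c'],
        'Y': ['y1', 'y1', 'y1', 'y2', 'y2'],
        'Z': ['z1', 'z1', 'z1', 'z1', 'z2'],
    })

    # Count distinct values directly in each (Y, Z) cell.
    nunique_table = df2.pivot_table(values='X', index='Y', columns='Z',
                                    aggfunc=pd.Series.nunique)

    # Drop duplicate (X, Y, Z) rows first, then take a plain count.
    count_table = df2.drop_duplicates(['X', 'Y', 'Z']).pivot_table(
        values='X', index='Y', columns='Z', aggfunc='count')

    print(nunique_table)
    print(count_table)  # same numbers: distinct 'X' values per (Y, Z) cell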