I have data of the following form:
df = pd.DataFrame({
'group': [1, 1, 2, 3, 3, 3, 4],
'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})
print(df)
# group param
# 0 1 a
# 1 1 a
# 2 2 b
# 3 3 NaN
# 4 3 a
# 5 3 a
# 6 4 NaN
Non-null values within groups are always the same. I want to count the non-null value for each group (where it exists) once, and then find the total counts for each value.
I'm currently doing this in the following (clunky and inefficient) way:
param = []
for _, group in df[df.param.notnull()].groupby('group'):
param.append(group.param.unique()[0])
print(pd.DataFrame({'param': param}).param.value_counts())
# a 2
# b 1
I'm sure there's a way to do this more cleanly and without using a loop, but I just can't seem to work it out. Any help would be much appreciated.
This is just an add-on to the solution in case you want to compute not only unique values but other aggregate functions:
df.groupby(['group']).agg(['min','max','count','nunique'])
Hope you find it useful
I know it has been a while since this was posted, but I think this will help too. I wanted to count unique values and filter the groups by number of these unique values, this is how I did it:
df.groupby('group').agg(['min','max','count','nunique']).reset_index(drop=False)
Source: Stackoverflow.com