Pandas get topmost n records within each group

Question

Suppose I have a pandas DataFrame like this:

    >>> df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 4], 'value': [1, 2, 3, 1, 2, 3, 4, 1, 1]})
    >>> df
       id  value
    0   1      1
    1   1      2
    2   1      3
    3   2      1
    4   2      2
    5   2      3
    6   2      4
    7   3      1
    8   4      1

I want to get a new DataFrame with the top 2 records for each id, like this:

       id  value
    0   1      1
    1   1      2
    3   2      1
    4   2      2
    7   3      1
    8   4      1

I can do it by numbering the records within each group after the group by:

    >>> dfN = df.groupby('id').apply(lambda x: x['value'].reset_index()).reset_index()
    >>> dfN
       id  level_1  index  value
    0   1        0      0      1
    1   1        1      1      2
    2   1        2      2      3
    3   2        0      3      1
    4   2        1      4      2
    5   2        2      5      3
    6   2        3      6      4
    7   3        0      7      1
    8   4        0      8      1
    >>> dfN[dfN['level_1'] <= 1][['id', 'value']]
       id  value
    0   1      1
    1   1      2
    3   2      1
    4   2      2
    7   3      1
    8   4      1

But is there a more efficient or elegant approach to do this? And is there a more elegant way to number the records within each group, like the SQL window function row_number()?

User · Answer

Sometimes sorting the whole data set up front is very time-consuming. We can group first and then take the top k rows within each group (topk is the number of rows to keep per group):

    g = df.groupby(['id']).apply(lambda x: x.nlargest(topk, ['value'])).reset_index(drop=True)
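For reference, here is a self-contained sketch of the same group-then-nlargest idea, using the DataFrame from the question. The value `topk = 2` is an assumption for illustration. Selecting the `value` column explicitly before `apply` is my own choice, made because newer pandas releases handle the grouping column inside `apply` differently; it is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 2, 3, 4],
    "value": [1, 2, 3, 1, 2, 3, 4, 1, 1],
})
topk = 2  # assumed: number of rows to keep per group

top = (
    df.groupby("id", group_keys=True)[["value"]]
      .apply(lambda g: g.nlargest(topk, "value"))  # largest values first, per group
      .reset_index(level=0)     # move the group key back into an "id" column
      .reset_index(drop=True)   # discard the original row labels
)
print(top)
```

Note that `nlargest` orders each group from largest to smallest, so this returns the two highest values per id rather than the first two rows in their original order.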

User · Answer

Since 0.14.1, you can now do nlargest and nsmallest on a groupby object:

    In [23]: df.groupby('id')['value'].nlargest(2)
    Out[23]:
    id
    1   2    3
        1    2
    2   6    4
        5    3
    3   7    1
    4   8    1
    dtype: int64

There's a slight weirdness in that you get the original index in there as well, but this might be really useful depending on what your original index was.

If you're not interested in it, you can do .reset_index(level=1, drop=True) to get rid of it altogether.

(Note: From 0.17.1 you'll be able to do this on a DataFrameGroupBy too, but for now it only works with Series and SeriesGroupBy.)
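A minimal sketch of how the result of `nlargest` on the grouped column can be turned back into a flat DataFrame, again using the question's data. The variable names `top`, `flat` and `rows` are placeholders of mine. The last step, which uses the retained original index to pull the full rows out of `df`, is one possible use of that index and an assumption on my part about what "useful" might mean here.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 2, 3, 4],
    "value": [1, 2, 3, 1, 2, 3, 4, 1, 1],
})

# Series with a two-level index: (id, original row label)
top = df.groupby("id")["value"].nlargest(2)

# Drop the original row labels, then turn the remaining "id" index into a column
flat = top.reset_index(level=1, drop=True).reset_index()

# Alternatively, use the original row labels to select whole rows from df
rows = df.loc[top.index.get_level_values(1)]
print(flat)
print(rows)
```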

User · Answer

Did you try df.groupby('id').head(2)?

Output generated:

    >>> df.groupby('id').head(2)
           id  value
    id
    1  0   1      1
       1   1      2
    2  3   2      1
       4   2      2
    3  7   3      1
    4  8   4      1

(Keep in mind that you might need to order/sort beforehand, depending on your data.)

EDIT: As mentioned by the questioner, use df.groupby('id').head(2).reset_index(drop=True) to remove the MultiIndex and flatten the results:

    >>> df.groupby('id').head(2).reset_index(drop=True)
       id  value
    0   1      1
    1   1      2
    2   2      1
    3   2      2
    4   3      1
    5   4      1
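To illustrate the remark about sorting first, here is a small sketch that orders the rows so the wanted ones come first in each group and then applies `head(2)`. Sorting by `value` in descending order is an assumption about what "top" should mean. The second half numbers the rows within each group with `cumcount()`, which is one way to get something like SQL's row_number(); the column name `row_number` is a label I picked.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 2, 3, 4],
    "value": [1, 2, 3, 1, 2, 3, 4, 1, 1],
})

# Sort so that the largest values come first within each id (assumed ordering)
ordered = df.sort_values(["id", "value"], ascending=[True, False])

# head(2) keeps the first two rows of each group in the current row order
top2 = ordered.groupby("id").head(2)

# cumcount() numbers rows 0, 1, 2, ... within each group; add 1 to start at 1
numbered = ordered.assign(row_number=ordered.groupby("id").cumcount() + 1)

# Filtering on the row number selects the same rows as head(2)
same = numbered[numbered["row_number"] <= 2]
print(top2)
print(numbered)
```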
