Pandas dataframe get first row of each group

Question

I have a pandas DataFrame like the following:

    df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                       'value': ['first','second','second','first',
                                 'second','first','third','fourth',
                                 'fifth','second','fifth','first',
                                 'first','second','third','fourth','fifth']})

I want to group this by ["id","value"] and get the first row of each group:

        id   value
    0    1   first
    1    1  second
    2    1  second
    3    2   first
    4    2  second
    5    3   first
    6    3   third
    7    3  fourth
    8    3   fifth
    9    4  second
    10   4   fifth
    11   5   first
    12   6   first
    13   6  second
    14   6   third
    15   7  fourth
    16   7   fifth

Expected outcome:

    id   value
     1   first
     2   first
     3   first
     4  second
     5   first
     6   first
     7  fourth

I tried the following, which only gives the first row of the DataFrame. Any help regarding this is appreciated.

    In [25]: for index, row in df.iterrows():
       ....:     df2 = pd.DataFrame(df.groupby(['id','value']).reset_index().ix[0])

User · Answer

Maybe this is what you want:

    import pandas as pd
    idx = pd.MultiIndex.from_product([['state1','state2'],
                                      ['county1','county2','county3','county4']])
    df = pd.DataFrame({'pop': [12,15,65,42,78,67,55,31]}, index=idx)

                    pop
    state1 county1   12
           county2   15
           county3   65
           county4   42
    state2 county1   78
           county2   67
           county3   55
           county4   31

    df.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values('pop', ascending=False)).groupby(level=0).head(3)

    Out[29]:
                    pop
    state1 county3   65
           county4   42
           county2   15
    state2 county1   78
           county2   67
           county3   55
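The same top-3-per-state selection can probably be reached without `apply`. The following is a minimal sketch, assuming the index levels are named `state` and `county` (the answer leaves them unnamed) and that it is acceptable for the result to come back ordered by descending `pop` rather than grouped by state:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["state1", "state2"], ["county1", "county2", "county3", "county4"]],
    names=["state", "county"],  # level names are an assumption for readability
)
df = pd.DataFrame({"pop": [12, 15, 65, 42, 78, 67, 55, 31]}, index=idx)

# Sort the whole frame once, then keep the first 3 rows seen for each state.
top3 = df.sort_values("pop", ascending=False).groupby(level="state").head(3)

# Without the sort, head(1) returns the first row of each group as stored.
first_rows = df.groupby(level="state").head(1)
```

`head(n)` keeps rows in the order they appear in the frame it is called on, which is why sorting beforehand is enough to make it return the largest values.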

User · Answer

I'd suggest using .nth(0) rather than .first() if you need to get the first row.

The difference between them is how they handle NaNs: .nth(0) returns the first row of each group no matter what values that row contains, while .first() returns the first non-NaN value in each column.

E.g. if your dataset is:

    df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4],
                       'value': ['first','second','third', np.nan,
                                 'second','first','second','third',
                                 'fourth','first','second']})

    >>> df.groupby('id').nth(0)
        value
    id
    1   first
    2     NaN
    3   first
    4   first

And

    >>> df.groupby('id').first()
         value
    id
    1    first
    2   second
    3    first
    4    first
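The NaN difference can be checked side by side. This is a small sketch using the dataset from this answer; `head(1)` is added here as a third option for comparison and is not part of the answer itself. Note that from pandas 2.0 onward `nth(0)` keeps the original row index (like `head(1)`) instead of indexing the result by `id`, so the printed layout depends on the installed version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4],
    "value": ["first", "second", "third", np.nan, "second",
              "first", "second", "third", "fourth", "first", "second"],
})

by_first = df.groupby("id").first()  # first non-NaN value in each column
by_nth = df.groupby("id").nth(0)     # first row of each group, NaN kept
by_head = df.groupby("id").head(1)   # first row of each group, original index kept

print(by_first)
print(by_nth)
print(by_head)
```

For `id` 2, `first()` skips the NaN and reports the value from the group's second row, while `nth(0)` and `head(1)` report the NaN that is actually in the first row.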

User · Answer

If you only need the first row from each group, you can use drop_duplicates. Note that the function's default is keep='first'.

    df.drop_duplicates('id')
    Out[1027]:
        id   value
    0    1   first
    3    2   first
    5    3   first
    9    4  second
    11   5   first
    12   6   first
    15   7  fourth
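To make the role of `keep` explicit, here is a short sketch on the question's data. Spelling out `subset=` and `keep=` is a stylistic choice made here, and the `keep='last'` variant is included only for contrast.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first",
              "second", "first", "third", "fourth",
              "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})

# Keep the first occurrence of each id (the default behaviour).
first_rows = df.drop_duplicates(subset="id", keep="first")

# Keep the last occurrence of each id instead.
last_rows = df.drop_duplicates(subset="id", keep="last")
```

Both results keep the original row labels, so `first_rows` can be matched back to positions in `df`.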

User · Answer

    >>> df.groupby('id').first()
         value
    id
    1    first
    2    first
    3    first
    4   second
    5    first
    6    first
    7   fourth

If you need id as a column:

    >>> df.groupby('id').first().reset_index()
       id   value
    0   1   first
    1   2   first
    2   3   first
    3   4  second
    4   5   first
    5   6   first
    6   7  fourth

To get the first n records, you can use head():

    >>> df.groupby('id').head(2).reset_index(drop=True)
        id   value
    0    1   first
    1    1  second
    2    2   first
    3    2  second
    4    3   first
    5    3   third
    6    4  second
    7    4   fifth
    8    5   first
    9    6   first
    10   6  second
    11   7  fourth
    12   7   fifth
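As a possible shortcut for the `reset_index()` step, `groupby` accepts `as_index=False`, which keeps `id` as a regular column. A minimal sketch on the question's data (this variant is not part of the answer itself):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first",
              "second", "first", "third", "fourth",
              "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})

# as_index=False returns id as a column, so no reset_index() is needed.
first_rows = df.groupby("id", as_index=False).first()

# head(2) keeps up to two rows per group; groups with one row contribute one.
first_two = df.groupby("id").head(2).reset_index(drop=True)
```

The `head(2)` result has 13 rows rather than 14 because `id` 5 has only a single row.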

User · Answer

This will give you the second row of each group (zero-indexed; nth(0) selects the first row, like first(), except that nth keeps NaN values):

    df.groupby('id').nth(1)

Documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
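A quick sketch of `nth(1)` on the question's data, next to a `cumcount()`-based selection that should pick the same rows. The `cumcount()` variant is an alternative suggested here, not something from the answer; it is useful because its output shape does not depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first",
              "second", "first", "third", "fourth",
              "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})

# Second row of each group; groups with fewer than two rows are left out.
second_rows = df.groupby("id").nth(1)

# cumcount() numbers the rows within each group starting at 0,
# so comparing against 1 selects the second row of every group.
second_rows_alt = df[df.groupby("id").cumcount() == 1]
```

Group 5 has a single row, so it does not appear in either result.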
