Update a dataframe in pandas while iterating row by row

Question

I have a pandas data frame that looks like this (it's a pretty big one):

           date      exer exp     ifor         mat
1092  2014-03-17  American   M  528.205  2014-04-19
1093  2014-03-17  American   M  528.205  2014-04-19
1094  2014-03-17  American   M  528.205  2014-04-19
1095  2014-03-17  American   M  528.205  2014-04-19
1096  2014-03-17  American   M  528.205  2014-05-17

Now I would like to iterate row by row, and as I go through each row, the value of ifor in each row can change depending on some conditions, and I need to look up another dataframe.

Now, how do I update this as I iterate? I tried a few things; none of them worked.

for i, row in df.iterrows():
    if <something>:
        row['ifor'] = x
    else:
        row['ifor'] = y

    df.ix[i]['ifor'] = x

None of these approaches seem to work. I don't see the values updated in the dataframe.

User · Answer

You can assign values in the loop using df.set_value:

for i, row in df.iterrows():
    ifor_val = something
    if <condition>:
        ifor_val = something_else
    df.set_value(i,'ifor',ifor_val)

If you don't need the row values you could simply iterate over the indices of df, but I kept the original for-loop in case you need the row value for something not shown here.

update

df.set_value() has been deprecated since version 0.21.0; you can use the df.at[] accessor instead:

for i, row in df.iterrows():
    ifor_val = something
    if <condition>:
        ifor_val = something_else
    df.at[i,'ifor'] = ifor_val
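
As a concrete, runnable sketch of the loop above: the frame below is a small made-up sample modelled on the question, and the condition (a mat date after 2014-05-01) and the replacement value 0.0 are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame(
    {
        "date": ["2014-03-17"] * 3,
        "exer": ["American"] * 3,
        "exp": ["M"] * 3,
        "ifor": [528.205, 528.205, 528.205],
        "mat": ["2014-04-19", "2014-04-19", "2014-05-17"],
    },
    index=[1092, 1093, 1096],
)

for i, row in df.iterrows():
    ifor_val = row["ifor"]            # default: keep the current value
    if row["mat"] > "2014-05-01":     # assumed condition, for illustration only
        ifor_val = 0.0                # assumed replacement value
    df.at[i, "ifor"] = ifor_val       # writes into df itself, by index label

print(df["ifor"].tolist())
```

The row is only read here; every write goes through df.at with the index label i, which is why the change shows up in df.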

User · Answer

A pandas DataFrame object should be thought of as a Series of Series. In other words, you should think of it in terms of columns. The reason why this is important is that when you use pd.DataFrame.iterrows you are iterating through rows as Series. But these are not the Series that the data frame is storing, so they are new Series that are created for you while you iterate. That implies that when you attempt to assign to them, those edits won't end up reflected in the original data frame.

Ok, now that that is out of the way: what do we do?

Suggestions prior to this post include:

- pd.DataFrame.set_value is deprecated as of Pandas version 0.21
- pd.DataFrame.ix is deprecated
- pd.DataFrame.loc is fine but can work on array indexers and you can do better

My recommendation: use pd.DataFrame.at

for i in df.index:
    if <something>:
        df.at[i, 'ifor'] = x
    else:
        df.at[i, 'ifor'] = y

You can even change this to:

for i in df.index:
    df.at[i, 'ifor'] = x if <something> else y

Response to comment: "and what if I need to use the value of the previous row for the if condition?"

for i in range(1, len(df) + 1):
    j = df.columns.get_loc('ifor')
    if <something>:
        df.iat[i - 1, j] = x
    else:
        df.iat[i - 1, j] = y
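
To make both points tangible, here is a minimal sketch on a made-up two-column frame. It first shows that assigning to the row from iterrows leaves the frame untouched (the pandas documentation notes this depends on dtypes, so a mixed-dtype frame is used here), and then uses iat to read the previous row. The rule "add the previous value to the current one" is an assumption for illustration.

```python
import pandas as pd

# Hypothetical mixed-dtype frame; with mixed dtypes iterrows() yields copies.
df = pd.DataFrame(
    {"exer": ["American"] * 4, "ifor": [1.0, 2.0, 3.0, 4.0]},
    index=[10, 11, 12, 13],
)

# Assigning to the row yielded by iterrows() changes only a temporary Series.
for i, row in df.iterrows():
    row["ifor"] = -1.0
unchanged = df["ifor"].tolist()

# Positional access makes "the previous row" easy to reach.
j = df.columns.get_loc("ifor")
for pos in range(1, len(df)):
    prev = df.iat[pos - 1, j]
    # Assumed rule: add the (already updated) previous value to the current one.
    df.iat[pos, j] = df.iat[pos, j] + prev

print(unchanged)
print(df["ifor"].tolist())
```

Starting the positional loop at 1 means the first row, which has no predecessor, is simply left as it is.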

User · Answer

for i, row in df.iterrows():
    if <something>:
        df.at[i, 'ifor'] = x
    else:
        df.at[i, 'ifor'] = y

User · Answer

It's better to use lambda functions with df.apply():

df['ifor'] = df.apply(lambda x: <value> if <condition> else x['ifor'], axis=1)
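
A small runnable version of the apply pattern might look like the following. The sample frame and the rule (rows whose exer is "European" get 0.0) are assumptions made up for this sketch.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "exer": ["American", "European", "American"],
        "ifor": [528.205, 528.205, 528.205],
    }
)

# Assumed rule: European rows get 0.0, all other rows keep their current value.
df["ifor"] = df.apply(
    lambda row: 0.0 if row["exer"] == "European" else row["ifor"],
    axis=1,  # call the lambda once per row
)

print(df["ifor"].tolist())
```

Because apply builds a whole new column that is assigned in one step, nothing is modified while the rows are being visited.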

User · Answer

A method you can use is itertuples(); it iterates over DataFrame rows as namedtuples, with the index value as the first element of the tuple. And it is much, much faster compared with iterrows(). For itertuples(), each row contains its Index in the DataFrame, and you can use at (or loc) to set the value.

for row in df.itertuples():
    if <something>:
        df.at[row.Index, 'ifor'] = x
    else:
        df.at[row.Index, 'ifor'] = y

    # equivalent, but slower: df.loc[row.Index, 'ifor'] = x

Under most cases, itertuples() is faster than iat or at.

Thanks @SantiStSupery, using .at is much faster than loc.
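
For reference, a self-contained sketch of the itertuples approach. The data, the condition on exer and the two replacement values are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data with a non-default index.
df = pd.DataFrame(
    {"exer": ["American", "European", "American"], "ifor": [10.0, 20.0, 30.0]},
    index=[1092, 1093, 1094],
)

for row in df.itertuples():
    # row.Index is the index label; columns are attributes (row.exer, row.ifor).
    if row.exer == "European":                    # assumed condition
        df.at[row.Index, "ifor"] = 1.0            # assumed value
    else:
        df.at[row.Index, "ifor"] = row.ifor * 2   # assumed value

print(df["ifor"].tolist())
```

The namedtuple is read-only, so it is used only to read values and to obtain the index label for the write through df.at.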

User · Answer

Well, if you are going to iterate anyhow, why not use the simplest method of all, df['Column'].values[i]?

df['Column'] = ''

for i in range(len(df)):
    df['Column'].values[i] = something/update/new_value

Or if you want to compare the new values with old or anything like that, why not store it in a list and then assign it at the end?

mylist, df['Column'] = [], ''

for <condition>:
    mylist.append(something/update/new_value)

df['Column'] = mylist
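
A runnable sketch of the list variant follows; the data and the capping rule are assumptions for illustration. It sticks to the list approach because writing through .values is not guaranteed to reach the frame, and newer pandas versions with copy-on-write may expose that array as read-only.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"ifor": [10.0, 20.0, 30.0]})

new_values = []
for old in df["ifor"]:
    # Assumed rule: cap every value at 25.
    new_values.append(min(old, 25.0))

# One assignment at the end; the list length must match len(df).
df["ifor"] = new_values

print(df["ifor"].tolist())
```

Collecting the results first also leaves the old column intact during the loop, which is convenient when new values need to be compared with old ones.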

User · Answer

You should assign the value with df.ix[i, 'exp'] = X or df.loc[i, 'exp'] = X instead of df.ix[i]['ifor'] = x.

Otherwise you are using chained indexing, which may operate on a copy rather than on the original frame, and you should get a warning:

-c:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_index,col_indexer] = value instead

But certainly, the loop should probably be replaced by some vectorized algorithm to make full use of the DataFrame, as @Phillip Cloud suggested.
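
As a rough illustration of what a vectorized replacement could look like, the sketch below uses a boolean mask with .loc instead of a loop. The frame, the condition and the values are assumptions; the real condition from the question is not shown in the source.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"exer": ["American", "European", "American"], "ifor": [10.0, 20.0, 30.0]}
)

# Boolean mask: True for the rows that satisfy the (assumed) condition.
mask = df["exer"] == "European"

# Single-step .loc assignments, no chained indexing and no Python-level loop.
df.loc[mask, "ifor"] = 0.0
df.loc[~mask, "ifor"] = df.loc[~mask, "ifor"] * 2

print(df["ifor"].tolist())
```

Each assignment addresses rows and column in one .loc call, which is the form the warning text recommends.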

User · Answer

Increment the MAX number from a column.

For example:

df1 = [sort_ID, Column1, Column2]
print(df1)

My output:

Sort_ID Column1 Column2
12      a       e
45      b       f
65      c       g
78      d       h

MAX = df1['Sort_ID'].max()  # This returns my Max Number

Now, I need to create a column in df2 and fill the column with values that increment from the MAX:

Sort_ID Column1 Column2
79      a1      e1
80      b1      f1
81      c1      g1
82      d1      h1

Note: df2 will initially contain only Column1 and Column2; the Sort_ID column needs to be created, counting up from the MAX of df1.
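
One possible way to do this, sketched under the assumption that df1 and df2 are ordinary DataFrames holding the values shown above (the variable name max_id is made up here):

```python
import pandas as pd

# Frames rebuilt from the example values in the post.
df1 = pd.DataFrame(
    {"Sort_ID": [12, 45, 65, 78], "Column1": list("abcd"), "Column2": list("efgh")}
)
df2 = pd.DataFrame(
    {"Column1": ["a1", "b1", "c1", "d1"], "Column2": ["e1", "f1", "g1", "h1"]}
)

max_id = int(df1["Sort_ID"].max())  # 78 for the sample data

# New first column: consecutive IDs starting right after the maximum of df1.
df2.insert(0, "Sort_ID", list(range(max_id + 1, max_id + 1 + len(df2))))

print(df2)
```

No row-by-row loop is needed, since the whole column can be generated from the maximum and the length of df2.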
