Why should I make a copy of a data frame in pandas?

Question

When selecting a sub-dataframe from a parent dataframe, I noticed that some programmers make a copy of the data frame using the .copy() method. For example:

    X = my_dataframe[features_list].copy()

instead of just:

    X = my_dataframe[features_list]

Why are they making a copy of the data frame? What will happen if I don't make a copy?

User · Answer

Assume you have a data frame as below:

    df1
         A    B    C    D
    4 -1.0 -1.0 -1.0 -1.0
    5 -1.0 -1.0 -1.0 -1.0
    6 -1.0 -1.0 -1.0 -1.0
    6 -1.0 -1.0 -1.0 -1.0

Suppose you create another data frame, df2, identical to df1, without using copy:

    df2 = df1
    df2
         A    B    C    D
    4 -1.0 -1.0 -1.0 -1.0
    5 -1.0 -1.0 -1.0 -1.0
    6 -1.0 -1.0 -1.0 -1.0
    6 -1.0 -1.0 -1.0 -1.0

And you would like to modify a value in df2 only, as below:

    df2.iloc[0, 0] = 'changed'
    df2
             A    B    C    D
    4  changed -1.0 -1.0 -1.0
    5       -1 -1.0 -1.0 -1.0
    6       -1 -1.0 -1.0 -1.0
    6       -1 -1.0 -1.0 -1.0

At the same time, df1 is changed as well:

    df1
             A    B    C    D
    4  changed -1.0 -1.0 -1.0
    5       -1 -1.0 -1.0 -1.0
    6       -1 -1.0 -1.0 -1.0
    6       -1 -1.0 -1.0 -1.0

This is because the two names refer to the same object. We can check that with id():

    id(df1)
    140367679979600
    id(df2)
    140367679979600

So they are the same object, and a change made through one name is visible through the other.

If we add copy(), df1 and df2 are different objects, and a change made to one of them does not affect the other:

    df2 = df1.copy()
    id(df1)
    140367679979600
    id(df2)
    140367674641232

    df1.iloc[0, 0] = 'changedback'
    df2
             A    B    C    D
    4  changed -1.0 -1.0 -1.0
    5       -1 -1.0 -1.0 -1.0
    6       -1 -1.0 -1.0 -1.0
    6       -1 -1.0 -1.0 -1.0

It is also worth mentioning that when you subset the original dataframe, it is safe to add copy() as well, in order to avoid the SettingWithCopyWarning.
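The difference between plain assignment and copy() can also be checked with a short self-contained script. This is a minimal sketch: the small frame and the names alias and independent are made up for illustration, and the `is` operator stands in for comparing id() values.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [-1.0, -1.0], "B": [-1.0, -1.0]})

alias = df1                # plain assignment: a second name for the same object
independent = df1.copy()   # copy(): a new object with its own data

# Write through the alias; df1 is the very same object, so it sees the change.
alias.iloc[0, 0] = 99.0

same_object = alias is df1
copy_is_separate = independent is not df1
value_in_df1 = df1.iloc[0, 0]
value_in_copy = independent.iloc[0, 0]   # still the value from before the write

print(same_object, copy_is_separate, value_in_df1, value_in_copy)
```

The copy is taken before the write, so it keeps the old value while df1 and alias both show the new one.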

User · Answer

The primary purpose is to avoid chained indexing and eliminate the SettingWithCopyWarning.

Here chained indexing is something like dfc['A'][0] = 111.

The pandas documentation says that chained indexing should be avoided, in the section "Returning a view versus a copy". Here is a slightly modified example from that document:

    In [1]: import pandas as pd

    In [2]: dfc = pd.DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]})

    In [3]: dfc
    Out[3]:
         A  B
    0  aaa  1
    1  bbb  2
    2  ccc  3

    In [4]: aColumn = dfc['A']

    In [5]: aColumn[0] = 111
    SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame

    In [6]: dfc
    Out[6]:
         A  B
    0  111  1
    1  bbb  2
    2  ccc  3

Here aColumn is a view and not a copy of the original DataFrame, so modifying aColumn causes the original dfc to be modified too. Next, if we index the row first:

    In [7]: zero_row = dfc.loc[0]

    In [8]: zero_row['A'] = 222
    SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame

    In [9]: dfc
    Out[9]:
         A  B
    0  111  1
    1  bbb  2
    2  ccc  3

This time zero_row is a copy, so the original dfc is not modified.

From these two examples, we see that it is ambiguous whether or not you want to change the original DataFrame. This is especially dangerous if you write something like the following:

    In [10]: dfc.loc[0]['A'] = 333
    SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame

    In [11]: dfc
    Out[11]:
         A  B
    0  111  1
    1  bbb  2
    2  ccc  3

This time it didn't work at all. We wanted to change dfc, but we actually modified an intermediate value, dfc.loc[0], which is a copy and is discarded immediately. It is very hard to predict whether an intermediate value like dfc.loc[0] or dfc['A'] is a view or a copy, so there is no guarantee that the original DataFrame will be updated. That is why chained indexing should be avoided, and why pandas generates the SettingWithCopyWarning for this kind of chained indexing update.

This is where .copy() comes in. To eliminate the warning, make a copy to express your intention explicitly:

    In [12]: zero_row_copy = dfc.loc[0].copy()

    In [13]: zero_row_copy['A'] = 444  # This time no warning

Since you are modifying a copy, you know the original dfc will never change, and you are not expecting it to change. Your expectation matches the behavior, so the SettingWithCopyWarning disappears.

Note: if you do want to modify the original DataFrame, the documentation suggests you use loc:

    In [14]: dfc.loc[0, 'A'] = 555

    In [15]: dfc
    Out[15]:
         A  B
    0  555  1
    1  bbb  2
    2  ccc  3
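The two intentions described in this answer can be put side by side in one script. This is only a sketch under a few assumptions: the subset name big_b and the filter on column B are invented for illustration, and a string is assigned to column A so the example does not depend on how a given pandas version handles mixed types.

```python
import pandas as pd

dfc = pd.DataFrame({"A": ["aaa", "bbb", "ccc"], "B": [1, 2, 3]})

# Intent 1: change the original.
# Use a single .loc call with row and column together (no chained indexing).
dfc.loc[0, "A"] = "zzz"

# Intent 2: work on a separate subset.
# Take the copy explicitly, so it is clear that dfc is not meant to change.
big_b = dfc[dfc["B"] > 1].copy()
big_b["B"] = 0   # affects big_b only

print(dfc)
print(big_b)
```

In both cases the code states which object is supposed to change, which is the point the answer makes about matching expectation and behavior.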

User · Answer

It is worth mentioning that whether a copy or a view is returned depends on the kind of indexing.

The pandas documentation, in the section "Returning a view versus a copy", says:

"The rules about when a view on the data is returned are entirely dependent on NumPy. Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned."
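One way to probe this on your own installation is to ask NumPy whether two results share memory. This is a rough sketch, not an official pandas check: it assumes a single-dtype frame for which to_numpy() exposes the underlying buffer without copying, and the result for the slice may differ between pandas versions, so no output is claimed for it here.

```python
import numpy as np
import pandas as pd

# A single-dtype frame keeps the comparison simple.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

sliced = df.iloc[0:2]        # positional slice
masked = df[df["A"] > 1.0]   # boolean vector

# np.shares_memory reports whether two arrays overlap in memory.
slice_shares = np.shares_memory(sliced.to_numpy(), df.to_numpy())
mask_shares = np.shares_memory(masked.to_numpy(), df.to_numpy())

print("slice shares memory with df:", slice_shares)
print("boolean mask shares memory with df:", mask_shares)
```

The boolean-mask result is a fresh copy, which matches the quoted rule; the slice is the case worth checking on the version you actually run.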

User · Answer

This expands on Paul's answer. In pandas, indexing a DataFrame can return a reference to (a view of) the initial DataFrame. Thus, changing the subset can change the initial DataFrame, and you'd want to use a copy if you want to make sure the initial DataFrame does not change. Consider the following code:

    from pandas import DataFrame

    df = DataFrame({'x': [1, 2]})
    df_sub = df[0:1]   # slice of the first row
    df_sub.x = -1
    print(df)

You'll get:

       x
    0 -1
    1  2

In contrast, the following leaves df unchanged:

    df_sub_copy = df[0:1].copy()
    df_sub_copy.x = -1

User · Answer

Because if you don't make a copy, then the indices can still be manipulated elsewhere, even if you assign the DataFrame to a different name.

For example:

    df2 = df
    func1(df2)
    func2(df)

func1 can modify df by modifying df2, so to avoid that:

    df2 = df.copy()
    func1(df2)
    func2(df)
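A runnable version of this idea might look like the sketch below. The helper add_flag is hypothetical and stands in for func1; it is assumed to mutate its argument in place by adding a column.

```python
import pandas as pd


def add_flag(frame):
    """Hypothetical stand-in for func1: mutates its argument in place."""
    frame["flag"] = True


df = pd.DataFrame({"x": [1, 2]})

# Passing a copy: the helper changes the copy, df keeps its columns.
add_flag(df.copy())
columns_after_copy_call = list(df.columns)

# Passing the original: the helper changes df itself.
add_flag(df)
columns_after_direct_call = list(df.columns)

print(columns_after_copy_call)
print(columns_after_direct_call)
```

Passing df.copy() is a defensive choice for when you do not control what the called function does to its argument.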

User · Answer

In general, it is safer to work on copies than on original data frames, except when you know that you won't be needing the original anymore and want to proceed with the manipulated version. Normally, you would still have some use for the original data frame, for example to compare it with the manipulated version. Therefore, most people work on copies and merge at the end.
