Selecting/excluding sets of columns in pandas

Question

I would like to create views or dataframes from an existing dataframe based on column selections.

For example, I would like to create a dataframe df2 from a dataframe df1 that holds all columns from it except two of them. I tried doing the following, but it didn't work:

    import numpy as np
    import pandas as pd

    # Create a dataframe with columns A, B, C and D
    df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

    # Try to create a second dataframe df2 from df with all columns except 'B' and 'D'
    my_cols = set(df.columns)
    my_cols.remove('B').remove('D')

    # This returns an error ("unhashable type: set")
    df2 = df[my_cols]

What am I doing wrong? Perhaps more generally, what mechanisms does pandas have to support the picking and exclusion of arbitrary sets of columns from a dataframe?

User · Answer

In a similar vein, when reading a file, one may wish to exclude columns upfront, rather than wastefully reading unwanted data into memory and later discarding it.

As of pandas 0.20.0, usecols accepts callables. This update allows more flexible options for reading columns:

    skipcols = [...]
    read_csv(..., usecols=lambda x: x not in skipcols)

The latter pattern is essentially the inverse of the traditional usecols method: only the specified columns are skipped.

Given

Data in a file

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

    filename = "foo.csv"
    df.to_csv(filename)

Code

    skipcols = ["B", "D"]
    df1 = pd.read_csv(filename, usecols=lambda x: x not in skipcols, index_col=0)
    df1

Output

              A         C
    0  0.062350  0.076924
    1 -0.016872  1.091446
    2  0.213050  1.646109
    3 -1.196928  1.153497
    4 -0.628839 -0.856529
    ...

Details

A DataFrame was written to a file. It was then read back as a separate DataFrame, now skipping unwanted columns (B and D).

Note that for the OP's situation, since the data is already created, the better approach is the accepted answer, which drops unwanted columns from an extant object. However, the technique presented here is most useful when directly reading data from files into a DataFrame.

A request was raised for a "skipcols" option in this issue and was addressed in a later issue.
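For anyone who wants to try the callable form of usecols without writing a file to disk, here is a minimal sketch. It assumes an in-memory CSV built with io.StringIO; the names csv_text and keep_column and the "idx" column are made up for illustration and are not part of the answer above.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "idx,A,B,C,D\n0,1,2,3,4\n1,5,6,7,8\n"

skipcols = {"B", "D"}


def keep_column(name):
    """Return True for every column name that should be read."""
    return name not in skipcols


# pandas calls keep_column once per column name found in the header;
# columns for which it returns False are never loaded.
df1 = pd.read_csv(io.StringIO(csv_text), usecols=keep_column, index_col="idx")

print(df1.columns.tolist())
```

Note that the index column has to pass the callable as well, otherwise index_col cannot find it.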

User · Answer

Another option, without dropping or filtering in a loop:

    import numpy as np
    import pandas as pd

    # Create a dataframe with columns A, B, C and D
    df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

    # include the columns you want
    df[df.columns[df.columns.isin(['A', 'B'])]]

    # or more simply include columns:
    df[['A', 'B']]

    # exclude columns you don't want
    df[df.columns[~df.columns.isin(['C', 'D'])]]

    # or even simpler since 0.24
    # with the caveat that it reorders columns alphabetically
    df[df.columns.difference(['C', 'D'])]
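To make the reordering caveat concrete, here is a small sketch using a frame whose columns are deliberately not in alphabetical order (the D, C, B, A layout is an assumption for the demo). The boolean mask keeps the original order, while difference returns a sorted result by default.

```python
import numpy as np
import pandas as pd

# Columns intentionally in reverse alphabetical order.
df = pd.DataFrame(np.zeros((3, 4)), columns=list("DCBA"))

# Boolean mask over the columns: original order is preserved.
kept_by_mask = df.loc[:, ~df.columns.isin(["C"])]

# Index.difference: set-based, and sorted by default.
kept_by_difference = df[df.columns.difference(["C"])]

print(kept_by_mask.columns.tolist())
print(kept_by_difference.columns.tolist())
```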

User · Answer

Here's how to create a copy of a DataFrame excluding a list of columns:

    df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
    df2 = df.drop(['B', 'D'], axis=1)

But be careful! You mention views in your question, suggesting that if you changed df, you'd want df2 to change too. (Like a view would in a database.)

This method doesn't achieve that:

    >>> df.loc[0, 'A'] = 999  # Change the first value in df
    >>> df.head(1)
         A         B         C         D
    0  999 -0.742688 -1.980673 -0.920133
    >>> df2.head(1)  # df2 is unchanged. It's not a view, it's a copy!
              A         C
    0  0.251262 -1.980673

Note that this is also true of @piggybox's method. (Although that method is nice and slick and Pythonic. I'm not doing it down!)

For more on views vs. copies, see this SO answer and this part of the Pandas docs which that answer refers to.

User · Answer

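You have 4 columns: A, B, C, D. Here is a better way to select the columns you need for the new dataframe:

    df2 = df1[['A', 'D']]

If you wish to use column positions instead, use iloc (plain df1[[0, 3]] would look for columns labelled 0 and 3, which do not exist here):

    df2 = df1.iloc[:, [0, 3]]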
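As a sketch of selecting by position, assuming string column labels as in the question: .iloc takes integer positions, so the same idea can also be turned around to exclude by position. The variable names below are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

# Select by label.
by_label = df1[["A", "D"]]

# Select the same columns by integer position.
by_position = df1.iloc[:, [0, 3]]

# Exclude by position: keep every position that is not listed.
excluded_positions = [1, 2]
keep_positions = [i for i in range(df1.shape[1]) if i not in excluded_positions]
by_exclusion = df1.iloc[:, keep_positions]

print(by_exclusion.columns.tolist())
```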

User · Answer

You can either drop the columns you do not need OR select the ones you need.

    # Using DataFrame.drop, by position (modifies df in place)
    df.drop(df.columns[[1, 2]], axis=1, inplace=True)

    # drop by name
    df1 = df1.drop(['B', 'C'], axis=1)

    # Select the ones you want
    df1 = df[['A', 'D']]
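A slightly more readable spelling of the drop-by-name variant, as a sketch: recent pandas versions accept a columns= keyword instead of axis=1, and errors="ignore" skips names that are not present. The frame and the missing name "Z" below are made up for the demo.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=list("ABCD"))

# Equivalent to df.drop(["B", "D"], axis=1); returns a new frame.
trimmed = df.drop(columns=["B", "D"])

# "Z" does not exist; errors="ignore" drops what it can and skips the rest.
tolerant = df.drop(columns=["B", "Z"], errors="ignore")

print(trimmed.columns.tolist())
print(tolerant.columns.tolist())
```

Without inplace=True the original df keeps all four columns.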

User · Answer

You don't really need to convert that into a set:

    cols = [col for col in df.columns if col not in ['B', 'D']]
    df2 = df[cols]

User · Answer

You just need to convert your set to a list:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
    my_cols = set(df.columns)
    my_cols.remove('B')
    my_cols.remove('D')
    my_cols = list(my_cols)  # note: a set does not preserve the column order
    df2 = df[my_cols]
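Two details are worth spelling out here. First, set.remove() returns None, so the calls cannot be chained the way the question tried. Second, a set has no defined order, so the resulting columns may come out shuffled. A sketch of both points, with an order-preserving alternative that still uses a set for the membership test (names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=list("DCBA"))
excluded = {"B", "D"}

# set.remove() mutates in place and returns None, so call it once per name.
my_cols = set(df.columns)
my_cols.remove("B")
my_cols.remove("D")

# Works, but the column order is whatever the set happens to yield.
unordered = df[list(my_cols)]

# Order-preserving: walk the original columns, test membership in the set.
ordered = df[[col for col in df.columns if col not in excluded]]

print(ordered.columns.tolist())
```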

User · Answer

Also have a look at the built-in DataFrame.filter function.

Minimalistic but greedy approach (sufficient for the given df):

    df.filter(regex="[^BD]")

Conservative/lazy approach (exact matches only):

    df.filter(regex="^(?!(B|D)$).*$")

Conservative and generic:

    exclude_cols = ['B', 'C']
    df.filter(regex="^(?!({0})$).*$".format('|'.join(exclude_cols)))
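One thing to watch with the generic pattern: column names that contain regex metacharacters (a dot, brackets and so on) should be escaped before being joined. A sketch assuming a frame with a column named "B.1", which is a made-up example; re.escape is from the standard library.

```python
import re

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "C", "B.1"])
exclude_cols = ["B", "C"]

# Escape each name so it is matched literally, then build a negative
# lookahead that rejects only exact matches of the excluded names.
alternatives = "|".join(re.escape(col) for col in exclude_cols)
pattern = "^(?!(?:{0})$).*$".format(alternatives)

result = df.filter(regex=pattern)

print(result.columns.tolist())
```

Because the lookahead is anchored with $, "B.1" survives even though it starts with an excluded name.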

User · Answer

There is an Index method called difference. It returns the original columns, with the columns passed as argument removed.

Here, the result is used to remove columns B and D from df:

    df2 = df[df.columns.difference(['B', 'D'])]

Note that it's a set-based method, so duplicate column names will cause issues, and the column order may be changed.

Advantage over drop: you don't create a copy of the entire dataframe when you only need the list of columns. For instance, in order to drop duplicates on a subset of columns:

    # may create a copy of the dataframe
    subset = df.drop(['B', 'D'], axis=1).columns

    # does not create a copy of the dataframe
    subset = df.columns.difference(['B', 'D'])

    df = df.drop_duplicates(subset=subset)
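If the reordering is a problem, difference takes a sort argument (available in pandas 0.24 and later, as far as I can tell); passing sort=False should leave the remaining labels in their original order. A small sketch with made-up data, also showing the drop_duplicates use case:

```python
import pandas as pd

# Columns deliberately out of alphabetical order; rows 0 and 1 agree on C and A.
df = pd.DataFrame(
    {"D": [7, 8, 9], "C": [5, 5, 6], "B": [10, 20, 30], "A": [1, 1, 2]}
)

# Only the column Index is touched here; no frame is copied.
subset = df.columns.difference(["B", "D"], sort=False)

# Converting to a plain list is optional; done here to keep the call explicit.
deduped = df.drop_duplicates(subset=subset.tolist())

print(subset.tolist())
print(deduped.index.tolist())
```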
