Delete column from pandas DataFrame

Question

When deleting a column in a DataFrame I use:

```
del df['column_name']
```

And this works great. Why can't I use the following?

```
del df.column_name
```

Since it is possible to access the column/Series as df.column_name, I expected this to work.

User · Answer

You can delete a column with the iloc indexer and slicing. This is handy when a column has an unwanted name or values, such as an unnamed index column:

```
df = df.iloc[:, 1:]  # removing an unnamed index column
```

Here `:` selects every row. `1:` selects the columns from position 1 (the second column) to the end, with the default stop and step. Hence `[:, 1:]` is the indexer that deletes the first column.
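A common source of such an unnamed column is writing a frame to CSV with its index and reading it back. The sketch below reproduces that case with a small made-up frame (`original`, with columns `a` and `b`, is hypothetical). It then removes the column by position with `iloc`.

```python
import io

import pandas as pd

# Hypothetical data; to_csv() writes the index as an unnamed first column
original = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
buf = io.StringIO()
original.to_csv(buf)
buf.seek(0)

df = pd.read_csv(buf)
print(df.columns.tolist())  # ['Unnamed: 0', 'a', 'b']

# ':' keeps every row, '1:' keeps the columns from position 1 onward
df = df.iloc[:, 1:]
print(df.columns.tolist())  # ['a', 'b']
```

Passing `index=False` to `to_csv()` avoids creating the extra column in the first place.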

User · Answer

It's good practice to always use the [] notation. One reason is that attribute notation (df.column_name) does not work for numbered indices:

```
In [1]: df = DataFrame([[1, 2, 3], [4, 5, 6]])

In [2]: df[1]
Out[2]:
0    2
1    5
Name: 1

In [3]: df.1
  File "<ipython-input-3-e4803c0d1066>", line 1
    df.1
       ^
SyntaxError: invalid syntax
```

User · Answer

We can remove a specified column, or several specified columns, with the drop() method. Suppose df is a dataframe.

Column to be removed: column0

Code:

```
df = df.drop('column0', axis=1)
```

To remove multiple columns col1, col2, ..., coln, put all the columns that need to be removed in a list and pass it to drop().

Code:

```
df = df.drop(['col1', 'col2', ..., 'coln'], axis=1)
```

I hope this is helpful.
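Here is a runnable version of both calls, using a small frame whose column names are made up to match the answer. It also shows that drop() returns a new DataFrame and leaves the original alone unless you reassign it.

```python
import pandas as pd

# Hypothetical frame using the column names from the answer
df = pd.DataFrame({"column0": [1, 2], "col1": [3, 4], "col2": [5, 6], "col3": [7, 8]})

single = df.drop("column0", axis=1)            # one label
multiple = df.drop(["col1", "col2"], axis=1)   # a list of labels

print(single.columns.tolist())    # ['col1', 'col2', 'col3']
print(multiple.columns.tolist())  # ['column0', 'col3']
print(df.columns.tolist())        # ['column0', 'col1', 'col2', 'col3'] (unchanged)
```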

User · Answer

```
df.drop('columnname', axis=1, inplace=True)
```

or else you can go with

```
del df['colname']
```

To delete multiple columns based on column numbers:

```
df.drop(df.columns[1:3], axis=1, inplace=True)
```

To delete multiple columns based on column names:

```
df.drop(['col1', 'col2', ..., 'coln'], axis=1, inplace=True)
```

User · Answer

As you've guessed, the right syntax is

```
del df['column_name']
```

It's difficult to make del df.column_name work simply as the result of syntactic limitations in Python. del df[name] gets translated to df.__delitem__(name) under the covers by Python.
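To make the translation concrete, the following sketch uses a throwaway frame. It deletes one column with `del` and another by calling `__delitem__` directly. Both calls go through the same method.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

del df["a"]            # Python turns this statement into df.__delitem__("a")
df.__delitem__("b")    # calling the magic method directly has the same effect

print(df.columns.tolist())  # ['c']
```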

User · Answer

If your original dataframe df is not too big, you have no memory constraints, and you only need to keep a few columns, you might as well create a new dataframe with only the columns you need. The same applies if you don't know beforehand the names of all the extra columns that you do not need:

```
new_df = df[['spam', 'sausage']]
```
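The sketch below applies this to a small frame. The column names `eggs` and `bacon` are hypothetical extras. `.copy()` is an optional addition, assuming you plan to modify `new_df` later and want it clearly independent of `df`.

```python
import pandas as pd

# Hypothetical frame: only 'spam' and 'sausage' are wanted
df = pd.DataFrame({"spam": [1, 2], "eggs": [3, 4], "sausage": [5, 6], "bacon": [7, 8]})

# Select the columns to keep instead of listing the ones to delete
new_df = df[["spam", "sausage"]].copy()

print(new_df.columns.tolist())  # ['spam', 'sausage']
```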

User · Answer

TL;DR

A lot of effort to find a marginally more efficient solution. It is difficult to justify the added complexity while sacrificing the simplicity of

```
df.drop(dlst, 1, errors='ignore')
```

```
df.reindex_axis(np.setdiff1d(df.columns.values, dlst), 1)
```

Preamble

Deleting a column is semantically the same as selecting the other columns. I'll show a few additional methods to consider.

I'll also focus on the general solution of deleting multiple columns at once and allowing for the attempt to delete columns not present.

These solutions are general and will work for the simple case as well.

Setup

Consider the pd.DataFrame df and the list to delete dlst:

```
df = pd.DataFrame(dict(zip('ABCDEFGHIJ', range(1, 11))), range(3))
dlst = list('HIJKLM')
```

```
df

   A  B  C  D  E  F  G  H  I   J
0  1  2  3  4  5  6  7  8  9  10
1  1  2  3  4  5  6  7  8  9  10
2  1  2  3  4  5  6  7  8  9  10
```

```
dlst

['H', 'I', 'J', 'K', 'L', 'M']
```

The result should look like:

```
df.drop(dlst, 1, errors='ignore')

   A  B  C  D  E  F  G
0  1  2  3  4  5  6  7
1  1  2  3  4  5  6  7
2  1  2  3  4  5  6  7
```

Since I'm equating deleting a column to selecting the other columns, I'll break it into two types:

- Label selection
- Boolean selection

Label Selection

We start by manufacturing the list/array of labels that represent the columns we want to keep, without the columns we want to delete.

```
df.columns.difference(dlst)

Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')
```

```
np.setdiff1d(df.columns.values, dlst)

array(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype=object)
```

```
df.columns.drop(dlst, errors='ignore')

Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')
```

```
list(set(df.columns.values.tolist()).difference(dlst))

# does not preserve order
['E', 'D', 'B', 'F', 'G', 'A', 'C']
```

```
[x for x in df.columns.values.tolist() if x not in dlst]

['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

Columns from Labels

For the sake of comparing the selection process, assume:

```
cols = [x for x in df.columns.values.tolist() if x not in dlst]
```

Then we can evaluate

```
df.loc[:, cols]
df[cols]
df.reindex(columns=cols)
df.reindex_axis(cols, 1)
```

Which all evaluate to:

```
   A  B  C  D  E  F  G
0  1  2  3  4  5  6  7
1  1  2  3  4  5  6  7
2  1  2  3  4  5  6  7
```

Boolean Slice

We can construct an array/list of booleans for slicing:

```
~df.columns.isin(dlst)
~np.in1d(df.columns.values, dlst)
[x not in dlst for x in df.columns.values.tolist()]
(df.columns.values[:, None] != dlst).all(1)
```

Columns from Boolean

For the sake of comparison:

```
bools = [x not in dlst for x in df.columns.values.tolist()]
```

```
df.loc[:, bools]
```

Which all evaluate to:

```
   A  B  C  D  E  F  G
0  1  2  3  4  5  6  7
1  1  2  3  4  5  6  7
2  1  2  3  4  5  6  7
```

Robust Timing

Functions

```
setdiff1d = lambda df, dlst: np.setdiff1d(df.columns.values, dlst)
difference = lambda df, dlst: df.columns.difference(dlst)
columndrop = lambda df, dlst: df.columns.drop(dlst, errors='ignore')
setdifflst = lambda df, dlst: list(set(df.columns.values.tolist()).difference(dlst))
comprehension = lambda df, dlst: [x for x in df.columns.values.tolist() if x not in dlst]

loc = lambda df, cols: df.loc[:, cols]
slc = lambda df, cols: df[cols]
ridx = lambda df, cols: df.reindex(columns=cols)
ridxa = lambda df, cols: df.reindex_axis(cols, 1)

isin = lambda df, dlst: ~df.columns.isin(dlst)
in1d = lambda df, dlst: ~np.in1d(df.columns.values, dlst)
comp = lambda df, dlst: [x not in dlst for x in df.columns.values.tolist()]
brod = lambda df, dlst: (df.columns.values[:, None] != dlst).all(1)
```

Testing

```
res1 = pd.DataFrame(
    index=pd.MultiIndex.from_product([
        'loc slc ridx ridxa'.split(),
        'setdiff1d difference columndrop setdifflst comprehension'.split(),
    ], names=['Select', 'Label']),
    columns=[10, 30, 100, 300, 1000],
    dtype=float
)

res2 = pd.DataFrame(
    index=pd.MultiIndex.from_product([
        'loc'.split(),
        'isin in1d comp brod'.split(),
    ], names=['Select', 'Label']),
    columns=[10, 30, 100, 300, 1000],
    dtype=float
)

res = res1.append(res2).sort_index()

dres = pd.Series(index=res.columns, name='drop')

for j in res.columns:
    dlst = list(range(j))
    cols = list(range(j // 2, j + j // 2))
    d = pd.DataFrame(1, range(10), cols)
    dres.at[j] = timeit(
        "d.drop(dlst, 1, errors='ignore')",
        'from __main__ import d, dlst',
        number=100)
    for s, l in res.index:
        stmt = '{}(d, {}(d, dlst))'.format(s, l)
        setp = 'from __main__ import d, dlst, {}, {}'.format(s, l)
        res.at[(s, l), j] = timeit(stmt, setp, number=100)

rs = res / dres
```

```
rs

                          10        30        100       300        1000
Select Label
loc    brod           0.747373  0.861979  0.891144  1.284235   3.872157
       columndrop     1.193983  1.292843  1.396841  1.484429   1.335733
       comp           0.802036  0.732326  1.149397  3.473283  25.565922
       comprehension  1.463503  1.568395  1.866441  4.421639  26.552276
       difference     1.413010  1.460863  1.587594  1.568571   1.569735
       in1d           0.818502  0.844374  0.994093  1.042360   1.076255
       isin           1.008874  0.879706  1.021712  1.001119   0.964327
       setdiff1d      1.352828  1.274061  1.483380  1.459986   1.466575
       setdifflst     1.233332  1.444521  1.714199  1.797241   1.876425
ridx   columndrop     0.903013  0.832814  0.949234  0.976366   0.982888
       comprehension  0.777445  0.827151  1.108028  3.473164  25.528879
       difference     1.086859  1.081396  1.293132  1.173044   1.237613
       setdiff1d      0.946009  0.873169  0.900185  0.908194   1.036124
       setdifflst     0.732964  0.823218  0.819748  0.990315   1.050910
ridxa  columndrop     0.835254  0.774701  0.907105  0.908006   0.932754
       comprehension  0.697749  0.762556  1.215225  3.510226  25.041832
       difference     1.055099  1.010208  1.122005  1.119575   1.383065
       setdiff1d      0.760716  0.725386  0.849949  0.879425   0.946460
       setdifflst     0.710008  0.668108  0.778060  0.871766   0.939537
slc    columndrop     1.268191  1.521264  2.646687  1.919423   1.981091
       comprehension  0.856893  0.870365  1.290730  3.564219  26.208937
       difference     1.470095  1.747211  2.886581  2.254690   2.050536
       setdiff1d      1.098427  1.133476  1.466029  2.045965   3.123452
       setdifflst     0.833700  0.846652  1.013061  1.110352   1.287831
```

```
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharey=True)
for i, (n, g) in enumerate([(n, g.xs(n)) for n, g in rs.groupby('Select')]):
    ax = axes[i // 2, i % 2]
    g.plot.bar(ax=ax, title=n)
    ax.legend_.remove()
fig.tight_layout()
```

The timings are relative to the time it takes to run df.drop(dlst, 1, errors='ignore'). It seems that after all that effort, we only improve performance modestly.

In fact, the best solutions use reindex or reindex_axis on the hack list(set(df.columns.values.tolist()).difference(dlst)). A close second, and still very marginally better than drop, is np.setdiff1d.

```
rs.idxmin().pipe(
    lambda x: pd.DataFrame(
        dict(idx=x.values, val=rs.lookup(x.values, x.index)),
        x.index
    )
)

                      idx       val
10     (ridx, setdifflst)  0.653431
30    (ridxa, setdifflst)  0.746143
100   (ridxa, setdifflst)  0.816207
300    (ridx, setdifflst)  0.780157
1000  (ridxa, setdifflst)  0.861622
```
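The benchmark above relies on `reindex_axis`, `DataFrame.append` and `DataFrame.lookup`, which are no longer available in recent pandas releases. The following is a hedged sketch that checks the same label-selection and boolean-selection ideas against `drop(..., errors='ignore')` using only current calls. It reuses the Setup data and is a correctness check, not a timing.

```python
import pandas as pd

# Same setup as above: columns A..J, and a delete list whose labels do not all exist
df = pd.DataFrame(dict(zip("ABCDEFGHIJ", range(1, 11))), index=range(3))
dlst = list("HIJKLM")

# Reference result: drop the labels that exist, silently skip the rest
expected = df.drop(columns=dlst, errors="ignore")

# Label selection: keep every column not in dlst, preserving the original order
to_delete = set(dlst)
cols = [c for c in df.columns if c not in to_delete]
via_reindex = df.reindex(columns=cols)
via_loc = df.loc[:, cols]

# Boolean selection: a mask that is False for the columns listed in dlst
via_mask = df.loc[:, ~df.columns.isin(dlst)]

for result in (via_reindex, via_loc, via_mask):
    pd.testing.assert_frame_equal(result, expected)

print(expected.columns.tolist())  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

Building `cols` with a comprehension over a precomputed set keeps the column order, unlike `list(set(...).difference(...))`.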

User · Answer

In pandas 0.16.1+ you can drop columns only if they exist, per the solution posted by @eiTanLaVi. Prior to that version, you can achieve the same result via a conditional list comprehension:

```
df.drop([col for col in ['col_name_1', 'col_name_2', ..., 'col_name_N'] if col in df],
        axis=1, inplace=True)
```
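The pattern works because `col in df` tests membership against the column labels. The sketch below is runnable and uses hypothetical names. `col_name_2` is deliberately absent from the frame, so the comprehension filters it out before drop() sees it.

```python
import pandas as pd

# Hypothetical frame; 'col_name_2' is intentionally not a column
df = pd.DataFrame({"col_name_1": [1, 2], "keep": [3, 4]})
candidates = ["col_name_1", "col_name_2"]

# Only labels that are actually columns reach drop(), so no KeyError is raised
df.drop([col for col in candidates if col in df], axis=1, inplace=True)

print(df.columns.tolist())  # ['keep']
```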

User · Answer

Use:

```
columns = ['Col1', 'Col2', ...]
df.drop(columns, inplace=True, axis=1)
```

This will delete one or more columns in-place. Note that inplace=True was added in pandas v0.13 and won't work on older versions. You'd have to assign the result back in that case:

```
df = df.drop(columns, axis=1)
```
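One detail that often trips people up: with `inplace=True`, drop() modifies `df` and returns `None`. Assigning that return value back to `df` would therefore lose the frame. The sketch uses hypothetical column names.

```python
import pandas as pd

# Hypothetical frame using the column names from the answer
df = pd.DataFrame({"Col1": [1], "Col2": [2], "Col3": [3]})
columns = ["Col1", "Col2"]

result = df.drop(columns, inplace=True, axis=1)

print(result)               # None, so do not write df = df.drop(..., inplace=True)
print(df.columns.tolist())  # ['Col3']
```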

User · Answer

A nice addition is the ability to drop columns only if they exist. This way you can cover more use cases, and it will only drop the existing columns from the labels passed to it.

Simply add errors='ignore', for example:

```
df.drop(['col_name_1', 'col_name_2', ..., 'col_name_N'], inplace=True, axis=1, errors='ignore')
```

This is new from pandas 0.16.1 onward. Documentation is here.
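The difference between the two modes can be shown side by side. This is a sketch with hypothetical labels. With the default `errors='raise'`, a single missing label makes drop() raise `KeyError`. With `errors='ignore'`, the labels that exist are dropped and the missing ones are skipped.

```python
import pandas as pd

# Hypothetical frame: "missing" is deliberately not one of its columns
df = pd.DataFrame({"col_name_1": [1], "keep": [2]})
labels = ["col_name_1", "missing"]

# Default behaviour (errors="raise"): an absent label makes drop() fail
try:
    df.drop(labels, axis=1)
    raised = False
except KeyError:
    raised = True

# errors="ignore": existing labels are dropped, absent ones are skipped
df.drop(labels, axis=1, inplace=True, errors="ignore")

print(raised)               # True
print(df.columns.tolist())  # ['keep']
```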

User · Answer

The best way to do this in pandas is to use drop:

```
df = df.drop('column_name', axis=1)
```

where 1 is the axis number (0 for rows and 1 for columns).

To delete the column without having to reassign df you can do:

```
df.drop('column_name', axis=1, inplace=True)
```

Finally, to drop by column number instead of by column label, try this to delete, e.g., the 1st, 2nd and 4th columns:

```
df = df.drop(df.columns[[0, 1, 3]], axis=1)  # df.columns is zero-based pd.Index
```

This also works with "text" syntax for the columns:

```
df.drop(['column_nameA', 'column_nameB'], axis=1, inplace=True)
```

Note: Introduced in v0.21.0 (October 27, 2017), the drop() method accepts index/columns keywords as an alternative to specifying the axis.

So we can now just do:

```
df.drop(columns=['B', 'C'])
```
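To see the positional variant in action, here is a sketch on a made-up five-column frame. `df.columns[[0, 1, 3]]` picks the labels at those zero-based positions, and drop() then removes them by label.

```python
import pandas as pd

# Hypothetical frame with columns A..E
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list("ABCDE"))

# Positions 0, 1 and 3 are the labels 'A', 'B' and 'D'
labels = df.columns[[0, 1, 3]]
by_position = df.drop(labels, axis=1)

print(labels.tolist())                # ['A', 'B', 'D']
print(by_position.columns.tolist())   # ['C', 'E']
```

Because the positions are converted to labels first, a frame with duplicate column names would lose every column that shares one of those labels.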

User · Answer

Pandas 0.21+ answer

Pandas version 0.21 has changed the drop method slightly to include both the index and columns parameters to match the signature of the rename and reindex methods.

```
df.drop(columns=['column_a', 'column_c'])
```

Personally, I prefer using the axis parameter to denote columns or index because it is the predominant keyword parameter used in nearly all pandas methods. But now you have some added choices in version 0.21.
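The `columns=` keyword and the `axis` forms are interchangeable. This sketch, with hypothetical column names, checks that `columns=[...]`, `axis=1` and `axis='columns'` produce the same frame.

```python
import pandas as pd

# Hypothetical frame using the column names from the answer
df = pd.DataFrame({"column_a": [1], "column_b": [2], "column_c": [3]})
to_drop = ["column_a", "column_c"]

a = df.drop(columns=to_drop)
b = df.drop(to_drop, axis=1)
c = df.drop(to_drop, axis="columns")

pd.testing.assert_frame_equal(a, b)
pd.testing.assert_frame_equal(a, c)
print(a.columns.tolist())  # ['column_b']
```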

User · Answer

The actual question posed, missed by most answers here, is:

Why can't I use del df.column_name?

First we need to understand the problem, which requires us to dive into Python magic methods.

As Wes points out in his answer, del df['column'] maps to the Python magic method df.__delitem__('column'), which is implemented in pandas to drop the column.

However, as pointed out in the link above about Python magic methods:

"In fact, __del__ should almost never be used because of the precarious circumstances under which it is called; use it with caution!"

You could argue that del df['column_name'] should not be used or encouraged, and thereby del df.column_name should not even be considered.

However, in theory, del df.column_name could be implemented to work in pandas using the magic method __delattr__. This does, however, introduce certain problems, problems which the del df['column_name'] implementation already has, but to a lesser degree.

Example Problem

What if I define a column in a dataframe called "dtypes" or "columns"?

Then assume I want to delete these columns.

del df.dtypes would leave the __delattr__ method unsure whether it should delete the "dtypes" attribute or the "dtypes" column.

Architectural questions behind this problem

1. Is a dataframe a collection of columns?
2. Is a dataframe a collection of rows?
3. Is a column an attribute of a dataframe?

Pandas answers:

1. Yes, in all ways.
2. No, but if you want it to be, you can use the .ix, .loc or .iloc methods.
3. Maybe. Do you want to read data? Then yes, unless the name of the attribute is already taken by another attribute belonging to the dataframe. Do you want to modify data? Then no.

TL;DR

You cannot do del df.column_name, because pandas has a quite wildly grown architecture that needs to be reconsidered in order for this kind of cognitive dissonance not to occur to its users.

Protip:

Don't use df.column_name. It may be pretty, but it causes cognitive dissonance.

Zen of Python quotes that fit in here:

There are multiple ways of deleting a column.

"There should be one-- and preferably only one --obvious way to do it."

Columns are sometimes attributes but sometimes not.

"Special cases aren't special enough to break the rules."

Does del df.dtypes delete the dtypes attribute or the dtypes column?

"In the face of ambiguity, refuse the temptation to guess."
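The "dtypes" collision can be reproduced directly. In the sketch below, a frame has a column literally named `dtypes` (a hypothetical example). Attribute access returns the built-in `DataFrame.dtypes` property, while bracket access returns the column. The sketch also assumes, as in the pandas versions I'm aware of, that pandas does not map attribute deletion onto columns, so `del df.x` raises `AttributeError` while `del df['x']` works.

```python
import pandas as pd

# Hypothetical frame with a column whose name collides with a DataFrame attribute
df = pd.DataFrame({"dtypes": [1, 2], "x": [3, 4]})

# Attribute access returns the DataFrame.dtypes property (one dtype per column)
print(df.dtypes.index.tolist())  # ['dtypes', 'x']

# Bracket access is unambiguous and returns the column itself
print(df["dtypes"].tolist())     # [1, 2]

# Attribute deletion is not routed to columns
try:
    del df.x
    attr_delete_worked = True
except AttributeError:
    attr_delete_worked = False

del df["x"]                      # the supported spelling
print(attr_delete_worked)        # False
print(df.columns.tolist())       # ['dtypes']
```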

User · Answer

From version 0.16.1 you can do

```
df.drop(['column_name'], axis=1, inplace=True, errors='ignore')
```

User · Answer

Another way of deleting a column in a pandas DataFrame: if you're not looking for in-place deletion, you can create a new DataFrame by specifying the columns with the DataFrame() constructor:

```
my_dict = {'name': ['a', 'b', 'c', 'd'],
           'age': [10, 20, 25, 22],
           'designation': ['CEO', 'VP', 'MD', 'CEO']}

df = pd.DataFrame(my_dict)
```

Create a new DataFrame as:

```
newdf = pd.DataFrame(df, columns=['name', 'age'])
```

You get a result as good as what you get with del / drop.
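A quick runnable check of this approach, using the answer's own data, confirms two things. The new frame has only the listed columns, and the original `df` keeps all three.

```python
import pandas as pd

my_dict = {
    "name": ["a", "b", "c", "d"],
    "age": [10, 20, 25, 22],
    "designation": ["CEO", "VP", "MD", "CEO"],
}
df = pd.DataFrame(my_dict)

# Passing an existing DataFrame plus columns= keeps only the listed columns
newdf = pd.DataFrame(df, columns=["name", "age"])

print(newdf.columns.tolist())  # ['name', 'age']
print(df.columns.tolist())     # ['name', 'age', 'designation'] (original untouched)
```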

User · Answer

The dot syntax works in JavaScript, but not in Python.

Python: del df['column_name']

JavaScript: delete df['column_name'] or delete df.column_name

User · Answer

Drop by index

Delete the first, second and fourth columns:

```
df.drop(df.columns[[0, 1, 3]], axis=1, inplace=True)
```

Delete the first column:

```
df.drop(df.columns[[0]], axis=1, inplace=True)
```

The optional parameter inplace modifies the original DataFrame instead of returning a new one.

Popped

Column selection, addition, deletion

Delete column column-name:

```
df.pop('column-name')
```

Examples:

```
df = DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                         orient='index', columns=['one', 'two', 'three'])
```

print(df):

```
   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9
```

```
df.drop(df.columns[[0]], axis=1, inplace=True)
print(df)
```

```
   two  three
A    2      3
B    5      6
C    8      9
```

```
three = df.pop('three')
print(df)
```

```
   two
A    2
B    5
C    8
```
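Unlike drop(), pop() hands back the removed column as a Series, which is useful when you want to delete a column and keep its values. The sketch rebuilds the example frame with `DataFrame.from_dict(..., orient='index', columns=...)`. This is an assumption on my part, used because `DataFrame.from_items` has been removed from current pandas.

```python
import pandas as pd

# Same data as the example above, built with from_dict instead of from_items
df = pd.DataFrame.from_dict(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    orient="index",
    columns=["one", "two", "three"],
)

# pop() deletes the column in place and returns it as a Series
three = df.pop("three")

print(three.tolist())       # [3, 6, 9]
print(df.columns.tolist())  # ['one', 'two']
```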
