I have a dataframe look like this:
import pandas
import numpy as np
df = DataFrame(np.random.rand(4,4), columns = list('abcd'))
df
a b c d
0 0.418762 0.042369 0.869203 0.972314
1 0.991058 0.510228 0.594784 0.534366
2 0.407472 0.259811 0.396664 0.894202
3 0.726168 0.139531 0.324932 0.906575
How I can get all columns except column b
?
You can use df.columns.isin()
df.loc[:, ~df.columns.isin(['b'])]
When you want to drop multiple columns, as simple as:
df.loc[:, ~df.columns.isin(['col1', 'col2'])]
You can drop
columns in index:
df[df.columns.drop('b')]
Output:
a c d
0 0.418762 0.869203 0.972314
1 0.991058 0.594784 0.534366
2 0.407472 0.396664 0.894202
3 0.726168 0.324932 0.906575
Here is another way:
df[[i for i in list(df.columns) if i != '<your column>']]
You just pass all columns to be shown except of the one you do not want.
Another slight modification to @Salvador Dali enables a list of columns to exclude:
df[[i for i in list(df.columns) if i not in [list_of_columns_to_exclude]]]
or
df.loc[:,[i for i in list(df.columns) if i not in [list_of_columns_to_exclude]]]
I think the best way to do is the way mentioned by @Salvador Dali. Not that the others are wrong.
Because when you have a data set where you just want to select one column and put it into one variable and the rest of the columns into another for comparison or computational purposes. Then dropping the column of the data set might not help. Of course there are use cases for that as well.
x_cols = [x for x in data.columns if x != 'name of column to be excluded']
Then you can put those collection of columns in variable x_cols
into another variable like x_cols1
for other computation.
ex: x_cols1 = data[x_cols]
Here is a one line lambda:
df[map(lambda x :x not in ['b'], list(df.columns))]
before:
import pandas
import numpy as np
df = pd.DataFrame(np.random.rand(4,4), columns = list('abcd'))
df
a b c d
0 0.774951 0.079351 0.118437 0.735799
1 0.615547 0.203062 0.437672 0.912781
2 0.804140 0.708514 0.156943 0.104416
3 0.226051 0.641862 0.739839 0.434230
after:
df[map(lambda x :x not in ['b'], list(df.columns))]
a c d
0 0.774951 0.118437 0.735799
1 0.615547 0.437672 0.912781
2 0.804140 0.156943 0.104416
3 0.226051 0.739839 0.434230
I think a nice solution is with the function filter of pandas and regex (match everything except "b"):
df.filter(regex="^(?!b$)")
df[df.columns.difference(['b'])]
Out:
a c d
0 0.427809 0.459807 0.333869
1 0.678031 0.668346 0.645951
2 0.996573 0.673730 0.314911
3 0.786942 0.719665 0.330833
Don't use ix
. It's deprecated. The most readable and idiomatic way of doing this is df.drop()
:
>>> df
a b c d
0 0.175127 0.191051 0.382122 0.869242
1 0.414376 0.300502 0.554819 0.497524
2 0.142878 0.406830 0.314240 0.093132
3 0.337368 0.851783 0.933441 0.949598
>>> df.drop('b', axis=1)
a c d
0 0.175127 0.382122 0.869242
1 0.414376 0.554819 0.497524
2 0.142878 0.314240 0.093132
3 0.337368 0.933441 0.949598
Note that by default, .drop()
does not operate inplace; despite the ominous name, df
is unharmed by this process. If you want to permanently remove b
from df
, do df.drop('b', inplace=True)
.
df.drop()
also accepts a list of labels, e.g. df.drop(['a', 'b'], axis=1)
will drop column a
and b
.
Source: Stackoverflow.com