I can use pandas
dropna()
functionality to remove rows with some or all columns set as NA
's. Is there an equivalent function for dropping rows with all columns having value 0?
P kt b tt mky depth
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 1.1 3 4.5 2.3 9.0
In this example, we would like to drop the first 4 rows from the data frame.
thanks!
One-liner. No transpose needed:
df.loc[~(df==0).all(axis=1)]
And for those who like symmetry, this also works...
df.loc[(df!=0).any(axis=1)]
I think this solution is the shortest :
df= df[df['ColName'] != 0]
Couple of solutions I found to be helpful while looking this up, especially for larger data sets:
df[(df.sum(axis=1) != 0)] # 30% faster
df[df.values.sum(axis=1) != 0] # 3X faster
Continuing with the example from @U2EF1:
In [88]: df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
In [91]: %timeit df[(df.T != 0).any()]
1000 loops, best of 3: 686 µs per loop
In [92]: df[(df.sum(axis=1) != 0)]
Out[92]:
a b
1 0 1
2 1 0
3 1 1
In [95]: %timeit df[(df.sum(axis=1) != 0)]
1000 loops, best of 3: 495 µs per loop
In [96]: %timeit df[df.values.sum(axis=1) != 0]
1000 loops, best of 3: 217 µs per loop
On a larger dataset:
In [119]: bdf = pd.DataFrame(np.random.randint(0,2,size=(10000,4)))
In [120]: %timeit bdf[(bdf.T != 0).any()]
1000 loops, best of 3: 1.63 ms per loop
In [121]: %timeit bdf[(bdf.sum(axis=1) != 0)]
1000 loops, best of 3: 1.09 ms per loop
In [122]: %timeit bdf[bdf.values.sum(axis=1) != 0]
1000 loops, best of 3: 517 µs per loop
import pandas as pd
df = pd.DataFrame({'a' : [0,0,1], 'b' : [0,0,-1]})
temp = df.abs().sum(axis=1) == 0
df = df.drop(temp)
Result:
>>> df
a b
2 1 -1
this works for me
new_df = df[df.loc[:]!=0].dropna()
For me this code: df.loc[(df!=0).any(axis=0)]
did not work. It returned the exact dataset.
Instead, I used df.loc[:, (df!=0).any(axis=0)]
and dropped all the columns with 0 values in the dataset
The function .all()
droped all the columns in which are any zero values in my dataset.
Another alternative:
# Is there anything in this row non-zero?
# df != 0 --> which entries are non-zero? T/F
# (df != 0).any(axis=1) --> are there 'any' entries non-zero row-wise? T/F of rows that return true to this statement.
# df.loc[all_zero_mask,:] --> mask your rows to only show the rows which contained a non-zero entry.
# df.shape to confirm a subset.
all_zero_mask=(df != 0).any(axis=1) # Is there anything in this row non-zero?
df.loc[all_zero_mask,:].shape
You can use a quick lambda
function to check if all the values in a given row are 0
. Then you can use the result of applying that lambda
as a way to choose only the rows that match or don't match that condition:
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3),
index=['one', 'two', 'three', 'four', 'five'],
columns=list('abc'))
df.loc[['one', 'three']] = 0
print df
print df.loc[~df.apply(lambda row: (row==0).all(), axis=1)]
Yields:
a b c
one 0.000000 0.000000 0.000000
two 2.240893 1.867558 -0.977278
three 0.000000 0.000000 0.000000
four 0.410599 0.144044 1.454274
five 0.761038 0.121675 0.443863
[5 rows x 3 columns]
a b c
two 2.240893 1.867558 -0.977278
four 0.410599 0.144044 1.454274
five 0.761038 0.121675 0.443863
[3 rows x 3 columns]
Replace the zeros with nan
and then drop the rows with all entries as nan
.
After that replace nan
with zeros.
import numpy as np
df = df.replace(0, np.nan)
df = df.dropna(how='all', axis=0)
df = df.replace(np.nan, 0)
To drop all columns with values 0 in any row:
new_df = df[df.loc[:]!=0].dropna()
I look up this question about once a month and always have to dig out the best answer from the comments:
df.loc[(df!=0).any(1)]
Thanks Dan Allan!
df = df [~( df [ ['kt' 'b' 'tt' 'mky' 'depth', ] ] == 0).all(axis=1) ]
Try this command its perfectly working.
Source: Stackoverflow.com