I have a problem to solve how to remove rows with a Zero value in R. In others hand, I can use na.omit()
to delete all the NA values or use complete.cases()
to delete rows that contains NA values.
Is there anyone know how to remove rows with a Zero Values in R?
For example :
Before
| DateTime | Mac1 | Mac2 | Mac3 | Mac4 |
----------------------------------------------------
| 2011-04-02 06:00 | 20 | 0 | 20 | 20 |
| 2011-04-02 06:05 | 21 | 21 | 21 | 21 |
| 2011-04-02 06:10 | 22 | 22 | 22 | 22 |
| 2011-04-02 06:15 | 23 | 23 | 0 | 23 |
| 2011-04-02 06:20 | 24 | 24 | 24 | 24 |
| 2011-04-02 06:25 | 0 | 25 | 25 | 0 |
After
| DateTime | Mac1 | Mac2 | Mac3 | Mac4 |
----------------------------------------------------
| 2011-04-02 06:05 | 21 | 21 | 21 | 21 |
| 2011-04-02 06:10 | 22 | 22 | 22 | 22 |
| 2011-04-02 06:20 | 24 | 24 | 24 | 24 |
Well, you could swap your 0
's for NA
and then use one of those solutions, but for sake of a difference, you could notice that a number will only have a finite logarithm if it is greater than 0
, so that rowSums
of the log
will only be finite if there are no zeros in a row.
dfr[is.finite(rowSums(log(dfr[-1]))),]
I prefer a simple adaptation of csgillespie's method, foregoing the need of a function definition:
d[apply(d!=0, 1, all),]
where d
is your data frame.
You can use filter from dplyr package.
Let's call your data frame df
library(dplyr)
df1 <- filter(df, Mac1 > 0, Mac2 > 0, Mac3 > 0, Mac4 > 0)
df1 will have only rows with entries above zero. Hope this helps.
In base R, we can select the columns which we want to test using grep
, compare the data with 0, use rowSums
to select rows which has all non-zero values.
cols <- grep("^Mac", names(df))
df[rowSums(df[cols] != 0) == length(cols), ]
# DateTime Mac1 Mac2 Mac3 Mac4
#1 2011-04-02 06:05 21 21 21 21
#2 2011-04-02 06:10 22 22 22 22
#3 2011-04-02 06:20 24 24 24 24
Doing this with inverted logic but giving the same output
df[rowSums(df[cols] == 0) == 0, ]
In dplyr
, we can use filter_at
to test for specific columns and use all_vars
to select rows where all the values are not equal to 0.
library(dplyr)
df %>% filter_at(vars(starts_with("Mac")), all_vars(. != 0))
data
df <- structure(list(DateTime = structure(1:6, .Label = c("2011-04-02 06:00",
"2011-04-02 06:05", "2011-04-02 06:10", "2011-04-02 06:15", "2011-04-02 06:20",
"2011-04-02 06:25"), class = "factor"), Mac1 = c(20L, 21L, 22L,
23L, 24L, 0L), Mac2 = c(0L, 21L, 22L, 23L, 24L, 25L), Mac3 = c(20L,
21L, 22L, 0L, 24L, 25L), Mac4 = c(20L, 21L, 22L, 23L, 24L, 0L
)), class = "data.frame", row.names = c(NA, -6L))
I would probably go with Joran's suggestion of replacing 0's with NAs and then using the built in functions you mentioned. If you can't/don't want to do that, one approach is to use any()
to find rows that contain 0's and subset those out:
set.seed(42)
#Fake data
x <- data.frame(a = sample(0:2, 5, TRUE), b = sample(0:2, 5, TRUE))
> x
a b
1 2 1
2 2 2
3 0 0
4 2 1
5 1 2
#Subset out any rows with a 0 in them
#Note the negation with ! around the apply function
x[!(apply(x, 1, function(y) any(y == 0))),]
a b
1 2 1
2 2 2
4 2 1
5 1 2
To implement Joran's method, something like this should get you started:
x[x==0] <- NA
I would do the following.
Set the zero to NA.
data[data==0] <- NA
data
Delete the rows associated with NA.
data2<-data[complete.cases(data),]
Using tidyverse/dplyr, you can also remove rows with any zero value in a subset of variables:
# variables starting with Mac must be non-zero
filter_at(df, vars(starts_with("Mac")), all_vars((.) != 0))
# variables x, y, and z must be non-zero
filter_at(df, vars(x, y, z), all_vars((.) != 0))
# all numeric variables must be non-zero
filter_if(df, is.numeric, all_vars((.) != 0))
As dplyr 1.0.0 deprecated the scoped variants which @Feng Mai nicely showed, here is an update with the new syntax. This might be useful because in this case, across()
doesn't work, and it took me some time to figure out the solution as follows.
The goal was to extract all rows that contain at least one 0 in a column.
df %>%
rowwise() %>%
filter(any(c_across(everything(.)) == 0))
with the data
df <- data.frame(a = 1:4, b= 1:0, c=0:3)
df <- rbind(df, c(0,0,0))
df <- rbind(df, c(9,9,9))
# A tibble: 4 x 3
# Rowwise:
a b c
<dbl> <dbl> <dbl>
1 1 1 0
2 2 0 1
3 4 0 3
4 0 0 0
So it correctly doesn't return the last row containing all 9s.
Source: Stackoverflow.com