I have a dataset with empty rows. I would like to remove them:
myData<-myData[-which(apply(myData,1,function(x)all(is.na(x)))),]
It works OK. But now I would like to add a column in my data and initialize the first value:
myData$newCol[1] <- -999
Error in `$<-.data.frame`(`*tmp*`, "newCol", value = -999) :
replacement has 1 rows, data has 0
Unfortunately it doesn't work and I don't really understand why and I can't solve this. It worked when I removed one line at a time using:
TgData = TgData[2:nrow(TgData),]
Or anything similar.
It also works when I used only the first 13.000 rows.
But it doesn't work with my actual data, with 32.000 rows.
What did I do wrong? It seems to make no sense to me.
This question is related to
r
Here are some dplyr
options:
# sample data
df <- data.frame(a = c('1', NA, '3', NA), b = c('a', 'b', 'c', NA), c = c('e', 'f', 'g', NA))
library(dplyr)
# remove rows where all values are NA:
df %>% filter_all(any_vars(!is.na(.)))
df %>% filter_all(any_vars(complete.cases(.)))
# remove rows where only some values are NA:
df %>% filter_all(all_vars(!is.na(.)))
df %>% filter_all(all_vars(complete.cases(.)))
# or more succinctly:
df %>% filter(complete.cases(.))
df %>% na.omit
# dplyr and tidyr:
library(tidyr)
df %>% drop_na
Alternative solution for rows of NA
s using janitor
package
myData %>% remove_empty("rows")
If you have empty rows, not NAs, you can do:
data[!apply(data == "", 1, all),]
To remove both (NAs and empty):
data <- data[!apply(is.na(data) | data == "", 1, all),]
This is similar to some of the above answers, but with this, you can specify if you want to remove rows with a percentage of missing values greater-than or equal-to a given percent (with the argument pct
)
drop_rows_all_na <- function(x, pct=1) x[!rowSums(is.na(x)) >= ncol(x)*pct,]
Where x
is a dataframe and pct
is the threshold of NA
-filled data you want to get rid of.
pct = 1
means remove rows that have 100% of its values NA
.
pct = .5
means remome rows that have at least half its values NA
Source: Stackoverflow.com