Removing empty rows of a data file in R

I have a dataset with empty rows. I would like to remove them:

myData<-myData[-which(apply(myData,1,function(x)all(is.na(x)))),]

It works OK. But now I would like to add a column to my data and initialize its first value:

myData$newCol[1] <- -999

Error in `$<-.data.frame`(`*tmp*`, "newCol", value = -999) : 
  replacement has 1 rows, data has 0

Unfortunately, it doesn't work, and I don't understand why or how to fix it. It worked when I removed one row at a time using:

TgData = TgData[2:nrow(TgData),]

or something similar.

It also works when I use only the first 13,000 rows.

But it doesn't work with my full data, which has 32,000 rows.

What did I do wrong? It seems to make no sense to me.


Answers:
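A likely explanation, based on how negative indexing works in base R: when no row is entirely NA, which() returns an empty integer vector, and negating an empty index (-integer(0)) selects zero rows rather than all of them. Running the line on data that has no all-NA rows (for example, data that was already cleaned once) therefore leaves a 0-row data frame, and assigning a value to a new column of a 0-row data frame raises exactly the error shown in the question. Indexing with the negated logical vector avoids this. The toy data frame below is hypothetical, for illustration only:

```r
# Hypothetical toy data: no row is entirely NA
myData <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

all_na <- apply(myData, 1, function(x) all(is.na(x)))
which(all_na)                      # empty (length 0): nothing to drop

# Pitfall: -which() on an empty result selects *no* rows
broken <- myData[-which(all_na), ]
nrow(broken)                       # 0

# Adding a column to the 0-row data frame reproduces the error
err <- tryCatch({
  broken$newCol[1] <- -999
  NULL
}, error = conditionMessage)
err                                # "replacement has 1 row..., data has 0"

# Fix: index with the negated logical vector instead of -which()
fixed <- myData[!all_na, ]
nrow(fixed)                        # 3

# $<- recycles a length-1 value down the whole column, so to set
# only the first value, create the column first and then assign
fixed$newCol <- NA_real_
fixed$newCol[1] <- -999
fixed$newCol                       # -999 NA NA
```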


Here are some dplyr options:

# sample data: row 2 is partly NA, row 4 is entirely NA
df <- data.frame(a = c('1', NA, '3', NA), b = c('a', 'b', 'c', NA), c = c('e', 'f', 'g', NA))

library(dplyr)

# remove rows where all values are NA (keep rows with at least one non-NA value):
df %>% filter_all(any_vars(!is.na(.)))
df %>% filter_all(any_vars(complete.cases(.)))

# remove rows where any value is NA (keep only complete rows):
df %>% filter_all(all_vars(!is.na(.)))
df %>% filter_all(all_vars(complete.cases(.)))

# or more succinctly:
df %>% filter(complete.cases(.))  
df %>% na.omit

# dplyr and tidyr:
library(tidyr)
df %>% drop_na
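In dplyr 1.0.4 and later, filter_all() with any_vars()/all_vars() is superseded by if_any() and if_all() used inside filter(). A sketch of the same two filters in that style, assuming a recent dplyr and the same sample data:

```r
library(dplyr)

df <- data.frame(a = c('1', NA, '3', NA),
                 b = c('a', 'b', 'c', NA),
                 c = c('e', 'f', 'g', NA))

# Keep rows with at least one non-NA value (drops the all-NA row 4)
not_all_na <- df %>% filter(if_any(everything(), ~ !is.na(.x)))

# Keep rows with no NA at all (drops rows 2 and 4)
complete <- df %>% filter(if_all(everything(), ~ !is.na(.x)))
```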

An alternative for all-NA rows, using the janitor package:

library(janitor)
myData %>% remove_empty("rows")

If your empty rows contain empty strings ("") rather than NAs, you can use:

data[!apply(data == "", 1, all),]

To remove rows in which every cell is either NA or an empty string:

data <- data[!apply(is.na(data) | data == "", 1, all),]
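The NA check in the combined version matters: when a cell is NA, data == "" is also NA, so the empty-string-only version can produce an NA index and return a row of NAs instead of dropping it. A small sketch with made-up data comparing the two:

```r
# Hypothetical data: row 2 is all empty strings, row 3 is all NA
data <- data.frame(x = c("a", "", NA),
                   y = c("b", "", NA))

# Empty-string test only: row 3 gives all(c(NA, NA)) = NA,
# and indexing with NA yields a row of NAs instead of dropping it
only_empty <- data[!apply(data == "", 1, all), ]

# NA-or-empty test: rows 2 and 3 are both dropped
both <- data[!apply(is.na(data) | data == "", 1, all), ]
```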

This is similar to some of the answers above, but it also lets you remove rows whose fraction of missing values is greater than or equal to a given threshold, set with the argument pct:

drop_rows_all_na <- function(x, pct = 1) x[!rowSums(is.na(x)) >= ncol(x) * pct, , drop = FALSE]

where x is a data frame and pct is the fraction of NA values (between 0 and 1) at or above which a row is removed.

pct = 1 removes rows in which 100% of the values are NA; pct = 0.5 removes rows in which at least half of the values are NA.
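A small usage sketch of drop_rows_all_na on made-up data, showing how pct changes which rows are dropped. It defines the function with drop = FALSE so that a one-column data frame is not simplified to a vector:

```r
# drop = FALSE keeps the result a data frame even with one column
drop_rows_all_na <- function(x, pct = 1) {
  x[!rowSums(is.na(x)) >= ncol(x) * pct, , drop = FALSE]
}

# Hypothetical data: row 1 has 0/4 NA, row 2 has 2/4, row 3 has 4/4
df <- data.frame(a = c(1, NA, NA),
                 b = c(2, NA, NA),
                 c = c(3, 5, NA),
                 d = c(4, 6, NA))

kept_all  <- drop_rows_all_na(df)             # pct = 1: drops row 3 only
kept_half <- drop_rows_all_na(df, pct = 0.5)  # drops rows 2 and 3
```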