[r] Delete rows with blank values in one particular column

I am working on a large dataset, with some rows with NAs and others with blanks:

df <- data.frame(ID = c(1:7),
                 home_pc = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                 start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                 end_pc = c(NA, "CB5 4FG", "Home", "", "Home", "", NA))

How do I remove the NAs and blanks in one go (in the start_pc and end_pc columns)? I have in the past used:

df<- df[-which(is.na(df$start_pc)), ]

... to remove the NAs - is there a similar command to remove the blanks?

This question is related to r dataframe missing-data

The answer is


 df[!(is.na(df$start_pc) | df$start_pc==""), ]
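Since the question asks about both start_pc and end_pc, the same test can be applied to each column and the results combined. The following is a minimal sketch, assuming the columns are character vectors (hence stringsAsFactors = FALSE); is_blank, cols, drop and clean are illustrative names, not part of the original answer.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps columns as character
df <- data.frame(ID = 1:7,
                 home_pc = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                 start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                 end_pc = c(NA, "CB5 4FG", "Home", "", "Home", "", NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where a value is NA or an empty string
is_blank <- function(x) is.na(x) | x == ""

# Flag rows where at least one of the chosen columns is blank
cols <- c("start_pc", "end_pc")
drop <- Reduce(`|`, lapply(df[cols], is_blank))

# Keep only the rows that were not flagged
clean <- df[!drop, ]
```

With the sample data this keeps the rows with ID 2, 3 and 5.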

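An alternative is to remove the rows with blanks in one variable using subset (replace VAR with the column name, e.g. start_pc). Because subset treats NA in the condition as FALSE, this also drops rows where VAR is NA: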

df <- subset(df, VAR != "")

An easy approach is to turn all blank cells into NA and keep only complete cases. Note that this drops a row if any column is blank or NA, not just start_pc or end_pc. na.omit does the same job as the complete.cases line and is widely discussed.

df[df==""]<-NA
df<-df[complete.cases(df),]
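The approach above checks every column, including home_pc. If only start_pc and end_pc should count, one option is to run the check on a copy of just those columns and use the result to index the original data frame. A minimal sketch, assuming character columns (stringsAsFactors = FALSE); cols, tmp and clean are illustrative names:

```r
# Sample data from the question, kept as character columns
df <- data.frame(ID = 1:7,
                 home_pc = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                 start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                 end_pc = c(NA, "CB5 4FG", "Home", "", "Home", "", NA),
                 stringsAsFactors = FALSE)

# Copy only the columns that should be checked
cols <- c("start_pc", "end_pc")
tmp <- df[cols]

# Recode empty strings as NA in the copy, leaving df itself untouched
tmp[tmp == ""] <- NA

# Keep the rows of df that are complete in the checked columns
clean <- df[complete.cases(tmp), ]
```

Working on a copy means the original values in df are not overwritten, and a blank home_pc on its own would not cause a row to be dropped.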

It is the same construct as in your code: simply test for empty strings rather than NA. Try this:

df <- df[-which(df$start_pc == ""), ]

In fact, looking at your code, you don't need the which, but use the negation instead, so you can simplify it to:

df <- df[!(df$start_pc == ""), ]
df <- df[!is.na(df$start_pc), ]
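There is a further reason to prefer the negation over -which(): when no row matches the condition, which() returns an empty vector, and indexing with its negation selects no rows at all instead of all of them. A small sketch with made-up data that contains no blanks (with_which and with_negation are illustrative names):

```r
# Made-up data with no empty strings in start_pc
df <- data.frame(ID = 1:3,
                 start_pc = c("Home", "FC5 7YH", "CB3 5TH"),
                 stringsAsFactors = FALSE)

# which() finds nothing, so -which() is an empty index and every row is lost
with_which <- df[-which(df$start_pc == ""), ]

# The logical negation keeps all rows, as intended
with_negation <- df[!(df$start_pc == ""), ]

nrow(with_which)     # 0
nrow(with_negation)  # 3
```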

And, of course, you can combine these two statements as follows:

df <- df[!(df$start_pc == "" | is.na(df$start_pc)), ]

You can shorten it further using with(), which lets you refer to the columns by name:

df <- with(df, df[!(start_pc == "" | is.na(start_pc)), ])

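You can also test for non-zero string length using nzchar. Note that nzchar returns TRUE for non-empty strings (and, by default, for NA as well), so it marks the rows to keep and the NA test is still needed:

df <- with(df, df[nzchar(start_pc) & !is.na(start_pc), ])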

Disclaimer: I didn't test any of this code. Please let me know if there are syntax errors anywhere


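A solution with dplyr is to recode empty strings as NA in the character columns and then drop the rows that contain NA. Note that na.omit checks every column, and that recent dplyr versions (1.1.0 and later) no longer accept a whole data frame in na_if, so it is applied per column here:

library(dplyr)

df %>%
  # recode empty strings "" in character columns as NA
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # remove rows containing NA
  na.omit()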
