[r] Subset of rows containing NA (missing) values in a chosen column of a data frame

We have a data frame from a CSV file. The data frame DF has columns that contain observed values and a column (VaR2) that contains the date at which a measurement has been taken. If the date was not recorded, the CSV file contains the value NA, for missing data.

Var1  Var2 
10   2010/01/01
20   NA
30   2010/03/01

We would like to use the subset command to define a new data frame new_DF such that it only contains rows that have an NA' value from the column (VaR2). In the example given, only Row 2 will be contained in the new DF.

The command

new_DF<-subset(DF,DF$Var2=="NA") 

does not work, the resulting data frame has no row entries.

If in the original CSV file the Value NA are exchanged with NULL, the same command produces the desired result: new_DF<-subset(DF,DF$Var2=="NULL").

How can I get this method working, if for the character string the value NA is provided in the original CSV file?

This question is related to r csv dataframe subset na

The answer is


Prints all the rows with NA data:

tmp <- data.frame(c(1,2,3),c(4,NA,5));
tmp[round(which(is.na(tmp))/ncol(tmp)),]

Never use =='NA' to test for missing values. Use is.na() instead. This should do it:

new_DF <- DF[rowSums(is.na(DF)) > 0,]

or in case you want to check a particular column, you can also use

new_DF <- DF[is.na(DF$Var),]

In case you have NA character values, first run

Df[Df=='NA'] <- NA

to replace them with missing values.


complete.cases gives TRUE when all values in a row are not NA

DF[!complete.cases(DF), ]

new_data <- data %>% filter_all(any_vars(is.na(.))) 

This should create a new data frame (new_data) with only the missing values in it.

Works best to keep a track of values that you might later drop because they had some columns with missing observations (NA).


Try changing this:

new_DF<-dplyr::filter(DF,is.na(Var2)) 

NA is a special value in R, do not mix up the NA value with the "NA" string. Depending on the way the data was imported, your "NA" and "NULL" cells may be of various type (the default behavior is to convert "NA" strings to NA values, and let "NULL" strings as is).

If using read.table() or read.csv(), you should consider the "na.strings" argument to do clean data import, and always work with real R NA values.

An example, working in both cases "NULL" and "NA" cells :

DF <- read.csv("file.csv", na.strings=c("NA", "NULL"))
new_DF <- subset(DF, is.na(DF$Var2))

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to subset

how to remove multiple columns in r dataframe? Using grep to help subset a data frame in R Subset a dataframe by multiple factor levels creating a new list with subset of list using index in python subsetting a Python DataFrame Undefined columns selected when subsetting data frame How to select some rows with specific rownames from a dataframe? Subset data to contain only columns whose names match a condition find all subsets that sum to a particular value Subset and ggplot2

Examples related to na

Removing NA in dplyr pipe Change the Blank Cells to "NA" How to create an empty matrix in R? Convert Pandas column containing NaNs to dtype `int` How to replace NA values in a table for selected columns How to delete columns that contain ONLY NAs? How to avoid warning when introducing NAs by coercion Replace NA with 0 in a data frame column Exclude Blank and NA in R Omit rows containing specific column of NA