[r] Cleaning `Inf` values from an R dataframe

Option 1

Use the fact that a data.frame is a list of columns, then use do.call to recreate a data.frame.

do.call(data.frame,lapply(DT, function(x) replace(x, is.infinite(x),NA)))

Option 2 -- data.table

You could use data.table and set. This avoids some internal copying.

DT <- data.table(dat)
invisible(lapply(names(DT),function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name,value =NA)))

Or using column numbers (possibly faster if there are a lot of columns):

for (j in 1:ncol(DT)) set(DT, which(is.infinite(DT[[j]])), j, NA)

Timings

# some `big(ish)` data
dat <- data.frame(a = rep(c(1,Inf), 1e6), b = rep(c(Inf,2), 1e6), 
                  c = rep(c('a','b'),1e6),d = rep(c(1,Inf), 1e6),  
                  e = rep(c(Inf,2), 1e6))
# create data.table
library(data.table)
DT <- data.table(dat)

# replace (@mnel)
system.time(na_dat <- do.call(data.frame,lapply(dat, function(x) replace(x, is.infinite(x),NA))))
## user  system elapsed 
#  0.52    0.01    0.53 

# is.na (@dwin)
system.time(is.na(dat) <- sapply(dat, is.infinite))
# user  system elapsed 
# 32.96    0.07   33.12 

# modified is.na
system.time(is.na(dat) <- do.call(cbind,lapply(dat, is.infinite)))
#  user  system elapsed 
# 1.22    0.38    1.60 


# data.table (@mnel)
system.time(invisible(lapply(names(DT),function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name,value =NA))))
# user  system elapsed 
# 0.29    0.02    0.31 

data.table is the quickest. Using sapply slows things down noticeably.

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to data.table

Error: package or namespace load failed for ggplot2 and for data.table Select subset of columns in data.table R How to get week numbers from dates? data.table vs dplyr: can one do something well the other can't or does poorly? How to replace NA values in a table for selected columns Select multiple columns in data.table by their numeric indices Sort rows in data.table in decreasing order on string key `order(-x,v)` gives error on data.table 1.9.4 or earlier Cleaning `Inf` values from an R dataframe Aggregate / summarize multiple variables per group (e.g. sum, mean) How do you delete a column by name in data.table?