[r] Opposite of %in%: exclude rows with values specified in a vector

A categorical variable V1 in a data frame D1 can have values represented by the letters from A to Z. I want to create a subset D2, which excludes some values, say, B, N and T. Basically, I want a command which is the opposite of %in%

D2 = subset(D1, V1 %in% c('B','N',T'))

This question is related to r dataframe subset

The answer is


Instead of creating your own function, it would be useful to just negate the behavior of

needle %in% haystack 

do this instead:

!(needle %in% haystack)

this works as well.


purrr::compose() is another quick way to define this for later use, as in:

`%!in%` <- compose(`!`, `%in%`)

The package collapse has it built in: %!in%.


Another solution could be using setdiff

D1 = c("A",..., "Z") ; D0 = c("B","N","T")

D2 = setdiff(D1, D0)

D2 is your desired subset.


The help for %in%, help("%in%"), includes, in the Examples section, this definition of not in,

"%w/o%" <- function(x, y) x[!x %in% y] #-- x without y

Lets try it:

c(2,3,4) %w/o% c(2,8,9)
[1] 3 4

Alternatively

"%w/o%" <- function(x, y) !x %in% y #--  x without y
c(2,3,4) %w/o% c(2,8,9)
# [1] FALSE  TRUE  TRUE

Using negate from purrr also does the trick quickly and neatly:

`%not_in%` <- purrr::negate(`%in%`)

Then usage is, for example,

c("cat", "dog") %not_in% c("dog", "mouse")

library(roperators)

1 %ni% 2:10

If you frequently need to use custom infix operators, it is easier to just have them in a package rather than declaring the same exact functions over and over in each script or project.



If you look at the code of %in%

 function (x, table) match(x, table, nomatch = 0L) > 0L

then you should be able to write your version of opposite. I use

`%not in%` <- function (x, table) is.na(match(x, table, nomatch=NA_integer_))

Another way is:

function (x, table) match(x, table, nomatch = 0L) == 0L

How about:

'%ni%' <- Negate('%in%')
c(1,3,11) %ni% 1:10
# [1] FALSE FALSE  TRUE

This works fine for me:

`%nin%` <- Negate(`%in%`)

require(TSDT)

c(1,3,11) %nin% 1:10
# [1] FALSE FALSE  TRUE

For more information, you can refer to: https://cran.r-project.org/web/packages/TSDT/TSDT.pdf


Here is a version using filter in dplyr that applies the same technique as the accepted answer by negating the logical with !:

D2 <- D1 %>% dplyr::filter(!V1 %in% c('B','N','T'))

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to subset

how to remove multiple columns in r dataframe? Using grep to help subset a data frame in R Subset a dataframe by multiple factor levels creating a new list with subset of list using index in python subsetting a Python DataFrame Undefined columns selected when subsetting data frame How to select some rows with specific rownames from a dataframe? Subset data to contain only columns whose names match a condition find all subsets that sum to a particular value Subset and ggplot2