[r] Removing specific rows from a dataframe

I have a data frame e.g.:

sub   day
1      1
1      2
1      3
1      4
2      1
2      2
2      3
2      4
3      1
3      2
3      3
3      4

and I would like to remove specific rows that can be identified by the combination of sub and day. For example, say I wanted to remove the rows where sub == 1 and day == 2, and the rows where sub == 3 and day == 4. How could I do this? I realise that I could specify the row numbers, but this needs to be applied to a huge data frame, and it would be tedious to go through and identify each row.


The answer is


This boils down to two distinct steps:

  1. Figure out where your condition is true, giving a logical vector or, as I prefer, the matching row indices obtained by wrapping the condition in which()
  2. Create an updated data.frame by excluding the indices from the previous step.

Here is an example:

R> set.seed(42)
R> DF <- data.frame(sub=rep(1:4, each=4), day=sample(1:4, 16, replace=TRUE))
R> DF
   sub day
1    1   4
2    1   4
3    1   2
4    1   4
5    2   3
6    2   3
7    2   3
8    2   1
9    3   3
10   3   3
11   3   2
12   3   3
13   4   4
14   4   2
15   4   2
16   4   4
R> ind <- which(with( DF, sub==2 & day==3 ))
R> ind
[1] 5 6 7
R> DF <- DF[ -ind, ]
R> table(DF)
   day
sub 1 2 3 4
  1 0 1 0 3
  2 1 0 0 0
  3 0 1 3 0
  4 0 2 0 2
R> 

And we see that sub==2 has only one entry remaining with day==1.
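
One caveat with the negative-index form is worth a quick sketch. When no row matches, which() returns an empty vector, and indexing with its negation selects zero rows instead of leaving the data frame untouched. Negating the logical vector avoids this. The names toy, ind and keep below are illustrative and not part of the original answer:

```r
# Toy data mirroring the question's layout: 3 subjects, 4 days each.
toy <- data.frame(sub = rep(1:3, each = 4), day = rep(1:4, times = 3))

# A condition that matches nothing (there is no sub == 9).
ind <- which(with(toy, sub == 9 & day == 2))
length(ind)              # 0

# Negative indexing with an empty vector drops every row.
nrow(toy[-ind, ])        # 0

# Negating the logical vector keeps all rows when nothing matches.
keep <- !with(toy, sub == 9 & day == 2)
nrow(toy[keep, ])        # 12
```

So if the condition might match nothing, it is safer either to check length(ind) > 0 before using DF[-ind, ] or to subset with the negated logical vector.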

Edit: The compound condition can be written with an 'or' (|) as follows:

ind <- which(with( DF, (sub==1 & day==2) | (sub==3 & day==4) ))

And here is a new full example:

R> set.seed(1)
R> DF <- data.frame(sub=rep(1:4, each=5), day=sample(1:4, 20, replace=TRUE))
R> table(DF)
   day
sub 1 2 3 4
  1 1 2 1 1
  2 1 0 2 2
  3 2 1 1 1
  4 0 2 1 2
R> ind <- which(with( DF, (sub==1 & day==2) | (sub==3 & day==4) ))
R> ind
[1]  1  2 15
R> DF <- DF[-ind, ]
R> table(DF)
   day
sub 1 2 3 4
  1 1 0 1 1
  2 1 0 2 2
  3 2 1 1 0
  4 0 2 1 2
R> 

One simple solution:

# TRUE for the rows to remove
cond1 <- df$sub == 1 & df$day == 2
cond2 <- df$sub == 3 & df$day == 4

# keep every row that matches neither condition
df <- df[!(cond1 | cond2),]
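
For a huge data frame with many combinations to remove, writing one condition per pair gets tedious. A possible sketch, assuming the pairs can be listed in a small lookup table (the names to_drop, key_df, key_drop and df_clean are hypothetical), builds a key from the two columns and drops the rows whose key appears in the table:

```r
# Data in the shape used in the question.
df <- data.frame(sub = rep(1:3, each = 4), day = rep(1:4, times = 3))

# Hypothetical lookup table: one row per (sub, day) pair to remove.
to_drop <- data.frame(sub = c(1, 3), day = c(2, 4))

# Build a key per row; the separator keeps e.g. (1, 12) and (11, 2) distinct.
key_df   <- paste(df$sub, df$day, sep = "_")
key_drop <- paste(to_drop$sub, to_drop$day, sep = "_")

# Keep only the rows whose key is not in the drop list.
df_clean <- df[!(key_df %in% key_drop), ]
nrow(df_clean)   # 10
```

Because this subsets with a logical vector, it also behaves sensibly when none of the listed pairs occur in the data.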


Here's a solution to your problem using dplyr's filter function.

Although you can pass your data frame as the first argument to any dplyr function, I've used its %>% operator, which pipes your data frame to one or more dplyr functions (just filter in this case).

Once you are somewhat familiar with dplyr, the cheat sheet is very handy.

> library(dplyr)
> print(df <- data.frame(sub=rep(1:3, each=4), day=1:4))
   sub day
1    1   1
2    1   2
3    1   3
4    1   4
5    2   1
6    2   2
7    2   3
8    2   4
9    3   1
10   3   2
11   3   3
12   3   4
> print(df <- df %>% filter(!((sub==1 & day==2) | (sub==3 & day==4))))
   sub day
1    1   1
2    1   3
3    1   4
4    2   1
5    2   2
6    2   3
7    2   4
8    3   1
9    3   2
10   3   3
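
If the combinations to remove are already held in a second data frame, dplyr's anti_join() may be a tidier alternative to a long filter() condition. The following is only a sketch: it assumes dplyr is installed, and the to_drop table and the df_clean name are made up for illustration:

```r
library(dplyr)

# Same data as in the example above.
df <- data.frame(sub = rep(1:3, each = 4), day = rep(1:4, times = 3))

# Hypothetical table of (sub, day) pairs to remove.
to_drop <- data.frame(sub = c(1L, 3L), day = c(2L, 4L))

# anti_join() keeps the rows of df that have no match in to_drop.
df_clean <- anti_join(df, to_drop, by = c("sub", "day"))
nrow(df_clean)   # 10
```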
