Per the comments to the original post, merges / joins are well-suited for this problem. In particular, an inner join will return only values that are present in both dataframes, making thesetdiff
statement unnecessary.
Using the data from Dinre's example:
In base R:
cleanedA <- merge(data_A, data_B[, "index"], by = 1, sort = FALSE)
cleanedB <- merge(data_B, data_A[, "index"], by = 1, sort = FALSE)
Using the dplyr package:
library(dplyr)
cleanedA <- inner_join(data_A, data_B %>% select(index))
cleanedB <- inner_join(data_B, data_A %>% select(index))
To keep the data as two separate tables, each containing only its own variables, this subsets the unwanted table to only its index variable before joining. Then no new variables are added to the resulting table.