[r] Left join only selected columns in R with the merge() function

I am trying to LEFT Join 2 data frames but I do not want join all the variables from the second data set:

As an example, I have dataset 1 (DF1):

  Cl    Q   Sales  Date
   A    2   30     01/01/2014
   A    3   24     02/01/2014
   A    1   10     03/01/2014
   B    4   10     01/01/2014
   B    1   20     02/01/2014
   B    3   30     03/01/2014

And I would like to left join dataset 2 (DF2):

Client  LO  CON
   A    12  CA
   B    11  US
   C    12  UK
   D    10  CA
   E    15  AUS
   F    91  DD

I am able to left join with the following code:

merge(x = DF1, y = DF2, by = "Client", all.x=TRUE) :

   Client Q    Sales   Date             LO      CON
   A      2    30      01/01/2014       12      CA
   A      3    24      02/01/2014       12      CA
   A      1    10      03/01/2014       12      CA
   B      4    10      01/01/2014       11      US
   B      1    20      02/01/2014       11      US
   B      3    30      03/01/2014       11      US

However, it merges both column LO and CON. I would only like to merge the column LO.

   Client Q    Sales   Date             LO      
   A      2    30      01/01/2014       12      
   A      3    24      02/01/2014       12      
   A      1    10      03/01/2014       12      
   B      4    10      01/01/2014       11      
   B      1    20      02/01/2014       11     
   B      3    30      03/01/2014       11      

This question is related to r merge

The answer is


Nothing elegant but this could be another satisfactory answer.

merge(x = DF1, y = DF2, by = "Client", all.x=TRUE)[,c("Client","LO","CON")]

This will be useful especially when you don't need the keys that were used to join the tables in your results.


I think it's a little simpler to use the dplyr functions select and left_join ; at least it's easier for me to understand. The join function from dplyr are made to mimic sql arguments.

 library(tidyverse)

 DF2 <- DF2 %>%
   select(client, LO)

 joined_data <- left_join(DF1, DF2, by = "Client")

You don't actually need to use the "by" argument in this case because the columns have the same name.