I am trying to LEFT Join 2 data frames but I do not want join all the variables from the second data set:
As an example, I have dataset 1 (DF1):
Cl Q Sales Date
A 2 30 01/01/2014
A 3 24 02/01/2014
A 1 10 03/01/2014
B 4 10 01/01/2014
B 1 20 02/01/2014
B 3 30 03/01/2014
And I would like to left join dataset 2 (DF2):
Client LO CON
A 12 CA
B 11 US
C 12 UK
D 10 CA
E 15 AUS
F 91 DD
I am able to left join with the following code:
merge(x = DF1, y = DF2, by = "Client", all.x=TRUE) :
Client Q Sales Date LO CON
A 2 30 01/01/2014 12 CA
A 3 24 02/01/2014 12 CA
A 1 10 03/01/2014 12 CA
B 4 10 01/01/2014 11 US
B 1 20 02/01/2014 11 US
B 3 30 03/01/2014 11 US
However, it merges both column LO and CON. I would only like to merge the column LO.
Client Q Sales Date LO
A 2 30 01/01/2014 12
A 3 24 02/01/2014 12
A 1 10 03/01/2014 12
B 4 10 01/01/2014 11
B 1 20 02/01/2014 11
B 3 30 03/01/2014 11
Nothing elegant but this could be another satisfactory answer.
merge(x = DF1, y = DF2, by = "Client", all.x=TRUE)[,c("Client","LO","CON")]
This will be useful especially when you don't need the keys that were used to join the tables in your results.
I think it's a little simpler to use the dplyr
functions select
and left_join
; at least it's easier for me to understand. The join function from dplyr
are made to mimic sql arguments.
library(tidyverse)
DF2 <- DF2 %>%
select(client, LO)
joined_data <- left_join(DF1, DF2, by = "Client")
You don't actually need to use the "by" argument in this case because the columns have the same name.
Source: Stackoverflow.com