[r] Merge unequal dataframes and replace missing rows with 0

I have two data.frames, one with only characters and the other one with characters and values.

df1 = data.frame(x=c('a', 'b', 'c', 'd', 'e'))
df2 = data.frame(x=c('a', 'b', 'c'),y = c(0,1,0))
merge(df1, df2)
  x y
1 a 0
2 b 1
3 c 0 

I want to merge df1 and df2. The characters a, b and c merged fine and got the values 0, 1 and 0, but d and e are missing. I want d and e to appear in the merged table as well, with y set to 0. In other words, every row of df1 that has no match in df2 should get y = 0, like this:

  x y
1 a 0
2 b 1
3 c 0
4 d 0
5 e 0


The answer is

Or, as an alternative to @Chase's code, being a recent plyr fan with a background in databases:

library(plyr)
zz <- join(df1, df2, by = "x", type = "left")
zz[is.na(zz)] <- 0
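
For readers who would rather not load plyr, the same left join can be sketched in base R with merge() and all.x = TRUE. This keeps every row of df1 and gives unmatched rows y = NA, which are then set to 0. It assumes the sample df1 and df2 from the question; note that merge() sorts the result by the key column by default.

```r
# Sample data from the question
df1 <- data.frame(x = c("a", "b", "c", "d", "e"))
df2 <- data.frame(x = c("a", "b", "c"), y = c(0, 1, 0))

# Left join: keep all rows of df1; rows without a match in df2 get y = NA
zz <- merge(df1, df2, by = "x", all.x = TRUE)

# Replace only the NAs in y (other columns are left untouched)
zz$y[is.na(zz$y)] <- 0
print(zz)
```

Restricting the replacement to zz$y avoids touching other columns; zz[is.na(zz)] <- 0 also works here because y is the only column that can contain NA.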

I used the answer given by Chase (answered May 11 '11 at 14:21), but I added a bit of code to apply that solution to my particular problem.

I had a frame of rates (user, download) and a frame of totals (user, download) to be merged by user, and I wanted to include every rate, even if there was no corresponding total. However, it was also possible that no totals were missing at all, in which case the selection of rows for replacing NA with zero would fail.

The first line of code does the merge. The next two lines change the column names in the merged frame. The if statement replaces NA by zero, but only if there are rows with NA.

# merge rates and totals, replacing absent totals by zero
graphdata <- merge(rates, totals, by = "user", all.x = TRUE)
colnames(graphdata)[colnames(graphdata) == "download.x"] <- "download.rate"
colnames(graphdata)[colnames(graphdata) == "download.y"] <- "download.total"
if (any(is.na(graphdata$download.total))) {
    graphdata[is.na(graphdata$download.total), ]$download.total <- 0
}
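
A sketch of the same idea on made-up data: the rates and totals frames below are hypothetical stand-ins with the (user, download) columns described above. It uses the suffixes argument of merge() to name the two download columns directly instead of renaming them afterwards, and it replaces NA through logical indexing on the column vector. Assigning into zero selected elements is simply a no-op, so this form does not need the if guard.

```r
# Hypothetical example data: user "u3" has a rate but no total
rates  <- data.frame(user = c("u1", "u2", "u3"), download = c(1.5, 2.0, 0.5))
totals <- data.frame(user = c("u1", "u2"), download = c(30, 42))

# Left join on user; suffixes names the clashing "download" columns
graphdata <- merge(rates, totals, by = "user", all.x = TRUE,
                   suffixes = c(".rate", ".total"))

# Works whether or not any totals are missing (no rows selected = no change)
graphdata$download.total[is.na(graphdata$download.total)] <- 0
print(graphdata)
```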
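
The all argument still works in R's merge(). The line below is for Python's pandas, whose pd.merge() has no all argument; the equivalent parameter there is how:

import pandas as pd
x = pd.merge(df1, df2, how="outer")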
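
On the R side, a small sketch contrasts all = TRUE (a full outer join, the analogue of how="outer") with all.x = TRUE (the left join this question needs). The table df3 is hypothetical and contains a key, "f", that df1 lacks, so the two options give different row counts.

```r
df1 <- data.frame(x = c("a", "b", "c", "d", "e"))
# Hypothetical lookup table with an extra key "f" that df1 does not have
df3 <- data.frame(x = c("a", "b", "f"), y = c(0, 1, 7))

outer_res <- merge(df1, df3, by = "x", all = TRUE)    # keys a to f: 6 rows
left_res  <- merge(df1, df3, by = "x", all.x = TRUE)  # keys a to e: 5 rows

print(nrow(outer_res))
print(nrow(left_res))
```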

Assuming df1 has all the values of x of interest, you could use dplyr::left_join() to merge and then either base::replace() or tidyr::replace_na() to replace the NAs with 0s:


library(dplyr)

# dplyr only:
df_new <- 
  left_join(df1, df2, by = 'x') %>% 
  mutate(y = replace(y, is.na(y), 0))

# dplyr and tidyr:
library(tidyr)
df_new <- 
  left_join(df1, df2, by = 'x') %>% 
  mutate(y = replace_na(y, 0))

# In R versions before 4.0.0, data.frame() turns strings into factors by default, so column `x`
# in the sample data is a factor, which may give a warning with the join. Converting it to
# character before the join avoids this:
df_new <-
  left_join(df1 %>% mutate(x = as.character(x)),
            df2 %>% mutate(x = as.character(x)),
            by = 'x') %>%
  mutate(y = replace(y, is.na(y), 0))
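
When only a single column has to be looked up, a join is not strictly necessary. As a base-R sketch (a different technique from the dplyr answer above), match() finds the position of each df1$x in df2$x, returning NA where there is no match, and those NAs are then set to 0. Unlike merge(), this keeps df1's original row order; it assumes the keys in df2$x are unique, because match() only returns the first hit.

```r
df1 <- data.frame(x = c("a", "b", "c", "d", "e"))
df2 <- data.frame(x = c("a", "b", "c"), y = c(0, 1, 0))

# Position of each df1$x in df2$x (NA when absent); as.character() guards against factors
idx <- match(as.character(df1$x), as.character(df2$x))

df_new <- df1
df_new$y <- df2$y[idx]            # NA for "d" and "e"
df_new$y[is.na(df_new$y)] <- 0    # fill the gaps with 0
print(df_new)
```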

Another alternative with data.table.


library(data.table)

dt1 <- data.table(df1)
dt2 <- data.table(df2)
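
The data.table answer stops after the conversion. A minimal sketch of how it might continue, assuming the data.table package is installed (this completion is an assumption, not the original author's code): dt2[dt1, on = "x"] looks up every row of dt1 in dt2, so d and e are kept with y = NA, and := then sets those NAs to 0 by reference.

```r
library(data.table)

df1 <- data.frame(x = c("a", "b", "c", "d", "e"))
df2 <- data.frame(x = c("a", "b", "c"), y = c(0, 1, 0))
dt1 <- data.table(df1)
dt2 <- data.table(df2)

# X[i, on = ...]: one result row per row of dt1; unmatched keys get y = NA
dt_new <- dt2[dt1, on = "x"]

# Update y in place (by reference) only where it is NA
dt_new[is.na(y), y := 0]
print(dt_new)
```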



