If I understand you correctly, you can use a combination of Series.isin()
and DataFrame.append()
:
In [80]: df1
Out[80]:
rating user_id
0 2 0x21abL
1 1 0x21abL
2 1 0xdafL
3 0 0x21abL
4 4 0x1d14L
5 2 0x21abL
6 1 0x21abL
7 0 0xdafL
8 4 0x1d14L
9 1 0x21abL
In [81]: df2
Out[81]:
rating user_id
0 2 0x1d14L
1 1 0xdbdcad7
2 1 0x21abL
3 3 0x21abL
4 3 0x21abL
5 1 0x5734a81e2
6 2 0x1d14L
7 0 0xdafL
8 0 0x1d14L
9 4 0x5734a81e2
In [82]: ind = df2.user_id.isin(df1.user_id) & df1.user_id.isin(df2.user_id)
In [83]: ind
Out[83]:
0 True
1 False
2 True
3 True
4 True
5 False
6 True
7 True
8 True
9 False
Name: user_id, dtype: bool
In [84]: df1[ind].append(df2[ind])
Out[84]:
rating user_id
0 2 0x21abL
2 1 0xdafL
3 0 0x21abL
4 4 0x1d14L
6 1 0x21abL
7 0 0xdafL
8 4 0x1d14L
0 2 0x1d14L
2 1 0x21abL
3 3 0x21abL
4 3 0x21abL
6 2 0x1d14L
7 0 0xdafL
8 0 0x1d14L
This is essentially the algorithm you described as "clunky", using idiomatic pandas
methods. Note the duplicate row indices. Also, note that this won't give you the expected output if df1
and df2
have no overlapping row indices, i.e., if
In [93]: df1.index & df2.index
Out[93]: Int64Index([], dtype='int64')
In fact, it won't give the expected output if their row indices are not equal.