cartesian product in pandas

Question

I have two pandas dataframes:

    from pandas import DataFrame
    df1 = DataFrame({'col1': [1, 2], 'col2': [3, 4]})
    df2 = DataFrame({'col3': [5, 6]})

What is the best practice to get their cartesian product (of course without writing it explicitly like me)?

    # df1, df2 cartesian product
    df_cartesian = DataFrame({'col1': [1, 2, 1, 2], 'col2': [3, 4, 3, 4], 'col3': [5, 5, 6, 6]})

User · Answer

Presenting to you pandas >= 1.2: left.merge(right, how='cross')

    import pandas as pd

    pd.__version__
    # '1.2.0'

    left = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
    right = pd.DataFrame({'col3': [5, 6]})

    left.merge(right, how='cross')

       col1  col2  col3
    0     1     3     5
    1     1     3     6
    2     2     4     5
    3     2     4     6

Indexes are ignored in the result.

Implementation-wise, this uses the join-on-common-key-column method described in the accepted answer. The upside of using the API is that it saves you a lot of typing and handles some corner cases pretty well. I'd almost always recommend this syntax as my first preference for a cartesian product in pandas, unless you're looking for something more performant.
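To illustrate the remark that indexes are ignored, here is a minimal sketch. The index labels "x", "y", 10 and 20 are made up for the illustration and are not part of the original answer; it assumes pandas >= 1.2.

```python
import pandas as pd

# Hypothetical frames with non-default indexes, to show that they are discarded.
left = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]}, index=["x", "y"])
right = pd.DataFrame({"col3": [5, 6]}, index=[10, 20])

# how="cross" pairs every row of `left` with every row of `right`.
result = left.merge(right, how="cross")

# The result carries a fresh 0..n-1 index, not the labels of either input.
print(result.index.tolist())  # [0, 1, 2, 3]
print(result.shape)           # (4, 3)
```

If the original labels matter, one option is to call reset_index() on each frame before merging so that they survive as ordinary columns.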

User · Answer

Use pd.MultiIndex.from_product as an index in an otherwise empty dataframe, then reset its index, and you're done.

    a = [1, 2, 3]
    b = ["a", "b", "c"]

    index = pd.MultiIndex.from_product([a, b], names=["a", "b"])

    pd.DataFrame(index=index).reset_index()

out:

       a  b
    0  1  a
    1  1  b
    2  1  c
    3  2  a
    4  2  b
    5  2  c
    6  3  a
    7  3  b
    8  3  c

User · Answer

If you have a key that is repeated for each row, then you can produce a cartesian product using merge (like you would in SQL).

    from pandas import DataFrame, merge
    df1 = DataFrame({'key': [1, 1], 'col1': [1, 2], 'col2': [3, 4]})
    df2 = DataFrame({'key': [1, 1], 'col3': [5, 6]})

    merge(df1, df2, on='key')[['col1', 'col2', 'col3']]

Output:

       col1  col2  col3
    0     1     3     5
    1     1     3     6
    2     2     4     5
    3     2     4     6

See here for the documentation: http://pandas.pydata.org/pandas-docs/stable/merging.html#brief-primer-on-merge-methods-relational-algebra

User · Answer

With method chaining:

    product = (
        df1.assign(key=1)
        .merge(df2.assign(key=1), on="key")
        .drop("key", axis=1)
    )

User · Answer

You could start by taking the Cartesian product of df1.col1 and df2.col3, then merge back to df1 to get col2.

Here's a general Cartesian product function which takes a dictionary of lists:

    def cartesian_product(d):
        index = pd.MultiIndex.from_product(d.values(), names=d.keys())
        return pd.DataFrame(index=index).reset_index()

Apply as:

    res = cartesian_product({'col1': df1.col1, 'col3': df2.col3})
    pd.merge(res, df1, on='col1')

       col1  col3  col2
    0     1     5     3
    1     1     6     3
    2     2     5     4
    3     2     6     4

User · Answer

Here is a helper function to perform a simple Cartesian product with two data frames. The internal logic handles using an internal key, and avoids mangling any columns that happen to be named "key" from either side.

    import pandas as pd

    def cartesian(df1, df2):
        """Determine Cartesian product of two data frames."""
        key = 'key'
        # Prepend underscores until the name clashes with no existing column
        while key in df1.columns or key in df2.columns:
            key = '_' + key
        key_d = {key: 0}
        return pd.merge(
            df1.assign(**key_d), df2.assign(**key_d), on=key).drop(key, axis=1)

    # Two data frames, where the first happens to have a 'key' column
    df1 = pd.DataFrame({'number': [1, 2], 'key': [3, 4]})
    df2 = pd.DataFrame({'digit': [5, 6]})
    cartesian(df1, df2)

shows:

       number  key  digit
    0       1    3      5
    1       1    3      6
    2       2    4      5
    3       2    4      6
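For comparison, here is a small sketch of the same two frames passed to the built-in cross merge. It assumes pandas >= 1.2 and is not part of the answer above. A user column that happens to be called "key" is left untouched, because a cross merge does not take a key column from the caller.

```python
import pandas as pd

# Same data as above: the first frame happens to have a 'key' column.
df1 = pd.DataFrame({"number": [1, 2], "key": [3, 4]})
df2 = pd.DataFrame({"digit": [5, 6]})

# No helper key is supplied, so there is nothing to clash with 'key'.
result = df1.merge(df2, how="cross")

print(result.columns.tolist())  # ['number', 'key', 'digit']
print(len(result))              # 4
```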

User · Answer

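If you have no overlapping columns, don't want to add one, and the indices of the data frames can be discarded, this may be easier:

    # Index objects are immutable, so assign a new constant index to each frame
    df1.index = [0] * len(df1)
    df2.index = [0] * len(df2)
    df_cartesian = df1.join(df2, how='outer')
    df_cartesian.index = range(len(df_cartesian))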

User · Answer

As an alternative, one can rely on the cartesian product provided by itertools (itertools.product), which avoids creating a temporary key or modifying the index:

    import numpy as np
    import pandas as pd
    import itertools

    def cartesian(df1, df2):
        # Pair every (label, row) of df1 with every (label, row) of df2
        rows = itertools.product(df1.iterrows(), df2.iterrows())

        # pd.concat stands in for Series.append, which was removed in pandas 2.0
        df = pd.DataFrame(pd.concat([left, right]) for (_, left), (_, right) in rows)
        return df.reset_index(drop=True)

Quick test:

    In [46]: a = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])

    In [47]: b = pd.DataFrame(np.random.rand(5, 3), columns=["d", "e", "f"])

    In [48]: cartesian(a, b)
    Out[48]:
               a         b         c         d         e         f
    0   0.436480  0.068491  0.260292  0.991311  0.064167  0.715142
    1   0.436480  0.068491  0.260292  0.101777  0.840464  0.760616
    2   0.436480  0.068491  0.260292  0.655391  0.289537  0.391893
    3   0.436480  0.068491  0.260292  0.383729  0.061811  0.773627
    4   0.436480  0.068491  0.260292  0.575711  0.995151  0.804567
    5   0.469578  0.052932  0.633394  0.991311  0.064167  0.715142
    6   0.469578  0.052932  0.633394  0.101777  0.840464  0.760616
    7   0.469578  0.052932  0.633394  0.655391  0.289537  0.391893
    8   0.469578  0.052932  0.633394  0.383729  0.061811  0.773627
    9   0.469578  0.052932  0.633394  0.575711  0.995151  0.804567
    10  0.466813  0.224062  0.218994  0.991311  0.064167  0.715142
    11  0.466813  0.224062  0.218994  0.101777  0.840464  0.760616
    12  0.466813  0.224062  0.218994  0.655391  0.289537  0.391893
    13  0.466813  0.224062  0.218994  0.383729  0.061811  0.773627
    14  0.466813  0.224062  0.218994  0.575711  0.995151  0.804567
    15  0.831365  0.273890  0.130410  0.991311  0.064167  0.715142
    16  0.831365  0.273890  0.130410  0.101777  0.840464  0.760616
    17  0.831365  0.273890  0.130410  0.655391  0.289537  0.391893
    18  0.831365  0.273890  0.130410  0.383729  0.061811  0.773627
    19  0.831365  0.273890  0.130410  0.575711  0.995151  0.804567
    20  0.447640  0.848283  0.627224  0.991311  0.064167  0.715142
    21  0.447640  0.848283  0.627224  0.101777  0.840464  0.760616
    22  0.447640  0.848283  0.627224  0.655391  0.289537  0.391893
    23  0.447640  0.848283  0.627224  0.383729  0.061811  0.773627
    24  0.447640  0.848283  0.627224  0.575711  0.995151  0.804567
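The itertools approach above builds the result one row pair at a time in Python. As a contrast, here is a sketch of a positional variant that lets NumPy generate the row positions. The helper name cartesian_by_position is hypothetical and does not come from any answer here, and no benchmark is claimed.

```python
import numpy as np
import pandas as pd

def cartesian_by_position(df1, df2):
    """Cross product via positional indexing (hypothetical helper)."""
    # Repeat each row position of df1 len(df2) times: 0, 0, ..., 1, 1, ...
    left_pos = np.repeat(np.arange(len(df1)), len(df2))
    # Cycle through the row positions of df2 len(df1) times: 0, 1, ..., 0, 1, ...
    right_pos = np.tile(np.arange(len(df2)), len(df1))
    left = df1.iloc[left_pos].reset_index(drop=True)
    right = df2.iloc[right_pos].reset_index(drop=True)
    # Both halves now share the same 0..n-1 index, so they align row by row.
    return pd.concat([left, right], axis=1)

df1 = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
df2 = pd.DataFrame({"col3": [5, 6]})
out = cartesian_by_position(df1, df2)
print(out.shape)  # (4, 3)
```

Like the itertools version, this adds no temporary key, and the original indexes are discarded. Column dtypes are preserved because whole rows are selected rather than rebuilt from Series.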

User · Answer

This won't win a code golf competition, and it borrows from the previous answers, but it clearly shows how the key is added and how the join works. It creates 2 new data frames from lists, then adds the key to do the cartesian product on.

My use case was that I needed a list of all store IDs for each week in my list. So I created a list of all the weeks I wanted to have, then a list of all the store IDs I wanted to map them against.

I chose a left merge, but it would be semantically the same as inner in this setup. You can see this in the documentation on merging, which states that it does a Cartesian product if a key combination appears more than once in both tables, which is what we set up.

    days = pd.DataFrame({'date': list_of_days})
    stores = pd.DataFrame({'store_id': list_of_stores})
    stores['key'] = 0
    days['key'] = 0
    days_and_stores = days.merge(stores, how='left', on='key')
    # Pass axis by keyword; pandas 2.0 no longer accepts it positionally
    days_and_stores.drop('key', axis=1, inplace=True)
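Here is a runnable sketch of the steps above, including the claim that left and inner merges coincide in this setup. The contents of list_of_days and list_of_stores are invented for the illustration, since the answer does not show them.

```python
import pandas as pd

# Hypothetical inputs; the answer does not show the actual lists.
list_of_days = ["2021-01-04", "2021-01-11"]
list_of_stores = [101, 102, 103]

days = pd.DataFrame({"date": list_of_days})
stores = pd.DataFrame({"store_id": list_of_stores})

# A constant key on both sides makes every day match every store.
left = (
    days.assign(key=0)
    .merge(stores.assign(key=0), how="left", on="key")
    .drop(columns="key")
)
inner = (
    days.assign(key=0)
    .merge(stores.assign(key=0), how="inner", on="key")
    .drop(columns="key")
)

print(len(left))  # 6, i.e. 2 days x 3 stores
print(left.values.tolist() == inner.values.tolist())
```

Using assign here instead of days['key'] = 0 keeps the helper column out of the original frames.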

User · Answer

Minimal code needed for this one. Create a common "key" to cartesian merge the two:

    df1['key'] = 0
    df2['key'] = 0

    df_cartesian = df1.merge(df2, how='outer')

User · Answer

I find using pandas MultiIndex to be the best tool for the job. If you have a list of lists lists_list, call pd.MultiIndex.from_product(lists_list) and iterate over the result (or use it in a DataFrame index).
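Here is a short sketch of that suggestion. The contents of lists_list and the level names are made up for the illustration.

```python
import pandas as pd

# Hypothetical list of lists; any number of inner lists works.
lists_list = [[1, 2], ["a", "b"], [True, False]]

index = pd.MultiIndex.from_product(lists_list, names=["num", "letter", "flag"])

# Iterating over the MultiIndex yields one tuple per combination.
combos = list(index)
print(len(combos))  # 8

# Alternatively, turn the index levels into ordinary columns.
df = index.to_frame(index=False)
print(df.shape)  # (8, 3)
```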

