What is the most efficient way to create a dictionary of two pandas DataFrame columns?

Question

What is the most efficient way to organise the following pandas DataFrame:

    data =
    Position    Letter
    1           a
    2           b
    3           c
    4           d
    5           e

into a dictionary like this?

    alphabet = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

User · Answer

In Python 3.6 the fastest way is still the WouterOvermeire one. Kikohs' proposal is slower than the other two options.

    import timeit

    setup = '''
    import pandas as pd
    import numpy as np
    df = pd.DataFrame(np.random.randint(32, 120, 100000).reshape(50000, 2), columns=list('AB'))
    df['A'] = df['A'].apply(chr)
    '''

    timeit.Timer('dict(zip(df.A, df.B))', setup=setup).repeat(7, 500)
    timeit.Timer('pd.Series(df.A.values, index=df.B).to_dict()', setup=setup).repeat(7, 500)
    timeit.Timer("df.set_index('A').to_dict()['B']", setup=setup).repeat(7, 500)

Results:

    1.1214002349999777 s  # WouterOvermeire
    1.1922008498571748 s  # Jeff
    1.7034366211428602 s  # Kikohs
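`Timer.repeat(7, 500)` returns a list of seven totals, each covering 500 executions, so a single figure per approach has to be derived from that list. The sketch below takes the minimum of the repeats and divides by the number of executions, which is the reduction the `timeit` documentation suggests. The helper name `best_time`, the smaller frame, and the lower repeat counts are assumptions chosen to keep the run short; they are not part of the answer above.

```python
import timeit

# Same setup as the answer, but with a smaller frame so the sketch runs quickly.
setup = """
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(32, 120, 2000).reshape(1000, 2), columns=list('AB'))
df['A'] = df['A'].apply(chr)
"""

# The three statements timed in the answer, keyed by a descriptive label.
candidates = {
    "dict(zip)": "dict(zip(df.A, df.B))",
    "Series.to_dict": "pd.Series(df.A.values, index=df.B).to_dict()",
    "set_index.to_dict": "df.set_index('A').to_dict()['B']",
}


def best_time(stmt, repeat=3, number=20):
    """Hypothetical helper: best per-call time in seconds over `repeat` runs."""
    totals = timeit.Timer(stmt, setup=setup).repeat(repeat, number)
    return min(totals) / number


results = {name: best_time(stmt) for name, stmt in candidates.items()}

# Print the approaches from fastest to slowest on this machine.
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:>18}: {seconds * 1e6:.1f} us per call")
```

The ranking depends on the pandas version, the Python version, and the data size, so it is worth rerunning on your own data before drawing conclusions.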

User · Answer

TL;DR

    >>> import pandas as pd
    >>> df = pd.DataFrame({'Position': [1, 2, 3, 4, 5], 'Letter': ['a', 'b', 'c', 'd', 'e']})
    >>> dict(sorted(df.values.tolist()))  # Sort of sorted...
    {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
    >>> from collections import OrderedDict
    >>> OrderedDict(df.values.tolist())
    OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)])

In Long

Explaining solution: dict(sorted(df.values.tolist()))

Given:

    df = pd.DataFrame({'Position': [1, 2, 3, 4, 5], 'Letter': ['a', 'b', 'c', 'd', 'e']})

[out]:

      Letter  Position
    0      a         1
    1      b         2
    2      c         3
    3      d         4
    4      e         5

(The outputs here have Letter as the first column because older pandas versions sorted dict keys alphabetically. Newer versions keep insertion order, so use df[['Letter', 'Position']] in place of df to reproduce them.)

Try:

    # Get the values out to a 2-D numpy array
    df.values

[out]:

    array([['a', 1],
           ['b', 2],
           ['c', 3],
           ['d', 4],
           ['e', 5]], dtype=object)

Then optionally:

    # Dump it into a list so that you can sort it using sorted()
    sorted(df.values.tolist())  # Sort by key

Or:

    # Sort by value
    from operator import itemgetter
    sorted(df.values.tolist(), key=itemgetter(1))

[out]:

    [['a', 1], ['b', 2], ['c', 3], ['d', 4], ['e', 5]]

Lastly, cast the list of 2-element lists into a dict:

    dict(sorted(df.values.tolist()))

[out]:

    {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

Related

Answering sbradbio's comment: if there are multiple values for a specific key and you would like to keep all of them, the following is not the most efficient way, but it is the most intuitive:

    from collections import defaultdict
    import pandas as pd

    multivalue_dict = defaultdict(list)

    df = pd.DataFrame({'Position': [1, 2, 4, 4, 4], 'Letter': ['a', 'b', 'd', 'e', 'f']})

    for idx, row in df.iterrows():
        multivalue_dict[row['Position']].append(row['Letter'])

[out]:

    >>> print(multivalue_dict)
    defaultdict(<class 'list'>, {1: ['a'], 2: ['b'], 4: ['d', 'e', 'f']})
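The multi-value case can also be handled without a Python-level loop over rows. The following is a minimal sketch of a different technique from the `iterrows` loop in the answer: it groups by the key column with `groupby`, collects each group's values into a list, and converts the resulting Series to a plain dict. It has not been benchmarked here.

```python
import pandas as pd

# Same data as the multi-value example: key 4 appears three times.
df = pd.DataFrame({'Position': [1, 2, 4, 4, 4],
                   'Letter': ['a', 'b', 'd', 'e', 'f']})

# Group rows by key, turn each group's letters into a list,
# then convert the resulting Series (index = key) into a dict.
multivalue_dict = df.groupby('Position')['Letter'].apply(list).to_dict()

print(multivalue_dict)
```

For this frame, the letters 'd', 'e' and 'f' all end up in the list stored under key 4. Unlike the `defaultdict` version, the result is a regular dict, so looking up a missing key raises `KeyError`.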

User · Answer

I found a faster way to solve the problem, at least on realistically large datasets, using df.set_index(KEY).to_dict()[VALUE]

Proof on 50,000 rows:

    df = pd.DataFrame(np.random.randint(32, 120, 100000).reshape(50000, 2), columns=list('AB'))
    df['A'] = df['A'].apply(chr)

    %timeit dict(zip(df.A, df.B))
    %timeit pd.Series(df.A.values, index=df.B).to_dict()
    %timeit df.set_index('A').to_dict()['B']

Output:

    100 loops, best of 3: 7.04 ms per loop  # WouterOvermeire
    100 loops, best of 3: 9.83 ms per loop  # Jeff
    100 loops, best of 3: 4.28 ms per loop  # Kikohs (me)
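With the default orientation, `df.set_index(KEY).to_dict()` builds one inner dict per remaining column, and `[VALUE]` then picks one of them. A possible variation, sketched here on the question's data and not benchmarked, is to select the value column first so that only one mapping is built. The variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'Position': [1, 2, 3, 4, 5],
                   'Letter': ['a', 'b', 'c', 'd', 'e']})

# Form used in the answer: dict of every column, then pick one column's dict.
via_frame = df.set_index('Position').to_dict()['Letter']

# Variation (an assumption, not from the answer): select the column first,
# so only a single Series is converted.
via_series = df.set_index('Position')['Letter'].to_dict()

print(via_frame == via_series)

# When a key repeats, the value from the last matching row is kept.
dup = pd.DataFrame({'Position': [1, 1, 2], 'Letter': ['a', 'b', 'c']})
last_wins = dup.set_index('Position')['Letter'].to_dict()
print(last_wins)
```

The duplicate-key behaviour matters for the benchmark frame, where column A holds only a few dozen distinct characters across 50,000 rows, so the resulting dict is much smaller than the frame.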

User · Answer

    In [9]: pd.Series(df.Letter.values, index=df.Position).to_dict()
    Out[9]: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

Speed comparison (using Wouter's method):

    In [6]: df = pd.DataFrame(np.random.randint(0, 10, 10000).reshape(5000, 2), columns=list('AB'))

    In [7]: %timeit dict(zip(df.A, df.B))
    1000 loops, best of 3: 1.27 ms per loop

    In [8]: %timeit pd.Series(df.A.values, index=df.B).to_dict()
    1000 loops, best of 3: 987 us per loop
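In the `pd.Series(...).to_dict()` form, the `index` argument supplies the dictionary keys and the first argument supplies the values, so swapping the two columns reverses the mapping. The following sketch on the question's data builds both directions and checks the first against `dict(zip(...))`; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Position': [1, 2, 3, 4, 5],
                   'Letter': ['a', 'b', 'c', 'd', 'e']})

# index -> keys, data -> values: Position maps to Letter.
position_to_letter = pd.Series(df.Letter.values, index=df.Position).to_dict()

# Swapping the two columns reverses the direction of the mapping.
letter_to_position = pd.Series(df.Position.values, index=df.Letter).to_dict()

# The zip-based approach builds the same Position -> Letter mapping.
same_as_zip = position_to_letter == dict(zip(df.Position, df.Letter))

print(position_to_letter)
print(letter_to_position)
print(same_as_zip)
```

Note that the timed statement `pd.Series(df.A.values, index=df.B).to_dict()` maps B to A, whereas `dict(zip(df.A, df.B))` maps A to B, so the two benchmarked statements do not produce the same dictionary.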
