Pandas convert dataframe to array of tuples

Question

I have manipulated some data using pandas and now I want to carry out a batch save back to the database. This requires me to convert the dataframe into an array of tuples, with each tuple corresponding to a "row" of the dataframe.

My DataFrame looks something like:

    In [182]: data_set
    Out[182]:
       index  data_date   data_1  data_2
    0  14303  2012-02-17  24.75   25.03
    1  12009  2012-02-16  25.00   25.07
    2  11830  2012-02-15  24.99   25.15
    3  6274   2012-02-14  24.68   25.05
    4  2302   2012-02-13  24.62   24.77
    5  14085  2012-02-10  24.38   24.61

I want to convert it to an array of tuples like:

    [(datetime.date(2012, 2, 17), 24.75, 25.03),
     (datetime.date(2012, 2, 16), 25.00, 25.07),
     ...etc. ]

Any suggestion on how I can efficiently do this?

User · Answer

Try this one:

    tuples = list(zip(data_set["data_date"], data_set["data_1"], data_set["data_2"]))
    print(tuples)
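The question asks for datetime.date values, while zipping the columns directly yields pandas Timestamp objects when the column has a datetime64 dtype. The following is a minimal sketch of one way to get dates, assuming data_date is stored as datetime64; the two-row sample frame is made up for illustration and only mirrors the question's columns.

```python
import datetime

import pandas as pd

# Hypothetical sample mirroring the first two rows of the question's data_set.
data_set = pd.DataFrame({
    "index": [14303, 12009],
    "data_date": pd.to_datetime(["2012-02-17", "2012-02-16"]),  # assumed datetime64 dtype
    "data_1": [24.75, 25.00],
    "data_2": [25.03, 25.07],
})

# .dt.date turns each Timestamp into a plain datetime.date before zipping.
tuples = list(zip(data_set["data_date"].dt.date,
                  data_set["data_1"],
                  data_set["data_2"]))
print(tuples)
```

If the column already holds strings or date objects, the .dt accessor is not applicable and the plain zip from the answer above is the one to use.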

User · Answer

    list(data_set.itertuples(index=False))

As of pandas 0.17.1, the above will return a list of namedtuples.

If you want a list of ordinary tuples, pass name=None as an argument:

    list(data_set.itertuples(index=False, name=None))
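To make the difference between the two calls concrete, here is a small sketch. The sample frame is an assumption that copies two rows from the question; only the itertuples calls come from the answer itself.

```python
import pandas as pd

# Hypothetical two-row sample with the columns the question wants to keep.
cols = pd.DataFrame({
    "data_date": pd.to_datetime(["2012-02-17", "2012-02-16"]),
    "data_1": [24.75, 25.00],
    "data_2": [25.03, 25.07],
})

# Default: each row is a namedtuple, so fields are reachable by column name.
named = list(cols.itertuples(index=False))
print(named[0].data_1, named[0]._fields)

# name=None: each row is an ordinary tuple, reachable by position only.
plain = list(cols.itertuples(index=False, name=None))
print(type(plain[0]), plain[0][1])
```

Namedtuples are still tuples, so either form can usually be passed to a database driver's batch insert; name=None just skips the extra wrapper.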

User · Answer

How about:

    subset = data_set[['data_date', 'data_1', 'data_2']]
    tuples = [tuple(x) for x in subset.to_numpy()]

For pandas < 0.24 use:

    tuples = [tuple(x) for x in subset.values]

User · Answer

Here's a vectorized approach (assuming the dataframe, data_set, to be defined as df instead) that returns a list of tuples as shown:

    >>> df.set_index(['data_date'])[['data_1', 'data_2']].to_records().tolist()

produces:

    [(datetime.datetime(2012, 2, 17, 0, 0), 24.75, 25.03),
     (datetime.datetime(2012, 2, 16, 0, 0), 25.0, 25.07),
     (datetime.datetime(2012, 2, 15, 0, 0), 24.99, 25.15),
     (datetime.datetime(2012, 2, 14, 0, 0), 24.68, 25.05),
     (datetime.datetime(2012, 2, 13, 0, 0), 24.62, 24.77),
     (datetime.datetime(2012, 2, 10, 0, 0), 24.38, 24.61)]

The idea of setting the datetime column as the index axis is to aid in the conversion of the Timestamp value to its corresponding datetime.datetime format equivalent by making use of the convert_datetime64 argument in DF.to_records, which does so for a DateTimeIndex dataframe.

This returns a recarray, which can then be made to return a list using .tolist().

A more generalized solution, depending on the use case, would be:

    df.to_records().tolist()    # Supply index=False to exclude index

User · Answer

Changing the data frame's list into a list of tuples:

    df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
    print(df)

OUTPUT

       col1  col2
    0     1     4
    1     2     5
    2     3     6

    records = df.to_records(index=False)
    result = list(records)
    print(result)

OUTPUT

    [(1, 4), (2, 5), (3, 6)]
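One detail worth checking if the result is handed to a database driver: list(records) prints like a list of tuples, but its elements are numpy.record objects rather than built-in tuples. The sketch below, which reuses the small col1/col2 frame from the answer above, shows the difference and uses .tolist() to get plain Python tuples.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
records = df.to_records(index=False)

# Iterating the recarray yields numpy.record objects that only look like tuples.
as_records = list(records)
print(type(as_records[0]))

# .tolist() converts every row to a built-in tuple of Python scalars.
as_tuples = records.tolist()
print(type(as_tuples[0]), as_tuples)
```

Whether a given driver accepts numpy.record rows is an open question that depends on the driver, so .tolist() is the safer choice when in doubt.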

User · Answer

A generic way:

    [tuple(x) for x in data_set.to_records(index=False)]

User · Answer

This answer doesn't add any answers that aren't already discussed, but here are some speed results. I think this should resolve questions that came up in the comments. All of these look like they are O(n), based on these three values.

TL;DR: tuples = list(df.itertuples(index=False, name=None)) and tuples = list(zip(*[df[c].values.tolist() for c in df])) are tied for the fastest.

I did a quick speed test on results for three suggestions here:

1. The zip answer from @pirsquared: tuples = list(zip(*[df[c].values.tolist() for c in df]))
2. The accepted answer from @wes-mckinney: tuples = [tuple(x) for x in df.values]
3. The itertuples answer from @ksindi with the name=None suggestion from @Axel: tuples = list(df.itertuples(index=False, name=None))

    from numpy import random
    import pandas as pd

    def create_random_df(n):
        return pd.DataFrame({"A": random.randint(n, size=n), "B": random.randint(n, size=n)})

Small size:

    df = create_random_df(10000)
    %timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
    %timeit tuples = [tuple(x) for x in df.values]
    %timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

    1.66 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    15.5 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
    1.74 ms ± 75.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Larger:

    df = create_random_df(1000000)
    %timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
    %timeit tuples = [tuple(x) for x in df.values]
    %timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

    202 ms ± 5.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    1.52 s ± 98.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    209 ms ± 11.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

As much patience as I have:

    df = create_random_df(10000000)
    %timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
    %timeit tuples = [tuple(x) for x in df.values]
    %timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

    1.78 s ± 118 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    15.4 s ± 222 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    1.68 s ± 96.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The zip version and the itertuples version are within the confidence intervals of each other. I suspect that they are doing the same thing under the hood.

These speed tests are probably irrelevant though. Pushing the limits of my computer's memory doesn't take a huge amount of time, and you really shouldn't be doing this on a large data set. Working with those tuples after doing this will end up being really inefficient. It's unlikely to be a major bottleneck in your code, so just stick with the version you think is most readable.
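The %timeit lines above only run inside IPython or Jupyter. For a plain script, a rough equivalent can be sketched with the standard-library timeit module. The candidates dictionary, the seed argument, and the repeat counts below are illustrative choices rather than part of the original test, and absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def create_random_df(n, seed=0):
    # Seeded generator so repeated runs build the same frame (an added assumption).
    rng = np.random.default_rng(seed)
    return pd.DataFrame({"A": rng.integers(n, size=n), "B": rng.integers(n, size=n)})


# The three approaches compared in the answer above.
candidates = {
    "zip": lambda df: list(zip(*[df[c].values.tolist() for c in df])),
    "values": lambda df: [tuple(x) for x in df.values],
    "itertuples": lambda df: list(df.itertuples(index=False, name=None)),
}

df = create_random_df(10_000)

# Sanity check first: every approach should produce the same rows.
results = {name: fn(df) for name, fn in candidates.items()}

# Best of 3 repeats, 5 calls each; report seconds per call.
timings = {
    name: min(timeit.repeat(lambda: fn(df), number=5, repeat=3)) / 5
    for name, fn in candidates.items()
}
for name, seconds in timings.items():
    print(f"{name:>10}: {seconds * 1e3:.2f} ms per call")
```

Taking the minimum over repeats is a common way to reduce noise from other processes; it is not the mean ± std. dev. that %timeit reports, so the numbers are not directly comparable with the ones quoted above.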

User · Answer

Motivation

Many data sets are large enough that we need to concern ourselves with speed/efficiency. So I offer this solution in that spirit. It happens to also be succinct.

For the sake of comparison, let's drop the index column:

    df = data_set.drop(columns='index')

Solution

I'll propose the use of zip and map:

    list(zip(*map(df.get, df)))

    [('2012-02-17', 24.75, 25.03),
     ('2012-02-16', 25.0, 25.07),
     ('2012-02-15', 24.99, 25.15),
     ('2012-02-14', 24.68, 25.05),
     ('2012-02-13', 24.62, 24.77),
     ('2012-02-10', 24.38, 24.61)]

It happens to also be flexible if we wanted to deal with a specific subset of columns. We'll assume the columns we've already displayed are the subset we want.

    list(zip(*map(df.get, ['data_date', 'data_1', 'data_2'])))

    [('2012-02-17', 24.75, 25.03),
     ('2012-02-16', 25.0, 25.07),
     ('2012-02-15', 24.99, 25.15),
     ('2012-02-14', 24.68, 25.05),
     ('2012-02-13', 24.62, 24.77),
     ('2012-02-10', 24.38, 24.61)]

What is Quicker?

Turns out records is quickest, followed by asymptotically converging zipmap and iter_tuples.

I'll use a library, simple_benchmark, that I got from this post.

    from simple_benchmark import BenchmarkBuilder
    b = BenchmarkBuilder()

    import pandas as pd
    import numpy as np

    def tuple_comp(df): return [tuple(x) for x in df.to_numpy()]
    def iter_namedtuples(df): return list(df.itertuples(index=False))
    def iter_tuples(df): return list(df.itertuples(index=False, name=None))
    def records(df): return df.to_records(index=False).tolist()
    def zipmap(df): return list(zip(*map(df.get, df)))

    funcs = [tuple_comp, iter_namedtuples, iter_tuples, records, zipmap]
    for func in funcs:
        b.add_function()(func)

    def creator(n):
        # np.random is used here because only numpy (as np) is imported above
        return pd.DataFrame({"A": np.random.randint(n, size=n), "B": np.random.randint(n, size=n)})

    @b.add_arguments('Rows in DataFrame')
    def argument_provider():
        # row counts from 10**2 to 10**5 in half-decade steps
        for n in (10 ** (np.arange(4, 11) / 2)).astype(int):
            yield n, creator(n)

    r = b.run()

Check the results (each row is divided by its fastest time, so 1.000000 marks the quickest method for that size):

    r.to_pandas_dataframe().pipe(lambda d: d.div(d.min(axis=1), axis=0))

            tuple_comp  iter_namedtuples  iter_tuples   records    zipmap
    100       2.905662          6.626308     3.450741  1.469471  1.000000
    316       4.612692          4.814433     2.375874  1.096352  1.000000
    1000      6.513121          4.106426     1.958293  1.000000  1.316303
    3162      8.446138          4.082161     1.808339  1.000000  1.533605
    10000     8.424483          3.621461     1.651831  1.000000  1.558592
    31622     7.813803          3.386592     1.586483  1.000000  1.515478
    100000    7.050572          3.162426     1.499977  1.000000  1.480131

    r.plot()
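For readers unfamiliar with the zip/map idiom in this answer, the sketch below unpacks it into separate steps. The two-row frame and the intermediate names labels and columns are made up for illustration; dates are kept as strings to match the output shown in the answer.

```python
import pandas as pd

# Hypothetical two-row sample.
df = pd.DataFrame({
    "data_date": ["2012-02-17", "2012-02-16"],
    "data_1": [24.75, 25.0],
    "data_2": [25.03, 25.07],
})

# Iterating a DataFrame yields its column labels...
labels = list(df)

# ...and df.get(label) returns the matching column as a Series.
columns = [df.get(label) for label in labels]

# zip(*columns) walks all columns in lockstep, emitting one tuple per row.
rows = list(zip(*columns))
print(rows)

# The one-liner from the answer is the same thing written compactly.
same_rows = list(zip(*map(df.get, df)))
```

Because the work is a transpose of whole columns rather than a per-row lookup, this form avoids building an intermediate 2-D array, which may be part of why it benchmarks well on small frames.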

User · Answer

More pythonic way:

    df = data_set[['data_date', 'data_1', 'data_2']]
    list(map(tuple, df.values))  # on Python 3, map returns a lazy iterator, so wrap it in list()

User · Answer

The most efficient and easy way:

    list(data_set.to_records())

You can filter the columns you need before this call.
