Convert pandas dataframe to NumPy array

Question

I am interested in knowing how to convert a pandas dataframe into a NumPy array.

dataframe:

    import numpy as np
    import pandas as pd

    index = [1, 2, 3, 4, 5, 6, 7]
    a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1]
    b = [0.2, np.nan, 0.2, 0.2, 0.2, np.nan, np.nan]
    c = [np.nan, 0.5, 0.5, np.nan, 0.5, 0.5, np.nan]
    df = pd.DataFrame({'A': a, 'B': b, 'C': c}, index=index)
    df = df.rename_axis('ID')

gives

    label   A    B    C
    ID
    1     NaN  0.2  NaN
    2     NaN  NaN  0.5
    3     NaN  0.2  0.5
    4     0.1  0.2  NaN
    5     0.1  0.2  0.5
    6     0.1  NaN  0.5
    7     0.1  NaN  NaN

I would like to convert this to a NumPy array, as so:

    array([[ nan,  0.2,  nan],
           [ nan,  nan,  0.5],
           [ nan,  0.2,  0.5],
           [ 0.1,  0.2,  nan],
           [ 0.1,  0.2,  0.5],
           [ 0.1,  nan,  0.5],
           [ 0.1,  nan,  nan]])

How can I do this?

As a bonus, is it possible to preserve the dtypes, like this?

    array([[ 1, nan,  0.2,  nan],
           [ 2, nan,  nan,  0.5],
           [ 3, nan,  0.2,  0.5],
           [ 4, 0.1,  0.2,  nan],
           [ 5, 0.1,  0.2,  0.5],
           [ 6, 0.1,  nan,  0.5],
           [ 7, 0.1,  nan,  nan]],
          dtype=[('ID', '<i4'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

or similar?

User · Answer

I just had a similar problem when exporting from a dataframe to an ArcGIS table and stumbled on a solution from USGS (https://my.usgs.gov/confluence/display/cdi/pandas.DataFrame+to+ArcGIS+Table). In short, your problem has a similar solution:

    df

          A    B    C
    ID
    1   NaN  0.2  NaN
    2   NaN  NaN  0.5
    3   NaN  0.2  0.5
    4   0.1  0.2  NaN
    5   0.1  0.2  0.5
    6   0.1  NaN  0.5
    7   0.1  NaN  NaN

    np_data = np.array(np.rec.fromrecords(df.values))
    np_names = df.dtypes.index.tolist()
    np_data.dtype.names = tuple([name.encode('UTF8') for name in np_names])

    np_data

    array([( nan,  0.2,  nan), ( nan,  nan,  0.5), ( nan,  0.2,  0.5),
           ( 0.1,  0.2,  nan), ( 0.1,  0.2,  0.5), ( 0.1,  nan,  0.5),
           ( 0.1,  nan,  nan)],
          dtype=(numpy.record, [('A', '<f8'), ('B', '<f8'), ('C', '<f8')]))

User · Answer

A simpler way, for an example DataFrame:

    df

             gbm       nnet        reg
    0  12.097439  12.047437  12.100953
    1  12.109811  12.070209  12.095288
    2  11.720734  11.622139  11.740523
    3  11.824557  11.926414  11.926527
    4  11.800868  11.727730  11.729737
    5  12.490984  12.502440  12.530894

USE:

    np.array(df.to_records().view(type=np.matrix))

GET:

    array([[(0, 12.097439, 12.047437, 12.10095324),
            (1, 12.10981081, 12.070209, 12.09528824),
            (2, 11.72073428, 11.622139, 11.74052253),
            (3, 11.82455653, 11.926414, 11.92652727),
            (4, 11.80086775, 11.72773, 11.72973699),
            (5, 12.49098389, 12.50244, 12.53089367)]],
          dtype=(numpy.record, [('index', '<i8'), ('gbm', '<f8'), ('nnet', '<f4'),
                 ('reg', '<f8')]))

User · Answer

df.to_numpy() is better than df.values, here's why.

It's time to deprecate your usage of values and as_matrix().

pandas v0.24.0 introduced two new methods for obtaining NumPy arrays from pandas objects:

1. to_numpy(), which is defined on Index, Series, and DataFrame objects, and
2. array, which is defined on Index and Series objects only.

If you visit the v0.24 docs for .values, you will see a big red warning that says:

    Warning: We recommend using DataFrame.to_numpy() instead.

See the v0.24.0 release notes for more information.

Note: to_numpy() is my recommended method for any production code that needs to run reliably for many versions into the future. However, if you're just making a scratchpad in Jupyter or the terminal, using .values to save a few milliseconds of typing is a permissible exception. You can always add the fit and finish later.

Towards Better Consistency: to_numpy()

In the spirit of better consistency throughout the API, a new method to_numpy has been introduced to extract the underlying NumPy array from DataFrames.

    # Setup
    df = pd.DataFrame(data={'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                      index=['a', 'b', 'c'])

    # Convert the entire DataFrame
    df.to_numpy()
    # array([[1, 4, 7],
    #        [2, 5, 8],
    #        [3, 6, 9]])

    # Convert specific columns
    df[['A', 'C']].to_numpy()
    # array([[1, 7],
    #        [2, 8],
    #        [3, 9]])

As mentioned above, this method is also defined on Index and Series objects.

    df.index.to_numpy()
    # array(['a', 'b', 'c'], dtype=object)

    df['A'].to_numpy()
    # array([1, 2, 3])

By default no copy is forced, so for a single-dtype DataFrame like this one the result can be a view, and modifications made to it can affect the original.

    v = df.to_numpy()
    v[0, 0] = -1

    df
       A  B  C
    a -1  4  7
    b  2  5  8
    c  3  6  9

If you need a copy instead, use to_numpy(copy=True).

pandas >= 1.0 update for ExtensionTypes

If you're using pandas 1.x, chances are you'll be dealing with extension types a lot more. You'll have to be a little more careful that these extension types are correctly converted.

    a = pd.array([1, 2, None], dtype="Int64")
    a
    # <IntegerArray>
    # [1, 2, <NA>]
    # Length: 3, dtype: Int64

    # Wrong
    a.to_numpy()
    # array([1, 2, <NA>], dtype=object)  # yuck, objects

    # Correct
    a.to_numpy(dtype='float', na_value=np.nan)
    # array([ 1.,  2., nan])

    # Also correct
    a.to_numpy(dtype='int', na_value=-1)
    # array([ 1,  2, -1])

This is called out in the docs.

If you need the dtypes in the result...

As shown in another answer, DataFrame.to_records is a good way to do this.

    df.to_records()
    # rec.array([('a', 1, 4, 7), ('b', 2, 5, 8), ('c', 3, 6, 9)],
    #           dtype=[('index', 'O'), ('A', '<i8'), ('B', '<i8'), ('C', '<i8')])

This cannot be done with to_numpy, unfortunately. However, as an alternative, you can use np.rec.fromrecords:

    v = df.reset_index()
    np.rec.fromrecords(v, names=v.columns.tolist())
    # rec.array([('a', 1, 4, 7), ('b', 2, 5, 8), ('c', 3, 6, 9)],
    #           dtype=[('index', '<U1'), ('A', '<i8'), ('B', '<i8'), ('C', '<i8')])

Performance-wise, it's nearly the same (actually, using rec.fromrecords is a bit faster).

    df2 = pd.concat([df] * 10000)

    %timeit df2.to_records()
    %%timeit
    v = df2.reset_index()
    np.rec.fromrecords(v, names=v.columns.tolist())

    12.9 ms ± 511 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    9.56 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Rationale for Adding a New Method

to_numpy() (in addition to array) was added as a result of discussions under two GitHub issues, GH19954 and GH23623.

Specifically, the docs mention the rationale:

    [...] with .values it was unclear whether the returned value would be the actual array, some transformation of it, or one of pandas custom arrays (like Categorical). For example, with PeriodIndex, .values generates a new ndarray of period objects each time. [...]

to_numpy aims to improve the consistency of the API, which is a major step in the right direction. .values will not be deprecated in the current version, but I expect this may happen at some point in the future, so I would urge users to migrate towards the newer API as soon as they can.

Critique of Other Solutions

DataFrame.values has inconsistent behaviour, as already noted.

DataFrame.get_values() is simply a wrapper around DataFrame.values, so everything said above applies.

DataFrame.as_matrix() is deprecated now, do NOT use!
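Whether to_numpy() hands back a view depends on the frame's dtypes and on the pandas version, so it may be safer to check than to assume. The following is a minimal sketch, assuming a small single-dtype frame like the one in the setup above; the variable names are illustrative and not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=["a", "b", "c"])

# copy=True always returns an array that is independent of the frame.
independent = df.to_numpy(copy=True)
independent[0, 0] = -1

# The frame is untouched by edits made to the copy.
original_value = df.iloc[0, 0]
print(original_value)

# Without copy=True the result may or may not share memory with the frame;
# np.shares_memory reports what actually happened on this installation.
shares = np.shares_memory(df.to_numpy(), df.to_numpy())
print(shares)
```

If the array is going to be modified, requesting copy=True up front avoids depending on that behaviour.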

User · Answer

Try this:

    np.array(df)

    array([['ID', nan, nan, nan],
           ['1', nan, 0.2, nan],
           ['2', nan, nan, 0.5],
           ['3', nan, 0.2, 0.5],
           ['4', 0.1, 0.2, nan],
           ['5', 0.1, 0.2, 0.5],
           ['6', 0.1, nan, 0.5],
           ['7', 0.1, nan, nan]], dtype=object)

Some more information at: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

Valid for numpy 1.16.5 and pandas 0.25.2.

User · Answer

Two ways to convert the data frame to its NumPy array representation:

    mah_np_array = df.as_matrix(columns=None)  # deprecated since pandas 0.23
    mah_np_array = df.values

Doc: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.as_matrix.html

User · Answer

Note: The .as_matrix() method used in this answer is deprecated. Pandas 0.23.4 warns:

    Method .as_matrix will be removed in a future version. Use .values instead.

Pandas has something built in...

    numpy_matrix = df.as_matrix()

gives

    array([[nan, 0.2, nan],
           [nan, nan, 0.5],
           [nan, 0.2, 0.5],
           [0.1, 0.2, nan],
           [0.1, 0.2, 0.5],
           [0.1, nan, 0.5],
           [0.1, nan, nan]])
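On current pandas releases as_matrix() is no longer available (it was removed in pandas 1.0), so the same call can be written with to_numpy(). A minimal sketch on the question's data, with a column subset standing in for the old columns= argument:

```python
import numpy as np
import pandas as pd

index = [1, 2, 3, 4, 5, 6, 7]
a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1]
b = [0.2, np.nan, 0.2, 0.2, 0.2, np.nan, np.nan]
c = [np.nan, 0.5, 0.5, np.nan, 0.5, 0.5, np.nan]
df = pd.DataFrame({"A": a, "B": b, "C": c}, index=index).rename_axis("ID")

# Whole frame, in place of df.as_matrix()
arr = df.to_numpy()

# Column subset, in place of df.as_matrix(columns=["A", "C"])
arr_ac = df[["A", "C"]].to_numpy()

print(arr.shape, arr.dtype)  # (7, 3) float64
```

The index is not part of the result; call reset_index() first if the ID values are needed as a column.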

User · Answer

I would just chain the DataFrame.reset_index() and DataFrame.values functions to get the NumPy representation of the dataframe, including the index:

    In [8]: df
    Out[8]:
              A         B         C
    0 -0.982726  0.150726  0.691625
    1  0.617297 -0.471879  0.505547
    2  0.417123 -1.356803 -1.013499
    3 -0.166363 -0.957758  1.178659
    4 -0.164103  0.074516 -0.674325
    5 -0.340169 -0.293698  1.231791
    6 -1.062825  0.556273  1.508058
    7  0.959610  0.247539  0.091333

    [8 rows x 3 columns]

    In [9]: df.reset_index().values
    Out[9]:
    array([[ 0.        , -0.98272574,  0.150726  ,  0.69162512],
           [ 1.        ,  0.61729734, -0.47187926,  0.50554728],
           [ 2.        ,  0.4171228 , -1.35680324, -1.01349922],
           [ 3.        , -0.16636303, -0.95775849,  1.17865945],
           [ 4.        , -0.16410334,  0.0745164 , -0.67432474],
           [ 5.        , -0.34016865, -0.29369841,  1.23179064],
           [ 6.        , -1.06282542,  0.55627285,  1.50805754],
           [ 7.        ,  0.95961001,  0.24753911,  0.09133339]])

To get the dtypes we'd need to transform this ndarray into a structured array using view:

    In [10]: df.reset_index().values.ravel().view(dtype=[('index', int), ('A', float), ('B', float), ('C', float)])
    Out[10]:
    array([( 0, -0.98272574,  0.150726  ,  0.69162512),
           ( 1,  0.61729734, -0.47187926,  0.50554728),
           ( 2,  0.4171228 , -1.35680324, -1.01349922),
           ( 3, -0.16636303, -0.95775849,  1.17865945),
           ( 4, -0.16410334,  0.0745164 , -0.67432474),
           ( 5, -0.34016865, -0.29369841,  1.23179064),
           ( 6, -1.06282542,  0.55627285,  1.50805754),
           ( 7,  0.95961001,  0.24753911,  0.09133339)],
          dtype=[('index', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
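One caveat with the view approach above: view() reinterprets the existing bytes rather than converting values, and reset_index().values has already upcast an integer index to float64, so an integer field read through the view may not hold the index values. A small sketch, assuming a toy frame with a default integer index (the frame and names are illustrative), compares it with to_records(), which converts column by column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.5, 1.5, 2.5], "B": [3.0, 4.0, 5.0]})

flat = df.reset_index().values  # integer index is upcast, dtype is float64
target = [("index", "i8"), ("A", "f8"), ("B", "f8")]

# view() reads the float64 bytes as int64; it does not convert the values.
viewed = flat.ravel().view(target)

# to_records() keeps each column's own dtype.
records = np.asarray(df.to_records())

print(viewed["index"])   # bit patterns of 0.0, 1.0, 2.0 read as integers
print(records["index"])  # the actual index values
```

The float fields come through the view intact, because their bytes already are float64; only fields whose target dtype differs from the flattened array's dtype are affected.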

User · Answer

Further to meteore's answer, I found that the code

    df.index = df.index.astype('i8')

doesn't work for me. So I put my code here for the convenience of others stuck with this issue.

    city_cluster_df = pd.read_csv(text_filepath, encoding='utf-8')
    # the field 'city_en' is a string; when converted to a NumPy array, it will be an object
    city_cluster_arr = city_cluster_df[['city_en', 'lat', 'lon', 'cluster', 'cluster_filtered']].to_records()
    descr = city_cluster_arr.dtype.descr
    # change the field 'city_en' to string type
    # (the index for 'city_en' here is 1 because the field before it is the row index of the dataframe)
    descr[1] = (descr[1][0], "S20")
    newArr = city_cluster_arr.astype(np.dtype(descr))

User · Answer

You can use the to_records method, but you have to play around a bit with the dtypes if they are not what you want from the get-go. In my case, having copied your DF from a string, the index type is string (represented by an object dtype in pandas):

    In [102]: df
    Out[102]:
    label    A    B    C
    ID
    1      NaN  0.2  NaN
    2      NaN  NaN  0.5
    3      NaN  0.2  0.5
    4      0.1  0.2  NaN
    5      0.1  0.2  0.5
    6      0.1  NaN  0.5
    7      0.1  NaN  NaN

    In [103]: df.index.dtype
    Out[103]: dtype('object')

    In [104]: df.to_records()
    Out[104]:
    rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
           (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
           (7, 0.1, nan, nan)],
          dtype=[('index', '|O8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

    In [106]: df.to_records().dtype
    Out[106]: dtype([('index', '|O8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

Converting the recarray dtype does not work for me, but one can do this in Pandas already:

    In [109]: df.index = df.index.astype('i8')

    In [111]: df.to_records().view([('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
    Out[111]:
    rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
           (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
           (7, 0.1, nan, nan)],
          dtype=[('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

Note that Pandas does not set the name of the index properly (to ID) in the exported record array (a bug?), so we profit from the type conversion to also correct for that.

At the moment Pandas has only 8-byte integers, i8, and floats, f8.
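Since pandas 0.24, to_records() accepts index_dtypes and column_dtypes arguments, which may make the manual view step above unnecessary. A minimal sketch, assuming a recent pandas and a shortened two-row version of the question's frame (the int32 choice mirrors the '<i4' the question asked for):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, np.nan], "C": [np.nan, 0.5]},
    index=[1, 2],
).rename_axis("ID")

# index_dtypes sets the dtype of the index field in the record array.
rec = df.to_records(index_dtypes="int32")

print(rec.dtype.names)   # ('ID', 'A', 'B', 'C')
print(rec.dtype["ID"])   # int32
```

Here the index field is named after the index ('ID'), so no renaming is needed in this sketch.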

User · Answer

A simple way to convert a dataframe to a NumPy array:

    import pandas as pd
    df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
    df_to_array = df.to_numpy()

    array([[1, 3],
           [2, 4]])

Use of to_numpy is encouraged to preserve consistency.

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html

User · Answer

Here is my approach to making a structured array from a pandas DataFrame.

Create the data frame:

    import pandas as pd
    import numpy as np
    import six

    NaN = float('nan')
    ID = [1, 2, 3, 4, 5, 6, 7]
    A = [NaN, NaN, NaN, 0.1, 0.1, 0.1, 0.1]
    B = [0.2, NaN, 0.2, 0.2, 0.2, NaN, NaN]
    C = [NaN, 0.5, 0.5, NaN, 0.5, 0.5, NaN]
    columns = {'A': A, 'B': B, 'C': C}
    df = pd.DataFrame(columns, index=ID)
    df.index.name = 'ID'
    print(df)

          A    B    C
    ID
    1   NaN  0.2  NaN
    2   NaN  NaN  0.5
    3   NaN  0.2  0.5
    4   0.1  0.2  NaN
    5   0.1  0.2  0.5
    6   0.1  NaN  0.5
    7   0.1  NaN  NaN

Define a function to make a NumPy structured array (not a record array) from a pandas DataFrame:

    def df_to_sarray(df):
        """
        Convert a pandas DataFrame object to a numpy structured array.
        This is functionally equivalent to but more efficient than
        np.array(df.to_array())

        :param df: the data frame to convert
        :return: a numpy structured array representation of df
        """

        v = df.values
        cols = df.columns

        if six.PY2:  # python 2 needs .encode() but 3 does not
            types = [(cols[i].encode(), df[k].dtype.type) for (i, k) in enumerate(cols)]
        else:
            types = [(cols[i], df[k].dtype.type) for (i, k) in enumerate(cols)]
        dtype = np.dtype(types)
        z = np.zeros(v.shape[0], dtype)
        for (i, k) in enumerate(z.dtype.names):
            z[k] = v[:, i]
        return z

Use reset_index to make a new data frame that includes the index as part of its data. Convert that data frame to a structured array:

    sa = df_to_sarray(df.reset_index())
    sa

    array([(1L, nan, 0.2, nan), (2L, nan, nan, 0.5), (3L, nan, 0.2, 0.5),
           (4L, 0.1, 0.2, nan), (5L, 0.1, 0.2, 0.5), (6L, 0.1, nan, 0.5),
           (7L, 0.1, nan, nan)],
          dtype=[('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

EDIT: Updated df_to_sarray to avoid an error calling .encode() with Python 3. Thanks to Joseph Garvin and halcyon for their comment and solution.
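For code that only needs to run on Python 3, the same idea can be written without six. This is a sketch rather than the author's function: it assumes every column has a plain NumPy dtype (extension dtypes such as Int64 are not handled) and fills each field from its own column instead of from df.values, so integer columns are not routed through a float array first.

```python
import numpy as np
import pandas as pd


def df_to_sarray(df):
    """Build a structured array with one field per column (NumPy dtypes only)."""
    dtype = np.dtype([(str(name), df[name].dtype) for name in df.columns])
    out = np.empty(len(df), dtype=dtype)
    for name in df.columns:
        # Copy each column into its field, keeping that column's dtype.
        out[str(name)] = df[name].to_numpy()
    return out


df = pd.DataFrame({"A": [np.nan, 0.1], "B": [0.2, np.nan]}, index=[1, 2])
df.index.name = "ID"

sa = df_to_sarray(df.reset_index())
print(sa.dtype.names)  # ('ID', 'A', 'B')
```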

User · Answer

To convert a pandas dataframe (df) to a NumPy ndarray, use this code:

    df.values

    array([[nan, 0.2, nan],
           [nan, nan, 0.5],
           [nan, 0.2, 0.5],
           [0.1, 0.2, nan],
           [0.1, 0.2, 0.5],
           [0.1, nan, 0.5],
           [0.1, nan, nan]])

User · Answer

Try this:

    a = numpy.asarray(df)

User · Answer

It seems like df.to_records() will work for you. The exact feature you're looking for was requested, and to_records was pointed to as an alternative.

I tried this out locally using your example, and that call yields something very similar to the output you were looking for:

    rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
           (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
           (7, 0.1, nan, nan)],
          dtype=[(u'ID', '<i8'), (u'A', '<f8'), (u'B', '<f8'), (u'C', '<f8')])

Note that this is a recarray rather than an array. You could move the result into a regular NumPy array by calling its constructor as np.array(df.to_records()).
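The difference between the recarray and a regular array can be seen by checking the types directly. A minimal sketch, assuming a toy two-row frame; np.asarray is used here in place of the np.array call mentioned above because it does not force a copy:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.1, 0.2], "B": [0.3, 0.4]}, index=[1, 2]).rename_axis("ID")

rec = df.to_records()   # numpy.recarray: fields are also reachable as rec.A
arr = np.asarray(rec)   # plain ndarray with the same named fields

print(type(rec).__name__, type(arr).__name__)  # recarray ndarray
print(arr["A"])
```

Both objects support field access by name (arr["A"]); only the recarray adds attribute-style access.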

User · Answer

I went through the answers above. The as_matrix() method works, but it is obsolete now. For me, what worked was .to_numpy().

This returns a multidimensional array. I prefer this method when reading data from an Excel sheet and needing to access data at any index. Hope this helps.
