How to convert a pandas DataFrame subset of columns AND rows into a numpy array

Question

I'm wondering if there is a simpler, memory-efficient way to select a subset of rows and columns from a pandas DataFrame. For instance, given this DataFrame:

    import numpy as np
    from pandas import DataFrame

    df = DataFrame(np.random.rand(4,5), columns = list('abcde'))
    print(df)

              a         b         c         d         e
    0  0.945686  0.000710  0.909158  0.892892  0.326670
    1  0.919359  0.667057  0.462478  0.008204  0.473096
    2  0.976163  0.621712  0.208423  0.980471  0.048334
    3  0.459039  0.788318  0.309892  0.100539  0.753992

I want only those rows in which the value for column 'c' is greater than 0.5, but I only need columns 'b' and 'e' for those rows.

This is the method I've come up with; perhaps there is a better "pandas" way?

    locs = [df.columns.get_loc(_) for _ in ['a', 'd']]
    print(df[df.c > 0.5][locs])

              a         d
    0  0.945686  0.892892

My final goal is to convert the result to a NumPy array to pass into a scikit-learn regression algorithm, so I will use the code above like this:

    from numpy import array

    training_set = array(df[df.c > 0.5][locs])

...and that peeves me, since I end up with a huge array copy in memory. Perhaps there's a better way for that too?

Answer

Use its .values attribute directly:

    In [79]: df[df.c > 0.5][['b', 'e']].values
    Out[79]:
    array([[ 0.98836259,  0.82403141],
           [ 0.337358  ,  0.02054435],
           [ 0.29271728,  0.37813099],
           [ 0.70033513,  0.69919695]])
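A minimal, reproducible version of the same selection, assuming pandas 0.24 or later, where DataFrame.to_numpy() exists as the recommended replacement for .values. The frame below uses made-up, hand-written numbers instead of np.random.rand so the filtered result is predictable:

```python
import numpy as np
import pandas as pd

# Small fixed frame with made-up values so the filtered result is deterministic.
df = pd.DataFrame(
    {
        "a": [0.1, 0.2, 0.3, 0.4],
        "b": [1.0, 2.0, 3.0, 4.0],
        "c": [0.9, 0.1, 0.7, 0.2],
        "d": [5.0, 6.0, 7.0, 8.0],
        "e": [10.0, 20.0, 30.0, 40.0],
    }
)

# Keep rows where 'c' > 0.5, then keep only columns 'b' and 'e'.
subset = df[df.c > 0.5][["b", "e"]]

# Convert to a NumPy array; same contents as subset.values.
arr = subset.to_numpy()

# Rows 0 and 2 pass the filter, so arr holds [[1.0, 10.0], [3.0, 30.0]].
print(arr.shape, arr.dtype)
```

The filtering step itself already builds a new DataFrame, so the conversion at the end does not change how many times the selected data is copied during selection; it only changes the container type.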

Answer

Perhaps something like this for the first problem; you can simply access the columns by their names:

    >>> df = pd.DataFrame(np.random.rand(4,5), columns = list('abcde'))
    >>> df[df['c']>.5][['b','e']]
              b         e
    1  0.071146  0.132145
    2  0.495152  0.420219

For the second problem:

    >>> df[df['c']>.5][['b','e']].values
    array([[ 0.07114556,  0.13214495],
           [ 0.49515157,  0.42021946]])
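To connect this to the question's get_loc approach, here is a sketch showing that selecting columns by name and selecting them by integer position return the same data. It assumes a reasonably recent pandas and uses a made-up deterministic frame; the positional form is written with .iloc, and the row mask is passed to .iloc as a plain boolean array because .iloc expects an array rather than a boolean Series:

```python
import numpy as np
import pandas as pd

# Made-up values: entry (i, j) is (5*i + j) / 20, columns labelled a..e.
df = pd.DataFrame(
    np.arange(20, dtype=float).reshape(4, 5) / 20,
    columns=list("abcde"),
)

# Rows 2 and 3 have c > 0.5 (c = 0.6 and 0.85).
mask = df["c"] > 0.5

# Label-based: pick the columns by name, as this answer suggests.
by_name = df[mask][["b", "e"]]

# Position-based, mirroring the question's get_loc idea: look up the
# integer positions of the wanted columns and select them with .iloc.
cols = [df.columns.get_loc(name) for name in ["b", "e"]]
by_position = df.iloc[mask.to_numpy(), cols]

# Both routes select the same rows and columns with the same values.
pd.testing.assert_frame_equal(by_name, by_position)
print(by_name.to_numpy())
```

Selecting by name is usually the clearer choice: it does not depend on column order, and it avoids the extra get_loc lookup.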

Answer

.loc accepts row and column selectors simultaneously (as does .iloc; the old .ix indexer did too, but it was removed in pandas 1.0). This is done in a single pass as well.

    In [1]: df = DataFrame(np.random.rand(4,5), columns = list('abcde'))

    In [2]: df
    Out[2]:
              a         b         c         d         e
    0  0.669701  0.780497  0.955690  0.451573  0.232194
    1  0.952762  0.585579  0.890801  0.643251  0.556220
    2  0.900713  0.790938  0.952628  0.505775  0.582365
    3  0.994205  0.330560  0.286694  0.125061  0.575153

    In [5]: df.loc[df['c']>0.5,['a','d']]
    Out[5]:
              a         d
    0  0.669701  0.451573
    1  0.952762  0.643251
    2  0.900713  0.505775

And if you want the values (though the frame should pass directly to sklearn as is, since DataFrames support the NumPy array interface):

    In [6]: df.loc[df['c']>0.5,['a','d']].values
    Out[6]:
    array([[ 0.66970138,  0.45157274],
           [ 0.95276167,  0.64325143],
           [ 0.90071271,  0.50577509]])
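A small sketch of the single-step .loc selection on a made-up deterministic frame. It checks that .loc[row_mask, column_list] gives the same result as the chained df[mask][cols] form, and that np.asarray turns the selected frame into an ndarray without calling .values explicitly; this relies on the array interface mentioned above, which is how NumPy-based libraries can accept a DataFrame directly:

```python
import numpy as np
import pandas as pd

# Made-up data: 4 rows x 5 columns labelled a..e.
df = pd.DataFrame(
    np.arange(20, dtype=float).reshape(4, 5) / 20,
    columns=list("abcde"),
)

# One .loc call: boolean row selector and column list together.
single = df.loc[df["c"] > 0.5, ["a", "d"]]

# Chained version for comparison: filter rows first, then pick columns.
chained = df[df["c"] > 0.5][["a", "d"]]
pd.testing.assert_frame_equal(single, chained)

# DataFrames implement the NumPy array interface, so np.asarray converts
# the selection to a plain ndarray (same contents as single.to_numpy()).
arr = np.asarray(single)
print(type(arr).__name__, arr.shape)  # ndarray (2, 2)
```

The .loc form avoids building the intermediate row-filtered frame that the chained form creates before picking columns.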
