How to take column-slices of dataframe in pandas

Question

I load some machine learning data from a CSV file. The first 2 columns are observations and the remaining columns are features.

Currently, I do the following:

data = pandas.read_csv('mydata.csv')

which gives something like:

data = pandas.DataFrame(np.random.rand(10,5), columns = list('abcde'))

I'd like to slice this dataframe in two dataframes: one containing the columns a and b and one containing the columns c, d and e.

It is not possible to write something like

observations = data[:'c']
features = data['c':]

I'm not sure what the best method is. Do I need a pd.Panel?

By the way, I find dataframe indexing pretty inconsistent: data['a'] is permitted, but data[0] is not. On the other side, data['a':] is not permitted but data[0:] is. Is there a practical reason for this? This is really confusing if columns are indexed by Int, given that data[0] != data[0:1]

User · Answer

And if you came here looking for slicing two ranges of columns and combining them together (like me) you can do something like

op = df[list(df.columns[0:899]) + list(df.columns[3593:])]
print(op)

This will create a new dataframe with the first 899 columns (positions 0 to 898) and all columns from position 3593 onward (assuming you have some 4000 columns in your data set). Use df.columns[0:900] if you want the first 900.
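If you prefer to stay purely positional, NumPy's `np.r_` can build one array of positions from several slices. A minimal sketch on a hypothetical 10-column frame (the names `c0`..`c9` are made up for illustration, standing in for the ~4000 columns above):

```python
import numpy as np
import pandas as pd

# Hypothetical small frame: 10 columns named c0..c9.
df = pd.DataFrame(np.arange(30).reshape(3, 10),
                  columns=[f"c{i}" for i in range(10)])

# Concatenate two ranges of column labels, as in the answer above.
by_labels = df[list(df.columns[0:3]) + list(df.columns[7:])]

# np.r_ turns several slices into one array of integer positions.
# It cannot infer the end of an open-ended slice, so pass df.shape[1].
by_position = df.iloc[:, np.r_[0:3, 7:df.shape[1]]]

print(by_labels.columns.tolist())
print(by_position.columns.tolist())
```

Both selections should contain the same columns; the `np.r_` form avoids building Python lists when the ranges are large.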

User · Answer

You can slice along the columns of a DataFrame by referring to the names of each column in a list, like so:

data = pandas.DataFrame(np.random.rand(10,5), columns = list('abcde'))
data_ab = data[list('ab')]
data_cde = data[list('cde')]

User · Answer

Let's use the titanic dataset from the seaborn package as an example.

# Load dataset (pip install seaborn)
>>> import seaborn as sns
>>> titanic = sns.load_dataset('titanic')

using the column names

>>> titanic.loc[:, ['sex', 'age', 'fare']]

using the column indices

>>> titanic.iloc[:, [2, 3, 6]]

using ix (pandas versions older than 0.20)

>>> titanic.ix[:, ['sex', 'age', 'fare']]

or

>>> titanic.ix[:, [2, 3, 6]]

using the reindex method

>>> titanic.reindex(columns=['sex', 'age', 'fare'])
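One difference between these spellings shows up when a requested column does not exist. A small sketch, using a hypothetical three-column stand-in for the titanic data so it runs without downloading anything, and assuming pandas 1.0 or later for the `.loc` behaviour:

```python
import pandas as pd

# Hypothetical stand-in for a few titanic columns.
people = pd.DataFrame({
    "sex": ["male", "female"],
    "age": [22.0, 38.0],
    "fare": [7.25, 71.28],
})

wanted = ["sex", "fare", "deck"]  # "deck" is not a column of `people`

# reindex keeps going and fills the missing column with NaN.
filled = people.reindex(columns=wanted)

# .loc with a list refuses labels that do not exist.
try:
    people.loc[:, wanted]
    loc_raised = False
except KeyError:
    loc_raised = True

print(filled)
print("loc raised KeyError:", loc_raised)
```

So `reindex` is the forgiving option, while `.loc` is the one that tells you about a misspelled column name.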

User · Answer

Also, given a DataFrame

data

as in your example, if you would like to extract columns a and d only (i.e. the 1st and the 4th column), the iloc method of the pandas dataframe is what you need and can be used very effectively. All you need to know is the index of the columns you would like to extract. For example:

>>> data.iloc[:, [0, 3]]

will give you

          a         d
0  0.883283  0.100975
1  0.614313  0.221731
2  0.438963  0.224361
3  0.466078  0.703347
4  0.955285  0.114033
5  0.268443  0.416996
6  0.613241  0.327548
7  0.370784  0.359159
8  0.692708  0.659410
9  0.806624  0.875476
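If you know the column names but not their positions, the column index can look the positions up for you. A minimal sketch using `Index.get_loc` and `Index.get_indexer` on the same kind of random frame as in the question:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(10, 5), columns=list("abcde"))

# Integer position of a single label.
pos_d = data.columns.get_loc("d")

# Integer positions of several labels at once.
positions = data.columns.get_indexer(["a", "d"])

# Feed the positions to iloc.
subset = data.iloc[:, positions]

print(pos_d, positions)
print(subset.columns.tolist())
```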

User · Answer

2017 Answer - pandas 0.20: .ix is deprecated. Use .loc

See the deprecation in the docs

.loc uses label based indexing to select both rows and columns. The labels are the values of the index or the columns. Slicing with .loc includes the last element.

Let's assume we have a DataFrame with the following columns: foo, bar, quz, ant, cat, sat, dat.

# selects all rows and all columns beginning at 'foo' up to and including 'sat'
df.loc[:, 'foo':'sat']
# foo bar quz ant cat sat

.loc accepts the same slice notation that Python lists do for both rows and columns. Slice notation being start:stop:step

# slice from 'foo' to 'cat' by every 2nd column
df.loc[:, 'foo':'cat':2]
# foo quz cat

# slice from the beginning to 'bar'
df.loc[:, :'bar']
# foo bar

# slice from 'quz' to the end by 3
df.loc[:, 'quz'::3]
# quz sat

# attempt from 'sat' to 'bar'
df.loc[:, 'sat':'bar']
# no columns returned

# slice from 'sat' to 'bar'
df.loc[:, 'sat':'bar':-1]
# sat cat ant quz bar

# slice notation is syntactic sugar for the slice function
# slice from 'quz' to the end by 2 with slice function
df.loc[:, slice('quz', None, 2)]
# quz cat dat

# select specific columns with a list
# select columns foo, bar and dat
df.loc[:, ['foo', 'bar', 'dat']]
# foo bar dat

You can slice by rows and columns. For instance, if you have 5 rows with labels v, w, x, y, z

# slice from 'w' to 'y' and 'foo' to 'ant' by 3
df.loc['w':'y', 'foo':'ant':3]
#    foo ant
# w
# x
# y
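To try these slices yourself, here is a small runnable sketch. The frame is filled with zeros because only the labels matter; `picked` is a hypothetical helper added for illustration that just lists the selected column names:

```python
import numpy as np
import pandas as pd

cols = ["foo", "bar", "quz", "ant", "cat", "sat", "dat"]
df = pd.DataFrame(np.zeros((5, 7)), index=list("vwxyz"), columns=cols)


def picked(selection):
    """Return the column labels of a selection as a plain list."""
    return selection.columns.tolist()


inclusive = picked(df.loc[:, "foo":"sat"])      # both endpoints are included
every_2nd = picked(df.loc[:, "foo":"cat":2])    # start:stop:step on labels
backwards = picked(df.loc[:, "sat":"bar"])      # wrong direction, no step
reversed_ = picked(df.loc[:, "sat":"bar":-1])   # negative step walks right to left
block = df.loc["w":"y", "foo":"ant":3]          # rows and columns together

for result in (inclusive, every_2nd, backwards, reversed_, block.shape):
    print(result)
```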

User · Answer

It's equivalent:

>>> print(df2.loc[140:160, ['Relevance', 'Title']])
>>> print(df2.ix[140:160, [3, 7]])
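Since .ix is gone in current pandas, mixing row labels with column positions needs one extra step: translate the positions to labels first. A sketch on a hypothetical frame, assuming (as the snippet above implies) that 'Relevance' and 'Title' sit at positions 3 and 7; the other column names are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 8 columns, "Relevance" at position 3, "Title" at position 7.
names = ["c0", "c1", "c2", "Relevance", "c4", "c5", "c6", "Title"]
df2 = pd.DataFrame(np.arange(200 * 8).reshape(200, 8), columns=names)

# Rows and columns by label.
by_label = df2.loc[140:160, ["Relevance", "Title"]]

# Rows by label, columns by position: look the labels up, then use .loc.
mixed = df2.loc[140:160, df2.columns[[3, 7]]]

# Purely positional: iloc excludes the stop, so 161 is needed to reach row 160.
by_position = df2.iloc[140:161, [3, 7]]

print(by_label.shape, mixed.shape, by_position.shape)
```

Note the off-by-one between the two accessors: the `.loc` slice includes row 160, the `.iloc` slice stops before its end value.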

User · Answer

If the DataFrame looks like this:

group         name      count
fruit         apple     90
fruit         banana    150
fruit         orange    130
vegetable     broccoli  80
vegetable     kale      70
vegetable     lettuce   125

and the OUTPUT should look like this:

   group    name  count
0  fruit   apple     90
1  fruit  banana    150
2  fruit  orange    130

you can use the logical operator np.logical_not:

df[np.logical_not(df['group'] == 'vegetable')]

More about it:

https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.logic.html

Other logical operators:

logical_and(x1, x2, /[, out, where, ...]) Compute the truth value of x1 AND x2 element-wise.
logical_or(x1, x2, /[, out, where, casting, ...]) Compute the truth value of x1 OR x2 element-wise.
logical_not(x, /[, out, where, casting, ...]) Compute the truth value of NOT x element-wise.
logical_xor(x1, x2, /[, out, where, ...]) Compute the truth value of x1 XOR x2, element-wise.
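For what it's worth, the same row filter can be written without NumPy. A minimal sketch that rebuilds the frame above and compares three spellings, which should all select the same rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": ["fruit", "fruit", "fruit", "vegetable", "vegetable", "vegetable"],
    "name": ["apple", "banana", "orange", "broccoli", "kale", "lettuce"],
    "count": [90, 150, 130, 80, 70, 125],
})

is_vegetable = df["group"] == "vegetable"

with_numpy = df[np.logical_not(is_vegetable)]
with_tilde = df[~is_vegetable]                   # element-wise NOT on a boolean Series
with_not_equal = df[df["group"] != "vegetable"]  # negate the comparison itself

print(with_tilde)
```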

User · Answer

Note: .ix has been deprecated since Pandas v0.20. You should instead use .loc or .iloc, as appropriate.

The DataFrame.ix index is what you want to be accessing. It's a little confusing (I agree that Pandas indexing is perplexing at times!), but the following seems to do what you want:

>>> df = DataFrame(np.random.rand(4,5), columns = list('abcde'))
>>> df.ix[:,'b':]
          b         c         d         e
0  0.418762  0.042369  0.869203  0.972314
1  0.991058  0.510228  0.594784  0.534366
2  0.407472  0.259811  0.396664  0.894202
3  0.726168  0.139531  0.324932  0.906575

where .ix[row slice, column slice] is what is being interpreted. More on Pandas indexing here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-advanced
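On pandas 1.0 and later, where .ix no longer exists, the same selection can be sketched with .loc (by label) or .iloc (by position):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 5), columns=list("abcde"))

# Label-based replacement for df.ix[:, 'b':]
from_b = df.loc[:, "b":]

# Position-based equivalent: column 'b' sits at position 1.
from_b_pos = df.iloc[:, 1:]

print(from_b.columns.tolist())
```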

User · Answer

Another way to get a subset of columns from your DataFrame, assuming you want all the rows, would be to do:

data[['a','b']] and data[['c','d','e']]

If you want to use numerical column indexes you can do:

data[data.columns[:2]] and data[data.columns[2:]]
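This also helps with the confusion raised in the question about integer column names: plain `data[...]` treats a single key as a column label but a slice as a row selection. A small sketch on a hypothetical frame whose columns are labelled 10, 20 and 30:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose column labels are integers.
data = pd.DataFrame(np.arange(12).reshape(4, 3), columns=[10, 20, 30])

one_column = data[10]   # single key: looked up as a column label, gives a Series
first_row = data[0:1]   # slice: selects rows by position, gives a DataFrame

# Unambiguous ways to split the columns in two:
left = data[data.columns[:2]]
right = data.iloc[:, 2:]

print(type(one_column).__name__, first_row.shape)
print(left.columns.tolist(), right.columns.tolist())
```

Going through `data.columns[...]` or `.iloc` makes it explicit that you mean columns, whatever type the labels are.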

User · Answer

Here's how you could use different methods to do selective column slicing, including selective label based, index based and the selective ranges based column slicing.

In [37]: import pandas as pd
In [38]: import numpy as np
In [43]: df = pd.DataFrame(np.random.rand(4,7), columns = list('abcdefg'))

In [44]: df
Out[44]:
          a         b         c         d         e         f         g
0  0.409038  0.745497  0.890767  0.945890  0.014655  0.458070  0.786633
1  0.570642  0.181552  0.794599  0.036340  0.907011  0.655237  0.735268
2  0.568440  0.501638  0.186635  0.441445  0.703312  0.187447  0.604305
3  0.679125  0.642817  0.697628  0.391686  0.698381  0.936899  0.101806

In [45]: df.loc[:, ["a", "b", "c"]] ## label based selective column slicing
Out[45]:
          a         b         c
0  0.409038  0.745497  0.890767
1  0.570642  0.181552  0.794599
2  0.568440  0.501638  0.186635
3  0.679125  0.642817  0.697628

In [46]: df.loc[:, "a":"c"] ## label based column ranges slicing
Out[46]:
          a         b         c
0  0.409038  0.745497  0.890767
1  0.570642  0.181552  0.794599
2  0.568440  0.501638  0.186635
3  0.679125  0.642817  0.697628

In [47]: df.iloc[:, 0:3] ## index based column ranges slicing
Out[47]:
          a         b         c
0  0.409038  0.745497  0.890767
1  0.570642  0.181552  0.794599
2  0.568440  0.501638  0.186635
3  0.679125  0.642817  0.697628

### with 2 different column ranges, index based slicing:
In [49]: df[df.columns[0:1].tolist() + df.columns[1:3].tolist()]
Out[49]:
          a         b         c
0  0.409038  0.745497  0.890767
1  0.570642  0.181552  0.794599
2  0.568440  0.501638  0.186635
3  0.679125  0.642817  0.697628
