Select multiple columns by labels in pandas

Question

I've been looking around for ways to select columns through the python documentation and the forums, but every example on indexing columns is too simplistic.

Suppose I have a 10 x 10 dataframe:

df = DataFrame(randn(10, 10), index=range(0, 10), columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'])

So far, all the documentation gives is a simple example of indexing like

subset = df.loc[:, 'A':'C']

or

subset = df.loc[:, 'C':]

But I get an error when I try to index multiple, non-sequential columns, like this:

subset = df.loc[:, ('A':'C', 'E')]

How would I index in pandas if I wanted to select columns A to C, E, and G to I? It appears that this logic will not work:

subset = df.loc[:, ('A':'C', 'E', 'G':'I')]

I feel that the solution is pretty simple, but I can't get around this error. Thanks!

User · Answer

How do I select multiple columns by labels in pandas?

Multiple label-based range slicing is not easily supported with pandas, but position-based slicing is, so let's try that instead:

import numpy as np
loc = df.columns.get_loc  # maps a column label to its integer position
df.iloc[:, np.r_[loc('A'):loc('C')+1, loc('E'), loc('G'):loc('I')+1]]

          A         B         C         E         G         H         I
0 -1.666330  0.321260 -1.768185 -0.034774  0.023294  0.533451 -0.241990
1  0.911498  3.408758  0.419618 -0.462590  0.739092  1.103940  0.116119
2  1.243001 -0.867370  1.058194  0.314196  0.887469  0.471137 -1.361059
3 -0.525165  0.676371  0.325831 -1.152202  0.606079  1.002880  2.032663
4  0.706609 -0.424726  0.308808  1.994626  0.626522 -0.033057  1.725315
5  0.879802 -1.961398  0.131694 -0.931951 -0.242822 -1.056038  0.550346
6  0.199072  0.969283  0.347008 -2.611489  0.282920 -0.334618  0.243583
7  1.234059  1.000687  0.863572  0.412544  0.569687 -0.684413 -0.357968
8 -0.299185  0.566009 -0.859453 -0.564557 -0.562524  0.233489 -0.039145
9  0.937637 -2.171174 -1.940916 -1.553634  0.619965 -0.664284 -0.151388

Note that the +1 is added because when using iloc the rightmost index is exclusive.
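To make the +1 concrete, here is a minimal self-contained sketch. The random data and the seed are assumptions for illustration; only the column labels come from the question.

```python
import numpy as np
import pandas as pd

# Illustrative 10 x 10 frame with the question's column labels (data is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 10)), columns=list("ABCDEFGHIJ"))

loc = df.columns.get_loc  # label -> integer position (assumes unique labels)

# Label slices with .loc include the end label ...
by_label = df.loc[:, "A":"C"]
# ... while position slices with .iloc exclude the end position, hence the +1.
by_position = df.iloc[:, loc("A"):loc("C") + 1]

# np.r_ concatenates the slices and the scalar into one array of positions.
positions = np.r_[loc("A"):loc("C") + 1, loc("E"), loc("G"):loc("I") + 1]
subset = df.iloc[:, positions]
print(list(subset.columns))
```

Without the +1, the position slice would stop one column early and drop C and I.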

Comments on Other Solutions

filter is a nice and simple method for OP's headers, but this might not generalise well to arbitrary column names.
The "location-based" solution with loc is a little closer to the ideal, but you cannot avoid creating intermediate DataFrames (that are eventually thrown out and garbage collected) to compute the final result range -- something that we would ideally like to avoid.
Lastly, "pick your columns directly" is good advice as long as you have a manageably small number of columns to pick. It will, however, not be applicable in cases where ranges span dozens (or possibly hundreds) of columns.
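If you need many such ranges, the position-based idea can be wrapped in a small function. The sketch below is one possible way to do it, not a pandas API: select_label_ranges and its specs argument are hypothetical names, and it assumes unique, non-tuple column labels. It only queries the column index, so no intermediate DataFrames are built.

```python
import numpy as np
import pandas as pd


def select_label_ranges(df, specs):
    """Hypothetical helper: pick columns from single labels and (start, end) label ranges.

    Ranges are inclusive at both ends and follow the frame's actual column order.
    Assumes column labels are unique, so get_loc returns a plain integer.
    """
    get_loc = df.columns.get_loc
    positions = []
    for spec in specs:
        if isinstance(spec, tuple):
            start, end = spec
            # +1 because range(), like iloc slicing, excludes its end.
            positions.extend(range(get_loc(start), get_loc(end) + 1))
        else:
            positions.append(get_loc(spec))
    return df.iloc[:, positions]


# Placeholder data; only the labels matter here.
df = pd.DataFrame(np.zeros((2, 10)), columns=list("ABCDEFGHIJ"))
subset = select_label_ranges(df, [("A", "C"), "E", ("G", "I")])
print(list(subset.columns))
```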

User · Answer

Name- or Label-Based (using regular expression syntax)

df.filter(regex='[A-CEG-I]')   # does NOT depend on the column order

Note that any regular expression is allowed here, so this approach can be very general. E.g. if you wanted all columns starting with a capital or lowercase "A" you could use: df.filter(regex='^[Aa]')

Location-Based (depends on column order)

df[list(df.loc[:, 'A':'C']) + ['E'] + list(df.loc[:, 'G':'I'])]

Note that unlike the label-based method, this only works if your columns are alphabetically sorted. This is not necessarily a problem, however. For example, if your columns go ['A', 'C', 'B'], then you could replace 'A':'C' above with 'A':'B'.

The Long Way

And for completeness, you always have the option shown by @Magdalena of simply listing each column individually, although it could be much more verbose as the number of columns increases:

df[['A', 'B', 'C', 'E', 'G', 'H', 'I']]   # does NOT depend on the column order

Results for any of the above methods

          A         B         C         E         G         H         I
0 -0.814688 -1.060864 -0.008088  2.697203 -0.763874  1.793213 -0.019520
1  0.549824  0.269340  0.405570 -0.406695 -0.536304 -1.231051  0.058018
2  0.879230 -0.666814  1.305835  0.167621 -1.100355  0.391133  0.317467
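One caveat with the regex approach when column names are longer than one character: filter(regex=...) keeps a label if the pattern is found anywhere in it (it uses re.search), so an unanchored character class can match more than intended. A small sketch with made-up column names (not from the question):

```python
import pandas as pd

# Hypothetical frame with a multi-character column name.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "E", "DATE", "J"])

# Unanchored: "DATE" is kept because it contains an "A" (and an "E").
unanchored = df.filter(regex="[A-CEG-I]")

# Anchored: only names that consist of exactly one of the listed letters.
anchored = df.filter(regex="^[A-CEG-I]$")

print(list(unanchored.columns))
print(list(anchored.columns))
```

For the single-letter headers in the question both patterns select the same columns.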

User · Answer

Just pick the columns you want directly:

df[['A', 'E', 'I', 'C']]
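A short sketch of how direct selection behaves, using a made-up one-row frame: the result follows the order of the list you pass, and a label that is not present raises KeyError.

```python
import pandas as pd

# Illustrative frame; the values are arbitrary.
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list("ACEIJ"))

# Columns come back in the order listed, not in the frame's own order.
picked = df[["A", "E", "I", "C"]]

# A label that is not in the frame raises KeyError rather than being skipped.
try:
    df[["A", "Z"]]
    missing_raises = False
except KeyError:
    missing_raises = True

print(list(picked.columns), missing_raises)
```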
