How do I select multiple columns by labels in pandas?
Multiple label-based range slicing is not easily supported with pandas, but position-based slicing is, so let's try that instead:
loc = df.columns.get_loc
df.iloc[:, np.r_[loc('A'):loc('C')+1, loc('E'), loc('G'):loc('I')+1]]
A B C E G H I
0 -1.666330 0.321260 -1.768185 -0.034774 0.023294 0.533451 -0.241990
1 0.911498 3.408758 0.419618 -0.462590 0.739092 1.103940 0.116119
2 1.243001 -0.867370 1.058194 0.314196 0.887469 0.471137 -1.361059
3 -0.525165 0.676371 0.325831 -1.152202 0.606079 1.002880 2.032663
4 0.706609 -0.424726 0.308808 1.994626 0.626522 -0.033057 1.725315
5 0.879802 -1.961398 0.131694 -0.931951 -0.242822 -1.056038 0.550346
6 0.199072 0.969283 0.347008 -2.611489 0.282920 -0.334618 0.243583
7 1.234059 1.000687 0.863572 0.412544 0.569687 -0.684413 -0.357968
8 -0.299185 0.566009 -0.859453 -0.564557 -0.562524 0.233489 -0.039145
9 0.937637 -2.171174 -1.940916 -1.553634 0.619965 -0.664284 -0.151388
Note that the +1
is added because when using iloc
the rightmost index is exclusive.
filter
is a nice and simple method for OP's headers, but this might not generalise well to arbitrary column names.
The "location-based" solution with loc
is a little closer to the ideal, but you cannot avoid creating intermediate DataFrames (that are eventually thrown out and garbage collected) to compute the final result range -- something that we would ideally like to avoid.
Lastly, "pick your columns directly" is good advice as long as you have a manageably small number of columns to pick. It will, however not be applicable in some cases where ranges span dozens (or possibly hundreds) of columns.