I am curious as to why df[2] is not supported, while df.ix[2] and df[2:3] both work.
In [26]: df.ix[2]
Out[26]:
A 1.027680
B 1.514210
C -1.466963
D -0.162339
Name: 2000-01-03 00:00:00
In [27]: df[2:3]
Out[27]:
A B C D
2000-01-03 1.02768 1.51421 -1.466963 -0.162339
I would expect df[2] to work the same way as df[2:3], to be consistent with Python indexing conventions. Is there a design reason for not supporting row indexing by a single integer?
The indexing operator [] is primarily for selecting columns. When it is passed a string or integer, it attempts to find a column with that label and return it as a Series.
So, in the question above, df[2] searches for a column label matching the integer value 2. No such column exists, so a KeyError is raised.
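A minimal sketch of this behaviour, assuming a small made-up frame shaped like the one in the question (four float columns on a date index; the values are arbitrary). The scalar key in [] is looked up among the column labels, while .iloc selects by row position:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 5 rows on a daily DatetimeIndex, columns A-D.
df = pd.DataFrame(
    np.arange(20, dtype=float).reshape(5, 4),
    index=pd.date_range("2000-01-01", periods=5),
    columns=list("ABCD"),
)

# A scalar key in [] is looked up among the *column* labels; 2 is not one.
missing_column_raised = False
try:
    df[2]
except KeyError:
    missing_column_raised = True

# Selecting the row at integer position 2 explicitly returns a Series
# indexed by the column labels.
row = df.iloc[2]
print(row)
```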
Strangely, when given a slice, the DataFrame indexing operator selects rows and can do so by integer location or by index label.
df[2:3]
This slices from the row at integer location 2 up to, but not including, location 3, so it returns just a single row (still as a DataFrame). The following selects every third row, starting at integer location 6 and stopping before 20:
df[6:20:3]
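The slicing rules can be checked on a hypothetical 25-row frame with the default integer index (so row labels and positions coincide in this sketch):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 25 rows, two columns, default RangeIndex.
df = pd.DataFrame(np.arange(50).reshape(25, 2), columns=["A", "B"])

# A slice inside [] selects rows, end excluded: exactly one row here.
one_row = df[2:3]
print(one_row)  # a one-row DataFrame, not a Series

# Start 6, stop 20 (exclusive), step 3.
every_third = df[6:20:3]
print(every_third.index.tolist())  # [6, 9, 12, 15, 18]
```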
You can also use slices consisting of string labels if your DataFrame index has strings in it. For more details, see this solution on .iloc vs .loc.
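A sketch with made-up string row labels. Note that a label slice includes its end label, unlike a positional slice:

```python
import pandas as pd

# Hypothetical frame whose row labels are strings.
df = pd.DataFrame({"score": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# A slice of labels inside [] selects rows; the end label "c" is included.
by_label = df["b":"c"]
print(by_label.index.tolist())  # ['b', 'c']
```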
I almost never use slice notation with the indexing operator, since it's not explicit and is rarely seen in practice. When selecting rows, stick with .loc/.iloc.
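The explicit equivalents, sketched on the same kind of hypothetical date-indexed frame as in the question. As with label slicing in [], a .loc slice includes its end label:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20, dtype=float).reshape(5, 4),
    index=pd.date_range("2000-01-01", periods=5),
    columns=list("ABCD"),
)

# Positional: one row as a Series, or as a one-row DataFrame via a slice.
row_series = df.iloc[2]
row_frame = df.iloc[2:3]

# Label-based: the same row by its index label, and a two-row label slice
# (both endpoints included).
row_by_label = df.loc[pd.Timestamp("2000-01-03")]
rows_by_label = df.loc["2000-01-02":"2000-01-03"]
```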
I would normally go for .loc/.iloc as suggested by Ted, but one may also select a row by transposing the DataFrame. Staying with the example above, df.T[2] gives you row 2 of df, provided 2 is one of df's row labels; for a purely positional lookup, use df.T.iloc[:, 2].
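A sketch of the transpose trick on a hypothetical all-float frame with the default integer index, where the row label 2 is also position 2. Keep in mind that transposing builds a new object, and for frames with mixed column dtypes the transposed values are upcast to a common dtype:

```python
import numpy as np
import pandas as pd

# Default RangeIndex, so row label 2 is also row position 2.
df = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4), columns=list("ABCD"))

# After transposing, df's row labels become column labels, so [] with the
# scalar 2 picks out what was row 2.
via_transpose = df.T[2]

# The same row without building a transposed copy.
via_iloc = df.iloc[2]
print(via_transpose.equals(via_iloc))  # True
```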
You can think of a DataFrame as a dict of Series. df[key] tries to select the column labelled key and returns a Series object.
However, a slice inside [] selects rows, because that is a very common operation.
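The dict-of-Series analogy, sketched with a small hypothetical frame: a scalar key behaves like a dict lookup on the column labels, while a slice selects rows:

```python
import pandas as pd

# Each dict key becomes a column label.
data = {"A": [1, 2, 3], "B": [4, 5, 6]}
df = pd.DataFrame(data)

col = df["A"]    # scalar key -> column, like data["A"]
rows = df[0:2]   # slice -> rows 0 and 1 (end excluded)

print(type(col).__name__)  # Series
print(rows.shape)          # (2, 2)
```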
You can read the documentation for details:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics
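For index-based access to a pandas table, one can also convert the DataFrame to a NumPy array (np.asarray(df) works too):
np_df = df.to_numpy()
and then
np_df[i]
selects row i by position. Before pandas 0.24 the usual spelling was df.as_matrix(), which was deprecated in 0.23 and removed in 1.0.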
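A sketch of NumPy-based positional access on a hypothetical all-float frame, assuming pandas 0.24 or later (where DataFrame.to_numpy() exists). The resulting ndarray drops the row and column labels, so only positions are available:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4), columns=list("ABCD"))

# A plain 2-D ndarray: a single integer now selects a row by position.
np_df = df.to_numpy()
row = np_df[2]
print(row)  # the row at position 2, as a 1-D array
```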
You can take a look at the source code. DataFrame has a private method _slice() that slices the DataFrame, and it takes an axis parameter that determines which axis to slice along. DataFrame.__getitem__() doesn't pass axis when it calls _slice(), so _slice() slices along its default axis, 0 (the rows).
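Here is a simple experiment that might help (_slice() is private, so its behaviour may change between pandas versions):
print(df._slice(slice(0, 2)))     # axis defaults to 0: first two rows
print(df._slice(slice(0, 2), 0))  # same as above
print(df._slice(slice(0, 2), 1))  # axis 1: first two columns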
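Since _slice() is private, a public-API way to see the same axis distinction is .iloc, shown here on a hypothetical 3-by-4 frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

# Slicing along axis 0 (rows): what df[0:2] does.
first_two_rows = df.iloc[0:2]

# Slicing along axis 1 (columns).
first_two_cols = df.iloc[:, 0:2]

print(first_two_rows.shape)  # (2, 4)
print(first_two_cols.shape)  # (3, 2)
```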
You can loop through the rows of the DataFrame like this:
for ad in range(len(dataframe_c)):  # len() is the row count; .size would be rows x columns
    print(dataframe_c.values[ad])
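A sketch of row iteration on a hypothetical three-row frame named dataframe_c. len() gives the number of rows (whereas .size counts every cell), and itertuples() is usually the more idiomatic way to walk rows:

```python
import pandas as pd

# Hypothetical data.
dataframe_c = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# Positional loop over every row, starting at 0.
for i in range(len(dataframe_c)):
    print(dataframe_c.iloc[i].tolist())

# itertuples() yields one namedtuple per row; index=False drops the label.
rows = [tuple(r) for r in dataframe_c.itertuples(index=False)]
print(rows)
```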
Source: Stackoverflow.com