I am trying to access the index of a row in a function applied across an entire DataFrame
in Pandas. I have something like this:
df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
>>> df
a b c
0 1 2 3
1 4 5 6
and I'll define a function that access elements with a given row
def rowFunc(row):
return row['a'] + row['b'] * row['c']
I can apply it like so:
df['d'] = df.apply(rowFunc, axis=1)
>>> df
a b c d
0 1 2 3 7
1 4 5 6 34
Awesome! Now what if I want to incorporate the index into my function?
The index of any given row in this DataFrame
before adding d
would be Index([u'a', u'b', u'c', u'd'], dtype='object')
, but I want the 0 and 1. So I can't just access row.index
.
I know I could create a temporary column in the table where I store the index, but I'm wondering if it is stored in the row object somewhere.
To answer the original question: yes, you can access the index value of a row in apply()
. It is available under the key name
and requires that you specify axis=1
(because the lambda processes the columns of a row and not the rows of a column).
Working example (pandas 0.23.4):
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
>>> df.set_index('a', inplace=True)
>>> df
b c
a
1 2 3
4 5 6
>>> df['index_x10'] = df.apply(lambda row: 10*row.name, axis=1)
>>> df
b c index_x10
a
1 2 3 10
4 5 6 40
Either:
row.name
inside the apply(..., axis=1)
call:df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'], index=['x','y'])
a b c
x 1 2 3
y 4 5 6
df.apply(lambda row: row.name, axis=1)
x x
y y
iterrows()
(slower)DataFrame.iterrows() allows you to iterate over rows, and access their index:
for idx, row in df.iterrows():
...
Source: Stackoverflow.com