[python] iterrows pandas get next rows value

I have a df in pandas

import pandas as pd
df = pd.DataFrame(['AA', 'BB', 'CC'], columns = ['value'])

I want to iterate over rows in df. For each row i want rows value and next rows value Something like(it does not work):

for i, row in df.iterrows():
     print row['value']
     i1, row1 = next(df.iterrows())
     print row1['value']

As a result I want

'AA'
'BB'
'BB'
'CC'
'CC'
*Wrong index error here  

At this point i have mess way to solve this

for i in range(0, df.shape[0])
   print df.irow(i)['value']
   print df.irow(i+1)['value']

Is there more efficient way to solve this issue?

This question is related to python pandas next

The answer is


Firstly, your "messy way" is ok, there's nothing wrong with using indices into the dataframe, and this will not be too slow. iterrows() itself isn't terribly fast.

A version of your first idea that would work would be:

row_iterator = df.iterrows()
_, last = row_iterator.next()  # take first item from row_iterator
for i, row in row_iterator:
    print(row['value'])
    print(last['value'])
    last = row

The second method could do something similar, to save one index into the dataframe:

last = df.irow(0)
for i in range(1, df.shape[0]):
    print(last)
    print(df.irow(i))
    last = df.irow(i)

When speed is critical you can always try both and time the code.


There is a pairwise() function example in the itertools document:

from itertools import tee, izip
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

import pandas as pd
df = pd.DataFrame(['AA', 'BB', 'CC'], columns = ['value'])

for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
    print i1, i2, row1["value"], row2["value"]

Here is the output:

0 1 AA BB
1 2 BB CC

But, I think iter rows in a DataFrame is slow, if you can explain what's the problem you want to solve, maybe I can suggest some better method.


I would use shift() function as follows:

df['value_1'] = df.value.shift(-1)
[print(x) for x in df.T.unstack().dropna(how = 'any').values];

which produces

AA
BB
BB
CC
CC

This is how the code above works:

Step 1) Use shift function

df['value_1'] = df.value.shift(-1)
print(df)

produces

value value_1
0    AA      BB
1    BB      CC
2    CC     NaN

step 2) Transpose:

df = df.T
print(df)

produces:

          0   1    2
value    AA  BB   CC
value_1  BB  CC  NaN

Step 3) Unstack:

df = df.unstack()
print(df)

produces:

0  value       AA
   value_1     BB
1  value       BB
   value_1     CC
2  value       CC
   value_1    NaN
dtype: object

Step 4) Drop NaN values

df = df.dropna(how = 'any')
print(df)

produces:

0  value      AA
   value_1    BB
1  value      BB
   value_1    CC
2  value      CC
dtype: object

Step 5) Return a Numpy representation of the DataFrame, and print value by value:

df = df.values
[print(x) for x in df];

produces:

AA
BB
BB
CC
CC

This can be solved also by izipping the dataframe (iterator) with an offset version of itself.

Of course the indexing error cannot be reproduced this way.

Check this out

import pandas as pd
from itertools import izip

df = pd.DataFrame(['AA', 'BB', 'CC'], columns = ['value'])   

for id1, id2 in izip(df.iterrows(),df.ix[1:].iterrows()):
    print id1[1]['value']
    print id2[1]['value']

which gives

AA
BB
BB
CC

a combination of answers gave me a very fast running time. using the shift method to create new column of next row values, then using the row_iterator function as @alisdt did, but here i changed it from iterrows to itertuples which is 100 times faster.

my script is for iterating dataframe of duplications in different length and add one second for each duplication so they all be unique.

# create new column with shifted values from the departure time column
df['next_column_value'] = df['column_value'].shift(1)
# create row iterator that can 'save' the next row without running for loop
row_iterator = df.itertuples()
# jump to the next row using the row iterator
last = next(row_iterator)
# because pandas does not support items alteration i need to save it as an object
t = last[your_column_num]
# run and update the time duplications with one more second each
for row in row_iterator:
    if row.column_value == row.next_column_value:
         t = t + add_sec
         df_result.at[row.Index, 'column_name'] = t
    else:
         # here i resetting the 'last' and 't' values
         last = row
         t = last[your_column_num]

Hope it will help.


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to next

iterrows pandas get next rows value Passing variables to the next middleware using next() in Express.js How to read one single line of csv data in Python? Android Button click go to another xml page Continue For loop javascript node.js next() "Continue" (to next iteration) on VBScript jquery, find next element by class Getting next element while cycling through a list