I believe the most simple and efficient way to loop through DataFrames is using numpy and numba. In that case, looping can be approximately as fast as vectorized operations in many cases. If numba is not an option, plain numpy is likely to be the next best option. As has been noted many times, your default should be vectorization, but this answer merely considers efficient looping, given the decision to loop, for whatever reason.
For a test case, let's use the example from @DSM's answer of calculating a percentage change. This is a very simple situation and as a practical matter you would not write a loop to calculate it, but as such it provides a reasonable baseline for timing vectorized approaches vs loops.
Let's set up the 4 approaches with a small DataFrame, and we'll time them on a larger dataset below.
import pandas as pd
import numpy as np
import numba as nb
df = pd.DataFrame( { 'close':[100,105,95,105] } )
pandas_vectorized = df.close.pct_change()[1:]
x = df.close.to_numpy()
numpy_vectorized = ( x[1:] - x[:-1] ) / x[:-1]
def test_numpy(x):
pct_chng = np.zeros(len(x))
for i in range(1,len(x)):
pct_chng[i] = ( x[i] - x[i-1] ) / x[i-1]
return pct_chng
numpy_loop = test_numpy(df.close.to_numpy())[1:]
@nb.jit(nopython=True)
def test_numba(x):
pct_chng = np.zeros(len(x))
for i in range(1,len(x)):
pct_chng[i] = ( x[i] - x[i-1] ) / x[i-1]
return pct_chng
numba_loop = test_numba(df.close.to_numpy())[1:]
And here are the timings on a DataFrame with 100,000 rows (timings performed with Jupyter's %timeit
function, collapsed to a summary table for readability):
pandas/vectorized 1,130 micro-seconds
numpy/vectorized 382 micro-seconds
numpy/looped 72,800 micro-seconds
numba/looped 455 micro-seconds
Summary: for simple cases, like this one, you would go with (vectorized) pandas for simplicity and readability, and (vectorized) numpy for speed. If you really need to use a loop, do it in numpy. If numba is available, combine it with numpy for additional speed. In this case, numpy + numba is almost as fast as vectorized numpy code.
Other details: