For large datasets, it can be more memory-efficient to read only the rows you need by passing a callable to the skiprows parameter of pd.read_csv.
Example
import pandas as pd
pred = lambda x: x not in [1, 3]  # True means "skip": keep only rows 1 and 3 (0-based)
pd.read_csv("data.csv", skiprows=pred, index_col=0, names=...)
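This returns a DataFrame containing only rows 1 and 3 of the file (0-based line indices); every other row is skipped. If the file has a header line, it is row 0 and is skipped too, which is why the column names are supplied through names.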
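A self-contained sketch of the same idea, assuming a small in-memory CSV (built with io.StringIO) stands in for data.csv; the column names id and value and the helper skip_row are hypothetical. Row 0 is the header line, so it is skipped along with everything except rows 1 and 3, and names supplies the column labels instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: a header line (row 0) plus four data rows
csv_text = "id,value\na,10\nb,20\nc,30\nd,40\n"

keep = {1, 3}  # 0-based line indices to keep; a set gives O(1) membership tests


def skip_row(i):
    # Called with each line index; returning True tells read_csv to skip that line
    return i not in keep


# names is given explicitly because the header line (row 0) is skipped
df = pd.read_csv(io.StringIO(csv_text), skiprows=skip_row, index_col=0, names=["id", "value"])
print(df)
```

Using a set instead of a list for the kept indices matters when the keep-list is long, since the callable is evaluated once per line of the file.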
Details
From the docs:
skiprows : list-like or integer or callable, default None
...
If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be
lambda x: x in [0, 2]
Callable skiprows requires pandas 0.20.0 or later. See also the corresponding issue and a related post.
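To illustrate the forms listed in the docs quote, this sketch (again using a hypothetical io.StringIO file with columns id and value) checks that the list-like skiprows=[0, 2] and the docs' callable lambda x: x in [0, 2] skip the same lines, while an integer such as skiprows=1 skips that many lines from the start of the file.

```python
import io

import pandas as pd

# Hypothetical file: header line (row 0) followed by four data rows
csv_text = "id,value\na,10\nb,20\nc,30\nd,40\n"
names = ["id", "value"]

# List-like form: skip line indices 0 and 2
by_list = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 2], names=names)

# Callable form from the docs: a line is skipped when the lambda returns True
by_callable = pd.read_csv(io.StringIO(csv_text), skiprows=lambda x: x in [0, 2], names=names)

# Integer form: skip only the first line (the header) and keep every data row
by_int = pd.read_csv(io.StringIO(csv_text), skiprows=1, names=names)

print(by_list)
print(by_int)
```

The callable form is the most flexible of the three: the decision can depend on any property of the index (for example, keeping every tenth row) without building the full list of skipped indices up front.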