[python] How to get the last N rows of a pandas DataFrame?

I have two pandas DataFrames, df1 and df2 (df1 is a vanilla DataFrame; df2 is indexed by 'STK_ID' & 'RPT_Date'):

>>> df1
    STK_ID  RPT_Date  TClose   sales  discount
0   000568  20060331    3.69   5.975       NaN
1   000568  20060630    9.14  10.143       NaN
2   000568  20060930    9.49  13.854       NaN
3   000568  20061231   15.84  19.262       NaN
4   000568  20070331   17.00   6.803       NaN
5   000568  20070630   26.31  12.940       NaN
6   000568  20070930   39.12  19.977       NaN
7   000568  20071231   45.94  29.269       NaN
8   000568  20080331   38.75  12.668       NaN
9   000568  20080630   30.09  21.102       NaN
10  000568  20080930   26.00  30.769       NaN

>>> df2
                 TClose   sales  discount  net_sales    cogs
STK_ID RPT_Date                                             
000568 20060331    3.69   5.975       NaN      5.975   2.591
       20060630    9.14  10.143       NaN     10.143   4.363
       20060930    9.49  13.854       NaN     13.854   5.901
       20061231   15.84  19.262       NaN     19.262   8.407
       20070331   17.00   6.803       NaN      6.803   2.815
       20070630   26.31  12.940       NaN     12.940   5.418
       20070930   39.12  19.977       NaN     19.977   8.452
       20071231   45.94  29.269       NaN     29.269  12.606
       20080331   38.75  12.668       NaN     12.668   3.958
       20080630   30.09  21.102       NaN     21.102   7.431

I can get the last 3 rows of df2 by:

>>> df2.ix[-3:]
                 TClose   sales  discount  net_sales    cogs
STK_ID RPT_Date                                             
000568 20071231   45.94  29.269       NaN     29.269  12.606
       20080331   38.75  12.668       NaN     12.668   3.958
       20080630   30.09  21.102       NaN     21.102   7.431

while df1.ix[-3:] gives all the rows:

>>> df1.ix[-3:]
    STK_ID  RPT_Date  TClose   sales  discount
0   000568  20060331    3.69   5.975       NaN
1   000568  20060630    9.14  10.143       NaN
2   000568  20060930    9.49  13.854       NaN
3   000568  20061231   15.84  19.262       NaN
4   000568  20070331   17.00   6.803       NaN
5   000568  20070630   26.31  12.940       NaN
6   000568  20070930   39.12  19.977       NaN
7   000568  20071231   45.94  29.269       NaN
8   000568  20080331   38.75  12.668       NaN
9   000568  20080630   30.09  21.102       NaN
10  000568  20080930   26.00  30.769       NaN

Why? How do I get the last 3 rows of df1 (the DataFrame without an index)? I am using pandas 0.10.1.


The answer is


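This is because df1 has an integer index: ix treats -3 as a label rather than a position, so the slice -3: selects every row whose label sorts at or after -3, which is all of them. df2 has a non-integer index, so ix falls back to positional slicing there. This is by design: see integer indexing in pandas "gotchas"*.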

*In newer versions of pandas, prefer loc (label-based) or iloc (position-based), which remove the ambiguity of ix between position and label (ix was deprecated in pandas 0.20 and removed in 1.0):

df.iloc[-3:]

See the docs.

As Wes points out, in this specific case you should just use tail, e.g. df1.tail(3)!
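The difference between label-based and position-based slicing can be reproduced in current pandas with loc and iloc. The following is a minimal sketch, assuming a cut-down stand-in for df1 with the default integer index (values copied from the question; the names by_label, by_position and by_tail are made up for illustration):

```python
import pandas as pd

# Stand-in for df1 from the question: default integer index 0..10.
df1 = pd.DataFrame({
    "STK_ID": ["000568"] * 11,
    "RPT_Date": ["20060331", "20060630", "20060930", "20061231",
                 "20070331", "20070630", "20070930", "20071231",
                 "20080331", "20080630", "20080930"],
    "TClose": [3.69, 9.14, 9.49, 15.84, 17.00, 26.31,
               39.12, 45.94, 38.75, 30.09, 26.00],
})

# Label-based: -3 is treated as a label, and every label 0..10 sorts
# at or after -3, so the slice covers the whole frame (as ix did).
by_label = df1.loc[-3:]

# Position-based: the last three rows, whatever the index holds.
by_position = df1.iloc[-3:]

# tail(3) is the purpose-built method for the same selection.
by_tail = df1.tail(3)

print(len(by_label))                # 11
print(list(by_position.index))      # [8, 9, 10]
print(by_tail.equals(by_position))  # True
```

If the intent is "the last N rows", iloc or tail says so explicitly and does not depend on what the index labels happen to be.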


If you are slicing by position, __getitem__ (i.e., slicing with []) works well and is the most succinct solution I've found for this problem.

import numpy as np
import pandas as pd

pd.__version__
# '0.24.2'

df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})
df

   A  B
0  a  1
1  a  2
2  a  3
3  b  4
4  b  5
5  b  6
6  b  7
7  c  8

df[-3:]

   A  B
5  b  6
6  b  7
7  c  8

This returns the same rows as df.iloc[-3:] or df.tail(3): with the default integer index, a plain integer slice passed to [] is interpreted by position.
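To double-check that equivalence, and to show one edge case worth knowing, here is a small sketch that rebuilds the example frame. The names n, sliced_zero and tail_zero are illustrative only. Because -0 is the same as 0, df[-n:] returns the whole frame when n is 0, whereas df.tail(0) returns no rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})

n = 3
# For n > 0 the three spellings select the same rows.
same_as_iloc = df[-n:].equals(df.iloc[-n:])
same_as_tail = df[-n:].equals(df.tail(n))
print(same_as_iloc, same_as_tail)  # True True

# Edge case: -0 == 0, so the slice starts at the first row.
n = 0
sliced_zero = df[-n:]
tail_zero = df.tail(n)
print(len(sliced_zero))  # 8 (every row)
print(len(tail_zero))    # 0 (no rows)
```

So when N comes from user input or a computation and may be zero, tail is probably the safer choice.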


As an aside, if you want to find the last N rows for each group, use groupby and GroupBy.tail:

df.groupby('A').tail(2)

   A  B
1  a  2
2  a  3
5  b  6
6  b  7
7  c  8
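Two details of that output are easy to miss: the rows come back in their original order with their original index labels, and a group with fewer than N rows is returned in full. A short sketch on the same example frame (the names last_two and rows_per_group are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})

last_two = df.groupby('A').tail(2)

# Rows keep their original order and index labels.
print(list(last_two.index))  # [1, 2, 5, 6, 7]

# Group 'c' only has one row, so only that row is returned for it.
rows_per_group = last_two.groupby('A').size().to_dict()
print(rows_per_group)  # {'a': 2, 'b': 2, 'c': 1}
```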
