[python] Pandas every nth row

DataFrame.resample() works only with time-series data. I cannot find a way of getting every nth row from non-time-series data. What is the best method?

This question is related to: python, pandas, resampling

The answers are:


I'd use iloc, which takes a row/column slice based on integer position and follows normal Python slicing syntax. If you want every 5th row:

df.iloc[::5, :]
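For a quick check, here's a minimal self-contained sketch (the toy DataFrame is just for illustration):

import pandas as pd

df = pd.DataFrame({'a': range(10)})
print(df.iloc[::5, :])  # rows at positions 0 and 5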

I had a similar requirement, but I wanted the nth item in a particular group. This is how I solved it:

groups = data.groupby(['group_key'])
# transform keeps the original row index, so the boolean mask stays
# aligned with `data` (True where index_col is a multiple of 3);
# apply can prepend the group keys to the index in newer pandas.
selection = groups['index_col'].transform(lambda x: x % 3 == 0)
subset = data[selection]
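As a side note, here is a minimal sketch of the same idea using GroupBy.cumcount, which numbers rows 0, 1, 2, ... within each group and so needs no pre-existing counter column (the column names are illustrative assumptions):

import pandas as pd

data = pd.DataFrame({
    'group_key': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'value': range(7),
})

# cumcount() numbers rows within each group, so this keeps every
# 3rd row of each group (positions 0 and 3 of 'a', 0 of 'b').
subset = data[data.groupby('group_key').cumcount() % 3 == 0]
print(subset)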

Though @chrisb's accepted answer does answer the question, I would like to add the following.

A simple method I use to select every nth row, or to drop every nth row, is the following:

df1 = df[df.index % 3 != 0]  # Excludes every 3rd row starting from 0
df2 = df[df.index % 3 == 0]  # Selects every 3rd row starting from 0

This arithmetic-based sampling also enables even more complex row selections, as sketched below.

This assumes, of course, that you have an index of ordered, consecutive integers starting at 0.
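For instance, here is a small sketch of a more complex selection built from the same modular arithmetic (the block size and offsets are arbitrary examples):

import pandas as pd

df = pd.DataFrame({'v': range(12)})

# Keep the 2nd and 3rd row of every block of 4 rows.
subset = df[(df.index % 4).isin([1, 2])]
print(subset)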


There is an even simpler solution than the accepted answer: slicing the DataFrame directly, which invokes df.__getitem__.

import pandas as pd

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

For example, to get every 2nd row, you can do

df[::2]

   a  b  c
0  x  x  x
2  x  x  x
4  x  x  x
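Under the hood (my note, not part of the original answer), df[::2] is just sugar for calling __getitem__ with a slice object:

import pandas as pd

df = pd.DataFrame('x', index=range(5), columns=list('abc'))

# Slicing the DataFrame dispatches to __getitem__ with a slice.
assert df[::2].equals(df.__getitem__(slice(None, None, 2)))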

There's also GroupBy.first/GroupBy.head; you group on the index:

df.index // 2
# Int64Index([0, 0, 1, 1, 2], dtype='int64')

df.groupby(df.index // 2).first()
# Alternatively,
# df.groupby(df.index // 2).head(1)

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x

The index is floor-divided by the stride (2, in this case). If the index is non-numeric, instead do

# df.groupby(np.arange(len(df)) // 2).first()
df.groupby(pd.RangeIndex(len(df)) // 2).first()

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
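A variant worth noting (my addition, reusing the same df): swapping first() for last() keeps the final row of each block instead of the first:

import pandas as pd

df = pd.DataFrame('x', index=range(5), columns=list('abc'))

# last() takes the final row of each block of 2 instead of the first.
print(df.groupby(pd.RangeIndex(len(df)) // 2).last())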

A solution I came up with when using the index was not viable (possibly the multi-gigabyte .csv was too large, or I missed some technique that would have let me reindex without crashing): walk through the file one row at a time and keep only every nth row in a new DataFrame.

import pandas as pd
from csv import DictReader

def make_downsampled_df(filename, interval):
    # Stream the CSV row by row and keep only every `interval`-th row,
    # so the whole file never has to be loaded into memory at once.
    with open(filename, 'r', newline='') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        column_names = csv_dict_reader.fieldnames
        rows = [row for index, row in enumerate(csv_dict_reader)
                if index % interval == 0]

    # Build the DataFrame once at the end; appending row by row is slow,
    # and DataFrame.append was removed in pandas 2.0.
    return pd.DataFrame(rows, columns=column_names)
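A usage sketch (the filename and interval are placeholders):

df = make_downsampled_df('huge_file.csv', interval=10)  # keep every 10th row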

df.drop(labels=df[df.index % 3 != 0].index, axis=0)  # keep every 3rd row (index % 3 == 0) by dropping the rest