How to preview a part of a large pandas DataFrame in IPython Notebook?

Question

I am just getting started with pandas in the IPython Notebook and am encountering the following problem: when a DataFrame read from a CSV file is small, the IPython Notebook displays it in a nice table view. When the DataFrame is large, something like this is output:

    In [27]:
    evaluation = readCSV("evaluation_MO_without_VNS_quality.csv").filter(["solver", "instance", "runtime", "objective"])

    In [37]:
    evaluation

    Out[37]:
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 333 entries, 0 to 332
    Data columns:
    solver       333  non-null values
    instance     333  non-null values
    runtime      333  non-null values
    objective    333  non-null values
    dtypes: int64(1), object(3)

I would like to see a small portion of the data frame as a table, just to make sure it is in the right format. What options do I have?

User · Answer

To see the first n rows of a DataFrame:

    df.head(n)  # n=5 by default

To see the last n rows:

    df.tail(n)
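For anyone who wants to try this without a CSV at hand, here is a minimal sketch. The DataFrame below is a made-up stand-in for the 333-row table in the question; the column values are assumptions, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for a large table read from a CSV file.
df = pd.DataFrame({"solver": ["a"] * 333, "runtime": range(333)})

first = df.head(3)  # first 3 rows
last = df.tail(3)   # last 3 rows

print(first)
print(last)
```

In a notebook, leaving `df.head(3)` as the last expression in a cell renders it as a table instead of plain text.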

User · Answer

I wrote a method to show the four corners of the data and monkey-patched it onto DataFrame:

    import numpy as np
    import pandas as pd
    from pandas import DataFrame

    def _sw(df, up_rows=10, down_rows=5, left_cols=4, right_cols=3, return_df=False):
        ''' display df data at four corners
            A,B (up_pt)
            C,D (down_pt)
            parameters : up_rows=10, down_rows=5, left_cols=4, right_cols=3
            usage:
                df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10])
                df.sw(5,2,3,2)
                df1 = df.set_index(['A','B'], drop=True, inplace=False)
                df1.sw(5,2,3,2)
        '''
        #pd.set_printoptions(max_columns = 80, max_rows = 40)
        ncol, nrow = len(df.columns), len(df)

        # handle columns
        if ncol <= (left_cols + right_cols):
            up_pt = df.ix[0:up_rows, :]         # screen width can contain all columns
            down_pt = df.ix[-down_rows:, :]
        else:                                   # screen width can not contain all columns
            pt_a = df.ix[0:up_rows, 0:left_cols]
            pt_b = df.ix[0:up_rows, -right_cols:]
            pt_c = df[-down_rows:].ix[:, 0:left_cols]
            pt_d = df[-down_rows:].ix[:, -right_cols:]

            up_pt = pt_a.join(pt_b, how='inner')
            down_pt = pt_c.join(pt_d, how='inner')
            up_pt.insert(left_cols, '..', '..')
            down_pt.insert(left_cols, '..', '..')

        overlap_qty = len(up_pt) + len(down_pt) - len(df)
        down_pt = down_pt.drop(down_pt.index[range(overlap_qty)])  # remove overlap rows

        dt_str_list = down_pt.to_string().split('\n')  # transfer down_pt to string list

        # Display up part data
        print up_pt

        start_row = (1 if df.index.names[0] is None else 2)  # start from 1 if without index

        # Display omit line if screen height is not enough to display all rows
        if overlap_qty < 0:
            print "." * len(dt_str_list[start_row])

        # Display down part data row by row
        for line in dt_str_list[start_row:]:
            print line

        # Display foot note
        print "\n"
        print "Index :", df.index.names
        print "Column:", ",".join(list(df.columns.values))
        print "row: %d    col: %d" % (len(df), len(df.columns))
        print "\n"

        return (df if return_df else None)

    DataFrame.sw = _sw  # add a method to DataFrame class

This was written for Python 2 and an old pandas release. `.ix` was removed in pandas 1.0, so on current versions use `.iloc` and `print(...)` instead. Note that `.iloc` slices exclude the end position, whereas `.ix` on an integer index included it, which is why the output below shows `up_rows + 1` rows at the top.

Here is the sample:

    >>> df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10])

    >>> df.sw()
             A       B       C       D  ..       H       I       J
    0  -0.8166  0.0102  0.0215 -0.0307  .. -0.0820  1.2727  0.6395
    1   1.0659 -1.0102 -1.3960  0.4700  ..  1.0999  1.1222 -1.2476
    2   0.4347  1.5423  0.5710 -0.5439  ..  0.2491 -0.0725  2.0645
    3  -1.5952 -1.4959  2.2697 -1.1004  .. -1.9614  0.6488 -0.6190
    4  -1.4426 -0.8622  0.0942 -0.1977  .. -0.7802 -1.1774  1.9682
    5   1.2526 -0.2694  0.4841 -0.7568  ..  0.2481  0.3608 -0.7342
    6   0.2108  2.5181  1.3631  0.4375  .. -0.1266  1.0572  0.3654
    7  -1.0617 -0.4743 -1.7399 -1.4123  .. -1.0398 -1.4703 -0.9466
    8  -0.5682 -1.3323 -0.6992  1.7737  ..  0.6152  0.9269  2.1854
    9   0.2361  0.4873 -1.1278 -0.2251  ..  1.4232  2.1212  2.9180
    10  2.0034  0.5454 -2.6337  0.1556  ..  0.0016 -1.6128 -0.8093
    ..............................................................
    15  1.4091  0.3540 -1.3498 -1.0490  ..  0.9328  0.3668  1.3948
    16  0.4528 -0.3183  0.4308 -0.1818  ..  0.1295  1.2268  0.1365
    17 -0.7093  1.3991  0.9501  2.1227  .. -1.5296  1.1908  0.0318
    18  1.7101  0.5962  0.8948  1.5606  .. -0.6862  0.9558 -0.5514
    19  1.0329 -1.2308 -0.6896 -0.5112  ..  0.2719  1.1478 -0.1459

    Index : [None]
    Column: A,B,C,D,E,F,G,H,I,J
    row: 20    col: 10

    >>> df.sw(4,2,3,4)
             A       B       C  ..       G       H       I       J
    0  -0.8166  0.0102  0.0215  ..  0.3671 -0.0820  1.2727  0.6395
    1   1.0659 -1.0102 -1.3960  ..  1.0984  1.0999  1.1222 -1.2476
    2   0.4347  1.5423  0.5710  ..  1.6675  0.2491 -0.0725  2.0645
    3  -1.5952 -1.4959  2.2697  ..  0.4856 -1.9614  0.6488 -0.6190
    4  -1.4426 -0.8622  0.0942  .. -0.0947 -0.7802 -1.1774  1.9682
    ..............................................................
    18  1.7101  0.5962  0.8948  .. -0.8592 -0.6862  0.9558 -0.5514
    19  1.0329 -1.2308 -0.6896  .. -0.3954  0.2719  1.1478 -0.1459

    Index : [None]
    Column: A,B,C,D,E,F,G,H,I,J
    row: 20    col: 10
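If you only need the four-corner idea on a current pandas, a much shorter sketch is possible with positional indexing. This is not the code from the answer above: `corners` is a hypothetical helper name, it returns a DataFrame rather than printing, and it does not draw the `..` separator column or the footer.

```python
import numpy as np
import pandas as pd

def corners(df, up_rows=5, down_rows=3, left_cols=4, right_cols=3):
    """Return the top/bottom rows and left/right columns of df (hypothetical helper)."""
    nrow, ncol = df.shape

    # Keep every row if the top and bottom blocks would overlap,
    # otherwise keep the first up_rows and the last down_rows positions.
    if nrow <= up_rows + down_rows:
        rows = list(range(nrow))
    else:
        rows = list(range(up_rows)) + list(range(nrow - down_rows, nrow))

    # Same logic for columns.
    if ncol <= left_cols + right_cols:
        cols = list(range(ncol))
    else:
        cols = list(range(left_cols)) + list(range(ncol - right_cols, ncol))

    # .iloc selects purely by position, so this works for any index type.
    return df.iloc[rows, cols]

df = pd.DataFrame(np.random.randn(20, 10), columns=list("ABCDEFGHIJ"))
preview = corners(df)
print(preview)
```

Returning a DataFrame means the notebook can still render the preview as a table, at the cost of not marking where rows and columns were skipped.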

User · Answer

You can just use nrows. For instance:

    pd.read_csv('data.csv', nrows=6)

will read (and therefore show) only the first 6 rows of data.csv.
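A self-contained way to try `nrows` is sketched below. The CSV content is invented and is fed through `io.StringIO` so that no file on disk is needed; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a large file on disk.
csv_text = "solver,runtime\n" + "\n".join(f"s{i},{i}" for i in range(100))

# Only the first 6 data rows are parsed; the rest of the input is not loaded.
preview = pd.read_csv(io.StringIO(csv_text), nrows=6)
print(preview)
```

Unlike `head()`, this avoids loading the whole file, which matters when the CSV is very large.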

User · Answer

In this case, where the DataFrame is long but not too wide, you can simply slice it:

    >>> df = pd.DataFrame({"A": range(1000), "B": range(1000)})
    >>> df
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 1000 entries, 0 to 999
    Data columns:
    A    1000  non-null values
    B    1000  non-null values
    dtypes: int64(2)
    >>> df[:5]
       A  B
    0  0  0
    1  1  1
    2  2  2
    3  3  3
    4  4  4

If it's both wide and long, I tend to use .ix (note: .ix is deprecated and was removed in pandas 1.0; use .loc or .iloc on current versions):

    >>> df = pd.DataFrame({i: range(1000) for i in range(100)})
    >>> df.ix[:5, :10]
       0   1   2   3   4   5   6   7   8   9   10
    0   0   0   0   0   0   0   0   0   0   0   0
    1   1   1   1   1   1   1   1   1   1   1   1
    2   2   2   2   2   2   2   2   2   2   2   2
    3   3   3   3   3   3   3   3   3   3   3   3
    4   4   4   4   4   4   4   4   4   4   4   4
    5   5   5   5   5   5   5   5   5   5   5   5
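On current pandas the same preview can be taken with `.iloc` or `.loc`. The sketch below uses the same made-up 1000 x 100 frame as the answer; the variable names are mine. The point to watch is that the two accessors treat the slice end differently.

```python
import pandas as pd

df = pd.DataFrame({i: range(1000) for i in range(100)})

# .iloc slices by position and excludes the end: 5 rows, 10 columns.
by_position = df.iloc[:5, :10]

# .loc slices by label and includes the end: rows 0..5, columns 0..10.
by_label = df.loc[:5, :10]

print(by_position.shape)
print(by_label.shape)
```

`.loc` reproduces the inclusive output that `.ix` gave on an integer index, while `.iloc` behaves like ordinary Python slicing.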

User · Answer

Here is an updated version that builds a string instead of printing, adapted to pandas 0.13:

    import itertools

    def _sw2(df, up_rows=5, down_rows=3, left_cols=4, right_cols=2, return_df=False):
        ''' return df data display string at four corners
            A,B (up_pt)
            C,D (down_pt)
            parameters : up_rows=5, down_rows=3, left_cols=4, right_cols=2
            usage:
                df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10])
                df.sw(5,2,3,2)
                df1 = df.set_index(['A','B'], drop=True, inplace=False)
                df1.sw(5,2,3,2)
        '''
        #pd.set_printoptions(max_columns = 80, max_rows = 40)
        nrow, ncol = df.shape  # ncol, nrow = len(df.columns), len(df)

        # handle columns
        if ncol <= (left_cols + right_cols):
            up_pt = df.ix[0:up_rows, :]         # screen width can contain all columns
            down_pt = df.ix[-down_rows:, :]
        else:                                   # screen width can not contain all columns
            pt_a = df.ix[0:up_rows, 0:left_cols]
            pt_b = df.ix[0:up_rows, -right_cols:]
            pt_c = df[-down_rows:].ix[:, 0:left_cols]
            pt_d = df[-down_rows:].ix[:, -right_cols:]

            up_pt = pt_a.join(pt_b, how='inner')
            down_pt = pt_c.join(pt_d, how='inner')
            up_pt.insert(left_cols, '..', '..')
            down_pt.insert(left_cols, '..', '..')

        overlap_qty = len(up_pt) + len(down_pt) - len(df)
        down_pt = down_pt.drop(down_pt.index[range(overlap_qty)])  # remove overlap rows

        dt_str_list = down_pt.to_string().split('\n')  # transfer down_pt to string list

        # Display up part data
        ds = up_pt.__str__()
        # get rid of ending part of Pandas0.13+ display string by finding the last 3 '\n', ugly though
        # refer to http://stackoverflow.com/questions/4664850/find-all-occurrences-of-a-substring-in-python
        Display_str = ds[0:ds[0:ds[0:ds.rfind('\n')].rfind('\n')].rfind('\n')]

        start_row = (1 if df.index.names[0] is None else 2)  # start from 1 if without index

        # Display omit line if screen height is not enough to display all rows
        if overlap_qty < 0:
            Display_str += "\n"
            Display_str += "." * len(dt_str_list[start_row])
            Display_str += "\n"

        # Display down part data row by row
        for line in dt_str_list[start_row:]:
            Display_str += "\n"
            Display_str += line

        # Display foot note
        Display_str += "\n\n"
        Display_str += "Index : %s\n" % str(df.index.names)

        col_name_list = list(df.columns.values)
        if ncol < 10:
            col_name_str = ", ".join(col_name_list)
        else:
            col_name_str = ", ".join(col_name_list[0:7]) + ' ... ' + ", ".join(col_name_list[-2:])
        Display_str = Display_str + "Column: " + col_name_str + "\n"
        Display_str = Display_str + "row: %d   col: %d" % (nrow, ncol) + "    "

        dty_dict = {}  # simulate defaultdict
        # http://stackoverflow.com/questions/13565248/grouping-the-same-recurring-items-that-occur-in-a-row-from-list/13565414#13565414
        for k, g in itertools.groupby(list(df.dtypes.values)):
            try:
                dty_dict[k] = dty_dict[k] + len(list(g))
            except:
                dty_dict[k] = len(list(g))
        for key in dty_dict:
            Display_str += "{0}: {1}    ".format(key, dty_dict[key])
        Display_str += "\n\n"

        return (df if return_df else Display_str)

As written this targets pandas 0.13; `.ix` has since been removed, so on current pandas replace it with `.iloc`.

User · Answer

    df.head(5)  # will print out the first 5 rows
    df.tail(5)  # will print out the last 5 rows

User · Answer

This line will allow you to see all rows (up to the number that you set as max_rows) without any rows being hidden by the dots ('.....') that normally appear between head and tail in the print output:

    pd.options.display.max_rows = 500
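If you would rather not change the setting for the whole session, `pd.option_context` applies it temporarily. The following is a minimal sketch with a made-up 200-row frame; inside the `with` block the limit is 500, and afterwards the previous value is restored.

```python
import pandas as pd

df = pd.DataFrame({"A": range(200)})

# Raise the row limit only while rendering this one frame.
with pd.option_context("display.max_rows", 500):
    full_text = repr(df)  # all 200 rows, no "..." gap

# Outside the block the earlier display.max_rows value applies again.
print(len(full_text.splitlines()))
```

In a notebook you would typically call `display(df)` inside the `with` block instead of `repr(df)`.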

User · Answer

In Python, pandas provides head() and tail() to print the first and last rows, respectively:

    import pandas as pd
    train = pd.read_csv('file_name')
    train.head()   # prints the first 5 rows, as the default value is 5
    train.head(n)  # prints the first n rows
    train.tail()   # prints the last 5 rows, as the default value is 5
    train.tail(n)  # prints the last n rows

User · Answer

To view only the first few entries, you can use the pandas head function:

    dataframe.head(any_number)  # default is 5
    dataframe.head(n=value)

You can also use slicing for this purpose, which gives the same result:

    dataframe[:n]

To view the last few entries, you can use pandas tail() in a similar way:

    dataframe.tail(any_number)  # default is 5
    dataframe.tail(n=value)

User · Answer

Here's a quick way to preview a large table without having it run too wide.

Display function:

    import IPython.display

    # display large dataframes in an html iframe
    def ldf_display(df, lines=500):
        txt = ("<iframe " +
               "srcdoc='" + df.head(lines).to_html() + "' " +
               "width=1000 height=500>" +
               "</iframe>")
        return IPython.display.HTML(txt)

Now just run this in any cell:

    ldf_display(large_dataframe)

This will convert the dataframe to HTML, then display it in an iframe. The advantage is that you can control the output size and have easily accessible scroll bars.

It worked for my purposes; maybe it will help someone else.
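One caveat with putting raw HTML inside a quoted `srcdoc` attribute is that a quote character in the data can end the attribute early. A possible variant, sketched below, escapes the table HTML with the standard library's `html.escape` first. The helper name `iframe_markup` and the sample data are assumptions of mine; the function only builds the markup string so it can be checked without a notebook.

```python
import html
import pandas as pd

def iframe_markup(df, lines=500, width=1000, height=500):
    """Build iframe markup for the first `lines` rows of df (hypothetical helper)."""
    table_html = df.head(lines).to_html()
    # Escape &, <, > and both quote characters so the table HTML is safe
    # inside a double-quoted attribute; the browser decodes it again.
    escaped = html.escape(table_html, quote=True)
    return f'<iframe srcdoc="{escaped}" width="{width}" height="{height}"></iframe>'

# Made-up data containing both kinds of quotes.
df = pd.DataFrame({"name": ["it's", 'say "hi"'], "value": [1, 2]})
markup = iframe_markup(df)
print(markup[:60])
```

In a notebook cell, passing the result to `IPython.display.HTML(markup)` should render it the same way as the function above.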

User · Answer

I found the following approach to be the most effective for sampling a DataFrame:

    print(df[A:B])  # A is the position of the first row, B is one past the last row

For example, print(df[10:15]) will print rows 10 through 14 from your data set; as with ordinary Python slicing, the end position is excluded.
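A small sketch of that slice, next to `DataFrame.sample` for the case where a random selection of rows is more useful than a contiguous window. The frame and the `random_state` value are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})

# Positional row slice: positions 10, 11, 12, 13, 14 (15 is excluded).
window = df[10:15]

# Random preview of 5 rows; random_state makes the draw repeatable.
random_rows = df.sample(n=5, random_state=0)

print(window)
print(random_rows)
```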
