[python] Output data from all columns in a dataframe in pandas

I have a csv file with the name params.csv. I opened up ipython qtconsole and created a pandas dataframe using:

import pandas
paramdata = pandas.read_csv('params.csv', names=paramnames)

where, paramnames is a python list of string objects. Example of paramnames (the length of actual list is 22):

paramnames = ["id",
"fc",
"mc",
"markup",
"asplevel",
"aspreview",
"reviewpd"]

At the ipython prompt if I type paramdata and press enter then I do not get the dataframe with columns and values as shown in examples on Pandas website. Instead, I get information about the dataframe. I get:

In[35]: paramdata
Out[35]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 59 entries, 0 to 58
Data columns:
id                    59  non-null values
fc                    59  non-null values
mc                    59  non-null values
markup                59  non-null values
asplevel              59  non-null values
aspreview             59  non-null values
reviewpd              59  non-null values

If I type paramdata['mc'] then I do get the values as expected for the mc column. I have two questions:

(1) In the examples on the pandas website (see, for example, the output of df here: http://pandas.sourceforge.net/indexing.html#additional-column-access) typing the name of the dataframe gives the actual data. Why am I getting information about the dataframe as shown above instead of the actual data? Do I need to set some output options somewhere?

(2) How do I output all columns in the dataframe to the screen without having to type their names, i.e., without having to type something like paramdata[['id','fc','mc']].

I am using pandas version 0.8.

Thank you.

This question is related to python pandas

The answer is


you can also use DataFrame.head(x) / .tail(x) to display the first / last x rows of the DataFrame.


you can use sequence slicing syntax i.e

paramdata[:5] # first five records
paramdata[-5:] # last five records
paramdata[:] # all records

sometimes the dataframe might not fit in the screen buffer in which case you are probably better off either printing a small subset or exporting it to something else, plot or (csv again)


I'm coming to python from R, and R's head() function wraps lines in a really convenient way for looking at data:

> head(cbind(mtcars, mtcars, mtcars))
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb  mpg cyl
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 21.0   6
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 21.0   6
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 22.8   4
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 21.4   6
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 18.7   8
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 18.1   6
                  disp  hp drat    wt  qsec vs am gear carb  mpg cyl disp  hp
Mazda RX4          160 110 3.90 2.620 16.46  0  1    4    4 21.0   6  160 110
Mazda RX4 Wag      160 110 3.90 2.875 17.02  0  1    4    4 21.0   6  160 110
Datsun 710         108  93 3.85 2.320 18.61  1  1    4    1 22.8   4  108  93
Hornet 4 Drive     258 110 3.08 3.215 19.44  1  0    3    1 21.4   6  258 110
Hornet Sportabout  360 175 3.15 3.440 17.02  0  0    3    2 18.7   8  360 175
Valiant            225 105 2.76 3.460 20.22  1  0    3    1 18.1   6  225 105
                  drat    wt  qsec vs am gear carb
Mazda RX4         3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     3.90 2.875 17.02  0  1    4    4
Datsun 710        3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 3.15 3.440 17.02  0  0    3    2
Valiant           2.76 3.460 20.22  1  0    3    1

I developed the following little python function to mimic this functionality:

def rhead(x, nrow = 6, ncol = 4):
    pd.set_option('display.expand_frame_repr', False)
    seq = np.arange(0, len(x.columns), ncol)
    for i in seq:
        print(x.loc[range(0, nrow), x.columns[range(i, min(i+ncol, len(x.columns)))]])
    pd.set_option('display.expand_frame_repr', True)

(it depends on pandas and numpy, obviously)


Use:

pandas.set_option('display.max_columns', 7)

This will force Pandas to display the 7 columns you have. Or more generally:

pandas.set_option('display.max_columns', None)

which will force it to display any number of columns.

Explanation: the default for max_columns is 0, which tells Pandas to display the table only if all the columns can be squeezed into the width of your console.

Alternatively, you can change the console width (in chars) from the default of 80 using e.g:

pandas.set_option('display.width', 200)

In ipython, I use this to print a part of the dataframe that works quite well (prints the first 100 rows):

print paramdata.head(100).to_string()

I know this is an old question, but I have just had a similar problem and I think what I did would work for you too.

I used the to_csv() method and wrote to stdout:

import sys

paramdata.to_csv(sys.stdout)

This should dump the whole dataframe whether it's nicely-printable or not, and you can use the to_csv parameters to configure column separators, whether the index is printed, etc.

Edit: It is now possible to use None as the target for .to_csv() with similar effect, which is arguably a lot nicer:

paramdata.to_csv(None)