I have a CSV file named params.csv. I opened the IPython qtconsole and created a pandas DataFrame using:
import pandas
paramdata = pandas.read_csv('params.csv', names=paramnames)
where paramnames is a Python list of string objects. An example of paramnames (the actual list has 22 entries):
paramnames = ["id",
"fc",
"mc",
"markup",
"asplevel",
"aspreview",
"reviewpd"]
At the IPython prompt, if I type paramdata and press Enter, I do not get the DataFrame with columns and values as shown in the examples on the pandas website. Instead, I get information about the DataFrame:
In[35]: paramdata
Out[35]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 59 entries, 0 to 58
Data columns:
id 59 non-null values
fc 59 non-null values
mc 59 non-null values
markup 59 non-null values
asplevel 59 non-null values
aspreview 59 non-null values
reviewpd 59 non-null values
If I type paramdata['mc'], then I do get the values as expected for the mc column. I have two questions:
(1) In the examples on the pandas website (see, for example, the output of df here: http://pandas.sourceforge.net/indexing.html#additional-column-access), typing the name of the DataFrame gives the actual data. Why am I getting information about the DataFrame as shown above instead of the actual data? Do I need to set some output options somewhere?
(2) How do I output all columns in the DataFrame to the screen without having to type their names, i.e., without having to type something like paramdata[['id','fc','mc']]?
I am using pandas version 0.8.
Thank you.
You can also use DataFrame.head(x) / .tail(x) to display the first / last x rows of the DataFrame.
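As a minimal sketch of the head/tail suggestion (the small frame below is a made-up stand-in for the asker's paramdata, with only three of the columns and invented values):

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: 59 rows, three of the columns.
paramdata = pd.DataFrame({
    "id": range(59),
    "fc": [0.5] * 59,
    "mc": [1.5] * 59,
})

first_rows = paramdata.head(3)  # first 3 rows
last_rows = paramdata.tail(3)   # last 3 rows

print(first_rows)
print(last_rows)
```

Both calls return new, smaller DataFrames, so the result is short enough to be shown as an actual table.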
You can use sequence slicing syntax, i.e.:
paramdata[:5] # first five records
paramdata[-5:] # last five records
paramdata[:] # all records
Sometimes the DataFrame might not fit in the screen buffer, in which case you are probably better off either printing a small subset or exporting it to something else (a plot, or CSV again).
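A rough sketch of both options, printing a small slice and exporting the rest. The frame and its values are invented for illustration, and an in-memory StringIO buffer stands in for a real output file:

```python
import io
import pandas as pd

# Hypothetical stand-in for the asker's data.
paramdata = pd.DataFrame({"id": range(59), "mc": [1.5] * 59})

# Option 1: print only a small subset (rows are sliced by position).
subset = paramdata[:5]
print(subset)

# Option 2: export the whole frame instead of printing it.
# A StringIO buffer is used here in place of a file path.
buffer = io.StringIO()
paramdata.to_csv(buffer, index=False)
exported_lines = buffer.getvalue().splitlines()
print(len(exported_lines), "lines written")
```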
I'm coming to Python from R, and R's head() function wraps lines in a really convenient way for looking at data:
> head(cbind(mtcars, mtcars, mtcars))
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb  mpg cyl
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 21.0   6
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 21.0   6
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 22.8   4
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 21.4   6
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 18.7   8
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 18.1   6
                  disp  hp drat    wt  qsec vs am gear carb  mpg cyl disp  hp
Mazda RX4          160 110 3.90 2.620 16.46  0  1    4    4 21.0   6  160 110
Mazda RX4 Wag      160 110 3.90 2.875 17.02  0  1    4    4 21.0   6  160 110
Datsun 710         108  93 3.85 2.320 18.61  1  1    4    1 22.8   4  108  93
Hornet 4 Drive     258 110 3.08 3.215 19.44  1  0    3    1 21.4   6  258 110
Hornet Sportabout  360 175 3.15 3.440 17.02  0  0    3    2 18.7   8  360 175
Valiant            225 105 2.76 3.460 20.22  1  0    3    1 18.1   6  225 105
                  drat    wt  qsec vs am gear carb
Mazda RX4         3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     3.90 2.875 17.02  0  1    4    4
Datsun 710        3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 3.15 3.440 17.02  0  0    3    2
Valiant           2.76 3.460 20.22  1  0    3    1
I developed the following little Python function to mimic this functionality:
import numpy as np
import pandas as pd

def rhead(x, nrow=6, ncol=4):
    # Print the first nrow rows, ncol columns at a time, like R's head().
    pd.set_option('display.expand_frame_repr', False)
    for i in np.arange(0, len(x.columns), ncol):
        # iloc selects by position, so this works with any index
        # and with frames that have fewer than nrow rows.
        print(x.iloc[:nrow, i:i + ncol])
    pd.set_option('display.expand_frame_repr', True)
(it depends on pandas and numpy, obviously)
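One possible variation on the same idea, sketched under my own assumptions: the chunking is split into a generator (rhead_chunks is a name I made up) and pd.option_context is used so the display option is restored to whatever it was before, even if printing raises. The wide frame is invented test data.

```python
import pandas as pd

def rhead_chunks(frame, nrow=6, ncol=4):
    """Yield the first nrow rows of frame, ncol columns at a time."""
    for start in range(0, frame.shape[1], ncol):
        yield frame.iloc[:nrow, start:start + ncol]

# Hypothetical wide frame: 10 rows, 10 columns named c0..c9.
wide = pd.DataFrame({f"c{i}": range(10) for i in range(10)})

# option_context restores the previous setting when the block exits.
with pd.option_context("display.expand_frame_repr", False):
    for chunk in rhead_chunks(wide):
        print(chunk)
```

Compared with calling set_option twice, this does not silently reset expand_frame_repr to True for users who had turned it off on purpose.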
Use:
pandas.set_option('display.max_columns', 7)
This will force Pandas to display the 7 columns you have. Or more generally:
pandas.set_option('display.max_columns', None)
which will force it to display any number of columns.
Explanation: the default for max_columns is 0, which tells pandas to display the table only if all the columns can be squeezed into the width of your console.
Alternatively, you can change the console width (in characters) from the default of 80 using, e.g.:
pandas.set_option('display.width', 200)
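A small sketch of the two options together, assuming a reasonably recent pandas. The 30-column frame is invented, and option_context is used here only so the settings apply to this one block; set_option makes the same change for the whole session.

```python
import pandas as pd

# Hypothetical frame with 30 columns and 2 rows.
wide = pd.DataFrame({f"col{i}": [i, i + 1] for i in range(30)})

# Temporarily lift the column limit and widen the assumed console.
with pd.option_context("display.max_columns", None, "display.width", 200):
    text_all = repr(wide)

print(text_all)
```

With max_columns set to None no columns are elided; display.width only controls where the long rows are wrapped.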
In IPython, I use this to print part of the DataFrame, which works quite well (it prints the first 100 rows):
print(paramdata.head(100).to_string())
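For illustration, here is the same call on an invented frame, using 5 rows instead of 100 to keep the output short. to_string() renders every column on each row without eliding any, so the result is one header line plus one line per row.

```python
import pandas as pd

# Hypothetical frame: 200 rows, 30 columns.
wide = pd.DataFrame({f"col{i}": range(200) for i in range(30)})

# Render the first 5 rows as plain text, all columns included.
text = wide.head(5).to_string()
print(text)
```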
I know this is an old question, but I have just had a similar problem and I think what I did would work for you too.
I used the to_csv() method and wrote to stdout:
import sys
paramdata.to_csv(sys.stdout)
This should dump the whole dataframe whether it's nicely-printable or not, and you can use the to_csv parameters to configure column separators, whether the index is printed, etc.
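A short sketch of configuring the output, using the sep and index parameters of to_csv. The three-column frame and its values are made up.

```python
import sys
import pandas as pd

# Hypothetical stand-in for the asker's data.
paramdata = pd.DataFrame({"id": [1, 2], "fc": [0.5, 0.7], "mc": [1.5, 2.5]})

# Tab-separated output, without the index column.
paramdata.to_csv(sys.stdout, sep="\t", index=False)
```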
Edit: It is now possible to pass None as the target for .to_csv(), which returns the CSV text as a string instead of writing it. Printing that string has a similar effect and is arguably a lot nicer:
print(paramdata.to_csv(None))
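To make the behaviour with None concrete, a tiny sketch on invented data: the call writes nothing by itself and hands back a string, which can then be printed or processed further.

```python
import pandas as pd

# Hypothetical two-row frame.
paramdata = pd.DataFrame({"id": [1, 2], "mc": [1.5, 2.5]})

# With None as the target, to_csv returns the CSV text as a str.
csv_text = paramdata.to_csv(None)
print(csv_text)
```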
Source: Stackoverflow.com