[python] Python - Dimension of Data Frame

Summary of all ways to get info on dimensions of DataFrame or Series

There are a number of ways to get information on the attributes of your DataFrame or Series.

Create Sample DataFrame and Series

df = pd.DataFrame({'a':[5, 2, np.nan], 'b':[ 9, 2, 4]})
df

     a  b
0  5.0  9
1  2.0  2
2  NaN  4

s = df['a']
s

0    5.0
1    2.0
2    NaN
Name: a, dtype: float64

shape Attribute

The shape attribute returns a two-item tuple of the number of rows and the number of columns in the DataFrame. For a Series, it returns a one-item tuple.

df.shape
(3, 2)

s.shape
(3,)

len function

To get the number of rows of a DataFrame or get the length of a Series, use the len function. An integer will be returned.

len(df)
3

len(s)
3

size attribute

To get the total number of elements in the DataFrame or Series, use the size attribute. For DataFrames, this is the product of the number of rows and the number of columns. For a Series, this will be equivalent to the len function:

df.size
6

s.size
3

ndim attribute

The ndim attribute returns the number of dimensions of your DataFrame or Series. It will always be 2 for DataFrames and 1 for Series:

df.ndim
2

s.ndim
1

The tricky count method

The count method can be used to return the number of non-missing values for each column/row of the DataFrame. This can be very confusing, because most people normally think of count as just the length of each row, which it is not. When called on a DataFrame, a Series is returned with the column names in the index and the number of non-missing values as the values.

df.count() # by default, get the count of each column

a    2
b    3
dtype: int64


df.count(axis='columns') # change direction to get count of each row

0    2
1    2
2    1
dtype: int64

For a Series, there is only one axis for computation and so it just returns a scalar:

s.count()
2

Use the info method for retrieving metadata

The info method returns the number of non-missing values and data types of each column

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
a    2 non-null float64
b    3 non-null int64
dtypes: float64(1), int64(1)
memory usage: 128.0 bytes