[python] How do I get the row count of a Pandas DataFrame?

I'm trying to get the number of rows of dataframe df with Pandas, and here is my code.

Method 1:

total_rows = df.count
print total_rows + 1

Method 2:

total_rows = df['First_columnn_label'].count
print total_rows + 1

Both the code snippets give me this error:

TypeError: unsupported operand type(s) for +: 'instancemethod' and 'int'

What am I doing wrong?

This question is related to python pandas dataframe

The answer is


TL;DR

Short, clear and clean: use len(df)


len() is your friend, and it can be used for row counts as len(df).

Alternatively, you can access all rows by df.index and all columns by df.columns, and as you can use the len(anyList) for getting the count of list, use len(df.index) for getting the number of rows, and len(df.columns) for the column count.

Or, you can use df.shape which returns the number of rows and columns together. If you want to access the number of rows, only use df.shape[0]. For the number of columns, only use: df.shape[1].


Either of this can do it (df is the name of the DataFrame):

Method 1: Using the len function:

len(df) will give the number of rows in a DataFrame named df.

Method 2: using count function:

df[col].count() will count the number of rows in a given column col.

df.count() will give the number of rows for all the columns.


You can do this also:

Let’s say df is your dataframe. Then df.shape gives you the shape of your dataframe i.e (row,col)

Thus, assign the below command to get the required

 row = df.shape[0], col = df.shape[1]

...building on Jan-Philip Gehrcke's answer.

The reason why len(df) or len(df.index) is faster than df.shape[0]:

Look at the code. df.shape is a @property that runs a DataFrame method calling len twice.

df.shape??
Type:        property
String form: <property object at 0x1127b33c0>
Source:
# df.shape.fget
@property
def shape(self):
    """
    Return a tuple representing the dimensionality of the DataFrame.
    """
    return len(self.index), len(self.columns)

And beneath the hood of len(df)

df.__len__??
Signature: df.__len__()
Source:
    def __len__(self):
        """Returns length of info axis, but here we use the index """
        return len(self.index)
File:      ~/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py
Type:      instancemethod

len(df.index) will be slightly faster than len(df) since it has one less function call, but this is always faster than df.shape[0]


In case you want to get the row count in the middle of a chained operation, you can use:

df.pipe(len)

Example:

row_count = (
      pd.DataFrame(np.random.rand(3,4))
      .reset_index()
      .pipe(len)
)

This can be useful if you don't want to put a long statement inside a len() function.

You could use __len__() instead but __len__() looks a bit weird.


For dataframe df, a printed comma formatted row count used while exploring data:

def nrow(df):
    print("{:,}".format(df.shape[0]))

Example:

nrow(my_df)
12,456,789

How do I get the row count of a Pandas DataFrame?

This table summarises the different situations in which you'd want to count something in a DataFrame (or Series, for completeness), along with the recommended method(s).

Enter image description here

Footnotes

  1. DataFrame.count returns counts for each column as a Series since the non-null count varies by column.
  2. DataFrameGroupBy.size returns a Series, since all columns in the same group share the same row-count.
  3. DataFrameGroupBy.count returns a DataFrame, since the non-null count could differ across columns in the same group. To get the group-wise non-null count for a specific column, use df.groupby(...)['x'].count() where "x" is the column to count.

#Minimal Code Examples

Below, I show examples of each of the methods described in the table above. First, the setup -

df = pd.DataFrame({
    'A': list('aabbc'), 'B': ['x', 'x', np.nan, 'x', np.nan]})
s = df['B'].copy()

df

   A    B
0  a    x
1  a    x
2  b  NaN
3  b    x
4  c  NaN

s

0      x
1      x
2    NaN
3      x
4    NaN
Name: B, dtype: object

Row Count of a DataFrame: len(df), df.shape[0], or len(df.index)

len(df)
# 5

df.shape[0]
# 5

len(df.index)
# 5

It seems silly to compare the performance of constant time operations, especially when the difference is on the level of "seriously, don't worry about it". But this seems to be a trend with other answers, so I'm doing the same for completeness.

Of the three methods above, len(df.index) (as mentioned in other answers) is the fastest.

Note

  • All the methods above are constant time operations as they are simple attribute lookups.
  • df.shape (similar to ndarray.shape) is an attribute that returns a tuple of (# Rows, # Cols). For example, df.shape returns (8, 2) for the example here.

Column Count of a DataFrame: df.shape[1], len(df.columns)

df.shape[1]
# 2

len(df.columns)
# 2

Analogous to len(df.index), len(df.columns) is the faster of the two methods (but takes more characters to type).

Row Count of a Series: len(s), s.size, len(s.index)

len(s)
# 5

s.size
# 5

len(s.index)
# 5

s.size and len(s.index) are about the same in terms of speed. But I recommend len(df).

Note size is an attribute, and it returns the number of elements (=count of rows for any Series). DataFrames also define a size attribute which returns the same result as df.shape[0] * df.shape[1].

Non-Null Row Count: DataFrame.count and Series.count

The methods described here only count non-null values (meaning NaNs are ignored).

Calling DataFrame.count will return non-NaN counts for each column:

df.count()

A    5
B    3
dtype: int64

For Series, use Series.count to similar effect:

s.count()
# 3

Group-wise Row Count: GroupBy.size

For DataFrames, use DataFrameGroupBy.size to count the number of rows per group.

df.groupby('A').size()

A
a    2
b    2
c    1
dtype: int64

Similarly, for Series, you'll use SeriesGroupBy.size.

s.groupby(df.A).size()

A
a    2
b    2
c    1
Name: B, dtype: int64

In both cases, a Series is returned. This makes sense for DataFrames as well since all groups share the same row-count.

Group-wise Non-Null Row Count: GroupBy.count

Similar to above, but use GroupBy.count, not GroupBy.size. Note that size always returns a Series, while count returns a Series if called on a specific column, or else a DataFrame.

The following methods return the same thing:

df.groupby('A')['B'].size()
df.groupby('A').size()

A
a    2
b    2
c    1
Name: B, dtype: int64

Meanwhile, for count, we have

df.groupby('A').count()

   B
A
a  2
b  1
c  0

...called on the entire GroupBy object, vs.,

df.groupby('A')['B'].count()

A
a    2
b    1
c    0
Name: B, dtype: int64

Called on a specific column.


Think, the dataset is "data" and name your dataset as " data_fr " and number of rows in the data_fr is "nu_rows"

#import the data frame. Extention could be different as csv,xlsx or etc.
data_fr = pd.read_csv('data.csv')

#print the number of rows
nu_rows = data_fr.shape[0]
print(nu_rows)

Apart from the previous answers, you can use df.axes to get the tuple with row and column indexes and then use the len() function:

total_rows = len(df.axes[0])
total_cols = len(df.axes[1])

Suppose df is your dataframe then:

count_row = df.shape[0]  # Gives number of rows
count_col = df.shape[1]  # Gives number of columns

Or, more succinctly,

r, c = df.shape

I'm not sure if this would work (data could be omitted), but this may work:

*dataframe name*.tails(1)

and then using this, you could find the number of rows by running the code snippet and looking at the row number that was given to you.


Use len(df).

__len__() is documented with Returns length of index. Timing info, set up the same way as in root's answer:

In [7]: timeit len(df.index)
1000000 loops, best of 3: 248 ns per loop

In [8]: timeit len(df)
1000000 loops, best of 3: 573 ns per loop

Due to one additional function call, it is a tiny bit slower than calling len(df.index) directly. This should not matter in most cases.


An alternative method to finding out the amount of rows in a dataframe which I think is the most readable variant is pandas.Index.size.

Do note that, as I commented on the accepted answer,

Suspected pandas.Index.size would actually be faster than len(df.index) but timeit on my computer tells me otherwise (~150 ns slower per loop).


I come to Pandas from an R background, and I see that Pandas is more complicated when it comes to selecting rows or columns.

I had to wrestle with it for a while, and then I found some ways to deal with:

Getting the number of columns:

len(df.columns)
## Here:
# df is your data.frame
# df.columns returns a string. It contains column's titles of the df.
# Then, "len()" gets the length of it.

Getting the number of rows:

len(df.index) # It's similar.

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe