[python] How to check whether a pandas DataFrame is empty?

How to check whether a pandas DataFrame is empty? In my case I want to print some message in terminal if the DataFrame is empty.

This question is related to python pandas dataframe

The answer is


I prefer going the long route. These are the checks I follow to avoid using a try-except clause -

  1. check if variable is not None
  2. then check if its a dataframe and
  3. make sure its not empty

Here, DATA is the suspect variable -

DATA is not None and isinstance(DATA, pd.DataFrame) and not DATA.empty

1) If a DataFrame has got Nan and Non Null values and you want to find whether the DataFrame
is empty or not then try this code.
2) when this situation can happen? 
This situation happens when a single function is used to plot more than one DataFrame 
which are passed as parameter.In such a situation the function try to plot the data even 
when a DataFrame is empty and thus plot an empty figure!.
It will make sense if simply display 'DataFrame has no data' message.
3) why? 
if a DataFrame is empty(i.e. contain no data at all.Mind you DataFrame with Nan values 
is considered non empty) then it is desirable not to plot but put out a message :
Suppose we have two DataFrames df1 and df2.
The function myfunc takes any DataFrame(df1 and df2 in this case) and print a message 
if a DataFrame is empty(instead of plotting):
df1                     df2
col1 col2           col1 col2 
Nan   2              Nan  Nan 
2     Nan            Nan  Nan  

and the function:

def myfunc(df):
  if (df.count().sum())>0: ##count the total number of non Nan values.Equal to 0 if DataFrame is empty
     print('not empty')
     df.plot(kind='barh')
  else:
     display a message instead of plotting if it is empty
     print('empty')

To see if a dataframe is empty, I argue that one should test for the length of a dataframe's columns index:

if len(df.columns) == 0: 1

Reason:

According to the Pandas Reference API, there is a distinction between:

  • an empty dataframe with 0 rows and 0 columns
  • an empty dataframe with rows containing NaN hence at least 1 column

Arguably, they are not the same. The other answers are imprecise in that df.empty, len(df), or len(df.index) make no distinction and return index is 0 and empty is True in both cases.

Examples

Example 1: An empty dataframe with 0 rows and 0 columns

In [1]: import pandas as pd
        df1 = pd.DataFrame()
        df1
Out[1]: Empty DataFrame
        Columns: []
        Index: []

In [2]: len(df1.index)  # or len(df1)
Out[2]: 0

In [3]: df1.empty
Out[3]: True

Example 2: A dataframe which is emptied to 0 rows but still retains n columns

In [4]: df2 = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
        df2
Out[4]:    AA  BB
        0   1  11
        1   2  22
        2   3  33

In [5]: df2 = df2[df2['AA'] == 5]
        df2
Out[5]: Empty DataFrame
        Columns: [AA, BB]
        Index: []

In [6]: len(df2.index)  # or len(df2)
Out[6]: 0

In [7]: df2.empty
Out[7]: True

Now, building on the previous examples, in which the index is 0 and empty is True. When reading the length of the columns index for the first loaded dataframe df1, it returns 0 columns to prove that it is indeed empty.

In [8]: len(df1.columns)
Out[8]: 0

In [9]: len(df2.columns)
Out[9]: 2

Critically, while the second dataframe df2 contains no data, it is not completely empty because it returns the amount of empty columns that persist.

Why it matters

Let's add a new column to these dataframes to understand the implications:

# As expected, the empty column displays 1 series
In [10]: df1['CC'] = [111, 222, 333]
         df1
Out[10]:    CC
         0 111
         1 222
         2 333
In [11]: len(df1.columns)
Out[11]: 1

# Note the persisting series with rows containing `NaN` values in df2
In [12]: df2['CC'] = [111, 222, 333]
         df2
Out[12]:    AA  BB   CC
         0 NaN NaN  111
         1 NaN NaN  222
         2 NaN NaN  333
In [13]: len(df2.columns)
Out[13]: 3

It is evident that the original columns in df2 have re-surfaced. Therefore, it is prudent to instead read the length of the columns index with len(pandas.core.frame.DataFrame.columns) to see if a dataframe is empty.

Practical solution

# New dataframe df
In [1]: df = pd.DataFrame({'AA' : [1, 2, 3], 'BB' : [11, 22, 33]})
        df
Out[1]:    AA  BB
        0   1  11
        1   2  22
        2   3  33

# This data manipulation approach results in an empty df
# because of a subset of values that are not available (`NaN`)
In [2]: df = df[df['AA'] == 5]
        df
Out[2]: Empty DataFrame
        Columns: [AA, BB]
        Index: []

# NOTE: the df is empty, BUT the columns are persistent
In [3]: len(df.columns)
Out[3]: 2

# And accordingly, the other answers on this page
In [4]: len(df.index)  # or len(df)
Out[4]: 0

In [5]: df.empty
Out[5]: True
# SOLUTION: conditionally check for empty columns
In [6]: if len(df.columns) != 0:  # <--- here
            # Do something, e.g. 
            # drop any columns containing rows with `NaN`
            # to make the df really empty
            df = df.dropna(how='all', axis=1)
        df
Out[6]: Empty DataFrame
        Columns: []
        Index: []

# Testing shows it is indeed empty now
In [7]: len(df.columns)
Out[7]: 0

Adding a new data series works as expected without the re-surfacing of empty columns (factually, without any series that were containing rows with only NaN):

In [8]: df['CC'] = [111, 222, 333]
         df
Out[8]:    CC
         0 111
         1 222
         2 333
In [9]: len(df.columns)
Out[9]: 1

I use the len function. It's much faster than empty. len(df.index) is even faster.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD'))

def empty(df):
    return df.empty

def lenz(df):
    return len(df) == 0

def lenzi(df):
    return len(df.index) == 0

'''
%timeit empty(df)
%timeit lenz(df)
%timeit lenzi(df)

10000 loops, best of 3: 13.9 µs per loop
100000 loops, best of 3: 2.34 µs per loop
1000000 loops, best of 3: 695 ns per loop

len on index seems to be faster
'''

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe