How to check if a Spark DataFrame is empty?


Right now, I have to use df.count > 0 to check whether the DataFrame is empty, but that is inefficient because it counts every row. Is there a better way to do that?


PS: I want to check whether it's empty so that I only save the DataFrame when it isn't.

This question is tagged with apache-spark apache-spark-sql

~ Asked on 2015-09-22 02:52:55

The Best Answer is


For Spark 2.1.0, my suggestion would be to use head(n: Int) or take(n: Int) with isEmpty, whichever one has the clearest intent to you:

df.head(1).isEmpty
df.take(1).isEmpty

The Python equivalent:

len(df.head(1)) == 0  # or: not df.head(1)
len(df.take(1)) == 0  # or: not df.take(1)
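
The Python check can be wrapped in a small helper and used to guard the save, which is what the question asks for. This is only a minimal sketch under stated assumptions: `is_empty` and `save_if_not_empty` are hypothetical names, the Parquet format and output paths are assumptions, and `_FakeDataFrame` / `_FakeWriter` are stand-ins so the snippet runs without a Spark session. With a real PySpark DataFrame you would pass `df` itself.

```python
def is_empty(df):
    """Return True if df has no rows, fetching at most one row."""
    # head(1) returns a list with zero or one row, so no full count is needed.
    return len(df.head(1)) == 0


def save_if_not_empty(df, path):
    """Write df to path only when it has at least one row; return True if written."""
    if is_empty(df):
        return False
    # Assumption: Parquet output; another writer method could be used instead.
    df.write.parquet(path)
    return True


class _FakeWriter:
    """Hypothetical stand-in for a DataFrame writer; records paths instead of writing."""

    def __init__(self):
        self.paths = []

    def parquet(self, path):
        self.paths.append(path)


class _FakeDataFrame:
    """Hypothetical stand-in exposing only head(n) and write."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.write = _FakeWriter()

    def head(self, n):
        return self._rows[:n]


empty_df = _FakeDataFrame([])
full_df = _FakeDataFrame([("a", 1), ("b", 2)])
saved_empty = save_if_not_empty(empty_df, "/tmp/out_empty")
saved_full = save_if_not_empty(full_df, "/tmp/out_full")
```

The check and the write are two separate actions, so an expensive DataFrame may be computed twice unless it is cached first.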

In Scala, df.first() and df.head() both throw java.util.NoSuchElementException if the DataFrame is empty. first() calls head() directly, which in turn calls head(1).head.

def first(): T = head()
def head(): T = head(1).head

head(1) returns an Array, so taking head on that Array causes the java.util.NoSuchElementException when the DataFrame is empty.

def head(n: Int): Array[T] = withAction("head", limit(n).queryExecution)(collectFromPlan)

So instead of calling head(), use head(1) directly to get the array and then you can use isEmpty.
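
The delegation chain quoted from the Scala source can be mimicked in plain Python to show why the no-argument form fails on empty data while head(1) does not. This is a toy model, not Spark code: `ToyDataset` and `head_n` are hypothetical names, and Python's `IndexError` stands in for Scala's java.util.NoSuchElementException.

```python
class ToyDataset:
    """Hypothetical model of the Scala call chain; not part of Spark."""

    def __init__(self, rows):
        self._rows = list(rows)

    def head_n(self, n):
        # Models head(n: Int): always returns a list, possibly empty.
        return self._rows[:n]

    def head(self):
        # Models head(): head(1).head, which fails when the list is empty.
        return self.head_n(1)[0]

    def first(self):
        # Models first(): simply delegates to head().
        return self.head()


empty = ToyDataset([])

# Safe: inspect the returned list instead of indexing into it.
empty_via_head_n = len(empty.head_n(1)) == 0

# Unsafe: the no-argument form raises on empty data.
try:
    empty.first()
    first_raised = False
except IndexError:
    first_raised = True
```

The point of the sketch is that the emptiness test should look at the collection returned by the n-argument call rather than ask for its first element.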

take(n) is also equivalent to head(n)...

def take(n: Int): Array[T] = head(n)

And limit(1).collect() is equivalent to head(1) (notice limit(n).queryExecution in the head(n: Int) method), so the following are all equivalent, at least from what I can tell, and you won't have to catch a java.util.NoSuchElementException when the DataFrame is empty:

df.head(1).isEmpty
df.take(1).isEmpty
df.limit(1).collect().isEmpty

I know this is an older question so hopefully it will help someone using a newer version of Spark.

~ Answered on 2017-04-13 04:10:19


I would say to just grab the underlying RDD. In Scala:

df.rdd.isEmpty

in Python:

df.rdd.isEmpty()

That being said, all this does is call take(1).length, so it does the same thing as Rohan's answer, just maybe slightly more explicitly.

~ Answered on 2015-09-22 04:14:38
