PySpark 2.0: The size or shape of a DataFrame

Question

I am trying to find out the size/shape of a DataFrame in PySpark. I do not see a single function that can do this. In Python (pandas) I can do:

    data.shape

Is there a similar function in PySpark? This is my current solution, but I am looking for a more elegant one:

    row_number = data.count()
    column_number = len(data.dtypes)

The computation of the number of columns is not ideal.

User · Answer

You can get its shape with:

    print((df.count(), len(df.columns)))

User · Answer

Add this to your code:

    import pyspark

    def spark_shape(self):
        return (self.count(), len(self.columns))

    pyspark.sql.dataframe.DataFrame.shape = spark_shape

Then you can do:

    >>> df.shape()
    (10000, 10)

Keep in mind that .count() can be very slow on a very large table that has not been persisted, because it triggers a full Spark job.
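If you would rather not patch the DataFrame class, the same idea can be written as a plain helper. This is only a minimal sketch: `dataframe_shape` and `demo` are hypothetical names, and the demo assumes a local Spark session is available. Caching the DataFrame before counting is one way to avoid recomputing it when it is reused afterwards.

```python
from typing import Any, Tuple


def dataframe_shape(df: Any) -> Tuple[int, int]:
    """Return (row_count, column_count) for a DataFrame-like object.

    Hypothetical helper: it only assumes that `df` exposes `count()` and
    `columns`, as pyspark.sql.DataFrame does.
    """
    # count() triggers a Spark job; len(df.columns) only reads the schema.
    return (df.count(), len(df.columns))


def demo() -> None:
    # pyspark is imported here so the helper above stays importable without it.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")      # assumption: a local session is enough here
        .appName("shape-demo")   # hypothetical application name
        .getOrCreate()
    )
    # cache() keeps the data around so later actions do not recompute it.
    df = spark.range(1000).cache()
    print(dataframe_shape(df))
    spark.stop()
```

Calling `demo()` prints the shape of a small generated DataFrame. The helper takes the DataFrame as an argument instead of attaching a method to the class, so it does not change the behavior of other code that uses `DataFrame`.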

User · Answer

Use df.count() to get the number of rows.

User · Answer

I don't think Spark has a function similar to data.shape. But I would use len(data.columns) rather than len(data.dtypes).

User · Answer

print((df.count(), len(df.columns))) is easier for smaller datasets.

However, if the dataset is huge, an alternative approach would be to use pandas and Arrow to convert the DataFrame to a pandas DataFrame and call shape:

    spark.conf.set("spark.sql.execution.arrow.enabled", "true")
    spark.conf.set("spark.sql.crossJoin.enabled", "true")
    print(df.toPandas().shape)

Note that toPandas() collects the entire DataFrame onto the driver, so the data must fit in driver memory.
