I am using spark-csv to load data into a DataFrame. I want to run a simple query and display the result:
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("my.csv")
df.registerTempTable("tasks")
val results = sqlContext.sql("select col from tasks")
results.show()
The col column appears to be truncated:
scala> results.show();
+--------------------+
| col|
+--------------------+
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-06 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:21:...|
|2015-11-16 07:21:...|
|2015-11-16 07:21:...|
+--------------------+
How do I show the full content of the column?
Tags: apache-spark, dataframe, spark-csv, output-formatting
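The "..." at the end of each cell above comes from show()'s default truncation: a cell longer than 20 characters is cut to its first 17 characters followed by "...". The plain-Python sketch below (no Spark needed) mimics that rule as we understand it from Spark's showString logic. The name truncate_cell is hypothetical, and the full timestamp is an assumed value, since the real one is hidden by the truncation.

```python
def truncate_cell(value: str, truncate: int = 20) -> str:
    """Shorten a cell roughly the way DataFrame.show() does (sketch, not Spark's source)."""
    if truncate > 0 and len(value) > truncate:
        if truncate < 4:
            # Too narrow for an ellipsis: simply cut the string.
            return value[:truncate]
        # Keep the first truncate - 3 characters and mark the cut with "...".
        return value[: truncate - 3] + "..."
    # truncate <= 0 (what show(false) passes) or a short value: leave it intact.
    return value


if __name__ == "__main__":
    ts = "2015-11-16 07:15:31.000"  # hypothetical full timestamp string
    print(truncate_cell(ts))     # 2015-11-16 07:15:...
    print(truncate_cell(ts, 0))  # 2015-11-16 07:15:31.000
```

Calling show(false) corresponds to a width of 0 in this sketch, which is why the full value appears.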
The code below (PySpark) displays all rows without truncating any column:
df.show(df.count(), False)
results.show(false)
will show you the full column content.
By default, show() displays at most 20 rows; adding a row count before false lets you show more rows:
results.show(20,false)
did the trick for me in Scala.
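The row limit and the cell truncation are independent: numRows (default 20) caps how many rows are printed, while the Boolean controls cell width. The plain-Python sketch below mimics only the row limit. The function name show_lines is hypothetical, and the footer text reflects how Spark 2.x/3.x word it as far as we know; your version may differ.

```python
def show_lines(rows, num_rows=20):
    """List what a show()-style printer emits for a row limit (sketch, not Spark's source)."""
    lines = [str(row) for row in rows[:num_rows]]
    if len(rows) > num_rows:
        # A footer like this is appended when some rows were left out.
        noun = "row" if num_rows == 1 else "rows"
        lines.append(f"only showing top {num_rows} {noun}")
    return lines


if __name__ == "__main__":
    rows = list(range(25))  # hypothetical 25-row result
    print(show_lines(rows)[-1])       # only showing top 20 rows
    print(len(show_lines(rows, 25)))  # 25
```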
PYSPARK
In the code below, df is the name of the DataFrame. The first parameter shows all rows in the DataFrame dynamically, rather than a hard-coded number. The second parameter, set to False, displays the full column contents.
df.show(df.count(),False)
SCALA
In the code below, df is the name of the DataFrame. The first parameter shows all rows in the DataFrame dynamically, rather than a hard-coded number. The second parameter, set to false, displays the full column contents.
df.show(df.count().toInt,false)
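Try this command:
df.show(df.count())
Note that this shows every row, but long cell values are still truncated unless you also pass False (Python) or false (Scala) as the second argument.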
If you call results.show(false), the results will not be truncated.
In C#, Option("truncate", false) keeps the console sink from truncating data in the output:
StreamingQuery query = spark
.Sql("SELECT * FROM Messages")
.WriteStream()
.OutputMode("append")
.Format("console")
.Option("truncate", false)
.Start();
I use this Chrome extension (a user style that widens Jupyter notebooks), and it works pretty well:
https://userstyles.org/styles/157357/jupyter-notebook-wide
The following answer applies to a Spark Streaming application.
By setting the "truncate" option to false, you can tell the output sink to display the full column.
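import org.apache.spark.sql.streaming.{OutputMode, Trigger}

// out is the streaming DataFrame to print
val query = out.writeStream
  .outputMode(OutputMode.Update())
  .format("console")
  .option("truncate", false) // print full column values
  .trigger(Trigger.ProcessingTime("5 seconds"))
  .start()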
Within Databricks, you can visualize the DataFrame in a tabular format with the command:
display(results)
I tried this in PySpark:
df.show(truncate=0)
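In PySpark, truncate accepts either a bool or an int: True means the default width of 20, False or 0 disables truncation, and a positive integer sets the maximum cell width. The hypothetical helper below restates that mapping in plain Python; it paraphrases how DataFrame.show() interprets the argument rather than calling Spark.

```python
def effective_truncate(truncate=True) -> int:
    """Map PySpark's truncate argument to a cell width (sketch of the behaviour).

    0 means "no truncation"; any positive value is the maximum cell width.
    """
    # bool must be checked first: in Python, True and False are also ints.
    if isinstance(truncate, bool):
        # True means the default width of 20; False means no truncation.
        return 20 if truncate else 0
    # Integers are used as the width directly, so 0 also disables truncation.
    return int(truncate)


if __name__ == "__main__":
    print(effective_truncate(True))   # 20
    print(effective_truncate(0))      # 0
    print(effective_truncate(50))     # 50
```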
results.show(20, False) in Python, or results.show(20, false) in Java/Scala.
Try this in Scala:
df.show(df.count.toInt, false)
The show method takes an Int and a Boolean, but df.count returns a Long, so it has to be converted with toInt.
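One caveat: toInt does not check for overflow. A Long larger than Int.MaxValue silently wraps around, so on a DataFrame with more than about 2.1 billion rows, df.count.toInt would yield a wrong (possibly negative) row count. The plain-Python sketch below emulates that wrap-around and a clamping alternative; both function names are hypothetical. In Scala the clamp could be written as math.min(df.count, Int.MaxValue.toLong).toInt, although printing that many rows is impractical anyway.

```python
INT_MAX = 2**31 - 1  # Scala's Int.MaxValue


def scala_long_to_int(n: int) -> int:
    """Emulate Scala's Long.toInt: keep the low 32 bits as a signed integer."""
    return ((n + 2**31) % 2**32) - 2**31


def safe_num_rows(count: int) -> int:
    """Clamp a row count into Int range before using it as show()'s numRows."""
    return min(count, INT_MAX)


if __name__ == "__main__":
    print(scala_long_to_int(1_000))          # 1000
    print(scala_long_to_int(3_000_000_000))  # -1294967296
    print(safe_num_rows(3_000_000_000))      # 2147483647
```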
The other solutions are good. If your goals are to see every row without truncation, quickly and efficiently, these two lines are useful:
df.persist
df.show(df.count.toInt, false) // Scala; in Python: df.persist() then df.show(df.count(), False)
Persisting (with persist or cache) keeps the intermediate DataFrame on the executors, so the two actions, count and show, do not each recompute it from the source. This makes them faster and more efficient. See more about persist and cache.
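Why persisting helps can be seen in a toy model: DataFrames are evaluated lazily, so each action (count, then show) recomputes the data from its source unless it has been persisted. The ToyFrame class below is a hypothetical plain-Python stand-in, not Spark's API; it only counts how many times the source computation runs.

```python
class ToyFrame:
    """Hypothetical stand-in for a lazily evaluated DataFrame (not Spark's API)."""

    def __init__(self, compute):
        self._compute = compute   # function that produces the rows
        self._persisted = False
        self._cache = None
        self.evaluations = 0      # how many times the source was recomputed

    def persist(self):
        # Like Spark, only mark the frame; the data is cached by the next action.
        self._persisted = True
        return self

    def _rows(self):
        if self._cache is not None:
            return self._cache
        self.evaluations += 1
        rows = self._compute()
        if self._persisted:
            self._cache = rows
        return rows

    def count(self):
        return len(self._rows())

    def show(self, n):
        return self._rows()[:n]


if __name__ == "__main__":
    plain = ToyFrame(lambda: list(range(5)))
    plain.show(plain.count())
    print(plain.evaluations)   # 2: count and show each recompute

    cached = ToyFrame(lambda: list(range(5))).persist()
    cached.show(cached.count())
    print(cached.evaluations)  # 1: show reuses the persisted rows
```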
Source: Stackoverflow.com