I am new to Spark and Spark SQL.
How does createOrReplaceTempView
work in Spark?
If we register an RDD
of objects as a table will spark keep all the data in memory?
This question is related to
apache-spark
apache-spark-sql
spark-dataframe
SparkSQl support writing programs using Dataset and Dataframe API, along with it need to support sql.
In order to support Sql on DataFrames, first it requires a table definition with column names are required, along with if it creates tables the hive metastore will get lot unnecessary tables, because Spark-Sql natively resides on hive. So it will create a temporary view, which temporarily available in hive for time being and used as any other hive table, once the Spark Context stop it will be removed.
In order to create the view, developer need an utility called createOrReplaceTempView
CreateOrReplaceTempView will create a temporary view of the table on memory it is not presistant at this moment but you can run sql query on top of that . if you want to save it you can either persist or use saveAsTable to save.
first we read data in csv format and then convert to data frame and create a temp view
Reading data in csv format
val data = spark.read.format("csv").option("header","true").option("inferSchema","true").load("FileStore/tables/pzufk5ib1500654887654/campaign.csv")
printing the schema
data.printSchema
data.createOrReplaceTempView("Data")
Now we can run sql queries on top the table view we just created
%sql select Week as Date,Campaign Type,Engagements,Country from Data order by Date asc
Source: Stackoverflow.com