How does createOrReplaceTempView work in Spark?

Question

I am new to Spark and Spark SQL. How does createOrReplaceTempView work in Spark? If we register an RDD of objects as a table, will Spark keep all the data in memory?

User · Answer

createOrReplaceTempView creates a temporary view of the table for the current session. It is not persistent, but you can run SQL queries on top of it. If you want to keep the data, you can either persist the DataFrame or use saveAsTable to save it.
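To make the difference between the view, persist and saveAsTable concrete, here is a minimal sketch. It assumes Spark is on the classpath and runs in local mode; the sample rows, the names campaign_view and campaign_saved, and the temporary warehouse directory are made up for illustration. In spark-shell or a notebook the session-building lines can be dropped, since spark already exists there.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for illustration only; the warehouse dir is a throwaway temp folder (assumption).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("temp-view-vs-saved-table")
  .config("spark.sql.warehouse.dir", Files.createTempDirectory("warehouse").toString)
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for a real table.
val df = Seq(("2017-07-01", 10), ("2017-07-08", 25)).toDF("week", "engagements")

// Registers a name for the DataFrame's query plan; no rows are copied or stored.
df.createOrReplaceTempView("campaign_view")

// Option 1: mark the DataFrame for caching; the action below fills the cache.
df.cache()
val cachedRows = df.count()

// Option 2: materialize the rows as a managed table. With a persistent
// metastore (for example Hive) configured, later sessions could read it too.
df.write.mode("overwrite").saveAsTable("campaign_saved")
val savedRows = spark.table("campaign_saved").count()
```

The view itself costs almost nothing; only cache/persist or saveAsTable actually store the rows somewhere.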

First we read the data in CSV format, convert it to a DataFrame, and create a temp view.

Reading data in CSV format

val data = spark.read.format("csv").option("header","true").option("inferSchema","true").load("FileStore/tables/pzufk5ib1500654887654/campaign.csv")

Printing the schema and creating the temp view

data.printSchema

data.createOrReplaceTempView("Data")

Now we can run SQL queries on top of the table view we just created

  %sql select Week as Date, `Campaign Type`, Engagements, Country from Data order by Date asc
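The %sql prefix is a notebook cell magic, so it only works inside a notebook. Outside one, the same query can be passed to spark.sql. The sketch below assumes a local session and uses a few invented rows in place of campaign.csv; only the column names are taken from the query above. The column name that contains a space is quoted with backticks.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration; in spark-shell or a notebook `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("temp-view-sql").getOrCreate()
import spark.implicits._

// Hypothetical rows standing in for campaign.csv.
val data = Seq(
  ("2017-07-08", "Email", 25, "US"),
  ("2017-07-01", "Social", 10, "DE")
).toDF("Week", "Campaign Type", "Engagements", "Country")

data.createOrReplaceTempView("Data")

// Same query as the notebook cell, run through the SparkSession instead of %sql.
val result = spark.sql(
  """SELECT Week AS Date, `Campaign Type`, Engagements, Country
    |FROM Data
    |ORDER BY Date ASC""".stripMargin)

result.show()
```

spark.sql returns an ordinary DataFrame, so the result can be filtered, joined or written out like any other.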

User · Answer

createOrReplaceTempView creates (or replaces, if that view name already exists) a lazily evaluated "view" that you can then use like a Hive table in Spark SQL. It does not persist to memory unless you cache the dataset that underpins the view.

scala> val s = Seq(1, 2, 3).toDF("num")
s: org.apache.spark.sql.DataFrame = [num: int]

scala> s.createOrReplaceTempView("nums")

scala> spark.table("nums")
res22: org.apache.spark.sql.DataFrame = [num: int]

scala> spark.table("nums").cache
res23: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [num: int]

scala> spark.table("nums").count
res24: Long = 3

The data is cached fully only after the .count call.

Related SO: spark createOrReplaceTempView vs createGlobalTempView

Relevant quote (comparing to persistent table): "Unlike the createOrReplaceTempView command, saveAsTable will materialize the contents of the DataFrame and create a pointer to the data in the Hive metastore." From https://spark.apache.org/docs/latest/sql-programming-guide.html#saving-to-persistent-tables

Note: createOrReplaceTempView was formerly registerTempTable.
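The caching behaviour can also be checked from code rather than from the Spark UI. This is a rough sketch, assuming a local session; it uses spark.catalog.isCached, which as far as I know reports whether the view has been marked for caching, not whether the rows have already been materialized, so it does not replace looking at the Storage tab.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("temp-view-cache").getOrCreate()
import spark.implicits._

val s = Seq(1, 2, 3).toDF("num")
s.createOrReplaceTempView("nums")

// Creating the view alone does not cache anything.
val cachedBefore = spark.catalog.isCached("nums")

// Mark the view's data for caching; this is still lazy.
spark.table("nums").cache()
val cachedAfterMark = spark.catalog.isCached("nums")

// The first action reads the data and populates the cache.
val rowCount = spark.table("nums").count()
```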

User · Answer

Spark SQL supports writing programs with the Dataset and DataFrame APIs, and it also needs to support SQL. To run SQL against a DataFrame, Spark first needs a table definition with column names. If every such definition were created as a real table, the Hive metastore would fill up with a lot of unnecessary tables. So Spark creates a temporary view instead: it is available only for the current session, can be used like any other Hive table, and is removed once the Spark session stops. To create the view, the developer uses a method called createOrReplaceTempView.
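The session scoping described above can be seen with a small experiment. The sketch assumes a local session and a made-up view name nums; it relies on spark.newSession(), which shares the SparkContext but gets its own temporary views, and on the catalog methods tableExists and dropTempView.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("temp-view-scope").getOrCreate()
import spark.implicits._

Seq(1, 2, 3).toDF("num").createOrReplaceTempView("nums")

// The view is visible in the session that created it.
val visibleHere = spark.catalog.tableExists("nums")

// A second session on the same SparkContext has its own temp views.
val otherSession = spark.newSession()
val visibleElsewhere = otherSession.catalog.tableExists("nums")

// The view can also be removed explicitly before the session ends.
val dropped = spark.catalog.dropTempView("nums")
val visibleAfterDrop = spark.catalog.tableExists("nums")
```

If the view has to be shared across sessions of the same application, createGlobalTempView is the variant meant for that.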
