[hadoop] How to delete and update a record in Hive

Recently I was looking to resolve a similar issue, Apache Hive, Hadoop do not support Update/Delete operations. So ? So you have two ways:

  1. Use a backup table: Save the whole table in a backup_table, then truncate your input table, then re-write only the data you are intrested to mantain.
  2. Use Uber Hudi: It's a framework created by Uber to resolve the HDFS limitations including Deletion and Update. You can give a look in this link: https://eng.uber.com/hoodie/

an example for point 1:

Create table bck_table like input_table;
Insert overwrite table bck_table 
    select * from input_table;
Truncate table input_table;
Insert overwrite table input_table
    select * from bck_table where id <> 1;

NB: If the input_table is an external table you must follow the following link: How to truncate a partitioned external table in hive?

Examples related to hadoop

Hadoop MapReduce: Strange Result when Storing Previous Value in Memory in a Reduce Class (Java) What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check Spark Version What are the pros and cons of parquet format compared to other formats? java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient How to export data from Spark SQL to CSV How to copy data from one HDFS to another HDFS? How to calculate Date difference in Hive Select top 2 rows in Hive Spark - load CSV file as DataFrame?

Examples related to hive

select rows in sql with latest date for each ID repeated multiple times PySpark: withColumn() with two conditions and three outcomes java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient Hive cast string to date dd-MM-yyyy How to save DataFrame directly to Hive? How to calculate Date difference in Hive Select top 2 rows in Hive Just get column names from hive table Create hive table using "as select" or "like" and also specify delimiter Hive Alter table change Column Name

Examples related to sql-delete

Delete all rows with timestamp older than x days How to delete duplicate rows in SQL Server? How to delete and update a record in Hive How to write a SQL DELETE statement with a SELECT statement in the WHERE clause? How can I delete using INNER JOIN with SQL Server? How to delete multiple rows in SQL where id = (x to y) Delete many rows from a table using id in Mysql How do I delete all the duplicate records in a MySQL table without temp tables Delete with "Join" in Oracle sql Query MySQL WHERE: how to write "!=" or "not equals"?