[hadoop] How to calculate Date difference in Hive

I'm a novice. I have a employee table with a column specifying the joining date and I want to retrieve the list of employees who have joined in the last 3 months. I understand we can get the current date using from_unixtime(unix_timestamp()). How do I calculate the datediff? Is there a built in DATEDIFF() function like in MS SQL? please advice!

This question is related to hadoop hive hiveql

The answer is


I would try this first

select * from employee where month(current_date)-3 = month(joining_date)

If you need the difference in seconds (i.e.: you're comparing dates with timestamps, and not whole days), you can simply convert two date or timestamp strings in the format 'YYYY-MM-DD HH:MM:SS' (or specify your string date format explicitly) using unix_timestamp(), and then subtract them from each other to get the difference in seconds. (And can then divide by 60.0 to get minutes, or by 3600.0 to get hours, etc.)

Example:

UNIX_TIMESTAMP('2017-12-05 10:01:30') - UNIX_TIMESTAMP('2017-12-05 10:00:00') AS time_diff -- This will return 90 (seconds). Unix_timestamp converts string dates into BIGINTs. 

More on what you can do with unix_timestamp() here, including how to convert strings with different date formatting: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions


yes datediff is implemented; see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

By the way I found this by Google-searching "hive datediff", it was the first result ;)


Examples related to hadoop

Hadoop MapReduce: Strange Result when Storing Previous Value in Memory in a Reduce Class (Java) What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check Spark Version What are the pros and cons of parquet format compared to other formats? java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient How to export data from Spark SQL to CSV How to copy data from one HDFS to another HDFS? How to calculate Date difference in Hive Select top 2 rows in Hive Spark - load CSV file as DataFrame?

Examples related to hive

select rows in sql with latest date for each ID repeated multiple times PySpark: withColumn() with two conditions and three outcomes java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient Hive cast string to date dd-MM-yyyy How to save DataFrame directly to Hive? How to calculate Date difference in Hive Select top 2 rows in Hive Just get column names from hive table Create hive table using "as select" or "like" and also specify delimiter Hive Alter table change Column Name

Examples related to hiveql

PySpark: withColumn() with two conditions and three outcomes How to export data from Spark SQL to CSV How to calculate Date difference in Hive Select top 2 rows in Hive Add a column in a table in HIVE QL How to get/generate the create statement for an existing hive table? How do I output the results of a HiveQL query to CSV? How to select current date in Hive SQL Hive insert query like SQL Difference between Hive internal tables and external tables?