[hadoop] When to use Hadoop, HBase, Hive and Pig?

Short answer to this question is -

Hadoop - Is Framework which facilitates distributed file system and programming model which allow us to store humongous sized data and process data in distributed fashion very efficiently and with very less processing time compare to traditional approaches.

(HDFS - Hadoop Distributed File system) (Map Reduce - Programming Model for distributed processing)

Hive - Is query language which allows to read/write data from Hadoop distributed file system in a very popular SQL like fashion. This made life easier for many non-programming background people as they don't have to write Map-Reduce program anymore except for very complex scenarios where Hive is not supported.

Hbase - Is Columnar NoSQL Database. Underlying storage layer for Hbase is again HDFS. Most important use case for this database is to be able to store billion's of rows with million's of columns. Low latency feature of Hbase helps faster and random access of record over distributed data, is very important feature to make it useful for complex projects like Recommender Engines. Also it's record level versioning capability allow user to store transactional data very efficiently (this solves the problem of updating records we have with HDFS and Hive)

Hope this is helpful to quickly understand the above 3 features.

Examples related to hadoop

Hadoop MapReduce: Strange Result when Storing Previous Value in Memory in a Reduce Class (Java) What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check Spark Version What are the pros and cons of parquet format compared to other formats? java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient How to export data from Spark SQL to CSV How to copy data from one HDFS to another HDFS? How to calculate Date difference in Hive Select top 2 rows in Hive Spark - load CSV file as DataFrame?

Examples related to hbase

When to use Hadoop, HBase, Hive and Pig? Hive load CSV with commas in quoted fields Hbase quickly count number of rows How to delete all data from solr and hbase

Examples related to hive

select rows in sql with latest date for each ID repeated multiple times PySpark: withColumn() with two conditions and three outcomes java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient Hive cast string to date dd-MM-yyyy How to save DataFrame directly to Hive? How to calculate Date difference in Hive Select top 2 rows in Hive Just get column names from hive table Create hive table using "as select" or "like" and also specify delimiter Hive Alter table change Column Name

Examples related to apache-pig

When to use Hadoop, HBase, Hive and Pig? PIG how to count a number of rows in alias Difference between Pig and Hive? Why have both?