[sql] Inserting Data into Hive Table

I am new to hive. I have successfully setup a single node hadoop cluster for development purpose and on top of it, I have installed hive and pig.

I created a dummy table in hive:

create table foo (id int, name string);

Now, I want to insert data into this table. Can I add data just like sql one record at a time? kindly help me with an analogous command to:

insert into foo (id, name) VALUES (12,"xyz);

Also, I have a csv file which contains data in the format:

1,name1
2,name2
..
..

..


1000,name1000

How can I load this data into the dummy table?

This question is related to sql insert hadoop hive

The answer is


this is supported from version hive 0.14

INSERT INTO TABLE pd_temp(dept,make,cost,id,asmb_city,asmb_ct,retail) VALUES('production','thailand',10,99202,'northcarolina','usa',20)


I think the best way is:
a) Copy data into HDFS (if it is not already there)
b) Create external table over your CSV like this

CREATE EXTERNAL TABLE TableName (id int, name string)
ROW FORMAT DELIMITED   
FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'place in HDFS';

c) You can start using TableName already by issuing queries to it.
d) if you want to insert data into other Hive table:

insert overwrite table finalTable select * from table name;

What ever data you have inserted into one text file or log file that can put on one path in hdfs and then write a query as follows in hive

  hive>load data inpath<<specify inputpath>> into table <<tablename>>;

EXAMPLE:

hive>create table foo (id int, name string)
row format delimited
fields terminated by '\t' or '|'or ','
stored as text file;
table created..
    DATA INSERTION::
    hive>load data inpath '/home/hive/foodata.log' into table foo;

You may try this, I have developed a tool to generate hive scripts from a csv file. Following are few examples on how files are generated. Tool -- https://sourceforge.net/projects/csvtohive/?source=directory

  1. Select a CSV file using Browse and set hadoop root directory ex: /user/bigdataproject/

  2. Tool Generates Hadoop script with all csv files and following is a sample of generated Hadoop script to insert csv into Hadoop

    #!/bin/bash -v
    hadoop fs -put ./AllstarFull.csv /user/bigdataproject/AllstarFull.csv hive -f ./AllstarFull.hive

    hadoop fs -put ./Appearances.csv /user/bigdataproject/Appearances.csv hive -f ./Appearances.hive

    hadoop fs -put ./AwardsManagers.csv /user/bigdataproject/AwardsManagers.csv hive -f ./AwardsManagers.hive

  3. Sample of generated Hive scripts

    CREATE DATABASE IF NOT EXISTS lahman;
    USE lahman;
    CREATE TABLE AllstarFull (playerID string,yearID string,gameNum string,gameID string,teamID string,lgID string,GP string,startingPos string) row format delimited fields terminated by ',' stored as textfile;
    LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;
    SELECT * FROM AllstarFull;

Thanks Vijay


There's no direct way to insert 1 record at a time from the terminal, however, here's an easy straight forward workaround which I usually use when I want to test something:

Assuming that t is a table with at least 1 record. It doesn't matter what is the type or number of columns.

INSERT INTO TABLE foo
SELECT '12', 'xyz'
FROM t
LIMIT 1;

Hadoop file system does not support appending data to the existing files. Although, you can load your CSV file into HDFS and tell Hive to treat it as an external table.


to insert ad-hoc value like (12,"xyz), do this:

insert into table foo select * from (select 12,"xyz")a;

It's a limitation of hive.

1.You cannot update data after it is inserted

2.There is no "insert into table values ... " statement

3.You can only load data using bulk load

4.There is not "delete from " command

5.You can only do bulk delete

But you still want to insert record from hive console than you can do select from statck. refer this


Use this -

create table dummy_table_name as select * from source_table_name;

This will create the new table with existing data available on source_table_name.


Hive apparently supports INSERT...VALUES starting in Hive 0.14.

Please see the section 'Inserting into tables from SQL' at: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML


You can use following lines of code to insert values into an already existing table. Here the table is db_name.table_name having two columns, and I am inserting 'All','done' as a row in the table.

insert into table db_name.table_name
select 'ALL','Done';

Hope this was helpful.


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to insert

How to insert current datetime in postgresql insert query How to add element in Python to the end of list using list.insert? Python pandas insert list into a cell Field 'id' doesn't have a default value? Insert a row to pandas dataframe Insert at first position of a list in Python How can INSERT INTO a table 300 times within a loop in SQL? How to refresh or show immediately in datagridview after inserting? Insert node at a certain position in a linked list C++ select from one table, insert into another table oracle sql query

Examples related to hadoop

Hadoop MapReduce: Strange Result when Storing Previous Value in Memory in a Reduce Class (Java) What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check Spark Version What are the pros and cons of parquet format compared to other formats? java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient How to export data from Spark SQL to CSV How to copy data from one HDFS to another HDFS? How to calculate Date difference in Hive Select top 2 rows in Hive Spark - load CSV file as DataFrame?

Examples related to hive

select rows in sql with latest date for each ID repeated multiple times PySpark: withColumn() with two conditions and three outcomes java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient Hive cast string to date dd-MM-yyyy How to save DataFrame directly to Hive? How to calculate Date difference in Hive Select top 2 rows in Hive Just get column names from hive table Create hive table using "as select" or "like" and also specify delimiter Hive Alter table change Column Name