[hadoop] Hive query output to file

I run hive query by java code. Example:

"SELECT * FROM table WHERE id > 100"

How to export result to hdfs file.

This question is related to hadoop hive

The answer is


Enter this line into Hive command line interface:

insert overwrite directory '/data/test' row format delimited fields terminated by '\t' stored as textfile select * from testViewQuery;

testViewQuery - some specific view


@sarath how to overwrite the file if i want to run another select * command from a different table and write to same file ?

INSERT OVERWRITE LOCAL DIRECTORY '/home/training/mydata/outputs' SELECT expl , count(expl) as total
FROM ( SELECT explode(splits) as expl FROM ( SELECT split(words,' ') as splits FROM wordcount ) t2 ) t3 GROUP BY expl ;

This is an example to sarath's question

the above is a word count job stored in outputs file which is in local directory :)


I agree with tnguyen80's response. Please note that when there is a specific string value in query better to given entire query in double quotes.

For example:

$hive -e "select * from table where city = 'London' and id >=100" > /home/user/outputdirectory/city details.csv

The ideal way to do it will be using "INSERT OVERWRITE DIRECTORY '/pathtofile' select * from temp where id > 100" instead of "hive -e 'select * from...' > /filepath.txt"


Two ways can store HQL query results:

  1. Save into HDFS Location
INSERT OVERWRITE DIRECTORY "HDFS Path" ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT * FROM XXXX LIMIT 10;
  1. Save to Local File
$hive  -e "select * from table_Name" > ~/sample_output.txt
$hive -e "select * from table where city = 'London' and id >=100" > /home/user/outputdirectory/city details.csv

This will put the results in tab delimited file(s) under a directory:

INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/YourTableDir'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT * FROM table WHERE id > 100;

The following query will insert the results directly into HDFS:

INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT * FROM table WHERE id > 100;

This command will redirect the output to a text file of your choice:

$hive -e "select * from table where id > 10" > ~/sample_output.txt

To directly save the file in HDFS, use the below command:

hive> insert overwrite  directory '/user/cloudera/Sample' row format delimited fields terminated by '\t' stored as textfile select * from table where id >100;

This will put the contents in the folder /user/cloudera/Sample in HDFS.


  1. Create an external table
  2. Insert data into the table
  3. Optional drop the table later, which wont delete that file since it is an external table

Example:

Creating external table to store the query results at '/user/myName/projectA_additionaData/'

CREATE EXTERNAL TABLE additionaData
(
     ID INT,
     latitude STRING,
     longitude STRING
)
COMMENT 'Additional Data gathered by joining of the identified cities with latitude and longitude data' 
ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',' STORED AS TEXTFILE location '/user/myName/projectA_additionaData/';

Feeding the query results into the temp table

 insert into additionaData 
     Select T.ID, C.latitude, C.longitude 
     from TWITER  
     join CITY C on (T.location_name = C.location);

Dropping the temp table

drop table additionaData

To set output directory and output file format and more, try the following:

INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] 
SELECT ... FROM ...

Example:

INSERT OVERWRITE DIRECTORY '/path/to/output/dir'
ROW FORMAT DELIMITED
STORED AS PARQUET
SELECT * FROM table WHERE id > 100;