[hadoop] Hive query output to file

  1. Create an external table
  2. Insert data into the table
  3. Optional drop the table later, which wont delete that file since it is an external table

Example:

Creating external table to store the query results at '/user/myName/projectA_additionaData/'

CREATE EXTERNAL TABLE additionaData
(
     ID INT,
     latitude STRING,
     longitude STRING
)
COMMENT 'Additional Data gathered by joining of the identified cities with latitude and longitude data' 
ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',' STORED AS TEXTFILE location '/user/myName/projectA_additionaData/';

Feeding the query results into the temp table

 insert into additionaData 
     Select T.ID, C.latitude, C.longitude 
     from TWITER  
     join CITY C on (T.location_name = C.location);

Dropping the temp table

drop table additionaData