[hadoop] How to load a text file into a Hive table stored as sequence files

I have a hive table stored as a sequencefile.

I need to load a text file into this table. How do I load the data into this table?

This question is related to hadoop hive

The answer is


You cannot directly create a table stored as a sequence file and insert text into it. You must do this:

  1. Create a table stored as text
  2. Insert the text file into the text table
  3. Do a CTAS to create the table stored as a sequence file.
  4. Drop the text table if desired

Example:

CREATE TABLE test_txt(field1 int, field2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA INPATH '/path/to/file.tsv' INTO TABLE test_txt;

CREATE TABLE test STORED AS SEQUENCEFILE
AS SELECT * FROM test_txt;

DROP TABLE test_txt;

You can load the text file into a textfile Hive table and then insert the data from this table into your sequencefile.

Start with a tab delimited file:

% cat /tmp/input.txt
a       b
a2      b2

create a sequence file

hive> create table test_sq(k string, v string) stored as sequencefile;

try to load; as expected, this will fail:

hive> load data local inpath '/tmp/input.txt' into table test_sq;

But with this table:

hive> create table test_t(k string, v string) row format delimited fields terminated by '\t' stored as textfile;

The load works just fine:

hive> load data local inpath '/tmp/input.txt' into table test_t;
OK
hive> select * from test_t;
OK
a       b
a2      b2

Now load into the sequence table from the text table:

insert into table test_sq select * from test_t;

Can also do load/insert with overwrite to replace all.


The simple way is to create table as textfile and move the file to the appropriate location

CREATE EXTERNAL TABLE mytable(col1 string, col2 string)
row format delimited fields terminated by '|' stored as textfile;

Copy the file to the HDFS Location where table is created.
Hope this helps!!!