[csv] How to export a Hive table into a CSV file?

I used this Hive query to export a table into a CSV file.

INSERT OVERWRITE DIRECTORY '/user/data/output/test' select column1, column2 from table1;

The generated file '000000_0' does not have a comma separator.

Is this the right way to generate a CSV file? If not, please let me know how I can generate one.

This question is related to: csv, hive

The answers are:


There are ways to change the default delimiter, as shown by other answers.

There are also ways to convert the raw output to CSV with some bash scripting. There are three delimiters to consider though, not just \001. Things get a bit more complicated when your Hive table has maps.

I wrote a bash script that can handle all three default delimiters (\001, \002 and \003) from Hive and output a CSV. The script and some more info are here:

Hive Default Delimiters to CSV

Hive's default delimiters are:

Field Delimiter => Control-A ('\001')
Collection Item Delimiter => Control-B ('\002')
Map Key Delimiter => Control-C ('\003')

There are ways to change these delimiters when exporting tables, but sometimes you might still get stuck needing to convert the raw output to CSV.

Here's a quick bash script that can handle a DB export that's segmented in multiple files and has the default delimiters. It will output a single CSV file.

It is assumed that the segments all have the naming convention 000*_0

INDIRECTORY="path/to/input/directory"
for f in "$INDIRECTORY"/000*_0; do
  echo "Processing $f file.."
  # cat -v makes the non-printing delimiters visible as ^A, ^B and ^C so that sed can match them
  cat -v "$f" |
      LC_ALL=C sed -e 's/^/"/g' |                 # open the first field with a quote
      LC_ALL=C sed -e 's/\^A/","/g' |             # field delimiter (\001) -> ","
      LC_ALL=C sed -e 's/\^C\^B/"":"""",""/g' |   # map entry boundary (\003 followed by \002)
      LC_ALL=C sed -e 's/\^B/"",""/g' |           # collection item delimiter (\002)
      LC_ALL=C sed -e 's/\^C/"":""/g' |           # map key delimiter (\003)
      LC_ALL=C sed -e 's/$/"/g' > "$f-temp"       # close the last field with a quote
done
echo "you,can,echo,your,header,here,if,you,like" > "$INDIRECTORY/final_output.csv"
cat "$INDIRECTORY"/*-temp >> "$INDIRECTORY/final_output.csv"
rm "$INDIRECTORY"/*-temp

More explanation is in the gist.


This is a much easier way to do it within Hive's SQL:

set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
set hive.exec.compress.output=false;

INSERT OVERWRITE DIRECTORY '/tmp/job/'
ROW FORMAT DELIMITED
FIELDS TERMINATED by ','
NULL DEFINED AS ''
STORED AS TEXTFILE
SELECT * from table;
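
If you then need the result as one local file, something along these lines should work (a sketch only; it assumes the '/tmp/job/' output path from above and uses hadoop fs -getmerge to pull and combine the part files):

# combine all part files from the HDFS output directory into a single local CSV
hadoop fs -getmerge /tmp/job/ ./job.csv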

Below is the end-to-end solution that I use to export Hive table data to HDFS as a single named CSV file with a header.
(it is unfortunate that it's not possible to do this with one HQL statement)
It consists of several commands, but it's quite intuitive, I think, and it does not rely on the internal representation of Hive tables, which may change from time to time.
Replace "DIRECTORY" with "LOCAL DIRECTORY" if you want to export the data to a local filesystem versus HDFS.

# cleanup the existing target HDFS directory, if it exists
sudo -u hdfs hdfs dfs -rm -f -r /tmp/data/my_exported_table_name/*

# export the data using Beeline CLI (it will create a data file with a surrogate name in the target HDFS directory)
beeline -u jdbc:hive2://my_hostname:10000 -n hive -e "INSERT OVERWRITE DIRECTORY '/tmp/data/my_exported_table_name' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM my_exported_table_name"

# set the owner of the target HDFS directory to whatever UID you'll be using to run the subsequent commands (root in this case)
sudo -u hdfs hdfs dfs -chown -R root:hdfs /tmp/data/my_exported_table_name

# write the CSV header record to a separate file (make sure that its name sorts before the data file name(s) in the target HDFS directory, so the header ends up first after concatenation)
# also, obviously, make sure that the number and the order of fields is the same as in the data file
echo 'field_name_1,field_name_2,field_name_3,field_name_4,field_name_5' | hadoop fs -put - /tmp/data/my_exported_table_name/.header.csv

# concatenate all (2) files in the target HDFS directory into the final CSV data file with a header
# (this is where the sort order of the file names is important)
hadoop fs -cat /tmp/data/my_exported_table_name/* | hadoop fs -put - /tmp/data/my_exported_table_name/my_exported_table_name.csv

# give the permissions for the exported data to other users as necessary
sudo -u hdfs hdfs dfs -chmod -R 777 /tmp/data/my_exported_table_name

I had a similar issue and this is how I was able to address it.

Step 1 - Loaded the data from hive table into another table as follows

DROP TABLE IF EXISTS TestHiveTableCSV;
CREATE TABLE TestHiveTableCSV
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
AS SELECT Column List FROM TestHiveTable;

Step 2 - Copied the blob from the Hive warehouse to the new location with the appropriate extension

Start-AzureStorageBlobCopy -DestContext $destContext `
    -SrcContainer "Source Container" -SrcBlob "hive/warehouse/TestHiveTableCSV/000000_0" `
    -DestContainer "Destination Container" -DestBlob "CSV/TestHiveTable.csv"

Hope this helps!

Best Regards, Dattatrey Sindol (Datta) http://dattatreysindol.com


I have used simple Linux shell piping + perl to convert Hive-generated output from TSV to CSV.

hive -e "SELECT col1, col2, … FROM table_name" | perl -lpe 's/"/\\"/g; s/^|$/"/g; s/\t/","/g' > output_file.csv

(I got the updated perl regex from someone in stackoverflow some time ago)

The result will be like regular csv:

"col1","col2","col3"... and so on


INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from table; 

is the correct answer.

If the number of records is really big (so the output gets split across many files), the following command may give only a partial result:

hive -e 'select * from some_table' > /home/yourfile.csv

Alternatively, you can export data straight from the Hive warehouse directory instead of querying the Hive table. First give the Hive warehouse path and then the local path where you want to store the .csv file. The command for this is below:

hadoop fs -cat /user/hdusr/warehouse/HiveDb/tableName/* > /users/hadoop/test/nilesh/sample.csv

or use this

hive -e 'select * from your_Table' | sed 's/[\t]/,/g'  > /home/yourfile.csv

You can also set the property hive.cli.print.header=true before the SELECT to ensure that a header is created and copied to the file along with the data. For example:

hive -e 'set hive.cli.print.header=true; select * from your_Table' | sed 's/[\t]/,/g'  > /home/yourfile.csv

If you don't want to write to the local file system, pipe the output of the sed command back into HDFS using the hadoop fs -put command.
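
For example, a rough sketch of that pipe (the target HDFS path is just a placeholder):

# stream the comma-separated output straight into an HDFS file instead of the local file system
hive -e 'set hive.cli.print.header=true; select * from your_Table' \
  | sed 's/[\t]/,/g' \
  | hadoop fs -put - /tmp/yourfile.csv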

It may also be convenient to SFTP your files using something like Cyberduck, or you can use scp to connect via a terminal / command prompt.
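
For instance, a minimal scp sketch (user name, host name and path are placeholders):

# pull the exported file from the Hadoop client machine to your local machine
scp your_user@your_hadoop_host:/home/yourfile.csv .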


In case you are doing it from Windows, you can use the Python script hivehoney to extract table data to a local CSV file.

It will:

  • Login to bastion host.
  • pbrun.
  • kinit.
  • beeline (with your query).
  • Save the output from beeline to a file on Windows.

Execute it like this:

set PROXY_HOST=your_bastion_host
set SERVICE_USER=you_func_user
set LINUX_USER=your_SOID
set LINUX_PWD=your_pwd

python hh.py --query_file=query.sql

These should work for you:

  • tab separated

    hive -e 'select * from some_table' > /home/yourfile.tsv
  • comma separated

    hive -e 'select * from some_table' | sed 's/[\t]/,/g' > /home/yourfile.csv

try

hive --outputformat=csv2 -e "select * from YOUR_TABLE";

This worked for me; my Hive version is "Hive 3.1.0.3.1.0.0-78".


You cannot specify a delimiter for the query output in that form of INSERT; after generating the export (as you did), you can only convert the delimiter to a comma.

The output comes with the default delimiter \001 (an invisible character).

hadoop fs -cat /user/data/output/test/* |tr "\01" "," >>outputwithcomma.csv



Recent versions of Hive come with this feature:

INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
select * from table;

This way you can choose your own delimiter and output directory. Just be careful with the "OVERWRITE": it will try to delete everything in the specified folder.


If you're using Hive 0.11 ("Hive 11") or later, you can use the INSERT statement with the LOCAL keyword.

Example:

insert overwrite local directory '/home/carter/staging' row format delimited fields terminated by ',' select * from hugetable;

Note that this may create multiple files and you may want to concatenate them on the client side after it's done exporting.

Using this approach means you don't need to worry about the format of the source tables, can export based on an arbitrary SQL query, and can select your own delimiters and output formats.
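
For example, a simple sketch of that client-side concatenation (assuming the '/home/carter/staging' directory from the example above):

# stitch the per-reducer output files into one CSV on the client
cat /home/carter/staging/* > hugetable.csv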


The solutions above are fine, but I found some problems in both:

  • As Carter Shanklin said, with this command we will obtain a CSV file with the results of the query in the path specified:

    insert overwrite local directory '/home/carter/staging' row format delimited fields terminated by ',' select * from hugetable;
    

    The problem with this solution is that the CSV obtained won't have headers and the file created won't have a .csv extension (so we have to rename it).

  • As user1922900 said, with the following command we will obtain a CSV file with the results of the query in the specified file, with headers:

    hive -e 'select * from some_table' | sed 's/[\t]/,/g' > /home/yourfile.csv
    

    With this solution we will get a CSV file with the result rows of our query, but with log messages between these rows too. I tried to work around this problem, but without results (one possible mitigation is sketched right after this list).
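
One common mitigation (a sketch only, not guaranteed to catch every stray line) is to run the Hive CLI in silent mode with -S, as the script below also does:

# -S (silent mode) suppresses most of the informational log output
hive -S -e 'select * from some_table' | sed 's/[\t]/,/g' > /home/yourfile.csv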

So, to solve all these issues I created a script that executes a list of queries, creates a folder (with a timestamp) where it stores the results, renames the files obtained, removes the unnecessary files, and also adds the respective headers.

#!/bin/bash
QUERIES=("select * from table1" "select * from table2")
timestamp=$(date +%Y%m%d_%H%M%S)
directoryname="ScriptResults$timestamp"
mkdir "$directoryname"
counter=1
for query in "${QUERIES[@]}"
do
    tablename="query$counter"
    outdir="/data/2/DOMAIN_USERS/SANUK/users/$USER/$tablename"
    # export the query result as comma-delimited files
    hive -S -e "INSERT OVERWRITE LOCAL DIRECTORY '$outdir' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' $query ;"
    # build a one-line CSV header from the column names
    hive -S -e "set hive.cli.print.header=true; $query limit 1" | head -1 | sed 's/[\t]/,/g' > "$outdir/header.csv"
    # append the data below the header and move the finished CSV into the results directory
    cat "$outdir/000000_0" >> "$outdir/header.csv"
    mv "$outdir/header.csv" "$directoryname/$tablename.csv"
    counter=$((counter+1))
    rm -rf "$outdir"
done

The following script should work for you:

#!/bin/bash
hive -e "insert overwrite local directory '/LocalPath/'
row format delimited fields terminated by ','
select * from Mydatabase.Mytable limit 100"
cat /LocalPath/* > /LocalPath/table.csv

I used limit 100 to limit the size of data since I had a huge table, but you can delete it to export the entire table.