We would like to put the results of a Hive query into a CSV file. I thought the command should look like this:
insert overwrite directory '/home/output.csv' select books from table;
When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way?
You can use INSERT … DIRECTORY, as in this example:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address
FROM employees
WHERE state = 'CA';
OVERWRITE and LOCAL have the same interpretations as before, and paths are interpreted following the usual rules. One or more files will be written to /tmp/ca_employees, depending on the number of reducers invoked.
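If you need a single file rather than a set of part files, you can simply concatenate them afterwards, since LOCAL DIRECTORY writes to the local filesystem (a minimal sketch, using the path from the example above):
cat /tmp/ca_employees/* > /tmp/ca_employees.csv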
I was looking for a similar solution, but the ones mentioned here would not work. My data had all variations of whitespace characters (space, newline, tab) and commas.
To make the column data TSV-safe, I replaced all \t characters in the column data with a space, and executed Python code on the command line to generate a CSV file, as shown below:
hive -e 'tab_replaced_hql_query' | python -c 'exec("import sys;import csv;reader = csv.reader(sys.stdin, dialect=csv.excel_tab);writer = csv.writer(sys.stdout, dialect=csv.excel)\nfor row in reader: writer.writerow(row)")'
This created a perfectly valid CSV. Hope this helps those who come looking for this solution.
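For reference, the tab-replacing query mentioned above could look something like this (a hypothetical sketch; regexp_replace is a standard Hive function, but the column and table names are placeholders), which you would then pipe into the Python snippet as shown above:
hive -e 'select regexp_replace(col1, "\t", " "), regexp_replace(col2, "\t", " ") from my_table'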
Just to cover the follow-up steps after kicking off the query:
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
In my case, the generated data under the temp folder is in deflate format, and it looks like this:
$ ls
000000_0.deflate
000001_0.deflate
000002_0.deflate
000003_0.deflate
000004_0.deflate
000005_0.deflate
000006_0.deflate
000007_0.deflate
Here's the command to unzip the deflate files and put everything into one CSV file:
hadoop fs -text "file:///home/lvermeer/temp/*" > /home/lvermeer/result.csv
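Alternatively, you can disable output compression before running the INSERT, so the files are written as plain text in the first place (a sketch; hive.exec.compress.output is the standard Hive setting for this):
SET hive.exec.compress.output=false;
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;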
If you want a CSV file, then you can modify Lukas' solution as follows (assuming you are on a Linux box):
hive -e 'select books from table' | sed 's/[[:space:]]\+/,/g' > /home/lvermeer/temp.csv
I may be late to this one, but this approach may help with the answer:
echo "COL_NAME1|COL_NAME2|COL_NAME3|COL_NAME4" > SAMPLE_Data.csv hive -e ' select distinct concat(COL_1, "|", COL_2, "|", COL_3, "|", COL_4) from table_Name where clause if required;' >> SAMPLE_Data.csv
This shell command prints the output in CSV format to output.txt, without the column headers:
$ hive --outputformat=csv2 -f 'hivedatascript.hql' --hiveconf hive.cli.print.header=false > output.txt
hive --outputformat=csv2 -e "select * from yourtable" > my_file.csv
or
hive --outputformat=csv2 -e "select * from yourtable" > [your_path]/file_name.csv
For TSV, just change csv2 to tsv2 in the commands above and run your queries.
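For example:
hive --outputformat=tsv2 -e "select * from yourtable" > my_file.tsv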
You can use the Hive string function CONCAT_WS(string delimiter, string str1, string str2...strn).
For example:
hive -e "select CONCAT_WS(',', cola, colb, colc, ..., coln) from Mytable" > /home/user/Mycsv.csv
If you are using HUE, this is fairly simple as well. Simply go to the Hive editor in HUE, execute your Hive query, then save the result file locally as XLS or CSV, or you can save the result file to HDFS.
The default separator is "^A". In the Python language, it is "\x01".
When I want to change the delimiter, I use SQL like:
SELECT col1, delimiter, col2, delimiter, col3, ... FROM table
Then, regard delimiter + "^A" as a new delimiter.
I tried various options, but this would be one of the simplest solutions for Python Pandas:
hive -e 'select books from table' | grep "|" > temp.csv
import pandas as pd
df = pd.read_csv("temp.csv", sep='|')
You can also use tr "|" "," to convert the "|" delimiter to ",".
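For example (after which pandas can read the file with its default comma separator):
hive -e 'select books from table' | tr "|" "," > temp.csv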
Similar to Ray's answer above, Hive View 2.0 in Hortonworks Data Platform also allows you to run a Hive query and then save the output as csv.
In case you are doing it from Windows, you can use the Python script hivehoney to extract table data to a local CSV file.
It will connect through your bastion host to run the query on the cluster and save the results to a local CSV file. Execute it like this:
set PROXY_HOST=your_bastion_host
set SERVICE_USER=you_func_user
set LINUX_USER=your_SOID
set LINUX_PWD=your_pwd
python hh.py --query_file=query.sql
This is the most CSV-friendly way I found to output the results of HiveQL. You don't need any grep or sed commands to format the data; Hive supports it natively, you just need to add the extra outputformat flag.
hive --outputformat=csv2 -e 'select * from <table_name> limit 20' > /path/toStore/data/results.csv
You should use the CREATE TABLE AS SELECT (CTAS) statement to create a directory in HDFS with the files containing the results of the query. After that you will have to export those files from HDFS to your regular disk and merge them into a single file.
You might also have to do some trickery to convert the files from '\001'-delimited to CSV. You could use a custom CSV SerDe or postprocess the extracted file.
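A minimal sketch of that approach (table names and paths are placeholders; the warehouse path assumes the default /user/hive/warehouse location, and getmerge concatenates the HDFS part files into one local file):
hive -e "CREATE TABLE export_csv ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE AS SELECT books FROM table;"
hadoop fs -getmerge /user/hive/warehouse/export_csv /home/user/result.csv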
Use the command:
hive -e "use [database_name]; select * from [table_name] LIMIT 10;" > /path/to/file/my_file_name.csv
I had a huge dataset whose details I was trying to organize, to determine the types of attacks and the number of each type. An example that I used in practice that worked (and had a little more detail) goes something like this:
hive -e "use DataAnalysis;
select attack_cat,
case when attack_cat == 'Backdoor' then 'Backdoors'
when length(attack_cat) == 0 then 'Normal'
when attack_cat == 'Backdoors' then 'Backdoors'
when attack_cat == 'Fuzzers' then 'Fuzzers'
when attack_cat == 'Generic' then 'Generic'
when attack_cat == 'Reconnaissance' then 'Reconnaissance'
when attack_cat == 'Shellcode' then 'Shellcode'
when attack_cat == 'Worms' then 'Worms'
when attack_cat == 'Analysis' then 'Analysis'
when attack_cat == 'DoS' then 'DoS'
when attack_cat == 'Exploits' then 'Exploits'
when trim(attack_cat) == 'Fuzzers' then 'Fuzzers'
when trim(attack_cat) == 'Shellcode' then 'Shellcode'
when trim(attack_cat) == 'Reconnaissance' then 'Reconnaissance' end,
count(*) from actualattacks group by attack_cat;">/root/data/output/results2.csv
I had a similar issue and this is how I was able to address it.
Step 1 - Loaded the data from the Hive table into another table as follows:
DROP TABLE IF EXISTS TestHiveTableCSV;
CREATE TABLE TestHiveTableCSV
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' AS
SELECT Column List FROM TestHiveTable;
Step 2 - Copied the blob from the Hive warehouse to the new location with the appropriate extension:
Start-AzureStorageBlobCopy `
    -DestContext $destContext `
    -SrcContainer "Source Container" `
    -SrcBlob "hive/warehouse/TestHiveTableCSV/000000_0" `
    -DestContainer "Destination Container" `
    -DestBlob "CSV/TestHiveTable.csv"