Hadoop/Hive: Loading data from .csv on a local machine

Question

As this is coming from a newbie... I had Hadoop and Hive set up for me, so I can run Hive queries on my computer accessing data on an AWS cluster. Can I run Hive queries with .csv data stored on my computer, like I did with MS SQL Server? How do I load .csv data into Hive then? What does it have to do with Hadoop, and which mode should I run that one in? What settings should I care about so that, if I do something wrong, I can always go back and run queries on Amazon without compromising what was set up for me earlier?

User · Answer

Let me walk you through the following simple steps.

First, create a table in Hive using the field names in your csv file. Let's say, for example, that your csv file contains three fields (id, name, salary) and you want to create a table in Hive called "staff". Use the code below to create the table.

hive> CREATE TABLE Staff (id int, name string, salary double) row format delimited fields terminated by ',';

Second, now that the table exists, load the data from your csv file into the "staff" table. The LOCAL keyword tells Hive to read the file from the local file system rather than from HDFS.

hive> LOAD DATA LOCAL INPATH '/home/yourcsvfile.csv' OVERWRITE INTO TABLE Staff;

Lastly, display the contents of your "staff" table to check that the data was loaded successfully.

hive> SELECT * FROM Staff;
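To try the steps above you need a file whose columns match the table. The following is a minimal Python sketch that writes one; the rows and the temporary-directory location are made up for illustration, and you would point LOAD DATA LOCAL INPATH at whatever path you actually use.

```python
import csv
import os
import tempfile

# Hypothetical rows matching the (id int, name string, salary double) schema.
rows = [(1, "Alice", 52000.0), (2, "Bob", 48500.5)]

# Assumed location for the sample file; any local path works.
path = os.path.join(tempfile.gettempdir(), "yourcsvfile.csv")

with open(path, "w", newline="") as f:
    # No header row is written: a plain delimited text table would
    # load a header line as if it were data.
    # Unix line endings avoid a stray carriage return in the last column.
    csv.writer(f, lineterminator="\n").writerows(rows)

with open(path) as f:
    content = f.read()

print(content)
```

If your real file does have a header line, it will show up as a row in the table unless you remove it or tell Hive to skip it.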

Thanks.

User · Answer

You can load a local CSV file into Hive only if:

1. You are doing it from one of the Hive cluster nodes, or
2. You have installed a Hive client on a non-cluster node and are using hive or beeline for the upload.

User · Answer

If you have a Hive setup, you can load a local dataset directly into HDFS/S3 using the Hive LOAD command. You will need to use the LOCAL keyword when writing your load command.

Syntax for the Hive LOAD command:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

Refer to the link below for more detailed information: https://cwiki.apache.org/confluence/display/Hive/LanguageManual%20DML#LanguageManualDML-Loadingfilesintotables
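To make the optional parts of that syntax concrete, here is a small Python sketch that assembles a LOAD DATA statement as a string. The function name `load_data_statement` and the partition column `country` are hypothetical, and quoting every partition value as a string is an assumption; the sketch only builds text and does not talk to Hive.

```python
def load_data_statement(filepath, table, local=False, overwrite=False, partition=None):
    """Assemble a Hive LOAD DATA statement from its optional pieces."""
    parts = ["LOAD DATA"]
    if local:
        # LOCAL: read from the client's file system instead of HDFS.
        parts.append("LOCAL")
    parts.append(f"INPATH '{filepath}'")
    if overwrite:
        # OVERWRITE: replace the table's (or partition's) existing contents.
        parts.append("OVERWRITE")
    parts.append(f"INTO TABLE {table}")
    if partition:
        # Assumption: every partition value is written as a quoted string.
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        parts.append(f"PARTITION ({spec})")
    return " ".join(parts) + ";"


plain = load_data_statement("/home/yourcsvfile.csv", "Staff", local=True, overwrite=True)
partitioned = load_data_statement(
    "/home/yourcsvfile.csv", "Staff", local=True, partition={"country": "US"}
)
print(plain)
print(partitioned)
```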

User · Answer

In a csv file the data is often in the format below:

"column1","column2","column3","column4"

If we use fields terminated by ',' then each column keeps its surrounding quotes, like this:

"column1"    "column2"    "column3"    "column4"

Also, if any column value contains a comma, the row is split in the wrong place and the load will not work at all.

So the correct way to create the table is to use OpenCSVSerde:

create table tableName (column1 datatype, column2 datatype, column3 datatype, column4 datatype) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' STORED AS TEXTFILE;
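The difference between splitting on commas and parsing CSV properly can be seen outside Hive as well. The sketch below uses Python's standard csv module as a stand-in for what a CSV-aware SerDe does; it is an analogy, not OpenCSVSerde itself, and the sample row is invented.

```python
import csv
import io

# Hypothetical row: every field is quoted and one value contains a comma.
line = '"1","Smith, John","55000.0"'

# Roughly what a plain "fields terminated by ','" table does with the line.
naive_fields = line.split(",")

# What a CSV-aware parser does: it honours quotes and strips them.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(naive_fields)
print(parsed_fields)
```

The naive split yields four pieces that still carry quote characters, while the CSV parser yields the three intended values.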

User · Answer

You may try this. Following are a few examples of how the files are generated.

Tool: https://sourceforge.net/projects/csvtohive/?source=directory

Select a CSV file using Browse and set the Hadoop root directory, e.g. /user/bigdataproject

The tool generates a Hadoop script covering all the csv files. Following is a sample of the generated Hadoop script to insert csv into Hadoop:

#!/bin/bash -v
hadoop fs -put AllstarFull.csv /user/bigdataproject/AllstarFull.csv
hive -f AllstarFull.hive

hadoop fs -put Appearances.csv /user/bigdataproject/Appearances.csv
hive -f Appearances.hive

hadoop fs -put AwardsManagers.csv /user/bigdataproject/AwardsManagers.csv
hive -f AwardsManagers.hive

Sample of generated Hive scripts:

CREATE DATABASE IF NOT EXISTS lahman;
USE lahman;
CREATE TABLE AllstarFull (playerID string, yearID string, gameNum string, gameID string, teamID string, lgID string, GP string, startingPos string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;
SELECT * FROM AllstarFull;

Thanks
Vijay
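For readers curious how such a script could be produced, here is a rough Python sketch that turns a CSV header into a CREATE TABLE and LOAD DATA pair in the same shape as the sample above. It is not the csvtohive tool's actual code; the function name `build_hive_script` and the sample data row are hypothetical, and every column is simply typed as string.

```python
import csv
import io


def build_hive_script(table, csv_text, hdfs_path):
    """Return HiveQL creating `table` with one string column per header field."""
    # Read only the first line of the CSV text: the header.
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = ", ".join(f"{name} string" for name in header)
    return "\n".join([
        f"CREATE TABLE {table} ({columns})",
        "row format delimited fields terminated by ',' stored as textfile;",
        f"LOAD DATA INPATH '{hdfs_path}' OVERWRITE INTO TABLE {table};",
    ])


# Hypothetical three-column excerpt; the real file has more columns.
sample = "playerID,yearID,gameNum\nplayer01,1933,0\n"
script = build_hive_script("AllstarFull", sample, "/user/bigdataproject/AllstarFull.csv")
print(script)
```

Note that this sketch does nothing about the header line in the data file itself, which a plain delimited table would load as an ordinary row.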

User · Answer

There is another way of enabling this:

1. Use hadoop fs -copyFromLocal to copy the .csv data file from your local computer to somewhere in HDFS, say '/path/filename'.
2. Enter the Hive console and run the following script to expose the file as a Hive table. Note that '\054' is the ASCII code of the comma in octal, representing the field delimiter. LOCATION must name the HDFS directory that contains the file ('/path' here), not the file itself.

CREATE EXTERNAL TABLE table_name (foo INT, bar STRING)
COMMENT 'from csv file'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
STORED AS TEXTFILE
LOCATION '/path';
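The octal claim is easy to verify. As a quick check in Python, where 0o54 is the literal for octal 54:

```python
# Octal 054 is decimal 44, which is the ASCII code of the comma.
delimiter = chr(0o54)
code = ord(",")

print(repr(delimiter), code)
```

So writing '\054' in the table definition is just another way of saying that fields are terminated by ','.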
