Where does Hive store files in HDFS?

Question

I'd like to know how to find the mapping between Hive tables and the actual HDFS files (or rather, directories) that they represent. I need to access the table files directly. Where does Hive store its files in HDFS?

User · Answer

Another way to check where a specific table is stored is to execute this query in the Hive interactive interface:

show create table table_name;

where table_name is the name of the subject table.

For example, running the query above on a 'customers' table produces output like this:

CREATE TABLE `customers`(
  `id` string, 
  `name` string)
COMMENT 'Imported by sqoop on 2016/03/01 13:01:49'
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
  LINES TERMINATED BY '\n' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://quickstart.cloudera:8020/user/hive/warehouse/sqoop_workspace.db/customers'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true', 
  'numFiles'='4', 
  'totalSize'='77', 
  'transient_lastDdlTime'='1456866115')

The LOCATION value in the example above is what you should focus on. It is the HDFS location of that table's data.
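If you need that path in a script, the LOCATION clause can be pulled out of the SHOW CREATE TABLE text. The following is a minimal sketch, not part of the original answer: the function name `extract_location` and the shortened sample DDL are made up for illustration, and it assumes the path contains no spaces or single quotes (whitespace is stripped in case the client wrapped a long path across lines).

```python
import re
from typing import Optional


def extract_location(ddl: str) -> Optional[str]:
    """Return the LOCATION path from SHOW CREATE TABLE output, or None."""
    # LOCATION is followed by a single-quoted string, often on the next line.
    match = re.search(r"\bLOCATION\s+'([^']*)'", ddl, flags=re.IGNORECASE)
    if match is None:
        return None
    # Assumption: HDFS paths here contain no whitespace, so any whitespace
    # inside the quotes is just display wrapping and can be removed.
    return re.sub(r"\s+", "", match.group(1))


# Shortened, illustrative copy of the output shown above.
sample_ddl = """CREATE TABLE `customers`(
  `id` string,
  `name` string)
LOCATION
  'hdfs://quickstart.cloudera:8020/user/hive/warehouse/sqoop_workspace.db/customers'
TBLPROPERTIES (
  'numFiles'='4')"""

print(extract_location(sample_ddl))
```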


User · Answer

If you look at the hive-site.xml file you will see something like this:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/usr/hive/warehouse</value>
  <description>location of the warehouse directory</description>
</property>

/usr/hive/warehouse is the default location for all managed tables. External tables may be stored at a different location.

describe formatted <table_name> is the Hive shell command which can be used more generally to find the location of data pertaining to a Hive table.
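To read that property programmatically rather than by eye, something like the sketch below should work. It is only an illustration: the function name `warehouse_dir` is made up, and the sample string assumes the usual Hadoop layout in which `<property>` elements sit inside a `<configuration>` root.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PROPERTY_NAME = "hive.metastore.warehouse.dir"


def warehouse_dir(hive_site_xml: str) -> Optional[str]:
    """Return the configured warehouse directory, or None if it is not set."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name == PROPERTY_NAME:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative file content; a real hive-site.xml holds many more properties.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/usr/hive/warehouse</value>
    <description>location of the warehouse directory</description>
  </property>
</configuration>"""

print(warehouse_dir(SAMPLE_HIVE_SITE))
```

For a file on disk, the same function can be fed the result of reading the file as text.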

User · Answer

The location they are stored on the HDFS is fairly easy to figure out once you know where to look.

If you go to http://NAMENODE_MACHINE_NAME:50070/ in your browser, it should take you to a page with a "Browse the filesystem" link.

In the $HIVE_HOME/conf directory there is the hive-default.xml and/or hive-site.xml, which has the hive.metastore.warehouse.dir property. That value is where you will want to navigate to after clicking the "Browse the filesystem" link.

In mine, it's /usr/hive/warehouse. Once I navigate to that location, I see the names of my tables. Clicking on a table name (which is just a folder) will then expose the partitions of the table. In my case, I currently only have it partitioned on date. When I click on the folder at this level, I will then see files (more partitioning will have more levels). These files are where the data is actually stored on the HDFS.

I have not attempted to access these files directly; I'm assuming it can be done. I would take GREAT care if you are thinking about editing them. For me, I'd figure out a way to do what I need to without direct access to the Hive data on the disk. If you need access to raw data, you can use a Hive query and output the result to a file. These will have the exact same structure (divider between columns, etc.) as the files on the HDFS. I do queries like this all the time and convert them to CSVs.

The section about how to write data from queries to disk is https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries

UPDATE: Since Hadoop 3.0.0-alpha1 there is a change in the default port numbers. NAMENODE_MACHINE_NAME:50070 changes to NAMENODE_MACHINE_NAME:9870. Use the latter if you are running on Hadoop 3.x. The full list of port changes is described in HDFS-9427.
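The port change can be captured in a small helper. This is a sketch under stated assumptions: `namenode_ui_url` is a made-up name, the host is a placeholder, and it only covers the default ports mentioned above (a cluster may override them through the dfs.namenode.http-address setting).

```python
def namenode_ui_url(host: str, hadoop_major_version: int) -> str:
    """Build the default NameNode web UI URL for a given Hadoop major version."""
    # Default HTTP port: 50070 up to Hadoop 2.x, 9870 from Hadoop 3.x onward.
    port = 9870 if hadoop_major_version >= 3 else 50070
    return f"http://{host}:{port}/"


# "namenode.example.com" is a placeholder host name.
print(namenode_ui_url("namenode.example.com", 2))
print(namenode_ui_url("namenode.example.com", 3))
```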

User · Answer

In the Hive terminal, type:

hive> set hive.metastore.warehouse.dir;

It will print the path.

User · Answer

In Hive, tables are actually stored in a few places. Specifically, if you use partitions (which you should, if your tables are very large or growing), then each partition can have its own storage.

To show the default location where table data or partitions will be created if you create them through default Hive commands (insert overwrite ... partition ... and such):

describe formatted dbname.tablename

To show the actual location of a particular partition within a Hive table, instead do this:

describe formatted dbname.tablename partition (name=value)

If you look in your filesystem where a table "should" live, and you find no files there, it's very likely that the table is created (usually incrementally) by creating a new partition and pointing that partition at some other location. This is a great way of building tables from things like daily imports from third parties and such, which avoids having to copy the files around or storing them more than once in different places.
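When checking many partitions, it can help to generate these statements instead of typing them. The sketch below is only an illustration: `describe_formatted_sql` is a made-up helper, the database, table and partition names are hypothetical, and partition values are quoted as plain strings with no escaping.

```python
from typing import Mapping, Optional


def describe_formatted_sql(
    db: str, table: str, partition: Optional[Mapping[str, str]] = None
) -> str:
    """Build a DESCRIBE FORMATTED statement, optionally for one partition."""
    stmt = f"DESCRIBE FORMATTED {db}.{table}"
    if partition:
        # Assumption: values are simple strings without quotes in them.
        spec = ", ".join(f"{key}='{value}'" for key, value in partition.items())
        stmt += f" PARTITION ({spec})"
    return stmt + ";"


# Hypothetical names, for illustration only.
print(describe_formatted_sql("sales", "orders"))
print(describe_formatted_sql("sales", "orders", {"dt": "2016-03-01"}))
```

The first statement reports the table's default location; the second reports where that one partition actually lives.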

User · Answer

It's also very possible that typing show create table <table_name> in the Hive CLI will give you the exact location of your Hive table.

User · Answer

Hive tables may not necessarily be stored in a warehouse, since you can create tables located anywhere on the HDFS.

You should use the DESCRIBE FORMATTED <table_name> command:

hive -S -e "describe formatted <table_name> ;" | grep 'Location' | awk '{ print $NF }'

Please note that partitions may be stored in different places, and to get the location of the alpha=foo/beta=bar partition you'd have to add partition(alpha='foo',beta='bar') after <table_name>.
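The same filtering can be done in Python once the DESCRIBE FORMATTED output has been captured as text. This is a rough equivalent of the grep and awk steps, not a replacement for them: `locations_from_describe` is a made-up name and the sample output is abbreviated and illustrative.

```python
from typing import List


def locations_from_describe(output: str) -> List[str]:
    """Return the last field of every line that mentions 'Location'."""
    found = []
    for line in output.splitlines():
        if "Location" in line:            # same filter as: grep 'Location'
            fields = line.split()         # whitespace-separated, like awk
            if fields:
                found.append(fields[-1])  # same as: awk '{ print $NF }'
    return found


# Abbreviated, illustrative DESCRIBE FORMATTED output.
sample_output = """# Detailed Table Information
Database:               sqoop_workspace
Location:               hdfs://quickstart.cloudera:8020/user/hive/warehouse/sqoop_workspace.db/customers
Table Type:             MANAGED_TABLE
"""

print(locations_from_describe(sample_output))
```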

User · Answer

In the sandbox, you need to go to /apps/hive/warehouse/; on a normal cluster, go to /user/hive/warehouse.

User · Answer

A Hive database is nothing but a directory within HDFS with a .db extension.

So, from a Unix or Linux host which is connected to HDFS, search with one of the following, depending on the type of HDFS distribution:

hdfs dfs -ls -R / 2>/dev/null | grep db

or

hadoop fs -ls -R / 2>/dev/null | grep db

You will see the full paths of the .db database directories. All tables reside under their respective .db database directories.
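Since grep db also matches any path that merely contains "db", a stricter filter over the captured listing may be useful. The sketch below assumes the usual eight-column `ls -R` output (permissions, replication, owner, group, size, date, time, path) and paths without spaces; `database_dirs` and the sample listing are made up for illustration.

```python
from typing import List


def database_dirs(ls_output: str) -> List[str]:
    """Return directory paths ending in '.db' from recursive ls output."""
    dirs = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Keep only well-formed directory entries (permissions start with 'd').
        if len(fields) < 8 or not fields[0].startswith("d"):
            continue
        path = fields[-1]
        if path.endswith(".db"):
            dirs.append(path)
    return dirs


# Illustrative listing; owners, sizes and dates are invented.
sample_listing = """drwxrwxrwx   - hive hive          0 2016-03-01 13:01 /user/hive/warehouse
drwxrwxrwx   - hive hive          0 2016-03-01 13:01 /user/hive/warehouse/sqoop_workspace.db
drwxrwxrwx   - hive hive          0 2016-03-01 13:01 /user/hive/warehouse/sqoop_workspace.db/customers
-rw-r--r--   1 hive hive         77 2016-03-01 13:01 /user/hive/warehouse/sqoop_workspace.db/customers/part-m-00000
"""

print(database_dirs(sample_listing))
```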

User · Answer

Run describe formatted <table_name>; inside the Hive shell. Notice the "Location" value, which shows the location of the table.

User · Answer

To summarize a few points posted earlier: in hive-site.xml, the property hive.metastore.warehouse.dir specifies where the files are located under Hadoop HDFS:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>

To view the files, use this command:

hadoop fs -ls /user/hive/warehouse/

or open http://localhost:50070 and go to Utilities > Browse the file system, or go straight to http://localhost:50070/explorer.html

(tested under hadoop-2.7.3, hive-2.1.1)

User · Answer

Hive tables are stored in the Hive warehouse directory. By default, MapR configures the Hive warehouse directory to be /user/hive/warehouse under the root volume. This default is defined in $HIVE_HOME/conf/hive-default.xml.
