Add a column in a table in HIVE QL

Question

I'm writing a query in Hive to create a table consisting of 1300 rows and 6 columns:

    CREATE TABLE test1 AS
    SELECT cd_screen_function,
           SUM(access_count) AS max_count,
           MIN(response_time_min) AS response_time_min,
           AVG(response_time_avg) AS response_time_avg,
           MAX(response_time_max) AS response_time_max,
           SUM(response_time_tot) AS response_time_tot,
           COUNT(*) AS row_count
    FROM sheet
    WHERE ts_update BETWEEN unix_timestamp('2012-11-01 00:00:00')
                        AND unix_timestamp('2012-11-30 00:00:00')
      AND cd_office = '016'
    GROUP BY cd_screen_function
    ORDER BY max_count DESC, cd_screen_function;

Now I want to add another column, access_count1, which holds a single value repeated for all 1300 rows. That value should be sum(max_count), where max_count is a column in my existing table. How can I do that? I am trying to alter the table with this code:

    ALTER TABLE test1 ADD COLUMNS (access_count1 int) set default sum(max_count);

User · Answer

You cannot add a column with a default value in Hive. You have the right syntax for adding the column, ALTER TABLE test1 ADD COLUMNS (access_count1 int); you just need to get rid of default sum(max_count). Adding the column does not change the files backing your table. Hive handles the "missing" data by returning NULL for every cell in that column.
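As a minimal sketch of that first step, using the table and column names from the question, the statement and a quick check might look like this. The LIMIT value is arbitrary and only there to keep the check small.

```sql
-- Add the column without any default clause.
ALTER TABLE test1 ADD COLUMNS (access_count1 INT);

-- The existing rows were not rewritten, so the new column reads back as NULL.
SELECT cd_screen_function, max_count, access_count1
FROM test1
LIMIT 5;
```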

So now you have the problem of populating the column. Unfortunately, in Hive you essentially need to rewrite the whole table, this time with the column populated. It may be easier to rerun your original query with the new column included. Or you could add the column to the table you have now, then select all of its columns plus a value for the new column.
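One way to sketch the rewrite is to build a new table from the existing one and join in the single total. This is an illustration under assumptions, not the only approach: test2 is a hypothetical table name, and CROSS JOIN assumes a Hive version that supports that keyword (a configuration running in strict mode may reject Cartesian products).

```sql
-- Rewrite the data into a new table with access_count1 populated.
-- The subquery yields exactly one row, so the cross join repeats
-- the same total on every row of test1.
CREATE TABLE test2 AS
SELECT t.cd_screen_function,
       t.max_count,
       t.response_time_min,
       t.response_time_avg,
       t.response_time_max,
       t.response_time_tot,
       t.row_count,
       s.total_max_count AS access_count1
FROM test1 t
CROSS JOIN (SELECT SUM(max_count) AS total_max_count FROM test1) s;
```

Because the table is created from the SELECT, Hive infers the type of access_count1 from the SUM, so no separate ALTER TABLE is needed on this path. Once test2 looks right, the old table can be dropped and the new one renamed with ALTER TABLE test2 RENAME TO test1.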

You also have the option to leave the column NULL for now and always COALESCE it to your desired default when you read it. This option fails when you want NULL to have a meaning distinct from your desired default. It also depends on you remembering to COALESCE every time you query the column.
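A small sketch of the COALESCE approach follows. The literal 0 is only a placeholder for whatever default you choose, and test1_with_default is a hypothetical view name; wrapping the expression in a view is one possible way to avoid repeating it in every query.

```sql
-- Substitute the default at read time; the stored value stays NULL.
SELECT cd_screen_function,
       max_count,
       COALESCE(access_count1, 0) AS access_count1  -- 0 is a placeholder default
FROM test1;

-- Optional: hide the COALESCE behind a view so readers cannot forget it.
CREATE VIEW test1_with_default AS
SELECT cd_screen_function,
       max_count,
       COALESCE(access_count1, 0) AS access_count1
FROM test1;
```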

If you are very confident in your abilities to deal with the files backing Hive, you could also directly alter them to add your default. In general I would recommend against this because most of the time it will be slower and more dangerous. There might be some case where it makes sense though, so I've included this option for completeness.
