[null] Handling NULL values in Hive

I am trying to create a table (table 2) in Hive from another table (table 1). I am trying to exclude certain rows with NULL values and tried the following condition.

    insert overwrite table table2 partition (date = '2013-06-01')
    select column1, column 2....
    from table1
    where column1 is not NULL or column1 <> '';

However, when I try this following query with the new table I get 300+ rows with NULL vaues:

    select count(*) from table2 where column1 is NULL;

Could someone point to what is causing these NULL values?

Thank you.

Ravi

This question is related to null hive

The answer is


What is the datatype for column1 in your Hive table? Please note that if your column is STRING it won't be having a NULL value even though your external file does not have any data for that column.


I use below sql to exclude the null string and empty string lines.

select * from table where length(nvl(column1,0))>0

Because, the length of empty string is 0.

select length('');
+-----------+--+
| length()  |
+-----------+--+
| 0         |
+-----------+--+

Try to include length > 0 as well.

column1 is not NULL AND column1 <> '' AND length(column1) > 0 

Try using isnull(a), isnotnull(a), nvl(), etc. On some versions(potentially in conjunction with the server settings - atleast with the one I am working on) of hive the 'IS NULL' and 'IS NOT NULL' syntax does not execute the logic when it is compiled. Check here for more information.


To check for the NULL data for column1 and consider your datatype of it is String, you could use below command :

select * from tbl_name where column1 is null or column1 <> '';