[hadoop] Does Hive have a String split function?

I am looking for a built-in string split function in Hive. For example, if the string is:

A|B|C|D|E

Then I want to have a function like:

array<string> split(string input, char delimiter)

So that I get back:

[A,B,C,D,E]

Does such a built-in split function exist in Hive?

I can only see regexp_extract and regexp_replace. I would love to see indexOf() and split() string functions.


The answer is


There is a split function based on regular expressions. It isn't listed in the tutorial, but it is listed in the Language Manual on the wiki:

split(string str, string pat)
   Split str around pat (pat is a regular expression) 

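In your case, the delimiter "|" has a special meaning in regular expressions (alternation), so it has to be escaped. Write it as "\\|" in the query: Hive treats the backslash as an escape character inside string literals, so the regex engine receives \|, which matches a literal pipe.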
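To see the escaping rule outside Hive, here is a rough Python sketch that mimics split() with Python's re module. It assumes Hive's split follows ordinary regex semantics for this pattern (Hive's regex functions are built on Java regular expressions, which agree with Python's for an escaped pipe). The helper names hive_like_split and split_on_char are hypothetical; split_on_char shows one way to get the split(input, delimiter) signature from the question by escaping the delimiter before splitting.

```python
import re
from typing import List


def hive_like_split(s: str, pattern: str) -> List[str]:
    """Hypothetical stand-in for Hive's split(str, pat): cut s at every match of the regex pat."""
    return re.split(pattern, s)


def split_on_char(s: str, delimiter: str) -> List[str]:
    """Hypothetical split(input, delimiter) from the question: escape the
    delimiter so regex metacharacters such as '|' are matched literally."""
    return hive_like_split(s, re.escape(delimiter))


# r"\|" is assumed to be what the regex engine sees after Hive unescapes '\\|'.
print(hive_like_split("A|B|C|D|E", r"\|"))  # ['A', 'B', 'C', 'D', 'E']
print(split_on_char("A|B|C|D|E", "|"))      # ['A', 'B', 'C', 'D', 'E']
```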


Another useful case for split in Hive: suppose a column ipname in the table holds the value "abc11.def.ghft.com" and you want to pull out "abc11":

SELECT split(ipname,'[\.]')[0] FROM tablename;
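A minimal Python sketch of the same idea, again using Python's re module as a stand-in for Hive's regex engine: inside a character class the dot is an ordinary character, and indexing with [0] takes the first piece (Hive array indices are also 0-based). The function name first_label is hypothetical. The second print shows why the brackets matter: an unescaped '.' matches every character, so every piece, including the first, is empty.

```python
import re


def first_label(hostname: str) -> str:
    """Hypothetical helper mirroring split(ipname, ...)[0] with a '[.]' pattern."""
    # Inside [...] the dot is a literal dot, not "any character".
    return re.split(r"[.]", hostname)[0]


print(first_label("abc11.def.ghft.com"))                # abc11

# Without the brackets '.' matches every character, leaving only empty pieces.
print(re.split(r".", "abc11.def.ghft.com")[0] == "")    # True
```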

Just a clarification on the answer given by Bkkbrad.

I tried this suggestion and it did not work for me.

For example,

split('aa|bb','\\|')

produced:

["","a","a","|","b","b",""]

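That output means the pattern reaching the regex engine was not a literal pipe: the | acted as alternation with an empty branch, which matches the empty string at every position and therefore splits between every character.

But,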

split('aa|bb','[|]')

produced the desired result:

["aa","bb"]

Putting the metacharacter '|' inside square brackets (a character class) makes it match literally, as intended, rather than act as alternation.
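Both results can be reproduced with Python's re module (Python 3.7 or later, which allows splitting on empty matches), used here only as a stand-in for Hive's Java-based regex engine. Handling of the leading empty string can differ between regex implementations (Java 8 and later, for example, omit a leading empty piece produced by a zero-width match), so treat the first output as illustrative rather than exactly what every Hive version returns.

```python
import re

text = "aa|bb"

# A bare '|' is an alternation of two empty branches: it matches the empty
# string at every position, so the text is cut between every character.
print(re.split(r"|", text))    # ['', 'a', 'a', '|', 'b', 'b', '']

# Inside a character class, '|' is an ordinary character.
print(re.split(r"[|]", text))  # ['aa', 'bb']
```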

For more on how character classes treat metacharacters, see: http://www.regular-expressions.info/charclass.html