Creating a Hive Multi Character Delimitered Table

0 votes
1 view
asked Aug 21, 2017 in Hadoop by admin (4,410 points)
SummaryHow to load the data into a Hive table with a multi-byte delimiter

 

Applies To
  • Hive
  • DDL
Symptoms

 

Cause
Instructions

"MultiDelimitSerDe" is available since CDH 5.1.4

Example:

CREATE TABLE test_multi 
(a string, b string, c string, d string, e string, f string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES (
    "field.delim"="~|`",
    "collection.delim"=":",
    "mapkey.delim"="@"
);

After loading the data, the table looks similar to this:

hive> LOAD DATA LOCAL INPATH '/tmp/data-multi.txt' OVERWRITE INTO TABLE test_multi;

hive> SELECT * FROM test_multi;
+---------------+---------------+---------------+---------------+---------------+---------------+--+
| test_multi.a  | test_multi.b  | test_multi.c  | test_multi.d  | test_multi.e  | test_multi.f  |
+---------------+---------------+---------------+---------------+---------------+---------------+--+
| DTL           | 900           | 3103          | 2015          | DK            | 1002003829    |
| DTL           | 900           | 3103          | 2015          | CR            | 1002005103    |
| DTL           | 900           | 3103          | 2015          | DR            | 1002003829    |
| DTL           | 900           | 3103          | 2015          | DR            | 1002003829    |
| DTL           | 900           | 3103          | 2015          | CR            | 1002003829    |
+---------------+---------------+---------------+---------------+---------------+---------------+--+

This is the preferred way of loading multi-character delimited data into Hive over the use of "org.apache.hadoop.hive.serde2.RegexSerDe", as it is simpler and faster.

Please log in or register to answer this question.

...