Impala parses DECIMAL values as null if extra digits included, Hive truncates extra digits

0 votes
3 views
asked Aug 19, 2017 in Hadoop by admin (4,410 points)
Summary
Symptoms

When querying a table in Impala, some null values are returned with warnings, whereas the same query in Hive produces numbers instead of nulls.

Example Impala warning:

WARNINGS: Error converting column: 278 to DECIMAL(10, 2) 
Error parsing row: file: hdfs://example/user/hive/warehouse/db1.db/table1/delimited001.txt, before offset: 18882

Applies To

Impala, Hive
 

Cause

If you inspect the data files, you'll see the problematic rows contain more decimal places than the column is defined to use.

Impala is stricter about number formatting than Hive. If you have a column defined as DECIMAL(10,2) and the text for that column is "0.1234", Hive will parse the number as 0.12, whereas Impala will return a null value and issue a warning.

This is expected behavior. You can see it documented at the following page, under "Schema evolution considerations" for the DECIMAL type:

https://www.cloudera.com/documentation/enterprise/5-10-x/topics/impala_decimal.html

The reason for this behavior in Impala is that the input value cannot be accurately represented with the given type declaration. Typically for DECIMAL, precision is desired. Therefore, Impala returns a null with a warning rather than losing the full precision of the data from the source.

Hive is more lenient about accuracy when parsing DECIMAL columns, and truncates extra digits. 

Instructions

You can avoid this by making sure the number formatting in your files matches your DECIMAL column definitions. 

Please log in or register to answer this question.

...