Sqoop Teradata import truncates timestamp microseconds information

asked Aug 30, 2017 in Hadoop by admin (4,410 points)
Summary

When a Teradata table with a TIMESTAMP(6) column is imported into HDFS with Sqoop, the fractional-seconds part of the value is truncated from microseconds to milliseconds (for example, 699022 becomes 699).

Symptoms

The following test case reproduces the issue:

  1. Create a table in Teradata:
    CREATE TABLE vmtest.test (a integer, b timestamp(6) 
    FORMAT 'yyyy-mm-ddbhh:mi:ss.s(6)') PRIMARY INDEX (a);
    
    INSERT INTO vmtest.test VALUES (1, '2016-04-05 11:27:24.699022');
  2. Run the Sqoop import command:
    sqoop import --connect jdbc:teradata://<teradata-host>/database=vmtest \
        --username dbc --password dbc --target-dir /tmp/test --delete-target-dir \
        --as-textfile --fields-terminated-by "," --table test
  3. The data is stored in HDFS as below:
    [cloudera@quickstart ~]$ hadoop fs -cat /tmp/test/part*
    1,2016-04-05 11:27:24.699

Notice that the microseconds part is truncated from 699022 to 699.
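
To confirm that the truncation happens during the Sqoop import rather than in Teradata itself, the stored value can be checked directly in a Teradata SQL client. This is a minimal sanity check, assuming the vmtest.test table from step 1; casting the TIMESTAMP(6) column to a character type renders it with its declared FORMAT, including all six fractional-second digits:

-- Run in BTEQ or any other Teradata SQL client against the table from step 1.
SELECT a, CAST(b AS VARCHAR(32)) AS b_text
FROM vmtest.test
WHERE a = 1;

-- Expected result:
-- a  b_text
-- 1  2016-04-05 11:27:24.699022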

Applies To

Sqoop imports from Teradata that use the Cloudera Connector Powered by Teradata (which is built on TDCH)

Cause

This is caused by a bug in TDCH (Teradata Connector for Hadoop) from Teradata, which is used by the Cloudera Connector Powered by Teradata.

Instructions

The workaround is to cast the timestamp value to a string on the Teradata side before it is passed to Sqoop, so that the connector performs no timestamp conversion. The Sqoop command below is an example:

sqoop import --connect jdbc:teradata://<teradata-host>/database=vmtest \
    --username dbc --password dbc --target-dir /tmp/test \
    --delete-target-dir --as-textfile --fields-terminated-by "," \
    --query "SELECT a, cast(cast(b as format 'YYYY-MM-DD HH:MI:SS.s(6)') as char(40)) from test WHERE \$CONDITIONS" \
    --split-by a

After the import, the data is stored in HDFS correctly:

[cloudera@quickstart ~]$ hadoop fs -cat /tmp/test/part*
1,2016-04-05 11:27:24.699022
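
When adapting this workaround to another table, the cast expression used in the --query option can first be validated on its own in a Teradata SQL client. A minimal sketch, again assuming the vmtest.test table from the test case:

-- Evaluate the workaround's cast expression directly in Teradata:
-- the inner cast applies an explicit FORMAT with six fractional digits,
-- and the outer cast turns the result into a plain character string,
-- which Sqoop then writes out unchanged.
SELECT a,
       CAST(CAST(b AS FORMAT 'YYYY-MM-DD HH:MI:SS.s(6)') AS CHAR(40)) AS b_str
FROM vmtest.test;

-- Expected result:
-- a  b_str
-- 1  2016-04-05 11:27:24.699022

If this query returns all six fractional digits but the imported files are still truncated, the column is not reaching the connector as a string.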
