TSB 2017-128: Silent data loss in Apache HBase replication

asked Aug 20, 2017 in Hadoop by admin (4,410 points)
Summary

In clusters that use HBase replication over slow or unreliable network links, some write-ahead logs (WALs) may fail to replicate, resulting in inconsistencies between clusters.

In deployments with slow or unreliable network links, HBase’s cross-datacenter replication code may conclude that it has reached the end of a WAL before it has successfully parsed the entire file. The remaining content is ignored and never sent to the destination clusters. Eventually, the WALs are cleaned up as a normal part of HBase operations, which prevents later manual recovery.

The destination cluster will not receive some updates, causing inconsistencies between the clusters replicating data.

The issue is fixed by HBASE-15984, which detects when a parsing error has occurred prior to reaching the end of a write-ahead-log file and subsequently retries.
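The failure mode and the fix can be illustrated with a small simulation. This is a minimal sketch, not HBase code: the length-prefixed entry format, the `FlakySource` class (which hides the tail of the file on early reads to mimic a slow link), and the `read_all` function are all hypothetical. It assumes the log is closed, so its full length is known; in that situation, running out of bytes before the end of the file must be an error rather than the end of the log. With `retry_on_short_read=False` the reader behaves like the pre-fix code and silently drops the remaining entries; with `True` it reopens at the last fully parsed offset and retries, in the spirit of the behavior described for HBASE-15984.

```python
import io
import struct


class ShortReadError(Exception):
    """Raised when the stream ends in the middle of an entry."""


def encode(entries):
    # Hypothetical WAL layout: 4-byte big-endian length, then the UTF-8 payload.
    out = bytearray()
    for entry in entries:
        payload = entry.encode("utf-8")
        out += struct.pack(">I", len(payload)) + payload
    return bytes(out)


class FlakySource:
    """Hypothetical source: the first `failures` opens see only the first `visible` bytes."""

    def __init__(self, data, visible, failures):
        self.data = data
        self.visible = visible
        self.failures_left = failures

    def length(self):
        # The log is assumed closed, so its full length is known up front.
        return len(self.data)

    def open(self, offset):
        end = self.visible if self.failures_left > 0 else len(self.data)
        if self.failures_left > 0:
            self.failures_left -= 1
        return io.BytesIO(self.data[min(offset, end):end])


def read_exact(stream, n):
    chunk = stream.read(n)
    if len(chunk) < n:
        raise ShortReadError(f"wanted {n} bytes, got {len(chunk)}")
    return chunk


def read_all(source, retry_on_short_read, max_retries=3):
    shipped = []  # entries that would be sent to the destination cluster
    position = 0  # offset just past the last fully parsed entry
    retries = 0
    while position < source.length():
        stream = source.open(position)
        try:
            while position < source.length():
                (size,) = struct.unpack(">I", read_exact(stream, 4))
                shipped.append(read_exact(stream, size).decode("utf-8"))
                position += 4 + size
        except ShortReadError:
            if not retry_on_short_read:
                # Pre-fix behavior: a short read is mistaken for the end of the log,
                # so the rest of the file is never shipped.
                break
            retries += 1
            if retries > max_retries:
                raise
            # Post-fix behavior: the file is longer than what was parsed, so loop
            # around, reopen at the last good offset, and try again.
    return shipped


if __name__ == "__main__":
    wal = encode(["put r1", "put r2", "put r3"])
    visible = 12  # first entry (4 + 6 bytes) plus 2 bytes of the second header
    print(read_all(FlakySource(wal, visible, failures=1), retry_on_short_read=False))
    # ['put r1']
    print(read_all(FlakySource(wal, visible, failures=1), retry_on_short_read=True))
    # ['put r1', 'put r2', 'put r3']
```

The retry bound (`max_retries`) is an assumption of this sketch, chosen so that a genuinely truncated log raises an error instead of looping forever.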

Applies To

All CDH 5 releases prior to CDH 5.8 

To obtain a fix for this issue, upgrade to CDH 5.8.0 or higher.

To obtain a version that also includes metrics on how often these failures occur, upgrade to one of the following releases:
  • CDH 5.8.2 or higher
  • CDH 5.9.0 or higher
