The issue is addressed by HBASE-15984, Handle premature EOF treatment of WALs in replication. In certain situations, the replication code believes it has reached the EOF for a WAL before successfully parsing all bytes known to exist in a cleanly closed WAL file.
This failure consistently occurs because of an InvalidProtobufException after some number of seeks during attempts to tail the in-progress RegionServer WAL. As a fix, HBASE-15984 treats cleanly closed files differently from other execution paths. If an EOF is detected due
to parsing or other errors while there are still unparsed bytes before the end-of-file trailer, we now reset the WAL to the very beginning and attempt a clean read-through. A single reset should be sufficient to work around the observed replication failure. However, this change retries a given WAL file
indefinitely. On each such attempt, a log message similar to the one below is emitted at the WARN level. If, after applying HBASE-15984, these WARNs are observed repeatedly for the same WAL file, engage Cloudera Support, because this indicates an issue
with that WAL file.
2017-02-28 07:13:06,194 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Processing end of WAL file
'hdfs://nameservice1/hbase/WALs/host1.company.org,60020,1487488335489/host1.company.org%2C60020%2C1487488335489.null0.1488287585184'. At position 5406208, which is too far away from reported file length 5932585.
Restarting WAL reading (see HBASE-15983 for details). stats: Total replicated edits: 1213042, current progress: walGroup [host1.company.org%2C60020%2C1487488335489.null0]: currently replicating
from: hdfs://nameservice1/hbase/WALs/host1.company.org,60020,1487488335489/host1.company.org%2C60020%2C1487488335489.null0.1488287585184 at position: 5406208
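One way to spot a WAL file that is being retried repeatedly is to count these WARN messages per WAL path in the RegionServer log. The following is a minimal sketch in Python, not part of HBase or any Cloudera tooling. The function names and the threshold of 3 are assumptions made for illustration, and the pattern is based only on the sample message above, so it may need adjusting if the message wording differs in your release.

```python
import re
from collections import Counter

# Pattern derived from the sample WARN above. \s+ is used between words so
# that the match still works when the message is wrapped across lines.
RESET_WARN = re.compile(
    r"WARN\s+\S*ReplicationSource:\s+Processing\s+end\s+of\s+WAL\s+file\s+"
    r"'(?P<wal>[^']+)'\.\s+At\s+position\s+(?P<pos>\d+),\s+which\s+is\s+too\s+"
    r"far\s+away\s+from\s+reported\s+file\s+length\s+(?P<length>\d+)\."
)


def count_wal_resets(log_text):
    """Return a Counter mapping each WAL path to the number of reset WARNs."""
    return Counter(m.group("wal") for m in RESET_WARN.finditer(log_text))


def suspect_wals(log_text, threshold=3):
    """Return WAL paths whose reset WARN count is at least `threshold`.

    The default threshold is an arbitrary illustrative value; the advisory
    only says the WARN is "repeated multiple times" for the same file.
    """
    counts = count_wal_resets(log_text)
    return {wal: n for wal, n in counts.items() if n >= threshold}
```

To use it, read the RegionServer log into a string and pass it to suspect_wals. Any WAL path it returns has hit the reset path several times and is a candidate for escalation.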