|IMPORTANT: Data will be lost by using -skipTrash. Per the Apache Documentation for the rm command, the -skipTrash option will bypass trash, if enabled, and delete the specified file(s) immediately. This is working as expected.|
If a small amount of data was accidentally deleted with -skipTrash, there is no way to recover the data from HDFS.
If a large amount of data was accidentally deleted with -skipTrash, immediately:
- Stop the HDFS service
- Put the NameNode into safe mode
- Contact Cloudera Support
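The safe mode step above can be scripted. The following is a minimal Python sketch, assuming the `hdfs` client is on the PATH and the caller is an HDFS superuser. The helper names `safemode_argv` and `enter_safemode` are hypothetical; `hdfs dfsadmin -safemode enter` and `hdfs dfsadmin -safemode get` are the standard Hadoop commands they wrap. Stopping the HDFS service itself is done through your cluster manager and is not shown.

```python
import subprocess

# Only the two actions needed for this procedure are allowed here.
VALID_ACTIONS = ("enter", "get")


def safemode_argv(action):
    """Build the argv for `hdfs dfsadmin -safemode <action>`."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported safemode action: {action!r}")
    return ["hdfs", "dfsadmin", "-safemode", action]


def enter_safemode():
    """Put the NameNode into safe mode, then return the reported status text.

    Assumes the `hdfs` CLI is on PATH and the caller is an HDFS superuser.
    """
    subprocess.run(safemode_argv("enter"), check=True)
    status = subprocess.run(
        safemode_argv("get"), check=True, capture_output=True, text=True
    )
    return status.stdout.strip()
```

Calling `enter_safemode()` on a gateway host would issue both commands in order and return whatever status line the NameNode reports.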
If the shutdown or the transition to safe mode happens long after the delete was issued, the blocks have most likely already been removed and cannot be recovered.
In safe mode, to confirm whether any blocks can still be recovered, check either of the following:
- In Cloudera Manager (CM), go to Charts > Chart Builder and run: select pending_deletion_blocks
- In the NameNode UI, check "Number of Blocks Pending Deletion"
If this shows 0 blocks/replicas, the blocks have already been removed and nothing can be recovered. There is no point in doing anything further.
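The pending-deletion count can also be read programmatically from the NameNode's JMX endpoint. The sketch below is a minimal Python example under several assumptions: the NameNode web UI is reachable over plain HTTP without Kerberos/SPNEGO, the address passed in (for example `http://nn-host:9870`) is a placeholder for your own NameNode, and the `FSNamesystem` bean exposes a `PendingDeletionBlocks` attribute. Verify the bean and attribute names against your own `/jmx` output before relying on them. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed bean and attribute names; confirm them in your NameNode's /jmx output.
FSNAMESYSTEM_BEAN = "Hadoop:service=NameNode,name=FSNamesystem"
PENDING_ATTR = "PendingDeletionBlocks"


def pending_deletion_blocks(jmx_payload):
    """Return the pending-deletion count from a parsed /jmx payload, or None if absent."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == FSNAMESYSTEM_BEAN and PENDING_ATTR in bean:
            return int(bean[PENDING_ATTR])
    return None


def fetch_pending_deletion_blocks(namenode_http):
    """Query a NameNode web address such as "http://nn-host:9870" (placeholder)."""
    url = f"{namenode_http}/jmx?qry={FSNAMESYSTEM_BEAN}"
    with urlopen(url, timeout=10) as resp:
        return pending_deletion_blocks(json.load(resp))
```

A result of `0` corresponds to the "nothing can be recovered" case described above; `None` means the attribute was not found and the check should be done through the UI instead.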
If HDFS snapshots were taken of the removed folder, restoring from a snapshot may be an option for recovery.
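As a rough illustration of the snapshot route, snapshot contents are exposed read-only under a `.snapshot/<snapshot name>` folder inside the snapshottable directory and can be copied back with `hdfs dfs -cp`. The Python sketch below only builds the paths and the command; the directory, snapshot name, and helper names are hypothetical, and it assumes the deleted path no longer exists at the destination.

```python
import posixpath


def snapshot_source_path(snapshottable_dir, snapshot_name, relative_path):
    """Path of a file or folder as it existed inside the given snapshot."""
    return posixpath.join(
        snapshottable_dir, ".snapshot", snapshot_name, relative_path.lstrip("/")
    )


def restore_argv(snapshottable_dir, snapshot_name, relative_path):
    """Build an `hdfs dfs -cp` argv copying the snapshot copy back to its original path."""
    src = snapshot_source_path(snapshottable_dir, snapshot_name, relative_path)
    dst = posixpath.join(snapshottable_dir, relative_path.lstrip("/"))
    return ["hdfs", "dfs", "-cp", src, dst]
```

For example, `restore_argv("/data/projects", "s20240101", "reports")` yields a command that copies `/data/projects/.snapshot/s20240101/reports` to `/data/projects/reports`. Which snapshots exist can be checked first with `hdfs lsSnapshottableDir` and by listing the `.snapshot` folder.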
If it has been more than 2 hours since the command with -skipTrash was issued, consider the data completely unrecoverable. The more time that has passed since the delete, the lower the chance that any data can be recovered. Even if HDFS is stopped immediately, some data will be lost.
When files are deleted with -skipTrash, the blocks associated with the files are queued for deletion from the DataNodes. For a small amount of data, this queue is processed quickly and the blocks are deleted. For a large amount of data, the queue is processed gradually to avoid overloading the DataNodes' I/O with delete requests, so it may take a while to drain. How long it takes depends on the size of the cluster and the number of blocks (the size of the data) being deleted. Because queued deletes are processed block by block, independent of which file a block belongs to, large files with many blocks will be corrupt once any of their blocks has been deleted.
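To make the dependence on cluster size and block count concrete, here is a back-of-the-envelope Python sketch. It is a simplified model, not how HDFS actually schedules deletions: it assumes the queued deletes are spread evenly across DataNodes and that each DataNode receives at most `batch_per_node` deletions every `interval_s` seconds. Both parameters and the function name are hypothetical inputs you would have to choose for your own cluster.

```python
import math


def estimate_drain_seconds(pending_blocks, datanodes, batch_per_node, interval_s):
    """Rough time for the deletion queue to drain under an even-spread model.

    Assumes each DataNode is handed at most `batch_per_node` deletions every
    `interval_s` seconds (hypothetical throttling parameters).
    """
    if datanodes <= 0 or batch_per_node <= 0:
        raise ValueError("datanodes and batch_per_node must be positive")
    per_node = math.ceil(pending_blocks / datanodes)   # deletions each node must handle
    rounds = math.ceil(per_node / batch_per_node)      # batches needed per node
    return rounds * interval_s
```

With illustrative numbers, `estimate_drain_seconds(3_000_000, 30, 1000, 3)` returns 300, that is, about five minutes. The same model shows why small deletes are gone almost immediately and why the recovery window grows with the number of blocks deleted.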