Cannot Drop a Table in Hive | Table is Locked by an Obsolete Query

asked Aug 30, 2017 in Hadoop by admin (4,410 points)
Summary

A Hive table cannot be dropped, or a Hive query hangs, after being locked by an obsolete query. The solution is to remove the erroneous ZNode in ZooKeeper.

Symptoms

A DROP TABLE, TRUNCATE TABLE, INSERT, or SELECT query appears to become "stuck" or hang. After some time, the query fails with the following error message:

2017-02-06 08:00:53,152 ERROR ZooKeeperHiveLockManager: [HiveServer2-Background-Pool: Thread-48883]: Unable to acquire IMPLICIT, EXCLUSIVE lock <database>@<table> after 100 attempts.
2017-02-06 08:00:53,161 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-48883]: FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired. retry after some time
org.apache.hadoop.hive.ql.lockmgr.LockException: Locks on the underlying objects cannot be acquired. retry after some time
    at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:164)
    at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1407)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1203)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1198)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:187)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:76)
    at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:241)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
    at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:254)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Applies To
  • Hive
  • ZooKeeper
Cause

The target table is locked by a query that failed during execution and did not subsequently clean up its lock.

Instructions

Hive implements a global table locking mechanism shared across all instances of HiveServer2.  The most common locking implementation relies on Apache ZooKeeper.

The following lock modes are defined in Hive:

  • Implicit
  • Explicit

Implicit locks are automatically acquired by a query depending on the target table and the operation being requested (SELECT, INSERT, etc.).  These locks are normally released as part of the cleanup process of every query.

Explicit locks are created manually by a user, and they must also be removed manually by a user.
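As a sketch, an explicit lock can be taken and released from Beeline using Hive's LOCK TABLE and UNLOCK TABLE statements. The JDBC URL, host, and table name below are placeholders; adjust them for your cluster.

```shell
# Acquire an explicit exclusive lock on a table
# (placeholder JDBC URL and table name -- adjust for your environment)
beeline -u "jdbc:hive2://hs2-host:10000/default" \
  -e "LOCK TABLE example_table EXCLUSIVE;"

# The lock persists until it is explicitly released:
beeline -u "jdbc:hive2://hs2-host:10000/default" \
  -e "UNLOCK TABLE example_table;"
```

If the user who created such a lock forgets this second step, the lock remains in place indefinitely.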

HiveServer2 is responsible for creating and removing these locks.  In certain Hive failure scenarios, these locks may not be removed after a query has completed, or a user may simply have forgotten to remove an explicit lock.  If a lock is left on a table, certain queries will block and eventually fail while waiting for the lock to be removed.  These stale locks must be removed manually before normal operation can continue.
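Before removing anything, Hive's SHOW LOCKS statement can be used to see which locks currently exist on a table. The JDBC URL and table name here are placeholders.

```shell
# Inspect current locks on a table from Beeline
# (placeholder JDBC URL and table name)
beeline -u "jdbc:hive2://hs2-host:10000/default" \
  -e "SHOW LOCKS example_table EXTENDED;"
```

The EXTENDED form includes details such as the lock mode and the query that acquired the lock, which helps confirm the lock really is obsolete.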

Implicit Locks

When global locks are enabled, Hive uses ZooKeeper ephemeral nodes to represent implicit locks.  These nodes are normally removed after a query has completed.  These nodes will also be removed if the HiveServer2 instance that created them fails or is restarted.  In a HiveServer2 failure/restart scenario, the nodes are removed 60 seconds after HiveServer2 loses connectivity with ZooKeeper.

Explicit Locks

When global locks are enabled, Hive uses ZooKeeper persistent nodes to represent explicit locks.  These nodes can only be removed manually, either directly in ZooKeeper or using Hive's UNLOCK TABLE statement.  They will not be removed if the HiveServer2 instance that created the node fails or is restarted.

Lock Timeouts

When a lock exists on a table, any attempt to issue a DELETE or INSERT statement against the table will block, waiting for the lock to be removed.  In certain other scenarios, SELECT statements may block as well.  If the lock is not removed, the query fails after a specified timeout.  This timeout is controlled by two configuration properties: hive.lock.numretries and hive.lock.sleep.between.retries.  In environments with long-running queries, these values must be increased, or queries blocked by legitimate locks will fail while waiting.
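As a rough sketch, the worst-case wait before the failure shown in the Symptoms section is approximately hive.lock.numretries multiplied by hive.lock.sleep.between.retries. The values below (100 retries, 60-second sleep) are assumed defaults consistent with the "after 100 attempts" error above; check your cluster's actual configuration.

```shell
# Worst-case wait before "Unable to acquire ... after 100 attempts"
numretries=100            # hive.lock.numretries (assumed default)
sleep_between_retries=60  # hive.lock.sleep.between.retries, in seconds (assumed default)

echo $((numretries * sleep_between_retries))   # prints 6000 (seconds, i.e. 100 minutes)
```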

Manually Removing a Lock From ZooKeeper

  1. Ensure no active queries are running against the target table
  2. Launch the zookeeper-client on one of the ZooKeeper nodes
  3. Use "ls" to browse into the Hive ZooKeeper directories, e.g. "hive_zookeeper_namespace_hive"
  4. Once the desired locked table is found, run the following commands in the zookeeper-client shell:
    > ls /
    [zookeeper, hive_zookeeper_namespace_hive]
    
    > rmr /hive_zookeeper_namespace_hive/<DATABASE>/<BROKEN_TABLE>
  5. Try the failing query again
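The interactive session above can also be condensed into one-shot commands, assuming a zkCli-style client that accepts a trailing command (the ZooKeeper host and the namespace path are placeholders; keep the <DATABASE>/<BROKEN_TABLE> path from your own cluster):

```shell
# List the Hive lock namespace to locate the stale lock node
zookeeper-client -server zk-host:2181 ls /hive_zookeeper_namespace_hive

# Recursively remove the lock ZNode for the affected table
# (on newer ZooKeeper releases, "deleteall" replaces the deprecated "rmr")
zookeeper-client -server zk-host:2181 rmr /hive_zookeeper_namespace_hive/<DATABASE>/<BROKEN_TABLE>
```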

Automatically Removing a Lock From ZooKeeper

This approach only applies to implicit locks, and it requires a service interruption while HiveServer2 restarts.

  1. Restart all instances of HiveServer2
  2. Wait 60 seconds
  3. Try the failing query again
