When a Hive session is initiated and a query is submitted to the Spark execution engine, Hive maintains a Spark Application Master (the YARN Spark driver) and one or more Spark executors on the cluster until the session is terminated. Because the initial setup of the Spark execution engine is time-intensive, keeping these components alive avoids the overhead of creating a new engine for each submitted query. The trade-off is that the Spark components consume YARN resources even when they sit idle between queries for long periods.
Until HIVE-14162 is addressed, Spark resources are returned to the YARN cluster only when the Hive session ends. Users have three options:
- Wait for the Hive session to be terminated by the HiveServer2 session timeout
- Explicitly terminate the Beeline shell once work is complete
- Explicitly set the Hive session's execution engine back to MapReduce
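For example, options #2 and #3 can be carried out directly from a Beeline session. This is a sketch using standard Hive commands; the timeout value shown for option #1 is illustrative, not a recommendation:

```sql
-- Option #3: switch the session's execution engine back to MapReduce,
-- allowing the idle Spark Application Master and executors to be released.
SET hive.execution.engine=mr;

-- Option #2: explicitly end the Beeline session once work is complete.
!quit

-- Option #1 is governed server-side by the HiveServer2 idle-session
-- timeout, set in hive-site.xml (milliseconds; example value only):
--   hive.server2.idle.session.timeout=3600000
```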
Note: Because a user cannot explicitly terminate their shell from the Hue Hive Editor, only options #1 and #3 are available to Hue users.