Spark application bundle collects an invalid spark_event_logs.zip that is 22 bytes in size

asked Aug 30, 2017 in Hadoop by admin (4,410 points)
Summary

Symptoms

The Spark application bundle includes an invalid spark_event_logs.zip. Typically, checking the file shows an empty archive:

$ file spark_event_logs.zip
spark_event_logs.zip: Zip archive data (empty)
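
The 22-byte size is what a zip file has when it holds only the end-of-central-directory record, that is, an archive with no entries. A minimal sketch for checking this from a script follows; the function name is_empty_zip is hypothetical, and it tests the size only, not the zip signature.

```shell
# Hypothetical helper: succeeds (exit status 0) when the given file is
# exactly 22 bytes, the size of a zip archive that contains no entries.
is_empty_zip() {
  # wc -c prints the byte count; strip any padding some platforms add.
  size=$(wc -c < "$1" | tr -d '[:space:]')
  [ "$size" -eq 22 ]
}

# Usage (not run here):
#   is_empty_zip spark_event_logs.zip && echo "bundle contains an empty archive"
```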

The aggregated event data does exist in the Spark history location on HDFS, which is /user/spark/applicationHistory by default.

e.g.
hdfs dfs -ls /user/spark/applicationHistory
Found 3 items
-rwxrwx---   3 test spark      94043 2017-03-02 17:17 /user/spark/applicationHistory/application_1488458658795_0004
-rwxrwx---   3 test spark      81991 2017-03-02 17:20 /user/spark/applicationHistory/application_1488458658795_0008
-rwxrwx---   3 root spark      53984 2017-03-03 01:09 /user/spark/applicationHistory/application_1488458658795_0011_1

Applies To

Spark on YARN
Spark2

Cause

When a single Cloudera Manager instance manages multiple CDH clusters that each have an independent Spark on YARN service, the Spark History Server (SHS) location used for collection is always set to one particular host, even when the logs are expected to be collected from a different SHS.

Instructions

Collect the Spark event logs separately, using one of the following two methods.

1) Download from HDFS

Run as the hdfs or spark user (or the corresponding principal in a secure cluster):

hdfs dfs -get /user/spark/applicationHistory/<ApplicationId> /tmp/ && zip /tmp/event_logs.zip /tmp/<ApplicationId>

2) Download via SHS API

Run the following command from the CLI, or open the URL in your browser.

curl -0 http://<SparkHistoryServer>:18088/api/v1/applications/<ApplicationId>/logs > event_logs.zip
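
Because the bundle may query the wrong History Server when several clusters are managed together, it can help to name the target host explicitly. The following is a minimal sketch that builds the URL used in the command above. The function name shs_logs_url and the host in the usage comment are hypothetical, and the default port 18088 is taken from the example above.

```shell
# Hypothetical helper: print the SHS REST URL for an application's event logs.
# Arguments: <SparkHistoryServer host> <ApplicationId> [port, default 18088]
shs_logs_url() {
  host=$1
  app_id=$2
  port=${3:-18088}
  printf 'http://%s:%s/api/v1/applications/%s/logs\n' "$host" "$port" "$app_id"
}

# Usage (not run here; shs.example.com is a placeholder host):
#   curl -o event_logs.zip "$(shs_logs_url shs.example.com application_1488458658795_0004)"
```

Passing the host as an argument makes it clear which cluster's SHS is being queried.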
