The Spark application bundle includes an invalid spark_event_logs.zip. Typically, inspecting the file shows an empty archive:

$ file spark_event_logs.zip
spark_event_logs.zip: Zip archive data (empty)

The aggregated event data does exist in the Spark history location on HDFS, which is configured to /user/spark/applicationHistory by default. For example:

$ hdfs dfs -ls /user/spark/applicationHistory
Found 3 items
-rwxrwx--- 3 test spark 94043 2017-03-02 17:17 /user/spark/applicationHistory/application_1488458658795_0004
-rwxrwx--- 3 test spark 81991 2017-03-02 17:20 /user/spark/applicationHistory/application_1488458658795_0008
-rwxrwx--- 3 root spark 53984 2017-03-03 01:09 /user/spark/applicationHistory/application_1488458658795_0011_1
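The symptom can also be checked in a script without relying on the wording of the `file` output. The following is a minimal sketch, assuming a POSIX shell with `wc` and `od` available; the function name `is_empty_zip` is hypothetical and not part of any Cloudera tooling. It relies on the fact that a ZIP archive with no entries and no comment consists only of the 22-byte end-of-central-directory record.

```shell
# Hypothetical helper (not part of any Cloudera tool).
# Exit status: 0 = empty ZIP archive, 1 = not an empty ZIP, 2 = file not found.
is_empty_zip() {
    zip_path=$1
    [ -f "$zip_path" ] || return 2

    # An empty ZIP without a comment is exactly 22 bytes long.
    size=$(wc -c < "$zip_path" | tr -d ' ')
    [ "$size" -eq 22 ] || return 1

    # The end-of-central-directory record starts with the bytes 50 4b 05 06.
    sig=$(od -An -tx1 -N4 "$zip_path" | tr -d ' \n')
    [ "$sig" = "504b0506" ]
}
```

For example, `is_empty_zip spark_event_logs.zip && echo "event log archive is empty"` prints the message only when the bundled archive has no entries.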
Spark on YARN
Spark2
When a single Cloudera Manager instance manages multiple CDH clusters that each have an independent Spark on YARN service, the Spark History Server (SHS) location is always set to one particular host, even when the event logs are expected to be collected from a different SHS.
Collect the Spark event logs separately, using one of the following two methods.

1) Download from HDFS
Run as the hdfs or spark user (or the corresponding principal in a secure cluster):
hdfs dfs -get /user/spark/applicationHistory/<ApplicationId> /tmp/ ; zip /tmp/event_logs.zip /tmp/<ApplicationId>

2) Download via the SHS API
Run the following command from the CLI, or open the URL in your browser:
curl -0 http://<SparkHistoryServer>:18088/api/v1/applications/<ApplicationId>/logs > event_logs.zip
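The API download can be wrapped in a small script so that a failed or empty download is noticed immediately. This is a minimal sketch, assuming a POSIX shell and `curl`; the function names `shs_logs_url` and `fetch_event_logs` and the `SHS_PORT` variable are hypothetical, and the port 18088 is taken from the command above. On a secure cluster, additional authentication options would likely be needed and are not shown here.

```shell
# Port used in the article's example; override SHS_PORT if your SHS listens elsewhere.
SHS_PORT=${SHS_PORT:-18088}

# Hypothetical helper: print the event-log download URL for a host and application ID.
shs_logs_url() {
    printf 'http://%s:%s/api/v1/applications/%s/logs\n' "$1" "$SHS_PORT" "$2"
}

# Hypothetical helper: download the event logs and fail if nothing usable was saved.
# Usage: fetch_event_logs <SparkHistoryServer> <ApplicationId> [output file]
fetch_event_logs() {
    shs_host=$1
    app_id=$2
    out=${3:-event_logs.zip}

    # -f makes curl return a non-zero status on HTTP errors (for example, 404);
    # -sS hides the progress meter but still prints error messages.
    curl -sS -f -o "$out" "$(shs_logs_url "$shs_host" "$app_id")" || return 1

    # Treat a zero-length file as a failed download.
    [ -s "$out" ]
}
```

For example, `fetch_event_logs <SparkHistoryServer> <ApplicationId> /tmp/event_logs.zip` saves the archive to /tmp and returns a non-zero status if the request fails, which makes it straightforward to fall back to the HDFS method.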