Spark application bundle collects an invalid zip archive of only 22 bytes

asked Aug 30, 2017 in Hadoop by admin (4,410 points)


The Spark application bundle includes an invalid (empty) archive. Typically, checking it with the file command reports something like:

$ file Zip archive data (empty)
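The 22-byte size matches what `file` reports: an empty zip archive consists only of the end-of-central-directory record, which is 22 bytes long and starts with the signature `PK\x05\x06`. A minimal sketch of a check for this symptom follows; the function name `is_empty_zip` is hypothetical and not part of any Cloudera or Spark tooling.

```shell
# Return 0 if the file is an empty zip archive: exactly 22 bytes,
# starting with the end-of-central-directory signature "PK\x05\x06".
# Return 2 if the path is not a regular file, 1 otherwise.
is_empty_zip() {
    zip_file="$1"
    [ -f "$zip_file" ] || return 2
    # wc -c may pad its output with spaces on some systems, so strip them
    zip_size=$(wc -c < "$zip_file" | tr -d ' ')
    # Read the first four bytes as lowercase hex, e.g. "504b0506"
    zip_magic=$(head -c 4 "$zip_file" | od -An -tx1 | tr -d ' \n')
    [ "$zip_size" -eq 22 ] && [ "$zip_magic" = "504b0506" ]
}
```

For example, `is_empty_zip /path/to/archive.zip && echo "empty archive"` (the path is a placeholder) would flag an archive that contains no event data.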

The aggregated event data does exist in the Spark history location on HDFS, which is configured to /user/spark/applicationHistory by default.

hdfs dfs -ls /user/spark/applicationHistory
Found 3 items
-rwxrwx---   3 test spark      94043 2017-03-02 17:17 /user/spark/applicationHistory/application_1488458658795_0004
-rwxrwx---   3 test spark      81991 2017-03-02 17:20 /user/spark/applicationHistory/application_1488458658795_0008
-rwxrwx---   3 root spark      53984 2017-03-03 01:09 /user/spark/applicationHistory/application_1488458658795_0011_1

Applies To

Spark on YARN


When one Cloudera Manager instance manages multiple CDH clusters that each have an independent Spark on YARN service, the Spark History Server (SHS) location is always set to one particular host, even when the bundle is expected to collect from another cluster's SHS.


Collect the Spark event logs separately in one of the following two ways.

1) Download from HDFS

Run as the hdfs or spark user (or the corresponding principal in a secure cluster):

hdfs dfs -get /user/spark/applicationHistory/<ApplicationId> /tmp/ && zip /tmp/<ApplicationId>.zip /tmp/<ApplicationId>
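For repeated use, the HDFS download can be wrapped in a small function. This is only a sketch under stated assumptions: the function name `fetch_event_log` and the `HISTORY_DIR` override variable are hypothetical, the default directory is the one mentioned in this post, and it assumes the event log is a single file, as in the listing above.

```shell
# Copy one application's event log out of HDFS and zip it locally.
# Usage: fetch_event_log <ApplicationId> [output_dir]
# HISTORY_DIR (hypothetical variable) overrides the default history directory.
fetch_event_log() {
    app_id="$1"
    out_dir="${2:-/tmp}"
    history_dir="${HISTORY_DIR:-/user/spark/applicationHistory}"

    if [ -z "$app_id" ]; then
        echo "usage: fetch_event_log <ApplicationId> [output_dir]" >&2
        return 2
    fi

    # Download the event log file from HDFS into the output directory
    hdfs dfs -get "$history_dir/$app_id" "$out_dir/" || return 1
    # -j stores only the file name in the archive, not the leading directories
    zip -j "$out_dir/$app_id.zip" "$out_dir/$app_id" || return 1
    # Print the path of the resulting archive
    echo "$out_dir/$app_id.zip"
}
```

For example, `fetch_event_log application_1488458658795_0004` would leave the archive in /tmp, provided the current user can read the history directory.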

2) Download via SHS API

Run the following command from the CLI, or open the URL in your browser.

curl -0 http://<SparkHistoryServer>:18088/api/v1/applications/<ApplicationId>/logs > <ApplicationId>.zip
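The API download can also be scripted so that an empty result is caught right away. The sketch below is hedged: the function name `download_event_logs` and the `SHS_PORT` override variable are hypothetical, port 18088 is taken from the command above, and the 22-byte threshold reflects the empty-archive symptom described in this post.

```shell
# Download an application's event logs as a zip from the Spark History Server.
# Usage: download_event_logs <SparkHistoryServer> <ApplicationId> [output_file]
# SHS_PORT (hypothetical variable) overrides the port used in this post.
download_event_logs() {
    shs_host="$1"
    app_id="$2"
    out_file="${3:-$app_id.zip}"
    shs_port="${SHS_PORT:-18088}"

    if [ -z "$shs_host" ] || [ -z "$app_id" ]; then
        echo "usage: download_event_logs <SparkHistoryServer> <ApplicationId> [output_file]" >&2
        return 2
    fi

    url="http://$shs_host:$shs_port/api/v1/applications/$app_id/logs"
    # -f: fail on HTTP errors, -sS: quiet but still print errors, -o: output file
    curl -fsS -o "$out_file" "$url" || return 1

    # An archive of 22 bytes or less cannot contain any event data
    size=$(wc -c < "$out_file" | tr -d ' ')
    if [ "$size" -le 22 ]; then
        echo "warning: $out_file is only $size bytes (empty archive?)" >&2
        return 1
    fi
    echo "$out_file"
}
```

When several clusters are managed together, calling this once per cluster with that cluster's own SHS host avoids depending on the single host the bundle uses.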
