How to debug failing/stuck Spark applications in CDSW

asked Aug 19, 2017 in Hadoop by admin (4,410 points)
Summary
Spark applications in CDSW sessions run with ERROR-level logging by default, which makes it hard to understand why an application got stuck or what the context of a failure was. This article describes how to turn on lower-level logging for Spark applications.
Applies To
  • Spark applications in CDSW
Symptoms

There are multiple scenarios where we need to review the context of an ERROR message, or understand why an application does not start or gets stuck.

 

Cause
Instructions

Cloudera Data Science Workbench allows you to update Spark's internal logging configuration on a per-project basis. Spark 2 uses Apache Log4j, which can be configured through a properties file. By default, a log4j.properties file found in the root of your project will be appended to the existing Spark logging properties for every session and job.

  1. Create a new file log4j.properties in the root of your project
  2. Add one of the following configurations to set the logging level for Spark jobs (a complete sample file is shown after these steps).
    shell.log.level=INFO

    PySpark logging levels should be set as follows:
    log4j.logger.org.apache.spark.api.python.PythonGatewayServer=INFO

    And Scala logging levels should be set as:
    log4j.logger.org.apache.spark.repl.Main=INFO
  3. Start a new session and reproduce the issue
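A minimal sketch of what such a log4j.properties file might look like, combining the properties listed above into a single file (keep only the lines relevant to your workload; the comments are illustrative, not required):

    # Log level for the interactive Spark shell output in the workbench
    shell.log.level=INFO
    # Log level for PySpark sessions
    log4j.logger.org.apache.spark.api.python.PythonGatewayServer=INFO
    # Log level for Scala sessions
    log4j.logger.org.apache.spark.repl.Main=INFO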
Note: To specify a custom location, set the environment variable LOG4J_CONFIG to the file location relative to your project.
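For example, if the file is kept under a conf/ directory inside the project (a hypothetical layout), the variable could be set as:

    LOG4J_CONFIG=conf/log4j.properties

In CDSW this is typically done under the project's environment variable settings (or exported before launching the session); the exact menu location may vary by version.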
