How To Submit Spark Job To Yarn Queue Using Oozie Spark Action

asked Aug 30, 2017 in Hadoop by admin (4,410 points)
Summary
Applies To
  • Oozie
  • Spark
  • YARN
Symptoms

When launching Spark jobs from Oozie, it may be desirable to submit the job to a specific YARN resource pool (queue) instead of the default. We might be tempted to try the following, which will ultimately fail:

<action name='spark-node'>
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>oozie.launcher.oozie.libpath</name>
                <value>${JARDIR}</value>
            </property>
        </configuration>
        <master>${mode}</master>
        <name>Spark-Job-Pi</name>
        <class>org.apache.spark.examples.SparkPi</class>
        <jar>${APPJAR}</jar>
        <job-xml>awesome_job.xml</job-xml>
    </spark>
    <ok to="end-node"/>
    <error to="fail-node"/>
</action>

This is awesome_job.xml:
<configuration>
    <property>
        <name>spark.yarn.queue</name>
        <value>my_queue</value>
    </property>
    <property>
        <name>mapreduce.job.queuename</name>
        <value>my_queue</value>
    </property>
    <property>
        <name>oozie.launcher.mapreduce.job.queuename</name>
        <value>my_queue</value>
    </property>
    <property>
        <name>oozie.launcher.spark.yarn.queue</name>
        <value>my_queue</value>
    </property>
</configuration>

Cause

Properties supplied through <job-xml> or the <configuration> element are applied to the Hadoop configuration of the Oozie launcher job; they are not translated into spark-submit arguments. Spark-specific settings such as spark.yarn.queue therefore never reach the spark-submit command, and the job lands in the default queue.
Instructions

Assume the job was being submitted with the 'spark-submit' command. The queue would have been specified with the --queue option [1], for example:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--queue thequeue \
...
lib/spark-examples*.jar \
10

Similarly, we can pass the --queue option in the Oozie Spark action using the <spark-opts> element [2], for which the documentation states:

"The spark-opts element if present, contains a list of spark options that can be passed to spark driver. Spark configuration options can be passed by specifying '--conf key=value' here, or from oozie.service.SparkConfigurationService.spark.configurations in oozie-site.xml. The spark-opts configs have priority."

Note the use of the <spark-opts> element in the working sample Oozie Spark action workflow below:

<workflow-app name="WF-Spark"
xmlns="uri:oozie:workflow:0.1">
<start to='spark-node'/>
<action name='spark-node'>
<spark
xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>oozie.launcher.oozie.libpath</name>
<value>${JARDIR}</value>
</property>
</configuration>
<master>${mode}</master>
<name>Spark-Job-Pi</name>
<class>org.apache.spark.examples.SparkPi</class>
<jar>${APPJAR}</jar>
<spark-opts> --queue spark-queue </spark-opts>
</spark>
<ok to="end-node"/>
<error to="fail-node"/>
</action>
<kill name="fail-node">
<message>Spark Job Failed!</message>
</kill>
<end name="end-node"/>
</workflow-app>
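To run the workflow, the parameters it references (${jobTracker}, ${nameNode}, ${mode}, ${APPJAR}, ${JARDIR}) must be supplied, typically in a job.properties file. A minimal sketch, with all host names and paths hypothetical:

```properties
# job.properties -- every value below is an illustrative placeholder
nameNode=hdfs://namenode-host:8020
jobTracker=resourcemanager-host:8032
mode=yarn-cluster
JARDIR=${nameNode}/user/oozie/apps/spark/lib
APPJAR=${JARDIR}/spark-examples.jar
oozie.wf.application.path=${nameNode}/user/oozie/apps/spark
```

The workflow could then be submitted with the Oozie CLI, e.g. `oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run` (server URL hypothetical).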




[1] - Launching Spark on YARN - http://spark.apache.org/docs/latest/running-on-yarn.html#launching-spark-on-yarn
[2] - Oozie Spark Action Extension - https://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html
