Spark Job failure due to port binding error after 16 retries

0 votes
44 views
asked Nov 6, 2017 in Spark by admin (4,410 points)
17/11/03 06:50:34 INFO util.Utils: Successfully started service 'sparkDriver' on port 48108.

17/11/03 06:50:34 INFO slf4j.Slf4jLogger: Slf4jLogger started

17/11/03 06:50:34 INFO Remoting: Starting remoting

17/11/03 06:50:34 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.191.151.77:36506]

17/11/03 06:50:34 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@10.191.151.77:36506]

17/11/03 06:50:34 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 36506.

17/11/03 06:50:34 INFO spark.SparkEnv: Registering MapOutputTracker

17/11/03 06:50:34 INFO spark.SparkEnv: Registering BlockManagerMaster

17/11/03 06:50:34 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-265b0a2b-e376-45fc-9888-05bbb2576f49

17/11/03 06:50:35 INFO storage.MemoryStore: MemoryStore started with capacity 8.3 GB

17/11/03 06:50:35 INFO spark.SparkEnv: Registering OutputCommitCoordinator

17/11/03 06:50:35 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

17/11/03 06:50:35 WARN util.Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.

17/11/03 06:50:35 WARN util.Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.

17/11/03 06:50:35 WARN util.Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.

17/11/03 06:50:35 WARN util.Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.

17/11/03 06:50:35 WARN util.Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.

17/11/03 06:50:35 WARN util.Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.

17/11/03 06:50:35 WARN util.Utils: Service 'SparkUI' could not bind on port 4047. Attempting port 4048.

17/11/03 06:50:35 WARN util.Utils: Service 'SparkUI' could not bind on port 4048. Attempting port 4049.

17/11/03 06:50:35 WARN util.Utils: Service 'SparkUI' could not bind on port 4049. Attempting port 4050.

17/11/03 06:50:35 WARN util.Utils: Service 'SparkUI' could not bind on port 4050. Attempting port 4051.

17/11/03 06:50:36 WARN util.Utils: Service 'SparkUI' could not bind on port 4051. Attempting port 4052.

17/11/03 06:50:36 WARN util.Utils: Service 'SparkUI' could not bind on port 4052. Attempting port 4053.

17/11/03 06:50:36 WARN util.Utils: Service 'SparkUI' could not bind on port 4053. Attempting port 4054.

17/11/03 06:50:36 WARN util.Utils: Service 'SparkUI' could not bind on port 4054. Attempting port 4055.

17/11/03 06:50:36 WARN util.Utils: Service 'SparkUI' could not bind on port 4055. Attempting port 4056.

17/11/03 06:50:36 ERROR ui.SparkUI: Failed to bind SparkUI

java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'SparkUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.

        at sun.nio.ch.Net.bind0(Native Method)

        at sun.nio.ch.Net.bind(Net.java:437)

        at sun.nio.ch.Net.bind(Net.java:429)

        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)

        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)

        at org.spark-project.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)

        at org.spark-project.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)

        at org.spark-project.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)

        at org.spark-project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)

        at org.spark-project.jetty.server.Server.doStart(Server.java:293)

        at org.spark-project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)

        at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:283)

        at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:293)

        at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:293)

        at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1989)

        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)

        at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1980)

        at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:293)

        at org.apache.spark.ui.WebUI.bind(WebUI.scala:137)

        at org.apache.spark.SparkContext$$anonfun$14.apply(SparkContext.scala:492)

        at org.apache.spark.SparkContext$$anonfun$14.apply(SparkContext.scala:492)

        at scala.Option.foreach(Option.scala:236)

        at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)

        at people$.main(people.scala:57)

        at people.main(people.scala)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:497)

        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)

        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)

        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)

        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)

        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

17/11/03 06:50:36 INFO storage.DiskBlockManager: Shutdown hook called

17/11/03 06:50:36 INFO util.ShutdownHookManager: Shutdown hook called

17/11/03 06:50:36 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-52298fed-c32d-453a-92c7-1076f35e914f

17/11/03 06:50:36 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-52298fed-c32d-453a-92c7-1076f35e914f/userFiles-d6e07df0-4134-42e8-9766-24bbb79a55f9

SPARK JOB DONE

1 Answer

0 votes
answered Nov 6, 2017 by admin (4,410 points)

The error shown is a Spark port-binding failure for a job running in client mode.

In client mode the driver starts on the edge node and tries to bind the Spark UI to a dedicated port (4040 by default). If that port is already in use, it retries on the next port, up to 16 times (controlled by spark.port.maxRetries), before the job fails.

Normally the binding succeeds, but under contention, when a large number of Spark jobs run on the same host at the same time, ports 4040 through 4056 can all be occupied, and after 16 retries the job fails. As the error message itself suggests, you can also set spark.ui.port explicitly to a known free port, or increase spark.port.maxRetries.

Suggestion: use spark-submit in client mode only for debugging in DEV; for all scheduled/batch executions, run spark-submit in cluster mode.
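The retry behaviour visible in the log above (try 4040, then 4041, and so on) can be sketched with a small standalone loop. This is an illustrative Python sketch of the idea behind Spark's Utils.startServiceOnPort, not Spark's actual code:

```python
import socket

def start_service_on_port(start_port, max_retries=16):
    """Try to bind start_port; on failure, retry on the next port,
    up to max_retries times, mimicking Spark's behaviour."""
    for offset in range(max_retries + 1):
        port = start_port + offset
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("127.0.0.1", port))
            return sock, port  # caller owns the bound socket
        except OSError:        # port already in use: try the next one
            sock.close()
    raise OSError(
        f"Service failed after {max_retries} retries starting at {start_port}")

# Simulate contention: one "job" holds a port, the next one must skip past it.
blocker, busy = start_service_on_port(4040)   # grabs the first free port >= 4040
sock, port = start_service_on_port(busy)      # busy is taken, so it moves on
print(f"port {busy} was taken, bound on {port} instead")
```

When every port in the retry window is taken, the loop exhausts its retries and raises, which is exactly what the BindException in the log reports.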

Do NOT run the Spark driver on the edge node. This can be achieved using the --deploy-mode cluster parameter of spark-submit.
Why? It avoids the race condition that occurs when parallel jobs triggered at the same time all try to bind the Spark UI HTTP port, and it also distributes the driver workload from the edge node to the other nodes in the cluster.

Configuration to disable the Spark UI: --conf spark.ui.enabled=false
[This is at the discretion of the application team, based on its need for the Spark UI. When an application runs as a scheduled job, usually no active monitoring or inspection is done, so the Spark UI is rarely needed. With the UI disabled it will not be available during job execution, but the application can still be inspected after completion on the history server. All logs can be extracted using yarn logs -applicationId <application ID>.]

Example: spark-submit --conf spark.ui.enabled=false --class com.ms.bi.hadoop.samples.SparkApp --master yarn --deploy-mode cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 <pathname/filename>

...