Configuring Spark with Python 2.7

asked Aug 28, 2017 in Hadoop by admin (4,410 points)
Summary
How to configure Spark with an alternate version of Python (Python 2.7)


By default, Spark Python applications use the standard RedHat Python 2.6 executable and modules.

Applies To
  • CDH5
  • Spark
Cause
RedHat 6.x systems require Python 2.6 for yum and potentially other system functions, so it must remain installed.

Cloudera supports CDH 5 with Spark using Python 2.7 as an alternative to the OS default of Python 2.6.

NOTE: You must keep Python 2.6 installed as a RedHat dependency (needed for yum on RedHat 6.x).

To install Python 2.7:

  1. Download Python. For example:

  2. Build and install on all worker (YARN NodeManager) and gateway nodes in the cluster:
     tar xzvf Python-2.7.tgz
     cd Python-2.7
     ./configure --prefix=/opt/python
     sudo make install

     The python2.7 binary will be installed to /opt/python/bin/python2.7. To install Python somewhere else, change the --prefix value passed to ./configure.

  3. Set the interpreter path. The setting differs between client and cluster mode:
     For client mode, such as the pyspark shell: set the environment variable PYSPARK_PYTHON=/path/to/python2.7 in "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/".

     For cluster mode, such as spark-submit: set the Spark property spark.yarn.appMasterEnv.PYSPARK_PYTHON=/path/to/python2.7 in "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf".

  4. Deploy client configurations.
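
The steps above can be sketched in shell as follows. This is only a minimal sketch, assuming the /opt/python prefix from the build step; check_python, cluster_mode_conf, and my_job.py are hypothetical names used for illustration and are not part of Spark or CDH. It checks the interpreter on a node, exports PYSPARK_PYTHON for a client-mode shell, and prints a spark-submit command that passes the cluster-mode property for a single job with --conf rather than through spark-defaults.conf.

```shell
#!/bin/sh
# Assumed install path (the --prefix used in the build step); override if needed.
PYTHON_BIN="${PYTHON_BIN:-/opt/python/bin/python2.7}"

# Hypothetical helper: print "ok: <version>" if $1 is an executable
# Python 2.7 interpreter, otherwise print the reason and return non-zero.
check_python() {
  bin="$1"
  if [ ! -x "$bin" ]; then
    echo "missing: $bin"
    return 1
  fi
  # Python 2.7 writes its version banner to stderr, so capture both streams.
  version="$("$bin" -V 2>&1)"
  case "$version" in
    "Python 2.7"*) echo "ok: $version" ;;
    *) echo "unexpected: $version"; return 1 ;;
  esac
}

# Hypothetical helper: print the property assignment used for cluster mode.
cluster_mode_conf() {
  echo "spark.yarn.appMasterEnv.PYSPARK_PYTHON=$1"
}

# Run on each worker and gateway node to confirm the build succeeded.
check_python "$PYTHON_BIN" || echo "Python 2.7 is not ready on this node" >&2

# Client mode: same variable as the safety-valve setting, for this shell only.
export PYSPARK_PYTHON="$PYTHON_BIN"

# Cluster mode: print a per-job command instead of running it here.
# my_job.py is a placeholder application name.
echo "spark-submit --master yarn --deploy-mode cluster --conf $(cluster_mode_conf "$PYTHON_BIN") my_job.py"
```

Exporting the variable in a shell affects only that session; the safety-valve settings described above remain the way to apply the change for every client of the cluster.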
