Cloudera supports running Spark on CDH 5 with Python 2.7 as an alternative to the OS default, Python 2.6.
NOTE: Keep Python 2.6 installed. It is a Red Hat dependency (Yum on Red Hat 6.x requires it).
To install Python 2.7:
- Download Python. For example:
- Build and install on all worker (YARN NodeManager) and gateway nodes in the cluster:
tar xzvf Python-2.7.tgz
cd Python-2.7
./configure --prefix=/opt/python && make
sudo make install
The python2.7 binary is installed to /opt/python/bin/python2.7. To install Python in a different location, adjust the Makefile configuration before you build.
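Because the build must be repeated on every worker and gateway node, it can help to confirm the result on each one. The following is a minimal sketch, not part of the Cloudera procedure: check_python27 is a hypothetical helper name, and the path is the default install location named above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given interpreter exists,
# is executable, and reports a 2.7.x version.
check_python27() {
    py="$1"
    if [ ! -x "$py" ]; then
        echo "missing: $py"
        return 1
    fi
    # Python 2.7 prints its version to stderr, so merge stderr into stdout.
    ver=$("$py" -V 2>&1)
    case "$ver" in
        "Python 2.7"*) echo "ok: $ver"; return 0 ;;
        *) echo "unexpected version: $ver"; return 1 ;;
    esac
}

# Default install location from the text; change it if you installed elsewhere.
check_python27 /opt/python/bin/python2.7 || true
```

Running this on each node prints either an "ok" line with the version string or a short reason for the failure.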
There are different settings for client and cluster mode:
For client mode, such as the pyspark shell: set the environment variable PYSPARK_PYTHON=/path/to/python2.7 in "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh".
For cluster mode, such as spark-submit: set the property spark.yarn.appMasterEnv.PYSPARK_PYTHON=/path/to/python2.7 in "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf".
- Deploy client configurations.
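As an illustration of the two settings above, the sketch below prints the lines you would paste into each safety valve. It assumes that /path/to/python2.7 is the default /opt/python/bin/python2.7 location; the function names and the PYTHON27 variable are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Assumed interpreter path (the default install location); adjust as needed.
PYTHON27=/opt/python/bin/python2.7

# Line for the spark-conf/spark-env.sh safety valve (client mode).
spark_env_line() {
    printf 'PYSPARK_PYTHON=%s\n' "$1"
}

# Line for the spark-conf/spark-defaults.conf safety valve (cluster mode).
spark_defaults_line() {
    printf 'spark.yarn.appMasterEnv.PYSPARK_PYTHON=%s\n' "$1"
}

spark_env_line "$PYTHON27"
spark_defaults_line "$PYTHON27"
```

Generating both lines from one variable keeps the client-mode and cluster-mode settings pointed at the same interpreter.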