[apache-spark] How do I set the driver's python version in spark?

I'm using Spark 1.4.0-rc2 so I can use Python 3 with Spark. If I add export PYSPARK_PYTHON=python3 to my .bashrc file, I can run Spark interactively with Python 3. However, if I want to run a standalone program in local mode, I get an error:

Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions

How can I specify the version of Python for the driver? Setting export PYSPARK_DRIVER_PYTHON=python3 didn't work.

This question is related to apache-spark pyspark



Error

"Exception: Python in worker has different version 2.6 than that in driver  2.7, PySpark cannot run with different minor versions". 

Fix (for Cloudera environment)

  • Edit this file: /opt/cloudera/parcels/cdh5.5.4.p0.9/lib/spark/conf/spark-env.sh

  • Add these lines:

    export PYSPARK_PYTHON=/usr/bin/python
    export PYSPARK_DRIVER_PYTHON=python
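
Whichever environment you are on, a quick way to confirm the versions now match is to compare the interpreter on the driver with the one a worker reports. A minimal sketch, assuming a working SparkContext:

import sys
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Interpreter version running this driver process
driver_version = sys.version_info[:2]

# Interpreter version on a worker, fetched by running one trivial task
# (if the versions still differ, this very call raises the exception above)
worker_version = sc.parallelize([0]).map(
    lambda _: __import__("sys").version_info[:2]
).first()

print("driver:", driver_version, "worker:", worker_version)  # must match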
    

I had the same problem; I had simply forgotten to activate my virtual environment. Posting this for anyone out there who also has a mental blank.


If you're running Spark in a larger organization and are unable to update the spark-env.sh file, exporting the environment variables may not work.

You can add the specific Spark settings through the --conf option when submitting the job at run time. These two settings require Spark 2.1 or later:

pyspark --master yarn --[other settings] \
  --conf "spark.pyspark.python=/your/python/loc/bin/python" \
  --conf "spark.pyspark.driver.python=/your/python/loc/bin/python"

Run:

ls -l /usr/local/bin/python*

In this example, the first line of the output shows the python3 symlink. To set it as the default python symlink, run the following:

ln -s -f /usr/local/bin/python3 /usr/local/bin/python

Then reload your shell.
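
To double-check what the bare python name now resolves to without opening a new shell, a small sketch using only the standard library:

import os
import shutil

# Find the first "python" on PATH and resolve its symlink chain
path = shutil.which("python")
print(path, "->", os.path.realpath(path))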


In case you only want to change the Python version for the current task, you can use the following pyspark start command:

    PYSPARK_DRIVER_PYTHON=/home/user1/anaconda2/bin/python PYSPARK_PYTHON=/usr/local/anaconda2/bin/python pyspark --master ..

I got the same issue on standalone Spark on Windows. My fix was as follows: I had my environment variables set as below:

PYSPARK_SUBMIT_ARGS="pyspark-shell"
PYSPARK_DRIVER_PYTHON=jupyter
PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark

With this setting in place, I executed an action on pyspark and got the following exception:

Python in worker has different version 3.6 than that in driver 3.5, PySpark cannot run with different minor versions.
Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

To check which Python version my spark-worker was using, I ran the following in the cmd prompt:

python --version
Python 3.6.3

This showed Python 3.6.3, so clearly my spark-worker was using the system Python, which is v3.6.3.

Since I had set my spark-driver to run Jupyter via PYSPARK_DRIVER_PYTHON=jupyter, I needed to check which Python version Jupyter was using.

To do this, open the Anaconda Prompt and run:

python --version
Python 3.5.X :: Anaconda, Inc.

This showed that Jupyter's Python was v3.5.x. You can also check this version in any notebook (Help -> About).
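
Equivalently, running the following in any notebook cell shows exactly which interpreter backs the kernel, i.e. the one the driver inherits when PYSPARK_DRIVER_PYTHON=jupyter:

import sys

# The interpreter behind this notebook's kernel
print(sys.executable)
print(sys.version)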

Now I needed to update Jupyter's Python to v3.6.3, to match the worker. To do that, open the Anaconda Prompt and run:

conda search python

This lists the Python versions available in Anaconda. Install the desired one with:

conda install python=3.6.3

Now both Python installations are the same version, 3.6.3, so Spark should not complain, and it didn't when I ran an action on the Spark driver. The exception is gone. Happy coding ...


I just faced the same issue, and these are the steps that I followed in order to set the Python version. I wanted to run my PySpark jobs with Python 2.7 instead of 2.6.

  1. Go to the folder where $SPARK_HOME is pointing to (in my case it is /home/cloudera/spark-2.1.0-bin-hadoop2.7/)

  2. Under the conf folder there is a file called spark-env.sh. If you only have a file called spark-env.sh.template, copy it to a new file called spark-env.sh.

  3. Edit the file and add the following three lines:

    export PYSPARK_PYTHON=/usr/local/bin/python2.7

    export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7

    export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/usr/local/bin/python2.7"

  4. Save it and launch your application again :)

This way, whenever you download a new standalone Spark version, you can set the Python version you want PySpark to run with.


I was running it in IPython (as described in this link by Jacek Wasilewski) and was getting this exception. I added PYSPARK_PYTHON to the IPython kernel file, ran it with jupyter notebook, and it started working.

vi  ~/.ipython/kernels/pyspark/kernel.json

{
 "display_name": "pySpark (Spark 1.4.0)",
 "language": "python",
 "argv": [
  "/usr/bin/python2",
  "-m",
  "IPython.kernel",
  "--profile=pyspark",
  "-f",
  "{connection_file}"
 ],
 "env": {
  "SPARK_HOME": "/usr/local/spark-1.6.1-bin-hadoop2.6/",
  "PYTHONPATH": "/usr/local/spark-1.6.1-bin-hadoop2.6/python/:/usr/local/spark-1.6.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip",
  "PYTHONSTARTUP": "/usr/local/spark-1.6.1-bin-hadoop2.6/python/pyspark/shell.py",
  "PYSPARK_SUBMIT_ARGS": "--master spark://127.0.0.1:7077 pyspark-shell",
  "PYSPARK_DRIVER_PYTHON": "ipython2",
  "PYSPARK_PYTHON": "python2"
 }
}
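
To verify that a notebook started with this kernel actually picks the variables up, a small check (the expected values are the ones defined in the "env" block above):

import os

# These should echo back what kernel.json's "env" block defines
print(os.environ.get("PYSPARK_PYTHON"))         # expect: python2
print(os.environ.get("PYSPARK_DRIVER_PYTHON"))  # expect: ipython2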

Setting both PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to python3 works for me. I did this using export in my .bashrc. In the end, these are the variables I create:

export SPARK_HOME="$HOME/Downloads/spark-1.4.0-bin-hadoop2.4"
export IPYTHON=1
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=ipython3
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

I also followed this tutorial to make it work from within an IPython3 notebook: http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/


Please look at the below snippet:

# Setting the environment variables for PySpark on Linux/Ubuntu:
# go to /usr/local/spark/conf and create a new file named spark-env.sh,
# copy all the content of spark-env.sh.template into it,
# then add the lines below with the path to your Python.

export PYSPARK_PYTHON="/usr/bin/python3"
export PYSPARK_DRIVER_PYTHON="/usr/bin/python3"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser"
# I was running Python 3.6; run 'which python' in a terminal to find the path of your Python.

If you are working on a Mac, use the following commands

export SPARK_HOME=`brew info apache-spark | grep /usr | tail -n 1 | cut -f 1 -d " "`/libexec
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH

export HADOOP_HOME=`brew info hadoop | grep /usr | head -n 1 | cut -f 1 -d " "`/libexec
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH
export PYSPARK_PYTHON=python3

If you are using another OS, check the following link: https://github.com/GalvanizeDataScience/spark-install


This helped in my case:

import os

os.environ["SPARK_HOME"] = "/usr/local/Cellar/apache-spark/1.5.1/"
os.environ["PYSPARK_PYTHON"]="/usr/local/bin/python3"

I came across the same error message and I have tried the three ways mentioned above. I list the results as a complementary reference for others.

  1. Changing the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON values in spark-env.sh did not work for me.
  2. Changing the values inside the Python script using os.environ["PYSPARK_PYTHON"]="/usr/bin/python3.5" and os.environ["PYSPARK_DRIVER_PYTHON"]="/usr/bin/python3.5" did not work for me.
  3. Changing the values in ~/.bashrc works like a charm.

You can specify the version of Python for the driver by setting the appropriate environment variables in the ./conf/spark-env.sh file. If it doesn't already exist, you can use the spark-env.sh.template file provided, which also includes lots of other variables.

Here is a simple example of a spark-env.sh file to set the relevant Python environment variables:

#!/usr/bin/env bash

# This file is sourced when running various Spark programs.
export PYSPARK_PYTHON=/usr/bin/python3       
export PYSPARK_DRIVER_PYTHON=/usr/bin/ipython

In this case it sets the version of Python used by the workers/executors to Python 3, and the driver's Python to IPython for a nicer shell to work in.

If you don't already have a spark-env.sh file, and don't need to set any other variables, this one should do what you want, assuming that paths to the relevant python binaries are correct (verify with which). I had a similar problem and this fixed it.


I am using the following environment

$ python --version; ipython --version; jupyter --version
Python 3.5.2+
5.3.0
5.0.0

and the following aliases work well for me

alias pyspark="PYSPARK_PYTHON=/usr/local/bin/python3 PYSPARK_DRIVER_PYTHON=ipython ~/spark-2.1.1-bin-hadoop2.7/bin/pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11"    
alias pysparknotebook="PYSPARK_PYTHON=/usr/bin/python3 PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS='notebook' ~/spark-2.1.1-bin-hadoop2.7/bin/pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11"

In the notebook, I set up the environment as follows

from pyspark.context import SparkContext
sc = SparkContext.getOrCreate()

In my case (Ubuntu 18.04), I ran this code in terminal:

vim ~/.bashrc

and then edited SPARK_HOME as follows:

export SPARK_HOME=/home/muser/programs/anaconda2019/lib/python3.7/site-packages/pyspark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

By doing so, my SPARK_HOME will refer to the pyspark package I installed in site-packages.



Ran into this today at work. An admin thought it prudent to hard-code Python 2.7 as the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in $SPARK_HOME/conf/spark-env.sh. Needless to say, this broke all of our jobs that use any other Python version or environment (which is more than 90% of our jobs). @PhillipStich correctly points out that you may not always have write permissions for this file, as was our case.

While setting the configuration in the spark-submit call is an option, another alternative (when running in yarn/cluster mode) is to set the SPARK_CONF_DIR environment variable to point to another configuration directory. There you can set your PYSPARK_PYTHON and any other options you may need. A template can be found in the spark-env.sh source code on GitHub.
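
A hedged sketch of that approach, driven from Python for illustration; the conf directory and job name are hypothetical, and the directory would hold your own spark-env.sh with the PYSPARK_PYTHON exports:

import os
import subprocess

# Point Spark at a conf directory we control instead of the locked-down
# $SPARK_HOME/conf (hypothetical path)
env = dict(os.environ, SPARK_CONF_DIR="/home/me/spark-conf")

# spark-submit sources spark-env.sh from SPARK_CONF_DIR at launch,
# picking up the Python settings defined there
subprocess.run(
    ["spark-submit", "--master", "yarn", "my_job.py"],  # hypothetical job
    env=env,
    check=True,
)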