[python] importing pyspark in python shell

This is a copy of someone else's question on another forum that was never answered, so I thought I'd re-ask it here, as I have the same issue. (See http://geekple.com/blogs/feeds/Xgzu7/posts/351703064084736)

I have Spark installed properly on my machine and am able to run python programs with the pyspark modules without error when using ./bin/pyspark as my python interpreter.

However, when I run the regular Python shell and try to import the pyspark modules:

from pyspark import SparkContext

I get this error:

"No module named pyspark".

How can I fix this? Is there an environment variable I need to set to point Python to the pyspark headers/libraries/etc.? If my spark installation is /spark/, which pyspark paths do I need to include? Or can pyspark programs only be run from the pyspark interpreter?


The answer is


For Linux users, the following is the correct (and non-hard-coded) way of including the pyspark library in PYTHONPATH. Both path components are necessary:

  1. The path to the pyspark Python module itself, and
  2. The path to the zipped py4j library that the pyspark module relies on when imported

Notice below that the zipped library version is dynamically determined, so we do not hard-code it.

export PYTHONPATH=${SPARK_HOME}/python/:$(echo ${SPARK_HOME}/python/lib/py4j-*-src.zip):${PYTHONPATH}
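
Once that export has been sourced, a quick sanity check from a plain Python shell (a minimal sketch, assuming the export above is already in effect) confirms that both entries resolve:

import pyspark                # resolved from ${SPARK_HOME}/python/
import py4j.java_gateway      # resolved from the py4j-*-src.zip entry

print(pyspark.__file__)       # should point inside ${SPARK_HOME}/python/pyspark/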

I am running a Spark cluster on a CentOS VM, installed from Cloudera yum packages.

I had to set the following variables to run pyspark:

export SPARK_HOME=/usr/lib/spark;
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH

You can also create a Docker container with Alpine as the OS and then install Python and PySpark as packages. That will have it all containerised.


For a Spark execution in pyspark, two components are required to work together:

  • the pyspark Python package
  • a Spark instance in a JVM

When launching with spark-submit or pyspark, these scripts take care of both: they set up your PYTHONPATH, PATH, etc. so that your script can find pyspark, and they also start the Spark instance, configuring it according to your parameters, e.g. --master X.

Alternatively, it is possible to bypass these scripts and run your Spark application directly in the Python interpreter, like python myscript.py. This is especially interesting when Spark scripts start to become more complex and eventually receive their own arguments.

  1. Ensure the pyspark package can be found by the Python interpreter. As already discussed, either add the spark/python dir to PYTHONPATH or install pyspark directly with pip install.
  2. Set the parameters of the Spark instance from your script (those that used to be passed to pyspark).
    • For Spark configurations you would normally set with --conf, they are defined with a config object (or string configs) in SparkSession.builder.config.
    • For main options (like --master or --driver-memory), for the moment you can set them by writing to the PYSPARK_SUBMIT_ARGS environment variable. To make things cleaner and safer you can set it from within Python itself, and Spark will read it when starting.
  3. Start the instance, which just requires calling getOrCreate() from the builder object.

Your script can therefore have something like this:

import os

from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Main options, e.g. "--master local[4]"; leave empty to keep the defaults
    spark_main_opts = "--master local[4]"
    if spark_main_opts:
        # PYSPARK_SUBMIT_ARGS must end with "pyspark-shell" so Spark picks the options up
        os.environ['PYSPARK_SUBMIT_ARGS'] = spark_main_opts + " pyspark-shell"

    # Set spark config
    spark = (SparkSession.builder
             .config("spark.checkpoint.compress", True)
             .config("spark.jars.packages", "graphframes:graphframes:0.5.0-spark2.1-s_2.11")
             .getOrCreate())
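
With those pieces in place the script can be launched directly, e.g. python myscript.py. As a quick usage sketch (just to show the session is live; the tiny DataFrame here is purely for illustration):

df = spark.range(5)   # small DataFrame created by the session above
df.show()
spark.stop()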

I had this same problem and would add one thing to the proposed solutions above. When using Homebrew on Mac OS X to install Spark, you will need to correct the py4j path address to include libexec in the path (remembering to change the py4j version to the one you have):

PYTHONPATH=$SPARK_HOME/libexec/python/lib/py4j-0.9-src.zip:$PYTHONPATH

If it prints an error such as:

ImportError: No module named py4j.java_gateway

Please add $SPARK_HOME/python/build to PYTHONPATH:

export SPARK_HOME=/Users/pzhang/apps/spark-1.1.0-bin-hadoop2.4
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH

To get rid of ImportError: No module named py4j.java_gateway, you need to add the following lines:

import os
import sys

# Point SPARK_HOME at the local Spark installation (raw strings avoid
# backslash-escape problems in Windows paths)
os.environ['SPARK_HOME'] = r"D:\python\spark-1.4.1-bin-hadoop2.4"

# Make the pyspark package and the bundled py4j zip importable
sys.path.append(r"D:\python\spark-1.4.1-bin-hadoop2.4\python")
sys.path.append(r"D:\python\spark-1.4.1-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf

    print("success")

except ImportError as e:
    print("error importing spark modules", e)
    sys.exit(1)
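
Once those imports succeed, a context can be created in the usual way; a minimal usage sketch (the app name and master value here are just examples):

conf = SparkConf().setAppName("test").setMaster("local[2]")
sc = SparkContext(conf=conf)
print(sc.version)   # confirms the JVM side started as well
sc.stop()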

Here is a simple method (if you don't care about how it works):

Use findspark

  1. Install findspark from your shell, then initialize it in your Python shell (see the note after this list):

    pip install findspark
    
    import findspark
    findspark.init()
    
  2. Import the necessary modules:

    from pyspark import SparkContext
    from pyspark import SparkConf
    
  3. Done!!!
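
If SPARK_HOME is not set in your environment, findspark.init() can also be pointed at the installation directory explicitly (the /spark path from the question is used here purely as an example):

    import findspark
    findspark.init("/spark")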


I got this error because the Python script I was trying to submit was called pyspark.py (facepalm). The fix was to set my PYTHONPATH as recommended above, rename the script to pyspark_test.py, and clean up the pyspark.pyc that had been created from the script's original name; that cleared the error up.


By exporting the SPARK path and the Py4j path, it started to work:

export SPARK_HOME=/usr/local/Cellar/apache-spark/1.5.1
export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/build:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH

So, if you don't want to type these every time you want to fire up the Python shell, you might want to add them to your .bashrc file.


Don't run your py file as python filename.py; instead use spark-submit filename.py.


I had the same problem.

Also make sure you are using the right Python version and installing pyspark with the matching pip version. In my case I had both Python 2.7 and 3.x, and I installed pyspark with

pip2.7 install pyspark

and it worked.


In the case of DSE (DataStax Cassandra & Spark), the following location needs to be added to PYTHONPATH:

export PYTHONPATH=/usr/share/dse/resources/spark/python:$PYTHONPATH

Then use dse pyspark to get the modules on the path:

dse pyspark

You can get the pyspark path in Python using pip (if you have installed pyspark using pip) as below:

pip show pyspark

In my case it was being installed into a different Python dist-packages directory (Python 3.5) whereas I was using Python 3.6, so the following helped:

python -m pip install pyspark
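
To confirm which interpreter and which install location are actually in use, a quick check from inside Python (a small sketch, nothing pyspark-specific about it) is:

import sys
import pyspark

print(sys.executable)     # the Python interpreter you are actually running
print(pyspark.__file__)   # where that interpreter found pyspark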

On Windows 10 the following worked for me. I added the following environment variables using Settings > Edit environment variables for your account:

SPARK_HOME=C:\Programming\spark-2.0.1-bin-hadoop2.7
PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH%

(change "C:\Programming\..." to the folder in which you have installed Spark)


export PYSPARK_PYTHON=/home/user/anaconda3/bin/python
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

This is what I did for using my Anaconda distribution with Spark. It is Spark-version independent. You can change the first line to your user's Python bin. Also, as of Spark 2.2.0, PySpark is available as a stand-alone package on PyPI, but I have yet to test it out.


On Mac, I use Homebrew to install Spark (formula "apache-spark"). Then, I set the PYTHONPATH this way so the Python import works:

export SPARK_HOME=/usr/local/Cellar/apache-spark/1.2.0
export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/build:$PYTHONPATH

Replace the "1.2.0" with the actual apache-spark version on your mac.

