[python] Is there a way to use PhantomJS in Python?

I want to use PhantomJS in Python. I googled this problem but couldn't find proper solutions.

I find os.popen() may be a good choice. But I couldn't pass some arguments to it.

Using subprocess.Popen() may be a proper solution for now. I want to know whether there's a better solution or not.

Is there a way to use PhantomJS in Python?

This question is related to python phantomjs

The answer is


The answer by @Pykler is great but the Node requirement is outdated. The comments in that answer suggest the simpler answer, which I've put here to save others time:

  1. Install PhantomJS

    As @Vivin-Paliath points out, it's a standalone project, not part of Node.

    Mac:

    brew install phantomjs
    

    Ubuntu:

    sudo apt-get install phantomjs
    

    etc

  2. Set up a virtualenv (if you haven't already):

    virtualenv mypy  # doesn't have to be "mypy". Can be anything.
    . mypy/bin/activate
    

    If your machine has both Python 2 and 3 you may need run virtualenv-3.6 mypy or similar.

  3. Install selenium:

    pip install selenium
    
  4. Try a simple test, like this borrowed from the docs:

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    
    driver = webdriver.PhantomJS()
    driver.get("http://www.python.org")
    assert "Python" in driver.title
    elem = driver.find_element_by_name("q")
    elem.clear()
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN)
    assert "No results found." not in driver.page_source
    driver.close()
    

Now since the GhostDriver comes bundled with the PhantomJS, it has become even more convenient to use it through Selenium.

I tried the Node installation of PhantomJS, as suggested by Pykler, but in practice I found it to be slower than the standalone installation of PhantomJS. I guess standalone installation didn't provided these features earlier, but as of v1.9, it very much does so.

  1. Install PhantomJS (http://phantomjs.org/download.html) (If you are on Linux, following instructions will help https://stackoverflow.com/a/14267295/382630)
  2. Install Selenium using pip.

Now you can use like this

import selenium.webdriver
driver = selenium.webdriver.PhantomJS()
driver.get('http://google.com')
# do some processing

driver.quit()

If using Anaconda, install with:

conda install PhantomJS

in your script:

from selenium import webdriver
driver=webdriver.PhantomJS()

works perfectly.


Here's how I test javascript using PhantomJS and Django:

mobile/test_no_js_errors.js:

var page = require('webpage').create(),
    system = require('system'),
    url = system.args[1],
    status_code;

page.onError = function (msg, trace) {
    console.log(msg);
    trace.forEach(function(item) {
        console.log('  ', item.file, ':', item.line);
    });
};

page.onResourceReceived = function(resource) {
    if (resource.url == url) {
        status_code = resource.status;
    }
};

page.open(url, function (status) {
    if (status == "fail" || status_code != 200) {
        console.log("Error: " + status_code + " for url: " + url);
        phantom.exit(1);
    }
    phantom.exit(0);
});

mobile/tests.py:

import subprocess
from django.test import LiveServerTestCase

class MobileTest(LiveServerTestCase):
    def test_mobile_js(self):
        args = ["phantomjs", "mobile/test_no_js_errors.js", self.live_server_url]
        result = subprocess.check_output(args)
        self.assertEqual(result, "")  # No result means no error

Run tests:

manage.py test mobile


In case you are using Buildout, you can easily automate the installation processes that Pykler describes using the gp.recipe.node recipe.

[nodejs]
recipe = gp.recipe.node
version = 0.10.32
npms = phantomjs
scripts = phantomjs

That part installs node.js as binary (at least on my system) and then uses npm to install PhantomJS. Finally it creates an entry point bin/phantomjs, which you can call the PhantomJS webdriver with. (To install Selenium, you need to specify it in your egg requirements or in the Buildout configuration.)

driver = webdriver.PhantomJS('bin/phantomjs')

PhantomJS recently dropped Python support altogether. However, PhantomJS now embeds Ghost Driver.

A new project has since stepped up to fill the void: ghost.py. You probably want to use that instead:

from ghost import Ghost
ghost = Ghost()

with ghost.start() as session:
    page, extra_resources = ghost.open("http://jeanphi.me")
    assert page.http_status==200 and 'jeanphix' in ghost.content

this is what I do, python3.3. I was processing huge lists of sites, so failing on the timeout was vital for the job to run through the entire list.

command = "phantomjs --ignore-ssl-errors=true "+<your js file for phantom>
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)

# make sure phantomjs has time to download/process the page
# but if we get nothing after 30 sec, just move on
try:
    output, errors = process.communicate(timeout=30)
except Exception as e:
    print("\t\tException: %s" % e)
    process.kill()

# output will be weird, decode to utf-8 to save heartache
phantom_output = ''
for out_line in output.splitlines():
    phantom_output += out_line.decode('utf-8')