Python Requests and persistent sessions

131

I am using the requests module (version 0.10.0 with Python 2.5). I have figured out how to submit data to a login form on a website and retrieve the session key, but I can't see an obvious way to use this session key in subsequent requests. Can someone fill in the ellipsis in the code below or suggest another approach?

>>> import requests
>>> login_data =  {'formPosted':'1', 'login_email':'[email protected]', 'password':'pw'}
>>> r = requests.post('https://localhost/login.py', login_data)
>>> 
>>> r.text
u'You are being redirected <a href="profilePage?_ck=1349394964">here</a>'
>>> r.cookies
{'session_id_myapp': '127-0-0-1-825ff22a-6ed1-453b-aebc-5d3cf2987065'}
>>> 
>>> r2 = requests.get('https://localhost/profile_data.json', ...)

This question is tagged with python python-requests

~ Asked on 2012-10-05 00:06:36

The Best Answer is


227

You can easily create a persistent session using:

s = requests.Session()

After that, continue with your requests as you would:

s.post('https://localhost/login.py', login_data)
#logged in! cookies saved for future requests.
r2 = s.get('https://localhost/profile_data.json', ...)
#cookies sent automatically!
#do whatever, s will keep your cookies intact :)

For more about sessions: https://requests.kennethreitz.org/en/master/user/advanced/#session-objects

~ Answered on 2012-10-05 00:24:06


29

the other answers help to understand how to maintain such a session. Additionally, I want to provide a class which keeps the session maintained over different runs of a script (with a cache file). This means a proper "login" is only performed when required (timout or no session exists in cache). Also it supports proxy settings over subsequent calls to 'get' or 'post'.

It is tested with Python3.

Use it as a basis for your own code. The following snippets are release with GPL v3

import pickle
import datetime
import os
from urllib.parse import urlparse
import requests    

class MyLoginSession:
    """
    a class which handles and saves login sessions. It also keeps track of proxy settings.
    It does also maintine a cache-file for restoring session data from earlier
    script executions.
    """
    def __init__(self,
                 loginUrl,
                 loginData,
                 loginTestUrl,
                 loginTestString,
                 sessionFileAppendix = '_session.dat',
                 maxSessionTimeSeconds = 30 * 60,
                 proxies = None,
                 userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1',
                 debug = True,
                 forceLogin = False,
                 **kwargs):
        """
        save some information needed to login the session

        you'll have to provide 'loginTestString' which will be looked for in the
        responses html to make sure, you've properly been logged in

        'proxies' is of format { 'https' : 'https://user:[email protected]:port', 'http' : ...
        'loginData' will be sent as post data (dictionary of id : value).
        'maxSessionTimeSeconds' will be used to determine when to re-login.
        """
        urlData = urlparse(loginUrl)

        self.proxies = proxies
        self.loginData = loginData
        self.loginUrl = loginUrl
        self.loginTestUrl = loginTestUrl
        self.maxSessionTime = maxSessionTimeSeconds
        self.sessionFile = urlData.netloc + sessionFileAppendix
        self.userAgent = userAgent
        self.loginTestString = loginTestString
        self.debug = debug

        self.login(forceLogin, **kwargs)

    def modification_date(self, filename):
        """
        return last file modification date as datetime object
        """
        t = os.path.getmtime(filename)
        return datetime.datetime.fromtimestamp(t)

    def login(self, forceLogin = False, **kwargs):
        """
        login to a session. Try to read last saved session from cache file. If this fails
        do proper login. If the last cache access was too old, also perform a proper login.
        Always updates session cache file.
        """
        wasReadFromCache = False
        if self.debug:
            print('loading or generating session...')
        if os.path.exists(self.sessionFile) and not forceLogin:
            time = self.modification_date(self.sessionFile)         

            # only load if file less than 30 minutes old
            lastModification = (datetime.datetime.now() - time).seconds
            if lastModification < self.maxSessionTime:
                with open(self.sessionFile, "rb") as f:
                    self.session = pickle.load(f)
                    wasReadFromCache = True
                    if self.debug:
                        print("loaded session from cache (last access %ds ago) "
                              % lastModification)
        if not wasReadFromCache:
            self.session = requests.Session()
            self.session.headers.update({'user-agent' : self.userAgent})
            res = self.session.post(self.loginUrl, data = self.loginData, 
                                    proxies = self.proxies, **kwargs)

            if self.debug:
                print('created new session with login' )
            self.saveSessionToCache()

        # test login
        res = self.session.get(self.loginTestUrl)
        if res.text.lower().find(self.loginTestString.lower()) < 0:
            raise Exception("could not log into provided site '%s'"
                            " (did not find successful login string)"
                            % self.loginUrl)

    def saveSessionToCache(self):
        """
        save session to a cache file
        """
        # always save (to update timeout)
        with open(self.sessionFile, "wb") as f:
            pickle.dump(self.session, f)
            if self.debug:
                print('updated session cache-file %s' % self.sessionFile)

    def retrieveContent(self, url, method = "get", postData = None, **kwargs):
        """
        return the content of the url with respect to the session.

        If 'method' is not 'get', the url will be called with 'postData'
        as a post request.
        """
        if method == 'get':
            res = self.session.get(url , proxies = self.proxies, **kwargs)
        else:
            res = self.session.post(url , data = postData, proxies = self.proxies, **kwargs)

        # the session has been updated on the server, so also update in cache
        self.saveSessionToCache()            

        return res

A code snippet for using the above class may look like this:

if __name__ == "__main__":
    # proxies = {'https' : 'https://user:[email protected]:port',
    #           'http' : 'http://user:[email protected]:port'}

    loginData = {'user' : 'usr',
                 'password' :  'pwd'}

    loginUrl = 'https://...'
    loginTestUrl = 'https://...'
    successStr = 'Hello Tom'
    s = MyLoginSession(loginUrl, loginData, loginTestUrl, successStr, 
                       #proxies = proxies
                       )

    res = s.retrieveContent('https://....')
    print(res.text)

    # if, for instance, login via JSON values required try this:
    s = MyLoginSession(loginUrl, None, loginTestUrl, successStr, 
                       #proxies = proxies,
                       json = loginData)

~ Answered on 2016-05-09 14:32:28


Most Viewed Questions: