urllib2.HTTPError: HTTP Error 403: Forbidden

Question

I am trying to automate the download of historic stock data using Python. The URL I am trying to open responds with a CSV file, but I am unable to open it using urllib2. I have tried changing the user agent as specified in a few earlier questions, and I even tried accepting response cookies, with no luck. Can you please help?

Note: The same method works for Yahoo Finance.

Code:

import urllib2,cookielib

site = "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"

hdr = {'User-Agent': 'Mozilla/5.0'}

req = urllib2.Request(site, headers=hdr)

page = urllib2.urlopen(req)

Error:

  File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

Thanks for your assistance.

User · Answer

The NSE website has changed, and older scripts no longer work well with the current site. This snippet gathers the daily details of a security: symbol, security type, previous close, open price, high price, low price, average price, traded quantity, turnover, number of trades, deliverable quantity, and the ratio of delivered to traded quantity as a percentage. Each day's row is conveniently presented as a dictionary.

Python 3.X version with requests and BeautifulSoup

from requests import get
from csv import DictReader
from bs4 import BeautifulSoup as Soup
from datetime import date
from io import StringIO 

SECURITY_NAME = "3MINDIA"      # Change this to get quotes for another stock
START_DATE = date(2017, 1, 1)  # Start date of stock quote data, as date(year, month, day)
END_DATE = date(2017, 9, 14)   # End date of stock quote data, as date(year, month, day)


BASE_URL = "https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?symbol={security}&segmentLink=3&symbolCount=1&series=ALL&dateRange=+&fromDate={start_date}&toDate={end_date}&dataType=PRICEVOLUMEDELIVERABLE"




def getquote(symbol, start, end):
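    # "%-d" and "%-m" (no zero padding) are not portable: they work on
    # Linux and macOS, but Windows needs "%#d" and "%#m" instead.
    start = start.strftime("%-d-%-m-%Y")
    end = end.strftime("%-d-%-m-%Y")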

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
         'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
         'Referer': 'https://cssspritegenerator.com',
         'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
         'Accept-Encoding': 'none',
         'Accept-Language': 'en-US,en;q=0.8',
         'Connection': 'keep-alive'}

    url = BASE_URL.format(security=symbol, start_date=start, end_date=end)
    d = get(url, headers=hdr)
    soup = Soup(d.content, 'html.parser')
    payload = soup.find('div', {'id': 'csvContentDiv'}).text.replace(':', '\n')
    csv = DictReader(StringIO(payload))
    for row in csv:
        print({k:v.strip() for k, v in row.items()})


if __name__ == '__main__':
    getquote(SECURITY_NAME, START_DATE, END_DATE)

Besides, this is a relatively modular, ready-to-use snippet.
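The parsing step in getquote can be tried without touching the network. What follows is a minimal sketch, assuming the text of the csvContentDiv element uses ':' as the row separator (as the replace(':', '\n') call in the snippet implies). The sample payload and the parse_payload helper are hypothetical and only illustrate the shape of the result; the values are not real market data.

```python
from csv import DictReader
from io import StringIO

# Hypothetical sample mimicking the text inside the csvContentDiv element:
# rows separated by ':' and fields separated by ','. Not real market data.
SAMPLE_PAYLOAD = (
    '"Symbol","Series","Date","Close Price":'
    '"3MINDIA","EQ","02-Jan-2017","100.00":'
    '"3MINDIA","EQ","03-Jan-2017","101.50":'
)


def parse_payload(payload):
    """Turn the ':'-delimited payload into a list of dictionaries."""
    text = payload.replace(':', '\n')
    rows = []
    for row in DictReader(StringIO(text)):
        # Strip stray whitespace around keys and values.
        rows.append({k.strip(): v.strip() for k, v in row.items()})
    return rows


quotes = parse_payload(SAMPLE_PAYLOAD)
for quote in quotes:
    print(quote)
```

Returning a list instead of printing inside the loop makes the rows easier to reuse. Note that this approach would break if a field itself contained a ':' character, such as a timestamp.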

User · Answer

This will work in Python 3:

import urllib.request

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'

url = "http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers"
headers = {'User-Agent': user_agent}

request = urllib.request.Request(url, None, headers)  # The assembled request
response = urllib.request.urlopen(request)
data = response.read()  # The data u need
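When a server still refuses the request, the body of the error response often says why. The sketch below shows one way to capture it in Python 3 by catching urllib.error.HTTPError. The fetch helper, its opener parameter, and the fake_forbidden stand-in are hypothetical names introduced only so the example runs without network access.

```python
import io
import urllib.error
import urllib.request


def fetch(url, user_agent, opener=urllib.request.urlopen):
    """Return (status, body); on an HTTP error return its code and error body."""
    request = urllib.request.Request(url, None, {'User-Agent': user_agent})
    try:
        with opener(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The server's error page often explains why the request was refused.
        return err.code, err.read()


def fake_forbidden(request):
    """Hypothetical stand-in for urlopen that always answers 403."""
    raise urllib.error.HTTPError(
        request.full_url, 403, 'Forbidden', None, io.BytesIO(b'access denied'))


status, body = fetch('http://example.invalid/', 'Mozilla/5.0', opener=fake_forbidden)
print(status, body)
```

In real use, call fetch(url, user_agent) without the opener argument so that urllib.request.urlopen performs the request.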

User · Answer

import urllib.request

bank_pdf_list = ["https://www.hdfcbank.com/content/bbp/repositories/723fb80a-2dde-42a3-9793-7ae1be57c87f/?path=/Personal/Home/content/rates.pdf",
                 "https://www.yesbank.in/pdf/forexcardratesenglish_pdf",
                 "https://www.sbi.co.in/documents/16012/1400784/FOREX_CARD_RATES.pdf"]


def get_pdf(url):
    user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'

    # url = "https://www.yesbank.in/pdf/forexcardratesenglish_pdf"
    headers = {'User-Agent': user_agent}

    request = urllib.request.Request(url, None, headers)  # The assembled request
    response = urllib.request.urlopen(request)
    # print(response.text)
    data = response.read()
    # print(type(data))

    name = url.split("www.")[-1].split("//")[-1].split(".")[0] + "_FOREX_CARD_RATES.pdf"
    f = open(name, 'wb')
    f.write(data)
    f.close()


for bank_url in bank_pdf_list:
    try:
        get_pdf(bank_url)
    except:
        pass
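The file name in the answer above is built with a chain of split calls on the URL. A possible alternative, sketched here with urllib.parse, takes the first label of the host name after dropping a leading "www.". The pdf_name helper is a hypothetical name, and the suffix is taken from the answer; this is one way to do it, not the answerer's own method.

```python
from urllib.parse import urlparse


def pdf_name(url, suffix='_FOREX_CARD_RATES.pdf'):
    """Build a local file name from the first label of the host, minus 'www.'."""
    host = urlparse(url).hostname or ''
    if host.startswith('www.'):
        host = host[len('www.'):]
    return host.split('.')[0] + suffix


name = pdf_name('https://www.sbi.co.in/documents/16012/1400784/FOREX_CARD_RATES.pdf')
print(name)
```

Parsing the URL first keeps the path and query string from affecting the result, which the split chain only achieves by accident of where the first '.' falls.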

User · Answer

By adding a few more headers I was able to get the data:

import urllib2,cookielib

site = "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

req = urllib2.Request(site, headers=hdr)

try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print e.fp.read()

content = page.read()
print content

Actually, it works with just this one additional header:

'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
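The answer above is Python 2 code. A rough Python 3 equivalent of the same idea, sending an Accept header alongside the User-Agent, might look like the sketch below. The header values are the ones used in this thread; the build_request helper is a hypothetical name, and whether the current site still accepts such a request is not something this sketch can confirm.

```python
import urllib.request

# Header values taken from the headers used in this thread.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 '
                  '(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
}


def build_request(url, headers=HEADERS):
    """Assemble a Request carrying browser-like headers; nothing is sent yet."""
    return urllib.request.Request(url, headers=headers)


req = build_request('http://www.nseindia.com/')
# To actually send it: content = urllib.request.urlopen(req).read()
print(req.header_items())
```

Building the request separately from sending it makes it easy to inspect exactly which headers will go out before hitting the server.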
