I am new to Python and am getting this error:
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 4, in <module>
execute()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/cmdline.py", line 130, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/cmdline.py", line 96, in _run_print_help
func(*a, **kw)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/cmdline.py", line 136, in _run_command
cmd.run(args, opts)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/commands/crawl.py", line 42, in run
q = self.crawler.queue
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/command.py", line 31, in crawler
self._crawler.configure()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/crawler.py", line 36, in configure
self.spiders = spman_cls.from_settings(self.settings)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/spidermanager.py", line 33, in from_settings
return cls(settings.getlist('SPIDER_MODULES'))
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/spidermanager.py", line 23, in __init__
for module in walk_modules(name):
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/utils/misc.py", line 65, in walk_modules
submod = __import__(fullpath, {}, {}, [''])
File "/my_crawler/empt/empt/spiders/empt_spider.py", line 59
check_exists_sql = "SELECT * FROM LINKS WHERE link = '%s' LIMIT 1" % item['link']
^
IndentationError: unexpected indent
On this bit of code:
def parse_item(self, response):
hxs = HtmlXPathSelector(response)
sites = hxs.select('//a[contains(@href, ".mp3")]/@href').extract()
items = [ ]
#for site in sites:
#link = site.select('a/@href').extract()
#print site
for site in sites:
item = EmptItem()
item['link'] = site #site.select('a/@href').extract()
#### DB INSERT ATTEMPT ###
#MySQL Test
#open db connection
db = MySQLdb.connect("localhost","root","str0ng","TESTDB")
#prepare a cursor object using cursor() method
cursor = db.cursor()
#see if any links in the DB match the crawled link
check_exists_sql = "SELECT * FROM LINKS WHERE link = '%s' LIMIT 1" % item['link']
cursor.execute(check_exists_sql)
if cursor.rowcount == 0:
#prepare SQL query to insert a record into the db.
sql = "INSERT INTO LINKS ( link ) VALUES ( '%s')" % item['link']
try:
#execute the sql command
cursor.execute(sql)
#commit your changes to the db
db.commit()
except:
#rollback on error
db.rollback()
#fetch a single row using fetchone() method.
#data = cursor.fetchone()
#print "Database version: %s " % data
#disconnect from server
db.close()
### end mysql
items.append(item)
return items
The indentation is wrong, as the error tells you. As you can see, you have indented the code beginning with the indicated line too little to be in the for loop, but too much to be at the same level as the for loop. Python sees the lack of indentation as ending the for loop, then complains you have indented the rest of the code too much. (The def line I'm betting is just an artifact of how Stack Overflow wants you to format your code.)
Edit: Given your correction, I'm betting you have a mixture of tabs and spaces in the source file, such that it looks to the human eye like the code lines up, but Python considers it not to. As others have suggested, using only spaces is the recommended practice (see PEP 8). If you start Python with python -t, you will get warnings if there are mixed tabs and spaces in your code, which should help you pinpoint the issue.
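If you would rather check the file yourself, here is a minimal sketch in Python 3 (the question uses Python 2.6, so treat this as an illustration, not a drop-in tool). The function name classify_indentation and the sample string are made up for this example. It reports, for each indented line, whether the leading whitespace uses spaces, tabs, or both.

```python
def classify_indentation(source):
    """Map 1-based line numbers of indented lines to 'spaces', 'tabs' or 'mixed'."""
    styles = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is everything before the first non-blank character.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if not indent or not line.strip():
            continue  # skip unindented lines and whitespace-only lines
        if "\t" in indent and " " in indent:
            styles[lineno] = "mixed"
        elif "\t" in indent:
            styles[lineno] = "tabs"
        else:
            styles[lineno] = "spaces"
    return styles


# Hypothetical sample: line 2 is indented with spaces, line 3 with a tab.
sample = "for x in range(3):\n    a = x\n\tb = x\n"
styles = classify_indentation(sample)
for lineno in sorted(styles):
    print(lineno, styles[lineno])
```

If the result contains both "spaces" and "tabs" (or any "mixed" line), the file mixes the two styles, which is the situation described above. Python 3 rejects inconsistent tab and space indentation outright with a TabError instead of only warning about it.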
This error occurs when blocks are not written correctly: for example, when you forget a ":" or indent some lines of a block with spaces and others with the Tab key. It can happen when you move code from one editor to another. And never forget this: the error is not always on the reported line. I came here because of this error, but what I had actually forgotten was an except after a try. It happened because of my nonstandard editor, but it is possible in a normal editor too.
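To illustrate the point about the reported line, here is a small sketch in Python 3 that compiles a made-up snippet with a try that has no except and prints where the parser complains. The snippet and the variable names are invented for the example.

```python
# A try block with no except/finally; the real mistake starts on line 1.
source = (
    "try:\n"
    "    value = int('42')\n"
    "print(value)\n"
)

caught = None
try:
    compile(source, "<example>", "exec")
except SyntaxError as err:
    caught = err  # keep a reference; 'err' is cleared after the except block

if caught is not None:
    print("reported line:", caught.lineno)
    print("message:", caught.msg)
```

The parser can only notice the problem once it reaches code that should have been an except or finally clause, so the line it reports is typically after the try, not the try line itself.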
import urllib.request
import requests
from bs4 import BeautifulSoup
r = requests.get('https://icons8.com/icons/set/favicon')
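If a line in a script like this one starts with stray leading whitespace, you will get an indentation error. With every top-level statement starting in the first column, it runs: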
import urllib.request
import requests
from bs4 import BeautifulSoup
r = requests.get('https://icons8.com/icons/set/favicon')
Python cares about indentation.
As the error says, your code is not indented correctly: check_exists_sql is not aligned with the line above it, cursor = db.cursor().
Also use 4 spaces for indentation.
Read this http://diveintopython.net/getting_to_know_python/indenting_code.html
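A minimal way to reproduce the same message, assuming Python 3 and a made-up two-line snippet in which the second line is indented more than the first without any block having been opened:

```python
# The second statement is indented although no block (for, if, def, ...) precedes it.
source = (
    "cursor = 1\n"
    "    check_exists_sql = 2\n"
)

message = None
line_number = None
try:
    compile(source, "<example>", "exec")
except IndentationError as err:
    message, line_number = err.msg, err.lineno

print(message, "on line", line_number)
```

The reported line is the misaligned one, just as line 59 is in the traceback from the question.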
The error is pretty straightforward - the line starting with check_exists_sql isn't indented properly. From the context of your code, I'd indent it and the following lines to match the line before it:
#open db connection
db = MySQLdb.connect("localhost","root","str0ng","TESTDB")
#prepare a cursor object using cursor() method
cursor = db.cursor()
#see if any links in the DB match the crawled link
check_exists_sql = "SELECT * FROM LINKS WHERE link = '%s' LIMIT 1" % item['link']
cursor.execute(check_exists_sql)
And keep indenting it until the for loop ends (all the way through to and including items.append(item)).
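For reference, a sketch of what the whole loop could look like once every statement in its body sits at the same level. This is not the asker's code: it swaps MySQLdb for the standard library's sqlite3 with an in-memory database so that it runs anywhere, it assumes Python 3, and the function name store_new_links and the sample links are hypothetical. It also passes the link as a query parameter instead of building the SQL with %, and checks for an existing row with fetchone() rather than rowcount.

```python
import sqlite3


def store_new_links(db, links):
    """Insert each link not yet present in LINKS; return every link seen."""
    cursor = db.cursor()
    items = []
    for link in links:
        # Everything below is part of the for loop, at one consistent level.
        cursor.execute("SELECT 1 FROM LINKS WHERE link = ? LIMIT 1", (link,))
        if cursor.fetchone() is None:  # no matching row yet
            try:
                cursor.execute("INSERT INTO LINKS (link) VALUES (?)", (link,))
                db.commit()
            except sqlite3.Error:
                db.rollback()  # undo the failed insert
        items.append(link)  # still inside the loop
    return items  # back at the function level, after the loop


# Stand-in database for the example (the question uses a MySQL database).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE LINKS (link TEXT)")
seen = store_new_links(db, ["a.mp3", "b.mp3", "a.mp3"])
stored = [row[0] for row in db.execute("SELECT link FROM LINKS ORDER BY link")]
db.close()
print(seen)
print(stored)
```

With MySQLdb the same pattern applies, except that its parameter placeholder is %s rather than ?, and the values are still passed as a separate tuple to execute().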
Source: Stackoverflow.com