I am using BeautifulSoup to parse some HTML pages.
I extract a particular piece of data from each page (in a for loop) and add it to a list.
The problem is that some of the pages have a different format and don't contain the data I want.
So I was trying to use exception handling to add the value null to the list instead (I need to do this because the order of the data is important).
For instance, I have code like this:
soup = BeautifulSoup(links)
# find the content between <dd class='title'> and </dd>
dlist = soup.findAll('dd', 'title')
# what I want is the second of those matches
gotdata = dlist[1]
# add it to newlist
newlist.append(gotdata)
Some of the links don't have any <dd class='title'>, so what I want to do is add the string null to the list instead.
The error appears:
list index out of range.
What I have tried is adding some lines like this:
if not dlist[1]:
    newlist.append('null')
    continue
But it doesn't work. It still shows the error:
list index out of range.
What should I do about this? Should I use exception handling, or is there an easier way?
Any suggestions? Any help would be really great!
For anyone interested in a shorter way:
gotdata = len(dlist)>1 and dlist[1] or 'null'
If you don't need the string 'null', I suggest using False as the placeholder instead; then a one-line test will suffice:
gotdata = len(dlist)>1 and dlist[1]
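One caveat with the and/or idiom is worth illustrating: it falls back to 'null' whenever dlist[1] is falsy (an empty string, 0, None), not only when the item is missing. The sketch below uses plain Python lists as a stand-in for the findAll results; the function names are hypothetical and exist only for this comparison.

```python
def second_or_null_andor(dlist):
    # The and/or idiom: also yields 'null' when dlist[1] exists but is falsy.
    return len(dlist) > 1 and dlist[1] or 'null'


def second_or_null_conditional(dlist):
    # Conditional expression: yields 'null' only when there is no second item.
    return dlist[1] if len(dlist) > 1 else 'null'


print(second_or_null_andor(['a', 'b']))        # b
print(second_or_null_andor(['a']))             # null
print(second_or_null_andor(['a', '']))         # null (second item was '')
print(repr(second_or_null_conditional(['a', ''])))  # ''
```

If the second item can never be falsy, both forms behave the same; otherwise the conditional expression is the safer choice.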
Following ThiefMaster's answer: sometimes the value comes back as '\n' or null and processing it raises an error, so ValueError has to be handled as well.
Handling the exception is the way to go:
try:
    gotdata = dlist[1]
except (IndexError, ValueError):
    gotdata = 'null'
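The same idea can be wrapped in a small reusable function. This is only a sketch: get_or_default is a hypothetical helper, not part of BeautifulSoup or the standard library. Indexing a plain list past its end raises IndexError; catching ValueError as well only matters if further processing of the value (for example a conversion) happens inside the try block.

```python
def get_or_default(seq, index, default='null'):
    """Return seq[index], or default if that position does not exist."""
    try:
        return seq[index]
    except IndexError:
        # The sequence is too short, so fall back to the placeholder.
        return default


print(get_or_default(['first', 'second'], 1))  # second
print(get_or_default(['first'], 1))            # null
print(get_or_default([], 1, default=None))     # None
```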
You have two options; either handle the exception or test the length:
if len(dlist) > 1:
    newlist.append(dlist[1])
    continue
or
try:
    newlist.append(dlist[1])
except IndexError:
    pass
continue
Use the first if the second item is often missing, and the second if it is only occasionally missing.
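The check in the question, if not dlist[1], fails because dlist[1] is evaluated before not is applied, so the IndexError is raised before any test happens. Since the question needs the order of results preserved, a placeholder has to be appended in the missing case as well. A minimal sketch of both options, assuming all_dlists is a hypothetical list holding the findAll result for each page (plain lists are used here in place of real BeautifulSoup results):

```python
# Hypothetical stand-in for the per-page findAll results; the second page has no matches.
all_dlists = [['title-1', 'wanted-1'], [], ['title-3', 'wanted-3']]

# Option 1: test the length before indexing.
newlist = []
for dlist in all_dlists:
    if len(dlist) > 1:
        newlist.append(dlist[1])
    else:
        newlist.append('null')  # keep the position occupied

# Option 2: index first and handle the failure.
newlist_try = []
for dlist in all_dlists:
    try:
        newlist_try.append(dlist[1])
    except IndexError:
        newlist_try.append('null')  # keep the position occupied

print(newlist)      # ['wanted-1', 'null', 'wanted-3']
print(newlist_try)  # ['wanted-1', 'null', 'wanted-3']
```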
# items is the list being read (renamed from `list`, which shadows the built-in)
for i in range(1, len(items)):
    try:
        print(items[i])
    except ValueError:
        print("Error Value.")
    except IndexError:
        print("Error index")
    except Exception:
        print('error')
A ternary will suffice. Change:
gotdata = dlist[1]
to
gotdata = dlist[1] if len(dlist) > 1 else 'null'
This is a shorter way of expressing:
if len(dlist) > 1:
    gotdata = dlist[1]
else:
    gotdata = 'null'
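Because the conditional expression is a single expression, it also fits inside a list comprehension when the per-page results are collected first. A sketch under that assumption, where results_per_page is a hypothetical list of per-page results and None is used as the placeholder instead of the string 'null':

```python
# Hypothetical per-page results; the middle page has only one match.
results_per_page = [['a0', 'a1'], ['b0'], ['c0', 'c1', 'c2']]

# One entry per page, in order; None marks a page without a second item.
second_items = [d[1] if len(d) > 1 else None for d in results_per_page]

print(second_items)  # ['a1', None, 'c1']
```

Using None rather than 'null' keeps the placeholder distinguishable from real text that happens to read "null".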
Source: Stackoverflow.com