I have already extracted some information from a forum. This is the raw string I have now:
string = 'i think mabe 124 + <font color="black"><font face="Times New Roman">but I don\'t have a big experience it just how I see it in my eyes <font color="green"><font face="Arial">fun stuff'
What I do not like are the substrings "<font color="black"><font face="Times New Roman">" and "<font color="green"><font face="Arial">". I want to keep the rest of the string and drop only these parts, so the result should look like this:
resultString = "i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"
How can I do this? I used Beautiful Soup to extract the string above from a forum, but now I would prefer a regular expression to remove these parts.
from bs4 import BeautifulSoup
BeautifulSoup(text, features="html.parser").text
Sorry to those who were looking for more detail in my answer.
I'll explain it.
BeautifulSoup is a widely used Python package that helps the developer work with HTML from within Python.
The line above takes the HTML text (text) and passes it to the BeautifulSoup constructor; behind the scenes, it parses every HTML tag within the given text.
Once that is done, we simply request all the text from within the parsed HTML object.
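To make the parse-then-extract idea concrete without installing anything, here is a minimal sketch that uses only the standard library's html.parser module (the same parser named in the features argument above). It is a stand-in for illustration, not how BeautifulSoup is implemented; the names TextExtractor and strip_tags are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text between tags.
        self.chunks.append(data)


def strip_tags(html):
    """Parse the HTML and return only its text content."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered after the last tag
    return "".join(parser.chunks)


raw = 'i think mabe 124 + <font color="black"><font face="Times New Roman">but I don\'t have a big experience it just how I see it in my eyes <font color="green"><font face="Arial">fun stuff'
result = strip_tags(raw)
print(result)
```

Because the tags are recognised by a parser rather than matched as text, this approach does not depend on what the tag attributes contain.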
>>> import re
>>> st = " i think mabe 124 + <font color=\"black\"><font face=\"Times New Roman\">but I don't have a big experience it just how I see it in my eyes <font color=\"green\"><font face=\"Arial\">fun stuff"
>>> re.sub("<.*?>","",st)
" i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"
>>>
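If the substitution is needed in more than one place, the session above can be wrapped in a small function. This is only a sketch: the names TAG_RE and remove_tags are made up here, and the pattern <[^>]*> is a close variant of the <.*?> used above, chosen because it also matches tags that span several lines.

```python
import re

# Matches "<", then any run of characters that are not ">", then ">".
TAG_RE = re.compile(r"<[^>]*>")


def remove_tags(text):
    """Return text with every <...> span removed."""
    return TAG_RE.sub("", text)


st = " i think mabe 124 + <font color=\"black\"><font face=\"Times New Roman\">but I don't have a big experience it just how I see it in my eyes <font color=\"green\"><font face=\"Arial\">fun stuff"
cleaned = remove_tags(st)
print(cleaned)
```

Note that a regular expression treats anything between "<" and ">" as a tag, so ordinary text such as "1 < 2 and 3 > 2" would lose its middle part. For forum posts that may contain such characters, an HTML parser is the safer choice.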
Source: Stackoverflow.com