Python/BeautifulSoup - how to remove all tags from an element?

Question

How can I simply strip all tags from an element I find in BeautifulSoup?

User · Answer

With BeautifulStoneSoup gone in bs4, it's even simpler in Python 3:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html)
    text = soup.get_text()
    print(text)

User · Answer

Here is the source code. It fetches the page at the given URL and prints exactly the text it contains:

    import bs4
    import requests

    URL = ""
    page = requests.get(URL)
    soup = bs4.BeautifulSoup(page.content, 'html.parser').get_text()
    print(soup)

User · Answer

Code to simply get the contents as text instead of HTML. The html_text parameter is the string you pass in to get the text from:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_text, 'lxml')
    text = soup.get_text()
    print(text)

User · Answer

Why has no answer I've seen mentioned anything about the unwrap method? Or, even easier, the get_text method?

http://www.crummy.com/software/BeautifulSoup/bs4/doc/#unwrap
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text
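To make the difference between the two methods concrete, here is a minimal sketch. The sample markup and the variable names are made up for illustration, and the built-in "html.parser" is assumed so that nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = '<p>I linked to <a href="http://example.com/"><i>example.com</i></a> today</p>'
soup = BeautifulSoup(html, "html.parser")
p = soup.find("p")

# get_text() leaves the tree untouched and returns only the text.
plain = p.get_text()

# unwrap() replaces a tag with its own contents, editing the tree in place.
# find_all(True) matches every descendant tag and returns a list, so
# modifying the tree inside the loop is safe.
for tag in p.find_all(True):
    tag.unwrap()

print(plain)
print(p)
```

get_text() is the shorter route when only a string is needed; unwrap() is useful when the element itself should stay in the tree with its inner tags stripped.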

User · Answer

You can use the decompose method in bs4. Note that it removes each child tag together with its contents:

    import bs4

    soup = bs4.BeautifulSoup('<body><a href="http://example.com/">I linked to <i>example.com</i></a></body>')

    # Copy the children into a list first, so removing tags does not disturb the iteration
    for a in list(soup.find('a').children):
        if isinstance(a, bs4.element.Tag):
            a.decompose()

    print(soup)

    Out: <html><body><a href="http://example.com/">I linked to </a></body></html>

User · Answer

It looks like this is the way to do it, as simple as that. With this line you join together all the text parts within the current element:

    ''.join(htmlelement.find_all(text=True))
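A runnable sketch of the same idea follows. The markup and the name `element` (standing in for `htmlelement`) are hypothetical, and `string=True` is used because it is the spelling of the `text` argument in Beautiful Soup 4.4.0 and later.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; `element` stands in for `htmlelement` above.
soup = BeautifulSoup("<div>Hello, <b>tag-free</b> <i>world</i>!</div>", "html.parser")
element = soup.find("div")

# find_all(string=True) returns every text node beneath the element.
pieces = element.find_all(string=True)

# Joining the text nodes gives the element's text with all tags dropped.
joined = "".join(pieces)

print(joined)
```

For this input the joined string is the same as what `element.get_text()` returns, so the join is mainly useful when the individual text nodes need filtering first.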

User · Answer

Use get_text(). It returns all the text in a document or beneath a tag, as a single Unicode string.

For instance, remove all the different tags from the following text:

    <td><a href="http://www.irit.fr/SC">Signal et Communication</a>
    <br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
    </td>

The expected result is:

    Signal et Communication
    Ingénierie Réseaux et Télécommunications

Here is the source code:

    #!/usr/bin/env python3
    from bs4 import BeautifulSoup

    text = '''
    <td><a href="http://www.irit.fr/SC">Signal et Communication</a>
    <br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
    </td>
    '''
    soup = BeautifulSoup(text)
    print(soup.get_text())
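If the stray newlines in that output are unwanted, get_text() also accepts `separator` and `strip` arguments. The sketch below reuses the markup from this answer and assumes the built-in "html.parser"; the variable name `clean` is made up.

```python
from bs4 import BeautifulSoup

text = '''
<td><a href="http://www.irit.fr/SC">Signal et Communication</a>
<br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
</td>
'''
soup = BeautifulSoup(text, "html.parser")

# separator is placed between text fragments; strip=True trims each
# fragment and drops the ones that are only whitespace.
clean = soup.get_text(separator=" ", strip=True)
print(clean)
```

This puts both link texts on one line with a single space between them.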
