How do I remove all HTML tags from a string without knowing which tags are in it?

Question

Is there any easy way to remove all HTML tags or ANYTHING HTML-related from a string?

For example:

    string title = "<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=\"#228b22\">Proj 206010</font></b>&nbsp;&nbsp;&nbsp; Reality Series &nbsp;";

The above should really be:

    Hulk Hogan's Celebrity Championship Wrestling Proj 206010 Reality Series

User · Accepted Answer

You can use a simple regex like this:

    // Requires: using System.Text.RegularExpressions;
    public static string StripHTML(string input)
    {
        return Regex.Replace(input, "<.*?>", String.Empty);
    }

Be aware that this solution has its own flaw. See "Remove HTML tags in String" for more information, especially the comments by @mehaase.

Another solution would be to use the HTML Agility Pack. You can find an example using the library here: "HTML agility pack - removing unwanted tags without removing content".
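To make the caveat concrete, here is a rough sketch of the kind of input that trips up `<.*?>`. These cases are my own illustration, not taken from the linked discussion, and the class name `RegexStripDemo` is made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexStripDemo
{
    // Same pattern as in the answer above.
    public static string StripHTML(string input)
    {
        return Regex.Replace(input, "<.*?>", String.Empty);
    }

    public static void Main()
    {
        // A '>' inside an attribute value ends the match too early.
        // Result: " y\">link" (attribute debris is left behind)
        Console.WriteLine(StripHTML("<a title=\"x > y\">link</a>"));

        // Plain text containing '<' and '>' is treated as a tag.
        // Result: "1  2" (the comparison text is lost)
        Console.WriteLine(StripHTML("1 < 2 and 3 > 2"));

        // '.' does not match '\n' by default, so a tag split across
        // lines survives: the opening "<b\n>" stays in the output.
        Console.WriteLine(StripHTML("<b\n>bold</b>"));
    }
}
```

Passing `RegexOptions.Singleline` would handle the multi-line tag, but the other two cases are why a real HTML parser is the safer choice for arbitrary input.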

User · Answer

You can use the code below on your string and you will get the complete string without the HTML part:

    string title = "<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=\"#228b22\">Proj 206010</font></b>&nbsp;&nbsp;&nbsp; Reality Series &nbsp;".Replace("&nbsp;", string.Empty);
    string s = Regex.Replace(title, "<.*?>", String.Empty);
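Replacing only `&nbsp;` leaves any other entity (`&amp;`, `&quot;`, and so on) in the output. A slightly more general sketch, assuming `System.Net.WebUtility.HtmlDecode` is available on your target framework, strips the tags first, then decodes entities, then collapses whitespace. The names `HtmlText` and `ToPlainText` are hypothetical, and the regex is still not a real HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    public static string ToPlainText(string html)
    {
        // 1. Remove tags with the same simple pattern used above.
        string noTags = Regex.Replace(html, "<.*?>", string.Empty);

        // 2. Decode entities such as &nbsp; and &amp; into characters.
        string decoded = WebUtility.HtmlDecode(noTags);

        // 3. Collapse whitespace runs; in .NET, \s also matches the
        //    non-breaking space (U+00A0) that &nbsp; decodes to.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        string title = "<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;"
                     + "<font color=\"#228b22\">Proj 206010</font></b>&nbsp;&nbsp;&nbsp; Reality Series &nbsp;";

        // prints: Hulk Hogan's Celebrity Championship Wrestling Proj 206010 Reality Series
        Console.WriteLine(ToPlainText(title));
    }
}
```

Decoding after stripping (rather than before) means escaped text such as `&lt;b&gt;` ends up as the literal characters `<b>` in the result instead of being removed as if it were a tag.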

User · Answer

You can parse the string using the Html Agility Pack and get the InnerText:

    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml("<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=\"#228b22\">Proj 206010</font></b>&nbsp;&nbsp;&nbsp; Reality Series &nbsp;");
    string result = htmlDoc.DocumentNode.InnerText;
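As far as I know, `InnerText` drops the tags but can leave entities such as `&nbsp;` encoded, so a decoding step is usually wanted as well. The sketch below assumes the HtmlAgilityPack NuGet package is referenced and uses its `HtmlEntity.DeEntitize` helper; `HapText` and `ToPlainText` are names made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class HapText
{
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates the text nodes and drops the tags.
        string text = doc.DocumentNode.InnerText;

        // Decode entities (e.g. &nbsp;) that may remain in the text.
        string decoded = HtmlEntity.DeEntitize(text);

        // Collapse whitespace, including non-breaking spaces.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        string title = "<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;"
                     + "<font color=\"#228b22\">Proj 206010</font></b>&nbsp;&nbsp;&nbsp; Reality Series &nbsp;";
        Console.WriteLine(ToPlainText(title));
    }
}
```

Because the markup is actually parsed here, cases like a `>` inside an attribute value do not leak into the result the way they can with the regex approach.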
