String to HtmlDocument

Question

I'm fetching the HTML document from a URL using WebClient.DownloadString(url), but then it's very hard to find the element content that I'm looking for. Whilst reading around I've spotted HtmlDocument and that it has neat things like GetElementById. How can I populate an HtmlDocument with the HTML returned by the URL?

User · Answer

I've adapted Nikhil's answer somewhat to simplify it. Admittedly, I have not run it through a .NET compiler, and there are likely very good reasons for the lines Nikhil put in which I have omitted. However, at least in my use case (a very simple page), they were unnecessary.

My use case was for a quick PowerShell script:

    # Load Windows Forms (needed outside hosts that preload it)
    Add-Type -AssemblyName System.Windows.Forms
    # Get the HTML document from a webserver
    $htmlText = (New-Object System.Net.WebClient).DownloadString("<URI HERE>")
    $browser = New-Object System.Windows.Forms.WebBrowser
    $browser.DocumentText = $htmlText
    $browser.Document.Write($htmlText)
    $response = $browser.Document

For my case, this returned an HtmlDocument object with HtmlElement objects in it, instead of the __ComObject object types (which are a challenge to use in PowerShell class code) returned by a call to Invoke-WebRequest in PS 5.1.14393.1944.

I believe the equivalent C# code is:

    public System.Windows.Forms.HtmlDocument GetHtmlDocument(string html)
    {
        WebBrowser browser = new WebBrowser();
        browser.DocumentText = html;
        browser.Document.Write(html);
        return browser.Document;
    }

User · Answer

For those who don't want to use HTML Agility Pack and want to get an HtmlDocument from a string using native .NET code only, here is a good article on how to convert a string to an HtmlDocument.

Here is the code block to use:

    public System.Windows.Forms.HtmlDocument GetHtmlDocument(string html)
    {
        WebBrowser browser = new WebBrowser();
        browser.ScriptErrorsSuppressed = true;
        browser.DocumentText = html;
        browser.Document.OpenNew(true);
        browser.Document.Write(html);
        browser.Refresh();
        return browser.Document;
    }

User · Answer

Using Html Agility Pack as suggested by SLaks, this becomes very easy:

    string html = webClient.DownloadString(url);
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    // Note the lowercase "b": the Html Agility Pack method is GetElementbyId
    HtmlNode specificNode = doc.GetElementbyId("nodeId");
    HtmlNodeCollection nodesMatchingXPath = doc.DocumentNode.SelectNodes("x/path/nodes");

User · Answer

To answer the original question:

    // HTMLDocument and IHTMLDocument2 come from the mshtml COM interop assembly
    HTMLDocument doc = new HTMLDocument();
    IHTMLDocument2 doc2 = (IHTMLDocument2)doc;
    doc2.write(fileText);
    // now use doc

Then to convert back to a string:

    doc.documentElement.outerHTML;

User · Answer

You could get an HtmlDocument like this:

    System.Net.WebClient wc = new System.Net.WebClient();
    System.IO.Stream stream = wc.OpenRead(url);
    System.IO.StreamReader reader = new System.IO.StreamReader(stream);
    string s = reader.ReadToEnd();
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(s);

So you have get-by-id and get-by-name. But for anything further you'd be better off with HTML Agility Pack as suggested. For example, you can do doc.DocumentNode.SelectNodes(xpathselector), or use a regex to parse the doc.

By the way, why not regex? It's so cool if you can use it right. But XPath is also very mighty, so choose your poison.

User · Answer

You could try OpenNew followed by Write, but that's a somewhat strange use of that class. More info on MSDN.

User · Answer

The HtmlDocument class is a wrapper around the native IHTMLDocument2 COM interface. You cannot easily create it from a string. You should use the HTML Agility Pack.
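
A minimal sketch of that route, assuming the HtmlAgilityPack NuGet package is referenced. The HTML string, the "main" id, and the XPath expression are made up for illustration. Note that the Agility Pack's HtmlDocument is a different class from System.Windows.Forms.HtmlDocument, and that SelectNodes returns null rather than an empty collection when nothing matches:

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical input; in practice this would be the downloaded page.
        string html = "<html><body><div id=\"main\"><p>Hello</p><p>World</p></div></body></html>";

        // Parse the string directly; no browser control or COM object is involved.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Look up a single element by its id attribute (returns null if absent).
        HtmlNode main = doc.GetElementbyId("main");
        if (main != null)
        {
            Console.WriteLine(main.InnerText);
        }

        // Query with XPath; guard against null when there are no matches.
        HtmlNodeCollection paragraphs = doc.DocumentNode.SelectNodes("//div[@id='main']/p");
        if (paragraphs != null)
        {
            foreach (HtmlNode p in paragraphs)
            {
                Console.WriteLine(p.InnerText);
            }
        }
    }
}
```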
