Get HTML code from website in C#

Question

How do I get the HTML code from a website, save it, and find some text with a LINQ expression?

I'm using the following code to get the source of a web page:

public static String code(string Url)
{
    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(Url);
    myRequest.Method = "GET";
    WebResponse myResponse = myRequest.GetResponse();
    StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
    string result = sr.ReadToEnd();
    sr.Close();
    myResponse.Close();

    return result;
}

How do I find the text in a div in the source of the web page?

User · Answer

I am using AngleSharp and have been very satisfied with it.

Here is a simple example of how to fetch a page:

var config = Configuration.Default.WithDefaultLoader();
var document = await BrowsingContext.New(config).OpenAsync("https://www.google.com");

Now you have the web page in the document variable, and you can query it with LINQ or other methods. For example, to get a string value from an HTML table:

var someStringValue = document.All.Where(m =>
        m.LocalName == "td" &&
        m.HasAttribute("class") &&
        m.GetAttribute("class").Contains("pid-1-bid")
    ).ElementAt(0).TextContent.ToString();

To use CSS selectors please see AngleSharp examples.
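
As a rough sketch of the CSS-selector route, assuming the AngleSharp NuGet package is referenced: the helper below parses an HTML string that is already in memory instead of fetching a URL, and returns the text of the first match. The names `SelectorSketch` and `FirstMatchTextAsync` are made up for illustration and are not part of AngleSharp.

```csharp
using System.Threading.Tasks;
using AngleSharp;

public static class SelectorSketch
{
    // Hypothetical helper: returns the text of the first element matching
    // the CSS selector, or null when nothing matches.
    public static async Task<string> FirstMatchTextAsync(string html, string selector)
    {
        var context = BrowsingContext.New(Configuration.Default);

        // Parse the supplied markup instead of requesting a URL.
        var document = await context.OpenAsync(req => req.Content(html));

        var element = document.QuerySelector(selector);
        return element?.TextContent;
    }
}
```

With this, the LINQ query above would shrink to a call such as `await SelectorSketch.FirstMatchTextAsync(html, "td.pid-1-bid")`.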

User · Answer

Here's an example of using the HttpWebRequest class to fetch a URL:

private void buttonl_Click(object sender, EventArgs e)
{
    String url = TextBox_url.Text;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader sr = new StreamReader(response.GetResponseStream());
    richTextBox1.Text = sr.ReadToEnd();
    sr.Close();
}

User · Answer

The best thing to use is HtmlAgilityPack. You can also look into using Fizzler or CsQuery, depending on your needs for selecting the elements from the retrieved page. Using LINQ or regular expressions is just too error-prone, especially when the HTML can be malformed, be missing closing tags, have nested child elements, etc.

You need to stream the page into an HtmlDocument object and then select your required element.

// Call the page and get the generated HTML
var doc = new HtmlAgilityPack.HtmlDocument();
HtmlAgilityPack.HtmlNode.ElementsFlags["br"] = HtmlAgilityPack.HtmlElementFlag.Empty;
doc.OptionWriteEmptyNodes = true;

try
{
    var webRequest = HttpWebRequest.Create(pageUrl);
    Stream stream = webRequest.GetResponse().GetResponseStream();
    doc.Load(stream);
    stream.Close();
}
catch (System.UriFormatException uex)
{
    Log.Fatal("There was an error in the format of the url: " + itemUrl, uex);
    throw;
}
catch (System.Net.WebException wex)
{
    Log.Fatal("There was an error connecting to the url: " + itemUrl, wex);
    throw;
}

// get the div by id and then get the inner text
string testDivSelector = "//div[@id='test']";
var divString = doc.DocumentNode.SelectSingleNode(testDivSelector).InnerHtml.ToString();

EDIT: Actually, scrap that. The simplest method is to use FizzlerEx, an updated jQuery/CSS3-selectors implementation of the original Fizzler project.

Code sample directly from their site:

using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

// get the page
var web = new HtmlWeb();
var document = web.Load("http://example.com/page.html");
var page = document.DocumentNode;

// loop through all div tags with item css class
foreach (var item in page.QuerySelectorAll("div.item"))
{
    var title = item.QuerySelector("h3:not(.share)").InnerText;
    var date = DateTime.Parse(item.QuerySelector("span:eq(2)").InnerText);
    var description = item.QuerySelector("span:has(b)").InnerHtml;
}

I don't think it can get any simpler than that.

User · Answer

Better still, you can use the WebClient class to simplify your task:

using System.Net;

using (WebClient client = new WebClient())
{
    string htmlCode = client.DownloadString("http://somesite.com/default.html");
}

User · Answer

To get the HTML code from a website, you can use code like this:

string urlAddress = "http://google.com";

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

if (response.StatusCode == HttpStatusCode.OK)
{
    Stream receiveStream = response.GetResponseStream();
    StreamReader readStream = null;

    // Fall back to the default encoding when the server reports no charset
    if (String.IsNullOrWhiteSpace(response.CharacterSet))
        readStream = new StreamReader(receiveStream);
    else
        readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));

    string data = readStream.ReadToEnd();

    response.Close();
    readStream.Close();
}

This will give you the returned HTML code from the website. But finding text via LINQ is not that easy. Perhaps it is better to use a regular expression, but that does not play well with HTML code.
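
For the narrow case where the page happens to be well-formed XML (XHTML), finding a div's text with LINQ can be sketched with the standard library's LINQ to XML. This is a swapped-in technique, not a general HTML solution: `XDocument.Parse` throws on typical malformed HTML, and the sketch assumes the document has no default XML namespace. The names `DivTextSketch` and `FindDivText` are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DivTextSketch
{
    // Hypothetical helper: returns the concatenated text of the first <div>
    // whose id attribute equals the given id, or null if there is none.
    // Assumes well-formed XML input without a default namespace.
    public static string FindDivText(string xhtml, string id)
    {
        XDocument doc = XDocument.Parse(xhtml);

        return doc.Descendants("div")
                  .Where(d => (string)d.Attribute("id") == id)
                  .Select(d => d.Value)
                  .FirstOrDefault();
    }
}
```

For real-world pages, an HTML parser such as HtmlAgilityPack or AngleSharp is the safer choice, since those tolerate missing closing tags.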

User · Answer

You can use WebClient to download the HTML for any URL. Once you have the HTML, you can use a third-party library like HtmlAgilityPack to look up values in the HTML, as in the code below:

public static string GetInnerHtmlFromDiv(string url)
{
    string HTML;
    using (var wc = new WebClient())
    {
        HTML = wc.DownloadString(url);
    }
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(HTML);

    HtmlNode element = doc.DocumentNode.SelectSingleNode("//div[@id='<div id here>']");
    if (element != null)
    {
        return element.InnerHtml.ToString();
    }
    return null;
}

User · Answer

Try this solution. It works fine.

try
{
    String url = textBox1.Text;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader sr = new StreamReader(response.GetResponseStream());
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.Load(sr);
    var aTags = doc.DocumentNode.SelectNodes("//a");
    int counter = 1;
    if (aTags != null)
    {
        foreach (var aTag in aTags)
        {
            richTextBox1.Text += aTag.InnerHtml + "\n";
            counter++;
        }
    }
    sr.Close();
}
catch (Exception ex)
{
    MessageBox.Show("Failed to retrieve related keywords." + ex);
}
