[c#] HTML Agility pack - parsing tables

I want to use the HTML agility pack to parse tables from complex web pages, but I am somehow lost in the object model.

I looked at the link example, but did not find any table data this way. Can I use XPath to get the tables? I am basically lost after having loaded the data as to how to get the tables. I have done this in Perl before and it was a bit clumsy, but worked. (HTML::TableParser).

I am also happy if one can just shed a light on the right object order for the parsing.

This question is related to c# html html-parsing html-agility-pack

The answer is


I know this is a pretty old question but this was my solution that helped with visualizing the table so you can create a class structure. This is also using the HTML Agility Pack

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"<html><body><p><table id=""foo""><tr><th>hello</th></tr><tr><td>world</td></tr></table></body></html>");
var table = doc.DocumentNode.SelectSingleNode("//table");
var tableRows = table.SelectNodes("tr");
var columns = tableRows[0].SelectNodes("th/text()");
for (int i = 1; i < tableRows.Count; i++)
{
    for (int e = 0; e < columns.Count; e++)
    {
        var value = tableRows[i].SelectSingleNode($"td[{e + 1}]");
        Console.Write(columns[e].InnerText + ":" + value.InnerText);
    }
Console.WriteLine();
}

Line from above answer:

HtmlDocument doc = new HtmlDocument();

This doesn't work in VS 2015 C#. You cannot construct an HtmlDocument any more.

Another MS "feature" that makes things more difficult to use. Try HtmlAgilityPack.HtmlWeb and check out this link for some sample code.


The most simple what I've found to get the XPath for a particular Element is to install FireBug extension for Firefox go to the site/webpage press F12 to bring up firebug; right select and right click the element on the page that you want to query and select "Inspect Element" Firebug will select the element in its IDE then right click the Element in Firebug and choose "Copy XPath" this function will give you the exact XPath Query you need to get the element you want using HTML Agility Library.


In my case, there is a single table which happens to be a device list from a router. If you wish to read the table using TR/TH/TD (row, header, data) instead of a matrix as mentioned above, you can do something like the following:

    List<TableRow> deviceTable = (from table in document.DocumentNode.SelectNodes(XPathQueries.SELECT_TABLE)
                                       from row in table?.SelectNodes(HtmlBody.TR)
                                       let rows = row.SelectSingleNode(HtmlBody.TR)
                                       where row.FirstChild.OriginalName != null && row.FirstChild.OriginalName.Equals(HtmlBody.T_HEADER)
                                       select new TableRow
                                       {
                                           Header = row.SelectSingleNode(HtmlBody.T_HEADER)?.InnerText,
                                           Data = row.SelectSingleNode(HtmlBody.T_DATA)?.InnerText}).ToList();
                                       }  

TableRow is just a simple object with Header and Data as properties. The approach takes care of null-ness and this case:

_x000D_
_x000D_
<tr>_x000D_
    <td width="28%">&nbsp;</td>_x000D_
</tr>
_x000D_
_x000D_
_x000D_

which is row without a header. The HtmlBody object with the constants hanging off of it are probably readily deduced but I apologize for it even still. I came from the world where if you have " in your code, it should either be constant or localizable.


Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to html

Embed ruby within URL : Middleman Blog Please help me convert this script to a simple image slider Generating a list of pages (not posts) without the index file Why there is this "clear" class before footer? Is it possible to change the content HTML5 alert messages? Getting all files in directory with ajax DevTools failed to load SourceMap: Could not load content for chrome-extension How to set width of mat-table column in angular? How to open a link in new tab using angular? ERROR Error: Uncaught (in promise), Cannot match any routes. URL Segment

Examples related to html-parsing

PHP: HTML: send HTML select option attribute in POST Read a HTML file into a string variable in memory Parsing HTML using Python Parse an HTML string with JS HTML Text with tags to formatted text in an Excel cell How do I parse a HTML page with Node.js Regex select all text between tags How to extract string following a pattern with grep, regex or perl How to strip HTML tags from string in JavaScript? How do you parse and process HTML/XML in PHP?

Examples related to html-agility-pack

Html Agility Pack get all elements by class How to use HTML Agility pack HTML Agility pack - parsing tables