Html Agility Pack get all elements by class

Question

I am taking a stab at Html Agility Pack and am having trouble finding the right way to go about this.

For example:

    var findclasses = doc.DocumentNode.Descendants("div").Where(d => d.Attributes.Contains("class"));

However, you can obviously add classes to a lot more than divs, so I tried this:

    var allLinksWithDivAndClass = doc.DocumentNode.SelectNodes("//*[@class=\"float\"]");

But that doesn't handle the cases where an element has multiple classes and "float" is just one of them, like this:

    class="className float anotherclassName"

Is there a way to handle all of this? I basically want to select all nodes that have a class attribute whose value contains "float".

Answer has been documented on my blog with a full explanation at "Html Agility Pack Get All Elements by Class".

User · Answer

You can solve your issue by using the contains function within your XPath query, as below:

    var allElementsWithClassFloat =
        doc.DocumentNode.SelectNodes("//*[contains(@class,'float')]");

To reuse this in a function, do something similar to the following:

    string classToFind = "float";
    var allElementsWithClassFloat =
        doc.DocumentNode.SelectNodes(string.Format("//*[contains(@class,'{0}')]", classToFind));
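Note that contains(@class,'float') is a substring test, so it would also match a class such as "unfloating". A common XPath 1.0 idiom pads the attribute value with spaces and searches for the padded name instead. The sketch below shows that idiom using only System.Xml.Linq so it runs without extra packages; the helper name BuildClassXPath is hypothetical, and it assumes the class name contains no single quote. The same expression string should be usable with SelectNodes, but that is an assumption to verify against your Html Agility Pack version.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathClassDemo
{
    // Hypothetical helper: builds an XPath that matches className as a whole
    // whitespace-separated token. Assumes className contains no single quote.
    public static string BuildClassXPath(string className)
    {
        return "//*[contains(concat(' ', normalize-space(@class), ' '), ' "
               + className + " ')]";
    }

    public static void Main()
    {
        var xml = XDocument.Parse(
            "<root>" +
            "<div class='className float anotherclassName'/>" +
            "<span class='unfloating'/>" +
            "<p class='float'/>" +
            "</root>");

        // Select every element whose class list contains the token "float".
        var names = xml.XPathSelectElements(BuildClassXPath("float"))
                       .Select(e => e.Name.LocalName);

        Console.WriteLine(string.Join(",", names)); // prints "div,p"
    }
}
```

The span with class "unfloating" is skipped because the padded value " unfloating " does not contain " float ".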

User · Answer

I used this extension method a lot in my project. Hope it will help one of you guys.

    public static bool HasClass(this HtmlNode node, params string[] classValueArray)
    {
        var classValue = node.GetAttributeValue("class", "");
        var classValues = classValue.Split(' ');
        return classValueArray.All(c => classValues.Contains(c));
    }
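One thing to watch with Split(' ') is that class names in real markup may also be separated by tabs or newlines. The following is a minimal sketch of the same all-classes-present check on a plain string, splitting on any whitespace; the name ClassListHasAll is hypothetical and is not part of Html Agility Pack.

```csharp
using System;
using System.Linq;

public static class ClassListDemo
{
    // Hypothetical helper: true when every required class appears as a whole
    // token in the class attribute value.
    public static bool ClassListHasAll(string classValue, params string[] required)
    {
        // A null separator array makes Split break on any whitespace character.
        var tokens = (classValue ?? "")
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        return required.All(r => tokens.Contains(r, StringComparer.Ordinal));
    }

    public static void Main()
    {
        Console.WriteLine(ClassListHasAll("className float\tanotherclassName", "float")); // True
        Console.WriteLine(ClassListHasAll("unfloating", "float"));                        // False
    }
}
```

Inside an extension method like HasClass, the string passed in would be node.GetAttributeValue("class", "").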

User · Answer

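    public static List<HtmlNode> GetTagsWithClass(string html, List<string> @class)
    {
        // Parse the markup before querying it.
        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);

        // Note: this matches only when the whole class attribute value
        // equals one of the entries in @class.
        var result = htmlDocument.DocumentNode.Descendants()
            .Where(x => x.Attributes.Contains("class") && @class.Contains(x.Attributes["class"].Value))
            .ToList();

        return result;
    }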

User · Answer

You can use the following script:

    var findclasses = doc.DocumentNode.Descendants("div").Where(d =>
        d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("float")
    );

User · Answer

Updated 2018-03-17

The problem

The problem, as you've spotted, is that String.Contains does not perform a word-boundary check, so Contains("float") will return true for both "foo float bar" (correct) and "unfloating" (which is incorrect).

The solution is to ensure that "float" (or whatever your desired class-name is) appears alongside a word-boundary at both ends. A word-boundary is either the start (or end) of a string (or line), whitespace, certain punctuation, etc. In most regular-expressions this is \b. So the regex you want is simply: \bfloat\b.

A downside to using a Regex instance is that they can be slow to run if you don't use the .Compiled option, and they can be slow to compile. So you should cache the regex instance. This is more difficult if the class-name you're looking for changes at runtime.

Alternatively, you can search a string for words by word-boundaries without using a regex by implementing the regex as a C# string-processing function, being careful not to cause any new string or other object allocation (e.g. not using String.Split).

Approach 1: Using a regular-expression

Suppose you just want to look for elements with a single, design-time specified class-name:

    class Program {

        private static readonly Regex _classNameRegex = new Regex( "\\bfloat\\b", RegexOptions.Compiled );

        private static IEnumerable<HtmlNode> GetFloatElements(HtmlDocument doc) {
            return doc
                .DocumentNode
                .Descendants()
                .Where( n => n.NodeType == HtmlNodeType.Element )
                .Where( e => e.Name == "div" && _classNameRegex.IsMatch( e.GetAttributeValue("class", "") ) );
        }
    }

If you need to choose a single class-name at runtime then you can build a regex:

    private static IEnumerable<HtmlNode> GetElementsWithClass(HtmlDocument doc, String className) {

        Regex regex = new Regex( "\\b" + Regex.Escape( className ) + "\\b", RegexOptions.Compiled );

        return doc
            .DocumentNode
            .Descendants()
            .Where( n => n.NodeType == HtmlNodeType.Element )
            .Where( e => e.Name == "div" && regex.IsMatch( e.GetAttributeValue("class", "") ) );
    }

If you have multiple class-names and you want to match all of them, you could create an array of Regex objects and ensure they're all matching, or combine them into a single Regex using lookarounds, but this results in horrendously complicated expressions, so using a Regex[] is probably better:

    using System.Linq;

    private static IEnumerable<HtmlNode> GetElementsWithClass(HtmlDocument doc, String[] classNames) {

        // One compiled regex per required class-name.
        Regex[] exprs = new Regex[ classNames.Length ];
        for( Int32 i = 0; i < exprs.Length; i++ ) {
            exprs[i] = new Regex( "\\b" + Regex.Escape( classNames[i] ) + "\\b", RegexOptions.Compiled );
        }

        return doc
            .DocumentNode
            .Descendants()
            .Where( n => n.NodeType == HtmlNodeType.Element )
            .Where( e =>
                e.Name == "div" &&
                exprs.All( r =>
                    r.IsMatch( e.GetAttributeValue("class", "") )
                )
            );
    }

Approach 2: Using non-regex string matching

The advantage of using a custom C# method to do string matching instead of a regex is hypothetically faster performance and reduced memory usage (though Regex may be faster in some circumstances - always profile your code first, kids!).

The method below, CheapClassListContains, provides a fast word-boundary-checking string matching function that can be used the same way as regex.IsMatch:

    private static IEnumerable<HtmlNode> GetElementsWithClass(HtmlDocument doc, String className) {

        return doc
            .DocumentNode
            .Descendants()
            .Where( n => n.NodeType == HtmlNodeType.Element )
            .Where( e =>
                e.Name == "div" &&
                CheapClassListContains(
                    e.GetAttributeValue("class", ""),
                    className,
                    StringComparison.Ordinal
                )
            );
    }

    /// <summary>Performs optionally-whitespace-padded string search without new string allocations.</summary>
    /// <remarks>A regex might also work, but constructing a new regex every time this method is called would be expensive.</remarks>
    private static Boolean CheapClassListContains(String haystack, String needle, StringComparison comparison)
    {
        if( String.Equals( haystack, needle, comparison ) ) return true;
        Int32 idx = 0;
        while( idx + needle.Length <= haystack.Length )
        {
            idx = haystack.IndexOf( needle, idx, comparison );
            if( idx == -1 ) return false;

            Int32 end = idx + needle.Length;

            // Needle must be enclosed in whitespace or be at the start/end of string
            Boolean validStart = idx == 0               || Char.IsWhiteSpace( haystack[idx - 1] );
            Boolean validEnd   = end == haystack.Length || Char.IsWhiteSpace( haystack[end] );
            if( validStart && validEnd ) return true;

            idx++;
        }
        return false;
    }

Approach 3: Using a CSS Selector library

HtmlAgilityPack is somewhat stagnant and doesn't support .querySelector and .querySelectorAll, but there are third-party libraries that extend HtmlAgilityPack with it, namely Fizzler and CssSelectors. Both Fizzler and CssSelectors implement QuerySelectorAll, so you can use it like so:

    private static IEnumerable<HtmlNode> GetDivElementsWithFloatClass(HtmlDocument doc) {

        return doc.DocumentNode.QuerySelectorAll( "div.float" );
    }

With runtime-defined classes:

    private static IEnumerable<HtmlNode> GetDivElementsWithClasses(HtmlDocument doc, IEnumerable<String> classNames) {

        String selector = "div." + String.Join( ".", classNames );

        return doc.DocumentNode.QuerySelectorAll( selector );
    }
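One caveat with the \b approach is that a hyphen also counts as a word-boundary, so \bfloat\b matches a class such as "float-left", whereas the whitespace check in CheapClassListContains does not. If you prefer to stay with a regex, a possible alternative is to require that the name is not adjacent to any non-whitespace character. The sketch below compares the two patterns on plain strings; the helper name BuildClassRegex is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: matches className only when it is delimited by
    // whitespace or by the start/end of the attribute value.
    public static Regex BuildClassRegex(string className)
    {
        return new Regex(@"(?<!\S)" + Regex.Escape(className) + @"(?!\S)");
    }

    public static void Main()
    {
        var wordBoundary  = new Regex(@"\bfloat\b");
        var tokenBoundary = BuildClassRegex("float");

        foreach (var value in new[] { "foo float bar", "unfloating", "float-left" })
        {
            Console.WriteLine(
                value + ": wordBoundary=" + wordBoundary.IsMatch(value) +
                ", tokenBoundary=" + tokenBoundary.IsMatch(value));
        }
        // foo float bar: wordBoundary=True, tokenBoundary=True
        // unfloating: wordBoundary=False, tokenBoundary=False
        // float-left: wordBoundary=True, tokenBoundary=False
    }
}
```

As with the other regex variants, cache the instance (and consider RegexOptions.Compiled) if it is used repeatedly.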
