Html Agility Pack is a powerful tool for parsing HTML document source. I needed to use it to get all elements in a document by class name. It really is a simple query with Html Agility Pack, but getting the syntax right was the difficult part for me.
Here is my use case:
I need to select all elements that have the class float on them. I started with this query, which worked only for div tags.
// Attributes["class"] is an HtmlAttribute, so the text to search is its Value
var findclasses = _doc.DocumentNode.Descendants("div").Where(d =>
d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("float")
);
This takes your _doc and finds all div elements that have an attribute named class, then goes one step further to ensure that the class value contains float.
Whew, what a mouthful. Let's look at an example node it would select.
<div class="className float anotherclassName">
</div>
So now, how do we get ALL ELEMENTS in the doc that have the same float class? Looking back at our Html Agility Pack query, there is one small change we can make to the .Descendants portion that will return all elements by class. This may seem simple, but it took me quite a while to arrive at: if you call .Descendants with no arguments, it returns every descendant node regardless of tag name. Look below:
// No tag name passed to Descendants(), so every node is checked, not just div
var findclasses = _doc.DocumentNode.Descendants().Where(d =>
d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("float")
);
The query above will return ALL ELEMENTS whose class attribute contains float.
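One thing worth keeping in mind: Contains on the attribute value is a substring test, so a class such as floating would also match. If that matters for your markup, a rough sketch of a stricter version is to split the class value on whitespace and compare whole class names. The sample HTML string, the Program class, and the variable names below are made up for illustration and are not part of the original query.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup: two elements carry the class "float",
        // one carries "floating", which a plain substring check would also match.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div class=\"className float anotherclassName\"></div>" +
            "<span class=\"floating\"></span>" +
            "<p class=\"float\"></p>");

        var separators = new[] { ' ', '\t', '\r', '\n' };

        // GetAttributeValue returns the fallback ("") when the node has no class
        // attribute, so no separate Attributes.Contains("class") check is needed.
        var matches = doc.DocumentNode.Descendants()
            .Where(d => d.GetAttributeValue("class", "")
                .Split(separators, StringSplitOptions.RemoveEmptyEntries)
                .Contains("float"))
            .ToList();

        // Print the tag name of each matching element.
        foreach (var node in matches)
        {
            Console.WriteLine(node.Name);
        }
    }
}
```

With this sample markup the div and the p should be listed and the span skipped, since floating is a different class name. If substring matching is what you actually want, the original query is the simpler choice.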
This post is based on my Stack Overflow question: Html Agility Pack get all elements by class