Html Agility Pack Get All Elements by Class
Html Agility Pack is a powerful tool for parsing HTML documents. I needed to parse a document with Html Agility Pack to get all elements by class name. It really is a simple query with Html Agility Pack, but getting the syntax right was the difficult part for me.
Here is my use case:
I need to select all elements that have the class float on them. I started with this query, which worked for just div tags.
// .Value is the class attribute's text; the string Contains check runs on it
var findclasses = _doc.DocumentNode.Descendants("div").Where(d =>
    d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("float")
);
What this does is take your _doc and find all divs within it that have an attribute named class, and then go one step further to ensure that the class value contains float.
Whew, what a mouthful. Let's look at an example node it would select:
<div class="className float anotherclassName">
</div>
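To check the query against that example node, here is a minimal, self-contained console sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class name DivClassExample, the method FindFloatDivs, and the extra span and div in the sample markup are hypothetical additions for illustration. It reads the attribute's text through .Value and runs the string Contains check on that text.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class DivClassExample
{
    // Hypothetical sample markup: the example node above, plus two
    // elements that the div-only query should skip.
    public const string SampleHtml =
        "<div class=\"className float anotherclassName\"></div>" +
        "<span class=\"float\"></span>" +
        "<div class=\"other\"></div>";

    // Every <div> whose class attribute text contains "float".
    public static HtmlNode[] FindFloatDivs(HtmlDocument doc) =>
        doc.DocumentNode.Descendants("div")
            .Where(d => d.Attributes.Contains("class")
                        && d.Attributes["class"].Value.Contains("float"))
            .ToArray();

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        // Only the first div is printed; the span is never visited
        // because Descendants("div") only yields div elements.
        foreach (HtmlNode node in FindFloatDivs(doc))
            Console.WriteLine(node.OuterHtml);
    }
}
```

Because the query starts from Descendants("div"), the span is skipped even though it carries the float class.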
So now, how do we get ALL ELEMENTS in the document that have the same class, float? Looking back at our Html Agility Pack query, there is one small change we can make to the .Descendants portion that will return all elements by class. This may seem simple, but it took quite a while to figure out: if you call .Descendants() with no tag name, it returns every node in the document (elements as well as text and comment nodes), and the class check filters them down. Look below:
// No tag name is passed to Descendants(), so every node is checked
var findclasses = _doc.DocumentNode.Descendants().Where(d =>
    d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("float")
);
The query above will return ALL ELEMENTS whose class attribute contains float.
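One caveat: Contains("float") is a plain substring check on the class text, so class="floating" or class="nofloat" would also match. If you need float as a whole class name, a sketch like the following splits the class list on whitespace first. It assumes the HtmlAgilityPack NuGet package; ClassTokenExample, HasClassToken, and FindByClass are hypothetical names, and the comparison is ordinal (case-sensitive). Newer Html Agility Pack releases also appear to offer HtmlNode.HasClass, which may be worth checking in your version.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class ClassTokenExample
{
    // ASCII whitespace characters that separate class names in HTML.
    static readonly char[] ClassSeparators = { ' ', '\t', '\n', '\r', '\f' };

    // True when the class list contains className as a whole token,
    // so "float" matches class="a float" but not class="floating".
    public static bool HasClassToken(HtmlNode node, string className) =>
        node.GetAttributeValue("class", "")
            .Split(ClassSeparators, StringSplitOptions.RemoveEmptyEntries)
            .Contains(className, StringComparer.Ordinal);

    // Every element in the document, whatever its tag, carrying the class.
    // The NodeType check skips the text and comment nodes Descendants() also yields.
    public static HtmlNode[] FindByClass(HtmlDocument doc, string className) =>
        doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element && HasClassToken(n, className))
            .ToArray();

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div class=\"className float anotherclassName\"></div>" +
            "<p class=\"float\">text</p>" +
            "<span class=\"floating\"></span>");

        // Prints the tag names of the matches: the div and the p, not the span.
        foreach (HtmlNode node in FindByClass(doc, "float"))
            Console.WriteLine(node.Name);
    }
}
```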
This is based on my question on Stack Overflow: Html Agility Pack get all elements by class
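Html Agility Pack also accepts XPath through SelectNodes, which is another route to the same result. This sketch swaps in the well-known XPath 1.0 class-token idiom instead of the LINQ query; it assumes the HtmlAgilityPack NuGet package, and XPathClassExample and CountFloatElements are hypothetical names.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class XPathClassExample
{
    // XPath 1.0 idiom for "has the class token float": normalize the class
    // list, pad it with spaces, then look for " float " as a whole word.
    public const string FloatXPath =
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' float ')]";

    public static int CountFloatElements(HtmlDocument doc)
    {
        // SelectNodes returns null rather than an empty collection when
        // nothing matches (HtmlAgilityPack's default behavior), hence the ?. check.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(FloatXPath);
        return nodes?.Count ?? 0;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div class=\"className float anotherclassName\"></div>" +
            "<span class=\"floating\"></span>");

        Console.WriteLine(CountFloatElements(doc)); // 1: only the div matches
    }
}
```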
Hi Adam,
I just wanted to thank you for this helpful article!
Hi Adam,
Thanks for the helpful article.
Wanted to suggest one edit
it should be
&& d.Attributes["class"].Value.Contains("float")
instead of
&& d.Attributes["class"].Contains("float")
Please correct me if I misunderstood.
Thanks,
Rahul Muley
Hi,
and thanks for your example, it is helpful for understanding.
I am using C#, and this syntax is not allowed.
I have an HTML file and am looking to find all tags (links) that have "lifestyle/" in them.
IEnumerable linkAtributes = doc.DocumentNode.Descendants("a").Where(d =>
d.Attributes.Contains("href") && d.Attributes["href"].Contains("lifestyle/")
This code does not work for me; it's causing an error.
I'd be happy for some help,
thank you!
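For the link question in the comment above, the same fix applies: call Contains on the attribute's .Value rather than on the attribute itself, and close the statement with );. A minimal sketch, assuming the HtmlAgilityPack NuGet package; FindHrefsContaining, LinkFilterExample, and the sample links are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class LinkFilterExample
{
    // href values of every <a> whose href contains the given text.
    // The string check runs on .Value, the attribute's text.
    public static List<string> FindHrefsContaining(HtmlDocument doc, string text) =>
        doc.DocumentNode.Descendants("a")
            .Where(a => a.Attributes.Contains("href")
                        && a.Attributes["href"].Value.Contains(text))
            .Select(a => a.Attributes["href"].Value)
            .ToList();

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Hypothetical markup standing in for the commenter's HTML file.
        doc.LoadHtml(
            "<a href=\"/lifestyle/food\">Food</a>" +
            "<a href=\"/news/today\">News</a>");

        foreach (string href in FindHrefsContaining(doc, "lifestyle/"))
            Console.WriteLine(href);
    }
}
```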