This sounds like a good fit for XPath, the W3C query language for XML documents. It makes it easy to express queries like "return all href attributes in img tags that are nested in <foo><bar><baz> elements." Not being a PHP buff, I can't tell you in what form XPath is available there. If you can call an external program to process the HTML file, you should be able to use a command-line XPath tool instead.
For a quick intro, see http://en.wikipedia.org/wiki/XPath.
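
To give a feel for the kind of query mentioned above, here is a rough sketch in Python (not PHP, since I can't vouch for the PHP side). In full XPath 1.0 the query would be written `//foo/bar/baz//img/@href`. Python's standard `xml.etree.ElementTree` only supports a subset of XPath and cannot select attributes directly, so the sketch selects the matching `img` elements and then reads the attribute from each. The sample document and the `img_hrefs` helper are made up for illustration, and the input is assumed to be well-formed XML, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; assumed to be well-formed XML.
SAMPLE = """<root>
  <foo><bar><baz>
    <img href="a.png"/>
    <div><img href="b.png"/></div>
  </baz></bar></foo>
  <img href="outside.png"/>
</root>"""


def img_hrefs(xml_text):
    """Return the href attribute of every img nested under foo/bar/baz."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset: find foo/bar/baz anywhere in the tree,
    # then any descendant img that carries an href attribute.
    matches = root.findall(".//foo/bar/baz//img[@href]")
    return [img.get("href") for img in matches]


if __name__ == "__main__":
    print(img_hrefs(SAMPLE))
```

The `img` element outside the `<foo><bar><baz>` nesting is skipped, which is the point of anchoring the query on that path. For HTML that is not well-formed you would likely need an HTML-aware parser in front of the XPath step.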