I also have this situation: I have a set of documents to be crawled. I start with an initial "seed" document which should be processed; that document contains links to other documents which should also be processed, and so on.
In my main program, I just want to write something like the following, where Crawler controls a bunch of threads.
Crawler c = new Crawler();
c.schedule(seedDocument);
c.waitUntilCompletion();
The same situation would arise if I wanted to walk a tree: I would put in the root node, the processor for each node would add its children to the queue as necessary, and a bunch of threads would process all the nodes in the tree until there were none left. The pool knows it is finished when the count of outstanding tasks drops to zero, since new work is only ever scheduled by tasks that are still running.
I couldn't find anything like this in the JDK, which I thought was a bit surprising. So I wrote a class ThreadPool which one can either use directly or subclass to add methods suited to the domain, e.g. schedule(Document). Hope it helps!
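To sketch the idea (this is not the exact class, just a minimal version of the same pattern built on a standard ExecutorService; the TaskProcessor interface and the field names here are only illustrative): the pool counts outstanding tasks, and waitUntilCompletion() blocks until that count reaches zero.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadPool<T> {

    // Callback that processes one item and may schedule more work on the pool.
    public interface TaskProcessor<T> {
        void process(T item, ThreadPool<T> pool) throws Exception;
    }

    private final ExecutorService executor;
    private final TaskProcessor<T> processor;
    private final AtomicInteger outstanding = new AtomicInteger();
    private final Object completion = new Object();

    public ThreadPool(int threads, TaskProcessor<T> processor) {
        this.executor = Executors.newFixedThreadPool(threads);
        this.processor = processor;
    }

    public void schedule(T item) {
        outstanding.incrementAndGet();          // count the task before submitting it
        executor.submit(() -> {
            try {
                processor.process(item, this);  // may call schedule() again for new items
            } catch (Exception e) {
                e.printStackTrace();            // real code would log or collect failures
            } finally {
                // If this was the last outstanding task, wake up the waiting thread.
                if (outstanding.decrementAndGet() == 0) {
                    synchronized (completion) { completion.notifyAll(); }
                }
            }
        });
    }

    public void waitUntilCompletion() throws InterruptedException {
        synchronized (completion) {
            while (outstanding.get() > 0) {
                completion.wait();
            }
        }
        executor.shutdown();
    }
}

A Crawler subclass (or wrapper) would add a schedule(Document) method and would typically also keep a concurrent set of already-seen URLs, so the same document is never scheduled twice; otherwise a cycle of links would keep the pool busy forever.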