I had the same problem and I'm having great success with JTidy (http://jtidy.sourceforge.net/index.html)
Example:
Tidy t = new Tidy();
t.setIndentContent(true);
Document d = t.parseDOM(
new ByteArrayInputStream("HTML goes here", null);
OutputStream out = new ByteArrayOutputStream();
t.pprint(d, out);
String html = out.toString();