[android] Parse HTML in Android

I am trying to parse HTML in android from a webpage, and since the webpage it not well formed, I get SAXException.

Is there a way to parse HTML in Android?

This question is related to android html parsing

The answer is


Have you tried using Html.fromHtml(source)?

I think that class is pretty liberal with respect to source quality (it uses TagSoup internally, which was designed with real-life, bad HTML in mind). It doesn't support all HTML tags though, but it does come with a handler you can implement to react on tags it doesn't understand.


We all know that programming have endless possibilities.There are numbers of solutions available for a single problem so i think all of the above solutions are perfect and may be helpful for someone but for me this one save my day..

So Code goes like this

  private void getWebsite() {
    new Thread(new Runnable() {
      @Override
      public void run() {
        final StringBuilder builder = new StringBuilder();

        try {
          Document doc = Jsoup.connect("http://www.ssaurel.com/blog").get();
          String title = doc.title();
          Elements links = doc.select("a[href]");

          builder.append(title).append("\n");

          for (Element link : links) {
            builder.append("\n").append("Link : ").append(link.attr("href"))
            .append("\n").append("Text : ").append(link.text());
          }
        } catch (IOException e) {
          builder.append("Error : ").append(e.getMessage()).append("\n");
        }

        runOnUiThread(new Runnable() {
          @Override
          public void run() {
            result.setText(builder.toString());
          }
        });
      }
    }).start();
  }

You just have to call the above function in onCreate Method of your MainActivity

I hope this one is also helpful for you guys.

Also read the original blog at Medium


Maybe you can use WebView, but as you can see in the doc WebView doesn't support javascript and other stuff like widgets by default.

http://developer.android.com/reference/android/webkit/WebView.html

I think that you can enable javascript if you need it.


String tmpHtml = "<html>a whole bunch of html stuff</html>";
String htmlTextStr = Html.fromHtml(tmpHtml).toString();

I just encountered this problem. I tried a few things, but settled on using JSoup. The jar is about 132k, which is a bit big, but if you download the source and take out some of the methods you will not be using, then it is not as big.
=> Good thing about it is that it will handle badly formed HTML

Here's a good example from their site.

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

//http://jsoup.org/cookbook/input/load-document-from-url
//Document doc = Jsoup.connect("http://example.com/").get();

Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
  String linkHref = link.attr("href");
  String linkText = link.text();
}

Examples related to android

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How to implement a simple scenario the OO way My eclipse won't open, i download the bundle pack it keeps saying error log getting " (1) no such column: _id10 " error java doesn't run if structure inside of onclick listener Cannot retrieve string(s) from preferences (settings) strange error in my Animation Drawable how to put image in a bundle and pass it to another activity FragmentActivity to Fragment A failure occurred while executing com.android.build.gradle.internal.tasks

Examples related to html

Embed ruby within URL : Middleman Blog Please help me convert this script to a simple image slider Generating a list of pages (not posts) without the index file Why there is this "clear" class before footer? Is it possible to change the content HTML5 alert messages? Getting all files in directory with ajax DevTools failed to load SourceMap: Could not load content for chrome-extension How to set width of mat-table column in angular? How to open a link in new tab using angular? ERROR Error: Uncaught (in promise), Cannot match any routes. URL Segment

Examples related to parsing

Got a NumberFormatException while trying to parse a text file for objects Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) Python/Json:Expecting property name enclosed in double quotes Correctly Parsing JSON in Swift 3 How to get response as String using retrofit without using GSON or any other library in android UIButton action in table view cell "Expected BEGIN_OBJECT but was STRING at line 1 column 1" How to convert an XML file to nice pandas dataframe? How to extract multiple JSON objects from one file? How to sum digits of an integer in java?