Wednesday, January 28, 2009

Android HTML parsing

A few weeks ago, I spent a long time trying to find an HTML parser that would work reasonably well on the Android mobile operating system. Long story short: fail. This site provides a helpful list of open source HTML parsers for Java, but alas, each and every one of them was too slow to be usable on Android. I wasn't even trying to parse a large page, only ~100K, but it still took about a minute to parse. In the end, I gave up. Instead of parsing the site and redisplaying some of the info, I just manually skipped down to the content part of the page and fed that content as a String to the built-in WebKit browser. Fortunately, Android makes it extremely straightforward to embed the browser in an app.
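
For anyone curious what "skipping down to the content part" can look like, here is a minimal sketch in plain Java. It is not the code from my app: the class name `ContentExtractor`, the method `extractContent`, and the marker strings are all made up for illustration, and it assumes the page has some recognizable text right before and right after the part you care about.

```java
// Hypothetical helper: cuts a fragment out of a page with plain string
// searches instead of building a full parse tree.
final class ContentExtractor {
    private ContentExtractor() {}

    /**
     * Returns the part of html that starts at startMarker (inclusive) and
     * ends just before endMarker. If startMarker is missing, the whole page
     * is returned; if endMarker is missing, everything from startMarker to
     * the end of the page is returned.
     */
    static String extractContent(String html, String startMarker, String endMarker) {
        int start = html.indexOf(startMarker);
        if (start < 0) {
            return html;
        }
        int end = html.indexOf(endMarker, start + startMarker.length());
        if (end < 0) {
            end = html.length();
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) {
        // Made-up page; the marker strings are assumptions about its layout.
        String page = "<html><body><div id=\"nav\">menu</div>"
                + "<div id=\"content\"><p>Hello</p></div>"
                + "<div id=\"footer\">bye</div></body></html>";
        String fragment = extractContent(page, "<div id=\"content\">", "<div id=\"footer\">");
        System.out.println(fragment);
    }
}
```

On Android, the resulting String could then be handed to an embedded browser view, for example with `WebView.loadData(fragment, "text/html", "UTF-8")`. The obvious downside is that this breaks as soon as the site changes its markup, so treat it as a workaround rather than a replacement for a real parser.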

3 comments:

  1. How did you get them to work on Android? I tried compiling a few but couldn't get them to work.

  2. I would recommend Jsoup. It is an excellent HTML parser that takes care of a lot of the issues that come along with HTML parsing.

    http://theandroidcoder.com/utilities/android-parsing-html-with-jsoup/
