On Android, I have a WebView
that is displaying a page.
How do I get the page source without requesting the page again?
It seems WebView
should have some kind of getPageSource()
method that returns a string, but alas it does not.
If I enable JavaScript, what is the appropriate JavaScript to put in this call to get the contents?
webview.loadUrl("javascript:(function() { " +
"document.getElementsByTagName('body')[0].style.color = 'red'; " +
"})()");
This question is related to
android
android-webview
This is an answer based on jluckyiv's, but I think it is better and simpler to change Javascript as follows.
browser.loadUrl("javascript:HTMLOUT.processHTML(document.documentElement.outerHTML);");
If you are working on kitkat and above, you can use the chrome remote debugging tools to find all the requests and responses going in and out of your webview and also the the html source code of the page viewed.
I managed to get this working using the code from @jluckyiv's answer but I had to add in @JavascriptInterface annotation to the processHTML method in the MyJavaScriptInterface.
class MyJavaScriptInterface
{
@SuppressWarnings("unused")
@JavascriptInterface
public void processHTML(String html)
{
// process the html as needed by the app
}
}
You also need to annotate the method with @JavascriptInterface if your targetSdkVersion is >= 17 - because there is new security requirements in SDK 17, i.e. all javascript methods must be annotated with @JavascriptInterface. Otherwise you will see error like: Uncaught TypeError: Object [object Object] has no method 'processHTML' at null:1
Have you considered fetching the HTML separately, and then loading it into a webview?
String fetchContent(WebView view, String url) throws IOException {
HttpClient httpClient = new DefaultHttpClient();
HttpGet get = new HttpGet(url);
HttpResponse response = httpClient.execute(get);
StatusLine statusLine = response.getStatusLine();
int statusCode = statusLine.getStatusCode();
HttpEntity entity = response.getEntity();
String html = EntityUtils.toString(entity); // assume html for simplicity
view.loadDataWithBaseURL(url, html, "text/html", "utf-8", url); // todo: get mime, charset from entity
if (statusCode != 200) {
// handle fail
}
return html;
}
Per issue 12987, Blundell's answer crashes (at least on my 2.3 VM). Instead, I intercept a call to console.log with a special prefix:
// intercept calls to console.log
web.setWebChromeClient(new WebChromeClient() {
public boolean onConsoleMessage(ConsoleMessage cmsg)
{
// check secret prefix
if (cmsg.message().startsWith("MAGIC"))
{
String msg = cmsg.message().substring(5); // strip off prefix
/* process HTML */
return true;
}
return false;
}
});
// inject the JavaScript on page load
web.setWebViewClient(new WebViewClient() {
public void onPageFinished(WebView view, String address)
{
// have the page spill its guts, with a secret prefix
view.loadUrl("javascript:console.log('MAGIC'+document.getElementsByTagName('html')[0].innerHTML);");
}
});
web.loadUrl("http://www.google.com");
Source: Stackoverflow.com