[java] Read url to string in few lines of java code

I'm trying to find Java's equivalent to Groovy's:

String content = "http://www.google.com".toURL().getText();

I want to read content from a URL into string. I don't want to pollute my code with buffered streams and loops for such a simple task. I looked into apache's HttpClient but I also don't see a one or two line implementation.

This question is related to java http url

The answer is


Or just use Apache Commons IOUtils.toString(URL url), or the variant that also accepts an encoding parameter.


URL to String in pure Java

Example call

 String str = getStringFromUrl("YourUrl");

Implementation

You can use the method described in this answer, on How to read URL to an InputStream and combine it with this answer on How to read InputStream to String.

The outcome will be something like

public String getStringFromUrl(URL url) throws IOException {
        return inputStreamToString(urlToInputStream(url,null));
}

public String inputStreamToString(InputStream inputStream) throws IOException {
    try(ByteArrayOutputStream result = new ByteArrayOutputStream()) {
        byte[] buffer = new byte[1024];
        int length;
        while ((length = inputStream.read(buffer)) != -1) {
            result.write(buffer, 0, length);
        }

        return result.toString(UTF_8);
    }
}

private InputStream urlToInputStream(URL url, Map<String, String> args) {
    HttpURLConnection con = null;
    InputStream inputStream = null;
    try {
        con = (HttpURLConnection) url.openConnection();
        con.setConnectTimeout(15000);
        con.setReadTimeout(15000);
        if (args != null) {
            for (Entry<String, String> e : args.entrySet()) {
                con.setRequestProperty(e.getKey(), e.getValue());
            }
        }
        con.connect();
        int responseCode = con.getResponseCode();
        /* By default the connection will follow redirects. The following
         * block is only entered if the implementation of HttpURLConnection
         * does not perform the redirect. The exact behavior depends to 
         * the actual implementation (e.g. sun.net).
         * !!! Attention: This block allows the connection to 
         * switch protocols (e.g. HTTP to HTTPS), which is <b>not</b> 
         * default behavior. See: https://stackoverflow.com/questions/1884230 
         * for more info!!!
         */
        if (responseCode < 400 && responseCode > 299) {
            String redirectUrl = con.getHeaderField("Location");
            try {
                URL newUrl = new URL(redirectUrl);
                return urlToInputStream(newUrl, args);
            } catch (MalformedURLException e) {
                URL newUrl = new URL(url.getProtocol() + "://" + url.getHost() + redirectUrl);
                return urlToInputStream(newUrl, args);
            }
        }
        /*!!!!!*/

        inputStream = con.getInputStream();
        return inputStream;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Pros

  • It is pure java

  • It can be easily enhanced by adding different headers (instead of passing a null object, like the example above does), authentication, etc.

  • Handling of protocol switches is supported


Additional example using Guava:

URL xmlData = ...
String data = Resources.toString(xmlData, Charsets.UTF_8);

Here's Jeanne's lovely answer, but wrapped in a tidy function for muppets like me:

private static String getUrl(String aUrl) throws MalformedURLException, IOException
{
    String urlData = "";
    URL urlObj = new URL(aUrl);
    URLConnection conn = urlObj.openConnection();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) 
    {
        urlData = reader.lines().collect(Collectors.joining("\n"));
    }
    return urlData;
}

Now that some time has passed since the original answer was accepted, there's a better approach:

String out = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8").useDelimiter("\\A").next();

If you want a slightly fuller implementation, which is not a single line, do this:

public static String readStringFromURL(String requestURL) throws IOException
{
    try (Scanner scanner = new Scanner(new URL(requestURL).openStream(),
            StandardCharsets.UTF_8.toString()))
    {
        scanner.useDelimiter("\\A");
        return scanner.hasNext() ? scanner.next() : "";
    }
}

The following works with Java 7/8, secure urls, and shows how to add a cookie to your request as well. Note this is mostly a direct copy of this other great answer on this page, but added the cookie example, and clarification in that it works with secure urls as well ;-)

If you need to connect to a server with an invalid certificate or self signed certificate, this will throw security errors unless you import the certificate. If you need this functionality, you could consider the approach detailed in this answer to this related question on StackOverflow.

Example

String result = getUrlAsString("https://www.google.com");
System.out.println(result);

outputs

<!doctype html><html itemscope="" .... etc

Code

import java.net.URL;
import java.net.URLConnection;
import java.io.BufferedReader;
import java.io.InputStreamReader;

public static String getUrlAsString(String url)
{
    try
    {
        URL urlObj = new URL(url);
        URLConnection con = urlObj.openConnection();

        con.setDoOutput(true); // we want the response 
        con.setRequestProperty("Cookie", "myCookie=test123");
        con.connect();

        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));

        StringBuilder response = new StringBuilder();
        String inputLine;

        String newLine = System.getProperty("line.separator");
        while ((inputLine = in.readLine()) != null)
        {
            response.append(inputLine + newLine);
        }

        in.close();

        return response.toString();
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}

This answer refers to an older version of Java. You may want to look at ccleve's answer.


Here is the traditional way to do this:

import java.net.*;
import java.io.*;

public class URLConnectionReader {
    public static String getText(String url) throws Exception {
        URL website = new URL(url);
        URLConnection connection = website.openConnection();
        BufferedReader in = new BufferedReader(
                                new InputStreamReader(
                                    connection.getInputStream()));

        StringBuilder response = new StringBuilder();
        String inputLine;

        while ((inputLine = in.readLine()) != null) 
            response.append(inputLine);

        in.close();

        return response.toString();
    }

    public static void main(String[] args) throws Exception {
        String content = URLConnectionReader.getText(args[0]);
        System.out.println(content);
    }
}

As @extraneon has suggested, ioutils allows you to do this in a very eloquent way that's still in the Java spirit:

 InputStream in = new URL( "http://jakarta.apache.org" ).openStream();

 try {
   System.out.println( IOUtils.toString( in ) );
 } finally {
   IOUtils.closeQuietly(in);
 }

There's an even better way as of Java 9:

URL u = new URL("http://www.example.com/");
try (InputStream in = u.openStream()) {
    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}

Like the original groovy example, this assumes that the content is UTF-8 encoded. (If you need something more clever than that, you need to create a URLConnection and use it to figure out the encoding.)


If you have the input stream (see Joe's answer) also consider ioutils.toString( inputstream ).

http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)


Java 11+:

URI uri = URI.create("http://www.google.com");
HttpRequest request = HttpRequest.newBuilder(uri).build();
String content = HttpClient.newHttpClient().send(request, BodyHandlers.ofString()).body();

If your url is of type java.net.URL, then you can just convert it to uri and then to string

url.toURI().toString()

Now that more time has passed, here's a way to do it in Java 8:

URLConnection conn = url.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    pageText = reader.lines().collect(Collectors.joining("\n"));
}

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to http

Access blocked by CORS policy: Response to preflight request doesn't pass access control check Axios Delete request with body and headers? Read response headers from API response - Angular 5 + TypeScript Android 8: Cleartext HTTP traffic not permitted Angular 4 HttpClient Query Parameters Load json from local file with http.get() in angular 2 Angular 2: How to access an HTTP response body? What is HTTP "Host" header? Golang read request body Angular 2 - Checking for server errors from subscribe

Examples related to url

What is the difference between URL parameters and query strings? Allow Access-Control-Allow-Origin header using HTML5 fetch API File URL "Not allowed to load local resource" in the Internet Browser Slack URL to open a channel from browser Getting absolute URLs using ASP.NET Core How do I load an HTTP URL with App Transport Security enabled in iOS 9? Adding form action in html in laravel React-router urls don't work when refreshing or writing manually URL for public Amazon S3 bucket How can I append a query parameter to an existing URL?