Reading entire html file to String

Question

Are there better ways to read an entire html file to a single string variable than:

    String content = "";
    try {
        BufferedReader in = new BufferedReader(new FileReader("mypage.html"));
        String str;
        while ((str = in.readLine()) != null) {
            content += str;
        }
        in.close();
    } catch (IOException e) {
    }

User · Answer

As Jean mentioned, using a StringBuilder instead of += would be better. But if you're looking for something simpler, Guava, IOUtils, and Jsoup are all good options.

Example with Guava:

    String content = Files.asCharSource(new File("/path/to/mypage.html"), StandardCharsets.UTF_8).read();

Example with IOUtils:

    InputStream in = new URL("file:///path/to/mypage.html").openStream();
    String content;
    try {
        content = IOUtils.toString(in, StandardCharsets.UTF_8);
    } finally {
        IOUtils.closeQuietly(in);
    }

Example with Jsoup:

    String content = Jsoup.parse(new File("/path/to/mypage.html"), "UTF-8").toString();

or

    String content = Jsoup.parse(new File("/path/to/mypage.html"), "UTF-8").outerHtml();

NOTES

Files.readLines() and Files.toString(): These are deprecated as of Guava release version 22.0 (May 22, 2017). Files.asCharSource() should be used instead, as seen in the example above (see the version 22.0 release diffs).

IOUtils.toString(InputStream) and Charsets.UTF_8: Deprecated as of Apache Commons-IO version 2.5 (May 6, 2016). IOUtils.toString should now be passed the InputStream and the Charset, as seen in the example above. Java 7's StandardCharsets should be used instead of Charsets, as seen in the example above (see the deprecated Charsets.UTF_8).

User · Answer

You can use Jsoup. It's a very capable HTML parser for Java.

User · Answer

I prefer using Guava:

    import com.google.common.base.Charsets;
    import com.google.common.io.Files;

    File file = new File("/path/to/file");
    String content = Files.toString(file, Charsets.UTF_8);

User · Answer

Here's a solution to retrieve the html of a webpage using only standard java libraries:

    import java.io.*;
    import java.net.*;

    String urlToRead = "https://google.com";
    URL url; // The URL to read
    HttpURLConnection conn; // The actual connection to the web page
    BufferedReader rd; // Used to read results from the web page
    String line; // An individual line of the web page HTML
    String result = ""; // A long string containing all the HTML
    try {
        url = new URL(urlToRead);
        conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        while ((line = rd.readLine()) != null) {
            result += line;
        }
        rd.close();
    } catch (Exception e) {
        e.printStackTrace();
    }

    System.out.println(result);
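The read loop in this answer works for any InputStream, not only an HTTP connection. As a rough sketch, it could be factored into a helper. The class name `StreamToString` and the method `readAll` are hypothetical names for illustration, and UTF-8 is an assumption here (a real page may declare a different charset). An in-memory stream stands in for `conn.getInputStream()` so the example runs without network access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class StreamToString {

    // Hypothetical helper: drains 'in' into a String, decoding it as UTF-8 (an assumption).
    static String readAll(InputStream in) throws IOException {
        StringBuilder result = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream) even on failure.
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = rd.readLine()) != null) {
                // readLine() strips the line terminator, so put one back.
                result.append(line).append('\n');
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for conn.getInputStream(); no network access is needed.
        InputStream fake = new ByteArrayInputStream(
                "<html>\n</html>".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(fake));
    }
}
```

With a real connection, the call would presumably look like `readAll(conn.getInputStream())`.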

User · Answer

There's the IOUtils.toString() utility from Apache Commons. If you're using Guava there's also Files.readLines() and Files.toString().

User · Answer

You should use a StringBuilder:

    StringBuilder contentBuilder = new StringBuilder();
    try {
        BufferedReader in = new BufferedReader(new FileReader("mypage.html"));
        String str;
        while ((str = in.readLine()) != null) {
            contentBuilder.append(str);
        }
        in.close();
    } catch (IOException e) {
    }
    String content = contentBuilder.toString();
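A self-contained variant of the same StringBuilder loop, offered as a minimal sketch rather than the answer's exact code. It assumes Java 7 or later (for try-with-resources and `java.nio.file.Files`), and the names `HtmlFileReader` and `readFile` are hypothetical. Two things differ from the snippet in this answer: the reader is closed even if reading fails, and the line separator that `readLine()` strips is appended back.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class HtmlFileReader {

    // Reads the whole file at 'path' into one String, decoding it as UTF-8 (an assumption).
    static String readFile(String path) throws IOException {
        StringBuilder contentBuilder = new StringBuilder();
        // try-with-resources closes the reader even when readLine() throws.
        try (BufferedReader in =
                Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String str;
            while ((str = in.readLine()) != null) {
                // readLine() drops the line terminator, so add one back.
                contentBuilder.append(str).append('\n');
            }
        }
        return contentBuilder.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so the demo does not depend on mypage.html existing.
        Path tmp = Files.createTempFile("mypage", ".html");
        Files.write(tmp, "<html>\n<body>hi</body>\n</html>".getBytes(StandardCharsets.UTF_8));
        System.out.print(readFile(tmp.toString()));
        Files.delete(tmp);
    }
}
```

Unlike the empty catch block in the original loop, this version lets the IOException propagate so the caller can decide how to handle a missing or unreadable file.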

User · Answer

For accumulating blocks of string data, use the StringBuilder or StringBuffer classes. Do not use += on String objects: String is immutable, so each += creates a new String object at runtime, and doing this in a loop hurts performance. Use the append() method of a StringBuilder or StringBuffer instance instead.
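To make the difference concrete, here is a small sketch that builds the same string both ways. The class and method names (`ConcatDemo`, `joinWithPlus`, `joinWithBuilder`) are made up for illustration. Both methods return identical text; the difference is only in how many intermediate objects are created along the way.

```java
class ConcatDemo {

    // Uses +=: every iteration copies the text accumulated so far into a brand-new String.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // Uses one StringBuilder: append() writes into a growable internal buffer,
    // and a single String is created at the end by toString().
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"<html>", "<body>", "</body>", "</html>"};
        // Same output either way; only the allocation behavior differs.
        System.out.println(joinWithPlus(parts).equals(joinWithBuilder(parts))); // prints "true"
    }
}
```

For a handful of short strings the cost is negligible; it starts to matter when the loop runs once per line of a large file, as in the question.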
