[java] How to read all rows from huge table?

I have a problem with processing all rows from database (PostgreSQL). I get an error: org.postgresql.util.PSQLException: Ran out of memory retrieving query results. I think that I need to read all rows in small pieces, but it doesn't work - it reads only 100 rows (code below). How to do that?

    int i = 0;      
    Statement s = connection.createStatement();
    s.setMaxRows(100); // bacause of: org.postgresql.util.PSQLException: Ran out of memory retrieving query results.
    ResultSet rs = s.executeQuery("select * from " + tabName);      
    for (;;) {
        while (rs.next()) {
            i++;
            // do something...
        }
        if ((s.getMoreResults() == false) && (s.getUpdateCount() == -1)) {
            break;
        }           
    }

This question is related to java postgresql jdbc

The answer is


I think your question is similar to this thread: JDBC Pagination which contains solutions for your need.

In particular, for PostgreSQL, you can use the LIMIT and OFFSET keywords in your request: http://www.petefreitag.com/item/451.cfm

PS: In Java code, I suggest you to use PreparedStatement instead of simple Statements: http://download.oracle.com/javase/tutorial/jdbc/basics/prepared.html


At lest in my case the problem was on the client that tries to fetch the results.

Wanted to get a .csv with ALL the results.

I found the solution by using

psql -U postgres -d dbname  -c "COPY (SELECT * FROM T) TO STDOUT WITH DELIMITER ','"

(where dbname the name of the db...) and redirecting to a file.


So it turns out that the crux of the problem is that by default, Postgres starts in "autoCommit" mode, and also it needs/uses cursors to be able to "page" through data (ex: read the first 10K results, then the next, then the next), however cursors can only exist within a transaction. So the default is to read all rows, always, into RAM, and then allow your program to start processing "the first result row, then the second" after it has all arrived, for two reasons, it's not in a transaction (so cursors don't work), and also a fetch size hasn't been set.

So how the psql command line tool achieves batched response (its FETCH_COUNT setting) for queries, is to "wrap" its select queries within a short-term transaction (if a transaction isn't yet open), so that cursors can work. You can do something like that also with JDBC:

  static void readLargeQueryInChunksJdbcWay(Connection conn, String originalQuery, int fetchCount, ConsumerWithException<ResultSet, SQLException> consumer) throws SQLException {
    boolean originalAutoCommit = conn.getAutoCommit();
    if (originalAutoCommit) {
      conn.setAutoCommit(false); // start temp transaction
    }
    try (Statement statement = conn.createStatement()) {
      statement.setFetchSize(fetchCount);
      ResultSet rs = statement.executeQuery(originalQuery);
      while (rs.next()) {
        consumer.accept(rs); // or just do you work here
      }
    } finally {
      if (originalAutoCommit) {
        conn.setAutoCommit(true); // reset it, also ends (commits) temp transaction
      }
    }
  }
  @FunctionalInterface
  public interface ConsumerWithException<T, E extends Exception> {
    void accept(T t) throws E;
  }

This gives the benefit of requiring less RAM, and, in my results, seemed to run overall faster, even if you don't need to save the RAM. Weird. It also gives the benefit that your processing of the first row "starts faster" (since it process it a page at a time).

And here's how to do it the "raw postgres cursor" way, along with full demo code, though in my experiments it seemed the JDBC way, above, was slightly faster for whatever reason.

Another option would be to have autoCommit mode off, everywhere, though you still have to always manually specify a fetchSize for each new Statement (or you can set a default fetch size in the URL string).


The short version is, call stmt.setFetchSize(50); and conn.setAutoCommit(false); to avoid reading the entire ResultSet into memory.

Here's what the docs say:

Getting results based on a cursor

By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

A small number of rows are cached on the client side of the connection and when exhausted the next block of rows is retrieved by repositioning the cursor.

Note:

  • Cursor based ResultSets cannot be used in all situations. There a number of restrictions which will make the driver silently fall back to fetching the whole ResultSet at once.

  • The connection to the server must be using the V3 protocol. This is the default for (and is only supported by) server versions 7.4 and later.-

  • The Connection must not be in autocommit mode. The backend closes cursors at the end of transactions, so in autocommit mode the backend will have closed the cursor before anything can be fetched from it.-

  • The Statement must be created with a ResultSet type of ResultSet.TYPE_FORWARD_ONLY. This is the default, so no code will need to be rewritten to take advantage of this, but it also means that you cannot scroll backwards or otherwise jump around in the ResultSet.-

  • The query given must be a single statement, not multiple statements strung together with semicolons.

Example 5.2. Setting fetch size to turn cursors on and off.

Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).

// make sure autocommit is off
conn.setAutoCommit(false);
Statement st = conn.createStatement();

// Turn use of the cursor on.
st.setFetchSize(50);
ResultSet rs = st.executeQuery("SELECT * FROM mytable");
while (rs.next()) {
   System.out.print("a row was returned.");
}
rs.close();

// Turn the cursor off.
st.setFetchSize(0);
rs = st.executeQuery("SELECT * FROM mytable");
while (rs.next()) {
   System.out.print("many rows were returned.");
}
rs.close();

// Close the statement.
st.close();


I did it like below. Not the best way i think, but it works :)

    Connection c = DriverManager.getConnection("jdbc:postgresql://....");
    PreparedStatement s = c.prepareStatement("select * from " + tabName + " where id > ? order by id");
    s.setMaxRows(100);
    int lastId = 0;
    for (;;) {
        s.setInt(1, lastId);
        ResultSet rs = s.executeQuery();

        int lastIdBefore = lastId;
        while (rs.next()) {
            lastId = Integer.parseInt(rs.getObject(1).toString());
            // ...
        }

        if (lastIdBefore == lastId) {
            break;
        }
    }

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to postgresql

Subtracting 1 day from a timestamp date pgadmin4 : postgresql application server could not be contacted. Psql could not connect to server: No such file or directory, 5432 error? How to persist data in a dockerized postgres database using volumes input file appears to be a text format dump. Please use psql Postgres: check if array field contains value? Add timestamp column with default NOW() for new rows only Can't connect to Postgresql on port 5432 How to insert current datetime in postgresql insert query Connecting to Postgresql in a docker container from outside

Examples related to jdbc

Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver' Hibernate Error executing DDL via JDBC Statement Unable to create requested service [org.hibernate.engine.jdbc.env.spi.JdbcEnvironment] MySQL JDBC Driver 5.1.33 - Time Zone Issue Spring-Boot: How do I set JDBC pool properties like maximum number of connections? Where can I download mysql jdbc jar from? Print the data in ResultSet along with column names How to set up datasource with Spring for HikariCP? java.lang.ClassNotFoundException: sun.jdbc.odbc.JdbcOdbcDriver Exception occurring. Why? java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/dbname