How to read all rows from a huge table?

Question

I have a problem with processing all rows from a database (PostgreSQL). I get an error: org.postgresql.util.PSQLException: Ran out of memory retrieving query results. I think I need to read all rows in small pieces, but it doesn't work: it reads only 100 rows (code below). How do I do that?

    int i = 0;
    Statement s = connection.createStatement();
    s.setMaxRows(100); // because of: org.postgresql.util.PSQLException: Ran out of memory retrieving query results.
    ResultSet rs = s.executeQuery("select * from " + tabName);
    for (;;) {
        while (rs.next()) {
            i++;
            // do something...
        }
        if ((s.getMoreResults() == false) && (s.getUpdateCount() == -1)) {
            break;
        }
    }

User · Accepted Answer

Use a CURSOR in PostgreSQL, or let the JDBC driver handle this for you. LIMIT and OFFSET will get slow when handling large datasets, because the server still has to read past every skipped row for each page.
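A minimal sketch of the explicit-cursor route in plain JDBC, assuming PostgreSQL's DECLARE / FETCH / CLOSE commands. The class name CursorReader, the cursor name big_cur and the batch size are hypothetical choices for illustration. The query and cursor name are concatenated into SQL, so they must come from trusted code, not user input. If the connection is already inside a transaction, the sketch leaves commit/rollback to the caller.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class: reads a large result through an explicit server-side cursor.
class CursorReader {

    // Builds the FETCH command for one batch (cursorName must be a trusted identifier).
    static String fetchSql(String cursorName, int batchSize) {
        return "FETCH FORWARD " + batchSize + " FROM " + cursorName;
    }

    // Reads every row of `query` in batches of `batchSize`; returns the number of rows read.
    static long readWithCursor(Connection conn, String query, int batchSize) throws SQLException {
        boolean startedTx = conn.getAutoCommit();
        if (startedTx) {
            conn.setAutoCommit(false); // a cursor only lives inside a transaction
        }
        long total = 0;
        try (Statement st = conn.createStatement()) {
            st.execute("DECLARE big_cur NO SCROLL CURSOR FOR " + query);
            while (true) {
                int inBatch = 0;
                try (ResultSet rs = st.executeQuery(fetchSql("big_cur", batchSize))) {
                    while (rs.next()) {
                        inBatch++;
                        // process the current row here
                    }
                }
                total += inBatch;
                if (inBatch < batchSize) {
                    break; // a short (or empty) batch means the cursor is exhausted
                }
            }
            st.execute("CLOSE big_cur");
            if (startedTx) {
                conn.commit();
            }
        } catch (SQLException e) {
            if (startedTx) {
                conn.rollback();
            }
            throw e;
        } finally {
            if (startedTx) {
                conn.setAutoCommit(true); // restore the caller's autocommit mode
            }
        }
        return total;
    }
}
```

Only batchSize rows are held in client memory at a time; the trade-off is one round trip per batch.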

User · Answer

At least in my case, the problem was in the client that tries to fetch the results. I wanted to get a .csv with ALL the results. I found the solution by using

    psql -U postgres -d dbname -c "COPY (SELECT * FROM T) TO STDOUT WITH DELIMITER ','"

(where dbname is the name of the database) and redirecting the output to a file.

User · Answer

I think your question is similar to this thread: JDBC Pagination, which contains solutions for your need. In particular, for PostgreSQL, you can use the LIMIT and OFFSET keywords in your query: http://www.petefreitag.com/item/451.cfm

PS: In Java code, I suggest you use PreparedStatement instead of plain Statement: http://download.oracle.com/javase/tutorial/jdbc/basics/prepared.html
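A minimal sketch of the LIMIT/OFFSET approach with a PreparedStatement, assuming the table has a column with a stable, unique ordering (the orderColumn parameter). The class and method names are hypothetical. Without ORDER BY, consecutive pages are not guaranteed to be disjoint. Table and column names cannot be bound as ? parameters, so they are concatenated and must be trusted; only the page size and offset are bound.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper class: pages through a table with LIMIT/OFFSET.
class OffsetPager {

    // Table and column names cannot be bound as parameters, so they must be trusted identifiers.
    static String pageSql(String table, String orderColumn) {
        return "SELECT * FROM " + table + " ORDER BY " + orderColumn + " LIMIT ? OFFSET ?";
    }

    // Reads every page of `pageSize` rows; returns the total number of rows read.
    static long readAllPages(Connection conn, String table, String orderColumn, int pageSize)
            throws SQLException {
        long total = 0;
        try (PreparedStatement ps = conn.prepareStatement(pageSql(table, orderColumn))) {
            for (long offset = 0; ; offset += pageSize) {
                ps.setInt(1, pageSize);
                ps.setLong(2, offset);
                int rowsInPage = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rowsInPage++;
                        // process the current row here
                    }
                }
                total += rowsInPage;
                if (rowsInPage < pageSize) {
                    return total; // last (partial or empty) page reached
                }
            }
        }
    }
}
```

As the accepted answer points out, each later page makes the server walk past all OFFSET rows again, so this gets slower as the offset grows; the WHERE id > ? variant in the last answer avoids that.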

User · Answer

The short version is: call stmt.setFetchSize(50); and conn.setAutoCommit(false); to avoid reading the entire ResultSet into memory.

Here's what the docs say:

Getting results based on a cursor

By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

A small number of rows are cached on the client side of the connection and when exhausted the next block of rows is retrieved by repositioning the cursor.

Note:

Cursor based ResultSets cannot be used in all situations. There are a number of restrictions which will make the driver silently fall back to fetching the whole ResultSet at once:

- The connection to the server must be using the V3 protocol. This is the default for (and is only supported by) server versions 7.4 and later.
- The Connection must not be in autocommit mode. The backend closes cursors at the end of transactions, so in autocommit mode the backend will have closed the cursor before anything can be fetched from it.
- The Statement must be created with a ResultSet type of ResultSet.TYPE_FORWARD_ONLY. This is the default, so no code will need to be rewritten to take advantage of this, but it also means that you cannot scroll backwards or otherwise jump around in the ResultSet.
- The query given must be a single statement, not multiple statements strung together with semicolons.

Example 5.2. Setting fetch size to turn cursors on and off.

Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).

    // make sure autocommit is off
    conn.setAutoCommit(false);
    Statement st = conn.createStatement();

    // Turn use of the cursor on.
    st.setFetchSize(50);
    ResultSet rs = st.executeQuery("SELECT * FROM mytable");
    while (rs.next()) {
        System.out.print("a row was returned.");
    }
    rs.close();

    // Turn the cursor off.
    st.setFetchSize(0);
    rs = st.executeQuery("SELECT * FROM mytable");
    while (rs.next()) {
        System.out.print("many rows were returned.");
    }
    rs.close();

    // Close the statement.
    st.close();

User · Answer

So it turns out that the crux of the problem is that, by default, a JDBC connection to Postgres is in "autoCommit" mode, and the driver needs cursors to be able to "page" through data (e.g. read the first 10K results, then the next, then the next), but cursors can only exist within a transaction. So the default is to always read all rows into RAM, and only after they have all arrived let your program start processing "the first result row, then the second". There are two reasons for this: it's not in a transaction (so cursors don't work), and a fetch size hasn't been set.

The way the psql command line tool achieves batched responses for queries (its FETCH_COUNT setting) is to "wrap" its SELECT queries in a short-term transaction (if a transaction isn't already open), so that cursors can work. You can do something like that with JDBC too:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    // inside some class:
    static void readLargeQueryInChunksJdbcWay(Connection conn, String originalQuery, int fetchCount,
            ConsumerWithException<ResultSet, SQLException> consumer) throws SQLException {
        boolean originalAutoCommit = conn.getAutoCommit();
        if (originalAutoCommit) {
            conn.setAutoCommit(false); // start temp transaction
        }
        try (Statement statement = conn.createStatement()) {
            statement.setFetchSize(fetchCount);
            ResultSet rs = statement.executeQuery(originalQuery);
            while (rs.next()) {
                consumer.accept(rs); // or just do your work here
            }
        } finally {
            if (originalAutoCommit) {
                conn.setAutoCommit(true); // reset it, also ends (commits) temp transaction
            }
        }
    }

    @FunctionalInterface
    public interface ConsumerWithException<T, E extends Exception> {
        void accept(T t) throws E;
    }

This gives the benefit of requiring less RAM, and in my results it also seemed to run faster overall, even if you don't need to save the RAM. Weird. It also means that processing of the first row "starts faster", since the results arrive a page at a time.

And here's how to do it the "raw postgres cursor" way, along with full demo code, though in my experiments the JDBC way (above) seemed slightly faster for whatever reason.

Another option would be to have autoCommit mode off everywhere, though you still have to manually specify a fetchSize for each new Statement (or you can set a default fetch size in the URL string).
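A minimal sketch of that last option, assuming PgJDBC's defaultRowFetchSize connection parameter, which sets the fetch size every new Statement starts with. The class and method names and the example URL are hypothetical. Autocommit still has to be switched off, otherwise the driver silently falls back to fetching the whole result at once.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class: opens connections whose statements fetch results in batches by default.
class DefaultFetchSizeConnections {

    // Appends PgJDBC's defaultRowFetchSize parameter to a JDBC URL.
    static String withDefaultFetchSize(String url, int rows) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "defaultRowFetchSize=" + rows;
    }

    // Opens a connection in which every new Statement fetches `rows` rows per round trip.
    static Connection open(String url, String user, String password, int rows) throws SQLException {
        Connection conn = DriverManager.getConnection(withDefaultFetchSize(url, rows), user, password);
        conn.setAutoCommit(false); // cursor-based fetching needs an open transaction
        return conn;
    }

    // No setFetchSize call here: the statement inherits the default from the URL.
    static long countRows(Connection conn, String query) throws SQLException {
        long n = 0;
        try (Statement st = conn.createStatement(); ResultSet rs = st.executeQuery(query)) {
            while (rs.next()) {
                n++;
            }
        }
        return n;
    }
}
```

Usage might look like open("jdbc:postgresql://localhost:5432/mydb", "user", "secret", 1000), with the caller committing or closing the connection when done, since the transaction stays open after countRows returns.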

User · Answer

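I did it like below. Not the best way, I think, but it works:

    Connection c = DriverManager.getConnection("jdbc:postgresql://...");
    PreparedStatement s = c.prepareStatement("select * from " + tabName + " where id > ? order by id");
    s.setMaxRows(100); // at most 100 rows per round
    int lastId = 0;
    for (;;) {
        s.setInt(1, lastId);
        int lastIdBefore = lastId;
        try (ResultSet rs = s.executeQuery()) {
            while (rs.next()) {
                lastId = rs.getInt("id"); // remember the highest id seen so far
                // process the row here
            }
        }
        if (lastIdBefore == lastId) {
            break; // no new rows: the whole table has been read
        }
    }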
