Fast way to discover the row count of a table in PostgreSQL

Question

I need to know the number of rows in a table to calculate a percentage. If the total count is greater than some predefined constant, I will use the constant value. Otherwise, I will use the actual number of rows.

I can use SELECT count(*) FROM table. But if my constant value is 500,000 and I have 5,000,000,000 rows in my table, counting all rows will waste a lot of time.

Is it possible to stop counting as soon as my constant value is surpassed?

I need the exact number of rows only as long as it's below the given limit. Otherwise, if the count is above the limit, I use the limit value instead and want the answer as fast as possible.

Something like this:

SELECT text,count(*), percentual_calculus()  
FROM token  
GROUP BY text  
ORDER BY count DESC;

User · Answer

I did this once in a postgres app by running:

EXPLAIN SELECT * FROM foo;

Then examining the output with a regex, or similar logic. For a simple SELECT *, the first line of output should look something like this:

Seq Scan on uids  (cost=0.00..1.21 rows=8 width=75)

You can use the rows=(\d+) value as a rough estimate of the number of rows that would be returned, then only do the actual SELECT COUNT(*) if the estimate is, say, less than 1.5x your threshold (or whatever number you deem makes sense for your application).

Depending on the complexity of your query, this number may become less and less accurate. In fact, in my application, as we added joins and complex conditions, it became so inaccurate it was completely worthless, even to know how within a power of 100 how many rows we'd have returned, so we had to abandon that strategy.

But if your query is simple enough that Pg can predict within some reasonable margin of error how many rows it will return, it may work for you.

User · Answer

You can get the count by the below query  without   or any column names     select from table name

User · Answer

Counting rows in big tables is known to be slow in PostgreSQL  To get a precise number it has to do a full count of rows due to the nature of MVCC  There is a way to speed this up dramatically if the count does not have to be exact like it seems to be in your case   Instead of getting the exact count  slow with big tables    SELECT count    AS exact count FROM myschema mytable    You get a close estimate like this  extremely fast    SELECT reltuples  bigint AS estimate FROM pg class where relname  mytable     How close the estimate is depends on whether you run ANALYZE enough  It is usually very close  See the PostgreSQL Wiki FAQ  Or the dedicated wiki page for count    performance   Better yet  The article in the PostgreSQL Wiki is was a bit sloppy  It ignored the possibility that there can be multiple tables of the same name in one database - in different schemas  To account for that   SELECT c reltuples  bigint AS estimate FROM   pg class c JOIN   pg namespace n ON n oid   c relnamespace WHERE  c relname    mytable  AND    n nspname    myschema    Or better still  SELECT reltuples  bigint AS estimate FROM   pg class WHERE  oid    myschema mytable   regclass    Faster  simpler  safer  more elegant  See the manual on Object Identifier Types   Use to regclass  myschema mytable   in Postgres 9 4  to avoid exceptions for invalid table names    How to check if a table exists in a given schema     TABLESAMPLE SYSTEM  n  in Postgres 9 5   SELECT 100   count    AS estimate FROM mytable TABLESAMPLE SYSTEM  1     Like  a horse commented  the newly added clause for the SELECT command might be useful if statistics in pg class are not current enough for some reason  For example    No autovacuum running  Immediately after a big INSERT or DELETE  TEMPORARY tables  which are not covered by autovacuum     This only looks at a random n    1 in the example  selection of blocks and counts rows in it  A bigger sample increases the cost and reduces the error  your pick  Accuracy depends on more factors    Distribution of row size  If a given block happens to hold wider than usual rows  the count is lower than usual etc  Dead tuples or a FILLFACTOR occupy space per block  If unevenly distributed across the table  the estimate may be off  General rounding errors    In most cases the estimate from pg class will be faster and more accurate   Answer to actual question     First  I need to know the number of rows in that table  if the total   count is greater than some predefined constant    And whether it             is possible at the moment the count pass my constant value  it will   stop the counting  and not wait to finish the counting to inform the   row count is greater     Yes  You can use a subquery with LIMIT   SELECT count    FROM  SELECT 1 FROM token LIMIT 500000  t    Postgres actually stops counting beyond the given limit  you get an exact and current count for up to n rows  500000 in the example   and n otherwise  Not nearly as fast as the estimate in pg class  though

User · Answer

Reference taken from this Blog    You can use below to query to find row count    Using pg class       SELECT reltuples  bigint AS EstimatedCount     FROM   pg class     WHERE  oid    public TableName   regclass    Using pg stat user tables   SELECT      schemaname      relname      n live tup AS EstimatedCount  FROM pg stat user tables  ORDER BY n live tup DESC

User · Answer

In Oracle  you could use rownum to limit the number of rows returned  I am guessing similar construct exists in other SQLs as well  So  for the example you gave  you could limit the number of rows returned to 500001 and apply a count    then   SELECT  case when cnt  gt  500000 then 500000 else cnt end  myCnt FROM  SELECT count    cnt FROM table WHERE rownum lt  500001

User · Answer

How wide is the text column   With a GROUP BY there s not much you can do to avoid a data scan  at least an index scan    I d recommend    If possible  changing the schema to remove duplication of text data  This way the count will happen on a narrow foreign key field in the  many  table  Alternatively  creating a generated column with a HASH of the text  then GROUP BY the hash column  Again  this is to decrease the workload  scan through a narrow column index    Edit     Your original question did not quite match your edit  I m not sure if you re aware that the COUNT  when used with a GROUP BY  will return the count of items per group and not the count of items in the entire table

User · Answer

For SQL Server  2005 or above  a quick and reliable method is   SELECT SUM  row count  FROM sys dm db partition stats WHERE object id OBJECT ID  MyTableName      AND  index id 0 or index id 1     Details about sys dm db partition stats are explained in MSDN  The query adds rows from all parts of a  possibly  partitioned table   index id 0 is an unordered table  Heap  and index id 1 is an ordered table  clustered index   Even faster  but unreliable  methods are detailed here

[sql] Fast way to discover the row count of a table in PostgreSQL

The answer is

Examples related to sql

Examples related to postgresql

Examples related to count

Examples related to row

Tags