GROUP BY and COUNT in PostgreSQL

Question

The query:

    SELECT COUNT(*) as count_all,
           posts.id as post_id
    FROM posts
      INNER JOIN votes ON votes.post_id = posts.id
    GROUP BY posts.id;

Returns n records in PostgreSQL:

     count_all | post_id
    -----------+---------
     1         | 6
     3         | 4
     3         | 5
     3         | 1
     1         | 9
     1         | 10
    (6 rows)

I just want to retrieve the number of records returned: 6.

I used a subquery to achieve what I want, but this doesn't seem optimal:

    SELECT COUNT(*) FROM (
        SELECT COUNT(*) as count_all, posts.id as post_id
        FROM posts
        INNER JOIN votes ON votes.post_id = posts.id
        GROUP BY posts.id
    ) as x;

How would I get the number of records in this context right in PostgreSQL?

User · Accepted Answer

I think you just need COUNT(DISTINCT post_id) FROM votes.

See the "4.2.7. Aggregate Expressions" section in http://www.postgresql.org/docs/current/static/sql-expressions.html

EDIT: Corrected my careless mistake per Erwin's comment.
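As a quick sanity check, the sketch below loads the vote counts shown in the question and compares the subquery from the question with COUNT(DISTINCT post_id). It is only an illustration under stated assumptions: SQLite (through Python's built-in sqlite3 module) stands in for PostgreSQL so that it runs without a server, the table layout is reduced to the columns the queries need, and the helper names are made up for this example.

```python
import sqlite3

# Votes per post, copied from the result set shown in the question.
VOTES_PER_POST = {6: 1, 4: 3, 5: 3, 1: 3, 9: 1, 10: 1}


def build_db():
    """Create an in-memory database with minimal posts/votes tables (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE votes (post_id INTEGER REFERENCES posts(id))")
    # Assumption: posts 1..10 exist, and only some of them have votes.
    conn.executemany("INSERT INTO posts (id) VALUES (?)", [(i,) for i in range(1, 11)])
    for post_id, n in VOTES_PER_POST.items():
        conn.executemany("INSERT INTO votes (post_id) VALUES (?)", [(post_id,)] * n)
    return conn


def count_with_subquery(conn):
    """The approach from the question: count the rows of the grouped result."""
    sql = """
        SELECT COUNT(*) FROM (
            SELECT COUNT(*) AS count_all, posts.id AS post_id
            FROM posts
            INNER JOIN votes ON votes.post_id = posts.id
            GROUP BY posts.id
        ) AS x
    """
    return conn.execute(sql).fetchone()[0]


def count_with_distinct(conn):
    """The approach from this answer: count distinct post ids in votes."""
    return conn.execute("SELECT COUNT(DISTINCT post_id) FROM votes").fetchone()[0]


if __name__ == "__main__":
    conn = build_db()
    print(count_with_subquery(conn), count_with_distinct(conn))
```

Both functions return the same number here because every vote in the sample data points at an existing post.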

User · Answer

Using OVER() and LIMIT 1:

    SELECT COUNT(1) OVER()
    FROM posts
       INNER JOIN votes ON votes.post_id = posts.id
    GROUP BY posts.id
    LIMIT 1;

User · Answer

There is also EXISTS:

    SELECT count(*) AS post_ct
    FROM   posts p
    WHERE  EXISTS (SELECT FROM votes v WHERE v.post_id = p.id);

In Postgres and with multiple entries on the n-side like you probably have, it's generally faster than count(DISTINCT post_id):

    SELECT count(DISTINCT p.id) AS post_ct
    FROM   posts p
    JOIN   votes v ON v.post_id = p.id;

The more rows per post there are in votes, the bigger the difference in performance. Test with EXPLAIN ANALYZE.

count(DISTINCT post_id) has to read all rows, sort or hash them, and then only consider the first per identical set. EXISTS will only scan votes (or, preferably, an index on post_id) until the first match is found.

If every post_id in votes is guaranteed to be present in the table posts (referential integrity enforced with a foreign key constraint), this short form is equivalent to the longer form:

    SELECT count(DISTINCT post_id) AS post_ct
    FROM   votes;

It may actually be faster than the EXISTS query with no or few entries per post.

The query you had works in simpler form, too:

    SELECT count(*) AS post_ct
    FROM  (
        SELECT FROM posts
        JOIN   votes ON votes.post_id = posts.id
        GROUP  BY posts.id
        ) sub;

Benchmark

To verify my claims I ran a benchmark on my test server with limited resources. All in a separate schema.

Test setup

Fake a typical post / vote situation:

    CREATE SCHEMA y;
    SET search_path = y;

    CREATE TABLE posts (
      id   int PRIMARY KEY
    , post text
    );

    INSERT INTO posts
    SELECT g, repeat(chr(g%100 + 32), (random()* 500)::int)  -- random text
    FROM   generate_series(1,10000) g;

    DELETE FROM posts WHERE random() > 0.9;  -- create ~ 10 % dead tuples

    CREATE TABLE votes (
      vote_id serial PRIMARY KEY
    , post_id int REFERENCES posts(id)
    , up_down bool
    );

    INSERT INTO votes (post_id, up_down)
    SELECT g.*
    FROM  (
       SELECT ((random()* 21)^3)::int + 1111 AS post_id  -- uneven distribution
            , random()::int::bool AS up_down
       FROM   generate_series(1,70000)
       ) g
    JOIN   posts p ON p.id = g.post_id;

All of the following queries returned the same result (8093 of 9107 posts had votes).
I ran 4 tests with EXPLAIN ANALYZE and took the best of five on Postgres 9.1.4 with each of the three queries and appended the resulting total runtimes.

1. As is.
2. After: ANALYZE posts; ANALYZE votes;
3. After: CREATE INDEX foo on votes(post_id);
4. After: VACUUM FULL ANALYZE posts; CLUSTER votes using foo;

count(*) ... WHERE EXISTS

1. 253 ms
2. 220 ms
3. 85 ms -- winner (seq scan on posts, index scan on votes, nested loop)
4. 85 ms

count(DISTINCT x) - long form with join

1. 354 ms
2. 358 ms
3. 373 ms -- (index scan on posts, index scan on votes, merge join)
4. 330 ms

count(DISTINCT x) - short form without join

1. 164 ms
2. 164 ms
3. 164 ms -- (always seq scan)
4. 142 ms

Best time for original query in question: 353 ms

For simplified version: 348 ms

wildplasser's query with a CTE uses the same plan as the long form (index scan on posts, index scan on votes, merge join) plus a little overhead for the CTE. Best time: 366 ms

Index-only scans in the upcoming PostgreSQL 9.2 can improve the result for each of these queries, most of all for EXISTS.

Related, more detailed benchmark for Postgres 9.5 (actually retrieving distinct rows, not just counting): Select first row in each GROUP BY group
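The condition attached to the short form (every post_id in votes must exist in posts) can be illustrated with a small sketch. The assumptions are loud ones: SQLite via Python's built-in sqlite3 module stands in for PostgreSQL, the votes table deliberately has no foreign key so that an orphaned vote can be inserted, the data and helper names are invented for the example, and the EXISTS subquery uses SELECT 1 because SQLite requires a select list. It says nothing about performance, only about when the three queries agree.

```python
import sqlite3

# The three query shapes discussed in this answer.
QUERIES = {
    "exists": (
        "SELECT count(*) FROM posts p "
        "WHERE EXISTS (SELECT 1 FROM votes v WHERE v.post_id = p.id)"
    ),
    "long": (
        "SELECT count(DISTINCT p.id) FROM posts p "
        "JOIN votes v ON v.post_id = p.id"
    ),
    "short": "SELECT count(DISTINCT post_id) FROM votes",
}


def make_db(include_orphan_vote):
    """Build a tiny in-memory data set; optionally add a vote for a missing post."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY)")
    # No foreign key on purpose, so referential integrity can be violated.
    conn.execute("CREATE TABLE votes (post_id INTEGER)")
    conn.executemany("INSERT INTO posts (id) VALUES (?)", [(1,), (2,), (3,)])
    votes = [(1,), (1,), (2,)]      # post 3 has no votes
    if include_orphan_vote:
        votes.append((99,))         # post 99 does not exist in posts
    conn.executemany("INSERT INTO votes (post_id) VALUES (?)", votes)
    return conn


def run_all(conn):
    """Run every query and return its single count, keyed by query name."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in QUERIES.items()}


if __name__ == "__main__":
    print("consistent data:", run_all(make_db(include_orphan_vote=False)))
    print("with orphan vote:", run_all(make_db(include_orphan_vote=True)))
```

With consistent data all three queries agree. Once a vote references a missing post, the short form counts that post_id as well, which is why the foreign key constraint matters for the equivalence.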

User · Answer

    WITH uniq AS (
            SELECT DISTINCT posts.id as post_id
            FROM posts
            JOIN votes ON votes.post_id = posts.id
            -- GROUP BY not needed anymore
            -- GROUP BY posts.id
            )
    SELECT COUNT(*)
    FROM uniq;
