Select first row in each GROUP BY group

Question

As the title suggests, I'd like to select the first row of each set of rows grouped with a GROUP BY.

Specifically, if I've got a purchases table that looks like this:

    SELECT * FROM purchases;

My Output:

    id | customer | total
    ---+----------+------
     1 | Joe      | 5
     2 | Sally    | 3
     3 | Joe      | 2
     4 | Sally    | 1

I'd like to query for the id of the largest purchase (total) made by each customer. Something like this:

    SELECT FIRST(id), customer, FIRST(total)
    FROM  purchases
    GROUP BY customer
    ORDER BY total DESC;

Expected Output:

    FIRST(id) | customer | FIRST(total)
    ----------+----------+-------------
            1 | Joe      | 5
            2 | Sally    | 3

User · Answer

This way works for me:

    SELECT article, dealer, price
    FROM   shop s1
    WHERE  price = (SELECT MAX(s2.price)
                    FROM   shop s2
                    WHERE  s1.article = s2.article
                    GROUP  BY s2.article)
    ORDER  BY article;

It selects the row with the highest price for each article.
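As a quick check of the correlated-subquery approach, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The sample rows (prices stored as integer cents) are made up for illustration, and the redundant GROUP BY inside the subquery is left out, since the WHERE clause already restricts it to one article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shop (article INTEGER, dealer TEXT, price INTEGER)")
# Hypothetical sample data; prices are in cents to avoid float comparisons.
conn.executemany(
    "INSERT INTO shop VALUES (?, ?, ?)",
    [(1, "A", 345), (1, "B", 399), (2, "A", 1099), (3, "B", 145), (3, "C", 169)],
)

# For every row, the subquery computes the maximum price of that row's article;
# only rows whose price equals that maximum survive the filter.
rows = conn.execute(
    """
    SELECT article, dealer, price
    FROM   shop s1
    WHERE  price = (SELECT MAX(s2.price)
                    FROM   shop s2
                    WHERE  s1.article = s2.article)
    ORDER  BY article
    """
).fetchall()

print(rows)  # [(1, 'B', 399), (2, 'A', 1099), (3, 'C', 169)]
```

If two dealers share the top price for an article, both rows are returned, so this form does not break ties.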

User · Answer

In Postgres you can use array_agg like this:

    SELECT  customer,
            (array_agg(id ORDER BY total DESC))[1],
            max(total)
    FROM purchases
    GROUP BY customer;

This will give you the id of each customer's largest purchase.

Some things to note:

- array_agg is an aggregate function, so it works with GROUP BY.
- array_agg lets you specify an ordering scoped to just itself, so it doesn't constrain the structure of the whole query. There is also syntax for how you sort NULLs, if you need to do something different from the default.
- Once we build the array, we take the first element. (Postgres arrays are 1-indexed, not 0-indexed.)
- You could use array_agg in a similar way for your third output column, but max(total) is simpler.
- Unlike DISTINCT ON, using array_agg lets you keep your GROUP BY, in case you want that for other reasons.

User · Answer

This is the common greatest-n-per-group problem, which already has well tested and highly optimized solutions. Personally I prefer the left join solution by Bill Karwin (the original post has lots of other solutions).

Note that a bunch of solutions to this common problem can, surprisingly, be found in one of the most official sources, the MySQL manual. See Examples of Common Queries: The Rows Holding the Group-wise Maximum of a Certain Column.

User · Answer

In SQL Server you can do this:

    SELECT *
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY customer ORDER BY total DESC) AS StRank, *
        FROM Purchases
    ) n
    WHERE StRank = 1

Explanation: the rows are partitioned by customer and ordered by total (descending) within each partition. Each row gets a serial number StRank within its partition, and we keep only the row whose StRank is 1 for each customer.
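To see the numbering that ROW_NUMBER() produces before the filter is applied, here is a small sketch that runs the window function on the question's sample data. It uses SQLite from Python rather than SQL Server, which is an assumption made only to keep the example self-contained; window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "Joe", 5), (2, "Sally", 3), (3, "Joe", 2), (4, "Sally", 1)],
)

# Number the rows inside each customer's partition, largest total first.
ranked = conn.execute(
    """
    SELECT id, customer, total,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY total DESC) AS StRank
    FROM   purchases
    ORDER  BY customer, StRank
    """
).fetchall()

# Keeping StRank = 1 is the same as the WHERE clause in the outer query.
first_rows = [row[:3] for row in ranked if row[3] == 1]

print(ranked)      # every row with its rank inside its customer partition
print(first_rows)  # [(1, 'Joe', 5), (2, 'Sally', 3)]
```

When two rows of a customer tie on total, ROW_NUMBER() still assigns distinct numbers, but which one gets 1 is arbitrary unless the ORDER BY is extended with a tie-breaker such as id.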

User · Answer

The accepted OMG Ponies' "Supported by any database" solution has good speed in my test.

Here I provide a more complete and cleaner any-database solution that uses the same approach. Ties are considered (assuming the desire is to get only one row for each customer, even if multiple records share the max total for that customer), and other purchase fields (e.g. purchase_payment_id) will be selected from the real matching rows in the purchase table.

Supported by any database:

    select * from purchase
    join (
        select min(id) as id from purchase
        join (
            select customer, max(total) as total from purchase
            group by customer
        ) t1 using (customer, total)
        group by customer
    ) t2 using (id)
    order by customer

This query is reasonably fast, especially when there is a composite index like (customer, total) on the purchase table.

Remarks:

- t1, t2 are subquery aliases which could be removed depending on the database.
- Caveat: the using (...) clause is currently not supported in MS-SQL and Oracle db as of this edit on Jan 2017. You have to expand it yourself to e.g. on t2.id = purchase.id etc. The USING syntax works in SQLite, MySQL and PostgreSQL.
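The tie handling can be checked with a small sketch on SQLite (one of the engines the answer lists as supporting USING). The table contents are hypothetical: the question's four rows plus an extra purchase with id 5 that ties Joe's maximum total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER)"
)
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    # Row 5 is an assumed extra purchase that ties Joe's largest total.
    [(1, "Joe", 5), (2, "Sally", 3), (3, "Joe", 2), (4, "Sally", 1), (5, "Joe", 5)],
)

# t1: max total per customer; t2: smallest id among the rows that reach it.
rows = conn.execute(
    """
    select purchase.id, purchase.customer, purchase.total
    from purchase
    join (
        select min(id) as id from purchase
        join (
            select customer, max(total) as total from purchase
            group by customer
        ) t1 using (customer, total)
        group by customer
    ) t2 using (id)
    order by customer
    """
).fetchall()

print(rows)  # [(1, 'Joe', 5), (2, 'Sally', 3)]
```

Joe's two purchases of 5 collapse to the one with the smaller id, so exactly one row per customer comes back.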

User · Answer

On Oracle 9.2+ (not 8i+ as originally stated), SQL Server 2005+, PostgreSQL 8.4+, DB2, Firebird 3.0+, Teradata, Sybase, Vertica:

    WITH summary AS (
        SELECT p.id,
               p.customer,
               p.total,
               ROW_NUMBER() OVER (PARTITION BY p.customer
                                  ORDER BY p.total DESC) AS rk
          FROM PURCHASES p)
    SELECT s.*
      FROM summary s
     WHERE s.rk = 1

Supported by any database:

But you need to add logic to break ties:

      SELECT MIN(x.id),  -- change to MAX if you want the highest
             x.customer,
             x.total
        FROM PURCHASES x
        JOIN (SELECT p.customer,
                     MAX(total) AS max_total
                FROM PURCHASES p
            GROUP BY p.customer) y ON y.customer = x.customer
                                  AND y.max_total = x.total
    GROUP BY x.customer, x.total

User · Answer

I use this way (postgresql only): https://wiki.postgresql.org/wiki/First/last_%28aggregate%29

    -- Create a function that always returns the first non-NULL item
    CREATE OR REPLACE FUNCTION public.first_agg ( anyelement, anyelement )
    RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
            SELECT $1;
    $$;

    -- And then wrap an aggregate around it
    CREATE AGGREGATE public.first (
            sfunc    = public.first_agg,
            basetype = anyelement,
            stype    = anyelement
    );

    -- Create a function that always returns the last non-NULL item
    CREATE OR REPLACE FUNCTION public.last_agg ( anyelement, anyelement )
    RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
            SELECT $2;
    $$;

    -- And then wrap an aggregate around it
    CREATE AGGREGATE public.last (
            sfunc    = public.last_agg,
            basetype = anyelement,
            stype    = anyelement
    );

Then your example should work almost as is:

    SELECT FIRST(id), customer, FIRST(total)
    FROM  purchases
    GROUP BY customer
    ORDER BY FIRST(total) DESC;

CAVEAT: It ignores NULL rows.

Edit 1 - Use the postgres extension instead

Now I use this way: http://pgxn.org/dist/first_last_agg/

To install on ubuntu 14.04:

    apt-get install postgresql-server-dev-9.3 git build-essential -y
    git clone git://github.com/wulczer/first_last_agg.git
    cd first_last_agg
    make && sudo make install
    psql -c 'create extension first_last_agg'

It's a postgres extension that gives you first and last functions; apparently faster than the above way.

Edit 2 - Ordering and filtering

If you use aggregate functions (like these), you can order the results, without the need to have the data already ordered: http://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-AGGREGATES

So the equivalent example, with ordering, would be something like:

    SELECT first(id order by id), customer, first(total order by id)
      FROM purchases
     GROUP BY customer
     ORDER BY first(total);

Of course you can order and filter as you deem fit within the aggregate; it's very powerful syntax.

User · Answer

In PostgreSQL this is typically simpler and faster (more performance optimization below):

    SELECT DISTINCT ON (customer)
           id, customer, total
    FROM   purchases
    ORDER  BY customer, total DESC, id;

Or shorter (if not as clear) with ordinal numbers of output columns:

    SELECT DISTINCT ON (2)
           id, customer, total
    FROM   purchases
    ORDER  BY 2, 3 DESC, 1;

If total can be NULL (won't hurt either way, but you'll want to match existing indexes):

    ...
    ORDER  BY customer, total DESC NULLS LAST, id;

Major points

- DISTINCT ON is a PostgreSQL extension of the standard (where only DISTINCT on the whole SELECT list is defined).
- List any number of expressions in the DISTINCT ON clause; the combined row value defines duplicates. The manual:
  "Obviously, two rows are considered distinct if they differ in at least one column value. Null values are considered equal in this comparison."
- DISTINCT ON can be combined with ORDER BY. Leading expressions in ORDER BY must be in the set of expressions in DISTINCT ON, but you can rearrange order among those freely. You can add additional expressions to ORDER BY to pick a particular row from each group of peers. Or, as the manual puts it:
  "The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group."
  I added id as last item to break ties: "Pick the row with the smallest id from each group sharing the highest total."
  To order results in a way that disagrees with the sort order determining the first per group, you can nest the above query in an outer query with another ORDER BY.
- If total can be NULL, you most probably want the row with the greatest non-null value. Add NULLS LAST like demonstrated. See: Sort by column ASC, but NULL values first?
- The SELECT list is not constrained by expressions in DISTINCT ON or ORDER BY in any way. (Not needed in the simple case above):
  - You don't have to include any of the expressions in DISTINCT ON or ORDER BY.
  - You can include any other expression in the SELECT list. This is instrumental for replacing much more complex queries with subqueries and aggregate / window functions.
- I tested with Postgres versions 8.3 - 13. But the feature has been there at least since version 7.1, so basically always.

Index

The perfect index for the above query would be a multi-column index spanning all three columns in matching sequence and with matching sort order:

    CREATE INDEX purchases_3c_idx ON purchases (customer, total DESC, id);

May be too specialized. But use it if read performance for the particular query is crucial. If you have DESC NULLS LAST in the query, use the same in the index so that sort order matches and the index is applicable.

Effectiveness / Performance optimization

Weigh cost and benefit before creating tailored indexes for each query. The potential of the above index largely depends on data distribution.

The index is used because it delivers pre-sorted data. In Postgres 9.2 or later the query can also benefit from an index only scan if the index is smaller than the underlying table. The index has to be scanned in its entirety, though.

For few rows per customer (high cardinality in column customer), this is very efficient. Even more so if you need sorted output anyway. The benefit shrinks with a growing number of rows per customer.

Ideally, you have enough work_mem to process the involved sort step in RAM and not spill to disk. But generally setting work_mem too high can have adverse effects. Consider SET LOCAL for exceptionally big queries. Find how much you need with EXPLAIN ANALYZE. Mention of "Disk:" in the sort step indicates the need for more:

- Configuration parameter work_mem in PostgreSQL on Linux
- Optimize simple query using ORDER BY date and text

For many rows per customer (low cardinality in column customer), a loose index scan (a.k.a. "skip scan") would be (much) more efficient, but that's not implemented up to Postgres 13. (An implementation for index-only scans is in development for Postgres 14.)

For now, there are faster query techniques to substitute for this. In particular if you have a separate table holding unique customers, which is the typical use case. But also if you don't:

- Optimize GROUP BY query to retrieve latest row per user
- Optimize groupwise maximum query
- Query last N related rows per row

Benchmark

I had a simple benchmark here which is outdated by now. I replaced it with a detailed benchmark in this separate answer.
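The row-picking rule of DISTINCT ON (sort by the full ORDER BY list, then keep the first row of each run that shares the DISTINCT ON expressions) can be modelled in a few lines of plain Python. This is only a sketch of the semantics on made-up rows, not of how Postgres executes the query; the extra row with id 5 is an assumption added to show the id tie-breaker at work.

```python
from itertools import groupby

# (id, customer, total); row 5 is a hypothetical tie with Joe's largest total.
purchases = [(1, "Joe", 5), (2, "Sally", 3), (3, "Joe", 2), (4, "Sally", 1), (5, "Joe", 5)]

# ORDER BY customer, total DESC, id
ordered = sorted(purchases, key=lambda row: (row[1], -row[2], row[0]))

# DISTINCT ON (customer): keep the first row of every run of equal customers.
first_per_customer = [
    next(group) for _, group in groupby(ordered, key=lambda row: row[1])
]

print(first_per_customer)  # [(1, 'Joe', 5), (2, 'Sally', 3)]
```

Dropping id from the sort key would leave the choice between rows 1 and 5 to the sort's input order, which mirrors why the query adds id as the last ORDER BY item.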

User · Answer

Snowflake and Teradata support a QUALIFY clause, which works like HAVING for window functions:

    SELECT id, customer, total
    FROM PURCHASES p
    QUALIFY ROW_NUMBER() OVER (PARTITION BY p.customer ORDER BY p.total DESC) = 1

User · Answer

Benchmark

Testing the most interesting candidates with Postgres 9.4 and 9.5 with a halfway realistic table of 200k rows in purchases and 10k distinct customer_id (avg. 20 rows per customer).

For Postgres 9.5 I ran a 2nd test with effectively 86446 distinct customers. See below (avg. 2.3 rows per customer).

Setup

Main table

    CREATE TABLE purchases (
      id          serial
    , customer_id int  -- REFERENCES customer
    , total       int  -- could be amount of money in Cent
    , some_column text -- to make the row bigger, more realistic
    );

I use a serial (PK constraint added below) and an integer customer_id since that's a more typical setup. Also added some_column to make up for typically more columns.

Dummy data, PK, index - a typical table also has some dead tuples:

    INSERT INTO purchases (customer_id, total, some_column)    -- insert 200k rows
    SELECT (random() * 10000)::int             AS customer_id  -- 10k customers
         , (random() * random() * 100000)::int AS total
         , 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
    FROM   generate_series(1,200000) g;

    ALTER TABLE purchases ADD CONSTRAINT purchases_id_pkey PRIMARY KEY (id);

    DELETE FROM purchases WHERE random() > 0.9; -- some dead rows

    INSERT INTO purchases (customer_id, total, some_column)
    SELECT (random() * 10000)::int             AS customer_id  -- 10k customers
         , (random() * random() * 100000)::int AS total
         , 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
    FROM   generate_series(1,20000) g;  -- add 20k to make it ~ 200k

    CREATE INDEX purchases_3c_idx ON purchases (customer_id, total DESC, id);

    VACUUM ANALYZE purchases;

customer table - for superior query:

    CREATE TABLE customer AS
    SELECT customer_id, 'customer_' || customer_id AS customer
    FROM   purchases
    GROUP  BY 1
    ORDER  BY 1;

    ALTER TABLE customer ADD CONSTRAINT customer_customer_id_pkey PRIMARY KEY (customer_id);

    VACUUM ANALYZE customer;

In my second test for 9.5 I used the same setup, but with random() * 100000 to generate customer_id to get only few rows per customer_id.

Object sizes for table purchases

Generated with a query taken from this related answer: Measure the size of a PostgreSQL table row

                   what                | bytes/ct | bytes_pretty | bytes_per_row
    -----------------------------------+----------+--------------+---------------
     core_relation_size                | 20496384 | 20 MB        |           102
     visibility_map                    |        0 | 0 bytes      |             0
     free_space_map                    |    24576 | 24 kB        |             0
     table_size_incl_toast             | 20529152 | 20 MB        |           102
     indexes_size                      | 10977280 | 10 MB        |            54
     total_size_incl_toast_and_indexes | 31506432 | 30 MB        |           157
     live_rows_in_text_representation  | 13729802 | 13 MB        |            68
     ------------------------------    |          |              |
     row_count                         |   200045 |              |
     live_tuples                       |   200045 |              |
     dead_tuples                       |    19955 |              |

Queries

1. row_number() in CTE (see other answer)

    WITH cte AS (
       SELECT id, customer_id, total
            , row_number() OVER (PARTITION BY customer_id ORDER BY total DESC) AS rn
       FROM   purchases
       )
    SELECT id, customer_id, total
    FROM   cte
    WHERE  rn = 1;

2. row_number() in subquery (my optimization)

    SELECT id, customer_id, total
    FROM   (
       SELECT id, customer_id, total
            , row_number() OVER (PARTITION BY customer_id ORDER BY total DESC) AS rn
       FROM   purchases
       ) sub
    WHERE  rn = 1;

3. DISTINCT ON (see other answer)

    SELECT DISTINCT ON (customer_id)
           id, customer_id, total
    FROM   purchases
    ORDER  BY customer_id, total DESC, id;

4. rCTE with LATERAL subquery

    WITH RECURSIVE cte AS (
       (  -- parentheses required
       SELECT id, customer_id, total
       FROM   purchases
       ORDER  BY customer_id, total DESC
       LIMIT  1
       )
       UNION ALL
       SELECT u.*
       FROM   cte c
       ,      LATERAL (
          SELECT id, customer_id, total
          FROM   purchases
          WHERE  customer_id > c.customer_id  -- lateral reference
          ORDER  BY customer_id, total DESC
          LIMIT  1
          ) u
       )
    SELECT id, customer_id, total
    FROM   cte
    ORDER  BY customer_id;

5. customer table with LATERAL

    SELECT l.*
    FROM   customer c
    ,      LATERAL (
       SELECT id, customer_id, total
       FROM   purchases
       WHERE  customer_id = c.customer_id  -- lateral reference
       ORDER  BY total DESC
       LIMIT  1
       ) l;

6. array_agg() with ORDER BY (see other answer)

    SELECT (array_agg(id ORDER BY total DESC))[1] AS id
         , customer_id
         , max(total) AS total
    FROM   purchases
    GROUP  BY customer_id;

Results

Execution time for above queries with EXPLAIN ANALYZE (and all options off), best of 5 runs.

All queries used an Index Only Scan on purchases2_3c_idx (among other steps). Some of them just for the smaller size of the index, others more effectively.

A. Postgres 9.4 with 200k rows and ~ 20 per customer_id

    1. 273.274 ms
    2. 194.572 ms
    3. 111.067 ms
    4.  92.922 ms
    5.  37.679 ms  -- winner
    6. 189.495 ms

B. The same with Postgres 9.5

    1. 288.006 ms
    2. 223.032 ms
    3. 107.074 ms
    4.  78.032 ms
    5.  33.944 ms  -- winner
    6. 211.540 ms

C. Same as B., but with ~ 2.3 rows per customer_id

    1. 381.573 ms
    2. 311.976 ms
    3. 124.074 ms  -- winner
    4. 710.631 ms
    5. 311.976 ms
    6. 421.679 ms

Related benchmarks

Here is a new one by "ogr" testing with 10M rows and 60k unique "customers" on Postgres 11.5 (current as of Sep. 2019). Results are still in line with what we have seen so far: Proper way to access latest row for each individual identifier?

Original (outdated) benchmark from 2011

I ran three tests with PostgreSQL 9.1 on a real life table of 65579 rows and single-column btree indexes on each of the three columns involved and took the best execution time of 5 runs.
Comparing @OMGPonies' first query (A) to the above DISTINCT ON solution (B):

1. Select the whole table, results in 5958 rows in this case.

    A: 567.218 ms
    B: 386.673 ms

2. Use condition WHERE customer BETWEEN x AND y resulting in 1000 rows.

    A: 249.136 ms
    B:  55.111 ms

3. Select a single customer with WHERE customer = x.

    A:   0.143 ms
    B:   0.072 ms

Same test repeated with the index described in the other answer:

    CREATE INDEX purchases_3c_idx ON purchases (customer, total DESC, id);

    1A: 277.953 ms
    1B: 193.547 ms

    2A: 249.796 ms -- special index not used
    2B:  28.679 ms

    3A:   0.120 ms
    3B:   0.048 ms

User · Answer

This solution is not very efficient, as pointed out by Erwin, because of the subqueries:

    select * from purchases p1
    where total in (select max(total) from purchases where p1.customer = customer)
    order by total desc;

User · Answer

For SQL Server the most efficient way is:

    with
    ids as ( --condition for split table into groups
        select i from (values (9),(12),(17),(18),(19),(20),(22),(21),(23),(10)) as v(i)
    )
    ,src as (
        select * from yourTable where <condition> --use this as filter for other conditions
    )
    ,joined as (
        select tops.* from ids
        cross apply --it's like a for-each over the rows
        (
            select top(1) *
            from src
            where CommodityId = ids.i
        ) as tops
    )
    select * from joined

and don't forget to create a clustered index for the columns used.

User · Answer

- If you want to select any row (by some specific condition of yours) from the set of aggregated rows.
- If you want to use another aggregation function (sum/avg) in addition to max/min, so that you cannot use DISTINCT ON.

You can use the following subquery:

    SELECT
        (
           SELECT id FROM t2
           WHERE id = ANY ( ARRAY_AGG( tf.id ) ) AND amount = MAX( tf.amount )
        ) id,
        name,
        MAX(amount) ma,
        SUM( ratio )
    FROM t2  tf
    GROUP BY name

You can replace amount = MAX( tf.amount ) with any condition you want, with one restriction: this subquery must not return more than one row.

But if you want to do such things, you are probably looking for window functions.

User · Answer

The Query:

    SELECT purchases.*
    FROM purchases
    LEFT JOIN purchases as p
    ON
      p.customer = purchases.customer
      AND
      purchases.total < p.total
    WHERE p.total IS NULL

HOW DOES THAT WORK! (I've been there)

We want to make sure that we only have the highest total for each customer.

Some Theoretical Stuff (skip this part if you only want to understand the query)

Let Total be a function T(customer, id) that returns a value given the name and id. To prove that the given total, T(customer, id), is the highest, we have to prove either

    ∀x T(customer, id) > T(customer, x)   (this total is higher than all other totals for that customer)

OR

    ¬∃x T(customer, id) < T(customer, x)   (there exists no higher total for that customer)

The first approach needs us to get all the records for that name, which I do not really like. The second one needs a smart way to say there can be no record higher than this one.

Back to SQL

If we left join the table to itself on the name and on the total being less than that of the joined table:

    LEFT JOIN purchases as p
    ON p.customer = purchases.customer
    AND purchases.total < p.total

we make sure that every record that has another record with a higher total for the same user gets joined:

    +--------------+--------------------+-----------------+------+------------+---------+
    | purchases.id | purchases.customer | purchases.total | p.id | p.customer | p.total |
    +--------------+--------------------+-----------------+------+------------+---------+
    |            1 | Tom                |             200 |    2 | Tom        |     300 |
    |            2 | Tom                |             300 |      |            |         |
    |            3 | Bob                |             400 |    4 | Bob        |     500 |
    |            4 | Bob                |             500 |      |            |         |
    |            5 | Alice              |             600 |    6 | Alice      |     700 |
    |            6 | Alice              |             700 |      |            |         |
    +--------------+--------------------+-----------------+------+------------+---------+

That will help us filter for the highest total for each customer with no grouping needed:

    WHERE p.total IS NULL

    +--------------+--------------------+-----------------+------+------------+---------+
    | purchases.id | purchases.customer | purchases.total | p.id | p.customer | p.total |
    +--------------+--------------------+-----------------+------+------------+---------+
    |            2 | Tom                |             300 |      |            |         |
    |            4 | Bob                |             500 |      |            |         |
    |            6 | Alice              |             700 |      |            |         |
    +--------------+--------------------+-----------------+------+------------+---------+

And that's the answer we need.
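The anti-join can be tried on the Tom/Bob/Alice rows from the tables above with a short sketch on an in-memory SQLite database. Two things here are assumptions rather than part of the answer: the extra row with id 7, which ties Tom's highest total, and the second query, which adds an id comparison to the join condition as one possible tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "Tom", 200), (2, "Tom", 300), (3, "Bob", 400), (4, "Bob", 500),
     (5, "Alice", 600), (6, "Alice", 700),
     (7, "Tom", 300)],  # hypothetical tie with Tom's highest total
)

# Plain anti-join: keep rows for which no row of the same customer has a higher total.
with_ties = conn.execute(
    """
    SELECT purchases.*
    FROM purchases
    LEFT JOIN purchases AS p
      ON p.customer = purchases.customer
     AND purchases.total < p.total
    WHERE p.total IS NULL
    ORDER BY purchases.id
    """
).fetchall()

# Assumed tie-breaker: a row also "loses" to an equal total with a smaller id.
one_per_customer = conn.execute(
    """
    SELECT purchases.*
    FROM purchases
    LEFT JOIN purchases AS p
      ON p.customer = purchases.customer
     AND (purchases.total < p.total
          OR (purchases.total = p.total AND purchases.id > p.id))
    WHERE p.id IS NULL
    ORDER BY purchases.id
    """
).fetchall()

print(with_ties)         # [(2, 'Tom', 300), (4, 'Bob', 500), (6, 'Alice', 700), (7, 'Tom', 300)]
print(one_per_customer)  # [(2, 'Tom', 300), (4, 'Bob', 500), (6, 'Alice', 700)]
```

The first result shows that the plain query returns every row that shares a customer's maximum, so it yields one row per customer only when the maximum is unique.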

User · Answer

In PostgreSQL, another possibility is to use the first_value window function in combination with SELECT DISTINCT:

    select distinct customer_id,
                    first_value(row(id, total)) over (partition by customer_id order by total desc, id)
    from            purchases;

I created a composite (id, total), so both values are returned by the same window function call. You can of course always apply first_value() twice.

User · Answer

Very fast solution, if the first row per customer means the one with the smallest id:

    SELECT a.*
    FROM
        purchases a
        JOIN (
            SELECT customer, min( id ) as id
            FROM purchases
            GROUP BY customer
        ) b USING ( id );

and really very fast if the table is indexed by id:

    create index purchases_id on purchases (id);

User · Answer

Use the ARRAY_AGG function for PostgreSQL, U-SQL, IBM DB2, and Google BigQuery SQL:

    SELECT customer, (ARRAY_AGG(id ORDER BY total DESC))[1], MAX(total)
    FROM purchases
    GROUP BY customer
