Using LIMIT within GROUP BY to get N results per group

Question

The following query:

    SELECT year, id, rate
    FROM h
    WHERE year BETWEEN 2000 AND 2009
      AND id IN (SELECT rid FROM table2)
    GROUP BY id, year
    ORDER BY id, rate DESC

yields:

    year    id    rate
    2006    p01    8
    2003    p01    7.4
    2008    p01    6.8
    2001    p01    5.9
    2007    p01    5.3
    2009    p01    4.4
    2002    p01    3.9
    2004    p01    3.5
    2005    p01    2.1
    2000    p01    0.8
    2001    p02   12.5
    2004    p02   12.4
    2002    p02   12.2
    2003    p02   10.3
    2000    p02    8.7
    2006    p02    4.6
    2007    p02    3.3

What I'd like is only the top 5 results for each id:

    2006    p01    8
    2003    p01    7.4
    2008    p01    6.8
    2001    p01    5.9
    2007    p01    5.3
    2001    p02   12.5
    2004    p02   12.4
    2002    p02   12.2
    2003    p02   10.3
    2000    p02    8.7

Is there a way to do this using some kind of LIMIT-like modifier that works within the GROUP BY?

User · Answer

The original query used user variables and ORDER BY on derived tables; the behavior of both quirks is not guaranteed. Revised answer as follows.

In MySQL 5.x you can use a poor man's rank over partition to achieve the desired result. Outer join the table with itself and, for each row, count the number of rows that rank ahead of it. In this case, a row ranks ahead when it has the same id and a higher rate:

    SELECT t.id, t.rate, t.year, COUNT(l.rate) AS rank
    FROM t
    LEFT JOIN t AS l ON t.id = l.id AND t.rate < l.rate
    GROUP BY t.id, t.rate, t.year
    HAVING COUNT(l.rate) < 5
    ORDER BY t.id, t.rate DESC, t.year

(RANK is a reserved word from MySQL 8.0.2 onward, so quote or rename the alias there.)

Result:

    | id  | rate | year | rank |
    |-----|------|------|------|
    | p01 |  8.0 | 2006 |    0 |
    | p01 |  7.4 | 2003 |    1 |
    | p01 |  6.8 | 2008 |    2 |
    | p01 |  5.9 | 2001 |    3 |
    | p01 |  5.3 | 2007 |    4 |
    | p02 | 12.5 | 2001 |    0 |
    | p02 | 12.4 | 2004 |    1 |
    | p02 | 12.2 | 2002 |    2 |
    | p02 | 10.3 | 2003 |    3 |
    | p02 |  8.7 | 2000 |    4 |

Note that if the rates had ties, for example:

    100, 90, 90, 80, 80, 80, 70, 60, 50, 40

The above query will return 6 rows:

    100, 90, 90, 80, 80, 80

Change to HAVING COUNT(DISTINCT l.rate) < 5 to get 8 rows:

    100, 90, 90, 80, 80, 80, 70, 60

Or change to ON t.id = l.id AND (t.rate < l.rate OR (t.rate = l.rate AND t.pri_key > l.pri_key)) to get 5 rows:

    100, 90, 90, 80, 80

In MySQL 8 or later just use the RANK, DENSE_RANK or ROW_NUMBER functions:

    SELECT *
    FROM (
        SELECT *, RANK() OVER (PARTITION BY id ORDER BY rate DESC) AS rnk
        FROM t
    ) AS x
    WHERE rnk <= 5
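The tie behaviour described above can be checked with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the single group 'g1', the integer rates, and the pri_key column are assumptions made for the demonstration. Unlike the query above, it groups by pri_key so that tied rows stay separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# pri_key is an assumed surrogate key used only to break ties.
conn.execute("CREATE TABLE t (pri_key INTEGER PRIMARY KEY, id TEXT, rate INTEGER)")
rates = [100, 90, 90, 80, 80, 80, 70, 60, 50, 40]
conn.executemany("INSERT INTO t (id, rate) VALUES ('g1', ?)", [(r,) for r in rates])


def top_rates(join_condition, counted):
    """Run the self-join ranking query and return the surviving rates."""
    sql = f"""
        SELECT t.rate
        FROM t
        LEFT JOIN t AS l ON {join_condition}
        GROUP BY t.pri_key, t.id, t.rate
        HAVING COUNT({counted}) < 5
        ORDER BY t.rate DESC, t.pri_key
    """
    return [row[0] for row in conn.execute(sql)]


# Plain count of higher-rated rows: tied rows share a rank.
plain = top_rates("t.id = l.id AND t.rate < l.rate", "l.rate")
# Count of distinct higher rates: behaves like a dense rank.
dense = top_rates("t.id = l.id AND t.rate < l.rate", "DISTINCT l.rate")
# Ties broken by primary key: every row gets a unique position.
strict = top_rates(
    "t.id = l.id AND (t.rate < l.rate OR (t.rate = l.rate AND t.pri_key > l.pri_key))",
    "l.rate",
)

print(plain)   # 6 rows
print(dense)   # 8 rows
print(strict)  # 5 rows
```

The three variants differ only in what counts as a row "ahead of" the current one, which is why the row counts differ when rates are tied.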

User · Answer

No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQLs, but not for 5 results per group).

This is a groupwise-maximum type query, which is not trivial to do in SQL. There are various ways to tackle that which can be more efficient for some cases, but for top-n in general you'll want to look at Bill's answer to a similar previous question.

As with most solutions to this problem, it can return more than five rows if there are multiple rows with the same rate value, so you may still need some post-processing to check for that.
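One way to do that post-processing in application code is to cut each group down after the rows come back. The sketch below assumes the result set is a list of (year, id, rate) tuples already sorted by id and then by rate descending; the function name and the sample rows are hypothetical.

```python
from itertools import groupby, islice


def trim_per_group(rows, n):
    """Keep at most n rows per id; rows must already be sorted by id."""
    trimmed = []
    for _, group in groupby(rows, key=lambda row: row[1]):
        trimmed.extend(islice(group, n))
    return trimmed


# Six rows for one id, as a tie-tolerant query might return them.
rows = [
    (2001, "p01", 100),
    (2002, "p01", 90),
    (2003, "p01", 90),
    (2004, "p01", 80),
    (2005, "p01", 80),
    (2006, "p01", 80),
]
top5 = trim_per_group(rows, 5)
print(len(top5))
```

Which of the tied rows is dropped depends on the order the query returned them in, so add a tie-breaking column to the ORDER BY if that matters.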

User · Answer

For those like me whose queries timed out: I wrote the procedure below to apply a LIMIT (and anything else) per specific group.

    DELIMITER $$
    CREATE PROCEDURE count_limit200()
    BEGIN
        DECLARE a INT DEFAULT 0;
        DECLARE stop_loop INT DEFAULT 0;
        DECLARE domain_val VARCHAR(250);
        DECLARE domain_list CURSOR FOR SELECT DISTINCT domain FROM db.one;

        OPEN domain_list;

        -- Number of distinct domains, used to know when to stop.
        SELECT COUNT(DISTINCT domain) INTO stop_loop
        FROM db.one;

        loop_thru_domains: LOOP
            FETCH domain_list INTO domain_val;
            SET a = a + 1;

            -- Insert at most 200 rows for the current group.
            -- Note: the cursor reads `domain` but the filter uses `book`.
            INSERT INTO db.two (book, artist, title, title_count, last_updated)
            SELECT * FROM
            (
                SELECT book, artist, title, COUNT(ObjectKey) AS titleCount, NOW()
                FROM db.one
                WHERE book = domain_val
                GROUP BY artist, title
                ORDER BY book, titleCount DESC
                LIMIT 200
            ) t ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();

            IF a = stop_loop THEN
                LEAVE loop_thru_domains;
            END IF;
        END LOOP loop_thru_domains;

        CLOSE domain_list;
    END $$
    DELIMITER ;

It loops through the list of domains and inserts at most 200 rows for each.

User · Answer

Try this:

    SELECT h.year, h.id, h.rate
    FROM (SELECT h.year, h.id, h.rate,
                 IF(@lastid = (@lastid := h.id), @index := @index + 1, @index := 0) indx
          FROM (SELECT h.year, h.id, h.rate
                FROM h
                WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
                GROUP BY id, h.year
                ORDER BY id, rate DESC
               ) h, (SELECT @lastid := '', @index := 0) AS a
         ) h
    WHERE h.indx < 5;

The counter restarts at 0 on the first row of each id, so indx < 5 keeps five rows per id.

User · Answer

Build a virtual column, like ROWNUM in Oracle.

Table:

    CREATE TABLE `stack` (
      `year` int(11) DEFAULT NULL,
      `id` varchar(10) DEFAULT NULL,
      `rate` float DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Data:

    insert into stack values(2006,'p01',8);
    insert into stack values(2001,'p01',5.9);
    insert into stack values(2007,'p01',5.3);
    insert into stack values(2009,'p01',4.4);
    insert into stack values(2001,'p02',12.5);
    insert into stack values(2004,'p02',12.4);
    insert into stack values(2005,'p01',2.1);
    insert into stack values(2000,'p01',0.8);
    insert into stack values(2002,'p02',12.2);
    insert into stack values(2002,'p01',3.9);
    insert into stack values(2004,'p01',3.5);
    insert into stack values(2003,'p02',10.3);
    insert into stack values(2000,'p02',8.7);
    insert into stack values(2006,'p02',4.6);
    insert into stack values(2007,'p02',3.3);
    insert into stack values(2003,'p01',7.4);
    insert into stack values(2008,'p01',6.8);

SQL like this:

    select t3.year, t3.id, t3.rate
    from (select t1.*,
                 (select count(*)
                  from stack t2
                  where t1.rate <= t2.rate and t1.id = t2.id) as rownum
          from stack t1) t3
    where rownum <= 3
    order by id, rate DESC;

Without the where clause on t3, the query returns every row together with its rownum, which is the row's position by rate within its id.

To get the top N records, add rownum <= N to the where clause of t3 (here N = 3). To choose the years, add year BETWEEN 2000 AND 2009 to the same where clause.
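The correlated-subquery approach above is plain SQL, so it can be tried without a MySQL server. The following sketch runs the same query against the same data with Python's built-in sqlite3 module; SQLite standing in for MySQL is an assumption of the example, and the column types are simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stack (year INTEGER, id TEXT, rate REAL)")
conn.executemany(
    "INSERT INTO stack VALUES (?, ?, ?)",
    [
        (2006, "p01", 8), (2001, "p01", 5.9), (2007, "p01", 5.3),
        (2009, "p01", 4.4), (2001, "p02", 12.5), (2004, "p02", 12.4),
        (2005, "p01", 2.1), (2000, "p01", 0.8), (2002, "p02", 12.2),
        (2002, "p01", 3.9), (2004, "p01", 3.5), (2003, "p02", 10.3),
        (2000, "p02", 8.7), (2006, "p02", 4.6), (2007, "p02", 3.3),
        (2003, "p01", 7.4), (2008, "p01", 6.8),
    ],
)

# rownum = number of rows in the same id whose rate is >= this row's rate.
top3 = conn.execute(
    """
    SELECT t3.year, t3.id, t3.rate
    FROM (SELECT t1.*,
                 (SELECT COUNT(*)
                  FROM stack t2
                  WHERE t1.rate <= t2.rate AND t1.id = t2.id) AS rownum
          FROM stack t1) t3
    WHERE rownum <= 3
    ORDER BY id, rate DESC
    """
).fetchall()

for row in top3:
    print(row)
```

Because the subquery runs once per outer row, this form can become slow on large groups unless (id, rate) is indexed.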

User · Answer

The following post, "SQL: selecting top N records per group", describes a complicated way of achieving this without subqueries.

It improves on other solutions offered here by:

- Doing everything in a single query
- Being able to properly utilize indexes
- Avoiding subqueries, which are notorious for producing bad execution plans in MySQL

It is, however, not pretty. A good solution would be achievable with Window Functions (aka Analytic Functions), but these are not available in MySQL before version 8.0. The trick used in said post utilizes GROUP_CONCAT, which is sometimes described as "poor man's Window Functions for MySQL".

User · Answer

You could use the GROUP_CONCAT aggregate function to get all years into a single column, grouped by id and ordered by rate:

    SELECT id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
    FROM yourtable
    GROUP BY id

Result:

    -----------------------------------------------------------
    | ID  | GROUPED_YEAR                                      |
    -----------------------------------------------------------
    | p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
    | p02 | 2001,2004,2002,2003,2000,2006,2007                |
    -----------------------------------------------------------

And then you could use FIND_IN_SET, which returns the position of the first argument inside the second one, e.g.

    SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
    1

    SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
    6

Using a combination of GROUP_CONCAT and FIND_IN_SET, and filtering by the position returned by FIND_IN_SET, you could then use this query, which returns only the first 5 years for every id:

    SELECT yourtable.*
    FROM yourtable
    INNER JOIN (
        SELECT id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
        FROM yourtable
        GROUP BY id
    ) group_max
      ON yourtable.id = group_max.id
     AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
    ORDER BY yourtable.id, yourtable.year DESC;

Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.

The maximum length of the string returned by GROUP_CONCAT is limited (by the group_concat_max_len system variable, 1024 bytes by default), so this works well if you need to select only a few records for every group.

User · Answer

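It took some work, but I thought my solution would be worth sharing, as it seems elegant as well as quite fast.

    SELECT h.year, h.id, h.rate
    FROM (
        SELECT id,
               SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',', 5) AS l
        FROM h
        WHERE year BETWEEN 2000 AND 2009
        GROUP BY id
        ORDER BY id
    ) AS h_temp
    LEFT JOIN h ON h.id = h_temp.id
        AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l

The derived table keeps the first five id-year pairs per id as a comma-separated string. SUBSTRING_INDEX returns the whole string when the delimiter is not found, so the != test is true only for rows whose id-year pair appears in that list.

Note that this example is written for the purpose of the question and can be modified quite easily for other similar purposes.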

User · Answer

Please try the stored procedure below. I have already verified it; I am getting the proper result, but without using GROUP BY.

    CREATE DEFINER=`ks_root`@`%` PROCEDURE `first_five_record_per_id`()
    BEGIN
        DECLARE query_string text;
        DECLARE datasource1 varchar(24);
        DECLARE done INT DEFAULT 0;
        DECLARE tenants varchar(50);
        DECLARE cur1 CURSOR FOR SELECT rid FROM demo1;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

        SET @query_string = '';

        OPEN cur1;
        read_loop: LOOP
            FETCH cur1 INTO tenants;

            IF done THEN
                LEAVE read_loop;
            END IF;

            -- Append one "top 5 for this id" query per id.
            SET @datasource1 = tenants;
            SET @query_string = concat(@query_string, '(select * from demo where `id` = ''', @datasource1, ''' order by rate desc LIMIT 5) UNION ALL ');
        END LOOP;
        CLOSE cur1;

        -- Drop the dangling UNION ALL left by the last iteration.
        SET @query_string = TRIM(TRAILING 'UNION ALL' FROM TRIM(@query_string));
        select @query_string;

        PREPARE stmt FROM @query_string;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END
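The same "one LIMIT query per id" idea can also be driven from application code instead of a stored procedure. The sketch below is an assumption-laden illustration: it uses Python's built-in sqlite3 module rather than MySQL, fills demo with the question's sample data, and invents the helper name top_n_per_id. It binds the id as a parameter instead of concatenating it into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (year INTEGER, id TEXT, rate REAL)")
conn.execute("CREATE TABLE demo1 (rid TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [
        (2006, "p01", 8.0), (2003, "p01", 7.4), (2008, "p01", 6.8),
        (2001, "p01", 5.9), (2007, "p01", 5.3), (2009, "p01", 4.4),
        (2002, "p01", 3.9), (2004, "p01", 3.5), (2005, "p01", 2.1),
        (2000, "p01", 0.8), (2001, "p02", 12.5), (2004, "p02", 12.4),
        (2002, "p02", 12.2), (2003, "p02", 10.3), (2000, "p02", 8.7),
        (2006, "p02", 4.6), (2007, "p02", 3.3),
    ],
)
conn.executemany("INSERT INTO demo1 VALUES (?)", [("p01",), ("p02",)])


def top_n_per_id(conn, n):
    """Run one ORDER BY ... LIMIT query per id and concatenate the results."""
    ids = [row[0] for row in conn.execute("SELECT rid FROM demo1 ORDER BY rid")]
    result = []
    for group_id in ids:
        result.extend(
            conn.execute(
                "SELECT year, id, rate FROM demo WHERE id = ? "
                "ORDER BY rate DESC LIMIT ?",
                (group_id, n),
            )
        )
    return result


top5 = top_n_per_id(conn, 5)
for row in top5:
    print(row)
```

This issues one query per id, which is simple and index-friendly but adds a round trip per group when run against a remote server.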

User · Answer

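Try this:

    SET @num := 0, @type := '';

    SELECT `year`, `id`, `rate`,
           @num := IF(@type = `id`, @num + 1, 1) AS `row_number`,
           @type := `id` AS `dummy`
    FROM (
        SELECT *
        FROM `h`
        WHERE `year` BETWEEN 2000 AND 2009
          AND `id` IN (SELECT `rid` FROM `table2`)
        ORDER BY `id`, `rate` DESC
    ) AS `temph`
    GROUP BY `year`, `id`, `rate`
    HAVING `row_number` <= 5
    ORDER BY `id`, `rate` DESC;

The derived table is ordered by id and then by rate descending so that the counter numbers the rows of each id from the highest rate down.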

User · Answer

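This requires a series of subqueries to rank the values, limit them, then perform the sum while grouping:

    SET @Rnk := 0, @last_id := NULL, @N := 2;

    SELECT c.id, SUM(c.val)
    FROM (
        SELECT b.id, b.val
        FROM (
            SELECT
                @Rnk := IF(@last_id = a.id, @Rnk + 1, 1) AS Rnk,
                a.id,
                a.val,
                @last_id := a.id AS last_id
            FROM (
                SELECT id, val
                FROM list
                ORDER BY id, val DESC
            ) AS a
        ) AS b
        WHERE b.Rnk <= @N
    ) AS c
    GROUP BY c.id;

The rank restarts at 1 for each id, so Rnk <= @N keeps the top @N values per id before they are summed.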

User · Answer

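For me something like

    SUBSTRING_INDEX(GROUP_CONCAT(col_name ORDER BY desired_col_order_name), ',', N)

works perfectly. No complicated query.

For example, to get the top 1 for each group:

    SELECT *
    FROM yourtable
    WHERE id IN (SELECT SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY rate DESC), ',', 1) id
                 FROM yourtable
                 GROUP BY year)
    ORDER BY rate DESC;

For N greater than 1 the subquery returns a comma-separated string rather than a single id, so the IN comparison no longer matches; test membership with FIND_IN_SET in that case.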

User · Answer

    SELECT year, id, rate
    FROM (SELECT year, id, rate,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY rate DESC) AS rn
          FROM h
          WHERE year BETWEEN 2000 AND 2009
            AND id IN (SELECT rid FROM table2)
          GROUP BY id, year
          ORDER BY id, rate DESC) AS subquery
    WHERE rn <= 5

The subquery is almost identical to your query. The only change is adding

    ROW_NUMBER() OVER (PARTITION BY id ORDER BY rate DESC) AS rn

The alias is needed in MySQL 8, where ROW_NUMBER is a reserved word and cannot be used as an unquoted column name.
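A quick way to try the window-function approach without a MySQL 8 server is sketched below. It uses Python's built-in sqlite3 module, which is an assumption of the example and needs an underlying SQLite of version 3.25 or later for window functions. The table2 filter and the GROUP BY from the question are left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE h (year INTEGER, id TEXT, rate REAL)")
conn.executemany(
    "INSERT INTO h VALUES (?, ?, ?)",
    [
        (2006, "p01", 8.0), (2003, "p01", 7.4), (2008, "p01", 6.8),
        (2001, "p01", 5.9), (2007, "p01", 5.3), (2009, "p01", 4.4),
        (2002, "p01", 3.9), (2004, "p01", 3.5), (2005, "p01", 2.1),
        (2000, "p01", 0.8), (2001, "p02", 12.5), (2004, "p02", 12.4),
        (2002, "p02", 12.2), (2003, "p02", 10.3), (2000, "p02", 8.7),
        (2006, "p02", 4.6), (2007, "p02", 3.3),
    ],
)

# Number the rows of each id from the highest rate down, then keep rows 1..5.
top5 = conn.execute(
    """
    SELECT year, id, rate
    FROM (SELECT year, id, rate,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY rate DESC) AS rn
          FROM h
          WHERE year BETWEEN 2000 AND 2009) AS ranked
    WHERE rn <= 5
    ORDER BY id, rate DESC
    """
).fetchall()

for row in top5:
    print(row)
```

ROW_NUMBER always yields exactly five rows per id; swapping in RANK or DENSE_RANK would instead keep tied rows together.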
