[mysql] Get records with max value for each group of grouped SQL results

How do you get the rows that contain the max value for each grouped set?

I've seen some overly-complicated variations on this question, and none with a good answer. I've tried to put together the simplest possible example:

Given a table like that below, with person, group, and age columns, how would you get the oldest person in each group? (A tie within a group should give the first alphabetical result)

Person | Group | Age
---
Bob  | 1     | 32  
Jill | 1     | 34  
Shawn| 1     | 42  
Jake | 2     | 29  
Paul | 2     | 36  
Laura| 2     | 39  

Desired result set:

Shawn | 1     | 42    
Laura | 2     | 39  

This question is related to mysql sql greatest-n-per-group

The answer is


You can join against a subquery that pulls the Group and its MAX(Age). This method is portable across most RDBMS. Note that if several people in a group share the maximum age, all of them are returned.

SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT `Group`, MAX(Age) AS max_age
    FROM yourTable
    GROUP BY `Group`
) t2
    ON t1.`Group` = t2.`Group` AND t1.Age = t2.max_age;
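The question also asks for the alphabetically first person when ages tie. Below is a minimal sketch of that case, run through Python's built-in sqlite3 module rather than MySQL. The table name people, the column name grp, and the extra row for Aaron are assumptions made up for this demonstration. It first shows that the join keeps every tied row, then collapses ties with MIN(person).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42), ("Aaron", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# Join each row to its group's maximum age; tied rows all survive.
tied = conn.execute("""
    SELECT t1.person, t1.grp, t1.age
    FROM people t1
    INNER JOIN (
        SELECT grp, MAX(age) AS max_age
        FROM people
        GROUP BY grp
    ) t2 ON t1.grp = t2.grp AND t1.age = t2.max_age
    ORDER BY t1.grp, t1.person
""").fetchall()

# Collapse ties by keeping the alphabetically first person per group.
first_alpha = conn.execute("""
    SELECT MIN(t1.person) AS person, t1.grp, t1.age
    FROM people t1
    INNER JOIN (
        SELECT grp, MAX(age) AS max_age
        FROM people
        GROUP BY grp
    ) t2 ON t1.grp = t2.grp AND t1.age = t2.max_age
    GROUP BY t1.grp, t1.age
    ORDER BY t1.grp
""").fetchall()

print(tied)         # Aaron and Shawn both appear for group 1
print(first_alpha)  # only Aaron remains for group 1
```

The MIN(person) trick only settles the one column it is applied to; if the table had more columns, a ranking approach would be a safer way to keep whole rows.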

Using CTEs - Common Table Expressions:

WITH MyCTE (MaxPKID, SomeColumn1)
AS (
    SELECT MAX(a.MyTablePKID) AS MaxPKID, a.SomeColumn1
    FROM MyTable1 a
    GROUP BY a.SomeColumn1
)
SELECT b.MyTablePKID, b.SomeColumn1, b.SomeColumn2, MAX(b.NumEstado)
FROM MyTable1 b
INNER JOIN MyCTE c ON c.MaxPKID = b.MyTablePKID
GROUP BY b.MyTablePKID, b.SomeColumn1, b.SomeColumn2

--Note: MyTablePKID is the primary key of MyTable1

This method has the benefit of allowing you to rank by a different column, and not trashing the other data. It's quite useful in a situation where you are trying to list orders with a column for items, listing the heaviest first.

Source: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat

SELECT person, `group`,
    GROUP_CONCAT(
        DISTINCT age
        ORDER BY age DESC SEPARATOR ', follow up: '
    )
FROM sql_table
GROUP BY `group`;

My solution works only if you need to retrieve a single column; however, for my needs it was the best solution I found in terms of performance (it uses only one single query!):

SELECT SUBSTRING_INDEX(GROUP_CONCAT(column_x ORDER BY column_y),',',1) AS xyz,
   column_z
FROM table_name
GROUP BY column_z;

It uses GROUP_CONCAT to build an ordered, comma-separated list and then SUBSTRING_INDEX to keep only the first element. To take the value from the row with the largest column_y, order by column_y DESC.


My simple solution for SQLite:

SELECT *, MAX(age) FROM mytable GROUP BY `Group`;

However, it doesn't work in PostgreSQL, and MySQL either rejects it (with ONLY_FULL_GROUP_BY enabled) or fills the other columns from an arbitrary row of the group.
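SQLite's documentation describes this as a special case: when a query uses a single MAX() (or MIN()) aggregate, the other selected columns are taken from the row that holds the maximum. A minimal sketch to check this with Python's built-in sqlite3 module, using the question's data (the table name people and the column name grp are placeholders chosen here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# person is a "bare" column: it is neither aggregated nor in GROUP BY.
# SQLite takes it from the row that supplied MAX(age).
bare_rows = conn.execute(
    "SELECT person, grp, MAX(age) FROM people GROUP BY grp ORDER BY grp"
).fetchall()
print(bare_rows)
```

This relies on SQLite-specific behaviour, so it should not be carried over to other engines without checking their GROUP BY rules.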

In PostgreSQL you can use DISTINCT ON clause:

SELECT DISTINCT ON ("group") * FROM "mytable" ORDER BY "group", "age" DESC;

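You can also try an IN filter. Compare the group together with the age; matching on age alone would also return anyone whose age happens to equal the maximum of a different group:

SELECT * FROM mytable WHERE (`Group`, age) IN (SELECT `Group`, MAX(age) FROM mytable GROUP BY `Group`);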
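To check how an IN filter behaves when an age repeats across groups, here is a sketch using Python's built-in sqlite3 module (row values in IN need SQLite 3.15 or newer). The table name people, the column name grp, and the person Carl, aged 39 in group 1, are assumptions added for the demonstration; 39 is also the maximum age of group 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Carl", 1, 39), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# Filter on age alone: any age that is a maximum in *some* group matches.
age_only = conn.execute("""
    SELECT person, grp, age FROM people
    WHERE age IN (SELECT MAX(age) FROM people GROUP BY grp)
    ORDER BY grp, age
""").fetchall()

# Filter on (group, age) pairs: only each group's own maximum matches.
group_and_age = conn.execute("""
    SELECT person, grp, age FROM people
    WHERE (grp, age) IN (SELECT grp, MAX(age) FROM people GROUP BY grp)
    ORDER BY grp, age
""").fetchall()

print(age_only)       # Carl slips in because 39 is group 2's maximum
print(group_and_age)  # only Shawn and Laura
```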

Using ROW_NUMBER() in a CTE (SQL Server syntax):

with CTE as
(
    select Person, [Group], Age,
           RN = Row_Number() over (partition by [Group] order by Age desc)
    from yourtable
)
select Person, Age from CTE where RN = 1

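In Oracle, the query below can give the desired result. The ranking has to be computed in a subquery, because its alias cannot be referenced in the WHERE clause of the same SELECT (grp stands in here for the reserved word group):

SELECT grp, person, age FROM (
  SELECT grp, person, age,
    ROW_NUMBER() OVER (PARTITION BY grp ORDER BY age DESC, person ASC) AS rankForEachGroup
  FROM tablename
) WHERE rankForEachGroup = 1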

If the ID (and all columns) is needed from mytable:

SELECT
    *
FROM
    mytable
WHERE
    id NOT IN (
        SELECT
            A.id
        FROM
            mytable AS A
        JOIN mytable AS B ON A.`Group` = B.`Group`
        AND A.age < B.age
    )

Let the table name be people (rows tied on the maximum age are all returned):

select O.*              -- O for the "oldest" row
from people O
where O.age =
  (select max(T.age) from people T where T.grp = O.grp);

axiac's solution is what worked best for me in the end. I had an additional complexity however: a calculated "max value", derived from two columns.

Let's use the same example: I would like the oldest person in each group. If there are people that are equally old, take the tallest person.

I had to perform the left join two times to get this behavior:

SELECT o1.* FROM
    (SELECT o.*
    FROM `Persons` o
    LEFT JOIN `Persons` b
    ON o.Group = b.Group AND o.Age < b.Age
    WHERE b.Age is NULL) o1
LEFT JOIN
    (SELECT o.*
    FROM `Persons` o
    LEFT JOIN `Persons` b
    ON o.Group = b.Group AND o.Age < b.Age
    WHERE b.Age is NULL) o2
ON o1.Group = o2.Group AND o1.Height < o2.Height 
WHERE o2.Height is NULL;

Hope this helps! I guess there should be a better way to do this, though...


Improving axiac's solution to avoid selecting multiple rows per group while also allowing for use of indexes

SELECT o.*
FROM `Persons` o 
  LEFT JOIN `Persons` b 
      ON o.Group = b.Group AND o.Age < b.Age
  LEFT JOIN `Persons` c 
      ON o.Group = c.Group AND o.Age = c.Age and o.id < c.id
WHERE b.Age is NULL and c.id is null


MySQL 8.0 and later have the ROW_NUMBER() window function (older versions do not), so you can use it there to get the desired result. On SQL Server you can do something similar to:

CREATE TABLE p
(
 person NVARCHAR(10),
 gp INT,
 age INT
);
GO
INSERT  INTO p
VALUES  ('Bob', 1, 32);
INSERT  INTO p
VALUES  ('Jill', 1, 34);
INSERT  INTO p
VALUES  ('Shawn', 1, 42);
INSERT  INTO p
VALUES  ('Jake', 2, 29);
INSERT  INTO p
VALUES  ('Paul', 2, 36);
INSERT  INTO p
VALUES  ('Laura', 2, 39);
GO

SELECT  t.person, t.gp, t.age
FROM    (
         SELECT *,
                ROW_NUMBER() OVER (PARTITION BY gp ORDER BY age DESC) row
         FROM   p
        ) t
WHERE   t.row = 1;
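The same ranking idea can also honour the question's tie rule by adding person to the window's ORDER BY. The sketch below runs it through Python's built-in sqlite3 module and assumes SQLite 3.25 or newer, which is when window functions were added. The table name people, the column name grp, and the tied row for Aaron are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42), ("Aaron", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# Rank rows inside each group: oldest first, then alphabetical by person.
ranked_first = conn.execute("""
    SELECT person, grp, age
    FROM (
        SELECT person, grp, age,
               ROW_NUMBER() OVER (
                   PARTITION BY grp
                   ORDER BY age DESC, person ASC
               ) AS rn
        FROM people
    ) ranked
    WHERE rn = 1
    ORDER BY grp
""").fetchall()
print(ranked_first)
```

Because ROW_NUMBER() never assigns the same number twice within a partition, exactly one row per group survives the rn = 1 filter.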

The correct solution is:

SELECT o.*
FROM `Persons` o                    # 'o' from 'oldest person in group'
  LEFT JOIN `Persons` b             # 'b' from 'bigger age'
      ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL                 # bigger age not found

How it works:

It matches each row from o with all the rows from b having the same value in column Group and a bigger value in column Age. Any row from o not having the maximum value of its group in column Age will match one or more rows from b.

The LEFT JOIN pairs the oldest person in each group (including people who are alone in their group) with a row full of NULLs from b ('no bigger age in the group').
With an INNER JOIN these rows would have no match and would be dropped.

The WHERE clause keeps only the rows having NULLs in the fields extracted from b. They are the oldest persons from each group.
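To see the mechanics on the question's data, this sketch first counts how many older group members each row is paired with, then applies the IS NULL filter. It uses Python's built-in sqlite3 module instead of MySQL, and the table name people and column name grp are placeholders chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# How many rows from b (same group, bigger age) does each row of o match?
# COUNT(b.person) ignores the all-NULL row produced by the LEFT JOIN.
older_counts = conn.execute("""
    SELECT o.person, COUNT(b.person) AS older
    FROM people o
    LEFT JOIN people b ON o.grp = b.grp AND o.age < b.age
    GROUP BY o.person, o.grp, o.age
    ORDER BY o.grp, o.age
""").fetchall()

# Keep only the rows that found no older group member.
oldest = conn.execute("""
    SELECT o.person, o.grp, o.age
    FROM people o
    LEFT JOIN people b ON o.grp = b.grp AND o.age < b.age
    WHERE b.age IS NULL
    ORDER BY o.grp
""").fetchall()

print(older_counts)
print(oldest)
```

The rows with a count of zero in the first result are exactly the rows returned by the second query.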

Further reading

This solution and many others are explained in the book SQL Antipatterns: Avoiding the Pitfalls of Database Programming


Using ranking method.

SELECT @rn :=  CASE WHEN @prev_grp <> groupa THEN 1 ELSE @rn+1 END AS rn,  
   @prev_grp :=groupa,
   person,age,groupa  
FROM   users,(SELECT @rn := 0) r        
HAVING rn=1
ORDER  BY groupa,age DESC,person

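This SQL can be explained as follows:

  1. select * from users, (select @rn := 0) r order by groupa, age desc, person

     The rows are meant to be visited group by group, oldest person first. MySQL does not guarantee the evaluation order of user variables, so treat this as a trick that works in practice rather than documented behavior.

  2. @prev_grp is null at the start, so on the first row the comparison is not true and rn becomes 0 + 1 = 1

  3. @rn := CASE WHEN @prev_grp <> groupa THEN 1 ELSE @rn+1 END

    this is a conditional (ternary-style) expression,
    like this: rn = 1 if prev_grp != groupa else rn = rn + 1

  4. having rn=1 keeps only the first row of each group, which is the row you need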


This is how I'm getting the N max rows per group in mysql

SELECT co.id, co.person, co.country
FROM person co
WHERE (
SELECT COUNT(*)
FROM person ci
WHERE  co.country = ci.country AND co.id < ci.id
) < 1
;

How it works:

  • the subquery is correlated with the outer query, so it acts like a self join on the table
  • groups are matched by co.country = ci.country
  • the number of rows per group is controlled by ) < 1, so for 3 rows per group use ) < 3
  • whether you get the max or the min depends on the comparison:
    • co.id < ci.id - max
    • co.id > ci.id - min
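The counting idea can be sketched on the question's own data, ranking by age instead of id. This runs through Python's built-in sqlite3 module rather than MySQL. The table name people, the column name grp, and the helper top_n_per_group are hypothetical names introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)


def top_n_per_group(connection, n):
    """Return the rows that have fewer than n older people in their group."""
    return connection.execute("""
        SELECT co.person, co.grp, co.age
        FROM people co
        WHERE (
            SELECT COUNT(*)
            FROM people ci
            WHERE ci.grp = co.grp AND co.age < ci.age
        ) < ?
        ORDER BY co.grp, co.age DESC
    """, (n,)).fetchall()


top_one = top_n_per_group(conn, 1)
top_two = top_n_per_group(conn, 2)
print(top_one)
print(top_two)
```

When ages tie, this filter can return more than n rows for a group, since tied rows have the same number of older members.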

Full example here:

mysql select n max values per group


I would not use Group as a column name since it is a reserved word. However, the following SQL would work:

SELECT a.Person, a.Group, a.Age FROM [TABLE_NAME] a
INNER JOIN 
(
  SELECT `Group`, MAX(Age) AS oldest FROM [TABLE_NAME] 
  GROUP BY `Group`
) b ON a.Group = b.Group AND a.Age = b.oldest
