[mysql] MySQL "Group By" and "Order By"

I want to be able to select a bunch of rows from a table of e-mails and group them by the from sender. My query looks like this:

SELECT 
    `timestamp`, `fromEmail`, `subject`
FROM `incomingEmails` 
GROUP BY LOWER(`fromEmail`) 
ORDER BY `timestamp` DESC

The query almost works as I want it — it selects records grouped by e-mail. The problem is that the subject and timestamp don't correspond to the most recent record for a particular e-mail address.

For example, it might return:

fromEmail: [email protected], subject: hello
fromEmail: [email protected], subject: welcome

When the records in the database are:

fromEmail: [email protected], subject: hello
fromEmail: [email protected], subject: programming question
fromEmail: [email protected], subject: welcome

If the "programming question" subject is the most recent, how can I get MySQL to select that record when grouping the e-mails?

This question is related to mysql sql group-by sql-order-by aggregate-functions

The answer is


As pointed in a reply already, the current answer is wrong, because the GROUP BY arbitrarily selects the record from the window.

If one is using MySQL 5.6, or MySQL 5.7 with ONLY_FULL_GROUP_BY, the correct (deterministic) query is:

SELECT incomingEmails.*
  FROM (
    SELECT fromEmail, MAX(timestamp) `timestamp`
    FROM incomingEmails
    GROUP BY fromEmail
  ) filtered_incomingEmails
  JOIN incomingEmails USING (fromEmail, timestamp)
GROUP BY fromEmail, timestamp

In order for the query to run efficiently, proper indexing is required.

Note that for simplification purposes, I've removed the LOWER(), which in most cases, won't be used.


Do a GROUP BY after the ORDER BY by wrapping your query with the GROUP BY like this:

SELECT t.* FROM (SELECT * FROM table ORDER BY time DESC) t GROUP BY t.from

According to SQL standard you cannot use non-aggregate columns in select list. MySQL allows such usage (uless ONLY_FULL_GROUP_BY mode used) but result is not predictable.

ONLY_FULL_GROUP_BY

You should first select fromEmail, MIN(read), and then, with second query (or subquery) - Subject.


Here's one approach:

SELECT cur.textID, cur.fromEmail, cur.subject, 
     cur.timestamp, cur.read
FROM incomingEmails cur
LEFT JOIN incomingEmails next
    on cur.fromEmail = next.fromEmail
    and cur.timestamp < next.timestamp
WHERE next.timestamp is null
and cur.toUserID = '$userID' 
ORDER BY LOWER(cur.fromEmail)

Basically, you join the table on itself, searching for later rows. In the where clause you state that there cannot be later rows. This gives you only the latest row.

If there can be multiple emails with the same timestamp, this query would need refining. If there's an incremental ID column in the email table, change the JOIN like:

LEFT JOIN incomingEmails next
    on cur.fromEmail = next.fromEmail
    and cur.id < next.id

I struggled with both these approaches for more complex queries than those shown, because the subquery approach was horribly ineficient no matter what indexes I put on, and because I couldn't get the outer self-join through Hibernate

The best (and easiest) way to do this is to group by something which is constructed to contain a concatenation of the fields you require and then to pull them out using expressions in the SELECT clause. If you need to do a MAX() make sure that the field you want to MAX() over is always at the most significant end of the concatenated entity.

The key to understanding this is that the query can only make sense if these other fields are invariant for any entity which satisfies the Max(), so in terms of the sort the other pieces of the concatenation can be ignored. It explains how to do this at the very bottom of this link. http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html

If you can get am insert/update event (like a trigger) to pre-compute the concatenation of the fields you can index it and the query will be as fast as if the group by was over just the field you actually wanted to MAX(). You can even use it to get the maximum of multiple fields. I use it to do queries against multi-dimensional trees expresssed as nested sets.


Examples related to mysql

Implement specialization in ER diagram How to post query parameters with Axios? PHP with MySQL 8.0+ error: The server requested authentication method unknown to the client Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver' phpMyAdmin - Error > Incorrect format parameter? Authentication plugin 'caching_sha2_password' is not supported How to resolve Unable to load authentication plugin 'caching_sha2_password' issue Connection Java-MySql : Public Key Retrieval is not allowed How to grant all privileges to root user in MySQL 8.0 MySQL 8.0 - Client does not support authentication protocol requested by server; consider upgrading MySQL client

Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to group-by

SELECT list is not in GROUP BY clause and contains nonaggregated column .... incompatible with sql_mode=only_full_group_by Count unique values using pandas groupby Pandas group-by and sum Count unique values with pandas per groups Group dataframe and get sum AND count? Error related to only_full_group_by when executing a query in MySql Pandas sum by groupby, but exclude certain columns Using DISTINCT along with GROUP BY in SQL Server Python Pandas : group by in group by and average? How do I create a new column from the output of pandas groupby().sum()?

Examples related to sql-order-by

Laravel Eloquent: Ordering results of all() SQL for ordering by number - 1,2,3,4 etc instead of 1,10,11,12 SQL ORDER BY multiple columns How to use Oracle ORDER BY and ROWNUM correctly? MySQL order by before group by Ordering by specific field value first MySQL ORDER BY multiple column ASC and DESC How can I get just the first row in a result set AFTER ordering? SQL order string as number Order by multiple columns with Doctrine

Examples related to aggregate-functions

Spark SQL: apply aggregate functions to a list of columns GROUP BY without aggregate function GROUP BY + CASE statement must appear in the GROUP BY clause or be used in an aggregate function Naming returned columns in Pandas aggregate function? Concatenate multiple result rows of one column into one, group by another column How to include "zero" / "0" results in COUNT aggregate? Apply multiple functions to multiple groupby columns Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause Optimal way to concatenate/aggregate strings