[sql] Select first row in each GROUP BY group?

The Query:

SELECT purchases.*
FROM purchases
LEFT JOIN purchases as p 
ON 
  p.customer = purchases.customer 
  AND 
  purchases.total < p.total
WHERE p.total IS NULL

HOW DOES THAT WORK! (I've been there)

We want to make sure that we only have the highest total for each purchase.


Some Theoretical Stuff (skip this part if you only want to understand the query)

Let Total be a function T(customer,id) where it returns a value given the name and id To prove that the given total (T(customer,id)) is the highest we have to prove that We want to prove either

  • ?x T(customer,id) > T(customer,x) (this total is higher than all other total for that customer)

OR

  • ¬?x T(customer, id) < T(customer, x) (there exists no higher total for that customer)

The first approach will need us to get all the records for that name which I do not really like.

The second one will need a smart way to say there can be no record higher than this one.


Back to SQL

If we left joins the table on the name and total being less than the joined table:

LEFT JOIN purchases as p 
ON 
p.customer = purchases.customer 
AND 
purchases.total < p.total

we make sure that all records that have another record with the higher total for the same user to be joined:

+--------------+---------------------+-----------------+------+------------+---------+
| purchases.id |  purchases.customer | purchases.total | p.id | p.customer | p.total |
+--------------+---------------------+-----------------+------+------------+---------+
|            1 | Tom                 |             200 |    2 | Tom        |     300 |
|            2 | Tom                 |             300 |      |            |         |
|            3 | Bob                 |             400 |    4 | Bob        |     500 |
|            4 | Bob                 |             500 |      |            |         |
|            5 | Alice               |             600 |    6 | Alice      |     700 |
|            6 | Alice               |             700 |      |            |         |
+--------------+---------------------+-----------------+------+------------+---------+

That will help us filter for the highest total for each purchase with no grouping needed:

WHERE p.total IS NULL
    
+--------------+----------------+-----------------+------+--------+---------+
| purchases.id | purchases.name | purchases.total | p.id | p.name | p.total |
+--------------+----------------+-----------------+------+--------+---------+
|            2 | Tom            |             300 |      |        |         |
|            4 | Bob            |             500 |      |        |         |
|            6 | Alice          |             700 |      |        |         |
+--------------+----------------+-----------------+------+--------+---------+

And that's the answer we need.

Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to sqlite

getting " (1) no such column: _id10 " error Laravel: PDOException: could not find driver auto create database in Entity Framework Core How to open .SQLite files Accessing an SQLite Database in Swift When does SQLiteOpenHelper onCreate() / onUpgrade() run? Attempt to write a readonly database - Django w/ SELinux error Android sqlite how to check if a record exists How can I add the sqlite3 module to Python? "Insert if not exists" statement in SQLite

Examples related to postgresql

Subtracting 1 day from a timestamp date pgadmin4 : postgresql application server could not be contacted. Psql could not connect to server: No such file or directory, 5432 error? How to persist data in a dockerized postgres database using volumes input file appears to be a text format dump. Please use psql Postgres: check if array field contains value? Add timestamp column with default NOW() for new rows only Can't connect to Postgresql on port 5432 How to insert current datetime in postgresql insert query Connecting to Postgresql in a docker container from outside

Examples related to group-by

SELECT list is not in GROUP BY clause and contains nonaggregated column .... incompatible with sql_mode=only_full_group_by Count unique values using pandas groupby Pandas group-by and sum Count unique values with pandas per groups Group dataframe and get sum AND count? Error related to only_full_group_by when executing a query in MySql Pandas sum by groupby, but exclude certain columns Using DISTINCT along with GROUP BY in SQL Server Python Pandas : group by in group by and average? How do I create a new column from the output of pandas groupby().sum()?

Examples related to greatest-n-per-group

How to select the rows with maximum values in each group with dplyr? MAX function in where clause mysql Pandas get topmost n records within each group SQL Left Join first match only Select info from table where row has max date FORCE INDEX in MySQL - where do I put it? GROUP BY having MAX date How can I select rows with most recent timestamp for each key value? Select row with most recent date per user How to select id with max date group by category in PostgreSQL?