Suppose I have a table of customers and a table of purchases. Each purchase belongs to one customer. I want to get a list of all customers along with their last purchase in one SELECT
statement. What is the best practice? Any advice on building indexes?
Please use these table/column names in your answer:
id
, name
id
, customer_id
, item_id
, date
And in more complicated situations, would it be (performance-wise) beneficial to denormalize the database by putting the last purchase into the customer table?
If the (purchase) id
is guaranteed to be sorted by date, can the statements be simplified by using something like LIMIT 1
?
This question is related to
sql
select
join
indexing
greatest-n-per-group
Please try this,
SELECT
c.Id,
c.name,
(SELECT pi.price FROM purchase pi WHERE pi.Id = MAX(p.Id)) AS [LastPurchasePrice]
FROM customer c INNER JOIN purchase p
ON c.Id = p.customerId
GROUP BY c.Id,c.name;
Another approach would be to use a NOT EXISTS
condition in your join condition to test for later purchases:
SELECT *
FROM customer c
LEFT JOIN purchase p ON (
c.id = p.customer_id
AND NOT EXISTS (
SELECT 1 FROM purchase p1
WHERE p1.customer_id = c.id
AND p1.id > p.id
)
)
You could also try doing this using a sub select
SELECT c.*, p.*
FROM customer c INNER JOIN
(
SELECT customer_id,
MAX(date) MaxDate
FROM purchase
GROUP BY customer_id
) MaxDates ON c.id = MaxDates.customer_id INNER JOIN
purchase p ON MaxDates.customer_id = p.customer_id
AND MaxDates.MaxDate = p.date
The select should join on all customers and their Last purchase date.
Try this, It will help.
I have used this in my project.
SELECT
*
FROM
customer c
OUTER APPLY(SELECT top 1 * FROM purchase pi
WHERE pi.customer_id = c.Id order by pi.Id desc) AS [LastPurchasePrice]
I found this thread as a solution to my problem.
But when I tried them the performance was low. Bellow is my suggestion for better performance.
With MaxDates as (
SELECT customer_id,
MAX(date) MaxDate
FROM purchase
GROUP BY customer_id
)
SELECT c.*, M.*
FROM customer c INNER JOIN
MaxDates as M ON c.id = M.customer_id
Hope this will be helpful.
Tested on SQLite:
SELECT c.*, p.*, max(p.date)
FROM customer c
LEFT OUTER JOIN purchase p
ON c.id = p.customer_id
GROUP BY c.id
The max()
aggregate function will make sure that the latest purchase is selected from each group (but assumes that the date column is in a format whereby max() gives the latest - which is normally the case). If you want to handle purchases with the same date then you can use max(p.date, p.id)
.
In terms of indexes, I would use an index on purchase with (customer_id, date, [any other purchase columns you want to return in your select]).
The LEFT OUTER JOIN
(as opposed to INNER JOIN
) will make sure that customers that have never made a purchase are also included.
Without getting into the code first, the logic/algorithm goes below:
Go to the transaction
table with multiple records for the same client
.
Select records of clientID
and the latestDate
of client's activity using group by clientID
and max(transactionDate)
select clientID, max(transactionDate) as latestDate
from transaction
group by clientID
inner join
the transaction
table with the outcome from Step 2, then you will have the full records of the transaction
table with only each client's latest record.
select * from
transaction t
inner join (
select clientID, max(transactionDate) as latestDate
from transaction
group by clientID) d
on t.clientID = d.clientID and t.transactionDate = d.latestDate)
You can use the result from step 3 to join any table you want to get different results.
You haven't specified the database. If it is one that allows analytical functions it may be faster to use this approach than the GROUP BY one(definitely faster in Oracle, most likely faster in the late SQL Server editions, don't know about others).
Syntax in SQL Server would be:
SELECT c.*, p.*
FROM customer c INNER JOIN
(SELECT RANK() OVER (PARTITION BY customer_id ORDER BY date DESC) r, *
FROM purchase) p
ON (c.id = p.customer_id)
WHERE p.r = 1
If you're using PostgreSQL you can use DISTINCT ON
to find the first row in a group.
SELECT customer.*, purchase.*
FROM customer
JOIN (
SELECT DISTINCT ON (customer_id) *
FROM purchase
ORDER BY customer_id, date DESC
) purchase ON purchase.customer_id = customer.id
Note that the DISTINCT ON
field(s) -- here customer_id
-- must match the left most field(s) in the ORDER BY
clause.
Caveat: This is a nonstandard clause.
Source: Stackoverflow.com