[sql] SQL JOIN, GROUP BY on three tables to get totals

I've inherited the following DB design. Tables are:

customers
---------
customerid  
customernumber

invoices
--------
invoiceid  
amount

invoicepayments
---------------
invoicepaymentid  
invoiceid  
paymentid

payments
--------
paymentid  
customerid  
amount

My query needs to return invoiceid, the invoice amount (in the invoices table), and the amount due (invoice amount minus any payments that have been made towards the invoice) for a given customernumber. A customer may have multiple invoices.

The following query gives me duplicate records when multiple payments are made to an invoice:

SELECT i.invoiceid, i.amount, i.amount - p.amount AS amountdue
FROM invoices i
LEFT JOIN invoicepayments ip ON i.invoiceid = ip.invoiceid
LEFT JOIN payments p ON ip.paymentid = p.paymentid
LEFT JOIN customers c ON p.customerid = c.customerid
WHERE c.customernumber = '100'
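To see where the duplicates come from, here is a minimal sketch that uses Python's built-in sqlite3 module. The column types and the sample data are assumptions. The data is one $39 invoice paid with $18 and $12, which is the example used later in this thread. The query is the one above, unchanged. The joins produce one row per payment, so the invoice appears twice, and each row subtracts only a single payment.

```python
import sqlite3

# In-memory database with the schema from the question (column types assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerid INTEGER PRIMARY KEY, customernumber TEXT);
CREATE TABLE invoices (invoiceid INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE invoicepayments (invoicepaymentid INTEGER PRIMARY KEY,
                              invoiceid INTEGER, paymentid INTEGER);
CREATE TABLE payments (paymentid INTEGER PRIMARY KEY, customerid INTEGER, amount REAL);

-- Hypothetical data: invoice 1 ($39) is paid with two payments, $18 and $12.
INSERT INTO customers VALUES (1, '100');
INSERT INTO invoices VALUES (1, 39.00);
INSERT INTO payments VALUES (10, 1, 18.00), (11, 1, 12.00);
INSERT INTO invoicepayments VALUES (100, 1, 10), (101, 1, 11);
""")

# The query from the question, unchanged.
QUERY = """
SELECT i.invoiceid, i.amount, i.amount - p.amount AS amountdue
FROM invoices i
LEFT JOIN invoicepayments ip ON i.invoiceid = ip.invoiceid
LEFT JOIN payments p ON ip.paymentid = p.paymentid
LEFT JOIN customers c ON p.customerid = c.customerid
WHERE c.customernumber = '100'
"""

# Sorted in Python because the query has no ORDER BY.
rows = sorted(conn.execute(QUERY).fetchall())
print(rows)  # [(1, 39.0, 21.0), (1, 39.0, 27.0)]
```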

How can I solve this?

Thank you very much for the replies!

Saggi Malachi, that query unfortunately sums the invoice amount once per payment when there is more than one payment. Say a $39 invoice receives two payments, $18 and $12. Rather than ending up with a result that looks like:

1   39.00   9.00

You'll end up with:

1   78.00   48.00
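The doubling is easy to reproduce. The sketch below runs Saggi Malachi's query (the SUM/CASE version posted in this thread) in SQLite through Python's sqlite3 module. It uses the same hypothetical $39 invoice, and the column types are assumptions. Because the joins yield one row per payment, SUM(i.amount) adds 39 once for each of the two payments: 2 × 39 = 78, and 78 - (18 + 12) = 48.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerid INTEGER PRIMARY KEY, customernumber TEXT);
CREATE TABLE invoices (invoiceid INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE invoicepayments (invoicepaymentid INTEGER PRIMARY KEY,
                              invoiceid INTEGER, paymentid INTEGER);
CREATE TABLE payments (paymentid INTEGER PRIMARY KEY, customerid INTEGER, amount REAL);
-- Hypothetical data: a $39 invoice paid with $18 and $12.
INSERT INTO customers VALUES (1, '100');
INSERT INTO invoices VALUES (1, 39.00);
INSERT INTO payments VALUES (10, 1, 18.00), (11, 1, 12.00);
INSERT INTO invoicepayments VALUES (100, 1, 10), (101, 1, 11);
""")

# Saggi Malachi's query: i.amount is repeated on every payment row before SUM runs.
QUERY = """
SELECT i.invoiceid,
       SUM(CASE WHEN i.amount IS NOT NULL THEN i.amount ELSE 0 END),
       SUM(CASE WHEN i.amount IS NOT NULL THEN i.amount ELSE 0 END)
         - SUM(CASE WHEN p.amount IS NOT NULL THEN p.amount ELSE 0 END) AS amountdue
FROM invoices i
LEFT JOIN invoicepayments ip ON i.invoiceid = ip.invoiceid
LEFT JOIN payments p ON ip.paymentid = p.paymentid
LEFT JOIN customers c ON p.customerid = c.customerid
WHERE c.customernumber = '100'
GROUP BY i.invoiceid
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 78.0, 48.0)]
```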

Charles Bretana, in the course of trimming my query down to the simplest possible form, I (stupidly) omitted an additional table, customerinvoices, which links customers to invoices. It can be used to find invoices for which no payments have been made.

After much struggling, I think that the following query returns what I need it to:

SELECT DISTINCT i.invoiceid, i.amount, ISNULL(i.amount - p.amount, i.amount) AS amountdue
FROM invoices i
LEFT JOIN invoicepayments ip ON i.invoiceid = ip.invoiceid
LEFT JOIN customerinvoices ci ON i.invoiceid = ci.invoiceid
LEFT JOIN (
  SELECT invoiceid, SUM(p.amount) amount
  FROM invoicepayments ip 
  LEFT JOIN payments p ON ip.paymentid = p.paymentid
  GROUP BY ip.invoiceid
) p
ON p.invoiceid = ip.invoiceid
LEFT JOIN payments p2 ON ip.paymentid = p2.paymentid
LEFT JOIN customers c ON ci.customerid = c.customerid
WHERE c.customernumber='100'
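The following sketch shows one way to simplify the query above. It makes some assumptions. First, it totals the payments per invoice in a derived table. Then it joins that single row per invoice. As a result, DISTINCT is not needed, and the extra joins to invoicepayments and p2 drop out.

The sketch runs in SQLite through Python's sqlite3 module, so it uses the standard COALESCE in place of SQL Server's ISNULL. The customerinvoices columns (customerid, invoiceid) are taken from the query above. The column types, the alias paid, and the sample data are hypothetical. The sample data includes an unpaid $50 invoice 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerid INTEGER PRIMARY KEY, customernumber TEXT);
CREATE TABLE customerinvoices (customerid INTEGER, invoiceid INTEGER);
CREATE TABLE invoices (invoiceid INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE invoicepayments (invoicepaymentid INTEGER PRIMARY KEY,
                              invoiceid INTEGER, paymentid INTEGER);
CREATE TABLE payments (paymentid INTEGER PRIMARY KEY, customerid INTEGER, amount REAL);
-- Hypothetical data: invoice 1 ($39) paid with $18 + $12; invoice 2 ($50) unpaid.
INSERT INTO customers VALUES (1, '100');
INSERT INTO customerinvoices VALUES (1, 1), (1, 2);
INSERT INTO invoices VALUES (1, 39.00), (2, 50.00);
INSERT INTO payments VALUES (10, 1, 18.00), (11, 1, 12.00);
INSERT INTO invoicepayments VALUES (100, 1, 10), (101, 1, 11);
""")

# The derived table "paid" has at most one row per invoice, so the join cannot
# multiply invoice rows; COALESCE turns "no payments" into 0.
QUERY = """
SELECT i.invoiceid, i.amount, i.amount - COALESCE(paid.total, 0) AS amountdue
FROM customers c
JOIN customerinvoices ci ON ci.customerid = c.customerid
JOIN invoices i ON i.invoiceid = ci.invoiceid
LEFT JOIN (
    SELECT ip.invoiceid, SUM(p.amount) AS total
    FROM invoicepayments ip
    JOIN payments p ON p.paymentid = ip.paymentid
    GROUP BY ip.invoiceid
) paid ON paid.invoiceid = i.invoiceid
WHERE c.customernumber = ?
ORDER BY i.invoiceid
"""

rows = conn.execute(QUERY, ("100",)).fetchall()
print(rows)  # [(1, 39.0, 9.0), (2, 50.0, 50.0)]
```

Starting from customers and customerinvoices means the customer filter no longer depends on payments existing, so the unpaid invoice still appears.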

Would you guys concur?


I am not sure I understood you, but this might be what you are looking for:

SELECT i.invoiceid,
       sum(case when i.amount is not null then i.amount else 0 end),
       sum(case when i.amount is not null then i.amount else 0 end) - sum(case when p.amount is not null then p.amount else 0 end) AS amountdue
FROM invoices i
LEFT JOIN invoicepayments ip ON i.invoiceid = ip.invoiceid
LEFT JOIN payments p ON ip.paymentid = p.paymentid
LEFT JOIN customers c ON p.customerid = c.customerid
WHERE c.customernumber = '100'
GROUP BY i.invoiceid

This sums the amounts in case there are multiple payment rows for each invoice.
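Per the follow-up earlier in the thread, summing i.amount counts the invoice once per payment. A minimal adjustment is to group by the invoice amount as well and sum only the payments. The sketch below runs it in SQLite through Python's sqlite3 module, with hypothetical data and assumed column types. The LEFT JOINs become plain JOINs, since the WHERE condition on c already discards unmatched rows. Like the original, this version reaches the customer through payments, so invoices with no payments from that customer do not appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerid INTEGER PRIMARY KEY, customernumber TEXT);
CREATE TABLE invoices (invoiceid INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE invoicepayments (invoicepaymentid INTEGER PRIMARY KEY,
                              invoiceid INTEGER, paymentid INTEGER);
CREATE TABLE payments (paymentid INTEGER PRIMARY KEY, customerid INTEGER, amount REAL);
-- Hypothetical data: a $39 invoice paid with $18 and $12.
INSERT INTO customers VALUES (1, '100');
INSERT INTO invoices VALUES (1, 39.00);
INSERT INTO payments VALUES (10, 1, 18.00), (11, 1, 12.00);
INSERT INTO invoicepayments VALUES (100, 1, 10), (101, 1, 11);
""")

# i.amount is the same on every row of a group, so it is grouped on, not summed.
QUERY = """
SELECT i.invoiceid, i.amount, i.amount - SUM(p.amount) AS amountdue
FROM invoices i
JOIN invoicepayments ip ON i.invoiceid = ip.invoiceid
JOIN payments p ON ip.paymentid = p.paymentid
JOIN customers c ON p.customerid = c.customerid
WHERE c.customernumber = '100'
GROUP BY i.invoiceid, i.amount
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 39.0, 9.0)]
```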


I have a tip for those who want to get various aggregated values from the same table.

Let's say I have a table with users and a table with the points the users acquire, so the relationship between them is 1:N (one user, many points records).

In the points table I also store what the user got the points for (logging in, clicking a banner, etc.). I want to list all users ordered by SUM(points) and then by SUM(points WHERE type = x), that is, ordered by all the points the user has and then by the points the user got for a specific action (e.g. login).

The SQL would be:

SELECT user.id, SUM(points.points) AS points_all, SUM(points.points * (points.type = 7)) AS points_login
FROM user
LEFT JOIN points ON user.id = points.user_id
GROUP BY user.id
ORDER BY points_all, points_login

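The beauty of this is in SUM(points.points * (points.type = 7)). In MySQL (and SQLite), the comparison in parentheses evaluates to either 1 or 0, so each points value is multiplied by 1 or 0 depending on whether its type equals the type of points we want. Databases that do not treat a comparison as a number, such as SQL Server and PostgreSQL, need SUM(CASE WHEN points.type = 7 THEN points.points ELSE 0 END) instead.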
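The tip can be tried out in SQLite via Python's sqlite3 module. Like MySQL, SQLite evaluates a comparison to 1 or 0, so the multiplication trick works there. The user/points column types and sample rows are hypothetical, with type 7 assumed to mean "login". The ORDER BY follows the ordering described above, and a CASE-based column is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY);
CREATE TABLE points (user_id INTEGER, points INTEGER, type INTEGER);
-- Hypothetical data: type 7 is assumed to mean "login"; user 3 has no points.
INSERT INTO user VALUES (1), (2), (3);
INSERT INTO points VALUES (1, 5, 7), (1, 3, 2), (2, 4, 2);
""")

QUERY = """
SELECT user.id,
       SUM(points.points) AS points_all,
       SUM(points.points * (points.type = 7)) AS points_login,
       SUM(CASE WHEN points.type = 7 THEN points.points ELSE 0 END) AS points_login_case
FROM user
LEFT JOIN points ON user.id = points.user_id
GROUP BY user.id
ORDER BY points_all, points_login
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(3, None, None, 0), (2, 4, 0, 0), (1, 8, 5, 5)]
```

User 3 has no points rows, so the LEFT JOIN gives it NULL sums, and SQLite sorts NULL before any number in ascending order. The CASE column returns 0 for that user because of its ELSE 0. This is worth keeping in mind when sorting or displaying the totals.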


First of all, shouldn't there be a CustomerId in the Invoices table? As it is, you can't perform this query for invoices that have no payments on them yet. If there are no payments on an invoice, that invoice will not even show up in the output of the query, even though it's an outer join.
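The outer-join point can be checked with a small SQLite sketch using Python's sqlite3 module. The column types are assumed, and the data is hypothetical: invoice 2 has no payments. Without the WHERE clause, the LEFT JOIN chain keeps invoice 2 with a NULL customernumber. The condition c.customernumber = '100' is not true for NULL, so adding it removes that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerid INTEGER PRIMARY KEY, customernumber TEXT);
CREATE TABLE invoices (invoiceid INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE invoicepayments (invoicepaymentid INTEGER PRIMARY KEY,
                              invoiceid INTEGER, paymentid INTEGER);
CREATE TABLE payments (paymentid INTEGER PRIMARY KEY, customerid INTEGER, amount REAL);
-- Hypothetical data: invoice 1 has a payment from customer 100; invoice 2 has none.
INSERT INTO customers VALUES (1, '100');
INSERT INTO invoices VALUES (1, 39.00), (2, 50.00);
INSERT INTO payments VALUES (10, 1, 18.00);
INSERT INTO invoicepayments VALUES (100, 1, 10);
""")

# Same join chain as the question: the customer is only reachable through payments.
BASE = """
SELECT i.invoiceid, c.customernumber
FROM invoices i
LEFT JOIN invoicepayments ip ON i.invoiceid = ip.invoiceid
LEFT JOIN payments p ON ip.paymentid = p.paymentid
LEFT JOIN customers c ON p.customerid = c.customerid
"""

unfiltered = sorted(conn.execute(BASE).fetchall())
filtered = sorted(conn.execute(BASE + "WHERE c.customernumber = '100'").fetchall())
print(unfiltered)  # [(1, '100'), (2, None)]
print(filtered)    # [(1, '100')]
```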

Also, when a customer makes a payment, how do you know which invoice to attach it to? If the only way is by the InvoiceId on the stub that arrives with the payment, then you are (perhaps inappropriately) associating invoices with the customer that paid them rather than with the customer that ordered them. (Sometimes an invoice is paid by someone other than the customer who ordered the services.)


I know this is late, but it does answer your original question.

/*Read the comments the same way that SQL runs the query
    1) FROM 
    2) GROUP 
    3) SELECT 
    4) My final notes at the bottom 
*/
SELECT 
        list.invoiceid
    ,   cust.customernumber 
    ,   MAX(list.inv_amount) AS invoice_amount/* we select the max because it will be the same for each payment to that invoice (presumably invoice amounts do not vary based on payment) */
    ,   MAX(list.inv_amount) - SUM(list.pay_amount)  AS [amount_due]
FROM 
Customers AS cust 
    INNER JOIN 
Payments  AS pay 
    ON 
        pay.customerid = cust.customerid
INNER JOIN  (   /* generate a list of payment_ids, their amounts, and the totals of the invoices they billed to*/
    SELECT 
            inpay.paymentid AS paymentid
        ,   inv.invoiceid AS invoiceid 
        ,   inv.amount  AS inv_amount 
        ,   pay.amount AS pay_amount 
    FROM 
    InvoicePayments AS inpay
        INNER JOIN 
    Invoices AS inv 
        ON  inv.invoiceid = inpay.invoiceid 
        INNER JOIN 
    Payments AS pay 
        ON pay.paymentid = inpay.paymentid
    )  AS list
ON 
    list.paymentid = pay.paymentid
    /* so at this point my result set would look like:
    -- All my customers (crossed by) every paymentid they are associated with (I'll call this A)
    -- Every invoice payment and its association to: its own amount, the total invoice amount, its own paymentid (what I call list)
    -- Filter out all records in A that do not have a matching paymentid in (list)
    -- we filter the result because there may be payments that did not go towards invoices!
    */
GROUP BY
    /* we want a record line for each customer and invoice (or basically each invoice, but I believe this makes more sense logically) */
        cust.customernumber 
    ,   list.invoiceid 
/*
    -- we can improve this query by only hitting the Payments table once by moving it inside of our list subquery, 
    -- but this is what made sense to me when I was planning. 
    -- Hopefully it makes it clearer how the thought process works to leave it in there
    -- as several people have already pointed out, the data structure of the DB prevents us from looking at customers with invoices that have no payments towards them.
*/
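To sanity-check the query above, here is a sketch that runs it in SQLite through Python's sqlite3 module. The comments are removed, but the query is otherwise unchanged; SQLite also accepts the [amount_due] bracket quoting. The column types and sample data are hypothetical. Payment 12 is a $5 payment from the same customer that was not applied to any invoice. The inner join to list drops it, so it does not reduce the amount due.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customerid INTEGER PRIMARY KEY, customernumber TEXT);
CREATE TABLE Invoices (invoiceid INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE InvoicePayments (invoicepaymentid INTEGER PRIMARY KEY,
                              invoiceid INTEGER, paymentid INTEGER);
CREATE TABLE Payments (paymentid INTEGER PRIMARY KEY, customerid INTEGER, amount REAL);
-- Hypothetical data: $18 + $12 applied to invoice 1; payment 12 ($5) is unapplied.
INSERT INTO Customers VALUES (1, '100');
INSERT INTO Invoices VALUES (1, 39.00);
INSERT INTO Payments VALUES (10, 1, 18.00), (11, 1, 12.00), (12, 1, 5.00);
INSERT INTO InvoicePayments VALUES (100, 1, 10), (101, 1, 11);
""")

# The answer's query with its comments stripped.
QUERY = """
SELECT list.invoiceid, cust.customernumber,
       MAX(list.inv_amount) AS invoice_amount,
       MAX(list.inv_amount) - SUM(list.pay_amount) AS [amount_due]
FROM Customers AS cust
INNER JOIN Payments AS pay ON pay.customerid = cust.customerid
INNER JOIN (
    SELECT inpay.paymentid AS paymentid, inv.invoiceid AS invoiceid,
           inv.amount AS inv_amount, pay.amount AS pay_amount
    FROM InvoicePayments AS inpay
    INNER JOIN Invoices AS inv ON inv.invoiceid = inpay.invoiceid
    INNER JOIN Payments AS pay ON pay.paymentid = inpay.paymentid
) AS list ON list.paymentid = pay.paymentid
GROUP BY cust.customernumber, list.invoiceid
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, '100', 39.0, 9.0)]
```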
