[sql] SUM OVER PARTITION BY

What am I missing?

This query is returning duplicate data over and over again. The count is correct for a complete total, but I am expecting one row, and yet I am getting the value repeated about 40 times. Any ideas?

SELECT BrandId
      ,SUM(ICount) OVER (PARTITION BY BrandId ) 
  FROM Table 
WHERE DateId  = 20130618

I get this?

BrandId ICount
2       421762
2       421762
2       421762
2       421762
2       421762
2       421762
2       421762
1       133346
1       133346
1       133346
1       133346
1       133346
1       133346
1       133346

What am I missing?

I cant remove the partition by as the entire query is like this:

SELECT BrandId
       ,SUM(ICount) OVER (PARTITION BY BrandId) 
       ,TotalICount= SUM(ICount) OVER ()    
        ,SUM(ICount) OVER () / SUM(ICount) OVER (PARTITION BY BrandId)  as Percentage
FROM Table 
WHERE DateId  = 20130618

Which returns this:

BrandId (No column name)    TotalICount Percentage
2       421762              32239892    76
2       421762              32239892    76
2       421762              32239892    76
2       421762              32239892    76
2       421762              32239892    76
2       421762              32239892    76

I would expect output something like this without having to use a distinct:

BrandId (No column name)    TotalICount Percentage
2       421762              32239892    76
9       1238442             32239892    26
10      1467473             32239892    21

This question is related to sql sql-server tsql

The answer is


remove partition by and add group by clause,

SELECT BrandId
      ,SUM(ICount) totalSum
  FROM Table 
WHERE DateId  = 20130618
GROUP BY BrandId

In my opinion, I think it's important to explain the why behind the need for a GROUP BY in your SQL when summing with OVER() clause and why you are getting repeated lines of data when you are expecting one row per BrandID.

Take this example: You need to aggregate the total sale price of each order line, per specific order category, between two dates, but you also need to retain individual order data in your final results. A SUM() on the SalesPrice column would not allow you to get the correct totals because it would require a GROUP BY, therefore squashing the details because you wouldn't be able to keep the individual order lines in the select statement.

Many times we see a #temp table, @table variable, or CTE filled with the sum of our data and grouped up so we can join to it again later to get a column of the sums we need. This can add processing time and extra lines of code. Instead, use OVER(PARTITION BY ()) like this:

SELECT
  OrderLine, 
  OrderDateTime, 
  SalePrice, 
  OrderCategory,
  SUM(SalePrice) OVER(PARTITION BY OrderCategory) AS SaleTotalPerCategory
FROM tblSales 
WHERE OrderDateTime BETWEEN @StartDate AND @EndDate

Notice we are not grouping and we have individual order lines column selected. The PARTITION BY in the last column will return us a sales price total for each row of data in each category. What the last column essentially says is, we want the sum of the sale price (SUM(SalePrice)) over a partition of my results and by a specified category (OVER(PARTITION BY CategoryHere)).

If we remove the other columns from our select statement, and leave our final SUM() column, like this:

SELECT
  SUM(SalePrice) OVER(PARTITION BY OrderCategory) AS SaleTotalPerCategory
FROM tblSales 
WHERE OrderDateTime BETWEEN @StartDate AND @EndDate

The results will still repeat this sum for each row in our original result set. The reason is this method does not require a GROUP BY. If you don't need to retain individual line data, then simply SUM() without the use of OVER() and group up your data appropriately. Again, if you need an additional column with specific totals, you can use the OVER(PARTITION BY ()) method described above without additional selects to join back to.

The above is purely for explaining WHY he is getting repeated lines of the same number and to help understand what this clause provides. This method can be used in many ways and I highly encourage further reading from the documentation here:

Over Clause


I think the query you want is this:

SELECT BrandId, SUM(ICount),
       SUM(sum(ICount)) over () as TotalCount,
       100.0 * SUM(ICount) / SUM(sum(Icount)) over () as Percentage
FROM Table 
WHERE DateId  = 20130618
group by BrandId;

This does the group by for brand. And it calculates the "Percentage". This version should produce a number between 0 and 100.


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to sql-server

Passing multiple values for same variable in stored procedure SQL permissions for roles Count the Number of Tables in a SQL Server Database Visual Studio 2017 does not have Business Intelligence Integration Services/Projects ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database How to create temp table using Create statement in SQL Server? SQL Query Where Date = Today Minus 7 Days How do I pass a list as a parameter in a stored procedure? SQL Server date format yyyymmdd

Examples related to tsql

Passing multiple values for same variable in stored procedure Count the Number of Tables in a SQL Server Database Change Date Format(DD/MM/YYYY) in SQL SELECT Statement Stored procedure with default parameters Format number as percent in MS SQL Server EXEC sp_executesql with multiple parameters SQL Server after update trigger How to compare datetime with only date in SQL Server Text was truncated or one or more characters had no match in the target code page including the primary key in an unpivot Printing integer variable and string on same line in SQL