[sql-server] SQL Server: Difference between PARTITION BY and GROUP BY

We can take a simple example.

Consider a table named TableA with the following values:

id  firstname                   lastname                    Mark
-------------------------------------------------------------------
1   arun                        prasanth                    40
2   ann                         antony                      45
3   sruthy                      abc                         41
6   new                         abc                         47
1   arun                        prasanth                    45
1   arun                        prasanth                    49
2   ann                         antony                      49

GROUP BY

The SQL GROUP BY clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns.

In more simple words GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.

Syntax:

SELECT expression1, expression2, ... expression_n, 
       aggregate_function (aggregate_expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n;

We can apply GROUP BY in our table:

select SUM(Mark)marksum,firstname from TableA
group by id,firstName

Results:

marksum  firstname
----------------
94      ann                      
134     arun                     
47      new                      
41      sruthy   

In our real table we have 7 rows and when we apply GROUP BY id, the server group the results based on id:

In simple words:

here GROUP BY normally reduces the number of rows returned by rolling them up and calculating Sum() for each row.

PARTITION BY

Before going to PARTITION BY, let us look at the OVER clause:

According to the MSDN definition:

OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.

PARTITION BY will not reduce the number of rows returned.

We can apply PARTITION BY in our example table:

SELECT SUM(Mark) OVER (PARTITION BY id) AS marksum, firstname FROM TableA

Result:

marksum firstname 
-------------------
134     arun                     
134     arun                     
134     arun                     
94      ann                      
94      ann                      
41      sruthy                   
47      new  

Look at the results - it will partition the rows and returns all rows, unlike GROUP BY.

Examples related to sql-server

Passing multiple values for same variable in stored procedure SQL permissions for roles Count the Number of Tables in a SQL Server Database Visual Studio 2017 does not have Business Intelligence Integration Services/Projects ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database How to create temp table using Create statement in SQL Server? SQL Query Where Date = Today Minus 7 Days How do I pass a list as a parameter in a stored procedure? SQL Server date format yyyymmdd

Examples related to tsql

Passing multiple values for same variable in stored procedure Count the Number of Tables in a SQL Server Database Change Date Format(DD/MM/YYYY) in SQL SELECT Statement Stored procedure with default parameters Format number as percent in MS SQL Server EXEC sp_executesql with multiple parameters SQL Server after update trigger How to compare datetime with only date in SQL Server Text was truncated or one or more characters had no match in the target code page including the primary key in an unpivot Printing integer variable and string on same line in SQL

Examples related to aggregate-functions

Spark SQL: apply aggregate functions to a list of columns GROUP BY without aggregate function GROUP BY + CASE statement must appear in the GROUP BY clause or be used in an aggregate function Naming returned columns in Pandas aggregate function? Concatenate multiple result rows of one column into one, group by another column How to include "zero" / "0" results in COUNT aggregate? Apply multiple functions to multiple groupby columns Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause Optimal way to concatenate/aggregate strings

Examples related to window-functions

What is ROWS UNBOUNDED PRECEDING used for in Teradata? Pandas get topmost n records within each group Select Row number in postgres What's the difference between RANK() and DENSE_RANK() functions in oracle? PostgreSQL unnest() with element number SQL Server: Difference between PARTITION BY and GROUP BY OVER clause in Oracle Oracle "Partition By" Keyword