SQL Server Difference between PARTITION BY and GROUP BY

Question

I ve been using GROUP BY for all types of aggregate queries over the years   Recently  I ve been reverse-engineering some code that uses PARTITION BY to perform aggregations   In reading through all the documentation I can find about PARTITION BY  it sounds a lot like GROUP BY  maybe with a little extra functionality added in   Are they two versions of the same general functionality  or are they something different entirely

User · Answer

partition by doesn t actually roll up the data   It allows you to reset something on a per group basis   For example  you can get an ordinal column within a group by partitioning on the grouping field and using rownum   over the rows within that group   This gives you something that behaves a bit like an identity column that resets at the beginning of each group

User · Answer

They re used in different places   group by modifies the entire query  like   select customerId  count    as orderCount from Orders group by customerId   But partition by just works on a window function  like row number   select row number   over  partition by customerId order by orderId      as OrderNumberForThisCustomer from Orders   A group by normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row   partition by does not affect the number of rows returned  but it changes how a window function s result is calculated

User · Answer

As of my understanding Partition By is almost identical to Group By, but with the following differences:

That group by actually groups the result set returning one row per group, which results therefore in SQL Server only allowing in the SELECT list aggregate functions or columns that are part of the group by clause (in which case SQL Server can guarantee that there are unique results for each group).

Consider for example MySQL which allows to have in the SELECT list columns that are not defined in the Group By clause, in which case one row is still being returned per group, however if the column doesn't have unique results then there is no guarantee what will be the output!

But with Partition By, although the results of the function are identical to the results of an aggregate function with Group By, still you are getting the normal result set, which means that one is getting one row per underlying row, and not one row per group, and because of this one can have columns that are not unique per group in the SELECT list.

So as a summary, Group By would be best when needs an output of one row per group, and Partition By would be best when one needs all the rows but still wants the aggregate function based on a group.

Of course there might also be performance issues, see http://social.msdn.microsoft.com/Forums/ms-MY/transactsql/thread/0b20c2b5-1607-40bc-b7a7-0c60a2a55fba.

User · Answer

PARTITION BY is analytic  while GROUP BY is aggregate   In order to use PARTITION BY  you have to contain it with an OVER clause

User · Answer

It provides rolled-up data without rolling up

i.e. Suppose I want to return the relative position of sales region

Using PARTITION BY, I can return the sales amount for a given region and the MAX amount across all sales regions in the same row.

This does mean you will have repeating data, but it may suit the end consumer in the sense that data has been aggregated but no data has been lost - as would be the case with GROUP BY.

User · Answer

-- BELOW IS A SAMPLE WHICH OUTLINES THE SIMPLE DIFFERENCES -- READ IT AND THEN EXECUTE IT -- THERE ARE THREE ROWS OF EACH COLOR INSERTED INTO THE TABLE -- CREATE A database called testDB   -- use testDB USE  TestDB  GO   -- create Paints table CREATE TABLE  dbo   Paints        Color   varchar  50  NULL       glossLevel   varchar  50  NULL   ON  PRIMARY   GO   -- Populate Table insert into paints  color  glossLevel  select  red    eggshell  union select  red    glossy  union select  red    flat  union select  blue    eggshell  union select  blue    glossy  union select  blue    flat  union select  orange    glossy  union select  orange    flat  union select  orange    eggshell  union select  green    eggshell  union select  green    glossy  union select  green    flat  union select  black    eggshell  union select  black    glossy  union select  black    flat  union select  purple    eggshell  union select  purple    glossy  union select  purple    flat  union select  salmon    eggshell  union select  salmon    glossy  union select  salmon    flat         COMPARE  GROUP BY  color to  OVER  PARTITION BY Color        -- GROUP BY Color  -- row quantity defined by group by -- aggregate  count     defined by group by select count    from paints group by color  -- OVER  PARTITION BY    Color  -- row quantity defined by main query -- aggregate defined by OVER-PARTITION BY select color   glossLevel   count    OVER  Partition by color  from paints     COMPARE  GROUP BY  color  glossLevel to  OVER  PARTITION BY Color  GlossLevel        -- GROUP BY Color  GlossLevel -- row quantity defined by GROUP BY -- aggregate  count     defined by GROUP BY select count    from paints group by color  glossLevel    -- Partition by Color  GlossLevel -- row quantity defined by main query -- aggregate  count     defined by OVER-PARTITION BY select color   glossLevel   count    OVER  Partition by color  glossLevel  from paints

User · Answer

It has really different usage scenarios  When you use GROUP BY you merge some of the records for the columns that are same and you have an aggregation of the result set   However when you use PARTITION BY your result set is same but you just have an aggregation over the window functions and you don t merge the records  you will still have the same count of records   Here is a rally helpful article explaining the difference  http   alevryustemov com sql sql-partition-by

User · Answer

Suppose we have 14 records of name column in table  in group by  select name count    as totalcount from person where name  Please fill out  group BY name    it will give count in single row i e 14  but in partition by  select row number   over  partition by name  as total from person where name    Please fill out     it will 14 rows of increase in count

User · Answer

PARTITION BY    Divides the result set into partitions  The window function is applied to each partition separately and computation restarts for each partition    Found at this link  OVER Clause

User · Answer

When you use GROUP BY  the resulting rows will be usually less then incoming rows   But  when you use PARTITION BY  the resulting row count should be the same as incoming

User · Answer

Small observation  Automation mechanism to dynamically generate SQL using the  partition by  it is much simpler to implement in relation to the  group by   In the case of  group by   We must take care of the content of  select  column   Sorry for My English

User · Answer

We can take a simple example   Consider a table named TableA with the following values   id  firstname                   lastname                    Mark ------------------------------------------------------------------- 1   arun                        prasanth                    40 2   ann                         antony                      45 3   sruthy                      abc                         41 6   new                         abc                         47 1   arun                        prasanth                    45 1   arun                        prasanth                    49 2   ann                         antony                      49   GROUP BY      The SQL GROUP BY clause can be used in a SELECT statement to collect   data across multiple records and group the results by one or more   columns       In more simple words  GROUP BY statement is used in conjunction with   the aggregate functions to group the result-set by one or more   columns    Syntax   SELECT expression1  expression2      expression n          aggregate function  aggregate expression  FROM tables WHERE conditions GROUP BY expression1  expression2      expression n    We can apply GROUP BY in our table   select SUM Mark marksum firstname from TableA group by id firstName   Results   marksum  firstname ---------------- 94      ann                       134     arun                      47      new                       41      sruthy      In our real table we have 7 rows and when we apply GROUP BY id  the server group the results based on id   In simple words      here GROUP BY normally reduces the number of rows returned by rolling   them up and calculating Sum   for each row    PARTITION BY  Before going to PARTITION BY  let us look at the OVER clause   According to the MSDN definition      OVER clause defines a window or user-specified set of rows within a   query result set  A window function then computes a value for each row   in the window  You can use the OVER clause with functions to compute   aggregated values such as moving averages  cumulative aggregates    running totals  or a top N per group results    PARTITION BY will not reduce the number of rows returned   We can apply PARTITION BY in our example table   SELECT SUM Mark  OVER  PARTITION BY id  AS marksum  firstname FROM TableA   Result   marksum firstname  ------------------- 134     arun                      134     arun                      134     arun                      94      ann                       94      ann                       41      sruthy                    47      new     Look at the results - it will partition the rows and returns all rows  unlike GROUP BY

[sql-server] SQL Server: Difference between PARTITION BY and GROUP BY

The answer is

Examples related to sql-server

Examples related to tsql

Examples related to aggregate-functions

Examples related to window-functions

Tags