The SQL OVER clause - when and why is it useful

Question

USE AdventureWorks2008R2  GO SELECT SalesOrderID  ProductID  OrderQty      SUM OrderQty  OVER PARTITION BY SalesOrderID  AS  Total       AVG OrderQty  OVER PARTITION BY SalesOrderID  AS  Avg       COUNT OrderQty  OVER PARTITION BY SalesOrderID  AS  Count       MIN OrderQty  OVER PARTITION BY SalesOrderID  AS  Min       MAX OrderQty  OVER PARTITION BY SalesOrderID  AS  Max  FROM Sales SalesOrderDetail  WHERE SalesOrderID IN 43659 43664     I read about that clause and I don t understand why I need it  What does the function Over do  What does Partitioning By do  Why can t I make a query with writing Group By SalesOrderID

User · Answer

prkey   whatsthat               cash    890     abb                    32  32 43      abbz                   2   34 4       bttu                   1   35 45      gasstuff               2   37 545     gasz                   5   42 80009   hoo                    9   51 2321    ibm                    1   52 998     krk                    2   54 42      kx-5010                2   56 32      lto                    4   60 543     mp                     5   65 465     multipower             2   67 455     O N                    1   68 7887    prem                   7   75 434     puma                   3   78 23      retractble             3   81 242     Trujillo s stuff       4   85   That s a result of query  Table used as source is the same exept that it has no last column  This column is a moving sum of third one   Query   SELECT prkey whatsthat cash SUM cash  over  order by whatsthat      FROM public iuk order by whatsthat prkey          table goes as public iuk   sql version   2012   It s a little over dbase 1986  level  I don t know why 25  years has been needed to finish it up

User · Answer

You can use GROUP BY SalesOrderID  The difference is  with GROUP BY you can only have the aggregated values for the columns that are not included in GROUP BY   In contrast  using windowed aggregate functions instead of GROUP BY  you can retrieve both aggregated and non-aggregated values  That is  although you are not doing that in your example query  you could retrieve both individual OrderQty values and their sums  counts  averages etc  over groups of same SalesOrderIDs   Here s a practical example of why windowed aggregates are great  Suppose you need to calculate what percent of a total every value is  Without windowed aggregates you d have to first derive a list of aggregated values and then join it back to the original rowset  i e  like this   SELECT   orig  Partition     orig Value    orig Value   100 0   agg TotalValue AS ValuePercent FROM OriginalRowset orig   INNER JOIN       SELECT        Partition         SUM Value  AS TotalValue     FROM OriginalRowset     GROUP BY  Partition      agg ON orig  Partition    agg  Partition    Now look how you can do the same with a windowed aggregate   SELECT    Partition     Value    Value   100 0   SUM Value  OVER  PARTITION BY  Partition   AS ValuePercent FROM OriginalRowset orig   Much easier and cleaner  isn t it

User · Answer

So in simple words  Over clause can be used to select non aggregated values along with Aggregated ones  Partition BY  ORDER BY inside  and ROWS or RANGE are part of OVER   by clause  partition by is used to partition data and then perform these window  aggregated functions  and if we don t have partition by the then entire result set is considered as a single partition  OVER clause can be used with Ranking Functions Rank  Row Number  Dense Rank     Aggregate Functions like  AVG  Max  Min  SUM   etc  and Analytics Functions like  First Value  Last Value  and few others   Let s See basic syntax of OVER clause OVER                lt PARTITION BY clause gt                lt ORDER BY clause gt                 lt ROW or RANGE clause gt                 PARTITION BY  It is used to partition data and perform operations on groups with the same data  ORDER BY  It is used to define the logical order of data in Partitions  When we don t specify Partition  entire resultset is considered as a single partition   This can be used to specify what rows are supposed to be considered in a partition when performing the operation  Let s take an example  Here is my dataset  Id          Name                                               Gender     Salary ----------- -------------------------------------------------- ---------- ----------- 1           Mark                                               Male       5000 2           John                                               Male       4500 3           Pavan                                              Male       5000 4           Pam                                                Female     5500 5           Sara                                               Female     4000 6           Aradhya                                            Female     3500 7           Tom                                                Male       5500 8           Mary                                               Female     5000 9           Ben                                                Male       6500 10          Jodi                                               Female     7000 11          Tom                                                Male       5500 12          Ron                                                Male       5000  So let me execute different scenarios and see how data is impacted and I ll come from difficult syntax to simple one Select   SUM salary  Over order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  as sum sal from employees  Id          Name                                               Gender     Salary      sum sal ----------- -------------------------------------------------- ---------- ----------- ----------- 6           Aradhya                                            Female     3500        3500 5           Sara                                               Female     4000        7500 2           John                                               Male       4500        12000 3           Pavan                                              Male       5000        32000 1           Mark                                               Male       5000        32000 8           Mary                                               Female     5000        32000 12          Ron                                                Male       5000        32000 11          Tom                                                Male       5500        48500 7           Tom                                                Male       5500        48500 4           Pam                                                Female     5500        48500 9           Ben                                                Male       6500        55000 10          Jodi                                               Female     7000        62000  Just observe the sum sal part  Here I am using order by Salary and using  quot RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW quot   In this case  we are not using partition so entire data will be treated as one partition and we are ordering on salary  And the important thing here is UNBOUNDED PRECEDING AND CURRENT ROW  This means when we are calculating the sum  from starting row to the current row for each row  But if we see rows with salary 5000 and name  quot Pavan quot   ideally it should be 17000 and for salary 5000 and name Mark  it should be 22000  But as we are using RANGE and in this case  if it finds any similar elements then it considers them as the same logical group and performs an operation on them and assigns value to each item in that group  That is the reason why we have the same value for salary 5000  The engine went up to salary 5000 and Name Ron and calculated sum and then assigned it to all salary 5000  Select   SUM salary  Over order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  as sum sal from employees      Id          Name                                               Gender     Salary      sum sal ----------- -------------------------------------------------- ---------- ----------- ----------- 6           Aradhya                                            Female     3500        3500 5           Sara                                               Female     4000        7500 2           John                                               Male       4500        12000 3           Pavan                                              Male       5000        17000 1           Mark                                               Male       5000        22000 8           Mary                                               Female     5000        27000 12          Ron                                                Male       5000        32000 11          Tom                                                Male       5500        37500 7           Tom                                                Male       5500        43000 4           Pam                                                Female     5500        48500 9           Ben                                                Male       6500        55000 10          Jodi                                               Female     7000        62000  So with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW The difference is for same value items instead of grouping them together  It calculates SUM from starting row to current row and it doesn t treat items with same value differently like RANGE Select   SUM salary  Over order by salary  as sum sal from employees  Id          Name                                               Gender     Salary      sum sal ----------- -------------------------------------------------- ---------- ----------- ----------- 6           Aradhya                                            Female     3500        3500 5           Sara                                               Female     4000        7500 2           John                                               Male       4500        12000 3           Pavan                                              Male       5000        32000 1           Mark                                               Male       5000        32000 8           Mary                                               Female     5000        32000 12          Ron                                                Male       5000        32000 11          Tom                                                Male       5500        48500 7           Tom                                                Male       5500        48500 4           Pam                                                Female     5500        48500 9           Ben                                                Male       6500        55000 10          Jodi                                               Female     7000        62000  These results are the same as Select    SUM salary  Over order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  as sum sal from employees  That is because Over order by salary  is just a short cut of Over order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  So wherever we simply specify Order by without ROWS or RANGE it is taking RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as default  Note  This is applicable only to Functions that actually accept RANGE ROW  For example  ROW NUMBER and few others don t accept RANGE ROW and in that case  this doesn t come into the picture  Till now we saw that Over clause with an order by is taking Range ROWS and syntax looks something like this RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW And it is actually calculating up to the current row from the first row  But what If it wants to calculate values for the entire partition of data and have it for each column  that is from 1st row to last row   Here is the query for that Select   sum salary  Over order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING  as sum sal from employees  Id          Name                                               Gender     Salary      sum sal ----------- -------------------------------------------------- ---------- ----------- ----------- 1           Mark                                               Male       5000        62000 2           John                                               Male       4500        62000 3           Pavan                                              Male       5000        62000 4           Pam                                                Female     5500        62000 5           Sara                                               Female     4000        62000 6           Aradhya                                            Female     3500        62000 7           Tom                                                Male       5500        62000 8           Mary                                               Female     5000        62000 9           Ben                                                Male       6500        62000 10          Jodi                                               Female     7000        62000 11          Tom                                                Male       5500        62000 12          Ron                                                Male       5000        62000  Instead of CURRENT ROW  I am specifying UNBOUNDED FOLLOWING which instructs the engine to calculate till the last record of partition for each row  Now coming to your point on what is OVER   with empty braces  It is just a short cut for Over order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING  Here we are indirectly specifying to treat all my resultset as a single partition and then perform calculations from the first record to the last record of each partition  Select   Sum salary  Over   as sum sal from employees  Id          Name                                               Gender     Salary      sum sal ----------- -------------------------------------------------- ---------- ----------- ----------- 1           Mark                                               Male       5000        62000 2           John                                               Male       4500        62000 3           Pavan                                              Male       5000        62000 4           Pam                                                Female     5500        62000 5           Sara                                               Female     4000        62000 6           Aradhya                                            Female     3500        62000 7           Tom                                                Male       5500        62000 8           Mary                                               Female     5000        62000 9           Ben                                                Male       6500        62000 10          Jodi                                               Female     7000        62000 11          Tom                                                Male       5500        62000 12          Ron                                                Male       5000        62000  I did create a video on this and if you are interested you can visit it  https   www youtube com watch v CvVenuVUqto amp t 1177s Thanks  Pavan Kumar Aryasomayajulu HTTP   xyzcoder github io

User · Answer

The OVER clause is powerful in that you can have aggregates over different ranges   windowing    whether you use a GROUP BY or not  Example  get count per SalesOrderID and count of all  SELECT     SalesOrderID  ProductID  OrderQty      COUNT OrderQty  AS  Count       COUNT    OVER    AS  CountAll  FROM Sales SalesOrderDetail  WHERE      SalesOrderID IN 43659 43664  GROUP BY      SalesOrderID  ProductID  OrderQty   Get different COUNTs  no GROUP BY  SELECT     SalesOrderID  ProductID  OrderQty      COUNT OrderQty  OVER PARTITION BY SalesOrderID  AS  CountQtyPerOrder       COUNT OrderQty  OVER PARTITION BY ProductID  AS  CountQtyPerProduct        COUNT    OVER    AS  CountAllAgain  FROM Sales SalesOrderDetail  WHERE      SalesOrderID IN 43659 43664

User · Answer

Let me explain with an example and you would be able to see how it works   Assuming you have the following table DIM EQUIPMENT   VIN         MAKE    MODEL   YEAR    COLOR ----------------------------------------- 1234ASDF    Ford    Taurus  2008    White 1234JKLM    Chevy   Truck   2005    Green 5678ASDF    Ford    Mustang 2008    Yellow   Run below SQL  SELECT VIN    MAKE    MODEL    YEAR    COLOR     COUNT    OVER  PARTITION BY YEAR  AS COUNT2 FROM DIM EQUIPMENT   The result would be as below  VIN         MAKE    MODEL   YEAR    COLOR     COUNT2  ----------------------------------------------   1234JKLM    Chevy   Truck   2005    Green     1 5678ASDF    Ford    Mustang 2008    Yellow    2 1234ASDF    Ford    Taurus  2008    White     2   See what happened   You are able to count without Group By on YEAR and Match with ROW   Another Interesting WAY to get same result if as below using WITH Clause  WITH works as in-line VIEW and can simplify the query especially complex ones  which is not the case here though since I am just trying to show usage   WITH EQ AS     SELECT YEAR AS YEAR2  COUNT    AS COUNT2 FROM DIM EQUIPMENT GROUP BY YEAR     SELECT VIN    MAKE    MODEL    YEAR    COLOR    COUNT2 FROM DIM EQUIPMENT    EQ WHERE EQ YEAR2 DIM EQUIPMENT YEAR

User · Answer

If you only wanted to GROUP BY the SalesOrderID then you wouldn t be able to include the ProductID and OrderQty columns in the SELECT clause   The PARTITION BY clause let s you break up your aggregate functions  One obvious and useful example would be if you wanted to generate line numbers for order lines on an order   SELECT     O order id      O order date      ROW NUMBER   OVER PARTITION BY O order id  AS line item no      OL product id FROM     Orders O INNER JOIN Order Lines OL ON OL order id   O order id    My syntax might be off slightly   You would then get back something like   order id    order date    line item no    product id --------    ----------    ------------    ----------     1       2011-05-02         1              5     1       2011-05-02         2              4     1       2011-05-02         3              7     2       2011-05-12         1              8     2       2011-05-12         2              1

User · Answer

The OVER clause when combined with PARTITION BY state that the preceding function call must be done analytically by evaluating the returned rows of the query   Think of it as an inline GROUP BY statement   OVER  PARTITION BY SalesOrderID  is stating that for SUM  AVG  etc    function  return the value OVER a subset of the returned records from the query  and PARTITION that subset BY the foreign key SalesOrderID   So we will SUM every OrderQty record for EACH UNIQUE SalesOrderID  and that column name will be called  Total    It is a MUCH more efficient means than using multiple inline views to find out the same information   You can put this query within an inline view and filter on Total then   SELECT      FROM  your query  inlineview WHERE Total  lt  200

User · Answer

Also Called Query Petition Clause  Similar to the Group By Clause   break up data into chunks  or partitions  separate by partition bounds function performs within partitions re-initialised when crossing parting boundary    Syntax  function       OVER  PARTITION BY col1 col3        Functions   Familiar functions such as COUNT    SUM    MIN    MAX    etc New Functions as well  eg ROW NUMBER    RATION TO REOIRT    etc       More info with example   http   msdn microsoft com en-us library ms189461 aspx

[mysql] The SQL OVER() clause - when and why is it useful?

Examples related to mysql

Examples related to sql

Examples related to sql-server

Examples related to aggregate-functions

Examples related to clause