Trouble using ROW_NUMBER() OVER (PARTITION BY ...)

Question

I'm using SQL Server 2008 R2. I have a table called EmployeeHistory with the following structure and sample data:

    EmployeeID Date      DepartmentID SupervisorID
    10001      20130101  001          10009
    10001      20130909  001          10019
    10001      20131201  002          10018
    10001      20140501  002          10017
    10001      20141001  001          10015
    10001      20141201  001          10014

Notice that employee 10001 has moved between two departments and had several supervisors over time. What I am trying to do is list the start and end dates of this employee's time in each department, ordered by the Date field. So the output should look like this:

    EmployeeID DateStart DateEnd  DepartmentID
    10001      20130101  20131201 001
    10001      20131201  20141001 002
    10001      20141001  NULL     001

I intended to partition the data using the following query, but it failed:

    SELECT *, ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY [Date]) RN
    FROM EmployeeHistory

The department changes from 001 to 002 and then back to 001, so obviously I cannot partition by DepartmentID.

I'm sure I'm overlooking the obvious. Any help? Thank you in advance.
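To make the failure concrete, here is a minimal Python sketch (not T-SQL) that mimics what ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY Date) would assign to the sample rows. The function name and the tuple layout are assumptions made for illustration only.

```python
from collections import defaultdict

# Sample rows from the question: (EmployeeID, Date, DepartmentID)
rows = [
    (10001, "20130101", "001"),
    (10001, "20130909", "001"),
    (10001, "20131201", "002"),
    (10001, "20140501", "002"),
    (10001, "20141001", "001"),
    (10001, "20141201", "001"),
]

def row_number_by_employee_and_department(rows):
    """Number rows within each (EmployeeID, DepartmentID) partition, ordered by Date."""
    counters = defaultdict(int)
    numbered = []
    # Sorting by partition key, then by Date, mirrors PARTITION BY ... ORDER BY.
    for emp, date, dept in sorted(rows, key=lambda r: (r[0], r[2], r[1])):
        counters[(emp, dept)] += 1
        numbered.append((emp, date, dept, counters[(emp, dept)]))
    return numbered

numbered = row_number_by_employee_and_department(rows)
for row in numbered:
    print(row)
```

Department 001 is numbered 1 through 4 across both of its stints, so RN by itself cannot separate the first stay in 001 from the second.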

User · Accepted Answer

A bit involved. Easiest would be to refer to this SQL Fiddle I created for you that produces the exact result. There are ways you can improve it for performance or other considerations, but this should hopefully at least be clearer than some alternatives.

The gist is: you get a canonical ranking of your data first, then use that to segment the data into groups, then find an end date for each group, then eliminate any intermediate rows. ROW_NUMBER() and CROSS APPLY help a lot in doing it readably.

EDIT 2019: The SQL Fiddle does in fact seem to be broken, for some reason, but it appears to be a problem on the SQL Fiddle site. Here's a complete version, tested just now on SQL Server 2016:

    CREATE TABLE Source
    (
      EmployeeID int,
      DateStarted date,
      DepartmentID int
    )

    INSERT INTO Source
    VALUES
    (10001,'2013-01-01',001),
    (10001,'2013-09-09',001),
    (10001,'2013-12-01',002),
    (10001,'2014-05-01',002),
    (10001,'2014-10-01',001),
    (10001,'2014-12-01',001)

    -- Step 1: rank every row per employee and give each row its own group key
    SELECT *,
      ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS EntryRank,
      newid() as GroupKey,
      CAST(NULL AS date) AS EndDate
    INTO #RankedData
    FROM Source
    ;

    -- Step 2: copy the group key of the earliest row in the same unbroken department run
    UPDATE #RankedData
    SET GroupKey = beginDate.GroupKey
    FROM #RankedData sup
      CROSS APPLY
      (
        SELECT TOP 1 GroupKey
        FROM #RankedData sub
        WHERE sub.EmployeeID = sup.EmployeeID AND
          sub.DepartmentID = sup.DepartmentID AND
          NOT EXISTS
            (
              SELECT *
              FROM #RankedData bot
              WHERE bot.EmployeeID = sup.EmployeeID AND
                bot.EntryRank BETWEEN sub.EntryRank AND sup.EntryRank AND
                bot.DepartmentID <> sup.DepartmentID
            )
          ORDER BY DateStarted ASC
        ) beginDate (GroupKey);

    -- Step 3: the end date is the start date of the next row in a different department
    UPDATE #RankedData
    SET EndDate = nextGroup.DateStarted
    FROM #RankedData sup
      CROSS APPLY
      (
        SELECT TOP 1 DateStarted
        FROM #RankedData sub
        WHERE sub.EmployeeID = sup.EmployeeID AND
          sub.DepartmentID <> sup.DepartmentID AND
          sub.EntryRank > sup.EntryRank
        ORDER BY EntryRank ASC
      ) nextGroup (DateStarted);

    -- Step 4: keep only the first row of each group
    SELECT * FROM
    (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY GroupKey ORDER BY EntryRank ASC) AS GroupRank
      FROM #RankedData
    ) FinalRanking
    WHERE GroupRank = 1
    ORDER BY EntryRank;

    DROP TABLE #RankedData
    DROP TABLE Source
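The four steps in the gist (rank, segment into groups, find each group's end date, drop intermediate rows) can be modelled outside the database. The following is a rough Python sketch under stated assumptions, not a translation of the T-SQL above: the function name `department_stints` and the tuple layout are invented for illustration, and dates are ISO strings so that plain string ordering matches date ordering.

```python
from itertools import groupby

# (EmployeeID, DateStarted, DepartmentID), same values as the Source table
rows = [
    (10001, "2013-01-01", 1),
    (10001, "2013-09-09", 1),
    (10001, "2013-12-01", 2),
    (10001, "2014-05-01", 2),
    (10001, "2014-10-01", 1),
    (10001, "2014-12-01", 1),
]

def department_stints(rows):
    """Return (EmployeeID, DateStart, DateEnd, DepartmentID) per unbroken stint."""
    result = []
    # Rank: order rows by employee, then by start date.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for emp, emp_rows in groupby(ordered, key=lambda r: r[0]):
        # Segment: consecutive rows with the same department form one group.
        # Eliminate intermediate rows: keep only the first start date per group.
        stints = [
            (dept, list(group)[0][1])
            for dept, group in groupby(emp_rows, key=lambda r: r[2])
        ]
        # End date: the start date of the following group, or None for the last one.
        for i, (dept, start) in enumerate(stints):
            end = stints[i + 1][1] if i + 1 < len(stints) else None
            result.append((emp, start, end, dept))
    return result

for stint in department_stints(rows):
    print(stint)
```

Because `groupby` only merges adjacent equal keys, the return to department 1 in October 2014 starts a new group rather than joining the first one, which is the behaviour the question asks for.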

User · Answer

I would do something like this:

    WITH x
         AS (SELECT *,
                    Row_number()
                      OVER(
                        partition BY employeeid
                        ORDER BY datestart) rn
             FROM   employeehistory)
    SELECT *
    FROM   x x1
       LEFT OUTER JOIN x x2
                    ON x1.rn = x2.rn + 1

Or maybe it would be x2.rn - 1. You'll have to see. In any case, you get the idea. Once you have the table joined on itself, you can filter, group, sort, etc. to get what you need.
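As a rough illustration of what the self-join makes possible, the Python sketch below pairs each row with the row whose rn is one lower and keeps only the rows where the department differs from that neighbour. It is an assumption-laden model rather than the answer's query: it also matches on the employee (which the SQL sketch above leaves out), and the name `stint_starts` is hypothetical.

```python
# Sample rows from the question: (EmployeeID, Date, DepartmentID)
rows = [
    (10001, "20130101", "001"),
    (10001, "20130909", "001"),
    (10001, "20131201", "002"),
    (10001, "20140501", "002"),
    (10001, "20141001", "001"),
    (10001, "20141201", "001"),
]

def stint_starts(rows):
    """Keep rows whose previous row (rn - 1, same employee) has a different department."""
    numbered = {}
    rn = 0
    prev_emp = None
    # Assign rn per employee in date order, like ROW_NUMBER() PARTITION BY employee.
    for emp, date, dept in sorted(rows, key=lambda r: (r[0], r[1])):
        rn = rn + 1 if emp == prev_emp else 1
        prev_emp = emp
        numbered[(emp, rn)] = (date, dept)

    starts = []
    for (emp, rn), (date, dept) in numbered.items():
        # LEFT OUTER JOIN partner: None when there is no earlier row.
        previous = numbered.get((emp, rn - 1))
        if previous is None or previous[1] != dept:
            starts.append((emp, date, dept))
    return starts

print(stint_starts(rows))
```

The surviving rows are the first row of each stint; the end date of a stint is then simply the date of the next surviving row for the same employee.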

User · Answer

It looks like a common gaps-and-islands problem. The difference between two sequences of row numbers, rn1 and rn2, gives the "group" number.

Run this query CTE-by-CTE and examine intermediate results to see how it works.

Sample data

I expanded the sample data from the question a little.

    DECLARE @Source TABLE
    (
        EmployeeID int,
        DateStarted date,
        DepartmentID int
    )

    INSERT INTO @Source
    VALUES
    (10001,'2013-01-01',001),
    (10001,'2013-09-09',001),
    (10001,'2013-12-01',002),
    (10001,'2014-05-01',002),
    (10001,'2014-10-01',001),
    (10001,'2014-12-01',001),

    (10005,'2013-05-01',001),
    (10005,'2013-11-09',001),
    (10005,'2013-12-01',002),
    (10005,'2014-10-01',001),
    (10005,'2016-12-01',001);

Query for SQL Server 2008

There is no LEAD function in SQL Server 2008, so I had to use a self-join via OUTER APPLY to get the value of the "next" row for the DateEnd.

    WITH
    CTE
    AS
    (
        SELECT
            EmployeeID
            ,DateStarted
            ,DepartmentID
            ,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1
            ,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS rn2
        FROM @Source
    )
    ,CTE_Groups
    AS
    (
        SELECT
            EmployeeID
            ,MIN(DateStarted) AS DateStart
            ,DepartmentID
        FROM CTE
        GROUP BY
            EmployeeID
            ,DepartmentID
            ,rn1 - rn2
    )
    SELECT
        CTE_Groups.EmployeeID
        ,CTE_Groups.DepartmentID
        ,CTE_Groups.DateStart
        ,A.DateEnd
    FROM
        CTE_Groups
        OUTER APPLY
        (
            SELECT TOP(1) G2.DateStart AS DateEnd
            FROM CTE_Groups AS G2
            WHERE
                G2.EmployeeID = CTE_Groups.EmployeeID
                AND G2.DateStart > CTE_Groups.DateStart
            ORDER BY G2.DateStart
        ) AS A
    ORDER BY
        EmployeeID
        ,DateStart
    ;

Query for SQL Server 2012+

Starting with SQL Server 2012 there is a LEAD function that makes this task more efficient.

    WITH
    CTE
    AS
    (
        SELECT
            EmployeeID
            ,DateStarted
            ,DepartmentID
            ,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1
            ,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS rn2
        FROM @Source
    )
    ,CTE_Groups
    AS
    (
        SELECT
            EmployeeID
            ,MIN(DateStarted) AS DateStart
            ,DepartmentID
        FROM CTE
        GROUP BY
            EmployeeID
            ,DepartmentID
            ,rn1 - rn2
    )
    SELECT
        CTE_Groups.EmployeeID
        ,CTE_Groups.DepartmentID
        ,CTE_Groups.DateStart
        ,LEAD(CTE_Groups.DateStart) OVER (PARTITION BY CTE_Groups.EmployeeID ORDER BY CTE_Groups.DateStart) AS DateEnd
    FROM
        CTE_Groups
    ORDER BY
        EmployeeID
        ,DateStart
    ;

Result

    +------------+--------------+------------+------------+
    | EmployeeID | DepartmentID | DateStart  | DateEnd    |
    +------------+--------------+------------+------------+
    |      10001 |            1 | 2013-01-01 | 2013-12-01 |
    |      10001 |            2 | 2013-12-01 | 2014-10-01 |
    |      10001 |            1 | 2014-10-01 | NULL       |
    |      10005 |            1 | 2013-05-01 | 2013-12-01 |
    |      10005 |            2 | 2013-12-01 | 2014-10-01 |
    |      10005 |            1 | 2014-10-01 | NULL       |
    +------------+--------------+------------+------------+
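For readers who cannot run the CTEs step by step, the small Python sketch below reproduces the rn1, rn2 and rn1 - rn2 columns for employee 10001 and then groups on them. It is only a model of the idea; the names `island_keys` and `island_starts` are hypothetical and not part of the queries above.

```python
from collections import defaultdict

# (EmployeeID, DateStarted, DepartmentID) for employee 10001
rows = [
    (10001, "2013-01-01", 1),
    (10001, "2013-09-09", 1),
    (10001, "2013-12-01", 2),
    (10001, "2014-05-01", 2),
    (10001, "2014-10-01", 1),
    (10001, "2014-12-01", 1),
]

def island_keys(rows):
    """Attach rn1 (per employee), rn2 (per employee and department) and rn1 - rn2."""
    per_emp = defaultdict(int)
    per_emp_dept = defaultdict(int)
    keyed = []
    for emp, date, dept in sorted(rows, key=lambda r: (r[0], r[1])):
        per_emp[emp] += 1
        per_emp_dept[(emp, dept)] += 1
        rn1, rn2 = per_emp[emp], per_emp_dept[(emp, dept)]
        keyed.append((emp, date, dept, rn1, rn2, rn1 - rn2))
    return keyed

def island_starts(keyed):
    """MIN(DateStarted) grouped by (EmployeeID, DepartmentID, rn1 - rn2)."""
    starts = {}
    for emp, date, dept, _rn1, _rn2, diff in keyed:
        key = (emp, dept, diff)
        starts[key] = min(starts.get(key, date), date)
    return starts

keyed = island_keys(rows)
for row in keyed:
    print(row)
print(island_starts(keyed))
```

The difference is 0 for the first stay in department 1 and 2 for every later row, so the second stay in department 1 and the stay in department 2 share the same difference. That is why DepartmentID has to be part of the GROUP BY alongside rn1 - rn2.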
