Function to Calculate Median in SQL Server

Question

According to MSDN  Median is not available as an aggregate function in Transact-SQL  However  I would like to find out whether it is possible to create this functionality  using the Create Aggregate function  user defined function  or some other method     What would be the best way  if possible  to do this - allow for the calculation of a median value  assuming a numeric data type  in an aggregate query

User · Answer

Even better   SELECT  Median   AVG 1 0   val  FROM       SELECT o val  rn   ROW NUMBER   OVER  ORDER BY o val   c c     FROM dbo EvenRows AS o     CROSS JOIN  SELECT c   COUNT    FROM dbo EvenRows  AS c   AS x WHERE rn IN   c   1  2   c   2  2     From the master Himself  Itzik Ben-Gan

User · Answer

In a UDF  write    Select Top 1 medianSortColumn from Table T   Where  Select Count    from Table          Where MedianSortColumn  lt              Select Count    From Table    2    Order By medianSortColumn

User · Answer

I just came across this page while looking for a set based solution to median  After looking at some of the solutions here  I came up with the following  Hope is helps works   DECLARE  test TABLE      i int identity 1 1       id int      score float    INSERT INTO  test  id score  VALUES  1 10  INSERT INTO  test  id score  VALUES  1 11  INSERT INTO  test  id score  VALUES  1 15  INSERT INTO  test  id score  VALUES  1 19  INSERT INTO  test  id score  VALUES  1 20   INSERT INTO  test  id score  VALUES  2 20  INSERT INTO  test  id score  VALUES  2 21  INSERT INTO  test  id score  VALUES  2 25  INSERT INTO  test  id score  VALUES  2 29  INSERT INTO  test  id score  VALUES  2 30   INSERT INTO  test  id score  VALUES  3 20  INSERT INTO  test  id score  VALUES  3 21  INSERT INTO  test  id score  VALUES  3 25  INSERT INTO  test  id score  VALUES  3 29   DECLARE  counts TABLE      id int      cnt int    INSERT INTO  counts       id      cnt   SELECT     id      COUNT    FROM      test GROUP BY     id  SELECT     drv id      drv start      AVG t score  FROM               SELECT             MIN t i -1 AS start              t id         FROM              test t         GROUP BY             t id       drv     INNER JOIN  test t ON drv id   t id     INNER JOIN  counts c ON t id   c id WHERE     t i     c cnt 1  2  drv start     OR           t i      c cnt 1  2      c cnt 2  2   drv start         AND   c cnt 1  2      c cnt 2  2   lt  gt  0       GROUP BY     drv id      drv start

User · Answer

Frequently  we may need to calculate Median not just for the whole table  but for aggregates with respect to some ID  In other words  calculate median for each ID in our table  where each ID has many records   based on the solution edited by  gdoron  good performance and works in many SQL   SELECT our id  AVG 1 0   our val  as Median FROM   SELECT our id  our val     COUNT    OVER  PARTITION BY our id  AS cnt    ROW NUMBER   OVER  PARTITION BY our id ORDER BY our val  AS rnk   FROM our table   AS x WHERE rnk IN   cnt   1  2   cnt   2  2  GROUP BY our id    Hope it helps

User · Answer

If you want to use the Create Aggregate function in SQL Server  this is how to do it   Doing it this way has the benefit of being able to write clean queries   Note this this process could be adapted to calculate a Percentile value fairly easily   Create a new Visual Studio project and set the target framework to  NET 3 5  this is for SQL 2008  it may be different in SQL 2012    Then create a class file and put in the following code  or c  equivalent   Imports Microsoft SqlServer Server Imports System Data SqlTypes Imports System IO   lt Serializable gt   lt SqlUserDefinedAggregate Format UserDefined  IsInvariantToNulls  True  IsInvariantToDuplicates  False      IsInvariantToOrder  True  MaxByteSize  -1  IsNullIfEmpty  True  gt  Public Class Median   Implements IBinarySerialize   Private  items As List Of Decimal     Public Sub Init        items   New List Of Decimal      End Sub    Public Sub Accumulate value As SqlDecimal      If Not value IsNull Then        items Add value Value      End If   End Sub    Public Sub Merge other As Median      If other  items IsNot Nothing Then        items AddRange other  items      End If   End Sub    Public Function Terminate   As SqlDecimal     If  items Count  lt  gt  0 Then       Dim result As Decimal        items    items OrderBy Function i  i  ToList         If  items Count Mod 2   0 Then         result      items   items Count   2  - 1       items  items Count   2      2        Else         result    items   items Count - 1    2        End If        Return New SqlDecimal result      Else       Return New SqlDecimal       End If   End Function    Public Sub Read r As BinaryReader  Implements IBinarySerialize Read      deserialize it from a string     Dim list   r ReadString        items   New List Of Decimal       For Each value In list Split    c        Dim number As Decimal       If Decimal TryParse value  number  Then          items Add number        End If     Next    End Sub    Public Sub Write w As BinaryWriter  Implements IBinarySerialize Write      serialize the list to a string     Dim list           For Each item In  items       If list  lt  gt     Then         list              End If             list    item ToString       Next     w Write list    End Sub End Class   Then compile it and copy the DLL and PDB file to your SQL Server machine and run the following command in SQL Server   CREATE ASSEMBLY CustomAggregate FROM   path to your DLL   WITH PERMISSION SET SAFE  GO  CREATE AGGREGATE Median  value decimal 9  3   RETURNS decimal 9  3   EXTERNAL NAME  CustomAggregate    namespace of your DLL  Median   GO   You can then write a query to calculate the median like this      SELECT dbo Median Field  FROM Table

User · Answer

See other solutions for median calculation in SQL here   Simple way to calculate median with MySQL   the solutions are mostly vendor-independent

User · Answer

DECLARE  Obs int DECLARE  RowAsc table   ID      INT IDENTITY  Observation  FLOAT   INSERT INTO  RowAsc SELECT Observations FROM MyTable ORDER BY 1  SELECT  Obs COUNT    2 FROM  RowAsc SELECT Observation AS Median FROM  RowAsc WHERE ID  Obs

User · Answer

Justin s example above is very good  But that Primary key need should be stated very clearly  I have seen that code in the wild without the key and the results are bad   The complaint I get about the Percentile Cont is that it wont give you an actual value from the dataset  To get to a  median  that is an actual value from the dataset use Percentile Disc   SELECT SalesOrderID  OrderQty      PERCENTILE DISC 0 5           WITHIN GROUP  ORDER BY OrderQty          OVER  PARTITION BY SalesOrderID  AS MedianCont FROM Sales SalesOrderDetail WHERE SalesOrderID IN  43670  43669  43667  43663  ORDER BY SalesOrderID DESC

User · Answer

Median Finding  This is the simplest method to find the median of an attribute    Select round S salary 4  median from employee S where  select count salary  from station where salary  lt  S salary      select count salary  from station where salary  gt  S salary

User · Answer

For newbies like myself who are learning the very basics  I personally find this example easier to follow  as it is easier to understand exactly what s happening and where median values are coming from     select    max a  Value1     min a  Value1       2 as  Median Value1     max a  Value2     min a  Value2       2 as  Median Value2   from  select     datediff dd startdate enddate  as  Value1       xxxxxxxxxxxxxx as  Value2       from dbo table1       a   In absolute awe of some of the codes above though

User · Answer

This works with SQL 2000   DECLARE  testTable TABLE         VALUE   INT   --INSERT INTO  testTable -- Even Test --SELECT 3 UNION ALL --SELECT 5 UNION ALL --SELECT 7 UNION ALL --SELECT 12 UNION ALL --SELECT 13 UNION ALL --SELECT 14 UNION ALL --SELECT 21 UNION ALL --SELECT 23 UNION ALL --SELECT 23 UNION ALL --SELECT 23 UNION ALL --SELECT 23 UNION ALL --SELECT 29 UNION ALL --SELECT 40 UNION ALL --SELECT 56  -- --INSERT INTO  testTable -- Odd Test --SELECT 3 UNION ALL --SELECT 5 UNION ALL --SELECT 7 UNION ALL --SELECT 12 UNION ALL --SELECT 13 UNION ALL --SELECT 14 UNION ALL --SELECT 21 UNION ALL --SELECT 23 UNION ALL --SELECT 23 UNION ALL --SELECT 23 UNION ALL --SELECT 23 UNION ALL --SELECT 29 UNION ALL --SELECT 39 UNION ALL --SELECT 40 UNION ALL --SELECT 56   DECLARE  RowAsc TABLE       ID      INT IDENTITY      Amount  INT    INSERT INTO  RowAsc SELECT  VALUE  FROM     testTable  ORDER BY VALUE ASC  SELECT  AVG amount  FROM  RowAsc ra WHERE ra id IN       SELECT  ID      FROM     RowAsc     WHERE   ra id -               SELECT  MAX id    2 0          FROM     RowAsc       BETWEEN 0 AND 1

User · Answer

Simple  fast  accurate   SELECT x Amount  FROM    SELECT amount                  Count 1  OVER  partition BY  A          AS TotalRows                  Row number   OVER  ORDER BY Amount ASC  AS AmountOrder          FROM   facttransaction ft  x  WHERE  x AmountOrder   Round x TotalRows   2 0  0

User · Answer

For large scale datasets  you can try this GIST   https   gist github com chrisknoll 1b38761ce8c5016ec5b2  It works by aggregating the distinct values you would find in your set  such as ages  or year of birth  etc    and uses SQL window functions to locate any percentile position you specify in the query

User · Answer

The following solution works under these assumptions    No duplicate values No NULLs   Code   IF OBJECT ID  dbo R    U   IS NOT NULL   DROP TABLE dbo R  CREATE TABLE R       A FLOAT NOT NULL    INSERT INTO R VALUES  1   INSERT INTO R VALUES  2   INSERT INTO R VALUES  3   INSERT INTO R VALUES  4   INSERT INTO R VALUES  5   INSERT INTO R VALUES  6    -- Returns Median R  select SUM A    CAST COUNT A  AS FLOAT  from R R1  where   select count A  from R R2 where R1 A  gt  R2 A            select count A  from R R2 where R1 A  lt  R2 A   OR         select count A  from R R2 where R1 A  gt  R2 A    1           select count A  from R R2 where R1 A  lt  R2 A   OR         select count A  from R R2 where R1 A  gt  R2 A            select count A  from R R2 where R1 A  lt  R2 A    1

User · Answer

Below is My solution   with tempa as          select value row number   over  order by value  as Rn    Assigning a                                                             row number               count value  over    as Cnt   Taking total count of the values        from numbers     where value is not null    Excluding the null values         tempb as              Since we don t know whether the number of rows is odd or even  we shall       consider both the scenarios         select round cnt 2  as Ref from tempa where mod cnt 2  1     union all     select round cnt 2  a Ref from tempa where mod cnt 2  0      union all     select round cnt 2    1 as Ref from tempa where mod cnt 2  0        select avg value  as Median Value    from tempa where rn in        select Ref from tempb

User · Answer

MS SQL Server 2012  and later  has the PERCENTILE DISC function which computes a specific percentile for sorted values  PERCENTILE DISC  0 5  will compute the median - https   msdn microsoft com en-us library hh231327 aspx

User · Answer

For a continuous variable measure  col1  from  table1   select col1   from      select top 50 percent col1       ROW NUMBER   OVER ORDER BY col1 ASC  AS Rowa      ROW NUMBER   OVER ORDER BY col1 DESC  AS Rowd     from table1   tmp where tmp Rowa   tmp Rowd

User · Answer

Using the COUNT aggregate  you can first count how many rows there are and store in a variable called  cnt  Then you can compute parameters for the OFFSET-FETCH filter to specify  based on qty ordering  how many rows to skip  offset value  and how many to filter  fetch value    The number of rows to skip is   cnt - 1    2  It   s clear that for an odd count this calculation is correct because you first subtract 1 for the single middle value  before you divide by 2   This also works correctly for an even count because the division used in the expression is integer division  so  when subtracting 1 from an even count  you   re left with an odd value   When dividing that odd value by 2  the fraction part of the result   5  is truncated  The number of rows to fetch is 2 -   cnt   2   The idea is that when the count is odd the result of the modulo operation is 1  and you need to fetch 1 row  When the count is even the result of the modulo operation is 0  and you need to fetch 2 rows  By subtracting the 1 or 0 result of the modulo operation from 2  you get the desired 1 or 2  respectively  Finally  to compute the median quantity  take the one or two result quantities  and apply an average after converting the input integer value to a numeric one as follows   DECLARE  cnt AS INT    SELECT COUNT    FROM  Sales   production   stocks    SELECT AVG 1 0   quantity  AS median FROM   SELECT quantity FROM  Sales   production   stocks  ORDER BY quantity OFFSET   cnt - 1    2 ROWS FETCH NEXT 2 -  cnt   2 ROWS ONLY   AS D

User · Answer

Building on Jeff Atwood s answer above here it is with GROUP BY and a correlated subquery to get the median for each group   SELECT TestID       SELECT MAX Score  FROM     SELECT TOP 50 PERCENT Score FROM Posts WHERE TestID   Posts parent TestID ORDER BY Score  AS BottomHalf       SELECT MIN Score  FROM     SELECT TOP 50 PERCENT Score FROM Posts WHERE TestID   Posts parent TestID ORDER BY Score DESC  AS TopHalf      2 AS MedianScore  AVG Score  AS AvgScore  MIN Score  AS MinScore  MAX Score  AS MaxScore FROM Posts parent GROUP BY Posts parent TestID

User · Answer

I wanted to work out a solution by myself  but my brain tripped and fell on the way  I think it works  but don t ask me to explain it in the morning   P  DECLARE  table AS TABLE       Number int not null     insert into  table select 2  insert into  table select 4  insert into  table select 9  insert into  table select 15  insert into  table select 22  insert into  table select 26  insert into  table select 37  insert into  table select 49   DECLARE  Count AS INT SELECT  Count   COUNT    FROM  table   WITH MyResults RowNo  Number  AS       SELECT RowNo  Number FROM          SELECT ROW NUMBER   OVER  ORDER BY Number  AS RowNo  Number FROM  table  AS Foo   SELECT AVG Number  FROM MyResults WHERE RowNo     Count 1  2 OR RowNo      Count 1  2       Count 2  2

User · Answer

I try with several alternatives  but due my data records has repeated values  the ROW NUMBER versions seems are not a choice for me  So here the query I used  a version with NTILE    SELECT distinct    CustomerId              MAX CASE WHEN Percent50 Asc 1 THEN TotalDue END  OVER  PARTITION BY CustomerId            MIN CASE WHEN Percent50 desc 1 THEN TotalDue END  OVER  PARTITION BY CustomerId        2 MEDIAN FROM      SELECT       CustomerId        TotalDue       NTILE 2  OVER            PARTITION BY CustomerId          ORDER BY TotalDue ASC  AS Percent50 Asc       NTILE 2  OVER            PARTITION BY CustomerId          ORDER BY TotalDue DESC  AS Percent50 desc    FROM Sales SalesOrderHeader SOH   x ORDER BY CustomerId

User · Answer

Try the below logic to find out the median   Consider a table with the below numbers  1 1 2 3 4 5  THE MEDIAN is 2 5  with tempa as        select num count num  over   as Cnt          row number   over  order by num  as Rnum     from temp   tempb as               select round cnt 2  as ref value         from tempa where mod cnt 2  lt  0         union all         select round cnt 2  from tempa where mod cnt 2  0         union all         select round cnt 2 1          from tempa where mod cnt 2  0       select avg num  from tempa where rnum in  select   from tempb

User · Answer

This is as simple an answer as I could come up with  Worked well with my data  If you want to exclude certain values just add a where clause to the inner select   SELECT TOP 1      ValueField AS MedianValue FROM      SELECT TOP SELECT COUNT 1  2 FROM tTABLE          ValueField     FROM          tTABLE     ORDER BY          ValueField  A ORDER BY     ValueField DESC

User · Answer

My original quick answer was   select  max my column  as  my column   quartile from     select my column  ntile 4  over  order by my column  as  quartile           from   my table  i --where quartile   2 group by quartile   This will give you the median and interquartile range in one fell swoop   If you really only want one row that is the median then uncomment the where clause   When you stick that into an explain plan  60  of the work is sorting the data which is unavoidable when calculating position dependent statistics like this   I ve amended the answer to follow the excellent suggestion from Robert   evc  k-Robajz in the comments below    with PartitionedData as    select my column  ntile 10  over  order by my column  as  percentile     from   my table   MinimaAndMaxima as    select  min my column  as  low   max my column  as  high   percentile    from    PartitionedData    group by percentile  select   case     when b percentile   10 then cast b high as decimal 18 2       else cast  a low   b high   as decimal 18 2     2   end as  value   --b high  a low    b percentile from    MinimaAndMaxima a   join  MinimaAndMaxima b on  a percentile -1   b percentile  or  a percentile   10 and b percentile   10  --where b percentile   5   This should calculate the correct median and percentile values when you have an even number of data items   Again  uncomment the final where clause if you only want the median and not the entire percentile distribution

User · Answer

The following query returns the median from a list of values in one column  It cannot be used as or along with an aggregate function  but you can still use it as a sub-query with a WHERE clause in the inner select   SQL Server 2005    SELECT TOP 1 value from       SELECT TOP 50 PERCENT value      FROM table name      ORDER BY  value  for median ORDER BY value DESC

User · Answer

Using a single statement - One way is to use ROW NUMBER   and filter using sub-query  Here is to find the median salary  SELECT AVG a Salary  FROM                                                               SELECT ROW NUMBER   OVER ORDER BY Salary  as row no  Salary FROM Employee a CROSS JOIN  SELECT  COUNT    1  0 5 AS row half FROM Employee  t WHERE a row no IN  FLOOR t row half  CEILING t row half    I have seen similar solutions over the net using FLOOR and CEILING but tried to use a single statement

User · Answer

For your question  Jeff Atwood had already given the simple and effective solution  But  if you are looking for some alternative approach to calculate the median  below SQL code will help you     x000D   x000D  create table employees salary int   x000D   x000D  insert into employees values 8   insert into employees values 23   insert into employees values 45   insert into employees values 123   insert into employees values 93   insert into employees values 2342   insert into employees values 2238   x000D   x000D  select   from employees  x000D   x000D  declare  odd even int  declare  cnt int  declare  middle no int  x000D   x000D   x000D  set  cnt  select count    from employees   set  middle no   cnt 2  1  select  odd even case when   cnt 2 0  THEN -1 ELse 0 END   x000D   x000D   x000D   select AVG tbl salary  from   select  salary ROW NUMBER   over  order by salary  as rno from employees group by salary  tbl  where tbl rno  middle no or tbl rno  middle no  odd even  x000D   x000D   x000D    If you are looking to calculate median in MySQL  this github link will be useful

User · Answer

In SQL Server 2012 you should use PERCENTILE CONT   SELECT SalesOrderID  OrderQty      PERCENTILE CONT 0 5           WITHIN GROUP  ORDER BY OrderQty          OVER  PARTITION BY SalesOrderID  AS MedianCont FROM Sales SalesOrderDetail WHERE SalesOrderID IN  43670  43669  43667  43663  ORDER BY SalesOrderID DESC   See also   http   blog sqlauthority com 2011 11 20 sql-server-introduction-to-percentile cont-analytic-functions-introduced-in-sql-server-2012

User · Answer

If you re using SQL 2005 or better this is a nice  simple-ish median calculation for a single column in a table   SELECT     SELECT MAX Score  FROM     SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score  AS BottomHalf       SELECT MIN Score  FROM     SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score DESC  AS TopHalf      2 AS Median

User · Answer

--Create Temp Table to Store Results in DECLARE  results AS TABLE         Month  datetime not null    Median  int not null     --This variable will determine the date DECLARE  IntDate as int  set  IntDate   -13   WHILE   IntDate  lt  0   BEGIN  --Create Temp Table DECLARE  table AS TABLE         Rank  int not null    Days Open  int not null     --Insert records into Temp Table insert into  table   SELECT      rank   OVER  ORDER BY DATEADD mm  DATEDIFF mm  0  DATEADD ss  SVR close date   1970     0   DATEDIFF day DATEADD ss  SVR open date   1970   DATEADD ss  SVR close date   1970     SVR   ref num   as  Rank    DATEDIFF day DATEADD ss  SVR open date   1970   DATEADD ss  SVR close date   1970    as  Days Open  FROM  mdbrpt dbo View Request SVR  LEFT OUTER JOIN dbo dtv apps systems vapp   on SVR category   vapp persid  LEFT OUTER JOIN dbo prob ctg pctg   on SVR category   pctg persid  Left Outer Join  mdbrpt   dbo   rootcause  as  Root Cause    on  SVR   rootcause   Root Cause   id   Left Outer Join  mdbrpt   dbo   cr stat  as  Status   on  SVR   status   Status   code   LEFT OUTER JOIN  mdbrpt   dbo   net res  as  net    on  net   id  SVR  affected rc  WHERE  SVR Type IN   P     AND  SVR close date IS NOT NULL   AND   Status   SYM     Closed   AND  SVR parent is null  AND   Root Cause   sym  in    RC - Application   RC - Hardware    RC - Operational    RC - Unknown    AND       vapp   appl name  in   3PI   Billing Rpts Files   Collabrent   Reports   STMS   STMS 2   Telco   Comergent   OOM   C3-BAU   C3-DD   DIRECTV   DIRECTV Sales   DIRECTV Self Care   Dealer Website   EI Servlet   Enterprise Integration   ET   ICAN   ODS   SB-SCM   SeeBeyond   Digital Dashboard   IVR   OMS   Order Services   Retail Services   OSCAR   SAP   CTI   RIO   RIO Call Center   RIO Field Services   FSS-RIO3   TAOS   TCS    OR   pctg sym in   Systems Release Health Dashboard Problem   DTV QA Test Enterprise Release Deferred Defect Log    AND      Net   nr desc  in   3PI   Billing Rpts Files   Collabrent   Reports   STMS   STMS 2   Telco   Comergent   OOM   C3-BAU   C3-DD   DIRECTV   DIRECTV Sales   DIRECTV Self Care   Dealer Website   EI Servlet   Enterprise Integration   ET   ICAN   ODS   SB-SCM   SeeBeyond   Digital Dashboard   IVR   OMS   Order Services   Retail Services   OSCAR   SAP   CTI   RIO   RIO Call Center   RIO Field Services   FSS-RIO3   TAOS   TCS       AND  DATEADD mm  DATEDIFF mm  0  DATEADD ss  SVR close date   1970     0    DATEADD mm  DATEDIFF mm 0 DATEADD mm  IntDate getdate      0  ORDER BY  Days Open     DECLARE  Count AS INT SELECT  Count   COUNT    FROM  table   WITH MyResults RowNo   Days Open   AS       SELECT RowNo   Days Open  FROM          SELECT ROW NUMBER   OVER  ORDER BY  Days Open   AS RowNo   Days Open  FROM  table  AS Foo     insert into  results SELECT   DATEADD mm  DATEDIFF mm 0 DATEADD mm  IntDate getdate      0  as  Month    AVG  Days Open  as  Median  FROM MyResults WHERE RowNo     Count 1  2 OR RowNo      Count 1  2       Count 2  2     set  IntDate    IntDate 1 DELETE FROM  table END  select   from  results order by  Month

User · Answer

2019 UPDATE  In the 10 years since I wrote this answer  more solutions have been uncovered that may yield better results  Also  SQL Server releases since then  especially SQL 2012  have introduced new T-SQL features that can be used to calculate medians  SQL Server releases have also improved its query optimizer which may affect perf of various median solutions  Net-net  my original 2009 post is still OK but there may be better solutions on for modern SQL Server apps  Take a look at this article from 2012 which is a great resource   https   sqlperformance com 2012 08 t-sql-queries median  This article found the following pattern to be much  much faster than all other alternatives  at least on the simple schema they tested  This solution was 373x faster       than the slowest  PERCENTILE CONT  solution tested   Note that this trick requires two separate queries which may not be practical in all cases   It also requires SQL 2012 or later    DECLARE  c BIGINT    SELECT COUNT    FROM dbo EvenRows    SELECT AVG 1 0   val  FROM       SELECT val FROM dbo EvenRows      ORDER BY val      OFFSET   c - 1    2 ROWS      FETCH NEXT 1    1 -  c   2  ROWS ONLY   AS x    Of course  just because one test on one schema in 2012 yielded great results  your mileage may vary  especially if you re on SQL Server 2014 or later  If perf is important for your median calculation  I d strongly suggest trying and perf-testing several of the options recommended in that article to make sure that you ve found the best one for your schema    I d also be especially careful using the  new in SQL Server 2012  function PERCENTILE CONT that s recommended in one of the other answers to this question  because the article linked above found this built-in function to be 373x slower than the fastest solution   It s possible that this disparity has been improved in the 7 years since  but personally I wouldn t use this function on a large table until I verified its performance vs  other solutions   ORIGINAL 2009 POST IS BELOW   There are lots of ways to do this  with dramatically varying performance  Here s one particularly well-optimized solution  from Medians  ROW NUMBERs  and performance  This is a particularly optimal solution when it comes to actual I Os generated during execution     it looks more costly than other solutions  but it is actually much faster   That page also contains a discussion of other solutions and performance testing details  Note the use of a unique column as a disambiguator in case there are multiple rows with the same value of the median column   As with all database performance scenarios  always try to test a solution out with real data on real hardware     you never know when a change to SQL Server s optimizer or a peculiarity in your environment will make a normally-speedy solution slower   SELECT    CustomerId     AVG TotalDue  FROM      SELECT       CustomerId        TotalDue        -- SalesOrderId in the ORDER BY is a disambiguator to break ties       ROW NUMBER   OVER            PARTITION BY CustomerId          ORDER BY TotalDue ASC  SalesOrderId ASC  AS RowAsc        ROW NUMBER   OVER            PARTITION BY CustomerId          ORDER BY TotalDue DESC  SalesOrderId DESC  AS RowDesc    FROM Sales SalesOrderHeader SOH   x WHERE    RowAsc IN  RowDesc  RowDesc - 1  RowDesc   1  GROUP BY CustomerId ORDER BY CustomerId

User · Answer

Although Justin grant s solution appears solid I found that when you have a number of duplicate values within a given partition key the row numbers for the ASC duplicate values end up out of sequence so they do not properly align   Here is a fragment from my result     KEY VALUE ROWA ROWD    13  2     22   182 13  1     6    183 13  1     7    184 13  1     8    185 13  1     9    186 13  1     10   187 13  1     11   188 13  1     12   189 13  0     1    190 13  0     2    191 13  0     3    192 13  0     4    193 13  0     5    194   I used Justin s code as the basis for this solution  Although not as efficient given the use of multiple derived tables it does resolve the row ordering problem I encountered  Any improvements would be welcome as I am not that experienced in T-SQL   SELECT PKEY  cast AVG VALUE as decimal 5 2   as MEDIANVALUE FROM     SELECT PKEY VALUE ROWA ROWD     FLAG     CASE WHEN ROWA IN  ROWD ROWD-1 ROWD 1  THEN 1 ELSE 0 END    FROM         SELECT     PKEY      cast VALUE as decimal 5 2   as VALUE      ROWA      ROW NUMBER   OVER  PARTITION BY PKEY ORDER BY ROWA DESC  as ROWD       FROM             SELECT       PKEY         VALUE        ROW NUMBER   OVER  PARTITION BY PKEY ORDER BY VALUE ASC PKEY ASC   as ROWA        FROM  MTEST       T1    T2  T3 WHERE FLAG    1  GROUP BY PKEY ORDER BY PKEY

User · Answer

This is the most optimal solution for finding medians that I can think of  The names in the example is based on Justin example  Make sure an index for table Sales SalesOrderHeader exists with index columns CustomerId and TotalDue in that order   SELECT  sohCount CustomerId   AVG sohMid TotalDue  as TotalDueMedian FROM   SELECT    soh CustomerId    COUNT    as NumberOfRows FROM    Sales SalesOrderHeader soh  GROUP BY soh CustomerId  As sohCount CROSS APPLY       Select         soh TotalDue     FROM      Sales SalesOrderHeader soh      WHERE soh CustomerId   sohCount CustomerId      ORDER BY soh TotalDue     OFFSET sohCount NumberOfRows   2 -   sohCount NumberOfRows   1    2  ROWS      FETCH NEXT 1     sohCount NumberOfRows   1    2  ROWS ONLY       As sohMid GROUP BY sohCount CustomerId   UPDATE  I was a bit unsure about which method has best performance  so I did a comparison between my method Justin Grants and Jeff Atwoods by running query based on all three methods in one batch and the batch cost of each query were   Without index    Mine 30  Justin Grants 13  Jeff Atwoods 58    And with index   Mine 3   Justin Grants 10   Jeff Atwoods 87    I tried to see how well the queries scale if you have index by creating more data from around 14 000 rows by a factor of 2 up to 512 which means in the end around 7 2 millions rows  Note I made sure CustomeId field where unique for each time I did a single copy  so the proportion of rows compared to unique instance of CustomerId was kept constant  While I was doing this I ran executions where I rebuilt index afterwards  and I noticed the results stabilized at around a factor of 128 with the data I had to these values    Mine 3   Justin Grants 5   Jeff Atwoods 92    I wondered how the performance could have been affected by scaling number of of rows but keeping unique CustomerId constant  so I setup a new test where I did just this  Now instead of stabilizing  the batch cost ratio kept diverging  also  instead of about 20 rows per CustomerId per average I had in the end around 10000 rows per such unique Id  The numbers where    Mine 4  Justins 60  Jeffs 35    I made sure I implemented each method correct by comparing the results   My conclusion is the method I used is generally faster as long as index exists  Also noticed that this method is what s recommended for this particular problem in this article https   www microsoftpressstore com articles article aspx p 2314819 amp seqNum 5  A way to even further improve performance of subsequent calls to this query even further is to persist the count information in an auxiliary table  You could even maintain it by having a trigger that update and holds information regarding the count of SalesOrderHeader rows dependant on CustomerId  of course you can then simple store the median as well

[sql-server] Function to Calculate Median in SQL Server

Examples related to sql-server

Examples related to tsql

Examples related to aggregate-functions

Examples related to median