Eliminating duplicate values based on only one column of the table

Question

My query   SELECT sites siteName  sites siteIP  history date FROM sites INNER JOIN      history ON sites siteName   history siteName ORDER BY siteName date   First part of the output     How can I remove the duplicates in siteName column  I want to leave only the updated one based on date column   In the example output above  I need the rows 1  3  6  10

User · Accepted Answer

This is where the window function row number   comes in handy   SELECT s siteName  s siteIP  h date FROM sites s INNER JOIN       select h    row number   over  partition by siteName order by date desc  as seqnum       from history h        h     ON s siteName   h siteName and seqnum   1 ORDER BY s siteName  h date

User · Answer

From your example it seems reasonable to assume that the siteIP column is determined by the siteName column  that is  each site has only one siteIP   If this is indeed the case  then there is a simple solution using group by   select   sites siteName    sites siteIP    max history date  from sites inner join history on   sites siteName history siteName group by   sites siteName    sites siteIP order by   sites siteName    However  if my assumption is not correct  that is  it is possible for a site to have multiple siteIP   then it is not clear from you question which siteIP you want the query to return in the second column  If just any siteIP  then the following query will do   select   sites siteName    min sites siteIP     max history date  from sites inner join history on   sites siteName history siteName group by   sites siteName order by   sites siteName

User · Answer

I solve such queries using this pattern   SELECT   FROM t WHERE t field     SELECT MAX t field     FROM t AS t0    WHERE t group column1 t0 group column1     AND t group column2 t0 group column2        That is it will select records where the value of a field is at its max value  To apply it to your query I used the common table expression so that I don t have to repeat the JOIN twice   WITH site history AS     SELECT sites siteName  sites siteIP  history date   FROM sites   JOIN history USING  siteName    SELECT   FROM site history h WHERE date     SELECT MAX date     FROM site history h0    WHERE h siteName h0 siteName  ORDER BY siteName   It s important to note that it works only if the field we re calculating the maximum for is unique  In your example the date field should be unique for each siteName  that is if the IP can t be changed multiple times per millisecond  In my experience this is commonly the case otherwise you don t know which record is the newest anyway  If the history table has an unique index for  site  date   this query is also very fast  index range scan on the history table scanning just the first item can be used

[sql] Eliminating duplicate values based on only one column of the table

Examples related to sql

Examples related to sql-server

Examples related to distinct

Examples related to inner-join

Examples related to duplicate-removal