[sql] Eliminating duplicate values based on only one column of the table

My query:

SELECT sites.siteName, sites.siteIP, history.date
FROM sites INNER JOIN
     history ON sites.siteName = history.siteName
ORDER BY siteName,date

First part of the output:


How can I remove the duplicates in the siteName column? I want to keep only the most recent row for each siteName, based on the date column.

In the example output, I need rows 1, 3, 6 and 10.

Tags: sql, sql-server, distinct, inner-join, duplicate-removal

Answers:


I solve such queries using this pattern:

SELECT *
FROM t
WHERE t.field = (
  SELECT MAX(t0.field)
  FROM t AS t0
  WHERE t0.group_column1 = t.group_column1
    AND t0.group_column2 = t.group_column2
    -- ... one such condition per grouping column
)

That is, it selects the records whose field value equals the maximum for their group. To apply it to your query I used a common table expression (CTE), so that the JOIN doesn't have to be written twice:

WITH site_history AS (
  SELECT sites.siteName, sites.siteIP, history.date
  FROM sites
  JOIN history ON sites.siteName = history.siteName  -- SQL Server has no JOIN ... USING
)
SELECT *
FROM site_history h
WHERE date = (
  SELECT MAX(date)
  FROM site_history h0
  WHERE h.siteName = h0.siteName)
ORDER BY siteName

It's important to note that this returns exactly one row per site only if the field we're taking the maximum of is unique within each group; if two rows tie for the maximum, both are returned. In your example the date column should be unique for each siteName, which holds as long as the IP can't change twice at the same timestamp (for example, twice within one millisecond). In my experience this is usually the case; otherwise you couldn't tell which record is the newest anyway. If the history table has a unique index on (siteName, date), this query is also very fast: the optimizer can use an index range scan on the history table that reads only the first matching entry for each site.
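When the maximum date can tie, a commonly used alternative (a different technique from the correlated MAX() subquery above) is to number each site's rows with ROW_NUMBER(), available since SQL Server 2005, and keep row 1. The sketch below is an illustration under assumptions: the table definitions and rows are hypothetical, with site 'a' given two history rows on the same date so that the tie is visible. Run the CREATE/INSERT part only in a scratch database; against your own tables, only the WITH ... SELECT statement is needed.

```sql
-- Hypothetical sample data (not from the question): site 'a' has two
-- history rows on the same date, which would give the MAX() query two rows.
CREATE TABLE sites (siteName VARCHAR(50), siteIP VARCHAR(45));
CREATE TABLE history (siteName VARCHAR(50), date DATE);

INSERT INTO sites (siteName, siteIP) VALUES
  ('a', '10.0.0.1'),
  ('b', '10.0.0.2');
INSERT INTO history (siteName, date) VALUES
  ('a', '2017-01-01'),
  ('a', '2017-03-01'),
  ('a', '2017-03-01'),
  ('b', '2017-02-01');

-- Number each site's rows from newest to oldest and keep row 1.
-- ROW_NUMBER() never repeats a number within a partition, so exactly
-- one row per siteName survives; which tied row is kept is arbitrary.
WITH ranked AS (
  SELECT sites.siteName, sites.siteIP, history.date,
         ROW_NUMBER() OVER (PARTITION BY sites.siteName
                            ORDER BY history.date DESC) AS rn
  FROM sites
  JOIN history ON sites.siteName = history.siteName
)
SELECT siteName, siteIP, date
FROM ranked
WHERE rn = 1
ORDER BY siteName;
```

On this sample the query returns one row for 'a' (2017-03-01) and one for 'b' (2017-02-01). If the choice between tied rows matters, add a further column to the ORDER BY inside OVER (...) to make it deterministic.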


From your example it seems reasonable to assume that the siteIP column is determined by the siteName column (that is, each site has only one siteIP). If this is indeed the case, then there is a simple solution using group by:

select
  sites.siteName,
  sites.siteIP,
  max(history.date)
from sites
inner join history on
  sites.siteName=history.siteName
group by
  sites.siteName,
  sites.siteIP
order by
  sites.siteName;
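The GROUP BY query relies on the assumption that each siteName has only one siteIP. One way to check that against the real data, sketched here on a hypothetical sample in which site 'a' has two IPs, is to look for names with more than one distinct siteIP. Create the sample table only in a scratch database; on your own data, run just the SELECT.

```sql
-- Hypothetical sample (not from the question): 'a' has two different IPs.
CREATE TABLE sites (siteName VARCHAR(50), siteIP VARCHAR(45));
INSERT INTO sites (siteName, siteIP) VALUES
  ('a', '10.0.0.1'),
  ('a', '10.0.0.9'),
  ('b', '10.0.0.2');

-- List every siteName that has more than one distinct siteIP.
-- An empty result means the assumption holds for the current data.
SELECT siteName, COUNT(DISTINCT siteIP) AS ipCount
FROM sites
GROUP BY siteName
HAVING COUNT(DISTINCT siteIP) > 1;
```

On the sample it returns a single row, 'a' with ipCount 2. An empty result on your own sites table means the simpler GROUP BY query returns one row per site.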

However, if my assumption is not correct (that is, a site can have more than one siteIP), then your question doesn't make clear which siteIP the query should return in the second column. If any siteIP will do, the following query works; note that min() simply picks the smallest siteIP, which is not necessarily related to the latest date:

select
  sites.siteName,
  min(sites.siteIP),
  max(history.date)
from sites
inner join history on
  sites.siteName=history.siteName
group by
  sites.siteName
order by
  sites.siteName;
