SQL - select distinct only on one column

I have searched far and wide for an answer to this problem. I'm using Microsoft SQL Server. Suppose I have a table that looks like this:

+--------+---------+-------------+-------------+
| ID     | NUMBER  | COUNTRY     | LANG        |
+--------+---------+-------------+-------------+
| 1      | 3968    | UK          | English     |
| 2      | 3968    | Spain       | Spanish     |
| 3      | 3968    | USA         | English     |
| 4      | 1234    | Greece      | Greek       |
| 5      | 1234    | Italy       | Italian     |
+--------+---------+-------------+-------------+

I want to perform one query that returns a single row for each unique value in the 'NUMBER' column (whether it is the first or last row doesn't bother me). So this would give me:

+--------+---------+-------------+-------------+
| ID     | NUMBER  | COUNTRY     | LANG        |
+--------+---------+-------------+-------------+
| 1      | 3968    | UK          | English     |
| 4      | 1234    | Greece      | Greek       |
+--------+---------+-------------+-------------+

How is this achievable?

The answer is


Since you don't care, I chose the max ID for each number.

select tbl.* from tbl
inner join (
select max(id) as maxID, number from tbl group by number) maxID
on maxID.maxID = tbl.id

Query Explanation

 select 
    tbl.*  -- give me all the data from the base table (tbl) 
 from 
    tbl    
    inner join (  -- only return rows in tbl which match this subquery
        select 
            max(id) as maxID -- MAX (ie distinct) ID per GROUP BY below
        from 
            tbl 
        group by 
            NUMBER            -- how to group rows for the MAX aggregation
    ) maxID
        on maxID.maxID = tbl.id -- join condition ie only return rows in tbl 
                                -- whose ID is also a MAX ID for a given NUMBER
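
To try the join against the sample data from the question, here is a minimal sketch. The temporary table name #tbl and the column types are assumptions, since the question does not give them. Because max(id) picks the last row per NUMBER, it returns IDs 3 and 5; swapping in min(id) would return IDs 1 and 4, which is the output shown in the question.

```sql
-- Assumed schema: the question does not state column types.
create table #tbl (
    id      int primary key,
    number  int,
    country varchar(20),
    lang    varchar(20)
);

insert into #tbl (id, number, country, lang) values
    (1, 3968, 'UK',     'English'),
    (2, 3968, 'Spain',  'Spanish'),
    (3, 3968, 'USA',    'English'),
    (4, 1234, 'Greece', 'Greek'),
    (5, 1234, 'Italy',  'Italian');

-- One row per number: the row holding the highest id in each group.
select t.*
from #tbl t
inner join (
    select max(id) as maxID
    from #tbl
    group by number
) m
    on m.maxID = t.id
order by t.id;
-- returns id 3 (USA) and id 5 (Italy)

drop table #tbl;
```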

A very typical approach to this type of problem is to use row_number():

select t.*
from (select t.*,
             row_number() over (partition by number order by id) as seqnum
      from t
     ) t
where seqnum = 1;

This is more generalizable than comparing against the minimum or maximum id. For instance, you can get a random row per number by using order by newid(), or select two rows per number by using where seqnum <= 2.
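
Both variations can be sketched against the question's sample data. The table variable @t and its column types are assumptions made for illustration only. The second query's result changes from run to run, so no expected output is given for it.

```sql
-- Assumed schema and sample rows, copied from the question.
declare @t table (
    id      int primary key,
    number  int,
    country varchar(20),
    lang    varchar(20)
);

insert into @t (id, number, country, lang) values
    (1, 3968, 'UK',     'English'),
    (2, 3968, 'Spain',  'Spanish'),
    (3, 3968, 'USA',    'English'),
    (4, 1234, 'Greece', 'Greek'),
    (5, 1234, 'Italy',  'Italian');

-- Up to two rows per number, lowest ids first.
select s.id, s.number, s.country, s.lang
from (
    select x.*,
           row_number() over (partition by number order by id) as seqnum
    from @t x
) s
where s.seqnum <= 2
order by s.id;
-- returns ids 1, 2, 4 and 5

-- One randomly chosen row per number.
select s.id, s.number, s.country, s.lang
from (
    select x.*,
           row_number() over (partition by number order by newid()) as seqnum
    from @t x
) s
where s.seqnum = 1;
```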


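On engines that allow non-aggregated columns alongside GROUP BY (for example MySQL with ONLY_FULL_GROUP_BY disabled, or SQLite), you can use the following query:

SELECT * FROM [table] GROUP BY NUMBER;

Where [table] is the name of the table.

This provides a unique listing for the NUMBER column; however, the other columns may be meaningless depending on the vendor implementation, which is to say they may not together correspond to a specific row or rows. SQL Server, which the question uses, rejects this query with an error, because the selected columns are neither aggregated nor listed in the GROUP BY clause.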
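
For a GROUP BY form that SQL Server accepts, every column outside the GROUP BY has to be wrapped in an aggregate. The sketch below (the table variable @t and the column types are assumptions) does that with min(), and it also shows the caveat about columns not corresponding to a specific row: each aggregate is computed independently.

```sql
-- Assumed schema and sample rows, copied from the question.
declare @t table (
    id      int primary key,
    number  int,
    country varchar(20),
    lang    varchar(20)
);

insert into @t (id, number, country, lang) values
    (1, 3968, 'UK',     'English'),
    (2, 3968, 'Spain',  'Spanish'),
    (3, 3968, 'USA',    'English'),
    (4, 1234, 'Greece', 'Greek'),
    (5, 1234, 'Italy',  'Italian');

-- One row per number, but each column is aggregated on its own.
select number,
       min(id)      as id,
       min(country) as country,
       min(lang)    as lang
from @t
group by number
order by number;
-- 1234 -> id 4, Greece, Greek   (happens to match a real row)
-- 3968 -> id 1, Spain, English  (no such row: id 1 is UK, Spain is Spanish)
```

If the result has to be an actual row from the table, the join or row_number() approaches are the safer choice.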

