How do I (or can I) SELECT DISTINCT on multiple columns?

Question

I need to retrieve all rows from a table where the combination of 2 columns is unique. So I want all the sales that do not have any other sales that happened on the same day for the same price. The sales that are unique based on day and price will get updated to an active status. So I'm thinking:

    UPDATE sales
    SET status = 'ACTIVE'
    WHERE id IN (SELECT DISTINCT (saleprice, saledate), id, count(id)
                 FROM sales
                 HAVING count = 1)

But my brain hurts going any farther than that.

User · Accepted Answer

    SELECT DISTINCT a,b,c FROM t

is roughly equivalent to:

    SELECT a,b,c FROM t GROUP BY a,b,c

It's a good idea to get used to the GROUP BY syntax, as it's more powerful.

For your query, I'd do it like this:

    UPDATE sales
    SET status = 'ACTIVE'
    WHERE id IN
    (
        SELECT id
        FROM sales S
        INNER JOIN
        (
            SELECT saleprice, saledate
            FROM sales
            GROUP BY saleprice, saledate
            HAVING COUNT(*) = 1
        ) T
        ON S.saleprice = T.saleprice AND S.saledate = T.saledate
    )
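Both points can be tried out in a minimal sketch. It assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL; the table layout, the sample rows and the helper names (make_db, distinct_vs_group_by, activate_unique_sales) are hypothetical. The two result sets are compared as sets because neither query guarantees row order without ORDER BY.

```python
import sqlite3

# Hypothetical sample data: (id, saleprice, saledate, status)
ROWS = [
    (1, 100, "2024-01-01", "NEW"),
    (2, 100, "2024-01-01", "NEW"),  # same day and price as id 1
    (3, 100, "2024-01-02", "NEW"),
    (4, 250, "2024-01-01", "NEW"),
]

def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, saleprice INTEGER, "
                "saledate TEXT, status TEXT)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", ROWS)
    return con

def distinct_vs_group_by(con):
    # DISTINCT over two columns vs. GROUP BY the same two columns.
    distinct = con.execute(
        "SELECT DISTINCT saleprice, saledate FROM sales").fetchall()
    grouped = con.execute(
        "SELECT saleprice, saledate FROM sales GROUP BY saleprice, saledate").fetchall()
    return set(distinct), set(grouped)

def activate_unique_sales(con):
    # Join each sale to the (saleprice, saledate) groups that contain one row.
    con.execute("""
        UPDATE sales SET status = 'ACTIVE'
        WHERE id IN (
            SELECT id
            FROM sales S
            INNER JOIN (
                SELECT saleprice, saledate
                FROM sales
                GROUP BY saleprice, saledate
                HAVING COUNT(*) = 1
            ) T
            ON S.saleprice = T.saleprice AND S.saledate = T.saledate
        )""")
    return [r[0] for r in con.execute(
        "SELECT id FROM sales WHERE status = 'ACTIVE' ORDER BY id")]

con = make_db()
d, g = distinct_vs_group_by(con)
print(d == g)                      # True
print(activate_unique_sales(con))  # [3, 4]
```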

User · Answer

If you put together the answers so far, clean up and improve, you would arrive at this superior query:

    UPDATE sales
    SET    status = 'ACTIVE'
    WHERE  (saleprice, saledate) IN (
       SELECT saleprice, saledate
       FROM   sales
       GROUP  BY saleprice, saledate
       HAVING count(*) = 1
       );

It is much faster than either of them. It beats the performance of the currently accepted answer by a factor of 10 - 15 (in my tests on PostgreSQL 8.4 and 9.1).

But this is still far from optimal. Use a NOT EXISTS (anti-)semi-join for even better performance. EXISTS is standard SQL, has been around forever (at least since PostgreSQL 7.2, long before this question was asked) and fits the presented requirements perfectly:

    UPDATE sales s
    SET    status = 'ACTIVE'
    WHERE  NOT EXISTS (
       SELECT FROM sales s1                     -- SELECT list can be empty for EXISTS
       WHERE  s.saleprice = s1.saleprice
       AND    s.saledate  = s1.saledate
       AND    s.id <> s1.id                     -- except for row itself
       )
    AND    s.status IS DISTINCT FROM 'ACTIVE';  -- avoid empty updates, see below

Unique key to identify row

If you don't have a primary or unique key for the table (id in the example), you can substitute the system column ctid for the purpose of this query (but not for some other purposes):

       AND    s1.ctid <> s.ctid

Every table should have a primary key. Add one if you don't have one yet. I suggest a serial or an IDENTITY column in Postgres 10 or later.

Related:
In-order sequence generation
Auto increment table column

How is this faster?

The subquery in the EXISTS anti-semi-join can stop evaluating as soon as the first duplicate is found (there is no point in looking further). For a base table with few duplicates this is only mildly more efficient. With lots of duplicates it becomes much more efficient.

Exclude empty updates

For rows that already have status = 'ACTIVE', this update would not change anything, but it would still insert a new row version at full cost (minor exceptions apply). Normally, you do not want this. Add another WHERE condition, as demonstrated above, to avoid this and make it even faster.

If status is defined NOT NULL, you can simplify to:

    AND status <> 'ACTIVE';

The data type of the column must support the <> operator. Some types like json don't. See:
How to query a json column for empty objects?

Subtle difference in NULL handling

This query (unlike the currently accepted answer by Joel) does not treat NULL values as equal. The following two rows for (saleprice, saledate) would qualify as "distinct" (though they look identical to the human eye):

    (123, NULL)
    (123, NULL)

They would also pass a unique index and almost anything else, since NULL values do not compare equal according to the SQL standard. See:
Create unique constraint with null columns

On the other hand, GROUP BY, DISTINCT and DISTINCT ON () treat NULL values as equal. Use an appropriate query style depending on what you want to achieve. You can still use this faster query with IS NOT DISTINCT FROM instead of = for any or all comparisons to make NULL compare equal. More:
How to delete duplicate rows without unique identifier

If all columns being compared are defined NOT NULL, there is no room for disagreement.
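The NOT EXISTS anti-semi-join, the guard against empty updates and the NULL difference can be sketched together. This assumes SQLite via Python's sqlite3 as a stand-in for PostgreSQL, which changes a few spellings: SELECT 1 instead of an empty SELECT list, IS NOT instead of IS DISTINCT FROM, and IS instead of IS NOT DISTINCT FROM. The rows and the helpers make_db, anti_join_sql and run are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (id, saleprice, saledate, status)
ROWS = [
    (1, 100, "2024-01-01", "NEW"),
    (2, 100, "2024-01-01", "NEW"),  # duplicate of id 1 (same day, same price)
    (3, 100, "2024-01-02", "NEW"),
    (4, 123, None, "NEW"),          # (123, NULL) ...
    (5, 123, None, "NEW"),          # ... and (123, NULL) again
]

def make_db(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, saleprice INTEGER, "
                "saledate TEXT, status TEXT)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    return con

def anti_join_sql(op):
    # op is "=" (NULLs never match) or "IS" (SQLite's NULL-safe equality,
    # playing the role of PostgreSQL's IS NOT DISTINCT FROM).
    # Inside the subquery, "sales" is the row being updated, "s1" every other row.
    return f"""
        UPDATE sales SET status = 'ACTIVE'
        WHERE NOT EXISTS (
            SELECT 1 FROM sales s1
            WHERE  s1.saleprice {op} sales.saleprice
            AND    s1.saledate  {op} sales.saledate
            AND    s1.id <> sales.id       -- except for the row itself
        )
        AND status IS NOT 'ACTIVE'         -- SQLite form of IS DISTINCT FROM
    """

def run(op, rows=ROWS):
    con = make_db(rows)
    first = con.execute(anti_join_sql(op)).rowcount
    again = con.execute(anti_join_sql(op)).rowcount  # guard skips ACTIVE rows
    active = [r[0] for r in con.execute(
        "SELECT id FROM sales WHERE status = 'ACTIVE' ORDER BY id")]
    return first, again, active

print(run("="))   # (3, 0, [3, 4, 5])  NULL dates never match, so 4 and 5 look unique
print(run("IS"))  # (1, 0, [3])        NULL-safe: 4 and 5 are duplicates of each other
```

The second run changes 0 rows in both cases, because the status guard skips rows that are already ACTIVE instead of rewriting them.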

User · Answer

I want to select the distinct values from one column, 'GrondOfLucht', but they should be sorted in the order given by the column 'sortering'. I cannot get the distinct values of just one column using:

    SELECT DISTINCT GrondOfLucht, sortering
    FROM CorWijzeVanAanleg
    ORDER BY sortering

It also returns the column 'sortering', and because a 'GrondOfLucht' value can occur with several 'sortering' values, the result contains every distinct pair instead of each 'GrondOfLucht' value once.

Use GROUP BY to select the values of 'GrondOfLucht' in the order given by 'sortering':

    SELECT GrondOfLucht
    FROM dbo.CorWijzeVanAanleg
    GROUP BY GrondOfLucht
    ORDER BY MIN(sortering)
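A small sketch comparing two GROUP BY variants, using SQLite via Python's sqlite3 as a stand-in for SQL Server (an assumption; the dbo schema is dropped and the rows are hypothetical). Grouping by both columns still repeats a GrondOfLucht value whenever it has several sortering values, whereas grouping by GrondOfLucht alone returns each value once, ordered by its smallest sortering.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CorWijzeVanAanleg (GrondOfLucht TEXT, sortering INTEGER)")
# Hypothetical rows: "Lucht" appears with two different sortering values.
con.executemany("INSERT INTO CorWijzeVanAanleg VALUES (?, ?)", [
    ("Lucht", 1),
    ("Grond", 2),
    ("Lucht", 3),
    ("Grond", 2),
])

def grouped_by_both(con):
    # One row per (GrondOfLucht, sortering) pair, so "Lucht" still repeats.
    return [r[0] for r in con.execute("""
        SELECT GrondOfLucht FROM CorWijzeVanAanleg
        GROUP BY GrondOfLucht, sortering
        ORDER BY MIN(sortering)""")]

def grouped_by_value(con):
    # One row per GrondOfLucht, ordered by its smallest sortering.
    return [r[0] for r in con.execute("""
        SELECT GrondOfLucht FROM CorWijzeVanAanleg
        GROUP BY GrondOfLucht
        ORDER BY MIN(sortering)""")]

print(grouped_by_both(con))   # ['Lucht', 'Grond', 'Lucht']
print(grouped_by_value(con))  # ['Lucht', 'Grond']
```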

User · Answer

If your DBMS doesn't support DISTINCT with multiple columns like this:

    select distinct(col1, col2) from table

a multi-column select can in general be executed safely as follows:

    select distinct * from (select col1, col2 from table) as x

This works on most DBMSs, and it may be faster than a GROUP BY solution because it avoids the grouping functionality, although many query planners produce the same plan for both.
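A quick check that the derived-table form returns the same pairs as GROUP BY, sketched with SQLite via Python's sqlite3 (an assumption; the table t, its columns and its rows are hypothetical, since table is a reserved word). This only compares results; it says nothing about relative speed, which depends on the DBMS and its planner.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 TEXT)")
# Hypothetical rows; col3 is the extra column the outer DISTINCT must ignore.
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "a", "x"),
    (1, "a", "y"),
    (2, "b", "x"),
    (1, "b", "x"),
])

# DISTINCT * over a derived table that projects only the wanted columns.
via_subquery = con.execute(
    "SELECT DISTINCT * FROM (SELECT col1, col2 FROM t) AS x").fetchall()
# The GROUP BY formulation of the same question.
via_group_by = con.execute(
    "SELECT col1, col2 FROM t GROUP BY col1, col2").fetchall()

print(sorted(via_subquery))                          # [(1, 'a'), (1, 'b'), (2, 'b')]
print(sorted(via_subquery) == sorted(via_group_by))  # True
```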

User · Answer

The problem with your query is that when using a GROUP BY clause (which you essentially do by using DISTINCT), you can only use columns that you group by or aggregate functions. You cannot use the column id because there are potentially different values. In your case there is always only one value because of the HAVING clause, but most RDBMSs are not smart enough to recognize that.

This should work, however (and doesn't need a join):

    UPDATE sales
    SET status = 'ACTIVE'
    WHERE id IN (
      SELECT MIN(id) FROM sales
      GROUP BY saleprice, saledate
      HAVING COUNT(id) = 1
    )

You could also use MAX or AVG instead of MIN; it is only important to use a function that returns the value of the column if there is only one matching row.
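A minimal sketch showing that MIN and MAX pick out the same ids here, assuming SQLite via Python's sqlite3 as a stand-in for the asker's database; the sample rows and the helpers make_db and activate_unique are hypothetical. Only MIN and MAX are exercised, since both return the stored id unchanged for a one-row group.

```python
import sqlite3

# Hypothetical sample data: (id, saleprice, saledate, status)
ROWS = [
    (1, 100, "2024-01-01", "NEW"),
    (2, 100, "2024-01-01", "NEW"),  # same day and price as id 1
    (3, 100, "2024-01-02", "NEW"),
    (4, 250, "2024-01-01", "NEW"),
]

def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, saleprice INTEGER, "
                "saledate TEXT, status TEXT)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", ROWS)
    return con

def activate_unique(con, agg):
    # In a group with exactly one row, MIN(id) and MAX(id) both return that
    # row's id; the aggregate only exists to satisfy the GROUP BY rules.
    if agg not in ("MIN", "MAX"):
        raise ValueError("unsupported aggregate")  # only interpolate known names
    con.execute(f"""
        UPDATE sales SET status = 'ACTIVE'
        WHERE id IN (
            SELECT {agg}(id) FROM sales
            GROUP BY saleprice, saledate
            HAVING COUNT(id) = 1
        )""")
    return [r[0] for r in con.execute(
        "SELECT id FROM sales WHERE status = 'ACTIVE' ORDER BY id")]

for agg in ("MIN", "MAX"):
    print(agg, activate_unique(make_db(), agg))  # MIN [3, 4] then MAX [3, 4]
```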
