What's faster, SELECT DISTINCT or GROUP BY in MySQL?

Question

If I have a table

CREATE TABLE users (
  id int(10) unsigned NOT NULL auto_increment,
  name varchar(255) NOT NULL,
  profession varchar(255) NOT NULL,
  employer varchar(255) NOT NULL,
  PRIMARY KEY (id)
)

and I want to get all unique values of the profession field, which of these would be faster (or recommended)?

SELECT DISTINCT u.profession FROM users u

or

SELECT u.profession FROM users u GROUP BY u.profession

User · Answer

GROUP BY is more expensive than DISTINCT, since GROUP BY sorts the result while DISTINCT avoids that. But if you want GROUP BY to behave like DISTINCT (no sort), add ORDER BY NULL. (The implicit sort applies to MySQL 5.7 and earlier; MySQL 8.0 no longer sorts GROUP BY results implicitly, so ORDER BY NULL is no longer needed there.)

SELECT DISTINCT u.profession FROM users u

is equivalent to

SELECT u.profession FROM users u GROUP BY u.profession ORDER BY NULL
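
A minimal sketch of the equivalence, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: it can only show that both queries return the same set of professions, not MySQL's sorting or speed). The sample rows are made up.

```python
import sqlite3

# SQLite stands in for MySQL here; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        profession TEXT NOT NULL,
        employer TEXT NOT NULL
    )"""
)
conn.executemany(
    "INSERT INTO users (name, profession, employer) VALUES (?, ?, ?)",
    [
        ("Ann", "engineer", "Acme"),
        ("Bob", "teacher", "School"),
        ("Cid", "engineer", "Initech"),
        ("Dee", "nurse", "Hospital"),
    ],
)

distinct_rows = conn.execute(
    "SELECT DISTINCT u.profession FROM users u"
).fetchall()
group_rows = conn.execute(
    "SELECT u.profession FROM users u GROUP BY u.profession"
).fetchall()

# Row order is not guaranteed without ORDER BY, so compare the results as sets.
print(sorted(distinct_rows))  # [('engineer',), ('nurse',), ('teacher',)]
print(set(distinct_rows) == set(group_rows))  # True
```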

User · Answer

Here is a simple approach (in SQL Server's Transact-SQL, so it will not run as-is in MySQL) which prints the elapsed time for each of the two queries:

DECLARE @t1 DATETIME;
DECLARE @t2 DATETIME;

SET @t1 = GETDATE();
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);

SET @t1 = GETDATE();
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);

Or try SET STATISTICS TIME (Transact-SQL):

SET STATISTICS TIME ON;
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET STATISTICS TIME OFF;

It simply displays the number of milliseconds required to parse, compile, and execute each statement, as below:

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 2 ms.
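
The same idea can be sketched outside SQL Server. This assumes Python's sqlite3 module in place of MySQL, a trimmed users table, and made-up data; the helper name elapsed_ms is hypothetical. A single run is noisy, so treat the numbers as rough and repeat the measurement (ideally on your real server) before drawing conclusions.

```python
import sqlite3
import time


def elapsed_ms(conn, sql):
    """Run sql once, fetch every row, and return (rows, elapsed milliseconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, (time.perf_counter() - start) * 1000.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, profession TEXT NOT NULL)")
# Hypothetical data: 100,000 rows spread over 50 professions, no index.
conn.executemany(
    "INSERT INTO users (profession) VALUES (?)",
    ((f"profession_{i % 50}",) for i in range(100_000)),
)

queries = {
    "DISTINCT": "SELECT DISTINCT u.profession FROM users u",
    "GROUP BY": "SELECT u.profession FROM users u GROUP BY u.profession",
}
results = {}
for label, sql in queries.items():
    rows, ms = elapsed_ms(conn, sql)
    results[label] = rows
    print(f"{label}: {len(rows)} rows, elapsed time (ms): {ms:.2f}")
```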

User · Answer

If you have an index on profession, these two are synonyms.

If you don't, then use DISTINCT.

GROUP BY in MySQL sorts results. You can even do:

SELECT u.profession FROM users u GROUP BY u.profession DESC

and get your professions sorted in DESC order. (This applies to older versions: MySQL 8.0 no longer sorts GROUP BY results implicitly, and MySQL 8.0.13 removed the ASC/DESC qualifiers on GROUP BY, so use ORDER BY there.)

DISTINCT creates a temporary table and uses it for storing duplicates. GROUP BY does the same, but sorts the distinct results afterwards.

So

SELECT DISTINCT u.profession FROM users u

is faster, if you don't have an index on profession.
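
If the goal is sorted output, asking for it explicitly with ORDER BY is portable and does not depend on how GROUP BY happens to be implemented. A minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL and hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, profession TEXT NOT NULL)")
# Hypothetical rows; "engineer" appears twice.
conn.executemany(
    "INSERT INTO users (profession) VALUES (?)",
    [("engineer",), ("teacher",), ("engineer",), ("nurse",)],
)

# Request the order explicitly instead of relying on GROUP BY ... DESC.
professions_desc = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT u.profession FROM users u ORDER BY u.profession DESC"
    )
]
print(professions_desc)  # ['teacher', 'nurse', 'engineer']
```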

User · Answer

In MySQL, GROUP BY uses an extra step: filesort. I realized DISTINCT is faster than GROUP BY, and that was a surprise.

User · Answer

It seems that the queries are not exactly the same, at least for MySQL.

Compare:

DESCRIBE SELECT DISTINCT productname FROM northwind.products;
DESCRIBE SELECT productname FROM northwind.products GROUP BY productname;

The second query additionally gives "Using filesort" in Extra.
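
SQLite's EXPLAIN QUERY PLAN plays a similar role to MySQL's DESCRIBE/EXPLAIN, so the comparison can be sketched with Python's sqlite3 module (an assumption: SQLite reports a temporary B-tree rather than "Using filesort", and its plans for the two queries may look more alike than MySQL's). The products table here is a hypothetical, empty stand-in for northwind.products, with no index on productname.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, productname TEXT)")


def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    # Each plan row has four columns; the last one is the detail text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


distinct_plan = plan("SELECT DISTINCT productname FROM products")
group_plan = plan("SELECT productname FROM products GROUP BY productname")
print("DISTINCT:", distinct_plan)
print("GROUP BY:", group_plan)
```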

User · Answer

Well, DISTINCT can be slower than GROUP BY on some occasions in Postgres (don't know about other DBs).

Tested example:

postgres=# select count(*) from (select distinct i from g) a;
 count
-------
 10001
(1 row)

Time: 1563.109 ms

postgres=# select count(*) from (select i from g group by i) a;
 count
-------
 10001
(1 row)

Time: 594.481 ms

http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I

So be careful.
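
The two counting queries can be reproduced on a small scale with Python's sqlite3 module (an assumption: SQLite instead of PostgreSQL, and a made-up table g holding 10001 distinct values, each repeated 10 times). This only checks that both forms agree on the result; timing them is a separate step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE g (i INTEGER)")
# Hypothetical data: values 0..10000, each inserted 10 times.
conn.executemany(
    "INSERT INTO g (i) VALUES (?)", ((n % 10001,) for n in range(100_010))
)

(count_distinct,) = conn.execute(
    "SELECT count(*) FROM (SELECT DISTINCT i FROM g) a"
).fetchone()
(count_group,) = conn.execute(
    "SELECT count(*) FROM (SELECT i FROM g GROUP BY i) a"
).fetchone()
print(count_distinct, count_group)  # 10001 10001
```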

User · Answer

SELECT DISTINCT will always be the same as, or faster than, a GROUP BY. On some systems (e.g. Oracle), GROUP BY might be optimized to be the same as DISTINCT for most queries. On others (such as SQL Server), DISTINCT can be considerably faster.

User · Answer

More of a functional note:

There are cases when you have to use GROUP BY, for example if you want to get the number of employees per employer:

SELECT u.employer, COUNT(u.id) AS total_employees FROM users u GROUP BY u.employer

In such a scenario, DISTINCT u.employer doesn't work right. Perhaps there is a way, but I just don't know it. (If someone knows how to make such a query with DISTINCT, please add a note.)
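
One way to get the same numbers with DISTINCT is a correlated subquery per row; it is shown here only to answer the open question and is not necessarily efficient. A sketch using Python's sqlite3 module as a stand-in for MySQL, with a trimmed users table and hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, employer TEXT NOT NULL)")
# Hypothetical rows: three employees at Acme, one at Initech.
conn.executemany(
    "INSERT INTO users (employer) VALUES (?)",
    [("Acme",), ("Initech",), ("Acme",), ("Acme",)],
)

# Aggregation per group: the natural fit for GROUP BY.
by_group = dict(
    conn.execute(
        "SELECT u.employer, COUNT(u.id) AS total_employees "
        "FROM users u GROUP BY u.employer ORDER BY u.employer"
    )
)

# The same numbers with DISTINCT: the subquery runs per outer row, and
# DISTINCT then collapses the duplicate (employer, count) pairs.
by_distinct = dict(
    conn.execute(
        "SELECT DISTINCT u.employer, "
        "(SELECT COUNT(*) FROM users u2 WHERE u2.employer = u.employer) "
        "FROM users u"
    )
)
print(by_group)  # {'Acme': 3, 'Initech': 1}
print(by_distinct == by_group)  # True
```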

User · Answer

After heavy testing we came to the conclusion that GROUP BY is faster:

SELECT sql_no_cache opnamegroep_intern
FROM telwerken
WHERE opnemergroep IN (7,8,9,10,11,12,13)
GROUP BY opnamegroep_intern

635 total: 0.0944 seconds
Showing records 0 - 29 (635 total, query took 0.0484 sec)

SELECT sql_no_cache DISTINCT (opnamegroep_intern)
FROM telwerken
WHERE opnemergroep IN (7,8,9,10,11,12,13)

635 total: 0.2117 seconds (almost 100% slower)
Showing records 0 - 29 (635 total, query took 0.3468 sec)

User · Answer

They are essentially equivalent to each other (in fact, this is how some databases implement DISTINCT under the hood).

If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.

When in doubt, test!

User · Answer

This is not a rule. For each query, try DISTINCT and then GROUP BY separately, compare the time each query takes to complete, and use the faster one.

In my project I sometimes use GROUP BY and sometimes DISTINCT.

User · Answer

Go for the simplest and shortest if you can. DISTINCT seems to be more what you are looking for, if only because it will give you EXACTLY the answer you need, and only that.

User · Answer

If you don't have to do any group functions (SUM, AVG, etc., in case you want to add numeric data to the table), use SELECT DISTINCT. I suspect it's faster, but I have nothing to show for it.

In any case, if you're worried about speed, create an index on the column.
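
A sketch of checking that an index on the column is actually picked up, assuming Python's sqlite3 module in place of MySQL (in MySQL you would inspect EXPLAIN output instead). The index name idx_users_profession is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, profession TEXT NOT NULL)")
# Hypothetical index on the grouped/distinct column.
conn.execute("CREATE INDEX idx_users_profession ON users (profession)")


def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# With the index in place, the grouped scan can read the values in order
# from the index instead of building a temporary structure.
group_plan = plan("SELECT profession FROM users GROUP BY profession")
print(group_plan)
```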

User · Answer

All of the answers above are correct for the case of DISTINCT on a single column vs. GROUP BY on a single column. Every DB engine has its own implementation and optimizations, and if you care about the very small difference (in most cases), then you have to test against a specific server AND a specific version, as implementations may change.

BUT, if you select more than one column in the query, then DISTINCT is essentially different, because in this case it will compare ALL columns of all rows, instead of just one column.

So if you have something like:

-- This will NOT return unique by id, but unique by (id, name)
SELECT DISTINCT id, name FROM some_query_with_joins

-- This will select unique by id
SELECT id, name FROM some_query_with_joins GROUP BY id

(In MySQL with the ONLY_FULL_GROUP_BY SQL mode, enabled by default since 5.7.5, the second query is only accepted if name is functionally dependent on id.)

It is a common mistake to think that the DISTINCT keyword distinguishes rows by the first column you specified, but DISTINCT is a general keyword in this manner.

So, people, be careful not to take the answers above as correct for all cases. You might get confused and get the wrong results while all you wanted was to optimize!
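
The difference can be seen on a few rows. This sketch assumes Python's sqlite3 module as a stand-in for MySQL and a hypothetical table t that plays the role of some_query_with_joins; MIN(name) is one arbitrary way to pick a single name per id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
# Hypothetical rows: id 1 appears with two different names (e.g. after a join).
conn.executemany(
    "INSERT INTO t (id, name) VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (2, "c")],
)

# DISTINCT compares the whole selected row: (1, 'a') and (1, 'b') both survive.
distinct_rows = conn.execute(
    "SELECT DISTINCT id, name FROM t ORDER BY id, name"
).fetchall()

# One row per id: choose a name with an aggregate so the query is also
# valid under MySQL's ONLY_FULL_GROUP_BY mode.
grouped_rows = conn.execute(
    "SELECT id, MIN(name) FROM t GROUP BY id ORDER BY id"
).fetchall()

print(distinct_rows)  # [(1, 'a'), (1, 'b'), (2, 'c')]
print(grouped_rows)   # [(1, 'a'), (2, 'c')]
```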

User · Answer

If the problem allows it, try EXISTS, since it's optimized to stop as soon as a result is found (and doesn't buffer any response). So, if you are just trying to normalize data for a WHERE clause like this:

SELECT * FROM SOMETHING S WHERE S.ID IN (SELECT DISTINCT DCR.SOMETHING_ID FROM DIFF_CARDINALITY_RELATIONSHIP DCR) -- to keep same cardinality

A faster response would be:

SELECT * FROM SOMETHING S WHERE EXISTS (SELECT 1 FROM DIFF_CARDINALITY_RELATIONSHIP DCR WHERE DCR.SOMETHING_ID = S.ID)

This isn't always possible, but when it is, you will see a faster response.
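
A minimal sketch showing that the two forms return the same rows, assuming Python's sqlite3 module in place of MySQL and made-up data in tables named after the ones above. It checks results only; whether EXISTS is faster depends on the engine and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE something (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE diff_cardinality_relationship (something_id INTEGER);
    INSERT INTO something (id, label) VALUES (1, 'x'), (2, 'y'), (3, 'z');
    -- Hypothetical links: id 1 is referenced three times, id 3 once, id 2 never.
    INSERT INTO diff_cardinality_relationship VALUES (1), (1), (1), (3);
    """
)

in_rows = conn.execute(
    "SELECT s.* FROM something s WHERE s.id IN "
    "(SELECT DISTINCT dcr.something_id FROM diff_cardinality_relationship dcr) "
    "ORDER BY s.id"
).fetchall()

exists_rows = conn.execute(
    "SELECT s.* FROM something s WHERE EXISTS "
    "(SELECT 1 FROM diff_cardinality_relationship dcr "
    "WHERE dcr.something_id = s.id) "
    "ORDER BY s.id"
).fetchall()

print(exists_rows)  # [(1, 'x'), (3, 'z')]
```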
