Is there any difference between GROUP BY and DISTINCT?

Question

I learned something simple about SQL the other day:

SELECT c FROM myTbl GROUP BY C

has the same result as:

SELECT DISTINCT C FROM myTbl

What I am curious about is whether there is anything different in the way an SQL engine processes the command, or are they truly the same thing?

I personally prefer the DISTINCT syntax, but I am sure it's more out of habit than anything else.

EDIT: This is not a question about aggregates. The use of GROUP BY with aggregate functions is understood.

User · Answer

GROUP BY has a very specific meaning that is distinct (heh) from the DISTINCT keyword.

GROUP BY causes the query results to be grouped by the chosen expression. Aggregate functions can then be applied, and these will act on each group rather than on the entire result set.

Here's an example that might help:

Given a table that looks like this:

name
------
barry
dave
bill
dave
dave
barry
john

This query:

SELECT name, count(*) AS count FROM table GROUP BY name;

Will produce output like this:

name    count
-------------
barry   2
dave    3
bill    1
john    1

Which is obviously very different from using DISTINCT. If you want to group your results, use GROUP BY; if you just want a unique list of values from a specific column, use DISTINCT. This will give your database a chance to optimise the query for your needs.
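To make the contrast concrete, here is a minimal sketch that runs both kinds of query with Python's built-in sqlite3 module against an in-memory database. The table name people is an assumption (the placeholder name table used above is a reserved word), and the alias cnt replaces count for the same reason; the rows are the names listed above.

```python
import sqlite3

# In-memory database holding the seven names from the example table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
names = ["barry", "dave", "bill", "dave", "dave", "barry", "john"]
conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])

# GROUP BY with an aggregate: one row per name plus a per-group count.
counts = conn.execute(
    "SELECT name, COUNT(*) AS cnt FROM people GROUP BY name ORDER BY name"
).fetchall()

# DISTINCT: one row per name and nothing else.
unique_names = conn.execute(
    "SELECT DISTINCT name FROM people ORDER BY name"
).fetchall()

print(counts)
print(unique_names)
```

The ORDER BY is only there to make the output deterministic; neither GROUP BY nor DISTINCT promises any particular row order on its own.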

User · Answer

You're only noticing that because you are selecting a single column. Try selecting two fields and see what happens.

GROUP BY is intended to be used like this:

SELECT name, SUM(transaction) FROM myTbl GROUP BY name

Which would show the sum of all transactions for each person.

User · Answer

In that particular query there is no difference. But, of course, if you add any aggregate columns then you'll have to use GROUP BY.

User · Answer

In Hive (HQL), GROUP BY can be way faster than DISTINCT, because the former does not require comparing all fields in the table. See: https://sqlperformance.com/2017/01/t-sql-queries/surprises-assumptions-group-by-distinct

User · Answer

Use DISTINCT if you just want to remove duplicates. Use GROUP BY if you want to apply aggregate operators (MAX, SUM, GROUP_CONCAT, ...) or a HAVING clause.

User · Answer

For the query you posted, they are identical. But for other queries that may not be true.

For example, it's not the same as:

SELECT C FROM myTbl GROUP BY C, D
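A small sketch of that last point, using Python's sqlite3 module. The column types and the three sample rows are made up for illustration; only the table and column names come from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTbl (C TEXT, D INTEGER)")
# Hypothetical data: 'x' appears with two different D values.
conn.executemany(
    "INSERT INTO myTbl (C, D) VALUES (?, ?)",
    [("x", 1), ("x", 2), ("y", 1)],
)

# One row per distinct value of C.
distinct_c = conn.execute("SELECT DISTINCT C FROM myTbl ORDER BY C").fetchall()

# One row per distinct (C, D) pair, even though only C is projected.
grouped_cd = conn.execute(
    "SELECT C FROM myTbl GROUP BY C, D ORDER BY C"
).fetchall()

print(distinct_c)
print(grouped_cd)
```

Because the grouping key includes D, the value 'x' comes back twice from the GROUP BY query but only once from the DISTINCT query.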

User · Answer

I expect there is the possibility for subtle differences in their execution. I checked the execution plans for two functionally equivalent queries along these lines in Oracle 10g:

core> select sta from zip group by sta;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    58 |   174 |    44  (19)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    58 |   174 |    44  (19)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| ZIP  | 42303 |   123K|    38   (6)| 00:00:01 |
---------------------------------------------------------------------------

core> select distinct sta from zip;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    58 |   174 |    44  (19)| 00:00:01 |
|   1 |  HASH UNIQUE       |      |    58 |   174 |    44  (19)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| ZIP  | 42303 |   123K|    38   (6)| 00:00:01 |
---------------------------------------------------------------------------

The middle operation is slightly different ("HASH GROUP BY" vs. "HASH UNIQUE"), but the estimated costs etc. are identical. I then executed these with tracing on, and the actual operation counts were the same for both (except that the second one didn't have to do any physical reads due to caching).

But I think that because the operation names are different, the execution would follow somewhat different code paths, and that opens the possibility of more significant differences.

I think you should prefer the DISTINCT syntax for this purpose. It's not just habit; it more clearly indicates the purpose of the query.

User · Answer

Functional efficiency is totally different. If you only want to select the returned values without duplicates, using DISTINCT is better than GROUP BY, because GROUP BY includes (sorting + removing) while DISTINCT includes only (removing).

User · Answer

From a "SQL the language" perspective the two constructs are equivalent, and which one you choose is one of those "lifestyle" choices we all have to make. I think there is a good case for DISTINCT being more explicit (and therefore more considerate to the person who will inherit your code etc.), but that doesn't mean the GROUP BY construct is an invalid choice.

I think this "GROUP BY is for aggregates" is the wrong emphasis. Folk should be aware that the set function (MAX, MIN, COUNT, etc.) can be omitted so that they can understand the coder's intent when it is.

The ideal optimizer will recognize equivalent SQL constructs and will always pick the ideal plan accordingly. For your real-life SQL engine of choice, you must test :)

PS: note that the position of the DISTINCT keyword in the select clause may produce different results, e.g. contrast:

SELECT COUNT(DISTINCT C) FROM myTbl;

SELECT DISTINCT COUNT(C) FROM myTbl;
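The PS can be checked with a quick sketch using Python's sqlite3 module. The three sample values are invented for illustration; other engines should agree on these two results, but that is worth testing rather than assuming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTbl (C TEXT)")
# Hypothetical data: three rows, two distinct values.
conn.executemany("INSERT INTO myTbl (C) VALUES (?)", [("a",), ("a",), ("b",)])

# DISTINCT inside the aggregate: counts the distinct values of C.
count_of_distinct = conn.execute(
    "SELECT COUNT(DISTINCT C) FROM myTbl"
).fetchall()

# DISTINCT on the select list: counts every non-NULL C, then
# de-duplicates the (single) result row, which changes nothing.
distinct_of_count = conn.execute(
    "SELECT DISTINCT COUNT(C) FROM myTbl"
).fetchall()

print(count_of_distinct)  # [(2,)]
print(distinct_of_count)  # [(3,)]
```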

User · Answer

The way I always understood it is that using DISTINCT is the same as grouping by every field you selected, in the order you selected them, i.e.:

select distinct a, b, c from table;

is the same as:

select a, b, c from table group by a, b, c
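A minimal sketch of that equivalence with Python's sqlite3 module. The table is named t here because table is a reserved word, and the rows are invented; an explicit ORDER BY is added to both queries since neither form guarantees an output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
# Hypothetical rows containing two pairs of exact duplicates.
rows = [(1, 1, 1), (1, 1, 1), (1, 2, 1), (2, 1, 1), (1, 2, 1)]
conn.executemany("INSERT INTO t (a, b, c) VALUES (?, ?, ?)", rows)

via_distinct = conn.execute(
    "SELECT DISTINCT a, b, c FROM t ORDER BY a, b, c"
).fetchall()
via_group_by = conn.execute(
    "SELECT a, b, c FROM t GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()

print(via_distinct)
print(via_group_by)
```

Both queries return the same three rows for this data; whether the engine uses the same plan for both is a separate question that depends on the optimizer.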

User · Answer

If you are using GROUP BY without any aggregate function, then internally it will be treated as DISTINCT, so in this case there is no difference between GROUP BY and DISTINCT.

But when a DISTINCT clause is available, it is better to use it for finding your unique records, because the objective of GROUP BY is to achieve aggregation.

User · Answer

I read all the above comments but didn't see anyone point to the main difference between GROUP BY and DISTINCT apart from the aggregation bit.

DISTINCT returns all the rows and then de-duplicates them, whereas GROUP BY de-duplicates the rows as they're read by the algorithm one by one.

This means they can produce different results. For example, the queries below generate different results:

SELECT DISTINCT ROW_NUMBER() OVER (ORDER BY Name), Name FROM NamesTable

SELECT ROW_NUMBER() OVER (ORDER BY Name), Name FROM NamesTable GROUP BY Name

If there are 10 names in the table, one of which is a duplicate of another, then the first query returns 10 rows whereas the second query returns 9 rows.

The reason is what I said above, so they can behave differently.
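The row counts claimed above can be reproduced with a short sketch in Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or later (needed for window functions), and the ten names are invented so that exactly one is a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NamesTable (Name TEXT)")
# Hypothetical data: ten rows, where 'ann' appears twice.
names = ["ann", "bob", "cat", "dan", "eve", "fay", "gus", "hal", "ian", "ann"]
conn.executemany("INSERT INTO NamesTable (Name) VALUES (?)", [(n,) for n in names])

# The row number is computed for all ten rows first, so every
# (row number, name) pair is unique and DISTINCT removes nothing.
with_distinct = conn.execute(
    "SELECT DISTINCT ROW_NUMBER() OVER (ORDER BY Name), Name FROM NamesTable"
).fetchall()

# Grouping collapses the two 'ann' rows before the window function runs.
with_group_by = conn.execute(
    "SELECT ROW_NUMBER() OVER (ORDER BY Name), Name "
    "FROM NamesTable GROUP BY Name"
).fetchall()

print(len(with_distinct), len(with_group_by))
```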

User · Answer

Sometimes they may give you the same results, but they are meant to be used in different senses. The main difference is in syntax.

Look closely at the example below. DISTINCT is used to filter out duplicate sets of values. (6, cs, 9.1) and (1, cs, 5.5) are two different sets, so DISTINCT displays both rows, while GROUP BY Branch displays only one of them.

SELECT * FROM student;
+------+--------+------+
| Id   | Branch | CGPA |
+------+--------+------+
|    3 | civil  |  7.2 |
|    2 | mech   |  6.3 |
|    6 | cs     |  9.1 |
|    4 | eee    |  8.2 |
|    1 | cs     |  5.5 |
+------+--------+------+
5 rows in set (0.001 sec)

SELECT DISTINCT * FROM student;
+------+--------+------+
| Id   | Branch | CGPA |
+------+--------+------+
|    3 | civil  |  7.2 |
|    2 | mech   |  6.3 |
|    6 | cs     |  9.1 |
|    4 | eee    |  8.2 |
|    1 | cs     |  5.5 |
+------+--------+------+
5 rows in set (0.001 sec)

SELECT * FROM student GROUP BY Branch;
+------+--------+------+
| Id   | Branch | CGPA |
+------+--------+------+
|    3 | civil  |  7.2 |
|    6 | cs     |  9.1 |
|    4 | eee    |  8.2 |
|    2 | mech   |  6.3 |
+------+--------+------+
4 rows in set (0.001 sec)

Sometimes the results that can be achieved with a GROUP BY clause cannot be achieved with DISTINCT without some extra clause or condition, as in the case above.

To get the same result as DISTINCT, you have to pass all the column names in the GROUP BY clause, like below. So note the syntactical difference: you must know all the column names to use the GROUP BY clause in that case.

SELECT * FROM student GROUP BY Id, Branch, CGPA;
+------+--------+------+
| Id   | Branch | CGPA |
+------+--------+------+
|    1 | cs     |  5.5 |
|    2 | mech   |  6.3 |
|    3 | civil  |  7.2 |
|    4 | eee    |  8.2 |
|    6 | cs     |  9.1 |
+------+--------+------+

Also, I have noticed that GROUP BY displays the results in ascending order by default, which DISTINCT does not. But I am not sure about this. It may differ vendor-wise.

Source: https://dbjpanda.me/dbms/languages/sql/sql-syntax-with-examples#group-by

User · Answer

Please don't use GROUP BY when you mean DISTINCT, even if they happen to work the same. I'm assuming you're trying to shave off milliseconds from queries, and I have to point out that developer time is orders of magnitude more expensive than computer time.

User · Answer

In terms of usage, GROUP BY is used for grouping the rows you want to calculate over. DISTINCT will not do any calculation; it simply shows no duplicate rows.

I always use DISTINCT if I want to present data without duplicates.

If I want to do calculations, like summing up the total quantity of mangoes, I will use GROUP BY.

User · Answer

They have different semantics, even if they happen to have equivalent results on your particular data.

User · Answer

MusiGenesis' response is functionally the correct one with regard to your question as stated; SQL Server is smart enough to realize that if you are using "Group By" and not using any aggregate functions, then what you actually mean is "Distinct", and therefore it generates an execution plan as if you'd simply used "Distinct".

However, I think it's important to note Hank's response as well: cavalier treatment of "Group By" and "Distinct" could lead to some pernicious gotchas down the line if you're not careful. It's not entirely correct to say that this is "not a question about aggregates", because you're asking about the functional difference between two SQL query keywords, one of which is meant to be used with aggregates and one of which is not.

A hammer can work to drive in a screw sometimes, but if you've got a screwdriver handy, why bother?

(For the purposes of this analogy, Hammer : Screwdriver :: GroupBy : Distinct and screw => get list of unique values in a table column.)

User · Answer

GROUP BY lets you use aggregate functions, like AVG, MAX, MIN, SUM, and COUNT. On the other hand, DISTINCT just removes duplicates.

For example, if you have a bunch of purchase records, and you want to know how much was spent by each department, you might do something like:

SELECT department, SUM(amount) FROM purchases GROUP BY department

This will give you one row per department, containing the department name and the sum of all of the amount values in all rows for that department.
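As a runnable sketch of the purchases example, here it is in Python's sqlite3 module. The column types and the three purchase rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (department TEXT, amount INTEGER)")
# Hypothetical purchase records.
conn.executemany(
    "INSERT INTO purchases (department, amount) VALUES (?, ?)",
    [("toys", 10), ("toys", 5), ("books", 7)],
)

# One row per department with the total spent in that department.
totals = conn.execute(
    "SELECT department, SUM(amount) FROM purchases "
    "GROUP BY department ORDER BY department"
).fetchall()

# DISTINCT can list the departments but cannot total anything.
departments = conn.execute(
    "SELECT DISTINCT department FROM purchases ORDER BY department"
).fetchall()

print(totals)
print(departments)
```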

User · Answer

GROUP BY is used in aggregate operations -- like when you want to get a count of Bs broken down by column C:

select C, count(B) from myTbl group by C

DISTINCT is what it sounds like -- you get unique rows.

In SQL Server 2005, it looks like the query optimizer is able to optimize away the difference in the simplistic examples I ran. Dunno if you can count on that in all situations, though.

User · Answer

Generally we can use DISTINCT to eliminate duplicates on a specific column in the table.

In the case of GROUP BY, we can apply aggregate functions like AVG, MAX, MIN, SUM, and COUNT on a specific column, and fetch the column name together with the aggregate result for that column.

Example:

select specialColumn, sum(specialColumn) from yourTableName group by specialColumn

User · Answer

From a Teradata perspective:

From a result set point of view, it does not matter if you use DISTINCT or GROUP BY in Teradata. The answer set will be the same.

From a performance point of view, it is not the same.

To understand what impacts performance, you need to know what happens on Teradata when executing a statement with DISTINCT or GROUP BY.

In the case of DISTINCT, the rows are redistributed immediately without any preaggregation taking place, while in the case of GROUP BY, a preaggregation is done in a first step and only then are the unique values redistributed across the AMPs.

Don't think now that GROUP BY is always better from a performance point of view. When you have many different values, the preaggregation step of GROUP BY is not very efficient. Teradata has to sort the data to remove duplicates. In this case, it may be better to do the redistribution first, i.e. use the DISTINCT statement. Only if there are many duplicate values is the GROUP BY statement probably the better choice, as the deduplication step then takes place only once, after redistribution.

In short, DISTINCT vs. GROUP BY in Teradata means:

GROUP BY -> for many duplicates
DISTINCT -> no or a few duplicates only

At times, when using DISTINCT, you run out of spool space on an AMP. The reason is that redistribution takes place immediately, and skewing could cause AMPs to run out of space.

If this happens, you probably have a better chance with GROUP BY, as duplicates are already removed in a first step, and less data is moved across the AMPs.

User · Answer

What's the difference from a mere duplicate removal functionality point of view?

Apart from the fact that, unlike DISTINCT, GROUP BY allows for aggregating data per group (which has been mentioned by many other answers), the most important difference in my opinion is the fact that the two operations "happen" at two very different steps in the logical order of operations that are executed in a SELECT statement.

Here are the most important operations:

- FROM (including JOIN, APPLY, etc.)
- WHERE
- GROUP BY (can remove duplicates)
- Aggregations
- HAVING
- Window functions
- SELECT
- DISTINCT (can remove duplicates)
- UNION, INTERSECT, EXCEPT (can remove duplicates)
- ORDER BY
- OFFSET
- LIMIT

As you can see, the logical order of each operation influences what can be done with it and how it influences subsequent operations. In particular, the fact that the GROUP BY operation "happens before" the SELECT operation (the projection) means that:

1. It doesn't depend on the projection (which can be an advantage)
2. It cannot use any values from the projection (which can be a disadvantage)

1. It doesn't depend on the projection

An example where not depending on the projection is useful is if you want to calculate window functions on distinct values:

SELECT rating, row_number() OVER (ORDER BY rating) AS rn
FROM film
GROUP BY rating

When run against the Sakila database, this yields:

rating   rn
-----------
G        1
NC-17    2
PG       3
PG-13    4
R        5

The same couldn't be achieved with DISTINCT easily:

SELECT DISTINCT rating, row_number() OVER (ORDER BY rating) AS rn
FROM film

That query is "wrong" and yields something like:

rating   rn
------------
G        1
G        2
G        3
...
G        178
NC-17    179
NC-17    180
...

This is not what we wanted. The DISTINCT operation "happens after" the projection, so we can no longer remove DISTINCT ratings because the window function was already calculated and projected. In order to use DISTINCT, we'd have to nest that part of the query:

SELECT rating, row_number() OVER (ORDER BY rating) AS rn
FROM (
  SELECT DISTINCT rating FROM film
) f

Side-note: In this particular case, we could also use DENSE_RANK():

SELECT DISTINCT rating, dense_rank() OVER (ORDER BY rating) AS rn
FROM film

2. It cannot use any values from the projection

One of SQL's drawbacks is its verbosity at times. For the same reason as what we've seen before (namely the logical order of operations), we cannot "easily" group by something we're projecting.

This is invalid SQL:

SELECT first_name || ' ' || last_name AS name
FROM customer
GROUP BY name

This is valid (repeating the expression):

SELECT first_name || ' ' || last_name AS name
FROM customer
GROUP BY first_name || ' ' || last_name

This is valid, too (nesting the expression):

SELECT name
FROM (
  SELECT first_name || ' ' || last_name AS name
  FROM customer
) c
GROUP BY name

I've written about this topic more in depth in a blog post.
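The three working variants from point 1 can be compared side by side with a short sketch in Python's sqlite3 module. This is not the Sakila database: the film table here is a tiny stand-in with six invented rows, and the sketch assumes SQLite 3.25 or later for window functions. A trailing ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (rating TEXT)")
# Hypothetical stand-in data: six films, three distinct ratings.
ratings = ["G", "G", "PG", "R", "PG", "G"]
conn.executemany("INSERT INTO film (rating) VALUES (?)", [(r,) for r in ratings])

# GROUP BY removes duplicates before the window function is evaluated.
via_group_by = conn.execute(
    "SELECT rating, ROW_NUMBER() OVER (ORDER BY rating) AS rn "
    "FROM film GROUP BY rating ORDER BY rating"
).fetchall()

# DISTINCT pushed into a derived table so it runs before the window function.
via_nested_distinct = conn.execute(
    "SELECT rating, ROW_NUMBER() OVER (ORDER BY rating) AS rn "
    "FROM (SELECT DISTINCT rating FROM film) f ORDER BY rating"
).fetchall()

# DENSE_RANK gives equal ratings the same number, so DISTINCT can collapse them.
via_dense_rank = conn.execute(
    "SELECT DISTINCT rating, DENSE_RANK() OVER (ORDER BY rating) AS rn "
    "FROM film ORDER BY rating"
).fetchall()

# The "wrong" query: every row gets its own number, so nothing is removed.
naive_distinct = conn.execute(
    "SELECT DISTINCT rating, ROW_NUMBER() OVER (ORDER BY rating) AS rn FROM film"
).fetchall()

print(via_group_by)
print(len(naive_distinct))
```

SQLite is more permissive than some engines about grouping by a select-list alias, so it is not a good place to reproduce the "invalid SQL" case from point 2.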

User · Answer

There is no significant difference between the GROUP BY and DISTINCT clauses except the usage of aggregate functions. Both can be used to get distinct values, but from a performance point of view GROUP BY is better. When the DISTINCT keyword is used, internally it uses a sort operation, which can be viewed in the execution plan.

Try a simple example:

Declare @tmpresult table ( Id tinyint )

Insert into @tmpresult Select 5 Union all Select 2 Union all Select 3 Union all Select 4

Select distinct Id From @tmpresult

User · Answer

If you use DISTINCT with multiple columns, the result set won't be grouped as it will with GROUP BY, and you can't use aggregate functions with DISTINCT.

User · Answer

I know it's an old post, but it happens that I had a query that used GROUP BY just to return distinct values. When using that query in Toad and Oracle Reports, everything worked fine, I mean with a good response time. When we migrated from Oracle 9i to 11g, the response time in Toad was excellent, but in the report it took about 35 minutes to finish, whereas with the previous version it took about 5 minutes.

The solution was to replace the GROUP BY with DISTINCT, and now the report runs in about 30 seconds.

I hope this is useful for someone in the same situation.

User · Answer

There is no difference (in SQL Server, at least). Both queries use the same execution plan.

http://sqlmag.com/database-performance-tuning/distinct-vs-group

Maybe there is a difference, if there are sub-queries involved:

http://blog.sqlauthority.com/2007/03/29/sql-server-difference-between-distinct-and-group-by-distinct-vs-group-by/

There is no difference (Oracle-style):

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:32961403234212
