[sql] Using group by on multiple columns

I understand the point of GROUP BY x.

But how does GROUP BY x, y work, and what does it mean?

This question is related to sql group-by multiple-columns

The answer is


In simple English from GROUP BY with two parameters what we are doing is looking for similar value pairs and get the count to a 3rd column.

Look at the following example for reference. Here I'm using International football results from 1872 to 2020

+----------+----------------+--------+---+---+--------+---------+-------------------+-----+
|       _c0|             _c1|     _c2|_c3|_c4|     _c5|      _c6|                _c7|  _c8|
+----------+----------------+--------+---+---+--------+---------+-------------------+-----+
|1872-11-30|        Scotland| England|  0|  0|Friendly|  Glasgow|           Scotland|FALSE|
|1873-03-08|         England|Scotland|  4|  2|Friendly|   London|            England|FALSE|
|1874-03-07|        Scotland| England|  2|  1|Friendly|  Glasgow|           Scotland|FALSE|
|1875-03-06|         England|Scotland|  2|  2|Friendly|   London|            England|FALSE|
|1876-03-04|        Scotland| England|  3|  0|Friendly|  Glasgow|           Scotland|FALSE|
|1876-03-25|        Scotland|   Wales|  4|  0|Friendly|  Glasgow|           Scotland|FALSE|
|1877-03-03|         England|Scotland|  1|  3|Friendly|   London|            England|FALSE|
|1877-03-05|           Wales|Scotland|  0|  2|Friendly|  Wrexham|              Wales|FALSE|
|1878-03-02|        Scotland| England|  7|  2|Friendly|  Glasgow|           Scotland|FALSE|
|1878-03-23|        Scotland|   Wales|  9|  0|Friendly|  Glasgow|           Scotland|FALSE|
|1879-01-18|         England|   Wales|  2|  1|Friendly|   London|            England|FALSE|
|1879-04-05|         England|Scotland|  5|  4|Friendly|   London|            England|FALSE|
|1879-04-07|           Wales|Scotland|  0|  3|Friendly|  Wrexham|              Wales|FALSE|
|1880-03-13|        Scotland| England|  5|  4|Friendly|  Glasgow|           Scotland|FALSE|
|1880-03-15|           Wales| England|  2|  3|Friendly|  Wrexham|              Wales|FALSE|
|1880-03-27|        Scotland|   Wales|  5|  1|Friendly|  Glasgow|           Scotland|FALSE|
|1881-02-26|         England|   Wales|  0|  1|Friendly|Blackburn|            England|FALSE|
|1881-03-12|         England|Scotland|  1|  6|Friendly|   London|            England|FALSE|
|1881-03-14|           Wales|Scotland|  1|  5|Friendly|  Wrexham|              Wales|FALSE|
|1882-02-18|Northern Ireland| England|  0| 13|Friendly|  Belfast|Republic of Ireland|FALSE|
+----------+----------------+--------+---+---+--------+---------+-------------------+-----+

And now I'm going to group by similar country(column _c7) and tournament(_c5) value pairs by GROUP BY operation,

SELECT `_c5`,`_c7`,count(*)  FROM res GROUP BY `_c5`,`_c7`

+--------------------+-------------------+--------+
|                 _c5|                _c7|count(1)|
+--------------------+-------------------+--------+
|            Friendly|  Southern Rhodesia|      11|
|            Friendly|            Ecuador|      68|
|African Cup of Na...|           Ethiopia|      41|
|Gold Cup qualific...|Trinidad and Tobago|       9|
|AFC Asian Cup qua...|             Bhutan|       7|
|African Nations C...|              Gabon|       2|
|            Friendly|           China PR|     170|
|FIFA World Cup qu...|             Israel|      59|
|FIFA World Cup qu...|              Japan|      61|
|UEFA Euro qualifi...|            Romania|      62|
|AFC Asian Cup qua...|              Macau|       9|
|            Friendly|        South Sudan|       1|
|CONCACAF Nations ...|           Suriname|       3|
|         Copa Newton|          Argentina|      12|
|            Friendly|        Philippines|      38|
|FIFA World Cup qu...|              Chile|      68|
|African Cup of Na...|         Madagascar|      29|
|FIFA World Cup qu...|       Burkina Faso|      30|
| UEFA Nations League|            Denmark|       4|
|        Atlantic Cup|           Paraguay|       2|
+--------------------+-------------------+--------+

Explanation: The meaning of the first row is there were 11 Friendly tournaments held on Southern Rhodesia in total.

Note: Here it's mandatory to use a counter column in this case.


Here I am going to explain not only the GROUP clause use, but also the Aggregate functions use.

The GROUP BY clause is used in conjunction with the aggregate functions to group the result-set by one or more columns. e.g.:

-- GROUP BY with one parameter:
SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name;

-- GROUP BY with two parameters:
SELECT
    column_name1,
    column_name2,
    AGGREGATE_FUNCTION(column_name3)
FROM
    table_name
GROUP BY
    column_name1,
    column_name2;

Remember this order:

  1. SELECT (is used to select data from a database)

  2. FROM (clause is used to list the tables)

  3. WHERE (clause is used to filter records)

  4. GROUP BY (clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns)

  5. HAVING (clause is used in combination with the GROUP BY clause to restrict the groups of returned rows to only those whose the condition is TRUE)

  6. ORDER BY (keyword is used to sort the result-set)

You can use all of these if you are using aggregate functions, and this is the order that they must be set, otherwise you can get an error.

Aggregate Functions are:

MIN() returns the smallest value in a given column

MAX() returns the maximum value in a given column.

SUM() returns the sum of the numeric values in a given column

AVG() returns the average value of a given column

COUNT() returns the total number of values in a given column

COUNT(*) returns the number of rows in a table

SQL script examples about using aggregate functions:

Let's say we need to find the sale orders whose total sale is greater than $950. We combine the HAVING clause and the GROUP BY clause to accomplish this:

SELECT 
    orderId, SUM(unitPrice * qty) Total
FROM
    OrderDetails
GROUP BY orderId
HAVING Total > 950;

Counting all orders and grouping them customerID and sorting the result ascendant. We combine the COUNT function and the GROUP BY, ORDER BY clauses and ASC:

SELECT 
    customerId, COUNT(*)
FROM
    Orders
GROUP BY customerId
ORDER BY COUNT(*) ASC;

Retrieve the category that has an average Unit Price greater than $10, using AVG function combine with GROUP BY and HAVING clauses:

SELECT 
    categoryName, AVG(unitPrice)
FROM
    Products p
INNER JOIN
    Categories c ON c.categoryId = p.categoryId
GROUP BY categoryName
HAVING AVG(unitPrice) > 10;

Getting the less expensive product by each category, using the MIN function in a subquery:

SELECT categoryId,
       productId,
       productName,
       unitPrice
FROM Products p1
WHERE unitPrice = (
                SELECT MIN(unitPrice)
                FROM Products p2
                WHERE p2.categoryId = p1.categoryId)

The following statement groups rows with the same values in both categoryId and productId columns:

SELECT 
    categoryId, categoryName, productId, SUM(unitPrice)
FROM
    Products p
INNER JOIN
    Categories c ON c.categoryId = p.categoryId
GROUP BY categoryId, productId

Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to group-by

SELECT list is not in GROUP BY clause and contains nonaggregated column .... incompatible with sql_mode=only_full_group_by Count unique values using pandas groupby Pandas group-by and sum Count unique values with pandas per groups Group dataframe and get sum AND count? Error related to only_full_group_by when executing a query in MySql Pandas sum by groupby, but exclude certain columns Using DISTINCT along with GROUP BY in SQL Server Python Pandas : group by in group by and average? How do I create a new column from the output of pandas groupby().sum()?

Examples related to multiple-columns

How to make div same height as parent (displayed as table-cell) Scrolling a flexbox with overflowing content How do I change Bootstrap 3 column order on mobile layout? Bootstrap 3 Multi-column within a single ul not floating properly Multiple input box excel VBA Unique Key constraints for multiple columns in Entity Framework Combine two or more columns in a dataframe into a new column with a new name Apply pandas function to column to create multiple new columns? mysqli_fetch_array while loop columns MySQL ORDER BY multiple column ASC and DESC