[sql] must appear in the GROUP BY clause or be used in an aggregate function

I have a table that looks like this caller 'makerar'

 cname  | wmname |          avg           
--------+-------------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | luffy  | 1.00000000000000000000
 spain  | usopp  |     5.0000000000000000

And I want to select the maximum avg for each cname.

SELECT cname, wmname, MAX(avg)  FROM makerar GROUP BY cname;

but I will get an error,

ERROR:  column "makerar.wmname" must appear in the GROUP BY clause or be used in an   aggregate function 
LINE 1: SELECT cname, wmname, MAX(avg)  FROM makerar GROUP BY cname;

so i do this

SELECT cname, wmname, MAX(avg)  FROM makerar GROUP BY cname, wmname;

however this will not give the intented results, and the incorrect output below is shown.

 cname  | wmname |          max           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | luffy  | 1.00000000000000000000
 spain  | usopp  |     5.0000000000000000

Actual Results should be

 cname  | wmname |          max           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | usopp  |     5.0000000000000000

How can I go about fixing this issue?

Note: This table is a VIEW created from a previous operation.

This question is related to sql group-by aggregate-functions postgresql-9.1

The answer is


The problem with specifying non-grouped and non-aggregate fields in group by selects is that engine has no way of knowing which record's field it should return in this case. Is it first? Is it last? There is usually no record that naturally corresponds to aggregated result (min and max are exceptions).

However, there is a workaround: make the required field aggregated as well. In posgres, this should work:

SELECT cname, (array_agg(wmname ORDER BY avg DESC))[1], MAX(avg)
FROM makerar GROUP BY cname;

Note that this creates an array of all wnames, ordered by avg, and returns the first element (arrays in postgres are 1-based).


I recently run into this problem, when trying to count using case when, and found that changing the order of the which and count statements fixes the problem:

SELECT date(dateday) as pick_day,
COUNT(CASE WHEN (apples = 'TRUE' OR oranges 'TRUE') THEN fruit END)  AS fruit_counter

FROM pickings

GROUP BY 1

Instead of using - in the latter, where I got errors that apples and oranges should appear in aggregate functions

CASE WHEN ((apples = 'TRUE' OR oranges 'TRUE') THEN COUNT(*) END) END AS fruit_counter

For me, it is not about a "common aggregation problem", but just about an incorrect SQL query. The single correct answer for "select the maximum avg for each cname..." is

SELECT cname, MAX(avg) FROM makerar GROUP BY cname;

The result will be:

 cname  |      MAX(avg)
--------+---------------------
 canada | 2.0000000000000000
 spain  | 5.0000000000000000

This result in general answers the question "What is the best result for each group?". We see that the best result for spain is 5 and for canada the best result is 2. It is true, and there is no error. If we need to display wmname also, we have to answer the question: "What is the RULE to choose wmname from resulting set?" Let's change the input data a bit to clarify the mistake:

  cname | wmname |        avg           
--------+--------+-----------------------
 spain  | zoro   |  1.0000000000000000
 spain  | luffy  |  5.0000000000000000
 spain  | usopp  |  5.0000000000000000

Which result do you expect on runnig this query: SELECT cname, wmname, MAX(avg) FROM makerar GROUP BY cname;? Should it be spain+luffy or spain+usopp? Why? It is not determined in the query how to choose "better" wmname if several are suitable, so the result is also not determined. That's why SQL interpreter returns an error - the query is not correct.

In the other word, there is no correct answer to the question "Who is the best in spain group?". Luffy is not better than usopp, because usopp has the same "score".


In Postgres, you can also use the special DISTINCT ON (expression) syntax:

SELECT DISTINCT ON (cname) 
    cname, wmname, avg
FROM 
    makerar 
ORDER BY 
    cname, avg DESC ;

This seems to work as well

SELECT *
FROM makerar m1
WHERE m1.avg = (SELECT MAX(avg)
                FROM makerar m2
                WHERE m1.cname = m2.cname
               )

SELECT t1.cname, t1.wmname, t2.max
FROM makerar t1 JOIN (
    SELECT cname, MAX(avg) max
    FROM makerar
    GROUP BY cname ) t2
ON t1.cname = t2.cname AND t1.avg = t2.max;

Using rank() window function:

SELECT cname, wmname, avg
FROM (
    SELECT cname, wmname, avg, rank() 
    OVER (PARTITION BY cname ORDER BY avg DESC)
    FROM makerar) t
WHERE rank = 1;

Note

Either one will preserve multiple max values per group. If you want only single record per group even if there is more than one record with avg equal to max you should check @ypercube's answer.


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to group-by

SELECT list is not in GROUP BY clause and contains nonaggregated column .... incompatible with sql_mode=only_full_group_by Count unique values using pandas groupby Pandas group-by and sum Count unique values with pandas per groups Group dataframe and get sum AND count? Error related to only_full_group_by when executing a query in MySql Pandas sum by groupby, but exclude certain columns Using DISTINCT along with GROUP BY in SQL Server Python Pandas : group by in group by and average? How do I create a new column from the output of pandas groupby().sum()?

Examples related to aggregate-functions

Spark SQL: apply aggregate functions to a list of columns GROUP BY without aggregate function GROUP BY + CASE statement must appear in the GROUP BY clause or be used in an aggregate function Naming returned columns in Pandas aggregate function? Concatenate multiple result rows of one column into one, group by another column How to include "zero" / "0" results in COUNT aggregate? Apply multiple functions to multiple groupby columns Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause Optimal way to concatenate/aggregate strings

Examples related to postgresql-9.1

Postgresql SQL: How check boolean field with null and True,False Value? must appear in the GROUP BY clause or be used in an aggregate function postgres, ubuntu how to restart service on startup? get stuck on clustering after instance reboot List tables in a PostgreSQL schema PostgreSQL ERROR: canceling statement due to conflict with recovery How do I convert an integer to string as part of a PostgreSQL query? ALTER TABLE, set null in not null column, PostgreSQL 9.1 Strange PostgreSQL "value too long for type character varying(500)" PostgreSQL next value of the sequences? Change type of varchar field to integer: "cannot be cast automatically to type integer"