I often find these three variants:
SELECT COUNT(*) FROM Foo;
SELECT COUNT(1) FROM Foo;
SELECT COUNT(PrimaryKey) FROM Foo;
As far as I can see, they all do the same thing, and I find myself using the three in my codebase. However, I don't like to do the same thing different ways. To which one should I stick? Is any one of them better than the two others?
Two of them always produce the same answer:
COUNT(*)
counts the number of rowsCOUNT(1)
also counts the number of rowsAssuming the pk
is a primary key and that no nulls are allowed in the values, then
COUNT(pk)
also counts the number of rowsHowever, if pk
is not constrained to be not null, then it produces a different answer:
COUNT(possibly_null)
counts the number of rows with non-null values in the column possibly_null
.
COUNT(DISTINCT pk)
also counts the number of rows (because a primary key does not allow duplicates).
COUNT(DISTINCT possibly_null_or_dup)
counts the number of distinct non-null values in the column possibly_null_or_dup
.
COUNT(DISTINCT possibly_duplicated)
counts the number of distinct (necessarily non-null) values in the column possibly_duplicated
when that has the NOT NULL
clause on it.
Normally, I write COUNT(*)
; it is the original recommended notation for SQL. Similarly, with the EXISTS
clause, I normally write WHERE EXISTS(SELECT * FROM ...)
because that was the original recommend notation. There should be no benefit to the alternatives; the optimizer should see through the more obscure notations.
I feel the performance characteristics change from one DBMS to another. It's all on how they choose to implement it. Since I have worked extensively on Oracle, I'll tell from that perspective.
COUNT(*)
- Fetches entire row into result set before passing on to the count function, count function will aggregate 1 if the row is not null
COUNT(1)
- Will not fetch any row, instead count is called with a constant value of 1 for each row in the table when the WHERE
matches.
COUNT(PK)
- The PK in Oracle is indexed. This means Oracle has to read only the index. Normally one row in the index B+ tree is many times smaller than the actual row. So considering the disk IOPS rate, Oracle can fetch many times more rows from Index with a single block transfer as compared to entire row. This leads to higher throughput of the query.
From this you can see the first count is the slowest and the last count is the fastest in Oracle.
Books on line says "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )
"
"1" is a non-null expression so it's the same as COUNT(*)
.
The optimiser recognises it as trivial so gives the same plan. A PK is unique and non-null (in SQL Server at least) so COUNT(PK)
= COUNT(*)
This is a similar myth to EXISTS (SELECT * ...
or EXISTS (SELECT 1 ...
And see the ANSI 92 spec, section 6.5, General Rules, case 1
a) If COUNT(*) is specified, then the result is the cardinality
of T.
b) Otherwise, let TX be the single-column table that is the
result of applying the <value expression> to each row of T
and eliminating null values. If one or more null values are
eliminated, then a completion condition is raised: warning-
null value eliminated in set function.
At least on Oracle they are all the same: http://www.oracledba.co.uk/tips/count_speed.htm
Source: Stackoverflow.com