[sql] How to Select Every Row Where Column Value is NOT Distinct

I need to run a select statement that returns all rows where the value of a column is not distinct (e.g. EmailAddress).

For example, if the table looks like below:

CustomerName     EmailAddress
Aaron            [email protected]
Christy          [email protected]
Jason            [email protected]
Eric             [email protected]
John             [email protected]

I need the query to return:

Aaron            [email protected]
Christy          [email protected]
John             [email protected]

I have read many posts and tried different queries to no avail. The query that I believe should work is below. Can someone suggest an alternative or tell me what may be wrong with my query?

select EmailAddress, CustomerName from Customers
group by EmailAddress, CustomerName
having COUNT(distinct(EmailAddress)) > 1

This question is related to sql sql-server sql-server-2008

The answer is


This is significantly faster than the EXISTS way:

SELECT [EmailAddress], [CustomerName] FROM [Customers] WHERE [EmailAddress] IN
  (SELECT [EmailAddress] FROM [Customers] GROUP BY [EmailAddress] HAVING COUNT(*) > 1)
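A minimal sketch for trying this query locally. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server and uses hypothetical email addresses, because the real ones in the question were redacted. In this data, Aaron, Christy and John share one address. The sketch only checks which rows come back and says nothing about speed.

```python
import sqlite3

# Hypothetical sample data (real addresses were redacted in the question):
# Aaron, Christy and John share one address; Jason and Eric each have their own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Aaron", "aaron@example.com"),
        ("Christy", "aaron@example.com"),
        ("Jason", "jason@example.com"),
        ("Eric", "eric@example.com"),
        ("John", "aaron@example.com"),
    ],
)

# The IN / GROUP BY query from the answer; SQLite also accepts [bracketed] names.
rows = conn.execute(
    """
    SELECT [EmailAddress], [CustomerName] FROM [Customers] WHERE [EmailAddress] IN
      (SELECT [EmailAddress] FROM [Customers] GROUP BY [EmailAddress] HAVING COUNT(*) > 1)
    """
).fetchall()

# The query has no ORDER BY, so sort before looking at the result.
print(sorted(rows))
# [('aaron@example.com', 'Aaron'), ('aaron@example.com', 'Christy'), ('aaron@example.com', 'John')]
```

The inner query finds the addresses that occur more than once. The outer query then returns every row that uses one of those addresses.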

How about

SELECT EmailAddress, CustomerName FROM Customers a
WHERE Exists ( SELECT emailAddress FROM customers c WHERE a.customerName != c.customerName AND a.EmailAddress = c.EmailAddress)
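The same check for the EXISTS version, as a sketch under the same assumptions: SQLite stands in for SQL Server, and the addresses are hypothetical placeholders. A row is kept when at least one row with a different CustomerName has the same address.

```python
import sqlite3

# Hypothetical sample data: Aaron, Christy and John share one address.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Aaron", "aaron@example.com"),
        ("Christy", "aaron@example.com"),
        ("Jason", "jason@example.com"),
        ("Eric", "eric@example.com"),
        ("John", "aaron@example.com"),
    ],
)

# Correlated EXISTS: for each row "a", look for another row "c" with the
# same address but a different name.
rows = conn.execute(
    """
    SELECT EmailAddress, CustomerName FROM Customers a
    WHERE EXISTS (SELECT EmailAddress FROM Customers c
                  WHERE a.CustomerName != c.CustomerName
                    AND a.EmailAddress = c.EmailAddress)
    """
).fetchall()

print(sorted(rows))
```

The column listed inside EXISTS does not matter, because EXISTS only tests whether the subquery returns any row.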

select EmailAddress, count(1) from Customers group by EmailAddress having count(1) > 1
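A small sketch that runs this kind of grouping both by CustomerName and by EmailAddress. It uses hypothetical sample data in SQLite, which stands in for SQL Server here. Grouping by name finds repeated names. Grouping by address finds shared addresses. Either way, the query returns one row per duplicated value together with its count, not the individual customer rows.

```python
import sqlite3

# Hypothetical sample data: Aaron, Christy and John share one address.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Aaron", "aaron@example.com"),
        ("Christy", "aaron@example.com"),
        ("Jason", "jason@example.com"),
        ("Eric", "eric@example.com"),
        ("John", "aaron@example.com"),
    ],
)

# Grouping by CustomerName looks for repeated names; none repeat here.
by_name = conn.execute(
    "SELECT CustomerName, COUNT(1) FROM Customers GROUP BY CustomerName HAVING COUNT(1) > 1"
).fetchall()

# Grouping by EmailAddress lists each shared address and how many rows use it.
by_email = conn.execute(
    "SELECT EmailAddress, COUNT(1) FROM Customers GROUP BY EmailAddress HAVING COUNT(1) > 1"
).fetchall()

print(by_name)   # []
print(by_email)  # [('aaron@example.com', 3)]
```

To get the matching customer rows themselves, this grouping can be used as the inner query of an IN subquery or a join.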

What is wrong with your query is that you are grouping by both email and name. That forms a separate group for each unique combination of email and name, and hence

aaron and [email protected]
christy and [email protected]
john and [email protected]

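are treated as 3 different groups rather than as 1 single group. Within each of those groups there is only one distinct EmailAddress, so COUNT(distinct(EmailAddress)) is always 1 and the HAVING clause filters out every row.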
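To see this concretely, here is a minimal sketch. It runs the question's query against hypothetical sample data in SQLite, which is used only as a self-contained stand-in for SQL Server. It also shows the per-group counts the HAVING clause is looking at.

```python
import sqlite3

# Hypothetical sample data: Aaron, Christy and John share one address.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Aaron", "aaron@example.com"),
        ("Christy", "aaron@example.com"),
        ("Jason", "jason@example.com"),
        ("Eric", "eric@example.com"),
        ("John", "aaron@example.com"),
    ],
)

# The query from the question.
original = conn.execute(
    """
    SELECT EmailAddress, CustomerName FROM Customers
    GROUP BY EmailAddress, CustomerName
    HAVING COUNT(DISTINCT EmailAddress) > 1
    """
).fetchall()

# The same grouping without HAVING: one group per (email, name) pair,
# and each group contains exactly one distinct email.
group_counts = conn.execute(
    """
    SELECT EmailAddress, CustomerName, COUNT(DISTINCT EmailAddress)
    FROM Customers
    GROUP BY EmailAddress, CustomerName
    """
).fetchall()

print(original)  # []
print(group_counts)
```
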

Use the following query instead:

select emailaddress,customername from customers where emailaddress in
(select emailaddress from customers group by emailaddress having count(*) > 1)

Rather than using a subquery in the WHERE clause, which may increase the query time when the table is very large, I would suggest joining the table to itself with an INNER JOIN.

Using the same table, this should give the result:

SELECT DISTINCT a.EmailAddress, a.CustomerName FROM Customers as a
Inner Join Customers as b on a.CustomerName <> b.CustomerName and a.EmailAddress = b.EmailAddress
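A sketch of the self-join on hypothetical sample data, with SQLite standing in for SQL Server. Two details matter here. First, the selected columns need the alias "a": SQL Server, and SQLite too, reject an unqualified EmailAddress as ambiguous when both sides of the join have that column. Second, each row that shares an address is returned once for every other row it matches, so DISTINCT is needed to get one row per customer.

```python
import sqlite3

# Hypothetical sample data: Aaron, Christy and John share one address.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Aaron", "aaron@example.com"),
        ("Christy", "aaron@example.com"),
        ("Jason", "jason@example.com"),
        ("Eric", "eric@example.com"),
        ("John", "aaron@example.com"),
    ],
)

# Without DISTINCT: each of the 3 duplicate rows matches the other 2.
joined = conn.execute(
    """
    SELECT a.EmailAddress, a.CustomerName
    FROM Customers AS a
    INNER JOIN Customers AS b
      ON a.CustomerName <> b.CustomerName AND a.EmailAddress = b.EmailAddress
    """
).fetchall()

# With DISTINCT: one row per customer that shares an address.
deduped = conn.execute(
    """
    SELECT DISTINCT a.EmailAddress, a.CustomerName
    FROM Customers AS a
    INNER JOIN Customers AS b
      ON a.CustomerName <> b.CustomerName AND a.EmailAddress = b.EmailAddress
    """
).fetchall()

print(len(joined))  # 6
print(sorted(deduped))
```
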

For more reliable results, I would suggest comparing CustomerID (or any other unique column of your table) instead of CustomerName, because two different customers can have the same name.
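A sketch of why a unique key helps. It assumes a hypothetical CustomerID primary key and a hypothetical case where two different customers are both named "Aaron" and share an address. SQLite stands in for SQL Server. Comparing names misses this pair, but comparing IDs catches it.

```python
import sqlite3

# Hypothetical table with a unique CustomerID; customers 1 and 2 are
# different people who happen to share a name and an address.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    "CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, EmailAddress TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Aaron", "aaron@example.com"),
        (2, "Aaron", "aaron@example.com"),
        (3, "Jason", "jason@example.com"),
    ],
)

# Joining on "different name" cannot match the two Aarons with each other.
by_name = conn.execute(
    """
    SELECT DISTINCT a.CustomerID, a.CustomerName, a.EmailAddress
    FROM Customers AS a
    INNER JOIN Customers AS b
      ON a.CustomerName <> b.CustomerName AND a.EmailAddress = b.EmailAddress
    """
).fetchall()

# Joining on "different CustomerID" treats them as separate rows.
by_id = conn.execute(
    """
    SELECT DISTINCT a.CustomerID, a.CustomerName, a.EmailAddress
    FROM Customers AS a
    INNER JOIN Customers AS b
      ON a.CustomerID <> b.CustomerID AND a.EmailAddress = b.EmailAddress
    """
).fetchall()

print(by_name)        # []
print(sorted(by_id))  # [(1, 'Aaron', 'aaron@example.com'), (2, 'Aaron', 'aaron@example.com')]
```
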


Just for fun, here's another way:

;with counts as (
    select CustomerName, EmailAddress,
      count(*) over (partition by EmailAddress) as num
    from Customers
)
select CustomerName, EmailAddress
from counts
where num > 1
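A sketch of the window-function version on hypothetical sample data. SQLite stands in for SQL Server here, and it needs SQLite 3.25 or later for window functions. COUNT(*) OVER (PARTITION BY EmailAddress) attaches the size of each row's address group to the row itself, so no GROUP BY or join is needed. The leading ";" in the T-SQL version only makes sure any previous statement is terminated before WITH, so it is left out here.

```python
import sqlite3

# Hypothetical sample data: Aaron, Christy and John share one address.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Aaron", "aaron@example.com"),
        ("Christy", "aaron@example.com"),
        ("Jason", "jason@example.com"),
        ("Eric", "eric@example.com"),
        ("John", "aaron@example.com"),
    ],
)

# num = number of rows that share this row's EmailAddress.
rows = conn.execute(
    """
    WITH counts AS (
        SELECT CustomerName, EmailAddress,
               COUNT(*) OVER (PARTITION BY EmailAddress) AS num
        FROM Customers
    )
    SELECT CustomerName, EmailAddress
    FROM counts
    WHERE num > 1
    """
).fetchall()

print(sorted(rows))
# [('Aaron', 'aaron@example.com'), ('Christy', 'aaron@example.com'), ('John', 'aaron@example.com')]
```
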
