It's easy to find duplicates with one field:
SELECT name, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1
So if we have a table
ID NAME EMAIL
1 John [email protected]
2 Sam [email protected]
3 Tom [email protected]
4 Bob [email protected]
5 Tom [email protected]
This query will give us John, Sam, Tom, Tom because they all have the same email.
However, what I want is to get duplicates with the same email and name.
That is, I want to get "Tom", "Tom".
The reason I need this: I made a mistake and allowed duplicate name and email values to be inserted. Now I need to remove/change the duplicates, so I need to find them first.
Try this:
SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )
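A quick way to check this query against the sample table is to run it in SQLite from Python. This is only a sketch: the e-mail addresses are made-up placeholders (the real ones are not shown in the question), and SQLite stands in for whatever database you actually use.

```python
import sqlite3

# In-memory database; the e-mail values are invented placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "shared@example.com"),
        (2, "Sam", "shared@example.com"),
        (3, "Tom", "tom@example.com"),
        (4, "Bob", "bob@example.com"),
        (5, "Tom", "tom@example.com"),
    ],
)

# Grouping on both columns means only rows that match on name AND email
# land in the same group; John and Sam share an e-mail but not a name.
dup_pairs = conn.execute(
    """
    SELECT name, email
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()

print(dup_pairs)
```

With this placeholder data, only the Tom pair comes back; John and Sam are not reported because their names differ.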
The exact code depends on whether you also want to count fully identical rows (same id, name, and email) or only distinct ids that share the same email and name. If id is a primary key or otherwise has a unique constraint, the two cases are the same, but the question does not say so. In the former case you can use the code given in several other answers:
SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1
In the latter case you would use:
SELECT name, email, COUNT(DISTINCT id)
FROM users
GROUP BY name, email
HAVING COUNT(DISTINCT id) > 1
ORDER BY COUNT(DISTINCT id) DESC
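To see where the two queries diverge, here is a small sketch using Python's sqlite3 module. It assumes a users table with no unique constraint on id, and the rows and e-mail addresses are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption for this demo: id is NOT unique, so a whole row can repeat.
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Tom", "tom@example.com"),  # same row inserted twice
        (1, "Tom", "tom@example.com"),
        (2, "Ann", "ann@example.com"),  # two different ids, same name/email
        (3, "Ann", "ann@example.com"),
    ],
)

# Counts every row in the group, including exact copies of a row.
row_dups = conn.execute(
    """
    SELECT name, email, COUNT(*)
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()

# Counts only distinct ids, so exact copies of one row are ignored.
id_dups = conn.execute(
    """
    SELECT name, email, COUNT(DISTINCT id)
    FROM users
    GROUP BY name, email
    HAVING COUNT(DISTINCT id) > 1
    ORDER BY name
    """
).fetchall()

print(row_dups)
print(id_dups)
```

In this data the first query reports both Tom and Ann, while the second reports only Ann, because the two Tom rows carry the same id.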
Table structure:
ID NAME EMAIL
1 John [email protected]
2 Sam [email protected]
3 Tom [email protected]
4 Bob [email protected]
5 Tom [email protected]
Solution 1:
SELECT t1.*
FROM users t1
INNER JOIN users t2
ON t1.id > t2.id
AND t1.name = t2.name
AND t1.email = t2.email
Solution 2:
SELECT name,
email,
COUNT(*)
FROM users
GROUP BY name,
email
HAVING COUNT(*) > 1
SELECT name, email
FROM users
WHERE email IN (
SELECT email
FROM users
GROUP BY email
HAVING COUNT(email) > 1)
To check for duplicate records in a table:
select * from users s
where rowid < any
(select rowid from users k where s.name = k.name and s.email = k.email);
or
select * from users s
where rowid not in
(select max(rowid) from users k where s.name = k.name and s.email = k.email);
To delete the duplicate records in a table:
delete from users s
where rowid < any
(select rowid from users k where s.name = k.name and s.email = k.email);
or
delete from users s
where rowid not in
(select max(rowid) from users k where s.name = k.name and s.email = k.email);
Try the following:
SELECT * FROM
(
SELECT Id, Name, Age, Comments, Row_Number() OVER(PARTITION BY Name, Age ORDER By Name)
AS Rank
FROM Customers
) AS B WHERE Rank>1
select id,name,COUNT(*) from user group by Id,Name having COUNT(*)>1
This should also work; give it a try.
Select * from Users a
where EXISTS (Select * from Users b
where ( a.name = b.name
OR a.email = b.email)
and a.ID != b.id)
This is especially useful if you are looking for duplicates that differ only by a prefix or some other systematic change, such as a new e-mail domain. In that case you can apply replace() to these columns before comparing them.
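As a rough illustration of the replace() idea, the sketch below normalises an old e-mail domain to a new one before comparing rows. The domains, rows, and the choice of AND (name and email must both match, as in the question) are assumptions for the demo; SQLite's REPLACE(X, Y, Z) is used here, and other databases offer a similar function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Tom", "tom@old.example"),  # hypothetical old domain
        (2, "Tom", "tom@new.example"),  # same person after a domain change
        (3, "Bob", "bob@new.example"),
    ],
)

# Rewrite the old domain to the new one on both sides, then compare.
normalised_dup_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT a.id
        FROM users a
        WHERE EXISTS (
            SELECT 1
            FROM users b
            WHERE a.name = b.name
              AND REPLACE(a.email, '@old.example', '@new.example')
                = REPLACE(b.email, '@old.example', '@new.example')
              AND a.id != b.id
        )
        ORDER BY a.id
        """
    )
]

print(normalised_dup_ids)
```

A plain equality test on email would miss rows 1 and 2 here; after normalising the domain they are reported as a pair.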
SELECT name, email
FROM users
WHERE email in
(SELECT email FROM users
GROUP BY email
HAVING COUNT(*)>1)
You can use the SELECT DISTINCT keyword to get rid of duplicates in the result set. You can also filter by name to get everyone with that name in the table.
In case you work with Microsoft Access, this way works:
CREATE TABLE users (id int, name varchar(10), email varchar(50));
INSERT INTO users VALUES (1, 'John', '[email protected]');
INSERT INTO users VALUES (2, 'Sam', '[email protected]');
INSERT INTO users VALUES (3, 'Tom', '[email protected]');
INSERT INTO users VALUES (4, 'Bob', '[email protected]');
INSERT INTO users VALUES (5, 'Tom', '[email protected]');
SELECT name, email, COUNT(*) AS CountOf
FROM users
GROUP BY name, email
HAVING COUNT(*)>1;
DELETE *
FROM users
WHERE id IN (
SELECT u1.id
FROM users u1, users u2
WHERE u1.name = u2.name AND u1.email = u2.email AND u1.id > u2.id
);
Thanks to Tancrede Chazallet for the delete code.
SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name HAVING COUNT(*) > 1;
We can use HAVING here, which filters on aggregate functions, as shown below:
create table #TableB (id_account int, data int, [date] date)
insert into #TableB values (1 ,-50, '10/20/2018'),
(1, 20, '10/09/2018'),
(2 ,-900, '10/01/2018'),
(1 ,20, '09/25/2018'),
(1 ,-100, '08/01/2018')
SELECT id_account , data, COUNT(*)
FROM #TableB
GROUP BY id_account , data
HAVING COUNT(id_account) > 1
drop table #TableB
Here the two fields id_account and data are grouped and used with COUNT(*), so the query returns every combination of values that appears more than once across both columns.
Suppose that, by mistake, no constraints were added to a SQL Server table, and a front-end application inserted records that are duplicated across all columns. The query below can then be used to delete the duplicate rows from the table.
SELECT DISTINCT * INTO #TemNewTable FROM #OriginalTable
TRUNCATE TABLE #OriginalTable
INSERT INTO #OriginalTable SELECT * FROM #TemNewTable
DROP TABLE #TemNewTable
Here we copy all the distinct records of the original table into a new table and delete the records of the original table. We then insert the distinct values from the new table back into the original table and drop the new table.
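The same copy, empty, and refill steps can be tried out in SQLite from Python. This is a sketch under assumptions: the table name original_table and its rows are hypothetical, and because SQLite has no TRUNCATE TABLE, a DELETE without a WHERE clause stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO original_table VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (2, "b"), (3, "c")],
)

# Step 1: copy one instance of every distinct row into a temporary table.
conn.execute("CREATE TEMP TABLE tmp_distinct AS SELECT DISTINCT * FROM original_table")
# Step 2: empty the original table (stand-in for TRUNCATE TABLE).
conn.execute("DELETE FROM original_table")
# Step 3: put the distinct rows back.
conn.execute("INSERT INTO original_table SELECT * FROM tmp_distinct")
# Step 4: drop the helper table.
conn.execute("DROP TABLE tmp_distinct")
conn.commit()

deduped_rows = conn.execute(
    "SELECT id, label FROM original_table ORDER BY id"
).fetchall()
print(deduped_rows)
```

Note that this only removes rows that are identical in every column; rows that share name and email but differ in id would survive.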
select emp.ename, emp.empno, dept.loc
from emp
inner join dept
on dept.deptno=emp.deptno
inner join
(select ename, count(*) from
emp
group by ename, deptno
having count(*) > 1)
t on emp.ename=t.ename order by emp.ename
/
Try this code:
WITH CTE AS
( SELECT Id, Name, Age, Comments, RN = ROW_NUMBER() OVER(PARTITION BY Name, Age ORDER BY ccn)
FROM ccnmaster )
SELECT * FROM CTE WHERE RN > 1
A little late to the party, but I found a really cool workaround for finding all duplicate IDs:
SELECT GROUP_CONCAT( id )
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
You may want to try this
SELECT NAME, EMAIL, COUNT(*)
FROM USERS
GROUP BY 1,2
HAVING COUNT(*) > 1
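How can we count the duplicated values, whether a value is repeated twice or more than twice? Just count them, not group-wise.
The surplus rows, that is, every occurrence beyond the first of each value, can be counted as simply as
select COUNT(col_01) - COUNT(distinct col_01) from Table_01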
This is the easy thing I've come up with. It uses a common table expression (CTE) and a partition window (I think these features are in SQL Server 2005 and later).
This example finds all students with duplicate name and dob. The fields you want to check for duplication go in the OVER clause. You can include any other fields you want in the projection.
with cte (StudentId, Fname, LName, DOB, RowCnt)
as (
SELECT StudentId, FirstName, LastName, DateOfBirth as DOB, SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
FROM tblStudent
)
SELECT * from CTE where RowCnt > 1
ORDER BY DOB, LName
We can also find duplicate values by using a CTE, like this:
with MyCTE
as
(
select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees]
)
select * from MyCTE where Duplicate>1
To delete records whose names are duplicated:
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS T FROM @YourTable
)
DELETE FROM CTE WHERE T > 1
If you want to find duplicate data (by one or several criteria) and select the actual rows:
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/
This selects/deletes all duplicate records except one record from each group of duplicates. So, the delete leaves all unique records + one record from each group of the duplicates.
Select duplicates:
SELECT *
FROM table
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY column1, column2
);
Delete duplicates:
DELETE FROM table
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY column1, column2
);
Be careful with large numbers of records: this can cause performance problems.
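Here is a small sketch of the select-then-delete pattern, run in SQLite from Python. The users table, its rows, and the e-mail addresses are placeholders chosen for the demo, with name and email playing the role of column1 and column2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "shared@example.com"),
        (2, "Sam", "shared@example.com"),
        (3, "Tom", "tom@example.com"),
        (4, "Bob", "bob@example.com"),
        (5, "Tom", "tom@example.com"),
        (6, "Tom", "tom@example.com"),
    ],
)

# Every row that is not the lowest id of its (name, email) group is a duplicate.
keep_lowest = "SELECT MIN(id) FROM users GROUP BY name, email"

# Inspect first: which ids would the delete remove?
ids_to_delete = [
    row[0]
    for row in conn.execute(
        f"SELECT id FROM users WHERE id NOT IN ({keep_lowest}) ORDER BY id"
    )
]

# Then delete the same set of rows.
cursor = conn.execute(f"DELETE FROM users WHERE id NOT IN ({keep_lowest})")
deleted_count = cursor.rowcount
conn.commit()

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
print(ids_to_delete, deleted_count, remaining_ids)
```

Running the SELECT before the DELETE is a cheap safety check, since both use the same NOT IN condition.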
Another easy way is to use an analytic function:
SELECT * from
(SELECT name, email,
COUNT(name) OVER (PARTITION BY name, email) cnt
FROM users)
WHERE cnt >1;
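The analytic-function approach can be tried in SQLite as well, assuming a build with window-function support (SQLite 3.25 or later). In this sketch the sample e-mail addresses are placeholders, the id column is added to the output to show which rows are involved, and the subquery gets an alias because some databases require one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "shared@example.com"),
        (2, "Sam", "shared@example.com"),
        (3, "Tom", "tom@example.com"),
        (4, "Bob", "bob@example.com"),
        (5, "Tom", "tom@example.com"),
    ],
)

# COUNT(*) OVER (...) attaches the group size to every row without collapsing
# the rows, so each duplicated row is returned together with its id.
windowed_dups = conn.execute(
    """
    SELECT id, name, email, cnt
    FROM (
        SELECT id, name, email,
               COUNT(*) OVER (PARTITION BY name, email) AS cnt
        FROM users
    ) AS t
    WHERE cnt > 1
    ORDER BY id
    """
).fetchall()

print(windowed_dups)
```

Unlike GROUP BY with HAVING, this returns one line per duplicated row rather than one line per group.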
The most important thing here is to have the fastest query. The ids of the duplicates should also be identified. A self join is a good option, but it is faster to first find the combinations of values that have duplicates and then join them with the original table to find the ids of the duplicated rows. Finally, order by any column except id so that duplicated rows appear next to each other.
SELECT u.*
FROM users AS u
JOIN (SELECT username, email
FROM users
GROUP BY username, email
HAVING COUNT(*)>1) AS w
ON u.username=w.username AND u.email=w.email
ORDER BY u.email;
In case you work with Oracle, this way would be preferable:
create table my_users(id number, name varchar2(100), email varchar2(100));
insert into my_users values (1, 'John', '[email protected]');
insert into my_users values (2, 'Sam', '[email protected]');
insert into my_users values (3, 'Tom', '[email protected]');
insert into my_users values (4, 'Bob', '[email protected]');
insert into my_users values (5, 'Tom', '[email protected]');
commit;
select *
from my_users
where rowid not in (select min(rowid) from my_users group by name, email);
SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;
I think this will work properly to search repeated values in a particular column.
How to get duplicate records in a table:
SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1
GROUP BY EmpCode HAVING COUNT(EmpCode) > 1
If you want to delete the duplicates, here's a much simpler way to do it than having to find even/odd rows into a triple sub-select:
SELECT u.id, u.name, u.email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
And so to delete:
DELETE FROM users
WHERE id IN (
SELECT u.id/*, u.name, u.email*/
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)
Much easier to read and understand, IMHO.
Note: the condition u.id > u2.id matches every row that has a duplicate with a lower id, so a single run deletes all but the lowest-id row of each group. Running it again until no rows are deleted is a harmless way to confirm this.
If you wish to see whether there are any duplicate rows in your table, you can use the query below:
create table my_table(id int, name varchar(100), email varchar(100));
insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (2, 'Aman', '[email protected]');
insert into my_table values (3, 'Tom', '[email protected]');
insert into my_table values (4, 'Raj', '[email protected]');
Select COUNT(1) As Total_Rows from my_table
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc
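The two counts can be compared programmatically. Below is a sketch in Python with sqlite3; the e-mail addresses are invented placeholders, and the table mirrors the my_table definition above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "shekh", "shekh@example.com"),
        (1, "shekh", "shekh@example.com"),  # exact copy of the first row
        (2, "Aman", "aman@example.com"),
        (3, "Tom", "tom@example.com"),
        (4, "Raj", "raj@example.com"),
    ],
)

total_rows = conn.execute("SELECT COUNT(1) FROM my_table").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(1) FROM (SELECT DISTINCT * FROM my_table) AS abc"
).fetchone()[0]

# If the counts differ, at least one row is repeated in every column.
has_full_row_duplicates = total_rows != distinct_rows
print(total_rows, distinct_rows, has_full_row_duplicates)
```

This only tells you that fully identical rows exist, not which ones; a GROUP BY over all columns with HAVING COUNT(*) > 1 would list them.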
SELECT * FROM users u where rowid = (select max(rowid) from users u1 where
u.email=u1.email);
select name, email
, case
when ROW_NUMBER () over (partition by name, email order by name) > 1 then 'Yes'
else 'No'
end "duplicated ?"
from users
Try this:
declare @YourTable table (id int, name varchar(10), email varchar(50))
INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')
SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
OUTPUT:
name email CountOf
---------- ----------- -----------
John John-email 2
sam sam-email 2
(2 row(s) affected)
If you want the IDs of the dups, use this:
SELECT
y.id,y.name,y.email
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
OUTPUT:
id name email
----------- ---------- ------------
1 John John-email
2 John John-email
5 sam sam-email
6 sam sam-email
(4 row(s) affected)
To delete the duplicates, try:
DELETE d
FROM @YourTable d
INNER JOIN (SELECT
y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
) dt2 ON d.id=dt2.id
WHERE dt2.RowRank!=1
SELECT * FROM @YourTable
OUTPUT:
id name email
----------- ---------- --------------
1 John John-email
3 fred John-email
4 fred fred-email
5 sam sam-email
(4 row(s) affected)
Source: Stackoverflow.com