I've got a couple of duplicates in a database that I want to inspect, so what I did to see which are duplicates, I did this:
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
This way, I will get all rows with relevant_field occuring more than once. This query takes milliseconds to execute.
Now, I wanted to inspect each of the duplicates, so I thought I could SELECT each row in some_table with a relevant_field in the above query, so I did like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)
This turns out to be extreeeemely slow for some reason (it takes minutes). What exactly is going on here to make it that slow? relevant_field is indexed.
Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1)
, and then making my second query like this instead:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM temp_view
)
And that works just fine. MySQL does this in some milliseconds.
Any SQL experts here who can explain what's going on?
Firstly you can find duplicate rows and find count of rows is used how many times and order it by number like this;
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,_x000D_
(_x000D_
CASE q.NID_x000D_
WHEN @curCode THEN_x000D_
@curRow := @curRow + 1_x000D_
ELSE_x000D_
@curRow := 1_x000D_
AND @curCode := q.NID_x000D_
END_x000D_
) AS No_x000D_
FROM UserInfo q,_x000D_
(_x000D_
SELECT_x000D_
@curRow := 1,_x000D_
@curCode := ''_x000D_
) rt_x000D_
WHERE q.NID IN_x000D_
(_x000D_
SELECT NID_x000D_
FROM UserInfo_x000D_
GROUP BY NID_x000D_
HAVING COUNT(*) > 1_x000D_
)
_x000D_
after that create a table and insert result to it.
create table CopyTable _x000D_
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,_x000D_
(_x000D_
CASE q.NID_x000D_
WHEN @curCode THEN_x000D_
@curRow := @curRow + 1_x000D_
ELSE_x000D_
@curRow := 1_x000D_
AND @curCode := q.NID_x000D_
END_x000D_
) AS No_x000D_
FROM UserInfo q,_x000D_
(_x000D_
SELECT_x000D_
@curRow := 1,_x000D_
@curCode := ''_x000D_
) rt_x000D_
WHERE q.NID IN_x000D_
(_x000D_
SELECT NID_x000D_
FROM UserInfo_x000D_
GROUP BY NID_x000D_
HAVING COUNT(*) > 1_x000D_
)
_x000D_
Finally, delete dublicate rows.No is start 0. Except fist number of each group delete all dublicate rows.
delete from CopyTable where No!= 0;
_x000D_
Try this
SELECT t1.*
FROM
some_table t1,
(SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT (*) > 1) t2
WHERE
t1.relevant_field = t2.relevant_field;
I find this to be the most efficient for finding if a value exists, logic can easily be inverted to find if a value doesn't exist (ie IS NULL);
SELECT * FROM primary_table st1
LEFT JOIN comparision_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL
*Replace relevant_field with the name of the value that you want to check exists in your table
*Replace primaryKey with the name of the primary key column on the comparison table.
sometimes when data grow bigger mysql WHERE IN's could be pretty slow because of query optimization. Try using STRAIGHT_JOIN to tell mysql to execute query as is, e.g.
SELECT STRAIGHT_JOIN table.field FROM table WHERE table.id IN (...)
but beware: in most cases mysql optimizer works pretty well, so I would recommend to use it only when you have this kind of problem
Rewrite the query into this
SELECT st1.*, st2.relevant_field FROM sometable st1
INNER JOIN sometable st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id /* list a unique sometable field here*/
HAVING COUNT(*) > 1
I think st2.relevant_field
must be in the select, because otherwise the having
clause will give an error, but I'm not 100% sure
Never use IN
with a subquery; this is notoriously slow.
Only ever use IN
with a fixed list of values.
More tips
SELECT *
only select
the fields that you really need.relevant_field
to speed up the equi-join.group by
on the primary key. General solution for 90% of your IN (select
queries
Use this code
SELECT * FROM sometable a WHERE EXISTS (
SELECT 1 FROM sometable b
WHERE a.relevant_field = b.relevant_field
GROUP BY b.relevant_field
HAVING count(*) > 1)
This is similar to my case, where I have a table named tabel_buku_besar
. What I need are
Looking for record that have account_code='101.100'
in tabel_buku_besar
which have companyarea='20000'
and also have IDR
as currency
I need to get all record from tabel_buku_besar
which have account_code same as step 1 but have transaction_number
in step 1 result
while using select ... from...where....transaction_number in (select transaction_number from ....)
, my query running extremely slow and sometimes causing request time out or make my application not responding...
I try this combination and the result...not bad...
`select DATE_FORMAT(L.TANGGAL_INPUT,'%d-%m-%y') AS TANGGAL,
L.TRANSACTION_NUMBER AS VOUCHER,
L.ACCOUNT_CODE,
C.DESCRIPTION,
L.DEBET,
L.KREDIT
from (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE!='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) L
INNER JOIN (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) R ON R.TRANSACTION_NUMBER=L.TRANSACTION_NUMBER AND R.COMPANYAREA=L.COMPANYAREA
LEFT OUTER JOIN master_account C ON C.ACCOUNT_CODE=L.ACCOUNT_CODE AND C.COMPANYAREA=L.COMPANYAREA
ORDER BY L.TANGGAL_INPUT,L.TRANSACTION_NUMBER`
SELECT st1.*
FROM some_table st1
inner join
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;
I've tried your query on one of my databases, and also tried it rewritten as a join to a sub-query.
This worked a lot faster, try it!
I have reformatted your slow sql query with www.prettysql.net
SELECT *
FROM some_table
WHERE
relevant_field in
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT ( * ) > 1
);
When using a table in both the query and the subquery, you should always alias both, like this:
SELECT *
FROM some_table as t1
WHERE
t1.relevant_field in
(
SELECT t2.relevant_field
FROM some_table as t2
GROUP BY t2.relevant_field
HAVING COUNT ( t2.relevant_field ) > 1
);
Does that help?
Source: Stackoverflow.com