[sql] Select rows having 2 columns equal value

Consider following table

C1    || C2  || C3  || C4
--------------------------
1     || a   || b   || 1
2     || a   || b   || 4
3     || b   || d   || 2
4     || b   || d   || 2

Question 1: Select all rows in which Column C2 , C3, C4 have equal values e.g. select row 3 and row 4 in above example.

Question 2 : Select all rows in which C4 column has duplicates e.g. C4 has value 2 in row 3 and row 4, so select row 3 and 4.

This question is related to sql

The answer is


select * from test;
a1  a2  a3
1   1   2
1   2   2
2   1   2

select t1.a3 from test t1, test t2 where t1.a1 = t2.a1 and t2.a2 = t1.a2 and t1.a1 = t2.a2

a3
1

You can try same thing using Joins too..


SELECT t1.* FROM table t1 JOIN table t2 ON t1.Id=t2.Id WHERE t1.C4=t2.C4;

Giving Accurate Result for me.


Select * from tablename t1, tablename t2, tablename t3 
where t1.C1 = t2.c2 and t2.c2 = t3.c3 

Seems like this will work. Though does not seems like an efficient way.


SELECT *
FROM my_table
WHERE column_a <=> column_b AND column_a <=> column_c

For question 1:

SELECT DISTINCT a.*
  FROM [Table] a
  INNER JOIN
  [Table] b
  ON
  a.C1 <> b.C1 AND a.C2 = b.C2 AND a.C3 = b.C3 AND a.C4 = b.C4

Using an inner join is much more efficient than a subquery because it requires fewer operations, and maintains the use of indexes when comparing the values, allowing the SQL server to better optimize the query before its run. Using appropriate indexes with this query can bring your query down to only n * log(n) rows to compare.

Using a subquery with your where clause or only doing a standard join where C1 does not equal C2 results in a table that has roughly 2 to the power of n rows to compare, where n is the number of rows in the table.

So by using proper indexing with an Inner Join, which only returns records which met the join criteria, we're able to drastically improve the performance. Also note that we return DISTINCT a.*, because this will only return the columns for table a where the join criteria was met. Returning * would return the columns for both a and b where the criteria was met, and not including DISTINCT would result in a duplicate of each row for each time that row row matched another row more than once.

A similar approach could also be performed using CROSS APPLY, which still uses a subquery, but makes use of indexes more efficiently.

An implementation with the keyword USING instead of ON could also work, but the syntax is more complicated to make work because your want to match on rows where C1 does not match, so you would need an additional where clause to filter out matching each row with itself. Also, USING is not compatible/allowed in conjunction with table values in all implementations of SQL, so it's best to stick with ON.

Similarly, for question 2:

SELECT DISTINCT a.*
  FROM [Table] a
  INNER JOIN
  [Table] b
  ON
  a.C1 <> b.C1 AND a.C4 = b.C4

This is essentially the same query as for 1, but because it only wants to know which rows match for C4, we only compare on the rows for C4.


select t.* from table t
    join (
        select C2, C3, C4
        from table
        group by C2, C3, C4
        having count(*) > 1
    ) t2
    using (C2, C3, C4);

Actually this would go faster in most of cases:

SELECT *
FROM table ta1
JOIN table ta2 on ta1.id != ta2.id
WHERE ta1.c2 = ta2.c2 and ta1.c3 = ta2.c3 and ta1.c4 = ta2.c4

You join on different rows which have the same values. I think it should work. Correct me if I'm wrong.