[sql] SQL: how to select a single id ("row") that meets multiple criteria from a single column

I have a very narrow table: user_id, ancestry.

The user_id column is self explanatory.

The ancestry column contains the country from where the user's ancestors hail.

A user can have multiple rows on the table, as a user can have ancestors from multiple countries.

My question is this: how do I select users whose ancestors hail from multiple, specified countries?

For instance, show me all users who have ancestors from England, France and Germany, and return 1 row per user that met that criteria.

What is that SQL?

 user_id     ancestry

---------   ----------

    1        England
    1        Ireland
    2        France
    3        Germany
    3        Poland
    4        England
    4        France
    4        Germany
    5        France
    5        Germany

In the case of the data above, I would expect the result to be "4" as user_id 4 has ancestors from England, France and Germany.

Thanks in advance.

P.S. To clarify: Yes, the user_id / ancestry columns make a unique pair, so a country would not be repeated for a given user.

P.P.S. I am looking for users who hail from all 3 countries - England, France, AND Germany (and the countries are arbitrary).

P.P.P.S. I am not looking for answers specific to a certain RDBMS. I'm looking to answer this problem "in general."

I'm content w regenerating the where clause for each query provided generating the where clause can be done programmatically (e.g. that I can build a function to build the WHERE / FROM - WHERE clause).

This question is related to sql select

The answer is


one of the approach if you want to get all user_id that satisfies all conditions is:

SELECT DISTINCT user_id FROM table WHERE ancestry IN ('England', '...', '...') GROUP BY user_id HAVING count(*) = <number of conditions that has to be satisfied> 

etc. If you need to take all user_ids that satisfies at least one condition, then you can do

SELECT DISTINCT user_id from table where ancestry IN ('England', 'France', ... , '...')

I am not aware if there is something similar to IN but that joins conditions with AND instead of OR


Users who have one of the 3 countries

SELECT DISTINCT user_id
FROM table
WHERE ancestry IN('England','France','Germany')

Users who have all 3 countries

SELECT DISTINCT A.userID
FROM table A
   INNER JOIN table B on A.user_id = B.user_id
   INNER JOIN table C on A.user_id = C.user_id
WHERE A.ancestry = 'England'
   AND B.ancestry = 'Germany'
   AND C.ancestry = 'France'

brute force (and only tested on an Oracle system, but I think this is pretty standard):

select distinct usr_id from users where user_id in (
    select user_id from (
      Select user_id, Count(User_Id) As Cc
      From users 
      GROUP BY user_id
    ) Where Cc =3
  )
  and ancestry in ('England', 'France', 'Germany')
;

edit: I like @HuckIt's answer even better.


First way: JOIN:

get people with multiple countries:

SELECT u1.user_id 
FROM users u1
JOIN users u2
on u1.user_id  = u2.user_id 
AND u1.ancestry <> u2.ancestry

Get people from 2 specific countries:

SELECT u1.user_id 
FROM users u1
JOIN users u2
on u1.user_id  = u2.user_id 
WHERE u1.ancestry = 'Germany'
AND u2.ancestry = 'France'

For 3 countries... join three times. To only get the result(s) once, distinct.

Second way: GROUP BY

This will get users which have 3 lines (having...count) and then you specify which lines are permitted. Note that if you don't have a UNIQUE KEY on (user_id, ancestry), a user with 'id, england' that appears 3 times will also match... so it depends on your table structure and/or data.

SELECT user_id 
FROM users u1
WHERE ancestry = 'Germany'
OR ancestry = 'France'
OR ancestry = 'England'
GROUP BY user_id
HAVING count(DISTINCT ancestry) = 3

I was having a similar issue like yours, except that I wanted a specific subset of 'ancestry'. Hong Ning's query was a good start, except it will return combined records containing duplicates and/or extra ancestries (e.g. it would also return someone with ancestries ('England', 'France', 'Germany', 'Netherlands') and ('England', 'France', 'England'). Supposing you'd want just the three and only the three, you'd need the following query:

SELECT Src.user_id
FROM yourtable Src
WHERE ancestry in ('England', 'France', 'Germany')
    AND EXISTS (
        SELECT user_id
        FROM dbo.yourtable
        WHERE user_id = Src.user_id
        GROUP BY user_id
        HAVING COUNT(DISTINCT ancestry) = 3
        )
GROUP BY user_id
HAVING COUNT(DISTINCT ancestry) = 3

SELECT DISTINCT (user_id) 
FROM [user]
WHERE user.user_id In (select user_id from user where ancestry = 'England') 
    And user.user_id In (select user_id from user where ancestry = 'France') 
    And user.user_id In (select user_id from user where ancestry = 'Germany');`

like the answer above but I have a duplicate record so I have to create a subquery with distinct

Select user_id
(
   select distinct userid
   from yourtable
   where user_id = @userid

) t1
where 
ancestry in ('England', 'France', 'Germany')
group by user_id
having count(user_id) = 3

this is what I used because I have multiple record(download logs) and this checks that all the required files have been downloaded


Try this:

Select user_id
from yourtable
where ancestry in ('England', 'France', 'Germany')
group by user_id
having count(user_id) = 3

The last line means the user's ancestry has all 3 countries.


This question is some years old but i came via a duplicate to it. I want to suggest a more general solution too. If you know you always have a fixed number of ancestors you can use some self joins as already suggested in the answers. If you want a generic approach go on reading.

What you need here is called Quotient in relational Algebra. The Quotient is more or less the reversal of the Cartesian Product (or Cross Join in SQL).

Let's say your ancestor set A is (i use a table notation here, i think this is better for understanding)

ancestry
-----------
'England'
'France'
'Germany'

and your user set U is

user_id
--------
   1
   2
   3

The cartesian product C=AxU is then:

user_id  |  ancestry
---------+-----------
   1     | 'England'
   1     | 'France'
   1     | 'Germany'
   2     | 'England'
   2     | 'France'
   2     | 'Germany'
   3     | 'England'
   3     | 'France'
   3     | 'Germany'

If you calculate the set quotient U=C/A then you get

user_id
--------
   1
   2
   3

If you redo the cartesian product UXA you will get C again. But note that for a set T, (T/A)xA will not necessarily reproduce T. For example, if T is

user_id  |  ancestry
---------+-----------
   1     | 'England'
   1     | 'France'
   1     | 'Germany'
   2     | 'England'
   2     | 'France'

then (T/A) is

user_id
--------
   1

(T/A)xA will then be

user_id  |  ancestry
---------+------------
   1     | 'England'
   1     | 'France'
   1     | 'Germany'

Note that the records for user_id=2 have been eliminated by the Quotient and Cartesian Product operations.

Your question is: Which user_id has ancestors from all countries in your ancestor set? In other words you want U=T/A where T is your original set (or your table).

To implement the quotient in SQL you have to do 4 steps:

  1. Create the Cartesian Product of your ancestry set and the set of all user_ids.
  2. Find all records in the Cartesian Product which have no partner in the original set (Left Join)
  3. Extract the user_ids from the resultset of 2)
  4. Return all user_ids from the original set which are not included in the result set of 3)

So let's do it step by step. I will use TSQL syntax (Microsoft SQL server) but it should easily be adaptable to other DBMS. As a name for the table (user_id, ancestry) i choose ancestor

CREATE TABLE ancestry_set (ancestry nvarchar(25))
INSERT INTO ancestry_set (ancestry) VALUES ('England')
INSERT INTO ancestry_set (ancestry) VALUES ('France')
INSERT INTO ancestry_set (ancestry) VALUES ('Germany')

CREATE TABLE ancestor ([user_id] int, ancestry nvarchar(25))
INSERT INTO ancestor ([user_id],ancestry) VALUES (1,'England')
INSERT INTO ancestor ([user_id],ancestry) VALUES(1,'Ireland')
INSERT INTO ancestor ([user_id],ancestry) VALUES(2,'France')
INSERT INTO ancestor ([user_id],ancestry) VALUES(3,'Germany')
INSERT INTO ancestor ([user_id],ancestry) VALUES(3,'Poland')
INSERT INTO ancestor ([user_id],ancestry) VALUES(4,'England')
INSERT INTO ancestor ([user_id],ancestry) VALUES(4,'France')
INSERT INTO ancestor ([user_id],ancestry) VALUES(4,'Germany')
INSERT INTO ancestor ([user_id],ancestry) VALUES(5,'France')
INSERT INTO ancestor ([user_id],ancestry) VALUES(5,'Germany')

1) Create the Cartesian Product of your ancestry set and the set of all user_ids.

SELECT a.[user_id],s.ancestry
FROM ancestor a, ancestry_set s
GROUP BY a.[user_id],s.ancestry

2) Find all records in the Cartesian Product which have no partner in the original set (Left Join) and

3) Extract the user_ids from the resultset of 2)

SELECT DISTINCT cp.[user_id]
FROM (SELECT a.[user_id],s.ancestry
      FROM ancestor a, ancestry_set s
      GROUP BY a.[user_id],s.ancestry) cp
   LEFT JOIN ancestor a ON cp.[user_id]=a.[user_id] AND cp.ancestry=a.ancestry
WHERE a.[user_id] is null

4) Return all user_ids from the original set which are not included in the result set of 3)

SELECT DISTINCT [user_id]
FROM ancestor
WHERE [user_id] NOT IN (
   SELECT DISTINCT cp.[user_id]
   FROM (SELECT a.[user_id],s.ancestry
         FROM ancestor a, ancestry_set s
         GROUP BY a.[user_id],s.ancestry) cp
   LEFT JOIN ancestor a ON cp.[user_id]=a.[user_id] AND cp.ancestry=a.ancestry
   WHERE a.[user_id] is null
   )