[sql] SQL WHERE ID IN (id1, id2, ..., idn)

I need to write a query to retrieve a big list of ids.

We support many backends (MySQL, Firebird, SQL Server, Oracle, PostgreSQL, ...), so I need to write standard SQL.

The id set could be large, and the query would be generated programmatically. So, what is the best approach?

1) Writing a query using IN

SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)

My question here is: what happens if n is very big? Also, what about performance?

2) Writing a query using OR

SELECT * FROM TABLE WHERE ID = id1 OR ID = id2 OR ... OR ID = idn

I think that this approach does not have a limit on n, but what about performance if n is very big?

3) Writing a programmatic solution:

  foreach (var id in myIdList)
  {
      var item = GetItemByQuery("SELECT * FROM TABLE WHERE ID = " + id);
      myObjectList.Add(item);
  }

We experienced some problems with this approach when the database server is queried over the network. Normally it is better to run one query that retrieves all the results than to make a lot of small queries. Maybe I'm wrong.

What would be a correct solution for this problem?


The answer is


What Ed Guiness suggested is really a performance booster. I had a query like this:

select * from table where id in (id1,id2.........long list)

What I did:

DECLARE @temp table (
    ID int
)
insert into @temp
select * from dbo.fnSplitter('#idlist#')

Then I inner joined the table variable with the main table (a table variable needs an alias to be referenced in the join condition):

select * from table t inner join @temp tmp on tmp.ID = t.id

And performance improved drastically.


In most database systems, IN (val1, val2, …) and a series of ORs are optimized to the same plan.

The third way would be to import the list of values into a temporary table and join against it, which is more efficient in most systems if there are lots of values.
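As a rough illustration of the temporary table approach, here is a minimal Python sketch using the standard library's sqlite3 module. The table `items`, the temp table `wanted_ids`, and the function name are hypothetical, and `CREATE TEMP TABLE` and `INSERT OR IGNORE` are SQLite syntax, so other backends would need their own equivalents.

```python
import sqlite3


def fetch_by_ids_via_temp_table(conn, ids):
    """Return (id, name) rows of the hypothetical `items` table for the given ids."""
    cur = conn.cursor()
    # In SQLite a temp table is visible only to the connection that created it.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS wanted_ids (id INTEGER PRIMARY KEY)")
    cur.execute("DELETE FROM wanted_ids")  # clear ids left over from a previous call
    # INSERT OR IGNORE (SQLite-specific) silently skips duplicate ids.
    cur.executemany(
        "INSERT OR IGNORE INTO wanted_ids (id) VALUES (?)",
        ((i,) for i in ids),
    )
    cur.execute(
        "SELECT items.id, items.name FROM items "
        "INNER JOIN wanted_ids ON wanted_ids.id = items.id "
        "ORDER BY items.id"
    )
    return cur.fetchall()


# Small in-memory demo with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 11)],
)
rows = fetch_by_ids_via_temp_table(conn, [2, 5, 9, 42])
print(rows)  # id 42 does not exist, so it simply produces no row
```

The ids travel as bound parameters rather than as text spliced into the SQL, so the statement itself stays the same size no matter how long the list is.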



Try this

SELECT Position_ID, Position_Name
FROM position
WHERE Position_ID IN (6, 7, 8)
ORDER BY Position_Name

Sample 3 would be the worst performer of them all because you are hitting the database once per id, paying for a separate round trip each time.

Loading the data into a temp table and then joining on that would be by far the fastest. After that the IN should work slightly faster than the group of ORs.


I think you mean SQL Server, but on Oracle there is a hard limit on how many elements you can specify in an IN list: 1000.


The first option is definitely the best one.

SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)

However, if the list of ids is huge, say millions, you should consider splitting it into chunks as described below:

  • Divide your list of ids into chunks of a fixed size, say 100
  • The chunk size should be decided based on the memory size of your server
  • Suppose you have 10000 ids; you will have 10000/100 = 100 chunks
  • Process one chunk at a time, resulting in 100 SELECT calls to the database

Why should you divide into chunks?

You avoid the out-of-memory errors that are very common in scenarios like yours, and you keep the number of database calls reasonable, which results in better performance.

It has always worked like a charm for me. Hope it works for my fellow developers as well :)
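The chunking steps above could be sketched roughly as follows in Python with the standard library's sqlite3 module. The table `items` and the helper names are hypothetical, and the chunk size of 100 is just the figure from the example above, not a recommendation.

```python
import sqlite3


def chunked(seq, size):
    """Yield consecutive slices of `seq` holding at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_by_ids_in_chunks(conn, ids, chunk_size=100):
    """Run one IN (...) query per chunk and collect all matching rows."""
    rows = []
    for chunk in chunked(list(ids), chunk_size):
        # One "?" placeholder per id; the ids themselves are bound as parameters.
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"SELECT id, name FROM items WHERE id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


# Small in-memory demo with made-up data: 250 ids become 3 queries (100 + 100 + 50).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 1001)],
)
rows = fetch_by_ids_in_chunks(conn, range(1, 251), chunk_size=100)
print(len(rows))
```

Keeping each chunk below the backend's IN-list or bound-parameter limit is the point of the exercise; the right size depends on the database in use.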


An alternative approach might be to use another table to contain id values. This other table can then be inner joined on your TABLE to constrain returned rows. This will have the major advantage that you won't need dynamic SQL (problematic at the best of times), and you won't have an infinitely long IN clause.

You would truncate this other table, insert your large number of rows, then perhaps create an index to aid the join performance. It would also let you detach the accumulation of these rows from the retrieval of data, perhaps giving you more options to tune performance.

Update: Although you could use a temporary table, I did not mean to imply that you must or even should. A permanent table used for temporary data is a common solution with merits beyond that described here.


Running a SELECT * FROM MyTable where id in () query on an Azure SQL table with 500 million records resulted in a wait time of more than 7 minutes!

Doing this instead returned results immediately:

select b.id, a.* from MyTable a
join (values (250000), (2500001), (2600000)) as b(id)
ON a.id = b.id

Use a join.
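For a runnable variant of the same idea, here is a small Python sketch using the standard library's sqlite3 module. SQLite does not accept the `as b(id)` column alias on a derived table, so this version names the column through a common table expression instead; the table `items`, the CTE name `wanted`, and the function name are hypothetical, and no claim is made here about how the timings compare on other backends.

```python
import sqlite3


def fetch_by_ids_via_values_join(conn, ids):
    """Join the hypothetical `items` table against an inline VALUES list of ids."""
    ids = list(ids)
    if not ids:
        return []  # an empty VALUES list is not valid SQL
    # One "(?)" row per id; the ids are bound as parameters.
    values_rows = ", ".join("(?)" for _ in ids)
    sql = (
        f"WITH wanted(id) AS (VALUES {values_rows}) "
        "SELECT a.id, a.name FROM items AS a "
        "JOIN wanted AS b ON a.id = b.id "
        "ORDER BY a.id"
    )
    return conn.execute(sql, ids).fetchall()


# Small in-memory demo with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 11)],
)
rows = fetch_by_ids_via_values_join(conn, [3, 7, 99])
print(rows)  # id 99 does not exist, so it produces no row
```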