[sql] Fastest check if row exists in PostgreSQL

I have a bunch of rows that I need to insert into table, but these inserts are always done in batches. So I want to check if a single row from the batch exists in the table because then I know they all were inserted.

So its not a primary key check, but shouldn't matter too much. I would like to only check single row so count(*) probably isn't good, so its something like exists I guess.

But since I'm fairly new to PostgreSQL I'd rather ask people who know.

My batch contains rows with following structure:

userid | rightid | remaining_count

So if table contains any rows with provided userid it means they all are present there.

This question is related to sql postgresql

The answer is


select true from tablename where condition limit 1;

I believe that this is the query that postgres uses for checking foreign keys.

In your case, you could do this in one go too:

insert into yourtable select $userid, $rightid, $count where not (select true from yourtable where userid = $userid limit 1);

I would like to propose another thought to specifically address your sentence: "So I want to check if a single row from the batch exists in the table because then I know they all were inserted."

You are making things efficient by inserting in "batches" but then doing existence checks one record at a time? This seems counter intuitive to me. So when you say "inserts are always done in batches" I take it you mean you are inserting multiple records with one insert statement. You need to realize that Postgres is ACID compliant. If you are inserting multiple records (a batch of data) with one insert statement, there is no need to check if some were inserted or not. The statement either passes or it will fail. All records will be inserted or none.

On the other hand, if your C# code is simply doing a "set" separate insert statements, for example, in a loop, and in your mind, this is a "batch" .. then you should not in fact describe it as "inserts are always done in batches". The fact that you expect that part of what you call a "batch", may actually not be inserted, and hence feel the need for a check, strongly suggests this is the case, in which case you have a more fundamental problem. You need change your paradigm to actually insert multiple records with one insert, and forego checking if the individual records made it.

Consider this example:

CREATE TABLE temp_test (
    id SERIAL PRIMARY KEY,
    sometext TEXT,
    userid INT,
    somethingtomakeitfail INT unique
)
-- insert a batch of 3 rows
;;
INSERT INTO temp_test (sometext, userid, somethingtomakeitfail) VALUES
('foo', 1, 1),
('bar', 2, 2),
('baz', 3, 3)
;;
-- inspect the data of what we inserted
SELECT * FROM temp_test
;;
-- this entire statement will fail .. no need to check which one made it
INSERT INTO temp_test (sometext, userid, somethingtomakeitfail) VALUES
('foo', 2, 4),
('bar', 2, 5),
('baz', 3, 3)  -- <<--(deliberately simulate a failure)
;;
-- check it ... everything is the same from the last successful insert ..
-- no need to check which records from the 2nd insert may have made it in
SELECT * FROM temp_test

This is in fact the paradigm for any ACID compliant DB .. not just Postgresql. In other words you are better off if you fix your "batch" concept and avoid having to do any row by row checks in the first place.


How about simply:

select 1 from tbl where userid = 123 limit 1;

where 123 is the userid of the batch that you're about to insert.

The above query will return either an empty set or a single row, depending on whether there are records with the given userid.

If this turns out to be too slow, you could look into creating an index on tbl.userid.

if even a single row from batch exists in table, in that case I don't have to insert my rows because I know for sure they all were inserted.

For this to remain true even if your program gets interrupted mid-batch, I'd recommend that you make sure you manage database transactions appropriately (i.e. that the entire batch gets inserted within a single transaction).


as @MikeM pointed out.

select exists(select 1 from contact where id=12)

with index on contact, it can usually reduce time cost to 1 ms.

CREATE INDEX index_contact on contact(id);

If you think about the performace ,may be you can use "PERFORM" in a function just like this:

 PERFORM 1 FROM skytf.test_2 WHERE id=i LIMIT 1;
  IF FOUND THEN
      RAISE NOTICE ' found record id=%', i;  
  ELSE
      RAISE NOTICE ' not found record id=%', i;  
 END IF;

SELECT 1 FROM user_right where userid = ? LIMIT 1

If your resultset contains a row then you do not have to insert. Otherwise insert your records.


INSERT INTO target( userid, rightid, count )
  SELECT userid, rightid, count 
  FROM batch
  WHERE NOT EXISTS (
    SELECT * FROM target t2, batch b2
    WHERE t2.userid = b2.userid
    -- ... other keyfields ...
    )       
    ;

BTW: if you want the whole batch to fail in case of a duplicate, then (given a primary key constraint)

INSERT INTO target( userid, rightid, count )
SELECT userid, rightid, count 
FROM batch
    ;

will do exactly what you want: either it succeeds, or it fails.