SQL Server insert if not exists best practice

Question

I have a Competitions results table which holds team member s names and their ranking on one hand   On the other hand I need to maintain a table of unique competitors names   CREATE TABLE Competitors  cName nvarchar 64  primary key    Now I have some 200 000 results in the 1st table and when the competitors table is empty I can perform this    INSERT INTO Competitors SELECT DISTINCT Name FROM CompResults   And the query only takes some 5 seconds to insert about 11 000 names   So far this is not a critical application so I can consider truncate the Competitors table once a month  when I receive the new competition results with some 10 000 rows   But what is the best practice when new results are added  with new AND existing competitors  I don t want to truncate existing competitors table  I need to perform INSERT statement for new competitors only and do nothing if they exists

User · Answer

Normalizing your operational tables as suggested by Transact Charlie  is a good idea  and will save many headaches and problems over time - but there are such things as interface tables  which support integration with external systems  and reporting tables  which support things like analytical processing  and those types of tables should not necessarily be normalized - in fact  very often it is much  much more convenient and performant for them to not be   In this case  I think Transact Charlie s proposal for your operational tables is a good one    But I would add an index  not necessarily unique  to CompetitorName in the Competitors table to support efficient joins on CompetitorName for the purposes of integration  loading of data from external sources   and I would put an interface table into the mix  CompetitionResults   CompetitionResults should contain whatever data your competition results have in it  The point of an interface table like this one is to make it as quick and easy as possible to truncate and reload it from an Excel sheet or a CSV file  or whatever form you have that data in    That interface table should not be considered part of the normalized set of operational tables  Then you can join with CompetitionResults as suggested by Richard  to insert records into Competitors that don t already exist  and update the ones that do  for example if you actually have more information about competitors  like their phone number or email address    One thing I would note - in reality  Competitor Name  it seems to me  is very unlikely to be unique in your data  In 200 000 competitors  you may very well have 2 or more David Smiths  for example  So I would recommend that you collect more information from competitors  such as their phone number or an email address  or something which is more likely to be unique   Your operational table  Competitors  should just have one column for each data item that contributes to a composite natural key  for example it should have one column for a primary email address  But the interface table should have a slot for old and new values for a primary email address  so that the old value can be use to look up the record in Competitors and update that part of it to the new value   So CompetitionResults should have some  old  and  new  fields - oldEmail  newEmail  oldPhone  newPhone  etc  That way you can form a composite key  in Competitors  from CompetitorName  Email  and Phone   Then when you have some competition results  you can truncate and reload your CompetitionResults table from your excel sheet or whatever you have  and run a single  efficient insert to insert all the new competitors into the Competitors table  and single  efficient update to update all the information about the existing competitors from the CompetitionResults  And you can do a single insert to insert new rows into the CompetitionCompetitors table  These things can be done in a ProcessCompetitionResults stored procedure  which could be executed after loading the CompetitionResults table   That s a sort of rudimentary description of what I ve seen done over and over in the real world with Oracle Applications  SAP  PeopleSoft  and a laundry list of other enterprise software suites   One last comment I d make is one I ve made before on SO  If you create a foreign key that insures that a Competitor exists in the Competitors table before you can add a row with that Competitor in it to CompetitionCompetitors  make sure that foreign key is set to cascade updates and deletes  That way if you need to delete a competitor  you can do it and all the rows associated with that competitor will get automatically deleted  Otherwise  by default  the foreign key will require you to delete all the related rows out of CompetitionCompetitors before it will let you delete a Competitor     Some people think non-cascading foreign keys are a good safety precaution  but my experience is that they re just a freaking pain in the butt that are more often than not simply a result of an oversight and they create a bunch of make work for DBA s  Dealing with people accidentally deleting stuff is why you have things like  are you sure  dialogs and various types of regular backups and redundant data sources  It s far  far more common to actually want to delete a competitor  whose data is all messed up for example  than it is to accidentally delete one and then go  Oh no  I didn t mean to do that  And now I don t have their competition results  Aaaahh   The latter is certainly common enough  so  you do need to be prepared for it  but the former is far more common  so the easiest and best way to prepare for the former  imo  is to just make foreign keys cascade updates and deletes

User · Answer

You will need to join the tables together and get a list of unique competitors that don t already exist in Competitors   This will insert unique records   INSERT Competitors  cName   SELECT DISTINCT Name FROM CompResults cr LEFT JOIN Competitors c ON cr Name   c cName WHERE c Name IS NULL   There may come a time when this insert needs to be done quickly without being able to wait for the selection of unique names  In that case  you could insert the unique names into a temporary table  and then use that temporary table to insert into your real table  This works well because all the processing happens at the time you are inserting into a temporary table  so it doesn t affect your real table  Then when you have all the processing finished  you do a quick insert into the real table  I might even wrap the last part  where you insert into the real table  inside a transaction

User · Answer

Additionally  if you have multiple columns to insert and want to check if they exists or not use the following code Insert Into  Competitors   cName  cCity  cState  Select cName  cCity  cState from        select new   from                select distinct cName  cCity  cState          from  Competitors  s   City  c   State  s       new     left join                   select distinct cName  cCity  cState          from  Competitors  s       existing     on new cName   existing cName and new City   existing City and new State   existing State     where existing Name is null  or existing City is null or existing State is null

User · Answer

Don t know why anyone else hasn t said this yet   NORMALISE   You ve got a table that models competitions  Competitions are made up of Competitors  You need a distinct list of Competitors in one or more Competitions        You should have the following tables       CREATE TABLE Competitor        CompetitorID  INT IDENTITY 1 1  PRIMARY KEY        CompetitorName  NVARCHAR 255         CREATE TABLE Competition        CompetitionID  INT IDENTITY 1 1  PRIMARY KEY        CompetitionName  NVARCHAR 255         CREATE TABLE CompetitionCompetitors        CompetitionID  INT        CompetitorID  INT        Score  INT        PRIMARY KEY            CompetitionID             CompetitorID                    With Constraints on CompetitionCompetitors CompetitionID and CompetitorID pointing at the other tables   With this kind of table structure -- your keys are all simple INTS -- there doesn t seem to be a good NATURAL KEY that would fit the model so I think a SURROGATE KEY is a good fit here   So if you had this then to get the the distinct list of competitors in a particular competition you can issue a query like this   DECLARE  CompetitionName VARCHAR 50  SET  CompetitionName    London Marathon       SELECT         p  CompetitorName  AS  CompetitorName      FROM         Competitor AS p     WHERE         EXISTS               SELECT 1             FROM                 CompetitionCompetitor AS cc                 JOIN Competition AS c ON c  ID    cc  CompetitionID              WHERE                 cc  CompetitorID    p  CompetitorID                  AND cc  CompetitionName     CompetitionNAme             And if you wanted the score for each competition a competitor is in   SELECT     p  CompetitorName        c  CompetitionName        cc  Score  FROM     Competitor AS p     JOIN CompetitionCompetitor AS cc ON cc  CompetitorID    p  CompetitorID      JOIN Competition AS c ON c  ID    cc  CompetitionID    And when you have a new competition with new competitors then you simply check which ones already exist in the Competitors table  If they already exist then you don t insert into Competitor for those Competitors and do insert for the new ones   Then you insert the new Competition in Competition and finally you just make all the links in CompetitionCompetitors

User · Answer

Ok  this was asked 7 years ago  but I think the best solution here is to forego the new table entirely and just do this as a custom view  That way you re not duplicating data  there s no worry about unique data  and it doesn t touch the actual database structure  Something like this   CREATE VIEW vw competitions   AS   SELECT    Id int    CompetitionName nvarchar 75     CompetitionType nvarchar 50     OtherField1 int    OtherField2 nvarchar 64   --add the fields you want viewed from the Competition table   FROM Competitions GO   Other items can be added here like joins on other tables  WHERE clauses  etc  This is most likely the most elegant solution to this problem  as you now can just query the view   SELECT   FROM vw competitions      and add any WHERE  IN  or EXISTS clauses to the view query

User · Answer

The answers above which talk about normalizing are great  But what if you find yourself in a position like me where you re not allowed to touch the database schema or structure as it stands  Eg  the DBA s are  gods  and all suggested revisions go to  dev null   In that respect  I feel like this has been answered with this Stack Overflow posting too in regards to all the users above giving code samples    I m reposting the code from INSERT VALUES WHERE NOT EXISTS which helped me the most since I can t alter any underlying database tables   INSERT INTO  table1  Id  guidd  TimeAdded  ExtraData  SELECT Id  guidd  TimeAdded  ExtraData FROM  table2 WHERE NOT EXISTS  Select Id  guidd From  table1 WHERE  table1 id    table2 id  ----------------------------------- MERGE  table1 as  Target  USING   select Id  guidd  TimeAdded  ExtraData from  table2  as  Source   id  guidd  TimeAdded  ExtraData      on  Target  id   Source  id WHEN NOT MATCHED THEN     INSERT  id  guidd  TimeAdded  ExtraData      VALUES   Source  id   Source  guidd   Source  TimeAdded   Source  ExtraData   ------------------------------ INSERT INTO  table1  id  guidd  TimeAdded  ExtraData  SELECT id  guidd  TimeAdded  ExtraData from  table2 EXCEPT SELECT id  guidd  TimeAdded  ExtraData from  table1 ------------------------------ INSERT INTO  table1  id  guidd  TimeAdded  ExtraData  SELECT  table2 id   table2 guidd   table2 TimeAdded   table2 ExtraData FROM  table2 LEFT JOIN  table1 on  table1 id    table2 id WHERE  table1 id is null   The above code uses different fields than what you have  but you get the general gist with the various techniques   Note that as per the original answer on Stack Overflow  this code was copied from here   Anyway my point is  best practice  often comes down to what you can and can t do as well as theory    If you re able to normalize and generate indexes keys -- great   If not and you have the resort to code hacks like me  hopefully the above helps    Good luck

User · Answer

Another option is to left join your Results table with your existing competitors Table and find the new competitors by filtering the distinct records that don  t match int the join   INSERT Competitors  cName  SELECT  DISTINCT cr Name FROM    CompResults cr left join         Competitors c on cr Name   c cName where   c cName is null   New syntax MERGE also offer a compact  elegant and efficient way to do that   MERGE INTO Competitors AS Target USING  SELECT DISTINCT Name FROM CompResults  AS Source ON Target Name   Source Name WHEN NOT MATCHED THEN     INSERT  Name  VALUES  Source Name

User · Answer

Semantically you are asking  insert Competitors where doesn t already exist    INSERT Competitors  cName  SELECT DISTINCT Name FROM CompResults cr WHERE    NOT EXISTS  SELECT   FROM Competitors c               WHERE cr Name   c cName

[sql-server] SQL Server insert if not exists best practice

Examples related to sql-server

Examples related to insert-update