What's the best practice for primary keys in tables?

Question

When designing tables, I've developed a habit of having one column that is unique and that I make the primary key. This is achieved in three ways depending on requirements:

1. Identity integer column that auto-increments.
2. Unique identifier (GUID).
3. A short character(x) or integer (or other relatively small numeric type) column that can serve as a row identifier column.

Number 3 would be used for fairly small lookup, mostly-read tables that might have a unique static-length string code, or a numeric value such as a year or other number.

For the most part, all other tables will either have an auto-incrementing integer or unique identifier primary key.

The question:

I have recently started working with databases that have no consistent row identifier, and primary keys are currently clustered across various columns. Some examples:

- datetime/character
- datetime/integer
- datetime/varchar
- char/nvarchar/nvarchar

Is there a valid case for this? I would have always defined an identity or unique identifier column for these cases.

In addition, there are many tables without primary keys at all. What are the valid reasons, if any, for this?

I'm trying to understand why tables were designed as they were, and it appears to be a big mess to me, but maybe there were good reasons for it.

A third question to sort of help me decipher the answers: in cases where multiple columns are used to comprise the compound primary key, is there a specific advantage to this method vs. a surrogate/artificial key? I'm thinking mostly in regards to performance, maintenance, administration, etc.

User · Answer

Besides all those good answers, I just want to share a good article I just read, "The great primary-key debate".

Just to quote a few points. The developer must apply a few rules when choosing a primary key for each table:

- The primary key must uniquely identify each record.
- A record's primary-key value can't be null.
- The primary-key value must exist when the record is created.
- The primary key must remain stable; you can't change the primary-key field(s).
- The primary key must be compact and contain the fewest possible attributes.
- The primary-key value can't be changed.

Natural keys (tend to) break the rules. Surrogate keys comply with the rules. (You'd better read through that article; it is worth your time!)

User · Answer

All tables should have a primary key. Otherwise, what you have is a HEAP; this, in some situations, might be what you want (a heavy insert load when the data is then replicated via a service broker to another database or table, for instance).

For lookup tables with a low volume of rows, you can use a 3-CHAR code as the primary key, as this takes less room than an INT, but the performance difference is negligible. Other than that, I would always use an INT unless you have a reference table that perhaps has a composite primary key made up from foreign keys from associated tables.

User · Answer

GUIDs can be used as a primary key, but you need to create the right type of GUID so that it performs well. You need to generate COMB GUIDs. A good article about it, with performance statistics, is "The Cost of GUIDs as Primary Keys". Also, some code on building COMB GUIDs in SQL is in "Uniqueidentifier vs identity" (archive).
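A minimal sketch of the COMB idea, written in Python rather than the SQL the linked article uses: keep 10 random bytes and overwrite the last 6 with a timestamp, so values generated later sort after earlier ones on those trailing bytes. The function name `comb_guid` and the millisecond Unix timestamp are assumptions for illustration; the original COMB technique derives the trailing bytes from a SQL Server datetime, and how well this helps depends on how your database orders GUID values.

```python
import os
import time
import uuid


def comb_guid(now_ms=None):
    """Return a COMB-style GUID: 10 random bytes followed by a 6-byte timestamp."""
    if now_ms is None:
        # Milliseconds since the Unix epoch (an assumption; not the original
        # SQL Server datetime encoding).
        now_ms = time.time_ns() // 1_000_000
    random_part = os.urandom(10)
    # Keep the low 48 bits of the timestamp, most significant byte first,
    # so later timestamps compare greater on the trailing 6 bytes.
    time_part = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


earlier = comb_guid(1_000)
later = comb_guid(2_000)
print(earlier, later)
```

The random prefix keeps the value unique across machines, while the time-ordered suffix is what is meant to reduce index fragmentation compared with fully random GUIDs.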

User · Answer

I follow a few rules:

1. Primary keys should be as small as necessary. Prefer a numeric type because numeric types are stored in a much more compact format than character formats. This is because most primary keys will be foreign keys in another table as well as used in multiple indexes. The smaller your key, the smaller the index, and the fewer pages in the cache you will use.
2. Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely to be used in multiple indexes and used as a foreign key. Updating a single primary key could cause a ripple effect of changes.
3. Do NOT use "your problem primary key" as your logic model primary key. For example, passport number, social security number, or employee contract number, as these "primary keys" can change in real-world situations.

On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change, it can be used as a primary key. If the natural key is large or likely to change, I use surrogate keys. If there is no primary key, I still make a surrogate key, because experience shows you will always add tables to your schema and wish you'd put a primary key in place.
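To make the "primary keys should never change" point concrete, here is a small sketch using Python's built-in sqlite3 module. The `person` and `booking` tables and their columns are hypothetical, chosen only for illustration. The passport number is kept as a unique attribute while a surrogate integer serves as the key, so a passport renewal touches one row and no foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE person (
    person_id   INTEGER PRIMARY KEY,   -- surrogate key, never changes
    passport_no TEXT NOT NULL UNIQUE,  -- real-world identifier, may change
    name        TEXT NOT NULL
);
CREATE TABLE booking (
    booking_id INTEGER PRIMARY KEY,
    person_id  INTEGER NOT NULL REFERENCES person(person_id)
);
INSERT INTO person VALUES (1, 'P1234567', 'Alice');
INSERT INTO booking VALUES (10, 1), (11, 1);
""")

# The passport is renewed: one row in person changes, bookings are untouched.
cur = conn.execute(
    "UPDATE person SET passport_no = 'P7654321' WHERE person_id = 1"
)
rows_touched = cur.rowcount
bookings_still_linked = conn.execute(
    "SELECT COUNT(*) FROM booking WHERE person_id = 1"
).fetchone()[0]
print(rows_touched, bookings_still_linked)
```

Had `passport_no` been the primary key, the same renewal would have needed every referencing row in `booking` (and any other child table) to be updated as well.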

User · Answer

I avoid using natural keys for one simple reason: human error. Although natural unique identifiers are often available (SSN, VIN, Account Number, etc.), they require a human to enter them correctly. If you're using SSNs as a primary key, someone transposes a couple of numbers during data entry, and the error isn't discovered immediately, then you're faced with changing your primary key.

My primary keys are all handled by the database program in the background and the user is never aware of them.

User · Answer

A natural key, if available, is usually best. So, if datetime/char uniquely identifies the row and both parts are meaningful to the row, that's great. If just the datetime is meaningful, and the char is just tacked on to make it unique, then you might as well just go with an identity field.

User · Answer

I'll be up-front about my preference for natural keys: use them where possible, as they'll make your life of database administration a lot easier. I established a standard in our company that all tables have the following columns:

- Row ID (GUID)
- Creator (string; has a default of the current user's name, SUSER_SNAME() in T-SQL)
- Created (DateTime)
- Timestamp

Row ID has a unique key on it per table, and in any case is auto-generated per row (and permissions prevent anyone editing it), and is reasonably guaranteed to be unique across all tables and databases. If any ORM systems need a single ID key, this is the one to use.

Meanwhile, the actual PK is, if possible, a natural key. My internal rules are something like:

- People: use a surrogate key, e.g. INT. If it's internal, the Active Directory user GUID is an acceptable choice.
- Lookup tables (e.g. StatusCodes): use a short CHAR code; it's easier to remember than INTs, and in many cases the paper forms and users will also use it for brevity (e.g. Status = "E" for "Expired", "A" for "Approved", "NADIS" for "No Asbestos Detected In Sample").
- Linking tables: combination of FKs (e.g. EventId, AttendeeId).

So ideally you end up with a natural, human-readable and memorable PK, and an ORM-friendly one-ID-per-table GUID.

Caveat: the databases I maintain tend to the 100,000s of records rather than millions or billions, so if you have experience of larger systems which contraindicates my advice, feel free to ignore me!

User · Answer

I too always use a numeric ID column. In Oracle I use number(18,0) for no real reason above number(12,0) (or whatever is an int rather than a long); maybe I just don't want to ever worry about getting a few billion rows in the db!

I also include a created and modified column (type timestamp) for basic tracking, where it seems useful.

I don't mind setting up unique constraints on other combinations of columns, but I really like my id, created, modified baseline requirements.

User · Answer

Just an extra comment on something that is often overlooked. Sometimes not using a surrogate key has benefits in the child tables. Let's say we have a design that allows you to run multiple companies within the one database (maybe it's a hosted solution, or whatever).

Let's say we have these tables and columns:

Company
    CompanyId    (primary key)

CostCentre
    CompanyId    (primary key, foreign key to Company)
    CostCentre   (primary key)

CostElement
    CompanyId    (primary key, foreign key to Company)
    CostElement  (primary key)

Invoice
    InvoiceId    (primary key)
    CompanyId    (primary key, in foreign key to CostCentre, in foreign key to CostElement)
    CostCentre   (in foreign key to CostCentre)
    CostElement  (in foreign key to CostElement)

In case that last bit doesn't make sense, Invoice.CompanyId is part of two foreign keys, one to the CostCentre table and one to the CostElement table. The primary key is (InvoiceId, CompanyId).

In this model, it's not possible to screw up and reference a CostElement from one company and a CostCentre from another company. If a surrogate key was used on the CostElement and CostCentre tables, it would be.

The fewer chances to screw up, the better.
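The multi-company model above can be sketched with Python's built-in sqlite3 module. Table and column names follow the answer; the column types, the sample rows, and the helper `try_insert` are assumptions added for illustration. Because `CompanyId` takes part in both composite foreign keys, an invoice that mixes a cost centre from one company with a cost element from another is rejected by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Company (CompanyId INTEGER PRIMARY KEY);
CREATE TABLE CostCentre (
    CompanyId  INTEGER NOT NULL REFERENCES Company(CompanyId),
    CostCentre TEXT NOT NULL,
    PRIMARY KEY (CompanyId, CostCentre)
);
CREATE TABLE CostElement (
    CompanyId   INTEGER NOT NULL REFERENCES Company(CompanyId),
    CostElement TEXT NOT NULL,
    PRIMARY KEY (CompanyId, CostElement)
);
CREATE TABLE Invoice (
    InvoiceId   INTEGER NOT NULL,
    CompanyId   INTEGER NOT NULL,
    CostCentre  TEXT NOT NULL,
    CostElement TEXT NOT NULL,
    PRIMARY KEY (InvoiceId, CompanyId),
    FOREIGN KEY (CompanyId, CostCentre)
        REFERENCES CostCentre(CompanyId, CostCentre),
    FOREIGN KEY (CompanyId, CostElement)
        REFERENCES CostElement(CompanyId, CostElement)
);
INSERT INTO Company VALUES (1), (2);
INSERT INTO CostCentre  VALUES (1, 'SALES');
INSERT INTO CostElement VALUES (1, 'RENT'), (2, 'TRAVEL');
""")


def try_insert(invoice):
    """Return True if the invoice row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Invoice VALUES (?, ?, ?, ?)", invoice)
        return True
    except sqlite3.IntegrityError:
        return False


# 'TRAVEL' belongs to company 2, so it cannot be used on a company 1 invoice.
mixed_ok = try_insert((100, 1, "SALES", "TRAVEL"))
# Both the cost centre and the cost element belong to company 1.
matching_ok = try_insert((101, 1, "SALES", "RENT"))
print(mixed_ok, matching_ok)
```

With single-column surrogate keys on `CostCentre` and `CostElement`, the mixed invoice would satisfy both foreign keys and the mismatch would have to be caught by application code or extra constraints.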

User · Answer

Natural versus artificial keys, to me, is a matter of how much of the business logic you want in your database. Social Security number (SSN) is a great example.

"Each client in my database will, and must, have an SSN." Bam, done: make it the primary key and be done with it. Just remember, when your business rule changes, you're burned.

I don't like natural keys myself, due to my experience with changing business rules. But if you're sure it won't change, it might prevent a few critical joins.

User · Answer

There's no problem in making your primary key from various fields; that's a Natural Key.

You can use an Identity column (associated with a unique index on the candidate fields) to make a Surrogate Key.

That's an old discussion. I prefer surrogate keys in most situations.

But there's no excuse for the lack of a key.

RE: EDIT

Yeah, there's a lot of controversy about that :D

I don't see any obvious advantage in natural keys, besides the fact that they are the natural choice. You will always think in Name, SocialNumber, or something like that, instead of idPerson.

Surrogate keys are the answer to some of the problems that natural keys have (propagating changes, for example).

As you get used to surrogates, they seem cleaner and more manageable.

But in the end, you'll find out that it's just a matter of taste, or mindset. Some people "think better" with natural keys, and others don't.

User · Answer

I suspect Steven A. Lowe's rolled-up newspaper therapy is required for the designer of the original data structure.

As an aside, GUIDs as a primary key can be a performance hog. I wouldn't recommend it.

User · Answer

I look for natural primary keys and use them where I can.

If no natural keys can be found, I prefer a GUID to an INT++, because SQL Server uses trees, and it is bad to always add keys to the end in trees.

On tables that are many-to-many couplings, I use a compound primary key of the foreign keys.

Because I'm lucky enough to use SQL Server, I can study execution plans and statistics with the profiler and the query analyzer and find out how my keys are performing very easily.
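As an illustration of a compound primary key on a many-to-many coupling table, here is a small sketch using Python's built-in sqlite3 module (SQLite stands in for SQL Server here). The `Event`, `Attendee` and `EventAttendee` tables and the helper `link` are hypothetical. The compound key rejects a duplicate pairing, and the foreign keys reject a link to a row that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Event    (EventId    INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Attendee (AttendeeId INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
CREATE TABLE EventAttendee (
    EventId    INTEGER NOT NULL REFERENCES Event(EventId),
    AttendeeId INTEGER NOT NULL REFERENCES Attendee(AttendeeId),
    PRIMARY KEY (EventId, AttendeeId)  -- compound key made of the two FKs
);
INSERT INTO Event    VALUES (1, 'Launch');
INSERT INTO Attendee VALUES (7, 'Sam'), (8, 'Kim');
INSERT INTO EventAttendee VALUES (1, 7);
""")


def link(event_id, attendee_id):
    """Try to couple an attendee to an event; report whether the row was accepted."""
    try:
        conn.execute(
            "INSERT INTO EventAttendee VALUES (?, ?)", (event_id, attendee_id)
        )
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


new_pair = link(1, 8)    # a valid, not yet existing coupling
duplicate = link(1, 7)   # the same coupling a second time
orphan = link(1, 99)     # attendee 99 does not exist
print(new_pair, duplicate, orphan)
```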

User · Answer

I always use an autonumber or identity field.

I worked for a client who had used SSN as a primary key and then, because of HIPAA regulations, was forced to change to a "MemberID", and it caused a ton of problems when updating the foreign keys in related tables. Sticking to a consistent standard of an identity column has helped me avoid a similar problem in all of my projects.

User · Answer

We do a lot of joins, and composite primary keys have just become a performance hog. A simple int or long takes care of many problems. Even though you are introducing a second candidate key, it's a lot easier and more understandable to join on one field versus three.

User · Answer

Tables should have a primary key all the time. When a table doesn't have one, it should have been given an AutoIncrement field.

Sometimes people omit the primary key because they transfer a lot of data and it might slow down the process (depending on the database). BUT it should be added afterwards.

Someone commented about link tables; this is right, it's an exception, BUT the fields should be FKs to keep the integrity, and in some cases those fields can be primary keys too, if duplicates in links are not authorized. But to keep it simple, because exceptions are common in programming, a primary key should be present to keep the integrity of your data.

User · Answer

What is special about the primary key?

What is the purpose of a table in a schema? What is the purpose of a key of a table? What is special about the primary key? The discussions around primary keys seem to miss the point that the primary key is part of a table, and that table is part of a schema. What is best for the table and table relationships should drive the key that is used.

Tables (and table relationships) contain facts about information you wish to record. These facts should be self-contained, meaningful, easily understood, and non-contradictory.

- From a design perspective, other tables added or removed from a schema should not impact on the table in question.
- There must be a purpose for storing the data related only to the information itself.
- Understanding what is stored in a table should not require undergoing a scientific research project.
- No fact stored for the same purpose should be stored more than once.
- Keys are a whole or part of the information being recorded which is unique, and the primary key is the specially designated key that is to be the primary access point to the table (i.e. it should be chosen for data consistency and usage, not just insert performance).

ASIDE: The unfortunate side effect of most databases being designed and developed by application programmers (which I am sometimes) is that what is best for the application or application framework often drives the primary key choice for tables. This leads to integer and GUID keys (as these are simple to use for application frameworks) and monolithic table designs (as these reduce the number of application framework objects needed to represent the data in memory). These application-driven database design decisions lead to significant data consistency problems when used at scale. Application frameworks designed in this manner naturally lead to table-at-a-time designs. "Partial records" are created in tables and data filled in over time. Multi-table interaction is avoided or, when used, causes inconsistent data when the application functions improperly. These designs lead to data that is meaningless (or difficult to understand), data spread over tables (you have to look at other tables to make sense of the current table), and duplicated data.

It was said that primary keys should be as small as necessary. I would say that keys should be only as large as necessary. Randomly adding meaningless fields to a table should be avoided. It is even worse to make a key out of a randomly added meaningless field, especially when it destroys the join dependency from another table to the non-primary key. This is only reasonable if there are no good candidate keys in the table, but this occurrence is surely a sign of a poor schema design if used for all tables.

It was also said that primary keys should never change, as updating a primary key should always be out of the question. But update is the same as delete followed by insert. By this logic, you should never delete a record from a table with one key and then add another record with a second key. Adding the surrogate primary key does not remove the fact that the other key in the table exists. Updating a non-primary key of a table can destroy the meaning of the data if other tables have a dependency on that meaning through a surrogate key (e.g. a status table with a surrogate key having the status description changed from "Processed" to "Cancelled" would definitely corrupt the data). What should always be out of the question is destroying data meaning.

Having said this, I am grateful for the many poorly designed databases that exist in businesses today (meaningless-surrogate-keyed-data-corrupted-1NF behemoths), because that means there is an endless amount of work for people that understand proper database design. But on the sad side, it does sometimes make me feel like Sisyphus, but I bet he had one heck of a 401k (before the crash). Stay away from blogs and websites for important database design questions. If you are designing databases, look up CJ Date. You can also reference Celko for SQL Server, but only if you hold your nose first. On the Oracle side, reference Tom Kyte.
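The status-table example from this answer can be reproduced in a few lines with Python's built-in sqlite3 module. The `status` and `orders` tables and the helper `order_status` are hypothetical names used only for this sketch. No key value changes and no constraint is violated, yet the meaning of the existing order silently flips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE status (
    status_id   INTEGER PRIMARY KEY,   -- meaningless surrogate key
    description TEXT NOT NULL UNIQUE   -- the value that carries the meaning
);
CREATE TABLE orders (
    order_id  INTEGER PRIMARY KEY,
    status_id INTEGER NOT NULL REFERENCES status(status_id)
);
INSERT INTO status VALUES (1, 'Processed');
INSERT INTO orders VALUES (500, 1);
""")


def order_status(order_id):
    """Look up the human-readable status of an order through the surrogate key."""
    row = conn.execute(
        "SELECT s.description FROM orders o "
        "JOIN status s ON s.status_id = o.status_id "
        "WHERE o.order_id = ?",
        (order_id,),
    ).fetchone()
    return row[0]


before = order_status(500)
# Someone "renames" status 1; the orders table is never touched.
conn.execute("UPDATE status SET description = 'Cancelled' WHERE status_id = 1")
after = order_status(500)
print(before, after)
```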

User · Answer

You should use a "composite" or "compound" primary key that comprises multiple fields.

This is a perfectly acceptable solution; go here for more info.

User · Answer

Natural versus artificial keys is a kind of religious debate among the database community; see this article and others it links to. I'm neither in favour of always having artificial keys, nor of never having them. I would decide on a case-by-case basis, for example:

- US States: I'd go for state_code ("TX" for Texas etc.), rather than state_id=1 for Texas.
- Employees: I'd usually create an artificial employee_id, because it's hard to find anything else that works. SSN or equivalent may work, but there could be issues like a new joiner who hasn't supplied his/her SSN yet.
- Employee Salary History: (employee_id, start_date). I would not create an artificial employee_salary_history_id. What point would it serve (other than "foolish consistency")?

Wherever artificial keys are used, you should always also declare unique constraints on the natural keys. For example, use state_id if you must, but then you'd better declare a unique constraint on state_code, otherwise you are sure to eventually end up with:

state_id    state_code   state_name
137         TX           Texas
249         TX           Texas
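A short sketch of that last point, using Python's built-in sqlite3 module. The column names follow the answer; the helper functions `make_states` and `insert_texas_twice` are hypothetical. The same two inserts are run against a table with and without a unique constraint on the natural key.

```python
import sqlite3


def make_states(with_unique):
    """Create an in-memory state table, optionally with UNIQUE on state_code."""
    conn = sqlite3.connect(":memory:")
    unique = "UNIQUE" if with_unique else ""
    conn.execute(
        "CREATE TABLE state ("
        " state_id   INTEGER PRIMARY KEY,"       # artificial key
        f" state_code TEXT NOT NULL {unique},"   # natural key
        " state_name TEXT NOT NULL)"
    )
    return conn


def insert_texas_twice(conn):
    """Insert Texas under two different ids and return how many TX rows exist."""
    conn.execute("INSERT INTO state VALUES (137, 'TX', 'Texas')")
    try:
        conn.execute("INSERT INTO state VALUES (249, 'TX', 'Texas')")
    except sqlite3.IntegrityError:
        pass  # the unique constraint refused the duplicate natural key
    return conn.execute(
        "SELECT COUNT(*) FROM state WHERE state_code = 'TX'"
    ).fetchone()[0]


without_constraint = insert_texas_twice(make_states(False))
with_constraint = insert_texas_twice(make_states(True))
print(without_constraint, with_constraint)
```

The artificial key alone happily accepts both rows, since 137 and 249 are distinct; only the constraint on `state_code` keeps Texas from appearing twice.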

User · Answer

All tables should have a primary key. Otherwise, what you have is a heap (in SQL Server terms, a table with no clustered index, which is what a table without a primary key usually ends up as). This, in some situations, might be what you want (a heavy insert load when the data is then replicated via Service Broker to another database or table, for instance). For lookup tables with a low volume of rows, you can use a CHAR(3) code as the primary key, as this takes less room than an INT, but the performance difference is negligible. Other than that, I would always use an INT, unless you have a reference table that perhaps has a composite primary key made up from foreign keys from associated tables.

User · Answer

GUIDs can be used as a primary key, but you need to create the right type of GUID so that it performs well. You need to generate COMB GUIDs. A good article about it, with performance statistics, is The Cost of GUIDs as Primary Keys. Also, some code on building COMB GUIDs in SQL is in Uniqueidentifier vs identity (archive).
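
For anyone wondering what a COMB GUID looks like in practice, here is a rough sketch in Python. It is not the code from the linked articles; it only illustrates the general idea of keeping most of the GUID random while overwriting the last 6 bytes with a timestamp, so that values generated later sort later on those bytes. As far as I know SQL Server compares the trailing 6 bytes of a uniqueidentifier first, which is why that position is used, but check this against your own server before relying on it. The function name comb_guid and the millisecond resolution are my own choices.

```python
import time
import uuid


def comb_guid(now_ms=None):
    """Return a GUID with 10 random bytes followed by a 6-byte timestamp."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # milliseconds since the Unix epoch
    random_part = uuid.uuid4().bytes[:10]
    # Keep the low 48 bits of the timestamp, most significant byte first.
    time_part = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


earlier = comb_guid(now_ms=1000)
later = comb_guid(now_ms=2000)

# The trailing 6 bytes increase with time even though the rest is random.
print(earlier.bytes[10:] < later.bytes[10:])
print(comb_guid())
```

The trade-off is that you give up 48 bits of randomness in exchange for roughly sequential inserts, which is the performance argument made in the answer.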

User · Answer

Natural versus artificial keys, to me, is a matter of how much of the business logic you want in your database. Social Security number (SSN) is a great example. "Each client in my database will, and must, have an SSN." Bam, done: make it the primary key and be done with it. Just remember that when your business rule changes, you're burned. I don't like natural keys myself, due to my experience with changing business rules. But if you're sure it won't change, it might prevent a few critical joins.

User · Answer

I always use an autonumber or identity field. I worked for a client who had used SSN as a primary key and then, because of HIPAA regulations, was forced to change to a "MemberID", and it caused a ton of problems when updating the foreign keys in related tables. Sticking to a consistent standard of an identity column has helped me avoid a similar problem in all of my projects.
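
Here is a minimal sketch of why the identity-column habit would have softened that SSN-to-MemberID change. It uses Python's sqlite3 module as a stand-in database, and the member/claim tables and the external_ref column are invented for the example. Because child rows point at the identity column, swapping the business identifier touches exactly one row and no foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE member (
        id           INTEGER PRIMARY KEY,    -- identity/autonumber column
        external_ref TEXT NOT NULL UNIQUE    -- was an SSN, later a MemberID
    );
    CREATE TABLE claim (
        claim_id  INTEGER PRIMARY KEY,
        member_id INTEGER NOT NULL REFERENCES member(id)
    );
    INSERT INTO member VALUES (1, '000-00-0000');   -- obviously fake SSN
    INSERT INTO claim VALUES (10, 1), (11, 1);
""")

# The business rule changes: replace the SSN with a generated member number.
cursor = conn.execute(
    "UPDATE member SET external_ref = 'M-000001' WHERE id = 1"
)
rows_touched = cursor.rowcount

# The child rows were never involved in the change.
claims = conn.execute(
    "SELECT claim_id, member_id FROM claim ORDER BY claim_id"
).fetchall()
print(rows_touched, claims)
```

Had the SSN itself been the primary key, the same change would have meant rewriting member_id in every related table.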

User · Answer

If you really want to read through all of the back and forth on this age-old debate, do a search for "natural key" on Stack Overflow. You should get back pages of results.

User · Answer

I suspect Steven A. Lowe's rolled-up newspaper therapy is required for the designer of the original data structure. As an aside, GUIDs as a primary key can be a performance hog. I wouldn't recommend it.

User · Answer

I follow a few rules:

Primary keys should be as small as necessary. Prefer a numeric type, because numeric types are stored in a much more compact format than character formats. This matters because most primary keys will be foreign keys in another table as well as used in multiple indexes. The smaller your key, the smaller the index, and the fewer pages in the cache you will use.
Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely to be used in multiple indexes and used as a foreign key. Updating a single primary key could cause a ripple effect of changes.
Do NOT use "your problem primary key" as your logic model primary key. For example, passport number, social security number, or employee contract number, as these "primary keys" can change in real-world situations.

On surrogate vs. natural key, I refer to the rules above. If the natural key is small and will never change, it can be used as a primary key. If the natural key is large or likely to change, I use surrogate keys. If there is no primary key, I still make a surrogate key, because experience shows you will always add tables to your schema and wish you'd put a primary key in place.

User · Answer

We do a lot of joins, and composite primary keys have just become a performance hog. A simple int or long takes care of many problems, even though you are introducing a second candidate key, but it's a lot easier and more understandable to join on one field versus three.

User · Answer

I look for natural primary keys and use them where I can. If no natural keys can be found, I prefer a GUID to an INT, because SQL Server uses trees, and it is bad to always add keys to the end in trees. On tables that are many-to-many couplings, I use a compound primary key of the foreign keys. Because I'm lucky enough to use SQL Server, I can study execution plans and statistics with the profiler and the query analyzer and find out how my keys are performing very easily.

User · Answer

There's no problem in making your primary key from various fields; that's a natural key. You can use an identity column (associated with a unique index on the candidate fields) to make a surrogate key. That's an old discussion. I prefer surrogate keys in most situations. But there's no excuse for the lack of a key. RE: EDIT. Yeah, there's a lot of controversy about that :D I don't see any obvious advantage to natural keys, besides the fact that they are the natural choice. You will always think in Name, SocialNumber (or something like that) instead of idPerson. Surrogate keys are the answer to some of the problems that natural keys have (propagating changes, for example). As you get used to surrogates, they seem cleaner and more manageable. But in the end, you'll find out that it's just a matter of taste, or mindset. Some people "think better" with natural keys, and others don't.

User · Answer

I too always use a numeric ID column. In Oracle I use NUMBER(18,0), for no real reason above NUMBER(12,0) (or whatever is an int rather than a long); maybe I just don't want to ever worry about getting a few billion rows in the db. I also include a created and a modified column (type timestamp) for basic tracking, where it seems useful. I don't mind setting up unique constraints on other combinations of columns, but I really like my id, created, modified baseline requirements.

User · Answer

Here are my own rules of thumb, which I have settled on after 25 years of development experience:

All tables should have a single-column primary key that auto-increments.
Include it in any view that is meant to be updateable.
The primary key should not have any meaning in the context of your application. This means that it should not be a SKU, or an account number, or an employee id, or any other information that is meaningful to your application. It is merely a unique key associated with an entity.
The primary key is used by the database for optimization purposes and should not be used by your application for anything more than identifying a particular entity or relating to a particular entity.
Always having a single-value primary key makes performing UPSERTs very straightforward.
Favor multiple indices on single columns over multi-column indices. For example, if you have a two-column key, favor creating an index on each column over creating a two-column index. If we create a multi-column key on (firstname, lastname), we can't do indexed lookups on lastname without providing a firstname as well. Having indices on both columns allows the optimizer to perform indexed lookups on either or both columns, regardless of how they are expressed in your WHERE clause.
If your tables are massive, explore partitioning the table into segments based on the most prominent search criteria.
If you have a table that has a significant number of Id fields in it, consider moving all except the primary key to a single table which has an id (PK), an org_id (FK to original table), and an id_type column. Create indices for all columns on the new table and relate it to the original table. In this manner, you can now perform indexed lookups of any number of ids using only a single index.
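
The firstname/lastname point above is easy to check for yourself. This sketch uses SQLite through Python's sqlite3 module, so the plans shown are SQLite's, and other engines (and other statistics) can behave differently. The person table and index names are made up. It asks the planner how it would run a lastname-only lookup, first with only the (firstname, lastname) index and then with an extra single-column index on lastname.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' text of the first EXPLAIN QUERY PLAN row."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id        INTEGER PRIMARY KEY,
        firstname TEXT,
        lastname  TEXT,
        notes     TEXT
    )
""")
conn.execute("CREATE INDEX ix_first_last ON person (firstname, lastname)")

query = "SELECT * FROM person WHERE lastname = 'Smith'"

# Only the two-column index exists; lastname is not its leading column.
plan_composite_only = plan(conn, query)

# Add a single-column index and ask again.
conn.execute("CREATE INDEX ix_last ON person (lastname)")
plan_with_lastname_index = plan(conn, query)

print(plan_composite_only)
print(plan_with_lastname_index)
```

On the SQLite builds I have tried, the first plan is a scan and the second is a search using ix_last. The exact wording of the plan text varies between SQLite versions, so treat the output as indicative only.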

User · Answer

Tables should have a primary key all the time. When a table doesn't have one, it should have been given an auto-increment field. Sometimes people omit the primary key because they transfer a lot of data and it might slow down the process (depending on the database), BUT it should be added afterwards. Someone commented about link tables: this is right, it's an exception, BUT the fields should be FKs to keep the integrity, and in some cases those fields can be primary keys too, if duplicates in links are not allowed. But to keep it simple, because exceptions are common in programming: a primary key should be present to keep the integrity of your data.

User · Answer

You should use a "composite" or "compound" primary key that comprises multiple fields. This is a perfectly acceptable solution; go here for more info.

User · Answer

Just an extra comment on something that is often overlooked: sometimes not using a surrogate key has benefits in the child tables. Let's say we have a design that allows you to run multiple companies within the one database (maybe it's a hosted solution, or whatever). Let's say we have these tables and columns:

Company
    CompanyId    (primary key)

CostCentre
    CompanyId    (primary key, foreign key to Company)
    CostCentre   (primary key)

CostElement
    CompanyId    (primary key, foreign key to Company)
    CostElement  (primary key)

Invoice
    InvoiceId    (primary key)
    CompanyId    (primary key, in foreign key to CostCentre, in foreign key to CostElement)
    CostCentre   (in foreign key to CostCentre)
    CostElement  (in foreign key to CostElement)

In case that last bit doesn't make sense, Invoice.CompanyId is part of two foreign keys, one to the CostCentre table and one to the CostElement table. The primary key is (InvoiceId, CompanyId). In this model, it's not possible to screw up and reference a CostElement from one company and a CostCentre from another company. If a surrogate key was used on the CostElement and CostCentre tables, it would be. The fewer chances to screw up, the better.
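
The model above can be tried out with a few lines of Python and SQLite. This is only a sketch: the snake_case names are my own rendering of the tables in the answer, and SQLite needs foreign key enforcement switched on explicitly. It shows that an invoice for company 1 cannot reference a cost element that exists only for company 2, because CompanyId is shared by both composite foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE company (company_id INTEGER PRIMARY KEY);
    CREATE TABLE cost_centre (
        company_id  INTEGER NOT NULL REFERENCES company(company_id),
        cost_centre TEXT NOT NULL,
        PRIMARY KEY (company_id, cost_centre)
    );
    CREATE TABLE cost_element (
        company_id   INTEGER NOT NULL REFERENCES company(company_id),
        cost_element TEXT NOT NULL,
        PRIMARY KEY (company_id, cost_element)
    );
    CREATE TABLE invoice (
        invoice_id   INTEGER NOT NULL,
        company_id   INTEGER NOT NULL,
        cost_centre  TEXT NOT NULL,
        cost_element TEXT NOT NULL,
        PRIMARY KEY (invoice_id, company_id),
        -- company_id takes part in both foreign keys
        FOREIGN KEY (company_id, cost_centre)
            REFERENCES cost_centre (company_id, cost_centre),
        FOREIGN KEY (company_id, cost_element)
            REFERENCES cost_element (company_id, cost_element)
    );
    INSERT INTO company VALUES (1), (2);
    INSERT INTO cost_centre VALUES (1, 'SALES');
    INSERT INTO cost_element VALUES (1, 'TRAVEL'), (2, 'RENT');
""")

# Everything belongs to company 1: accepted.
conn.execute("INSERT INTO invoice VALUES (500, 1, 'SALES', 'TRAVEL')")

try:
    # 'RENT' only exists for company 2, so this row has no valid parent.
    conn.execute("INSERT INTO invoice VALUES (501, 1, 'SALES', 'RENT')")
    cross_company_rejected = False
except sqlite3.IntegrityError:
    cross_company_rejected = True

invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
print(cross_company_rejected, invoice_count)
```

With single-column surrogate keys on cost_centre and cost_element, the invoice would carry two unrelated ids and the database would have no way to notice that they belong to different companies.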

User · Answer

I'll be up-front about my preference for natural keys: use them where possible, as they'll make your life of database administration a lot easier. I established a standard in our company that all tables have the following columns:

Row ID (GUID)
Creator (string; has a default of the current user's name, SUSER_SNAME() in T-SQL)
Created (DateTime)
Timestamp

Row ID has a unique key on it per table, and in any case is auto-generated per row (and permissions prevent anyone editing it), and is reasonably guaranteed to be unique across all tables and databases. If any ORM systems need a single ID key, this is the one to use. Meanwhile, the actual PK is, if possible, a natural key. My internal rules are something like:

People: use a surrogate key, e.g. INT. If it's internal, the Active Directory user GUID is an acceptable choice.
Lookup tables (e.g. StatusCodes): use a short CHAR code; it's easier to remember than INTs, and in many cases the paper forms and users will also use it for brevity (e.g. Status = "E" for "Expired", "A" for "Approved", "NADIS" for "No Asbestos Detected In Sample").
Linking tables: a combination of FKs (e.g. EventId, AttendeeId).

So ideally you end up with a natural, human-readable and memorable PK, and an ORM-friendly one-ID-per-table GUID. Caveat: the databases I maintain tend to the 100,000s of records rather than millions or billions, so if you have experience of larger systems which contraindicates my advice, feel free to ignore me.
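
As an illustration of the linking-table rule above, here is a short sketch with Python's sqlite3 module. The event, attendee and event_attendee tables are hypothetical, loosely based on the EventId/AttendeeId example. The composite primary key made of the two foreign keys is what stops the same attendee being linked to the same event twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE event (event_id INTEGER PRIMARY KEY);
    CREATE TABLE attendee (attendee_id INTEGER PRIMARY KEY);
    CREATE TABLE event_attendee (
        event_id    INTEGER NOT NULL REFERENCES event(event_id),
        attendee_id INTEGER NOT NULL REFERENCES attendee(attendee_id),
        PRIMARY KEY (event_id, attendee_id)   -- the key is the pair of FKs
    );
    INSERT INTO event VALUES (1);
    INSERT INTO attendee VALUES (7), (8);
    INSERT INTO event_attendee VALUES (1, 7), (1, 8);
""")

try:
    # Attendee 7 is already registered for event 1.
    conn.execute("INSERT INTO event_attendee VALUES (1, 7)")
    duplicate_link_rejected = False
except sqlite3.IntegrityError:
    duplicate_link_rejected = True

link_count = conn.execute(
    "SELECT COUNT(*) FROM event_attendee"
).fetchone()[0]
print(duplicate_link_rejected, link_count)
```

If you add a separate surrogate id to a table like this, you would still want a unique constraint on the pair to get the same protection.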
