Strings as Primary Keys in SQL Database

Question

I am not very familiar with databases and the theories behind how they work   Is it any slower from a performance standpoint  inserting updating querying  to use Strings for Primary Keys than integers

User · Answer

Yes  but unless you expect to have millions of rows  not using a string-based key because it s slower is usually  premature optimization   After all  strings are stored as big numbers while numeric keys are usually stored as smaller numbers   One thing to watch out for  though  is if you have clustered indices on a any key and are doing large numbers of inserts that are non-sequential in the index  Every line written will cause the index to re-write  if you re doing batch inserts  this can really slow the process down

User · Answer

What is your reason for having a string as a primary key   I would just set the primary key to an auto incrementing integer field  and put an index on the string field   That way if you do searches on the table they should be relatively fast  and all of your joins and normal look ups will be unaffected in their speed   You can also control the amount of the string field that gets indexed   In other words  you can say  only index the first 5 characters  if you think that will be enough   Or if your data can be relatively similar  you can index the whole field

User · Answer

Another issue with using Strings as a primary key is that because the index is constantly put into sequential order  when a new key is created that would be in the middle of the order the index has to be resequenced    if you use an auto number integer  the new key is just added to the end of the index

User · Answer

From performance standpoint - Yes string PK  will slow down the performance when compared to performance achieved using an integer PK   where PK ---  Primary Key   From requirement standpoint - Although this is not a part of your question still I would like to mention  When we are handling huge data across different tables we generally look for the probable set of keys that can be set for a particular table  This is primarily because there are many tables and mostly each or some table would be related to the other through some relation   a concept of Foreign Key    Therefore we really cannot always choose an integer as a Primary Key  rather we go for a combination of 3  4 or 5 attributes as the primary key for that tables  And those keys can be used as a foreign key when we would relate the records with some other table  This makes it useful to relate the records across different tables when required   Therefore for Optimal Usage - We always make a combination of 1 or 2 integers with 1 or 2 string attributes  but again only if it is required

User · Answer

Technically yes  but if a string makes sense to be the primary key then you should probably use it   This all depends on the size of the table you re making it for and the length of the string that is going to be the primary key  longer strings    harder to compare   I wouldn t necessarily use a string for a table that has millions of rows  but the amount of performance slowdown you ll get by using a string on smaller tables will be minuscule to the headaches that you can have by having an integer that doesn t mean anything in relation to the data

User · Answer

By default ASPNetUserIds are 128 char strings and performance is just fine   If the key HAS to be unique in the table it should be the Key  Here s why    primary string key   Correct DB relationships  1 string key The primary   and 1 string Index The Primary     The other option is a typical int Key  but if the string HAS to be unique you ll still probably need to add an index because of non-stop queries to validate or check that its unique   So using an int identity key   Incorrect DB Relationships  1 int key Primary   1 int index Primary   Probably a unique string Index  and manually having to validate the same string doesn t exist something like a sql check maybe    To get better performance using an int over a string for the primary key  when the string HAS to be unique  it would have to be a very odd situation  I ve always preferred to use string keys  And as a good rule of thumb  don t denormalize a database until you NEED to

User · Answer

I would probably use an integer as your primary key  and then just have your string  I assume it s some sort of ID  as a separate column  create table sample     sample pk             INT NOT NULL AUTO INCREMENT    sample id             VARCHAR 100  NOT NULL          PRIMARY KEY sample pk      You can always do queries and joins conditionally on the string  ID  column  where sample id

User · Answer

Don t worry about performance until you have got a simple and sound design that agrees with the subject matter that the data describes and fits well with the intended use of the data   Then  if performance problems emerge   you can deal with them by tweaking the system   In this case  it s almost always better to go with a string as a natural primary key   provide you can trust it   Don t worry if it s a string  as long as the string is reasonably short   say about 25 characters max  You won t pay a big price in terms of performance   Do the data entry people or automatic data sources always provide a value for the supposed natural key  or is sometimes omitted   Is it occasionally wrong in the input data   If so  how are errors detected and corrected     Are the programmers and interactive users who specify queries able to use the natural key to get what they want     If you can t trust the natural key  invent a surrogate   If you invent a surrogate  you might as well invent an integer   Then you have to worry about whther to conceal the surrogate from the user community  Some developers who didn t conceal the surrogate key came to regret it

User · Answer

Two reasons to use integers for PK columns    We can set identity for integer field which incremented automatically  When we create PKs  the db creates an index  Cluster or Non Cluster  which sorts the data before it s stored in the table  By using an identity on a PK  the optimizer need not check the sort order before saving a record  This improves performance on big tables

User · Answer

It doesn t matter what you use as a primary key so long as it is UNIQUE  If you care about speed or good database design use the int unless you plan on replicating data  then use a GUID   If this is an access database or some tiny app then who really cares  I think the reason why most of us developers slap the old int or guid at the front is because projects have a way of growing on us  and you want to leave yourself the option to grow

User · Answer

Too many variables   It depends on the size of the table  the indexes  nature of the string key domain     Generally  integers will be faster   But will the difference be large enough to care  It s hard to say   Also  what is your motivation for choosing strings   Numeric auto-increment keys are often so much easier as well   Is it semantics   Convenience   Replication disconnected concerns   Your answer here could limit your options   This also brings to mind a third  hybrid  option you re forgetting  Guids

User · Answer

Strings are slower in joins and in real life they are very rarely really unique  even when they are supposed to be   The only advantage is that they can reduce the number of joins if you are joining to the primary table only to get the name  However  strings are also often subject to change thus creating the problem of having to fix all related records when the company name changes or the person gets married  This can be a huge performance hit and if all tables that should be related somehow are not related  this happens more often than you think   then you might have data mismatches as well   An integer that will never change through the life of the record is a far safer choice from a data integrity standpoint as well as from a performance standpoint  Natural keys are usually not so good for maintenance of the data   I also want to point out that the best of both worlds is often to use an autoincrementing key  or in some specialized cases  a GUID  as the PK and then  put a unique index on the natural key  You get the faster joins  you don t get duplicate records  and you don t have to update a million child records because a company name changed

User · Answer

There could be a very big misunderstanding related to string in the database are  Almost everyone has thought that database representation of numbers are more compact than for strings  They think that in db-s numbers are represented as in the memory  BUT it is not true  In most cases number representation is more close to A string like representation as to other    The speed of using number or string is more dependent on the indexing then the type itself

User · Answer

Indices imply lots of comparisons   Typically  strings are longer than integers and collation rules may be applied for comparison  so comparing strings is usually more computationally intensive task than comparing integers    Sometimes  though  it s faster to use a string as a primary key than to make an extra join with a string to numerical id table

User · Answer

Inserts to a table having a clustered index where the insertion occurs in the middle of the sequence DOES NOT cause the index to be rewritten  It does not cause the pages comprising the data to be rewritten  If there is room on the page where the row will go  then it is placed in that page  The single page will be reformatted to place the row in the right place in the page  When the page is full  a page split will happen  with half of the rows on the page going to one page  and half going on the other  The pages are then relinked into the linked list of pages that comprise a tables data that has the clustered index  At most  you will end up writing 2 pages of database

[sql] Strings as Primary Keys in SQL Database

Examples related to sql

Examples related to database

Examples related to database-design

Examples related to string

Examples related to primary-key