[mysql] Is there a REAL performance difference between INT and VARCHAR primary keys?

Is there a measurable performance difference between using INT vs. VARCHAR as a primary key in MySQL? I'd like to use VARCHAR as the primary key for reference lists (think US States, Country Codes) and a coworker won't budge on the INT AUTO_INCREMENT as a primary key for all tables.

My argument, as detailed here, is that the performance difference between INT and VARCHAR is negligible: every INT foreign key reference requires a JOIN to make sense of the reference, whereas a VARCHAR key presents the information directly.
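
To make the two layouts concrete, here is a minimal sketch of the two designs being compared (table and column names are mine, purely illustrative):

-- Design A: INT surrogate key; reading an address requires a JOIN to see the state
CREATE TABLE states_a (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    code CHAR(2) NOT NULL UNIQUE,           -- e.g. 'CA'
    name VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE addresses_a (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    street   VARCHAR(255),
    state_id INT NOT NULL,
    FOREIGN KEY (state_id) REFERENCES states_a(id)
) ENGINE=InnoDB;

-- Design B: natural key; the foreign key column is already meaningful on its own
CREATE TABLE states_b (
    code CHAR(2) PRIMARY KEY,                -- 'CA', 'NY', ...
    name VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE addresses_b (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    street     VARCHAR(255),
    state_code CHAR(2) NOT NULL,
    FOREIGN KEY (state_code) REFERENCES states_b(code)
) ENGINE=InnoDB;

With design B, a query that only needs the state code never touches the states table; with design A it always does.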

So, does anyone have experience with this particular use-case and the performance concerns associated with it?

Tags: mysql, performance, primary-key, innodb, myisam


It's not about performance. It's about what makes a good primary key: unique and unchanging over time. You may think an entity such as a country code never changes over time and would be a good candidate for a primary key, but bitter experience shows that is seldom the case.

INT AUTO_INCREMENT meets the "unique and unchanging over time" condition. Hence the preference.


I faced the same dilemma. I built a DW (constellation schema) with 3 fact tables, Road Accidents, Vehicles in Accidents, and Casualties in Accidents, plus 60 dimension tables. The data covers all accidents recorded in the UK from 1979 to 2012. Altogether, about 20 million records.

Fact tables relationships:

+----------+          +---------+
| Accident |>--------<| Vehicle |
+-----v----+ 1      * +----v----+
     1|                    |1
      |    +----------+    |
      +---<| Casualty |>---+
         * +----------+ *

RDBMS: MySQL 5.6

Natively, the Accident index is a 15-character varchar (numbers and letters). I tried to avoid surrogate keys, since the accident indexes would never change. On an i7 (8-core) machine, the DW became too slow to query after loading 12 million records, depending on the dimensions. After a lot of rework and adding BIGINT surrogate keys, I got an average 20% speed boost. Still a low performance gain, but a valid try. I'm now working on MySQL tuning and clustering.


As usual, there are no blanket answers. 'It depends!' and I am not being facetious. My understanding of the original question was about keys on small tables - like Country (integer id or char/varchar code) - used as a foreign key into a potentially huge table like an address/contact table.

There are two scenarios here when you want data back from the DB. The first is a list/search kind of query where you want to list all the contacts with state and country codes or names (ids will not help and hence will need a lookup). The other is a get scenario on the primary key which shows a single contact record where the state and country names need to be shown.

For the latter (get), it probably does not matter what the FK is based on, since we are bringing tables together for a single record, or a few records, via key reads. The former (search or list) scenario may be impacted by our choice. Since it is required to show country (at least a recognizable code, and perhaps even the search itself includes a country code), not having to join another table through a surrogate key can potentially (I am just being cautious here because I have not actually tested this, but it seems highly probable) improve performance; and it certainly helps with the search.
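
As a rough illustration of that list/search scenario (hypothetical contacts/countries tables, not something I have benchmarked), the surrogate version needs a JOIN just to expose the code, while the natural-key version does not:

-- Surrogate FK: the country code has to be joined in to be shown or filtered on
SELECT c.name, co.iso_code
FROM contacts c
JOIN countries co ON co.id = c.country_id
WHERE co.iso_code = 'US';

-- Natural-key FK: the code is already in the contact row, so no JOIN is needed
SELECT c.name, c.country_code
FROM contacts c
WHERE c.country_code = 'US';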

Since codes are small in size - usually no more than 3 characters for country and state - it may be okay to use the natural keys as foreign keys in this scenario.

In the other scenario, where keys depend on longer varchar values and perhaps on larger tables, the surrogate key probably has the advantage.


Allow me to say yes, there is definitely a difference, taking into consideration the scope of performance (out-of-the-box definition):

1- Using a surrogate int is faster in the application because you do not need to use ToUpper(), ToLower(), ToUpperInvariant(), or ToLowerInvariant() in your code or in your query, and these 4 functions have different performance benchmarks. See Microsoft's performance rules on this. (performance of application)

2- Using a surrogate int guarantees not changing the key over time. Even country codes may change; see Wikipedia for how ISO codes have changed over time. Changing the primary key throughout the subtrees would take a lot of time. (performance of data maintenance)

3- It seems there are issues with ORM solutions, such as NHibernate, when the PK/FK is not an int. (developer performance)


At HauteLook, we changed many of our tables to use natural keys. We did experience a real-world increase in performance. As you mention, many of our queries now use fewer joins, which makes them more performant. We will even use a composite primary key if it makes sense. That being said, some tables are just easier to work with if they have a surrogate key.

Also, if you are letting people write interfaces to your database, a surrogate key can be helpful. The 3rd party can rely on the fact that the surrogate key will change only in very rare circumstances.


Depends on the length. If the varchar is 20 characters and the int is 4 bytes, then using the int gives your index FIVE times as many nodes per page of index space on disk, which means traversing the index will require one-fifth as many physical and/or logical reads.

So, if performance is an issue, given the opportunity, always use an integer, non-meaningful key (called a surrogate) for your tables and for the foreign keys that reference the rows in those tables.

At the same time, to guarantee data consistency, every table where it matters should also have a meaningful, non-numeric alternate key (or unique index) to ensure that duplicate rows cannot be inserted (duplicates based on meaningful table attributes).
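
For example, a minimal sketch of that pattern (surrogate primary key plus a unique natural alternate key; names are illustrative):

CREATE TABLE country (
    id   INT AUTO_INCREMENT PRIMARY KEY,   -- surrogate key referenced by foreign keys
    code CHAR(2) NOT NULL,                  -- meaningful natural key
    name VARCHAR(100) NOT NULL,
    UNIQUE KEY uk_country_code (code)       -- alternate key prevents duplicate codes
) ENGINE=InnoDB;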

For the specific use you are talking about (like state lookups), it really doesn't matter because the table is so small. In general there is no impact on performance from indexes on tables with fewer than a few thousand rows.


Not sure about the performance implications, but it seems a possible compromise, at least during development, would be to include both the auto-incremented, integer "surrogate" key, as well as your intended, unique, "natural" key. This would give you the opportunity to evaluate performance, as well as other possible issues, including the changeability of natural keys.


As for the primary key, whatever physically makes a row unique should be chosen as the primary key.

For a reference as a foreign key, using an auto-incrementing integer as a surrogate is a nice idea for two main reasons.
- First, there's usually less overhead incurred in the join.
- Second, if you need to update the table that contains the unique varchar, then the update has to cascade down to all the child tables and update all of them as well as their indexes, whereas with the int surrogate it only has to update the master table and its indexes (see the sketch below).
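
A sketch of that cascading case (hypothetical tables, just to illustrate the point): with a natural varchar key the change must propagate to every child row, while with a surrogate it stays in one place.

-- Natural key as PK/FK: renaming a category rewrites every referencing row
CREATE TABLE category_nat (
    name VARCHAR(50) PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE product_nat (
    id            INT AUTO_INCREMENT PRIMARY KEY,
    category_name VARCHAR(50) NOT NULL,
    FOREIGN KEY (category_name) REFERENCES category_nat(name)
        ON UPDATE CASCADE   -- the rename is pushed into every product row and its indexes
) ENGINE=InnoDB;

UPDATE category_nat SET name = 'Gadgets' WHERE name = 'Widgets';  -- cascades into product_nat

-- Surrogate key: the same rename is a single-row update
CREATE TABLE category_sur (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE
) ENGINE=InnoDB;

UPDATE category_sur SET name = 'Gadgets' WHERE name = 'Widgets';  -- any child rows referencing id are untouched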

The drawback to using the surrogate is that you could possibly allow the meaning behind the surrogate to change:

ex.
id value
1 A
2 B
3 C

Update 3 to D
id value
1 A
2 B
3 D

Update 2 to C
id value
1 A
2 C
3 D

Update 3 to B
id value
1 A
2 C
3 B

It all depends on what you really need to worry about in your structure and what means most.


I was a bit annoyed by the lack of benchmarks for this online, so I ran a test myself.

Note though that I don't do this on a regular basis, so please check my setup and steps for any factors that could have influenced the results unintentionally, and post your concerns in comments.

The setup was as follows:

  • Intel® Core™ i7-7500U CPU @ 2.70GHz × 4
  • 15.6 GiB RAM, of which I ensured around 8 GB was free during the test.
  • 148.6 GB SSD drive, with plenty of free space.
  • Ubuntu 16.04 64-bit
  • MySQL Ver 14.14 Distrib 5.7.20, for Linux (x86_64)

The tables:

create table jan_int (data1 varchar(255), data2 int(10), myindex tinyint(4)) ENGINE=InnoDB;
create table jan_int_index (data1 varchar(255), data2 int(10), myindex tinyint(4), INDEX (myindex)) ENGINE=InnoDB;
create table jan_char (data1 varchar(255), data2 int(10), myindex char(6)) ENGINE=InnoDB;
create table jan_char_index (data1 varchar(255), data2 int(10), myindex char(6), INDEX (myindex)) ENGINE=InnoDB;
create table jan_varchar (data1 varchar(255), data2 int(10), myindex varchar(63)) ENGINE=InnoDB;
create table jan_varchar_index (data1 varchar(255), data2 int(10), myindex varchar(63), INDEX (myindex)) ENGINE=InnoDB;

Then, I filled 10 million rows in each table with a PHP script whose essence is like this:

$pdo = get_pdo();

$keys = [ 'alabam', 'massac', 'newyor', 'newham', 'delawa', 'califo', 'nevada', 'texas_', 'florid', 'ohio__' ];

for ($k = 0; $k < 10; $k++) {
    for ($j = 0; $j < 1000; $j++) {
        $val = '';
        for ($i = 0; $i < 1000; $i++) {
            $val .= '("' . generate_random_string() . '", ' . rand (0, 10000) . ', "' . ($keys[rand(0, 9)]) . '"),';
        }
        $val = rtrim($val, ',');
        $pdo->query('INSERT INTO jan_char VALUES ' . $val);
    }
    echo "\n" . ($k + 1) . ' million(s) rows inserted.';
}

For int tables, the bit ($keys[rand(0, 9)]) was replaced with just rand(0, 9), and for varchar tables, I used full US state names, without cutting or extending them to 6 characters. generate_random_string() generates a 10-character random string.

Then I ran in MySQL:

  • SET SESSION query_cache_type=0;
  • For jan_int table:
    • SELECT count(*) FROM jan_int WHERE myindex = 5;
    • SELECT BENCHMARK(1000000000, (SELECT count(*) FROM jan_int WHERE myindex = 5));
  • For other tables, same as above, with myindex = 'califo' for char tables and myindex = 'california' for varchar tables.
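
Spelled out (this just restates the queries described in the list above; the _index tables follow the same pattern), the char and varchar variants would be:

SELECT count(*) FROM jan_char WHERE myindex = 'califo';
SELECT BENCHMARK(1000000000, (SELECT count(*) FROM jan_char WHERE myindex = 'califo'));

SELECT count(*) FROM jan_varchar WHERE myindex = 'california';
SELECT BENCHMARK(1000000000, (SELECT count(*) FROM jan_varchar WHERE myindex = 'california'));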

Times of the BENCHMARK query on each table:

  • jan_int: 21.30 sec
  • jan_int_index: 18.79 sec
  • jan_char: 21.70 sec
  • jan_char_index: 18.85 sec
  • jan_varchar: 21.76 sec
  • jan_varchar_index: 18.86 sec

Regarding table & index sizes, here's the output of show table status from janperformancetest; (w/ a few columns not shown):

|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Name              | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Collation              |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| jan_int           | InnoDB |      10 | Dynamic    | 9739094 |             43 |   422510592 |               0 |            0 |   4194304 |           NULL | utf8mb4_unicode_520_ci |  
| jan_int_index     | InnoDB |      10 | Dynamic    | 9740329 |             43 |   420413440 |               0 |    132857856 |   7340032 |           NULL | utf8mb4_unicode_520_ci |   
| jan_char          | InnoDB |      10 | Dynamic    | 9726613 |             51 |   500170752 |               0 |            0 |   5242880 |           NULL | utf8mb4_unicode_520_ci |  
| jan_char_index    | InnoDB |      10 | Dynamic    | 9719059 |             52 |   513802240 |               0 |    202342400 |   5242880 |           NULL | utf8mb4_unicode_520_ci |  
| jan_varchar       | InnoDB |      10 | Dynamic    | 9722049 |             53 |   521142272 |               0 |            0 |   7340032 |           NULL | utf8mb4_unicode_520_ci |   
| jan_varchar_index | InnoDB |      10 | Dynamic    | 9738381 |             49 |   486539264 |               0 |    202375168 |   7340032 |           NULL | utf8mb4_unicode_520_ci | 
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

My conclusion is that there's no performance difference for this particular use case.


Absolutely not.

I have done several... several... performance checks between INT, VARCHAR, and CHAR.

A 10-million-record table with a PRIMARY KEY (unique and clustered) had exactly the same speed and performance (and subtree cost) no matter which of the three I used.

That being said... use whatever is best for your application. Don't worry about the performance.


For short codes, there's probably no difference. This is especially true because the tables holding these codes are likely to be very small (a couple thousand rows at most) and don't change often (when was the last time we added a new US state?).

For larger tables with a wider variation among the keys, this can be dangerous. Think about using the e-mail address/user name from a User table, for example. What happens when you have a few million users and some of those users have long names or e-mail addresses? Now any time you need to join to this table using that key, it becomes much more expensive.


The question is about MySQL, so I say there is a significant difference. If it were about Oracle (which stores numbers as strings - yes, I couldn't believe it at first), then not much difference.

Storage in the table is not the issue but updating and referring to the index is. Queries involving looking up a record based on its primary key are frequent - you want them to occur as fast as possible because they happen so often.

The thing is, a CPU deals with 4-byte and 8-byte integers natively, in silicon. It's REALLY fast for it to compare two integers - it happens in one or two clock cycles.

Now look at a string - it's made up of lots of characters (more than one byte per character these days). Comparing two strings for precedence can't be done in one or two cycles. Instead the strings' characters must be iterated until a difference is found. I'm sure there are tricks to make it faster in some databases but that's irrelevant here because an int comparison is done naturally and lightning fast in silicon by the CPU.

My general rule: every primary key should be an auto-incrementing INT, especially in OO apps using an ORM (Hibernate, DataNucleus, whatever) where there are lots of relationships between objects - they'll usually be implemented as a simple FK, and the ability of the DB to resolve those quickly is important to your app's responsiveness.


Common cases where a surrogate AUTO_INCREMENT hurts:

A common schema pattern is a many-to-many mapping:

CREATE TABLE map (
    id ... AUTO_INCREMENT,
    foo_id ...,
    bar_id ...,
    PRIMARY KEY(id),
    UNIQUE(foo_id, bar_id),
    INDEX(bar_id) );

Performance of this pattern is much better, especially when using InnoDB:

CREATE TABLE map (
    # No surrogate
    foo_id ...,
    bar_id ...,
    PRIMARY KEY(foo_id, bar_id),
    INDEX      (bar_id, foo_id) );

Why?

  • InnoDB secondary keys need an extra lookup; by moving the pair into the PK, that is avoided for one direction.
  • The secondary index is "covering", so it does not need the extra lookup.
  • This table is smaller because of getting rid of id and one index.

Another case (country):

country_id INT ...
-- versus
country_code CHAR(2) CHARACTER SET ascii

All too often the novice normalizes country_code into a 4-byte INT instead of using a 'natural', nearly-unchanging 2-byte string. Faster, smaller, fewer JOINs, more readable.

