[database] What's wrong with foreign keys?

I remember hearing Joel Spolsky mention in podcast 014 that he'd barely ever used a foreign key (if I remember correctly). However, to me they seem pretty vital to avoid duplication and subsequent data integrity problems throughout your database.

Do people have some solid reasons as to why (to avoid a discussion in lines with Stack Overflow principles)?

Edit: "I've yet to have a reason to create a foreign key, so this might be my first reason to actually set up one."

The answer is


Update: I always use foreign keys now. My answer to the objection "they complicated testing" is "write your unit tests so they don't need the database at all. Any tests that use the database should use it properly, and that includes foreign keys. If the setup is painful, find a less painful way to do the setup."


Foreign keys complicate automated testing

Suppose you're using foreign keys. You're writing an automated test that says "when I update a financial account, it should save a record of the transaction." In this test, you're only concerned with two tables: accounts and transactions.

However, accounts has a foreign key to contracts, and contracts has a fk to clients, and clients has a fk to cities, and cities has a fk to states.

Now the database will not allow you to run your test without setting up data in four tables that aren't related to your test.

There are at least two possible perspectives on this:

  • "That's a good thing: your test should be realistic, and those data constraints will exist in production."
  • "That's a bad thing: you should be able to unit test pieces of the system without involving other pieces. You can add integration tests for the system as a whole."

It may also be possible to temporarily turn off foreign key checks while running tests. MySQL, at least, supports this.


The argument I have heard is that the front-end should have these business rules. Foreign keys "add unnecessary overhead" when you shouldn't be allowing any insertions that break your constraints in the first place. Do I agree with this? No, but that is what I have always heard.

EDIT: My guess is he was referring to foreign key constraints, not foreign keys as a concept.


I always use them, but then I make databases for financial systems. The database is the critical part of the application. If the data in a financial database isn't totally accurate then it really doesn't matter how much effort you put into your code/front-end design. You're just wasting your time.

There's also the fact that multiple systems generally need to interface directly with the database - from other systems that just read data out (Crystal Reports) to systems that insert data (not necessarily using an API I've designed; it may be written by a dull-witted manager who has just discovered VBScript and has the SA password for the SQL box). If the database isn't as idiot-proof as it can possibly be, well - bye bye database.

If your data is important, then yes, use foreign keys, create a suite of stored procedures to interact with the data, and make the toughest DB you can. If your data isn't important, why are you making a database to begin with?


There's one good reason not to use them: If you don't understand their role or how to use them.

In the wrong situations, foreign key constraints can lead to waterfall replication of accidents. If somebody removes the wrong record, undoing it can become a mammoth task.

Also, conversely, when you need to remove something, if poorly designed, constraints can cause all sorts of locks that prevent you.


Quite often we receive the errors with FK constraints Cannot add or update a child row: a foreign key constraint fails Suppose there are two tables inventory_source and contract_lines, and we are referring inventory_source_id in contract_lines from inventory_source and suppose we want to delete record from inventory_source and the record is already present in contract_lines or we want to delete the PK column from Base table, we get errors for FK constraints, we can avoid it using the steps jotted below.

CREATE TABLE inventory_source (
inventory_source_id int(11) NOT NULL AUTO_INCREMENT,
display_name varchar(40) NOT NULL,
state_id int(11) NOT NULL,
PRIMARY KEY (inventory_source_id),
KEY state_id (state_id),
CONSTRAINT ba_inventory_source_state_fk FOREIGN KEY (state_id) REFERENCES   ba_state (state_id)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;

CREATE TABLE contract_lines(
contract_line_id int(11) NOT NULL AUTO_INCREMENT,
inventory_source_id int(11) NULL ,
PRIMARY KEY (contract_line_id),
UNIQUE KEY contract_line_id (contract_line_id),
KEY AI_contract_line_id (contract_line_id),
KEY contract_lines_inventory_source_fk (inventory_source_id),
CONSTRAINT contract_lines_inventory_source_fk FOREIGN KEY       (inventory_source_id) REFERENCES ba_inventory_source (inventory_source_id)
) ENGINE=InnoDB AUTO_INCREMENT=135 DEFAULT CHARSET=utf8 ;

We can overcome it using the following steps:-

  1. Delete or update the row from the inventory_source will automatically delete or update the matching rows in the contract_lines table and this is known as cascade delete or update.
  2. Another way of doing it is setting the column i.e inventory_source_id in the contract_lines table to NULL, when a record corresponding to it is deleted in the inventory_source table.
  3. We can restrict the parent table for delete or update in other words one can reject the delete or update operation for the inventory_source table.
  4. Attempt to delete or update a primary key value will not be permitted to proceed if there is a related foreign key value in the referenced table.

@imphasing - this is exactly the kind of mindset that causes maintenance nightmares.

Why oh why would you ignore declarative referential integrity, where the data can be guaranteed to be at least consistent, in favour of so called "software enforcement" which is a weak preventative measure at best.


Like many things, it's a tradeoff. It's a question of where you want to do the work to verify the data integrity:

(1) use a foreign key (a single point to configure for a table, feature is already implemented, tested, proven to work)

(2) leave it to the users of the database (possible multiple users/apps updating the same table (s) meaning more potential points of failure and increased complexity in testing).

It's more efficient for the database to do (2), easier to maintain and less risk with (1).


Many of the people answering here get too hung up on the importance of referential integrity implemented via referential constraints. Working on large databases with referential integrity just does not perform well. Oracle seems particularly bad at cascading deletes. My rule of thumb is that applications should never update the database directly and should be via a stored procedure. This keeps the code base inside the database, and means that the database maintains its integrity.

Where many applications may be accessing the database, problems do arise because of referential integrity constraints but this is down to a control.

There is a wider issue too in that, application developers may have very different requirements that database developers may not necessarily be that familiar with.


I always thought it was lazy not to use them. I was taught it should always be done. But then, I didnt listen to Joel's discussion. He may have had a good reason, I don't know.


I can see a few reasons to use foreign keys (Orphaned rows, as someone mentioned, are annoying) but I never use them either. With a relatively sane DB schema, I don't think they are 100% needed. Constraints are good, but enforcing them via software is a better method, I think.

Alex


In a project I worked on there was often implicit rather than explicit relationships so that numerous tables could be joined on the same column.

Take the following table

Address

  • AddressId (PK)
  • EntityId
  • EntityType
  • City
  • State
  • Country
  • Etc..

Possible values of EntityType may be Employee, Company, Customer, and the EntityId refers to the primarky key of whichever table you were interested in.

I don't really think this is the best way to do things, but it worked for this project.


I'm sure there are plenty of applications where you can get away with it, but it's not the best idea. You can't always count on your application to properly manage your database, and frankly managing the database should not be of very much concern to your application.

If you are using a relational database then it seems you ought to have some relationships defined in it. Unfortunately this attitude (you don't need foreign keys) seems to be embraced by a lot of application developers who would rather not be bothered with silly things like data integrity (but need to because their companies don't have dedicated database developers). Usually in databases put together by these types you are lucky just to have primary keys ;)


Foreign keys are essential to any relational database model.


I know only Oracle databases, no other ones, and I can tell that Foreign Keys are essential for maintaining data integrity. Prior to inserting data, a data structure needs to be made, and be made correctlty. When that is done - and thus all primary AND foreign keys are created - the work is done !

Meaning : orphaned rows ? No. Never seen that in my life. Unless a bad programmer forgot the foreign key, or if he implemented that on another level. Both are - in context of Oracle - huge mistakes, which will lead to data duplication, orphan data, and thus : data corruption. I can't imagine a database without FK enforced. It looks like chaos to me. It's a bit like the Unix permission system : imagine that everybody is root. Think of the chaos.

Foreign Keys are essential, just like Primary Keys. It's like saying : what if we removing Primary Keys ? Well, total chaos is going to happen. That's what. You may not move the primary or foreign key responsibility to the programming level, it must be at the data level.

Drawbacks ? Yes, absolutely ! Because on insert, a lot more checks are going to be happening. But, if data integrity is more important than performance, it's a no-brainer. The problem with performance on Oracle is more related to indexes, which come with PK and FK's.


I also think that foreign keys are a necessity in most databases. The only drawback (besides the performance hit that comes with having enforced consistence) is that having a foreign key allows people to write code that assumes there is a functional foreign key. That should never be allowed.

For example, I've seen people write code that inserts into the referenced table and then attempts inserts into the referencing table without verifying the first insert was successful. If the foreign key is removed at a later time, that results in an inconsistent database.

You also don't have the option of assuming a specific behavior on update or delete. You still need to write your code to do what you want regardless of whether there is a foreign key present. If you assume deletes are cascaded when they are not, your deletes will fail. If you assume updates to the referenced columns are propogated to the referencing rows when they are not, your updates will fail. For the purposes of writing code, you might as well not have those features.

If those features are turned on, then your code will emulate them anyway and you'll lose a little performance.

So, the summary.... Foreign keys are essential if you need a consistent database. Foreign keys should never be assumed to be present or functional in code that you write.


Foreign keys are essential to any relational database model.


Additional Reason to use Foreign Keys: - Allows greater reuse of a database

Additional Reason to NOT use Foreign Keys: - You are trying to lock-in a customer into your tool by reducing reuse.


"Before adding a record, check that a corresponding record exists in another table" is business logic.

Here are some reasons you don't want this in the database:

  1. If the business rules change, you have to change the database. The database will need to recreate the index in a lot of cases and this is slow on large tables. (Changing rules include: allow guests to post messages or allow users to delete their account despite having posted comments, etc).

  2. Changing the database is not as easy as deploying a software fix by pushing the changes to the production repository. We want to avoid changing the database structure as much as possible. The more business logic there is in the database the more you increase the chances of needing to change the databae (and triggering re-indexing).

  3. TDD. In unit tests you can substitute the database for mocks and test the functionality. If you have any business logic in your database, you are not doing complete tests and would need to either test with the database or replicate the business logic in code for testing purposes, duplicating the logic and increasing the likelyhood of the logic not working in the same way.

  4. Reusing your logic with different data sources. If there is no logic in the database, my application can create objects from records from the database, create them from a web service, a json file or any other source. I just need to swap out the data mapper implementation and can use all my business logic with any source. If there is logic in the database, this isn't possible and you have to implement the logic at the data mapper layer or in the business logic. Either way, you need those checks in your code. If there's no logic in the database I can deploy the application in different locations using different database or flat-file implementations.


Additional Reason to use Foreign Keys: - Allows greater reuse of a database

Additional Reason to NOT use Foreign Keys: - You are trying to lock-in a customer into your tool by reducing reuse.


I also think that foreign keys are a necessity in most databases. The only drawback (besides the performance hit that comes with having enforced consistence) is that having a foreign key allows people to write code that assumes there is a functional foreign key. That should never be allowed.

For example, I've seen people write code that inserts into the referenced table and then attempts inserts into the referencing table without verifying the first insert was successful. If the foreign key is removed at a later time, that results in an inconsistent database.

You also don't have the option of assuming a specific behavior on update or delete. You still need to write your code to do what you want regardless of whether there is a foreign key present. If you assume deletes are cascaded when they are not, your deletes will fail. If you assume updates to the referenced columns are propogated to the referencing rows when they are not, your updates will fail. For the purposes of writing code, you might as well not have those features.

If those features are turned on, then your code will emulate them anyway and you'll lose a little performance.

So, the summary.... Foreign keys are essential if you need a consistent database. Foreign keys should never be assumed to be present or functional in code that you write.


To me, if you want to go by the ACID standards, it is critical to have foreign keys to ensure referential integrity.


Reasons to use Foreign Keys:

  • you won't get Orphaned Rows
  • you can get nice "on delete cascade" behavior, automatically cleaning up tables
  • knowing about the relationships between tables in the database helps the Optimizer plan your queries for most efficient execution, since it is able to get better estimates on join cardinality.
  • FKs give a pretty big hint on what statistics are most important to collect on the database, which in turn leads to better performance
  • they enable all kinds of auto-generated support -- ORMs can generate themselves, visualization tools will be able to create nice schema layouts for you, etc.
  • someone new to the project will get into the flow of things faster since otherwise implicit relationships are explicitly documented

Reasons not to use Foreign Keys:

  • you are making the DB work extra on every CRUD operation because it has to check FK consistency. This can be a big cost if you have a lot of churn
  • by enforcing relationships, FKs specify an order in which you have to add/delete things, which can lead to refusal by the DB to do what you want. (Granted, in such cases, what you are trying to do is create an Orphaned Row, and that's not usually a good thing). This is especially painful when you are doing large batch updates, and you load up one table before another, with the second table creating consistent state (but should you be doing that sort of thing if there is a possibility that the second load fails and your database is now inconsistent?).
  • sometimes you know beforehand your data is going to be dirty, you accept that, and you want the DB to accept it
  • you are just being lazy :-)

I think (I am not certain!) that most established databases provide a way to specify a foreign key that is not enforced, and is simply a bit of metadata. Since non-enforcement wipes out every reason not to use FKs, you should probably go that route if any of the reasons in the second section apply.


They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint. You can use triggers to have cascading deletes.

If you chose your primary key unwisely, then changing that value becomes even more complex. For example, if I have the PK of my "customers" table as the person's name, and make that key a FK in the "orders" table", if the customer wants to change his name, then it is a royal pain... but that is just shoddy database design.

I believe the advantages in using fireign keys outweighs any supposed disadvantages.


This is an issue of upbringing. If somewhere in your educational or professional career you spent time feeding and caring for databases (or worked closely with talented folks who did), then the fundamental tenets of entities and relationships are well-ingrained in your thought process. Among those rudiments is how/when/why to specify keys in your database (primary, foreign and perhaps alternate). It's second nature.

If, however, you've not had such a thorough or positive experience in your past with RDBMS-related endeavors, then you've likely not been exposed to such information. Or perhaps your past includes immersion in an environment that was vociferously anti-database (e.g., "those DBAs are idiots - we few, we chosen few java/c# code slingers will save the day"), in which case you might be vehemently opposed to the arcane babblings of some dweeb telling you that FKs (and the constraints they can imply) really are important if you'd just listen.

Most everyone was taught when they were kids that brushing your teeth was important. Can you get by without it? Sure, but somewhere down the line you'll have less teeth available than you could have if you had brushed after every meal. If moms and dads were responsible enough to cover database design as well as oral hygiene, we wouldn't be having this conversation. :-)


Update: I always use foreign keys now. My answer to the objection "they complicated testing" is "write your unit tests so they don't need the database at all. Any tests that use the database should use it properly, and that includes foreign keys. If the setup is painful, find a less painful way to do the setup."


Foreign keys complicate automated testing

Suppose you're using foreign keys. You're writing an automated test that says "when I update a financial account, it should save a record of the transaction." In this test, you're only concerned with two tables: accounts and transactions.

However, accounts has a foreign key to contracts, and contracts has a fk to clients, and clients has a fk to cities, and cities has a fk to states.

Now the database will not allow you to run your test without setting up data in four tables that aren't related to your test.

There are at least two possible perspectives on this:

  • "That's a good thing: your test should be realistic, and those data constraints will exist in production."
  • "That's a bad thing: you should be able to unit test pieces of the system without involving other pieces. You can add integration tests for the system as a whole."

It may also be possible to temporarily turn off foreign key checks while running tests. MySQL, at least, supports this.


This is an issue of upbringing. If somewhere in your educational or professional career you spent time feeding and caring for databases (or worked closely with talented folks who did), then the fundamental tenets of entities and relationships are well-ingrained in your thought process. Among those rudiments is how/when/why to specify keys in your database (primary, foreign and perhaps alternate). It's second nature.

If, however, you've not had such a thorough or positive experience in your past with RDBMS-related endeavors, then you've likely not been exposed to such information. Or perhaps your past includes immersion in an environment that was vociferously anti-database (e.g., "those DBAs are idiots - we few, we chosen few java/c# code slingers will save the day"), in which case you might be vehemently opposed to the arcane babblings of some dweeb telling you that FKs (and the constraints they can imply) really are important if you'd just listen.

Most everyone was taught when they were kids that brushing your teeth was important. Can you get by without it? Sure, but somewhere down the line you'll have less teeth available than you could have if you had brushed after every meal. If moms and dads were responsible enough to cover database design as well as oral hygiene, we wouldn't be having this conversation. :-)


Reasons to use Foreign Keys:

  • you won't get Orphaned Rows
  • you can get nice "on delete cascade" behavior, automatically cleaning up tables
  • knowing about the relationships between tables in the database helps the Optimizer plan your queries for most efficient execution, since it is able to get better estimates on join cardinality.
  • FKs give a pretty big hint on what statistics are most important to collect on the database, which in turn leads to better performance
  • they enable all kinds of auto-generated support -- ORMs can generate themselves, visualization tools will be able to create nice schema layouts for you, etc.
  • someone new to the project will get into the flow of things faster since otherwise implicit relationships are explicitly documented

Reasons not to use Foreign Keys:

  • you are making the DB work extra on every CRUD operation because it has to check FK consistency. This can be a big cost if you have a lot of churn
  • by enforcing relationships, FKs specify an order in which you have to add/delete things, which can lead to refusal by the DB to do what you want. (Granted, in such cases, what you are trying to do is create an Orphaned Row, and that's not usually a good thing). This is especially painful when you are doing large batch updates, and you load up one table before another, with the second table creating consistent state (but should you be doing that sort of thing if there is a possibility that the second load fails and your database is now inconsistent?).
  • sometimes you know beforehand your data is going to be dirty, you accept that, and you want the DB to accept it
  • you are just being lazy :-)

I think (I am not certain!) that most established databases provide a way to specify a foreign key that is not enforced, and is simply a bit of metadata. Since non-enforcement wipes out every reason not to use FKs, you should probably go that route if any of the reasons in the second section apply.


One time when an FK might cause you a problem is when you have historical data that references the key (in a lookup table) even though you no longer want the key available.
Obviously the solution is to design things better up front, but I am thinking of real world situations here where you don't always have control of the full solution.
For example: perhaps you have a look up table customer_type that lists different types of customers - lets say you need to remove a certain customer type, but (due to business restraints) aren't able to update the client software, and nobody invisaged this situation when developing the software, the fact that it is a foreign key in some other table may prevent you from removing the row even though you know the historical data that references it is irrelevant.
After being burnt with this a few times you probably lean away from db enforcement of relationships.
(I'm not saying this is good - just giving a reason why you may decide to avoid FKs and db contraints in general)


I have to second most of the comments here, Foreign Keys are necessary items to ensure that you have data with integrity. The different options for ON DELETE and ON UPDATE will allow you to get around some of the "down falls" that people mention here regarding their use.

I find that in 99% of all my projects I will have FK's to enforce the integrity of the data, however, there are those rare occasions where I have clients that MUST keep their old data, regardless of how bad it is....but then I spend a lot of time writing code that goes in to only get the valid data anyway, so it becomes pointless.


I'll echo what Dmitriy said, but adding on a point.

I worked on a batch billing system that needed to insert large sets of rows on 30+ tables. We weren't allowed to do a data pump (Oracle) so we had to do bulk inserts. Those tables had foreign keys on them, but we had already ensured that they were not breaking any relationships.

Before insert, we disable the foreign key constraints so that Oracle doesn't take forever doing the inserts. After the insert is successful, we re-enable the constraints.

PS: In a large database with many foreign keys and child row data for a single record, sometimes foreign keys can be bad, and you may want to disallow cascading deletes. For us in the billing system, it would take too long and be too taxing on the database if we did cascading deletes, so we just mark the record as bad with a field on the main driver (parent) table.


Like many things, it's a tradeoff. It's a question of where you want to do the work to verify the data integrity:

(1) use a foreign key (a single point to configure for a table, feature is already implemented, tested, proven to work)

(2) leave it to the users of the database (possible multiple users/apps updating the same table (s) meaning more potential points of failure and increased complexity in testing).

It's more efficient for the database to do (2), easier to maintain and less risk with (1).


I'll echo what Dmitriy said, but adding on a point.

I worked on a batch billing system that needed to insert large sets of rows on 30+ tables. We weren't allowed to do a data pump (Oracle) so we had to do bulk inserts. Those tables had foreign keys on them, but we had already ensured that they were not breaking any relationships.

Before insert, we disable the foreign key constraints so that Oracle doesn't take forever doing the inserts. After the insert is successful, we re-enable the constraints.

PS: In a large database with many foreign keys and child row data for a single record, sometimes foreign keys can be bad, and you may want to disallow cascading deletes. For us in the billing system, it would take too long and be too taxing on the database if we did cascading deletes, so we just mark the record as bad with a field on the main driver (parent) table.


From my experience its always better to avoid using FKs in Database Critical Applications. I would not disagree with guys here who say FKs is a good practice but its not practical where the database is huge and has huge CRUD operations/sec. I can share without naming ... one of the biggest investment bank of doesn't have a single FK in databases. These constrains are handled by programmers while creating applications involving DB. The basic reason is when ever a new CRUD is done it has to effect multiple tables and verify for each inserts/updates, though this won't be a big issue for queries affecting single rows but it does create a huge latency when you deal with batch processing which any big bank has to do as daily tasks.

Its better to avoid FKs but its risk has to be handled by programmers.


There's one good reason not to use them: If you don't understand their role or how to use them.

In the wrong situations, foreign key constraints can lead to waterfall replication of accidents. If somebody removes the wrong record, undoing it can become a mammoth task.

Also, conversely, when you need to remove something, if poorly designed, constraints can cause all sorts of locks that prevent you.


They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint. You can use triggers to have cascading deletes.

If you chose your primary key unwisely, then changing that value becomes even more complex. For example, if I have the PK of my "customers" table as the person's name, and make that key a FK in the "orders" table", if the customer wants to change his name, then it is a royal pain... but that is just shoddy database design.

I believe the advantages in using fireign keys outweighs any supposed disadvantages.


In DB2, if MQTs (Materialized Query Tables) are used, foreign key constraints are required for the optimizer to choose the right plan for any given query. Since they contain the cardinality information, the optimizer uses the metadata heavily to use a MQT or not.


I have heard this argument too - from people who forgot to put an index on their foreign keys and then complained that certain operations were slow (because constraint checking could take advantage of any index). So to sum up: There is no good reason not to use foreign keys. All modern databases support cascaded deletes, so...


I agree with the previous answers in that they are useful to mantain data consistency. However, there was an interesting post by Jeff Atwood some weeks ago that discussed the pros and cons of normalized and consistent data.

In a few words, a denormalized database can be faster when handling huge amounts of data; and you may not care about precise consistency depending on the application, but it forces you to be much more careful when dealing with data, as the DB won't be.


Bigger question is: would you drive with a blindfold on? That’s how it is if you develop a system without referential constraints. Keep in mind, that business requirements change, application design changes, respective logical assumptions in the code changes, logic itself can be refactored, and so on. In general, constraints in databases are put in place under contemporary logical assumptions, seemingly correct for particular set of logical assertions and assumptions.

Through the lifecycle of an application, referential and data checks constraints police data collection via the application, especially when new requirements drive logical application changes.

To the subject of this listing - a foreign key does not by itself "improve performance", nor does it "degrade performance" significantly from a standpoint of real-time transaction processing system. However, there is an aggregated cost for constraint checking in HIGH volume "batch" system. So, here is the difference, real-time vs. batch transaction process; batch processing - where aggreated cost, incured by constraint checks, of a sequentially processed batch poses a performance hit.

In a well designed system, data consistency checks would be done "before" processing a batch through (nevertheless, there is a cost associated here also); therefore, foreign key constraint checks are not required during load time. In fact all constraints, including foreign key, should be temporarily disabled till the batch is processed.

QUERY PERFORMANCE - if tables are joined on foreign keys, be cognizant of the fact that foreign key columns are NOT INDEXED (though the respective primary key is indexed by definition). By indexing a foreign key, for that matter, by indexing any key, and joining tables on indexed helps with better performances, not by joining on non-indexed key with foreign key constraint on it.

Changing subjects, if a database is just supporting website display/rendering content/etc and recording clicks, then a database with full constraints on all tables is over kill for such purposes. Think about it. Most websites don’t even use a database for such. For similar requirements, where data is just being recorded and not referenced per say, use an in-memory database, which does not have constraints. This doesn’t mean that there is no data model, yes logical model, but no physical data model.


Like many things, it's a tradeoff. It's a question of where you want to do the work to verify the data integrity:

(1) use a foreign key (a single point to configure for a table, feature is already implemented, tested, proven to work)

(2) leave it to the users of the database (possible multiple users/apps updating the same table (s) meaning more potential points of failure and increased complexity in testing).

It's more efficient for the database to do (2), easier to maintain and less risk with (1).


There's one good reason not to use them: If you don't understand their role or how to use them.

In the wrong situations, foreign key constraints can lead to waterfall replication of accidents. If somebody removes the wrong record, undoing it can become a mammoth task.

Also, conversely, when you need to remove something, if poorly designed, constraints can cause all sorts of locks that prevent you.


I'm sure there are plenty of applications where you can get away with it, but it's not the best idea. You can't always count on your application to properly manage your database, and frankly managing the database should not be of very much concern to your application.

If you are using a relational database then it seems you ought to have some relationships defined in it. Unfortunately this attitude (you don't need foreign keys) seems to be embraced by a lot of application developers who would rather not be bothered with silly things like data integrity (but need to because their companies don't have dedicated database developers). Usually in databases put together by these types you are lucky just to have primary keys ;)


They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint. You can use triggers to have cascading deletes.

If you chose your primary key unwisely, then changing that value becomes even more complex. For example, if I have the PK of my "customers" table as the person's name, and make that key a FK in the "orders" table", if the customer wants to change his name, then it is a royal pain... but that is just shoddy database design.

I believe the advantages in using fireign keys outweighs any supposed disadvantages.


To me, if you want to go by the ACID standards, it is critical to have foreign keys to ensure referential integrity.


Update: I always use foreign keys now. My answer to the objection "they complicated testing" is "write your unit tests so they don't need the database at all. Any tests that use the database should use it properly, and that includes foreign keys. If the setup is painful, find a less painful way to do the setup."


Foreign keys complicate automated testing

Suppose you're using foreign keys. You're writing an automated test that says "when I update a financial account, it should save a record of the transaction." In this test, you're only concerned with two tables: accounts and transactions.

However, accounts has a foreign key to contracts, and contracts has a fk to clients, and clients has a fk to cities, and cities has a fk to states.

Now the database will not allow you to run your test without setting up data in four tables that aren't related to your test.

There are at least two possible perspectives on this:

  • "That's a good thing: your test should be realistic, and those data constraints will exist in production."
  • "That's a bad thing: you should be able to unit test pieces of the system without involving other pieces. You can add integration tests for the system as a whole."

It may also be possible to temporarily turn off foreign key checks while running tests. MySQL, at least, supports this.


The argument I have heard is that the front-end should have these business rules. Foreign keys "add unnecessary overhead" when you shouldn't be allowing any insertions that break your constraints in the first place. Do I agree with this? No, but that is what I have always heard.

EDIT: My guess is he was referring to foreign key constraints, not foreign keys as a concept.


There's one good reason not to use them: If you don't understand their role or how to use them.

In the wrong situations, foreign key constraints can lead to waterfall replication of accidents. If somebody removes the wrong record, undoing it can become a mammoth task.

Also, conversely, when you need to remove something, if poorly designed, constraints can cause all sorts of locks that prevent you.


I agree with the previous answers in that they are useful to mantain data consistency. However, there was an interesting post by Jeff Atwood some weeks ago that discussed the pros and cons of normalized and consistent data.

In a few words, a denormalized database can be faster when handling huge amounts of data; and you may not care about precise consistency depending on the application, but it forces you to be much more careful when dealing with data, as the DB won't be.


Wowowo...
Answers everywhere. Actually this is the most complicated topic I have ever encountered. I use FKs when they are needed but on production environment I rarely use them.

Here is my whys I rarely use the Fks:

1. Most of the time I am dealing with huge data on small server to improve performance I need to remove the FKs. Because when you have FKs and you do Create, Update or Delete the RDBMS first check if there no constraint violation and if you have huge DB that could be something fatal

2. Sometimes I need to import data from others places and because I am not too sure of how well structured they are, I simply drop the FKs.

3. In case you are dealing with multiple DBs and having reference key in an other DB will not go well(as for now) until you remove the FKs (cross database relations)
4. They was also a case when you write an application which will seat on whatever RDBMS or you want your DB to be exported and imported in any RDBMS system in this case each specific RDBMS system has his own way of dealing with FKs and you will probably be obliged to drop the use of FKs.

5. If you user RDBMS platform (ORMs) you know that some of them offer their own mapping depending on the solution and technicality their offer and you don't care about creating the tables and their FKs.

6. Before the last point will be knowledge to deal with DB that has FKs and the knowledge to write an application that does all the Job without the need of FK 7. Lastly as I started saying it all depend on your scenario, in case knowledge is not a barrier. You will always want to run the best of the best you can get!


Thank you everybody!


One good principle of data structure design is to ensure that every attribute of a table or object be subject to a well-understood constraint. This is important because if you or your program can count on valid data in the database, you are less likely to have program defects caused by bad data. You also spend less time writing code to handle error conditions, and you are more likely to write error-handling code up front.

In many cases these constraints can be defined at compile-time, in which case you can write a filter to ensure that the attribute always falls within range, or the attempt to save the attribute fails.

However, in many cases these constraints can change at run-time. For example, you may have a "cars" table that has "colour" as an attribute which initially takes on the values, say, of "red", "green" and "blue". It is possible during the execution of the program to add valid colours to that initial list, and new "cars" added may take on any colour in the up-to-date list of colours. Furthermore, you usually want this updated list of colours to survive a program restart.

To answer your question, it turns out that if you have a requirement for data constraint that can change at run-time, and those changes must survive a program restart, foreign keys are the simplest and most concise solution to the problem. The development cost is the addition of one table (e.g. "colours", a foreign key constraint to the "cars" table, and an index), and the run-time cost is the extra table lookup for the up-to-date colours to validate the data, and this run-time cost is usually mitigated by indexing and caching.

If you don't use foreign keys for these requirements, you must write software to manage the list, look valid entries, save it to disk, structure the data efficiently if the list is large, ensure that any updates to the list don't corrupt the list file, provide serial access to the list in case there are multiple readers and/or writers, and so on. i.e. You need to implement a lot of RDBMS functionality.


I'm sure there are plenty of applications where you can get away with it, but it's not the best idea. You can't always count on your application to properly manage your database, and frankly managing the database should not be of very much concern to your application.

If you are using a relational database then it seems you ought to have some relationships defined in it. Unfortunately this attitude (you don't need foreign keys) seems to be embraced by a lot of application developers who would rather not be bothered with silly things like data integrity (but need to because their companies don't have dedicated database developers). Usually in databases put together by these types you are lucky just to have primary keys ;)


They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint. You can use triggers to have cascading deletes.

If you chose your primary key unwisely, then changing that value becomes even more complex. For example, if I have the PK of my "customers" table as the person's name, and make that key a FK in the "orders" table", if the customer wants to change his name, then it is a royal pain... but that is just shoddy database design.

I believe the advantages in using fireign keys outweighs any supposed disadvantages.


Like many things, it's a tradeoff. It's a question of where you want to do the work to verify the data integrity:

(1) use a foreign key (a single point to configure for a table, feature is already implemented, tested, proven to work)

(2) leave it to the users of the database (possible multiple users/apps updating the same table (s) meaning more potential points of failure and increased complexity in testing).

It's more efficient for the database to do (2), easier to maintain and less risk with (1).


"They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint."

It's important to remember that the SQL standard defines actions that are taken when a foreign key is deleted or updated. The ones I know of are:

  • ON DELETE RESTRICT - Prevents any rows in the other table that have keys in this column from being deleted. This is what Ken Ray described above.
  • ON DELETE CASCADE - If a row in the other table is deleted, delete any rows in this table that reference it.
  • ON DELETE SET DEFAULT - If a row in the other table is deleted, set any foreign keys referencing it to the column's default.
  • ON DELETE SET NULL - If a row in the other table is deleted, set any foreign keys referencing it in this table to null.
  • ON DELETE NO ACTION - This foreign key only marks that it is a foreign key; namely for use in OR mappers.

These same actions also apply to ON UPDATE.

The default seems to depend on which server you're using.


I agree with the previous answers in that they are useful to mantain data consistency. However, there was an interesting post by Jeff Atwood some weeks ago that discussed the pros and cons of normalized and consistent data.

In a few words, a denormalized database can be faster when handling huge amounts of data; and you may not care about precise consistency depending on the application, but it forces you to be much more careful when dealing with data, as the DB won't be.


I also think that foreign keys are a necessity in most databases. The only drawback (besides the performance hit that comes with having enforced consistence) is that having a foreign key allows people to write code that assumes there is a functional foreign key. That should never be allowed.

For example, I've seen people write code that inserts into the referenced table and then attempts inserts into the referencing table without verifying the first insert was successful. If the foreign key is removed at a later time, that results in an inconsistent database.

You also don't have the option of assuming a specific behavior on update or delete. You still need to write your code to do what you want regardless of whether there is a foreign key present. If you assume deletes are cascaded when they are not, your deletes will fail. If you assume updates to the referenced columns are propogated to the referencing rows when they are not, your updates will fail. For the purposes of writing code, you might as well not have those features.

If those features are turned on, then your code will emulate them anyway and you'll lose a little performance.

So, the summary.... Foreign keys are essential if you need a consistent database. Foreign keys should never be assumed to be present or functional in code that you write.


Reasons to use Foreign Keys:

  • you won't get Orphaned Rows
  • you can get nice "on delete cascade" behavior, automatically cleaning up tables
  • knowing about the relationships between tables in the database helps the Optimizer plan your queries for most efficient execution, since it is able to get better estimates on join cardinality.
  • FKs give a pretty big hint on what statistics are most important to collect on the database, which in turn leads to better performance
  • they enable all kinds of auto-generated support -- ORMs can generate themselves, visualization tools will be able to create nice schema layouts for you, etc.
  • someone new to the project will get into the flow of things faster since otherwise implicit relationships are explicitly documented

Reasons not to use Foreign Keys:

  • you are making the DB work extra on every CRUD operation because it has to check FK consistency. This can be a big cost if you have a lot of churn
  • by enforcing relationships, FKs specify an order in which you have to add/delete things, which can lead to refusal by the DB to do what you want. (Granted, in such cases, what you are trying to do is create an Orphaned Row, and that's not usually a good thing). This is especially painful when you are doing large batch updates, and you load up one table before another, with the second table creating consistent state (but should you be doing that sort of thing if there is a possibility that the second load fails and your database is now inconsistent?).
  • sometimes you know beforehand your data is going to be dirty, you accept that, and you want the DB to accept it
  • you are just being lazy :-)

I think (I am not certain!) that most established databases provide a way to specify a foreign key that is not enforced, and is simply a bit of metadata. Since non-enforcement wipes out every reason not to use FKs, you should probably go that route if any of the reasons in the second section apply.


Foreign keys are essential to any relational database model.


I'll echo what Dmitriy said, but adding on a point.

I worked on a batch billing system that needed to insert large sets of rows on 30+ tables. We weren't allowed to do a data pump (Oracle) so we had to do bulk inserts. Those tables had foreign keys on them, but we had already ensured that they were not breaking any relationships.

Before insert, we disable the foreign key constraints so that Oracle doesn't take forever doing the inserts. After the insert is successful, we re-enable the constraints.

PS: In a large database with many foreign keys and child row data for a single record, sometimes foreign keys can be bad, and you may want to disallow cascading deletes. For us in the billing system, it would take too long and be too taxing on the database if we did cascading deletes, so we just mark the record as bad with a field on the main driver (parent) table.


@imphasing - this is exactly the kind of mindset that causes maintenance nightmares.

Why oh why would you ignore declarative referential integrity, where the data can be guaranteed to be at least consistent, in favour of so called "software enforcement" which is a weak preventative measure at best.


I always thought it was lazy not to use them. I was taught it should always be done. But then, I didnt listen to Joel's discussion. He may have had a good reason, I don't know.


This is an issue of upbringing. If somewhere in your educational or professional career you spent time feeding and caring for databases (or worked closely with talented folks who did), then the fundamental tenets of entities and relationships are well-ingrained in your thought process. Among those rudiments is how/when/why to specify keys in your database (primary, foreign and perhaps alternate). It's second nature.

If, however, you've not had such a thorough or positive experience in your past with RDBMS-related endeavors, then you've likely not been exposed to such information. Or perhaps your past includes immersion in an environment that was vociferously anti-database (e.g., "those DBAs are idiots - we few, we chosen few java/c# code slingers will save the day"), in which case you might be vehemently opposed to the arcane babblings of some dweeb telling you that FKs (and the constraints they can imply) really are important if you'd just listen.

Most everyone was taught when they were kids that brushing your teeth was important. Can you get by without it? Sure, but somewhere down the line you'll have less teeth available than you could have if you had brushed after every meal. If moms and dads were responsible enough to cover database design as well as oral hygiene, we wouldn't be having this conversation. :-)


Quite often we receive the errors with FK constraints Cannot add or update a child row: a foreign key constraint fails Suppose there are two tables inventory_source and contract_lines, and we are referring inventory_source_id in contract_lines from inventory_source and suppose we want to delete record from inventory_source and the record is already present in contract_lines or we want to delete the PK column from Base table, we get errors for FK constraints, we can avoid it using the steps jotted below.

CREATE TABLE inventory_source (
inventory_source_id int(11) NOT NULL AUTO_INCREMENT,
display_name varchar(40) NOT NULL,
state_id int(11) NOT NULL,
PRIMARY KEY (inventory_source_id),
KEY state_id (state_id),
CONSTRAINT ba_inventory_source_state_fk FOREIGN KEY (state_id) REFERENCES   ba_state (state_id)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;

CREATE TABLE contract_lines(
contract_line_id int(11) NOT NULL AUTO_INCREMENT,
inventory_source_id int(11) NULL ,
PRIMARY KEY (contract_line_id),
UNIQUE KEY contract_line_id (contract_line_id),
KEY AI_contract_line_id (contract_line_id),
KEY contract_lines_inventory_source_fk (inventory_source_id),
CONSTRAINT contract_lines_inventory_source_fk FOREIGN KEY       (inventory_source_id) REFERENCES ba_inventory_source (inventory_source_id)
) ENGINE=InnoDB AUTO_INCREMENT=135 DEFAULT CHARSET=utf8 ;

We can overcome it using the following steps:-

  1. Delete or update the row from the inventory_source will automatically delete or update the matching rows in the contract_lines table and this is known as cascade delete or update.
  2. Another way of doing it is setting the column i.e inventory_source_id in the contract_lines table to NULL, when a record corresponding to it is deleted in the inventory_source table.
  3. We can restrict the parent table for delete or update in other words one can reject the delete or update operation for the inventory_source table.
  4. Attempt to delete or update a primary key value will not be permitted to proceed if there is a related foreign key value in the referenced table.

I always use them, but then I make databases for financial systems. The database is the critical part of the application. If the data in a financial database isn't totally accurate then it really doesn't matter how much effort you put into your code/front-end design. You're just wasting your time.

There's also the fact that multiple systems generally need to interface directly with the database - from other systems that just read data out (Crystal Reports) to systems that insert data (not necessarily using an API I've designed; it may be written by a dull-witted manager who has just discovered VBScript and has the SA password for the SQL box). If the database isn't as idiot-proof as it can possibly be, well - bye bye database.

If your data is important, then yes, use foreign keys, create a suite of stored procedures to interact with the data, and make the toughest DB you can. If your data isn't important, why are you making a database to begin with?


Reasons to use Foreign Keys:

  • you won't get Orphaned Rows
  • you can get nice "on delete cascade" behavior, automatically cleaning up tables
  • knowing about the relationships between tables in the database helps the Optimizer plan your queries for most efficient execution, since it is able to get better estimates on join cardinality.
  • FKs give a pretty big hint on what statistics are most important to collect on the database, which in turn leads to better performance
  • they enable all kinds of auto-generated support -- ORMs can generate themselves, visualization tools will be able to create nice schema layouts for you, etc.
  • someone new to the project will get into the flow of things faster since otherwise implicit relationships are explicitly documented

Reasons not to use Foreign Keys:

  • you are making the DB work extra on every CRUD operation because it has to check FK consistency. This can be a big cost if you have a lot of churn
  • by enforcing relationships, FKs specify an order in which you have to add/delete things, which can lead to refusal by the DB to do what you want. (Granted, in such cases, what you are trying to do is create an Orphaned Row, and that's not usually a good thing). This is especially painful when you are doing large batch updates, and you load up one table before another, with the second table creating consistent state (but should you be doing that sort of thing if there is a possibility that the second load fails and your database is now inconsistent?).
  • sometimes you know beforehand your data is going to be dirty, you accept that, and you want the DB to accept it
  • you are just being lazy :-)

I think (I am not certain!) that most established databases provide a way to specify a foreign key that is not enforced, and is simply a bit of metadata. Since non-enforcement wipes out every reason not to use FKs, you should probably go that route if any of the reasons in the second section apply.


One time when an FK might cause you a problem is when you have historical data that references the key (in a lookup table) even though you no longer want the key available.
Obviously the solution is to design things better up front, but I am thinking of real world situations here where you don't always have control of the full solution.
For example: perhaps you have a look up table customer_type that lists different types of customers - lets say you need to remove a certain customer type, but (due to business restraints) aren't able to update the client software, and nobody invisaged this situation when developing the software, the fact that it is a foreign key in some other table may prevent you from removing the row even though you know the historical data that references it is irrelevant.
After being burnt with this a few times you probably lean away from db enforcement of relationships.
(I'm not saying this is good - just giving a reason why you may decide to avoid FKs and db contraints in general)


I have to second most of the comments here, Foreign Keys are necessary items to ensure that you have data with integrity. The different options for ON DELETE and ON UPDATE will allow you to get around some of the "down falls" that people mention here regarding their use.

I find that in 99% of all my projects I will have FK's to enforce the integrity of the data, however, there are those rare occasions where I have clients that MUST keep their old data, regardless of how bad it is....but then I spend a lot of time writing code that goes in to only get the valid data anyway, so it becomes pointless.


The Clarify database is an example of a commercial database that has no primary or foreign keys.

http://www.geekinterview.com/question_details/18869

The funny thing is, the technical documentation goes to great lengths to explain how tables are related, what columns to use to join them etc.

In other words, they could have joined the tables with explicit declarations (DRI) but they chose not to.

Consequently, the Clarify database is full of inconsistencies and it underperforms.

But I suppose it made the developers job easier, not having to write code to deal with referential integrity such as checking for related rows before deleting, adding.

And that, I think, is the main benefit of not having foreign key constraints in a relational database. It makes it easier to develop, at least that is from a devil-may-care point of view.


There are no good reasons not to use them... unless orphaned rows aren't a big deal to you I guess.


I'll echo what Dmitriy said, but adding on a point.

I worked on a batch billing system that needed to insert large sets of rows on 30+ tables. We weren't allowed to do a data pump (Oracle) so we had to do bulk inserts. Those tables had foreign keys on them, but we had already ensured that they were not breaking any relationships.

Before insert, we disable the foreign key constraints so that Oracle doesn't take forever doing the inserts. After the insert is successful, we re-enable the constraints.

PS: In a large database with many foreign keys and child row data for a single record, sometimes foreign keys can be bad, and you may want to disallow cascading deletes. For us in the billing system, it would take too long and be too taxing on the database if we did cascading deletes, so we just mark the record as bad with a field on the main driver (parent) table.


I have to second most of the comments here, Foreign Keys are necessary items to ensure that you have data with integrity. The different options for ON DELETE and ON UPDATE will allow you to get around some of the "down falls" that people mention here regarding their use.

I find that in 99% of all my projects I will have FK's to enforce the integrity of the data, however, there are those rare occasions where I have clients that MUST keep their old data, regardless of how bad it is....but then I spend a lot of time writing code that goes in to only get the valid data anyway, so it becomes pointless.


I have heard this argument too - from people who forgot to put an index on their foreign keys and then complained that certain operations were slow (because constraint checking could take advantage of any index). So to sum up: There is no good reason not to use foreign keys. All modern databases support cascaded deletes, so...


Verifying foreign key constraints takes some CPU time, so some folks omit foreign keys to get some extra performance.


To me, if you want to go by the ACID standards, it is critical to have foreign keys to ensure referential integrity.


Wowowo...
Answers everywhere. Actually this is the most complicated topic I have ever encountered. I use FKs when they are needed but on production environment I rarely use them.

Here is my whys I rarely use the Fks:

1. Most of the time I am dealing with huge data on small server to improve performance I need to remove the FKs. Because when you have FKs and you do Create, Update or Delete the RDBMS first check if there no constraint violation and if you have huge DB that could be something fatal

2. Sometimes I need to import data from others places and because I am not too sure of how well structured they are, I simply drop the FKs.

3. In case you are dealing with multiple DBs and having reference key in an other DB will not go well(as for now) until you remove the FKs (cross database relations)
4. They was also a case when you write an application which will seat on whatever RDBMS or you want your DB to be exported and imported in any RDBMS system in this case each specific RDBMS system has his own way of dealing with FKs and you will probably be obliged to drop the use of FKs.

5. If you user RDBMS platform (ORMs) you know that some of them offer their own mapping depending on the solution and technicality their offer and you don't care about creating the tables and their FKs.

6. Before the last point will be knowledge to deal with DB that has FKs and the knowledge to write an application that does all the Job without the need of FK 7. Lastly as I started saying it all depend on your scenario, in case knowledge is not a barrier. You will always want to run the best of the best you can get!


Thank you everybody!


I have to second most of the comments here, Foreign Keys are necessary items to ensure that you have data with integrity. The different options for ON DELETE and ON UPDATE will allow you to get around some of the "down falls" that people mention here regarding their use.

I find that in 99% of all my projects I will have FK's to enforce the integrity of the data, however, there are those rare occasions where I have clients that MUST keep their old data, regardless of how bad it is....but then I spend a lot of time writing code that goes in to only get the valid data anyway, so it becomes pointless.


One good principle of data structure design is to ensure that every attribute of a table or object be subject to a well-understood constraint. This is important because if you or your program can count on valid data in the database, you are less likely to have program defects caused by bad data. You also spend less time writing code to handle error conditions, and you are more likely to write error-handling code up front.

In many cases these constraints can be defined at compile-time, in which case you can write a filter to ensure that the attribute always falls within range, or the attempt to save the attribute fails.

However, in many cases these constraints can change at run-time. For example, you may have a "cars" table that has "colour" as an attribute which initially takes on the values, say, of "red", "green" and "blue". It is possible during the execution of the program to add valid colours to that initial list, and new "cars" added may take on any colour in the up-to-date list of colours. Furthermore, you usually want this updated list of colours to survive a program restart.

To answer your question, it turns out that if you have a requirement for data constraint that can change at run-time, and those changes must survive a program restart, foreign keys are the simplest and most concise solution to the problem. The development cost is the addition of one table (e.g. "colours", a foreign key constraint to the "cars" table, and an index), and the run-time cost is the extra table lookup for the up-to-date colours to validate the data, and this run-time cost is usually mitigated by indexing and caching.

If you don't use foreign keys for these requirements, you must write software to manage the list, look valid entries, save it to disk, structure the data efficiently if the list is large, ensure that any updates to the list don't corrupt the list file, provide serial access to the list in case there are multiple readers and/or writers, and so on. i.e. You need to implement a lot of RDBMS functionality.


"Before adding a record, check that a corresponding record exists in another table" is business logic.

Here are some reasons you don't want this in the database:

  1. If the business rules change, you have to change the database. The database will need to recreate the index in a lot of cases and this is slow on large tables. (Changing rules include: allow guests to post messages or allow users to delete their account despite having posted comments, etc).

  2. Changing the database is not as easy as deploying a software fix by pushing the changes to the production repository. We want to avoid changing the database structure as much as possible. The more business logic there is in the database the more you increase the chances of needing to change the databae (and triggering re-indexing).

  3. TDD. In unit tests you can substitute the database for mocks and test the functionality. If you have any business logic in your database, you are not doing complete tests and would need to either test with the database or replicate the business logic in code for testing purposes, duplicating the logic and increasing the likelyhood of the logic not working in the same way.

  4. Reusing your logic with different data sources. If there is no logic in the database, my application can create objects from records from the database, create them from a web service, a json file or any other source. I just need to swap out the data mapper implementation and can use all my business logic with any source. If there is logic in the database, this isn't possible and you have to implement the logic at the data mapper layer or in the business logic. Either way, you need those checks in your code. If there's no logic in the database I can deploy the application in different locations using different database or flat-file implementations.


I can see a few reasons to use foreign keys (Orphaned rows, as someone mentioned, are annoying) but I never use them either. With a relatively sane DB schema, I don't think they are 100% needed. Constraints are good, but enforcing them via software is a better method, I think.

Alex


I'm sure there are plenty of applications where you can get away with it, but it's not the best idea. You can't always count on your application to properly manage your database, and frankly managing the database should not be of very much concern to your application.

If you are using a relational database then it seems you ought to have some relationships defined in it. Unfortunately this attitude (you don't need foreign keys) seems to be embraced by a lot of application developers who would rather not be bothered with silly things like data integrity (but need to because their companies don't have dedicated database developers). Usually in databases put together by these types you are lucky just to have primary keys ;)


This is an issue of upbringing. If somewhere in your educational or professional career you spent time feeding and caring for databases (or worked closely with talented folks who did), then the fundamental tenets of entities and relationships are well-ingrained in your thought process. Among those rudiments is how/when/why to specify keys in your database (primary, foreign and perhaps alternate). It's second nature.

If, however, you've not had such a thorough or positive experience in your past with RDBMS-related endeavors, then you've likely not been exposed to such information. Or perhaps your past includes immersion in an environment that was vociferously anti-database (e.g., "those DBAs are idiots - we few, we chosen few java/c# code slingers will save the day"), in which case you might be vehemently opposed to the arcane babblings of some dweeb telling you that FKs (and the constraints they can imply) really are important if you'd just listen.

Most everyone was taught when they were kids that brushing your teeth was important. Can you get by without it? Sure, but somewhere down the line you'll have less teeth available than you could have if you had brushed after every meal. If moms and dads were responsible enough to cover database design as well as oral hygiene, we wouldn't be having this conversation. :-)


Reasons to use Foreign Keys:

  • you won't get Orphaned Rows
  • you can get nice "on delete cascade" behavior, automatically cleaning up tables
  • knowing about the relationships between tables in the database helps the Optimizer plan your queries for most efficient execution, since it is able to get better estimates on join cardinality.
  • FKs give a pretty big hint on what statistics are most important to collect on the database, which in turn leads to better performance
  • they enable all kinds of auto-generated support -- ORMs can generate themselves, visualization tools will be able to create nice schema layouts for you, etc.
  • someone new to the project will get into the flow of things faster since otherwise implicit relationships are explicitly documented

Reasons not to use Foreign Keys:

  • you are making the DB work extra on every CRUD operation because it has to check FK consistency. This can be a big cost if you have a lot of churn
  • by enforcing relationships, FKs specify an order in which you have to add/delete things, which can lead to refusal by the DB to do what you want. (Granted, in such cases, what you are trying to do is create an Orphaned Row, and that's not usually a good thing). This is especially painful when you are doing large batch updates, and you load up one table before another, with the second table creating consistent state (but should you be doing that sort of thing if there is a possibility that the second load fails and your database is now inconsistent?).
  • sometimes you know beforehand your data is going to be dirty, you accept that, and you want the DB to accept it
  • you are just being lazy :-)

I think (I am not certain!) that most established databases provide a way to specify a foreign key that is not enforced, and is simply a bit of metadata. Since non-enforcement wipes out every reason not to use FKs, you should probably go that route if any of the reasons in the second section apply.


To me, if you want to go by the ACID standards, it is critical to have foreign keys to ensure referential integrity.


Verifying foreign key constraints takes some CPU time, so some folks omit foreign keys to get some extra performance.


Foreign keys are essential to any relational database model.


Bigger question is: would you drive with a blindfold on? That’s how it is if you develop a system without referential constraints. Keep in mind, that business requirements change, application design changes, respective logical assumptions in the code changes, logic itself can be refactored, and so on. In general, constraints in databases are put in place under contemporary logical assumptions, seemingly correct for particular set of logical assertions and assumptions.

Through the lifecycle of an application, referential and data checks constraints police data collection via the application, especially when new requirements drive logical application changes.

To the subject of this listing - a foreign key does not by itself "improve performance", nor does it "degrade performance" significantly from a standpoint of real-time transaction processing system. However, there is an aggregated cost for constraint checking in HIGH volume "batch" system. So, here is the difference, real-time vs. batch transaction process; batch processing - where aggreated cost, incured by constraint checks, of a sequentially processed batch poses a performance hit.

In a well designed system, data consistency checks would be done "before" processing a batch through (nevertheless, there is a cost associated here also); therefore, foreign key constraint checks are not required during load time. In fact all constraints, including foreign key, should be temporarily disabled till the batch is processed.

QUERY PERFORMANCE - if tables are joined on foreign keys, be cognizant of the fact that foreign key columns are NOT INDEXED (though the respective primary key is indexed by definition). By indexing a foreign key, for that matter, by indexing any key, and joining tables on indexed helps with better performances, not by joining on non-indexed key with foreign key constraint on it.

Changing subjects, if a database is just supporting website display/rendering content/etc and recording clicks, then a database with full constraints on all tables is over kill for such purposes. Think about it. Most websites don’t even use a database for such. For similar requirements, where data is just being recorded and not referenced per say, use an in-memory database, which does not have constraints. This doesn’t mean that there is no data model, yes logical model, but no physical data model.


Reasons to use Foreign Keys:

  • you won't get Orphaned Rows
  • you can get nice "on delete cascade" behavior, automatically cleaning up tables
  • knowing about the relationships between tables in the database helps the Optimizer plan your queries for most efficient execution, since it is able to get better estimates on join cardinality.
  • FKs give a pretty big hint on what statistics are most important to collect on the database, which in turn leads to better performance
  • they enable all kinds of auto-generated support -- ORMs can generate themselves, visualization tools will be able to create nice schema layouts for you, etc.
  • someone new to the project will get into the flow of things faster since otherwise implicit relationships are explicitly documented

Reasons not to use Foreign Keys:

  • you are making the DB work extra on every CRUD operation because it has to check FK consistency. This can be a big cost if you have a lot of churn
  • by enforcing relationships, FKs specify an order in which you have to add/delete things, which can lead to refusal by the DB to do what you want. (Granted, in such cases, what you are trying to do is create an Orphaned Row, and that's not usually a good thing). This is especially painful when you are doing large batch updates, and you load up one table before another, with the second table creating consistent state (but should you be doing that sort of thing if there is a possibility that the second load fails and your database is now inconsistent?).
  • sometimes you know beforehand your data is going to be dirty, you accept that, and you want the DB to accept it
  • you are just being lazy :-)

I think (I am not certain!) that most established databases provide a way to specify a foreign key that is not enforced, and is simply a bit of metadata. Since non-enforcement wipes out every reason not to use FKs, you should probably go that route if any of the reasons in the second section apply.


Wowowo...
Answers everywhere. Actually this is the most complicated topic I have ever encountered. I use FKs when they are needed but on production environment I rarely use them.

Here is my whys I rarely use the Fks:

1. Most of the time I am dealing with huge data on small server to improve performance I need to remove the FKs. Because when you have FKs and you do Create, Update or Delete the RDBMS first check if there no constraint violation and if you have huge DB that could be something fatal

2. Sometimes I need to import data from others places and because I am not too sure of how well structured they are, I simply drop the FKs.

3. In case you are dealing with multiple DBs and having reference key in an other DB will not go well(as for now) until you remove the FKs (cross database relations)
4. They was also a case when you write an application which will seat on whatever RDBMS or you want your DB to be exported and imported in any RDBMS system in this case each specific RDBMS system has his own way of dealing with FKs and you will probably be obliged to drop the use of FKs.

5. If you user RDBMS platform (ORMs) you know that some of them offer their own mapping depending on the solution and technicality their offer and you don't care about creating the tables and their FKs.

6. Before the last point will be knowledge to deal with DB that has FKs and the knowledge to write an application that does all the Job without the need of FK 7. Lastly as I started saying it all depend on your scenario, in case knowledge is not a barrier. You will always want to run the best of the best you can get!


Thank you everybody!


The Clarify database is an example of a commercial database that has no primary or foreign keys.

http://www.geekinterview.com/question_details/18869

The funny thing is, the technical documentation goes to great lengths to explain how tables are related, what columns to use to join them etc.

In other words, they could have joined the tables with explicit declarations (DRI) but they chose not to.

Consequently, the Clarify database is full of inconsistencies and it underperforms.

But I suppose it made the developers job easier, not having to write code to deal with referential integrity such as checking for related rows before deleting, adding.

And that, I think, is the main benefit of not having foreign key constraints in a relational database. It makes it easier to develop, at least that is from a devil-may-care point of view.


How about maintainability and constancy across application life cycles? Most data has a longer lifespan than the applications that make use of it. Relationships and data integrity are much too important to leave to the hope that the next dev team gets it right in the app code. If you haven't worked on a db with dirty data that doesn't respect the natural relationships, you will. The importance of data integrity will then become very clear.


The argument I have heard is that the front-end should have these business rules. Foreign keys "add unnecessary overhead" when you shouldn't be allowing any insertions that break your constraints in the first place. Do I agree with this? No, but that is what I have always heard.

EDIT: My guess is he was referring to foreign key constraints, not foreign keys as a concept.


Reasons to use Foreign Keys:

  • you won't get Orphaned Rows
  • you can get nice "on delete cascade" behavior, automatically cleaning up tables
  • knowing about the relationships between tables in the database helps the Optimizer plan your queries for most efficient execution, since it is able to get better estimates on join cardinality.
  • FKs give a pretty big hint on what statistics are most important to collect on the database, which in turn leads to better performance
  • they enable all kinds of auto-generated support -- ORMs can generate themselves, visualization tools will be able to create nice schema layouts for you, etc.
  • someone new to the project will get into the flow of things faster since otherwise implicit relationships are explicitly documented

Reasons not to use Foreign Keys:

  • you are making the DB work extra on every CRUD operation because it has to check FK consistency. This can be a big cost if you have a lot of churn
  • by enforcing relationships, FKs specify an order in which you have to add/delete things, which can lead to refusal by the DB to do what you want. (Granted, in such cases, what you are trying to do is create an Orphaned Row, and that's not usually a good thing). This is especially painful when you are doing large batch updates, and you load up one table before another, with the second table creating consistent state (but should you be doing that sort of thing if there is a possibility that the second load fails and your database is now inconsistent?).
  • sometimes you know beforehand your data is going to be dirty, you accept that, and you want the DB to accept it
  • you are just being lazy :-)

I think (I am not certain!) that most established databases provide a way to specify a foreign key that is not enforced, and is simply a bit of metadata. Since non-enforcement wipes out every reason not to use FKs, you should probably go that route if any of the reasons in the second section apply.


Additional Reason to use Foreign Keys: - Allows greater reuse of a database

Additional Reason to NOT use Foreign Keys: - You are trying to lock-in a customer into your tool by reducing reuse.


I know only Oracle databases, no other ones, and I can tell that Foreign Keys are essential for maintaining data integrity. Prior to inserting data, a data structure needs to be made, and be made correctlty. When that is done - and thus all primary AND foreign keys are created - the work is done !

Meaning : orphaned rows ? No. Never seen that in my life. Unless a bad programmer forgot the foreign key, or if he implemented that on another level. Both are - in context of Oracle - huge mistakes, which will lead to data duplication, orphan data, and thus : data corruption. I can't imagine a database without FK enforced. It looks like chaos to me. It's a bit like the Unix permission system : imagine that everybody is root. Think of the chaos.

Foreign Keys are essential, just like Primary Keys. It's like saying : what if we removing Primary Keys ? Well, total chaos is going to happen. That's what. You may not move the primary or foreign key responsibility to the programming level, it must be at the data level.

Drawbacks ? Yes, absolutely ! Because on insert, a lot more checks are going to be happening. But, if data integrity is more important than performance, it's a no-brainer. The problem with performance on Oracle is more related to indexes, which come with PK and FK's.


I have heard this argument too - from people who forgot to put an index on their foreign keys and then complained that certain operations were slow (because constraint checking could take advantage of any index). So to sum up: There is no good reason not to use foreign keys. All modern databases support cascaded deletes, so...


Foreign keys are essential to any relational database model.


I'll echo what Dmitriy said, but adding on a point.

I worked on a batch billing system that needed to insert large sets of rows on 30+ tables. We weren't allowed to do a data pump (Oracle) so we had to do bulk inserts. Those tables had foreign keys on them, but we had already ensured that they were not breaking any relationships.

Before insert, we disable the foreign key constraints so that Oracle doesn't take forever doing the inserts. After the insert is successful, we re-enable the constraints.

PS: In a large database with many foreign keys and child row data for a single record, sometimes foreign keys can be bad, and you may want to disallow cascading deletes. For us in the billing system, it would take too long and be too taxing on the database if we did cascading deletes, so we just mark the record as bad with a field on the main driver (parent) table.


Many of the people answering here get too hung up on the importance of referential integrity implemented via referential constraints. Working on large databases with referential integrity just does not perform well. Oracle seems particularly bad at cascading deletes. My rule of thumb is that applications should never update the database directly and should be via a stored procedure. This keeps the code base inside the database, and means that the database maintains its integrity.

Where many applications may be accessing the database, problems do arise because of referential integrity constraints but this is down to a control.

There is a wider issue too in that, application developers may have very different requirements that database developers may not necessarily be that familiar with.


I always use them, but then I make databases for financial systems. The database is the critical part of the application. If the data in a financial database isn't totally accurate then it really doesn't matter how much effort you put into your code/front-end design. You're just wasting your time.

There's also the fact that multiple systems generally need to interface directly with the database - from other systems that just read data out (Crystal Reports) to systems that insert data (not necessarily using an API I've designed; it may be written by a dull-witted manager who has just discovered VBScript and has the SA password for the SQL box). If the database isn't as idiot-proof as it can possibly be, well - bye bye database.

If your data is important, then yes, use foreign keys, create a suite of stored procedures to interact with the data, and make the toughest DB you can. If your data isn't important, why are you making a database to begin with?


How about maintainability and constancy across application life cycles? Most data has a longer lifespan than the applications that make use of it. Relationships and data integrity are much too important to leave to the hope that the next dev team gets it right in the app code. If you haven't worked on a db with dirty data that doesn't respect the natural relationships, you will. The importance of data integrity will then become very clear.


I always use them, but then I make databases for financial systems. The database is the critical part of the application. If the data in a financial database isn't totally accurate then it really doesn't matter how much effort you put into your code/front-end design. You're just wasting your time.

There's also the fact that multiple systems generally need to interface directly with the database - from other systems that just read data out (Crystal Reports) to systems that insert data (not necessarily using an API I've designed; it may be written by a dull-witted manager who has just discovered VBScript and has the SA password for the SQL box). If the database isn't as idiot-proof as it can possibly be, well - bye bye database.

If your data is important, then yes, use foreign keys, create a suite of stored procedures to interact with the data, and make the toughest DB you can. If your data isn't important, why are you making a database to begin with?


I have to second most of the comments here, Foreign Keys are necessary items to ensure that you have data with integrity. The different options for ON DELETE and ON UPDATE will allow you to get around some of the "down falls" that people mention here regarding their use.

I find that in 99% of all my projects I will have FK's to enforce the integrity of the data, however, there are those rare occasions where I have clients that MUST keep their old data, regardless of how bad it is....but then I spend a lot of time writing code that goes in to only get the valid data anyway, so it becomes pointless.


I always use them, but then I make databases for financial systems. The database is the critical part of the application. If the data in a financial database isn't totally accurate then it really doesn't matter how much effort you put into your code/front-end design. You're just wasting your time.

There's also the fact that multiple systems generally need to interface directly with the database - from other systems that just read data out (Crystal Reports) to systems that insert data (not necessarily using an API I've designed; it may be written by a dull-witted manager who has just discovered VBScript and has the SA password for the SQL box). If the database isn't as idiot-proof as it can possibly be, well - bye bye database.

If your data is important, then yes, use foreign keys, create a suite of stored procedures to interact with the data, and make the toughest DB you can. If your data isn't important, why are you making a database to begin with?


How about maintainability and constancy across application life cycles? Most data has a longer lifespan than the applications that make use of it. Relationships and data integrity are much too important to leave to the hope that the next dev team gets it right in the app code. If you haven't worked on a db with dirty data that doesn't respect the natural relationships, you will. The importance of data integrity will then become very clear.


I always use them, but then I make databases for financial systems. The database is the critical part of the application. If the data in a financial database isn't totally accurate then it really doesn't matter how much effort you put into your code/front-end design. You're just wasting your time.

There's also the fact that multiple systems generally need to interface directly with the database - from other systems that just read data out (Crystal Reports) to systems that insert data (not necessarily using an API I've designed; it may be written by a dull-witted manager who has just discovered VBScript and has the SA password for the SQL box). If the database isn't as idiot-proof as it can possibly be, well - bye bye database.

If your data is important, then yes, use foreign keys, create a suite of stored procedures to interact with the data, and make the toughest DB you can. If your data isn't important, why are you making a database to begin with?


In a project I worked on there was often implicit rather than explicit relationships so that numerous tables could be joined on the same column.

Take the following table

Address

  • AddressId (PK)
  • EntityId
  • EntityType
  • City
  • State
  • Country
  • Etc..

Possible values of EntityType may be Employee, Company, Customer, and the EntityId refers to the primarky key of whichever table you were interested in.

I don't really think this is the best way to do things, but it worked for this project.


Additional Reason to use Foreign Keys: - Allows greater reuse of a database

Additional Reason to NOT use Foreign Keys: - You are trying to lock-in a customer into your tool by reducing reuse.


If you are absolutey sure, that the one underlying database system will not change in the future, I would use foreign keys to ensure data integrity.

But here is another very good real-life reason not to use foreign keys at all:

You are developing a product, which should support different database systems.

If you are working with the Entity Framework, which is able to connect to many different database systems, you may also want to support "open-source-free-of-charge" serverless databases. Not all of these databases may support your foreign key rules (updating, deleting rows...).

This can lead to different problems:

1.) You may run into errors, when the database structure is created or updated. Maybe there will only be silent errors, because your foreign keys are just ignored by the database system.

2.) If you rely on foreign keys, you will propably make less or even no data integrity checks in your business logic. Now, if the new database system does not support these foreign key rules or just behaves in a different way, you have to rewrite your business logic.

You may ask: Who needs different database systems? Well, not everybody can afford or wants a full blown SQL-Server on his machine. This is software, which needs to be maintained. Others already have invested time and money in some other DB system. Serverless database are great for small customers on only one machine.

Nobody knows, how all of these DB systems behave, but your business logic, with integrity checks, always stays the same.


The Clarify database is an example of a commercial database that has no primary or foreign keys.

http://www.geekinterview.com/question_details/18869

The funny thing is, the technical documentation goes to great lengths to explain how tables are related, what columns to use to join them etc.

In other words, they could have joined the tables with explicit declarations (DRI) but they chose not to.

Consequently, the Clarify database is full of inconsistencies and it underperforms.

But I suppose it made the developers job easier, not having to write code to deal with referential integrity such as checking for related rows before deleting, adding.

And that, I think, is the main benefit of not having foreign key constraints in a relational database. It makes it easier to develop, at least that is from a devil-may-care point of view.


"Before adding a record, check that a corresponding record exists in another table" is business logic.

Here are some reasons you don't want this in the database:

  1. If the business rules change, you have to change the database. The database will need to recreate the index in a lot of cases and this is slow on large tables. (Changing rules include: allow guests to post messages or allow users to delete their account despite having posted comments, etc).

  2. Changing the database is not as easy as deploying a software fix by pushing the changes to the production repository. We want to avoid changing the database structure as much as possible. The more business logic there is in the database the more you increase the chances of needing to change the databae (and triggering re-indexing).

  3. TDD. In unit tests you can substitute the database for mocks and test the functionality. If you have any business logic in your database, you are not doing complete tests and would need to either test with the database or replicate the business logic in code for testing purposes, duplicating the logic and increasing the likelyhood of the logic not working in the same way.

  4. Reusing your logic with different data sources. If there is no logic in the database, my application can create objects from records from the database, create them from a web service, a json file or any other source. I just need to swap out the data mapper implementation and can use all my business logic with any source. If there is logic in the database, this isn't possible and you have to implement the logic at the data mapper layer or in the business logic. Either way, you need those checks in your code. If there's no logic in the database I can deploy the application in different locations using different database or flat-file implementations.


"They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint."

It's important to remember that the SQL standard defines actions that are taken when a foreign key is deleted or updated. The ones I know of are:

  • ON DELETE RESTRICT - Prevents any rows in the other table that have keys in this column from being deleted. This is what Ken Ray described above.
  • ON DELETE CASCADE - If a row in the other table is deleted, delete any rows in this table that reference it.
  • ON DELETE SET DEFAULT - If a row in the other table is deleted, set any foreign keys referencing it to the column's default.
  • ON DELETE SET NULL - If a row in the other table is deleted, set any foreign keys referencing it in this table to null.
  • ON DELETE NO ACTION - This foreign key only marks that it is a foreign key; namely for use in OR mappers.

These same actions also apply to ON UPDATE.

The default seems to depend on which server you're using.


I'm sure there are plenty of applications where you can get away with it, but it's not the best idea. You can't always count on your application to properly manage your database, and frankly managing the database should not be of very much concern to your application.

If you are using a relational database then it seems you ought to have some relationships defined in it. Unfortunately this attitude (you don't need foreign keys) seems to be embraced by a lot of application developers who would rather not be bothered with silly things like data integrity (but need to because their companies don't have dedicated database developers). Usually in databases put together by these types you are lucky just to have primary keys ;)


Quite often we receive the errors with FK constraints Cannot add or update a child row: a foreign key constraint fails Suppose there are two tables inventory_source and contract_lines, and we are referring inventory_source_id in contract_lines from inventory_source and suppose we want to delete record from inventory_source and the record is already present in contract_lines or we want to delete the PK column from Base table, we get errors for FK constraints, we can avoid it using the steps jotted below.

CREATE TABLE inventory_source (
inventory_source_id int(11) NOT NULL AUTO_INCREMENT,
display_name varchar(40) NOT NULL,
state_id int(11) NOT NULL,
PRIMARY KEY (inventory_source_id),
KEY state_id (state_id),
CONSTRAINT ba_inventory_source_state_fk FOREIGN KEY (state_id) REFERENCES   ba_state (state_id)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;

CREATE TABLE contract_lines(
contract_line_id int(11) NOT NULL AUTO_INCREMENT,
inventory_source_id int(11) NULL ,
PRIMARY KEY (contract_line_id),
UNIQUE KEY contract_line_id (contract_line_id),
KEY AI_contract_line_id (contract_line_id),
KEY contract_lines_inventory_source_fk (inventory_source_id),
CONSTRAINT contract_lines_inventory_source_fk FOREIGN KEY       (inventory_source_id) REFERENCES ba_inventory_source (inventory_source_id)
) ENGINE=InnoDB AUTO_INCREMENT=135 DEFAULT CHARSET=utf8 ;

We can overcome it using the following steps:-

  1. Delete or update the row from the inventory_source will automatically delete or update the matching rows in the contract_lines table and this is known as cascade delete or update.
  2. Another way of doing it is setting the column i.e inventory_source_id in the contract_lines table to NULL, when a record corresponding to it is deleted in the inventory_source table.
  3. We can restrict the parent table for delete or update in other words one can reject the delete or update operation for the inventory_source table.
  4. Attempt to delete or update a primary key value will not be permitted to proceed if there is a related foreign key value in the referenced table.

In DB2, if MQTs (Materialized Query Tables) are used, foreign key constraints are required for the optimizer to choose the right plan for any given query. Since they contain the cardinality information, the optimizer uses the metadata heavily to use a MQT or not.


From my experience its always better to avoid using FKs in Database Critical Applications. I would not disagree with guys here who say FKs is a good practice but its not practical where the database is huge and has huge CRUD operations/sec. I can share without naming ... one of the biggest investment bank of doesn't have a single FK in databases. These constrains are handled by programmers while creating applications involving DB. The basic reason is when ever a new CRUD is done it has to effect multiple tables and verify for each inserts/updates, though this won't be a big issue for queries affecting single rows but it does create a huge latency when you deal with batch processing which any big bank has to do as daily tasks.

Its better to avoid FKs but its risk has to be handled by programmers.


"They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint."

It's important to remember that the SQL standard defines actions that are taken when a foreign key is deleted or updated. The ones I know of are:

  • ON DELETE RESTRICT - Prevents any rows in the other table that have keys in this column from being deleted. This is what Ken Ray described above.
  • ON DELETE CASCADE - If a row in the other table is deleted, delete any rows in this table that reference it.
  • ON DELETE SET DEFAULT - If a row in the other table is deleted, set any foreign keys referencing it to the column's default.
  • ON DELETE SET NULL - If a row in the other table is deleted, set any foreign keys referencing it in this table to null.
  • ON DELETE NO ACTION - This foreign key only marks that it is a foreign key; namely for use in OR mappers.

These same actions also apply to ON UPDATE.

The default seems to depend on which server you're using.


The Clarify database is an example of a commercial database that has no primary or foreign keys.

http://www.geekinterview.com/question_details/18869

The funny thing is, the technical documentation goes to great lengths to explain how tables are related, what columns to use to join them etc.

In other words, they could have joined the tables with explicit declarations (DRI) but they chose not to.

Consequently, the Clarify database is full of inconsistencies and it underperforms.

But I suppose it made the developers job easier, not having to write code to deal with referential integrity such as checking for related rows before deleting, adding.

And that, I think, is the main benefit of not having foreign key constraints in a relational database. It makes it easier to develop, at least that is from a devil-may-care point of view.


@imphasing - this is exactly the kind of mindset that causes maintenance nightmares.

Why oh why would you ignore declarative referential integrity, where the data can be guaranteed to be at least consistent, in favour of so called "software enforcement" which is a weak preventative measure at best.


How about maintainability and constancy across application life cycles? Most data has a longer lifespan than the applications that make use of it. Relationships and data integrity are much too important to leave to the hope that the next dev team gets it right in the app code. If you haven't worked on a db with dirty data that doesn't respect the natural relationships, you will. The importance of data integrity will then become very clear.


Foreign keys are essential to any relational database model.


Verifying foreign key constraints takes some CPU time, so some folks omit foreign keys to get some extra performance.


There are no good reasons not to use them... unless orphaned rows aren't a big deal to you I guess.


I always thought it was lazy not to use them. I was taught it should always be done. But then, I didnt listen to Joel's discussion. He may have had a good reason, I don't know.


From my experience its always better to avoid using FKs in Database Critical Applications. I would not disagree with guys here who say FKs is a good practice but its not practical where the database is huge and has huge CRUD operations/sec. I can share without naming ... one of the biggest investment bank of doesn't have a single FK in databases. These constrains are handled by programmers while creating applications involving DB. The basic reason is when ever a new CRUD is done it has to effect multiple tables and verify for each inserts/updates, though this won't be a big issue for queries affecting single rows but it does create a huge latency when you deal with batch processing which any big bank has to do as daily tasks.

Its better to avoid FKs but its risk has to be handled by programmers.


The argument I have heard is that the front-end should have these business rules. Foreign keys "add unnecessary overhead" when you shouldn't be allowing any insertions that break your constraints in the first place. Do I agree with this? No, but that is what I have always heard.

EDIT: My guess is he was referring to foreign key constraints, not foreign keys as a concept.


The Clarify database is an example of a commercial database that has no primary or foreign keys.

http://www.geekinterview.com/question_details/18869

The funny thing is, the technical documentation goes to great lengths to explain how tables are related, what columns to use to join them etc.

In other words, they could have joined the tables with explicit declarations (DRI) but they chose not to.

Consequently, the Clarify database is full of inconsistencies and it underperforms.

But I suppose it made the developers job easier, not having to write code to deal with referential integrity such as checking for related rows before deleting, adding.

And that, I think, is the main benefit of not having foreign key constraints in a relational database. It makes it easier to develop, at least that is from a devil-may-care point of view.


I agree with the previous answers in that they are useful to mantain data consistency. However, there was an interesting post by Jeff Atwood some weeks ago that discussed the pros and cons of normalized and consistent data.

In a few words, a denormalized database can be faster when handling huge amounts of data; and you may not care about precise consistency depending on the application, but it forces you to be much more careful when dealing with data, as the DB won't be.


I know only Oracle databases, no other ones, and I can tell that Foreign Keys are essential for maintaining data integrity. Prior to inserting data, a data structure needs to be made, and be made correctlty. When that is done - and thus all primary AND foreign keys are created - the work is done !

Meaning : orphaned rows ? No. Never seen that in my life. Unless a bad programmer forgot the foreign key, or if he implemented that on another level. Both are - in context of Oracle - huge mistakes, which will lead to data duplication, orphan data, and thus : data corruption. I can't imagine a database without FK enforced. It looks like chaos to me. It's a bit like the Unix permission system : imagine that everybody is root. Think of the chaos.

Foreign Keys are essential, just like Primary Keys. It's like saying : what if we removing Primary Keys ? Well, total chaos is going to happen. That's what. You may not move the primary or foreign key responsibility to the programming level, it must be at the data level.

Drawbacks ? Yes, absolutely ! Because on insert, a lot more checks are going to be happening. But, if data integrity is more important than performance, it's a no-brainer. The problem with performance on Oracle is more related to indexes, which come with PK and FK's.


One time when an FK might cause you a problem is when you have historical data that references the key (in a lookup table) even though you no longer want the key available.
Obviously the solution is to design things better up front, but I am thinking of real world situations here where you don't always have control of the full solution.
For example: perhaps you have a look up table customer_type that lists different types of customers - lets say you need to remove a certain customer type, but (due to business restraints) aren't able to update the client software, and nobody invisaged this situation when developing the software, the fact that it is a foreign key in some other table may prevent you from removing the row even though you know the historical data that references it is irrelevant.
After being burnt with this a few times you probably lean away from db enforcement of relationships.
(I'm not saying this is good - just giving a reason why you may decide to avoid FKs and db contraints in general)


I always thought it was lazy not to use them. I was taught it should always be done. But then, I didnt listen to Joel's discussion. He may have had a good reason, I don't know.


I have heard this argument too - from people who forgot to put an index on their foreign keys and then complained that certain operations were slow (because constraint checking could take advantage of any index). So to sum up: There is no good reason not to use foreign keys. All modern databases support cascaded deletes, so...


There are no good reasons not to use them... unless orphaned rows aren't a big deal to you I guess.


I'll echo what Dmitriy said, but adding on a point.

I worked on a batch billing system that needed to insert large sets of rows on 30+ tables. We weren't allowed to do a data pump (Oracle) so we had to do bulk inserts. Those tables had foreign keys on them, but we had already ensured that they were not breaking any relationships.

Before insert, we disable the foreign key constraints so that Oracle doesn't take forever doing the inserts. After the insert is successful, we re-enable the constraints.

PS: In a large database with many foreign keys and child row data for a single record, sometimes foreign keys can be bad, and you may want to disallow cascading deletes. For us in the billing system, it would take too long and be too taxing on the database if we did cascading deletes, so we just mark the record as bad with a field on the main driver (parent) table.


Wowowo...
Answers everywhere. Actually this is the most complicated topic I have ever encountered. I use FKs when they are needed but on production environment I rarely use them.

Here is my whys I rarely use the Fks:

1. Most of the time I am dealing with huge data on small server to improve performance I need to remove the FKs. Because when you have FKs and you do Create, Update or Delete the RDBMS first check if there no constraint violation and if you have huge DB that could be something fatal

2. Sometimes I need to import data from others places and because I am not too sure of how well structured they are, I simply drop the FKs.

3. In case you are dealing with multiple DBs and having reference key in an other DB will not go well(as for now) until you remove the FKs (cross database relations)
4. They was also a case when you write an application which will seat on whatever RDBMS or you want your DB to be exported and imported in any RDBMS system in this case each specific RDBMS system has his own way of dealing with FKs and you will probably be obliged to drop the use of FKs.

5. If you user RDBMS platform (ORMs) you know that some of them offer their own mapping depending on the solution and technicality their offer and you don't care about creating the tables and their FKs.

6. Before the last point will be knowledge to deal with DB that has FKs and the knowledge to write an application that does all the Job without the need of FK 7. Lastly as I started saying it all depend on your scenario, in case knowledge is not a barrier. You will always want to run the best of the best you can get!


Thank you everybody!


If you are absolutey sure, that the one underlying database system will not change in the future, I would use foreign keys to ensure data integrity.

But here is another very good real-life reason not to use foreign keys at all:

You are developing a product, which should support different database systems.

If you are working with the Entity Framework, which is able to connect to many different database systems, you may also want to support "open-source-free-of-charge" serverless databases. Not all of these databases may support your foreign key rules (updating, deleting rows...).

This can lead to different problems:

1.) You may run into errors, when the database structure is created or updated. Maybe there will only be silent errors, because your foreign keys are just ignored by the database system.

2.) If you rely on foreign keys, you will propably make less or even no data integrity checks in your business logic. Now, if the new database system does not support these foreign key rules or just behaves in a different way, you have to rewrite your business logic.

You may ask: Who needs different database systems? Well, not everybody can afford or wants a full blown SQL-Server on his machine. This is software, which needs to be maintained. Others already have invested time and money in some other DB system. Serverless database are great for small customers on only one machine.

Nobody knows, how all of these DB systems behave, but your business logic, with integrity checks, always stays the same.


I always use them, but then I make databases for financial systems. The database is the critical part of the application. If the data in a financial database isn't totally accurate then it really doesn't matter how much effort you put into your code/front-end design. You're just wasting your time.

There's also the fact that multiple systems generally need to interface directly with the database - from other systems that just read data out (Crystal Reports) to systems that insert data (not necessarily using an API I've designed; it may be written by a dull-witted manager who has just discovered VBScript and has the SA password for the SQL box). If the database isn't as idiot-proof as it can possibly be, well - bye bye database.

If your data is important, then yes, use foreign keys, create a suite of stored procedures to interact with the data, and make the toughest DB you can. If your data isn't important, why are you making a database to begin with?


In a project I worked on there was often implicit rather than explicit relationships so that numerous tables could be joined on the same column.

Take the following table

Address

  • AddressId (PK)
  • EntityId
  • EntityType
  • City
  • State
  • Country
  • Etc..

Possible values of EntityType may be Employee, Company, Customer, and the EntityId refers to the primarky key of whichever table you were interested in.

I don't really think this is the best way to do things, but it worked for this project.


I always thought it was lazy not to use them. I was taught it should always be done. But then, I didnt listen to Joel's discussion. He may have had a good reason, I don't know.


I echo the answer by Dmitriy - very well put.

For those who are worried about the performance overhead FK's often bring, there's a way (in Oracle) you can get the query optimiser advantage of the FK constraint without the cost overhead of constraint validation during insert, delete or update. That is to create the FK constraint with the attributes RELY DISABLE NOVALIDATE. This means the query optimiser ASSUMES that the constraint has been enforced when building queries, without the database actually enforcing the constraint. You have to be very careful here to take the responsibility when you populate a table with an FK constraint like this to make absolutely sure you don't have data in your FK column(s) that violate the constraint, as if you do so you could get unreliable results from queries that involve the table this FK constraint is on.

I usually use this strategy on some tables in my data mart schema, but not in my integrated staging schema. I make sure the tables I am copying data from already have the same constraint enforced, or the ETL routine enforces the constraint.


This is an issue of upbringing. If somewhere in your educational or professional career you spent time feeding and caring for databases (or worked closely with talented folks who did), then the fundamental tenets of entities and relationships are well-ingrained in your thought process. Among those rudiments is how/when/why to specify keys in your database (primary, foreign and perhaps alternate). It's second nature.

If, however, you've not had such a thorough or positive experience in your past with RDBMS-related endeavors, then you've likely not been exposed to such information. Or perhaps your past includes immersion in an environment that was vociferously anti-database (e.g., "those DBAs are idiots - we few, we chosen few java/c# code slingers will save the day"), in which case you might be vehemently opposed to the arcane babblings of some dweeb telling you that FKs (and the constraints they can imply) really are important if you'd just listen.

Most everyone was taught when they were kids that brushing your teeth was important. Can you get by without it? Sure, but somewhere down the line you'll have less teeth available than you could have if you had brushed after every meal. If moms and dads were responsible enough to cover database design as well as oral hygiene, we wouldn't be having this conversation. :-)


I always thought it was lazy not to use them. I was taught it should always be done. But then, I didnt listen to Joel's discussion. He may have had a good reason, I don't know.


The argument I have heard is that the front-end should have these business rules. Foreign keys "add unnecessary overhead" when you shouldn't be allowing any insertions that break your constraints in the first place. Do I agree with this? No, but that is what I have always heard.

EDIT: My guess is he was referring to foreign key constraints, not foreign keys as a concept.


@imphasing - this is exactly the kind of mindset that causes maintenance nightmares.

Why oh why would you ignore declarative referential integrity, where the data can be guaranteed to be at least consistent, in favour of so called "software enforcement" which is a weak preventative measure at best.


In a project I worked on there was often implicit rather than explicit relationships so that numerous tables could be joined on the same column.

Take the following table

Address

  • AddressId (PK)
  • EntityId
  • EntityType
  • City
  • State
  • Country
  • Etc..

Possible values of EntityType may be Employee, Company, Customer, and the EntityId refers to the primarky key of whichever table you were interested in.

I don't really think this is the best way to do things, but it worked for this project.


I'll echo what Dmitriy said, but adding on a point.

I worked on a batch billing system that needed to insert large sets of rows on 30+ tables. We weren't allowed to do a data pump (Oracle) so we had to do bulk inserts. Those tables had foreign keys on them, but we had already ensured that they were not breaking any relationships.

Before insert, we disable the foreign key constraints so that Oracle doesn't take forever doing the inserts. After the insert is successful, we re-enable the constraints.

PS: In a large database with many foreign keys and child row data for a single record, sometimes foreign keys can be bad, and you may want to disallow cascading deletes. For us in the billing system, it would take too long and be too taxing on the database if we did cascading deletes, so we just mark the record as bad with a field on the main driver (parent) table.


I know only Oracle databases, no other ones, and I can tell that Foreign Keys are essential for maintaining data integrity. Prior to inserting data, a data structure needs to be made, and be made correctlty. When that is done - and thus all primary AND foreign keys are created - the work is done !

Meaning : orphaned rows ? No. Never seen that in my life. Unless a bad programmer forgot the foreign key, or if he implemented that on another level. Both are - in context of Oracle - huge mistakes, which will lead to data duplication, orphan data, and thus : data corruption. I can't imagine a database without FK enforced. It looks like chaos to me. It's a bit like the Unix permission system : imagine that everybody is root. Think of the chaos.

Foreign Keys are essential, just like Primary Keys. It's like saying : what if we removing Primary Keys ? Well, total chaos is going to happen. That's what. You may not move the primary or foreign key responsibility to the programming level, it must be at the data level.

Drawbacks ? Yes, absolutely ! Because on insert, a lot more checks are going to be happening. But, if data integrity is more important than performance, it's a no-brainer. The problem with performance on Oracle is more related to indexes, which come with PK and FK's.


There are no good reasons not to use them... unless orphaned rows aren't a big deal to you I guess.


The Clarify database is an example of a commercial database that has no primary or foreign keys.

http://www.geekinterview.com/question_details/18869

The funny thing is, the technical documentation goes to great lengths to explain how tables are related, what columns to use to join them etc.

In other words, they could have joined the tables with explicit declarations (DRI) but they chose not to.

Consequently, the Clarify database is full of inconsistencies and it underperforms.

But I suppose it made the developers job easier, not having to write code to deal with referential integrity such as checking for related rows before deleting, adding.

And that, I think, is the main benefit of not having foreign key constraints in a relational database. It makes it easier to develop, at least that is from a devil-may-care point of view.


I have to second most of the comments here, Foreign Keys are necessary items to ensure that you have data with integrity. The different options for ON DELETE and ON UPDATE will allow you to get around some of the "down falls" that people mention here regarding their use.

I find that in 99% of all my projects I will have FK's to enforce the integrity of the data, however, there are those rare occasions where I have clients that MUST keep their old data, regardless of how bad it is....but then I spend a lot of time writing code that goes in to only get the valid data anyway, so it becomes pointless.


I echo the answer by Dmitriy - very well put.

For those who are worried about the performance overhead FK's often bring, there's a way (in Oracle) you can get the query optimiser advantage of the FK constraint without the cost overhead of constraint validation during insert, delete or update. That is to create the FK constraint with the attributes RELY DISABLE NOVALIDATE. This means the query optimiser ASSUMES that the constraint has been enforced when building queries, without the database actually enforcing the constraint. You have to be very careful here to take the responsibility when you populate a table with an FK constraint like this to make absolutely sure you don't have data in your FK column(s) that violate the constraint, as if you do so you could get unreliable results from queries that involve the table this FK constraint is on.

I usually use this strategy on some tables in my data mart schema, but not in my integrated staging schema. I make sure the tables I am copying data from already have the same constraint enforced, or the ETL routine enforces the constraint.


Many of the people answering here get too hung up on the importance of referential integrity implemented via referential constraints. Working on large databases with referential integrity just does not perform well. Oracle seems particularly bad at cascading deletes. My rule of thumb is that applications should never update the database directly and should be via a stored procedure. This keeps the code base inside the database, and means that the database maintains its integrity.

Where many applications may be accessing the database, problems do arise because of referential integrity constraints but this is down to a control.

There is a wider issue too in that, application developers may have very different requirements that database developers may not necessarily be that familiar with.


Verifying foreign key constraints takes some CPU time, so some folks omit foreign keys to get some extra performance.


They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint. You can use triggers to have cascading deletes.

If you chose your primary key unwisely, then changing that value becomes even more complex. For example, if I have the PK of my "customers" table as the person's name, and make that key a FK in the "orders" table", if the customer wants to change his name, then it is a royal pain... but that is just shoddy database design.

I believe the advantages in using fireign keys outweighs any supposed disadvantages.


From my experience its always better to avoid using FKs in Database Critical Applications. I would not disagree with guys here who say FKs is a good practice but its not practical where the database is huge and has huge CRUD operations/sec. I can share without naming ... one of the biggest investment bank of doesn't have a single FK in databases. These constrains are handled by programmers while creating applications involving DB. The basic reason is when ever a new CRUD is done it has to effect multiple tables and verify for each inserts/updates, though this won't be a big issue for queries affecting single rows but it does create a huge latency when you deal with batch processing which any big bank has to do as daily tasks.

Its better to avoid FKs but its risk has to be handled by programmers.


Additional Reason to use Foreign Keys: - Allows greater reuse of a database

Additional Reason to NOT use Foreign Keys: - You are trying to lock-in a customer into your tool by reducing reuse.


There are no good reasons not to use them... unless orphaned rows aren't a big deal to you I guess.


I echo the answer by Dmitriy - very well put.

For those who are worried about the performance overhead FK's often bring, there's a way (in Oracle) you can get the query optimiser advantage of the FK constraint without the cost overhead of constraint validation during insert, delete or update. That is to create the FK constraint with the attributes RELY DISABLE NOVALIDATE. This means the query optimiser ASSUMES that the constraint has been enforced when building queries, without the database actually enforcing the constraint. You have to be very careful here to take the responsibility when you populate a table with an FK constraint like this to make absolutely sure you don't have data in your FK column(s) that violate the constraint, as if you do so you could get unreliable results from queries that involve the table this FK constraint is on.

I usually use this strategy on some tables in my data mart schema, but not in my integrated staging schema. I make sure the tables I am copying data from already have the same constraint enforced, or the ETL routine enforces the constraint.


One time when an FK might cause you a problem is when you have historical data that references the key (in a lookup table) even though you no longer want the key available.
Obviously the solution is to design things better up front, but I am thinking of real world situations here where you don't always have control of the full solution.
For example: perhaps you have a look up table customer_type that lists different types of customers - lets say you need to remove a certain customer type, but (due to business restraints) aren't able to update the client software, and nobody invisaged this situation when developing the software, the fact that it is a foreign key in some other table may prevent you from removing the row even though you know the historical data that references it is irrelevant.
After being burnt with this a few times you probably lean away from db enforcement of relationships.
(I'm not saying this is good - just giving a reason why you may decide to avoid FKs and db contraints in general)


One good principle of data structure design is to ensure that every attribute of a table or object be subject to a well-understood constraint. This is important because if you or your program can count on valid data in the database, you are less likely to have program defects caused by bad data. You also spend less time writing code to handle error conditions, and you are more likely to write error-handling code up front.

In many cases these constraints can be defined at compile-time, in which case you can write a filter to ensure that the attribute always falls within range, or the attempt to save the attribute fails.

However, in many cases these constraints can change at run-time. For example, you may have a "cars" table that has "colour" as an attribute which initially takes on the values, say, of "red", "green" and "blue". It is possible during the execution of the program to add valid colours to that initial list, and new "cars" added may take on any colour in the up-to-date list of colours. Furthermore, you usually want this updated list of colours to survive a program restart.

To answer your question, it turns out that if you have a requirement for data constraint that can change at run-time, and those changes must survive a program restart, foreign keys are the simplest and most concise solution to the problem. The development cost is the addition of one table (e.g. "colours", a foreign key constraint to the "cars" table, and an index), and the run-time cost is the extra table lookup for the up-to-date colours to validate the data, and this run-time cost is usually mitigated by indexing and caching.

If you don't use foreign keys for these requirements, you must write software to manage the list, look valid entries, save it to disk, structure the data efficiently if the list is large, ensure that any updates to the list don't corrupt the list file, provide serial access to the list in case there are multiple readers and/or writers, and so on. i.e. You need to implement a lot of RDBMS functionality.


I always thought it was lazy not to use them. I was taught it should always be done. But then, I didnt listen to Joel's discussion. He may have had a good reason, I don't know.


I'm sure there are plenty of applications where you can get away with it, but it's not the best idea. You can't always count on your application to properly manage your database, and frankly managing the database should not be of very much concern to your application.

If you are using a relational database then it seems you ought to have some relationships defined in it. Unfortunately this attitude (you don't need foreign keys) seems to be embraced by a lot of application developers who would rather not be bothered with silly things like data integrity (but need to because their companies don't have dedicated database developers). Usually in databases put together by these types you are lucky just to have primary keys ;)


I can see a few reasons to use foreign keys (Orphaned rows, as someone mentioned, are annoying) but I never use them either. With a relatively sane DB schema, I don't think they are 100% needed. Constraints are good, but enforcing them via software is a better method, I think.

Alex


This is an issue of upbringing. If somewhere in your educational or professional career you spent time feeding and caring for databases (or worked closely with talented folks who did), then the fundamental tenets of entities and relationships are well-ingrained in your thought process. Among those rudiments is how/when/why to specify keys in your database (primary, foreign and perhaps alternate). It's second nature.

If, however, you've not had such a thorough or positive experience in your past with RDBMS-related endeavors, then you've likely not been exposed to such information. Or perhaps your past includes immersion in an environment that was vociferously anti-database (e.g., "those DBAs are idiots - we few, we chosen few java/c# code slingers will save the day"), in which case you might be vehemently opposed to the arcane babblings of some dweeb telling you that FKs (and the constraints they can imply) really are important if you'd just listen.

Most everyone was taught when they were kids that brushing your teeth was important. Can you get by without it? Sure, but somewhere down the line you'll have less teeth available than you could have if you had brushed after every meal. If moms and dads were responsible enough to cover database design as well as oral hygiene, we wouldn't be having this conversation. :-)


They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint. You can use triggers to have cascading deletes.

If you chose your primary key unwisely, then changing that value becomes even more complex. For example, if I have the PK of my "customers" table as the person's name, and make that key a FK in the "orders" table", if the customer wants to change his name, then it is a royal pain... but that is just shoddy database design.

I believe the advantages in using fireign keys outweighs any supposed disadvantages.


I agree with the previous answers in that they are useful to mantain data consistency. However, there was an interesting post by Jeff Atwood some weeks ago that discussed the pros and cons of normalized and consistent data.

In a few words, a denormalized database can be faster when handling huge amounts of data; and you may not care about precise consistency depending on the application, but it forces you to be much more careful when dealing with data, as the DB won't be.


To me, if you want to go by the ACID standards, it is critical to have foreign keys to ensure referential integrity.


The argument I have heard is that the front-end should have these business rules. Foreign keys "add unnecessary overhead" when you shouldn't be allowing any insertions that break your constraints in the first place. Do I agree with this? No, but that is what I have always heard.

EDIT: My guess is he was referring to foreign key constraints, not foreign keys as a concept.


I agree with the previous answers in that they are useful to mantain data consistency. However, there was an interesting post by Jeff Atwood some weeks ago that discussed the pros and cons of normalized and consistent data.

In a few words, a denormalized database can be faster when handling huge amounts of data; and you may not care about precise consistency depending on the application, but it forces you to be much more careful when dealing with data, as the DB won't be.


I have heard this argument too - from people who forgot to put an index on their foreign keys and then complained that certain operations were slow (because constraint checking could take advantage of any index). So to sum up: There is no good reason not to use foreign keys. All modern databases support cascaded deletes, so...


Additional Reason to use Foreign Keys: - Allows greater reuse of a database

Additional Reason to NOT use Foreign Keys: - You are trying to lock-in a customer into your tool by reducing reuse.


Verifying foreign key constraints takes some CPU time, so some folks omit foreign keys to get some extra performance.


Additional Reason to use Foreign Keys: - Allows greater reuse of a database

Additional Reason to NOT use Foreign Keys: - You are trying to lock-in a customer into your tool by reducing reuse.


I also think that foreign keys are a necessity in most databases. The only drawback (besides the performance hit that comes with having enforced consistence) is that having a foreign key allows people to write code that assumes there is a functional foreign key. That should never be allowed.

For example, I've seen people write code that inserts into the referenced table and then attempts inserts into the referencing table without verifying the first insert was successful. If the foreign key is removed at a later time, that results in an inconsistent database.

You also don't have the option of assuming a specific behavior on update or delete. You still need to write your code to do what you want regardless of whether there is a foreign key present. If you assume deletes are cascaded when they are not, your deletes will fail. If you assume updates to the referenced columns are propogated to the referencing rows when they are not, your updates will fail. For the purposes of writing code, you might as well not have those features.

If those features are turned on, then your code will emulate them anyway and you'll lose a little performance.

So, the summary.... Foreign keys are essential if you need a consistent database. Foreign keys should never be assumed to be present or functional in code that you write.


Verifying foreign key constraints takes some CPU time, so some folks omit foreign keys to get some extra performance.


I have heard this argument too - from people who forgot to put an index on their foreign keys and then complained that certain operations were slow (because constraint checking could take advantage of any index). So to sum up: There is no good reason not to use foreign keys. All modern databases support cascaded deletes, so...


They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint. You can use triggers to have cascading deletes.

If you chose your primary key unwisely, then changing that value becomes even more complex. For example, if I have the PK of my "customers" table as the person's name, and make that key a FK in the "orders" table", if the customer wants to change his name, then it is a royal pain... but that is just shoddy database design.

I believe the advantages in using fireign keys outweighs any supposed disadvantages.


One good principle of data structure design is to ensure that every attribute of a table or object be subject to a well-understood constraint. This is important because if you or your program can count on valid data in the database, you are less likely to have program defects caused by bad data. You also spend less time writing code to handle error conditions, and you are more likely to write error-handling code up front.

In many cases these constraints can be defined at compile-time, in which case you can write a filter to ensure that the attribute always falls within range, or the attempt to save the attribute fails.

However, in many cases these constraints can change at run-time. For example, you may have a "cars" table that has "colour" as an attribute which initially takes on the values, say, of "red", "green" and "blue". It is possible during the execution of the program to add valid colours to that initial list, and new "cars" added may take on any colour in the up-to-date list of colours. Furthermore, you usually want this updated list of colours to survive a program restart.

To answer your question, it turns out that if you have a requirement for data constraint that can change at run-time, and those changes must survive a program restart, foreign keys are the simplest and most concise solution to the problem. The development cost is the addition of one table (e.g. "colours", a foreign key constraint to the "cars" table, and an index), and the run-time cost is the extra table lookup for the up-to-date colours to validate the data, and this run-time cost is usually mitigated by indexing and caching.

If you don't use foreign keys for these requirements, you must write software to manage the list, look valid entries, save it to disk, structure the data efficiently if the list is large, ensure that any updates to the list don't corrupt the list file, provide serial access to the list in case there are multiple readers and/or writers, and so on. i.e. You need to implement a lot of RDBMS functionality.


I'm sure there are plenty of applications where you can get away with it, but it's not the best idea. You can't always count on your application to properly manage your database, and frankly managing the database should not be of very much concern to your application.

If you are using a relational database then it seems you ought to have some relationships defined in it. Unfortunately this attitude (you don't need foreign keys) seems to be embraced by a lot of application developers who would rather not be bothered with silly things like data integrity (but need to because their companies don't have dedicated database developers). Usually in databases put together by these types you are lucky just to have primary keys ;)


I echo the answer by Dmitriy - very well put.

For those who are worried about the performance overhead FK's often bring, there's a way (in Oracle) you can get the query optimiser advantage of the FK constraint without the cost overhead of constraint validation during insert, delete or update. That is to create the FK constraint with the attributes RELY DISABLE NOVALIDATE. This means the query optimiser ASSUMES that the constraint has been enforced when building queries, without the database actually enforcing the constraint. You have to be very careful here to take the responsibility when you populate a table with an FK constraint like this to make absolutely sure you don't have data in your FK column(s) that violate the constraint, as if you do so you could get unreliable results from queries that involve the table this FK constraint is on.

I usually use this strategy on some tables in my data mart schema, but not in my integrated staging schema. I make sure the tables I am copying data from already have the same constraint enforced, or the ETL routine enforces the constraint.


To me, if you want to go by the ACID standards, it is critical to have foreign keys to ensure referential integrity.


Like many things, it's a tradeoff. It's a question of where you want to do the work to verify the data integrity:

(1) use a foreign key (a single point to configure for a table, feature is already implemented, tested, proven to work)

(2) leave it to the users of the database (possible multiple users/apps updating the same table (s) meaning more potential points of failure and increased complexity in testing).

It's more efficient for the database to do (2), easier to maintain and less risk with (1).


I can see a few reasons to use foreign keys (Orphaned rows, as someone mentioned, are annoying) but I never use them either. With a relatively sane DB schema, I don't think they are 100% needed. Constraints are good, but enforcing them via software is a better method, I think.

Alex


@imphasing - this is exactly the kind of mindset that causes maintenance nightmares.

Why oh why would you ignore declarative referential integrity, where the data can be guaranteed to be at least consistent, in favour of so called "software enforcement" which is a weak preventative measure at best.


I can see a few reasons to use foreign keys (Orphaned rows, as someone mentioned, are annoying) but I never use them either. With a relatively sane DB schema, I don't think they are 100% needed. Constraints are good, but enforcing them via software is a better method, I think.

Alex


Quite often we receive the errors with FK constraints Cannot add or update a child row: a foreign key constraint fails Suppose there are two tables inventory_source and contract_lines, and we are referring inventory_source_id in contract_lines from inventory_source and suppose we want to delete record from inventory_source and the record is already present in contract_lines or we want to delete the PK column from Base table, we get errors for FK constraints, we can avoid it using the steps jotted below.

CREATE TABLE inventory_source (
inventory_source_id int(11) NOT NULL AUTO_INCREMENT,
display_name varchar(40) NOT NULL,
state_id int(11) NOT NULL,
PRIMARY KEY (inventory_source_id),
KEY state_id (state_id),
CONSTRAINT ba_inventory_source_state_fk FOREIGN KEY (state_id) REFERENCES   ba_state (state_id)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;

CREATE TABLE contract_lines(
contract_line_id int(11) NOT NULL AUTO_INCREMENT,
inventory_source_id int(11) NULL ,
PRIMARY KEY (contract_line_id),
UNIQUE KEY contract_line_id (contract_line_id),
KEY AI_contract_line_id (contract_line_id),
KEY contract_lines_inventory_source_fk (inventory_source_id),
CONSTRAINT contract_lines_inventory_source_fk FOREIGN KEY       (inventory_source_id) REFERENCES ba_inventory_source (inventory_source_id)
) ENGINE=InnoDB AUTO_INCREMENT=135 DEFAULT CHARSET=utf8 ;

We can overcome it using the following steps:-

  1. Delete or update the row from the inventory_source will automatically delete or update the matching rows in the contract_lines table and this is known as cascade delete or update.
  2. Another way of doing it is setting the column i.e inventory_source_id in the contract_lines table to NULL, when a record corresponding to it is deleted in the inventory_source table.
  3. We can restrict the parent table for delete or update in other words one can reject the delete or update operation for the inventory_source table.
  4. Attempt to delete or update a primary key value will not be permitted to proceed if there is a related foreign key value in the referenced table.

@imphasing - this is exactly the kind of mindset that causes maintenance nightmares.

Why oh why would you ignore declarative referential integrity, where the data can be guaranteed to be at least consistent, in favour of so called "software enforcement" which is a weak preventative measure at best.


I'll echo what Dmitriy said, but adding on a point.

I worked on a batch billing system that needed to insert large sets of rows on 30+ tables. We weren't allowed to do a data pump (Oracle) so we had to do bulk inserts. Those tables had foreign keys on them, but we had already ensured that they were not breaking any relationships.

Before insert, we disable the foreign key constraints so that Oracle doesn't take forever doing the inserts. After the insert is successful, we re-enable the constraints.

PS: In a large database with many foreign keys and child row data for a single record, sometimes foreign keys can be bad, and you may want to disallow cascading deletes. For us in the billing system, it would take too long and be too taxing on the database if we did cascading deletes, so we just mark the record as bad with a field on the main driver (parent) table.


There's one good reason not to use them: If you don't understand their role or how to use them.

In the wrong situations, foreign key constraints can lead to waterfall replication of accidents. If somebody removes the wrong record, undoing it can become a mammoth task.

Also, conversely, when you need to remove something, if poorly designed, constraints can cause all sorts of locks that prevent you.


I also think that foreign keys are a necessity in most databases. The only drawback (besides the performance hit that comes with having enforced consistence) is that having a foreign key allows people to write code that assumes there is a functional foreign key. That should never be allowed.

For example, I've seen people write code that inserts into the referenced table and then attempts inserts into the referencing table without verifying the first insert was successful. If the foreign key is removed at a later time, that results in an inconsistent database.

You also don't have the option of assuming a specific behavior on update or delete. You still need to write your code to do what you want regardless of whether there is a foreign key present. If you assume deletes are cascaded when they are not, your deletes will fail. If you assume updates to the referenced columns are propogated to the referencing rows when they are not, your updates will fail. For the purposes of writing code, you might as well not have those features.

If those features are turned on, then your code will emulate them anyway and you'll lose a little performance.

So, the summary.... Foreign keys are essential if you need a consistent database. Foreign keys should never be assumed to be present or functional in code that you write.


Foreign keys are essential to any relational database model.


I always thought it was lazy not to use them. I was taught it should always be done. But then, I didnt listen to Joel's discussion. He may have had a good reason, I don't know.


Bigger question is: would you drive with a blindfold on? That’s how it is if you develop a system without referential constraints. Keep in mind, that business requirements change, application design changes, respective logical assumptions in the code changes, logic itself can be refactored, and so on. In general, constraints in databases are put in place under contemporary logical assumptions, seemingly correct for particular set of logical assertions and assumptions.

Through the lifecycle of an application, referential and data checks constraints police data collection via the application, especially when new requirements drive logical application changes.

To the subject of this listing - a foreign key does not by itself "improve performance", nor does it "degrade performance" significantly from a standpoint of real-time transaction processing system. However, there is an aggregated cost for constraint checking in HIGH volume "batch" system. So, here is the difference, real-time vs. batch transaction process; batch processing - where aggreated cost, incured by constraint checks, of a sequentially processed batch poses a performance hit.

In a well designed system, data consistency checks would be done "before" processing a batch through (nevertheless, there is a cost associated here also); therefore, foreign key constraint checks are not required during load time. In fact all constraints, including foreign key, should be temporarily disabled till the batch is processed.

QUERY PERFORMANCE - if tables are joined on foreign keys, be cognizant of the fact that foreign key columns are NOT INDEXED (though the respective primary key is indexed by definition). By indexing a foreign key, for that matter, by indexing any key, and joining tables on indexed helps with better performances, not by joining on non-indexed key with foreign key constraint on it.

Changing subjects, if a database is just supporting website display/rendering content/etc and recording clicks, then a database with full constraints on all tables is over kill for such purposes. Think about it. Most websites don’t even use a database for such. For similar requirements, where data is just being recorded and not referenced per say, use an in-memory database, which does not have constraints. This doesn’t mean that there is no data model, yes logical model, but no physical data model.


One time when an FK might cause you a problem is when you have historical data that references the key (in a lookup table) even though you no longer want the key available.
Obviously the solution is to design things better up front, but I am thinking of real world situations here where you don't always have control of the full solution.
For example: perhaps you have a look up table customer_type that lists different types of customers - lets say you need to remove a certain customer type, but (due to business restraints) aren't able to update the client software, and nobody invisaged this situation when developing the software, the fact that it is a foreign key in some other table may prevent you from removing the row even though you know the historical data that references it is irrelevant.
After being burnt with this a few times you probably lean away from db enforcement of relationships.
(I'm not saying this is good - just giving a reason why you may decide to avoid FKs and db contraints in general)


"They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint."

It's important to remember that the SQL standard defines actions that are taken when a foreign key is deleted or updated. The ones I know of are:

  • ON DELETE RESTRICT - Prevents any rows in the other table that have keys in this column from being deleted. This is what Ken Ray described above.
  • ON DELETE CASCADE - If a row in the other table is deleted, delete any rows in this table that reference it.
  • ON DELETE SET DEFAULT - If a row in the other table is deleted, set any foreign keys referencing it to the column's default.
  • ON DELETE SET NULL - If a row in the other table is deleted, set any foreign keys referencing it in this table to null.
  • ON DELETE NO ACTION - This foreign key only marks that it is a foreign key; namely for use in OR mappers.

These same actions also apply to ON UPDATE.

The default seems to depend on which server you're using.


In DB2, if MQTs (Materialized Query Tables) are used, foreign key constraints are required for the optimizer to choose the right plan for any given query. Since they contain the cardinality information, the optimizer uses the metadata heavily to use a MQT or not.


The argument I have heard is that the front-end should have these business rules. Foreign keys "add unnecessary overhead" when you shouldn't be allowing any insertions that break your constraints in the first place. Do I agree with this? No, but that is what I have always heard.

EDIT: My guess is he was referring to foreign key constraints, not foreign keys as a concept.


I have heard this argument too - from people who forgot to put an index on their foreign keys and then complained that certain operations were slow (because constraint checking could take advantage of any index). So to sum up: There is no good reason not to use foreign keys. All modern databases support cascaded deletes, so...


If you are absolutey sure, that the one underlying database system will not change in the future, I would use foreign keys to ensure data integrity.

But here is another very good real-life reason not to use foreign keys at all:

You are developing a product, which should support different database systems.

If you are working with the Entity Framework, which is able to connect to many different database systems, you may also want to support "open-source-free-of-charge" serverless databases. Not all of these databases may support your foreign key rules (updating, deleting rows...).

This can lead to different problems:

1.) You may run into errors, when the database structure is created or updated. Maybe there will only be silent errors, because your foreign keys are just ignored by the database system.

2.) If you rely on foreign keys, you will propably make less or even no data integrity checks in your business logic. Now, if the new database system does not support these foreign key rules or just behaves in a different way, you have to rewrite your business logic.

You may ask: Who needs different database systems? Well, not everybody can afford or wants a full blown SQL-Server on his machine. This is software, which needs to be maintained. Others already have invested time and money in some other DB system. Serverless database are great for small customers on only one machine.

Nobody knows, how all of these DB systems behave, but your business logic, with integrity checks, always stays the same.


I also think that foreign keys are a necessity in most databases. The only drawback (besides the performance hit that comes with having enforced consistence) is that having a foreign key allows people to write code that assumes there is a functional foreign key. That should never be allowed.

For example, I've seen people write code that inserts into the referenced table and then attempts inserts into the referencing table without verifying the first insert was successful. If the foreign key is removed at a later time, that results in an inconsistent database.

You also don't have the option of assuming a specific behavior on update or delete. You still need to write your code to do what you want regardless of whether there is a foreign key present. If you assume deletes are cascaded when they are not, your deletes will fail. If you assume updates to the referenced columns are propogated to the referencing rows when they are not, your updates will fail. For the purposes of writing code, you might as well not have those features.

If those features are turned on, then your code will emulate them anyway and you'll lose a little performance.

So, the summary.... Foreign keys are essential if you need a consistent database. Foreign keys should never be assumed to be present or functional in code that you write.


There are no good reasons not to use them... unless orphaned rows aren't a big deal to you I guess.


To me, if you want to go by the ACID standards, it is critical to have foreign keys to ensure referential integrity.


One time when an FK might cause you a problem is when you have historical data that references the key (in a lookup table) even though you no longer want the key available.
Obviously the solution is to design things better up front, but I am thinking of real world situations here where you don't always have control of the full solution.
For example: perhaps you have a look up table customer_type that lists different types of customers - lets say you need to remove a certain customer type, but (due to business restraints) aren't able to update the client software, and nobody invisaged this situation when developing the software, the fact that it is a foreign key in some other table may prevent you from removing the row even though you know the historical data that references it is irrelevant.
After being burnt with this a few times you probably lean away from db enforcement of relationships.
(I'm not saying this is good - just giving a reason why you may decide to avoid FKs and db contraints in general)


If you are absolutey sure, that the one underlying database system will not change in the future, I would use foreign keys to ensure data integrity.

But here is another very good real-life reason not to use foreign keys at all:

You are developing a product, which should support different database systems.

If you are working with the Entity Framework, which is able to connect to many different database systems, you may also want to support "open-source-free-of-charge" serverless databases. Not all of these databases may support your foreign key rules (updating, deleting rows...).

This can lead to different problems:

1.) You may run into errors, when the database structure is created or updated. Maybe there will only be silent errors, because your foreign keys are just ignored by the database system.

2.) If you rely on foreign keys, you will propably make less or even no data integrity checks in your business logic. Now, if the new database system does not support these foreign key rules or just behaves in a different way, you have to rewrite your business logic.

You may ask: Who needs different database systems? Well, not everybody can afford or wants a full blown SQL-Server on his machine. This is software, which needs to be maintained. Others already have invested time and money in some other DB system. Serverless database are great for small customers on only one machine.

Nobody knows, how all of these DB systems behave, but your business logic, with integrity checks, always stays the same.


How about maintainability and constancy across application life cycles? Most data has a longer lifespan than the applications that make use of it. Relationships and data integrity are much too important to leave to the hope that the next dev team gets it right in the app code. If you haven't worked on a db with dirty data that doesn't respect the natural relationships, you will. The importance of data integrity will then become very clear.


Verifying foreign key constraints takes some CPU time, so some folks omit foreign keys to get some extra performance.


There's one good reason not to use them: If you don't understand their role or how to use them.

In the wrong situations, foreign key constraints can lead to waterfall replication of accidents. If somebody removes the wrong record, undoing it can become a mammoth task.

Also, conversely, when you need to remove something, if poorly designed, constraints can cause all sorts of locks that prevent you.


I echo the answer by Dmitriy - very well put.

For those who are worried about the performance overhead FK's often bring, there's a way (in Oracle) you can get the query optimiser advantage of the FK constraint without the cost overhead of constraint validation during insert, delete or update. That is to create the FK constraint with the attributes RELY DISABLE NOVALIDATE. This means the query optimiser ASSUMES that the constraint has been enforced when building queries, without the database actually enforcing the constraint. You have to be very careful here to take the responsibility when you populate a table with an FK constraint like this to make absolutely sure you don't have data in your FK column(s) that violate the constraint, as if you do so you could get unreliable results from queries that involve the table this FK constraint is on.

I usually use this strategy on some tables in my data mart schema, but not in my integrated staging schema. I make sure the tables I am copying data from already have the same constraint enforced, or the ETL routine enforces the constraint.


Bigger question is: would you drive with a blindfold on? That’s how it is if you develop a system without referential constraints. Keep in mind, that business requirements change, application design changes, respective logical assumptions in the code changes, logic itself can be refactored, and so on. In general, constraints in databases are put in place under contemporary logical assumptions, seemingly correct for particular set of logical assertions and assumptions.

Through the lifecycle of an application, referential and data checks constraints police data collection via the application, especially when new requirements drive logical application changes.

To the subject of this listing - a foreign key does not by itself "improve performance", nor does it "degrade performance" significantly from a standpoint of real-time transaction processing system. However, there is an aggregated cost for constraint checking in HIGH volume "batch" system. So, here is the difference, real-time vs. batch transaction process; batch processing - where aggreated cost, incured by constraint checks, of a sequentially processed batch poses a performance hit.

In a well designed system, data consistency checks would be done "before" processing a batch through (nevertheless, there is a cost associated here also); therefore, foreign key constraint checks are not required during load time. In fact all constraints, including foreign key, should be temporarily disabled till the batch is processed.

QUERY PERFORMANCE - if tables are joined on foreign keys, be cognizant of the fact that foreign key columns are NOT INDEXED (though the respective primary key is indexed by definition). By indexing a foreign key, for that matter, by indexing any key, and joining tables on indexed helps with better performances, not by joining on non-indexed key with foreign key constraint on it.

Changing subjects, if a database is just supporting website display/rendering content/etc and recording clicks, then a database with full constraints on all tables is over kill for such purposes. Think about it. Most websites don’t even use a database for such. For similar requirements, where data is just being recorded and not referenced per say, use an in-memory database, which does not have constraints. This doesn’t mean that there is no data model, yes logical model, but no physical data model.


I can see a few reasons to use foreign keys (Orphaned rows, as someone mentioned, are annoying) but I never use them either. With a relatively sane DB schema, I don't think they are 100% needed. Constraints are good, but enforcing them via software is a better method, I think.

Alex


"They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint."

It's important to remember that the SQL standard defines actions that are taken when a foreign key is deleted or updated. The ones I know of are:

  • ON DELETE RESTRICT - Prevents any rows in the other table that have keys in this column from being deleted. This is what Ken Ray described above.
  • ON DELETE CASCADE - If a row in the other table is deleted, delete any rows in this table that reference it.
  • ON DELETE SET DEFAULT - If a row in the other table is deleted, set any foreign keys referencing it to the column's default.
  • ON DELETE SET NULL - If a row in the other table is deleted, set any foreign keys referencing it in this table to null.
  • ON DELETE NO ACTION - This foreign key only marks that it is a foreign key; namely for use in OR mappers.

These same actions also apply to ON UPDATE.

The default seems to depend on which server you're using.


How about maintainability and constancy across application life cycles? Most data has a longer lifespan than the applications that make use of it. Relationships and data integrity are much too important to leave to the hope that the next dev team gets it right in the app code. If you haven't worked on a db with dirty data that doesn't respect the natural relationships, you will. The importance of data integrity will then become very clear.


They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint. You can use triggers to have cascading deletes.

If you chose your primary key unwisely, then changing that value becomes even more complex. For example, if I have the PK of my "customers" table as the person's name, and make that key a FK in the "orders" table", if the customer wants to change his name, then it is a royal pain... but that is just shoddy database design.

I believe the advantages in using fireign keys outweighs any supposed disadvantages.


"Before adding a record, check that a corresponding record exists in another table" is business logic.

Here are some reasons you don't want this in the database:

  1. If the business rules change, you have to change the database. The database will need to recreate the index in a lot of cases and this is slow on large tables. (Changing rules include: allow guests to post messages or allow users to delete their account despite having posted comments, etc).

  2. Changing the database is not as easy as deploying a software fix by pushing the changes to the production repository. We want to avoid changing the database structure as much as possible. The more business logic there is in the database the more you increase the chances of needing to change the databae (and triggering re-indexing).

  3. TDD. In unit tests you can substitute the database for mocks and test the functionality. If you have any business logic in your database, you are not doing complete tests and would need to either test with the database or replicate the business logic in code for testing purposes, duplicating the logic and increasing the likelyhood of the logic not working in the same way.

  4. Reusing your logic with different data sources. If there is no logic in the database, my application can create objects from records from the database, create them from a web service, a json file or any other source. I just need to swap out the data mapper implementation and can use all my business logic with any source. If there is logic in the database, this isn't possible and you have to implement the logic at the data mapper layer or in the business logic. Either way, you need those checks in your code. If there's no logic in the database I can deploy the application in different locations using different database or flat-file implementations.


This is an issue of upbringing. If somewhere in your educational or professional career you spent time feeding and caring for databases (or worked closely with talented folks who did), then the fundamental tenets of entities and relationships are well-ingrained in your thought process. Among those rudiments is how/when/why to specify keys in your database (primary, foreign and perhaps alternate). It's second nature.

If, however, you've not had such a thorough or positive experience in your past with RDBMS-related endeavors, then you've likely not been exposed to such information. Or perhaps your past includes immersion in an environment that was vociferously anti-database (e.g., "those DBAs are idiots - we few, we chosen few java/c# code slingers will save the day"), in which case you might be vehemently opposed to the arcane babblings of some dweeb telling you that FKs (and the constraints they can imply) really are important if you'd just listen.

Most everyone was taught when they were kids that brushing your teeth was important. Can you get by without it? Sure, but somewhere down the line you'll have less teeth available than you could have if you had brushed after every meal. If moms and dads were responsible enough to cover database design as well as oral hygiene, we wouldn't be having this conversation. :-)


Verifying foreign key constraints takes some CPU time, so some folks omit foreign keys to get some extra performance.


How about maintainability and constancy across application life cycles? Most data has a longer lifespan than the applications that make use of it. Relationships and data integrity are much too important to leave to the hope that the next dev team gets it right in the app code. If you haven't worked on a db with dirty data that doesn't respect the natural relationships, you will. The importance of data integrity will then become very clear.


I have heard this argument too - from people who forgot to put an index on their foreign keys and then complained that certain operations were slow (because constraint checking could take advantage of any index). So to sum up: There is no good reason not to use foreign keys. All modern databases support cascaded deletes, so...


One time when an FK might cause you a problem is when you have historical data that references the key (in a lookup table) even though you no longer want the key available.
Obviously the solution is to design things better up front, but I am thinking of real world situations here where you don't always have control of the full solution.
For example: perhaps you have a look up table customer_type that lists different types of customers - lets say you need to remove a certain customer type, but (due to business restraints) aren't able to update the client software, and nobody invisaged this situation when developing the software, the fact that it is a foreign key in some other table may prevent you from removing the row even though you know the historical data that references it is irrelevant.
After being burnt with this a few times you probably lean away from db enforcement of relationships.
(I'm not saying this is good - just giving a reason why you may decide to avoid FKs and db contraints in general)


I agree with the previous answers in that they are useful to mantain data consistency. However, there was an interesting post by Jeff Atwood some weeks ago that discussed the pros and cons of normalized and consistent data.

In a few words, a denormalized database can be faster when handling huge amounts of data; and you may not care about precise consistency depending on the application, but it forces you to be much more careful when dealing with data, as the DB won't be.


I echo the answer by Dmitriy - very well put.

For those who are worried about the performance overhead FK's often bring, there's a way (in Oracle) you can get the query optimiser advantage of the FK constraint without the cost overhead of constraint validation during insert, delete or update. That is to create the FK constraint with the attributes RELY DISABLE NOVALIDATE. This means the query optimiser ASSUMES that the constraint has been enforced when building queries, without the database actually enforcing the constraint. You have to be very careful here to take the responsibility when you populate a table with an FK constraint like this to make absolutely sure you don't have data in your FK column(s) that violate the constraint, as if you do so you could get unreliable results from queries that involve the table this FK constraint is on.

I usually use this strategy on some tables in my data mart schema, but not in my integrated staging schema. I make sure the tables I am copying data from already have the same constraint enforced, or the ETL routine enforces the constraint.


Foreign keys are essential to any relational database model.


There are no good reasons not to use them... unless orphaned rows aren't a big deal to you I guess.


"They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint."

It's important to remember that the SQL standard defines actions that are taken when a foreign key is deleted or updated. The ones I know of are:

  • ON DELETE RESTRICT - Prevents any rows in the other table that have keys in this column from being deleted. This is what Ken Ray described above.
  • ON DELETE CASCADE - If a row in the other table is deleted, delete any rows in this table that reference it.
  • ON DELETE SET DEFAULT - If a row in the other table is deleted, set any foreign keys referencing it to the column's default.
  • ON DELETE SET NULL - If a row in the other table is deleted, set any foreign keys referencing it in this table to null.
  • ON DELETE NO ACTION - This foreign key only marks that it is a foreign key; namely for use in OR mappers.

These same actions also apply to ON UPDATE.

The default seems to depend on which server you're using.


I always use them, but then I make databases for financial systems. The database is the critical part of the application. If the data in a financial database isn't totally accurate then it really doesn't matter how much effort you put into your code/front-end design. You're just wasting your time.

There's also the fact that multiple systems generally need to interface directly with the database - from other systems that just read data out (Crystal Reports) to systems that insert data (not necessarily using an API I've designed; it may be written by a dull-witted manager who has just discovered VBScript and has the SA password for the SQL box). If the database isn't as idiot-proof as it can possibly be, well - bye bye database.

If your data is important, then yes, use foreign keys, create a suite of stored procedures to interact with the data, and make the toughest DB you can. If your data isn't important, why are you making a database to begin with?


The argument I have heard is that the front-end should have these business rules. Foreign keys "add unnecessary overhead" when you shouldn't be allowing any insertions that break your constraints in the first place. Do I agree with this? No, but that is what I have always heard.

EDIT: My guess is he was referring to foreign key constraints, not foreign keys as a concept.


I have to second most of the comments here, Foreign Keys are necessary items to ensure that you have data with integrity. The different options for ON DELETE and ON UPDATE will allow you to get around some of the "down falls" that people mention here regarding their use.

I find that in 99% of all my projects I will have FK's to enforce the integrity of the data, however, there are those rare occasions where I have clients that MUST keep their old data, regardless of how bad it is....but then I spend a lot of time writing code that goes in to only get the valid data anyway, so it becomes pointless.


I can see a few reasons to use foreign keys (Orphaned rows, as someone mentioned, are annoying) but I never use them either. With a relatively sane DB schema, I don't think they are 100% needed. Constraints are good, but enforcing them via software is a better method, I think.

Alex


There's one good reason not to use them: If you don't understand their role or how to use them.

In the wrong situations, foreign key constraints can lead to waterfall replication of accidents. If somebody removes the wrong record, undoing it can become a mammoth task.

Also, conversely, when you need to remove something, if poorly designed, constraints can cause all sorts of locks that prevent you.


In DB2, if MQTs (Materialized Query Tables) are used, foreign key constraints are required for the optimizer to choose the right plan for any given query. Since they contain the cardinality information, the optimizer uses the metadata heavily to use a MQT or not.


Update: I always use foreign keys now. My answer to the objection "they complicated testing" is "write your unit tests so they don't need the database at all. Any tests that use the database should use it properly, and that includes foreign keys. If the setup is painful, find a less painful way to do the setup."


Foreign keys complicate automated testing

Suppose you're using foreign keys. You're writing an automated test that says "when I update a financial account, it should save a record of the transaction." In this test, you're only concerned with two tables: accounts and transactions.

However, accounts has a foreign key to contracts, and contracts has a fk to clients, and clients has a fk to cities, and cities has a fk to states.

Now the database will not allow you to run your test without setting up data in four tables that aren't related to your test.

There are at least two possible perspectives on this:

  • "That's a good thing: your test should be realistic, and those data constraints will exist in production."
  • "That's a bad thing: you should be able to unit test pieces of the system without involving other pieces. You can add integration tests for the system as a whole."

It may also be possible to temporarily turn off foreign key checks while running tests. MySQL, at least, supports this.


@imphasing - this is exactly the kind of mindset that causes maintenance nightmares.

Why oh why would you ignore declarative referential integrity, where the data can be guaranteed to be at least consistent, in favour of so called "software enforcement" which is a weak preventative measure at best.


Like many things, it's a tradeoff. It's a question of where you want to do the work to verify the data integrity:

(1) use a foreign key (a single point to configure for a table, feature is already implemented, tested, proven to work)

(2) leave it to the users of the database (possible multiple users/apps updating the same table (s) meaning more potential points of failure and increased complexity in testing).

It's more efficient for the database to do (2), easier to maintain and less risk with (1).


I agree with the previous answers in that they are useful to mantain data consistency. However, there was an interesting post by Jeff Atwood some weeks ago that discussed the pros and cons of normalized and consistent data.

In a few words, a denormalized database can be faster when handling huge amounts of data; and you may not care about precise consistency depending on the application, but it forces you to be much more careful when dealing with data, as the DB won't be.


"They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint."

It's important to remember that the SQL standard defines actions that are taken when a foreign key is deleted or updated. The ones I know of are:

  • ON DELETE RESTRICT - Prevents any rows in the other table that have keys in this column from being deleted. This is what Ken Ray described above.
  • ON DELETE CASCADE - If a row in the other table is deleted, delete any rows in this table that reference it.
  • ON DELETE SET DEFAULT - If a row in the other table is deleted, set any foreign keys referencing it to the column's default.
  • ON DELETE SET NULL - If a row in the other table is deleted, set any foreign keys referencing it in this table to null.
  • ON DELETE NO ACTION - This foreign key only marks that it is a foreign key; namely for use in OR mappers.

These same actions also apply to ON UPDATE.

The default seems to depend on which server you're using.


I echo the answer by Dmitriy - very well put.

For those who are worried about the performance overhead FK's often bring, there's a way (in Oracle) you can get the query optimiser advantage of the FK constraint without the cost overhead of constraint validation during insert, delete or update. That is to create the FK constraint with the attributes RELY DISABLE NOVALIDATE. This means the query optimiser ASSUMES that the constraint has been enforced when building queries, without the database actually enforcing the constraint. You have to be very careful here to take the responsibility when you populate a table with an FK constraint like this to make absolutely sure you don't have data in your FK column(s) that violate the constraint, as if you do so you could get unreliable results from queries that involve the table this FK constraint is on.

I usually use this strategy on some tables in my data mart schema, but not in my integrated staging schema. I make sure the tables I am copying data from already have the same constraint enforced, or the ETL routine enforces the constraint.


I also think that foreign keys are a necessity in most databases. The only drawback (besides the performance hit that comes with having enforced consistence) is that having a foreign key allows people to write code that assumes there is a functional foreign key. That should never be allowed.

For example, I've seen people write code that inserts into the referenced table and then attempts inserts into the referencing table without verifying the first insert was successful. If the foreign key is removed at a later time, that results in an inconsistent database.

You also don't have the option of assuming a specific behavior on update or delete. You still need to write your code to do what you want regardless of whether there is a foreign key present. If you assume deletes are cascaded when they are not, your deletes will fail. If you assume updates to the referenced columns are propogated to the referencing rows when they are not, your updates will fail. For the purposes of writing code, you might as well not have those features.

If those features are turned on, then your code will emulate them anyway and you'll lose a little performance.

So, the summary.... Foreign keys are essential if you need a consistent database. Foreign keys should never be assumed to be present or functional in code that you write.


I'm sure there are plenty of applications where you can get away with it, but it's not the best idea. You can't always count on your application to properly manage your database, and frankly managing the database should not be of very much concern to your application.

If you are using a relational database then it seems you ought to have some relationships defined in it. Unfortunately this attitude (you don't need foreign keys) seems to be embraced by a lot of application developers who would rather not be bothered with silly things like data integrity (but need to because their companies don't have dedicated database developers). Usually in databases put together by these types you are lucky just to have primary keys ;)


Many of the people answering here get too hung up on the importance of referential integrity implemented via referential constraints. Working on large databases with referential integrity just does not perform well. Oracle seems particularly bad at cascading deletes. My rule of thumb is that applications should never update the database directly and should be via a stored procedure. This keeps the code base inside the database, and means that the database maintains its integrity.

Where many applications may be accessing the database, problems do arise because of referential integrity constraints but this is down to a control.

There is a wider issue too in that, application developers may have very different requirements that database developers may not necessarily be that familiar with.


"They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint."

It's important to remember that the SQL standard defines actions that are taken when a foreign key is deleted or updated. The ones I know of are:

  • ON DELETE RESTRICT - Prevents any rows in the other table that have keys in this column from being deleted. This is what Ken Ray described above.
  • ON DELETE CASCADE - If a row in the other table is deleted, delete any rows in this table that reference it.
  • ON DELETE SET DEFAULT - If a row in the other table is deleted, set any foreign keys referencing it to the column's default.
  • ON DELETE SET NULL - If a row in the other table is deleted, set any foreign keys referencing it in this table to null.
  • ON DELETE NO ACTION - This foreign key only marks that it is a foreign key; namely for use in OR mappers.

These same actions also apply to ON UPDATE.

The default seems to depend on which server you're using.


Like many things, it's a tradeoff. It's a question of where you want to do the work to verify the data integrity:

(1) use a foreign key (a single point to configure for a table, feature is already implemented, tested, proven to work)

(2) leave it to the users of the database (possible multiple users/apps updating the same table (s) meaning more potential points of failure and increased complexity in testing).

It's more efficient for the database to do (2), easier to maintain and less risk with (1).


One time when an FK might cause you a problem is when you have historical data that references the key (in a lookup table) even though you no longer want the key available.
Obviously the solution is to design things better up front, but I am thinking of real world situations here where you don't always have control of the full solution.
For example: perhaps you have a look up table customer_type that lists different types of customers - lets say you need to remove a certain customer type, but (due to business restraints) aren't able to update the client software, and nobody invisaged this situation when developing the software, the fact that it is a foreign key in some other table may prevent you from removing the row even though you know the historical data that references it is irrelevant.
After being burnt with this a few times you probably lean away from db enforcement of relationships.
(I'm not saying this is good - just giving a reason why you may decide to avoid FKs and db contraints in general)


I can see a few reasons to use foreign keys (Orphaned rows, as someone mentioned, are annoying) but I never use them either. With a relatively sane DB schema, I don't think they are 100% needed. Constraints are good, but enforcing them via software is a better method, I think.

Alex


Reasons to use Foreign Keys:

  • you won't get Orphaned Rows
  • you can get nice "on delete cascade" behavior, automatically cleaning up tables
  • knowing about the relationships between tables in the database helps the Optimizer plan your queries for most efficient execution, since it is able to get better estimates on join cardinality.
  • FKs give a pretty big hint on what statistics are most important to collect on the database, which in turn leads to better performance
  • they enable all kinds of auto-generated support -- ORMs can generate themselves, visualization tools will be able to create nice schema layouts for you, etc.
  • someone new to the project will get into the flow of things faster since otherwise implicit relationships are explicitly documented

Reasons not to use Foreign Keys:

  • you are making the DB work extra on every CRUD operation because it has to check FK consistency. This can be a big cost if you have a lot of churn
  • by enforcing relationships, FKs specify an order in which you have to add/delete things, which can lead to refusal by the DB to do what you want. (Granted, in such cases, what you are trying to do is create an Orphaned Row, and that's not usually a good thing). This is especially painful when you are doing large batch updates, and you load up one table before another, with the second table creating consistent state (but should you be doing that sort of thing if there is a possibility that the second load fails and your database is now inconsistent?).
  • sometimes you know beforehand your data is going to be dirty, you accept that, and you want the DB to accept it
  • you are just being lazy :-)

I think (I am not certain!) that most established databases provide a way to specify a foreign key that is not enforced, and is simply a bit of metadata. Since non-enforcement wipes out every reason not to use FKs, you should probably go that route if any of the reasons in the second section apply.


There's one good reason not to use them: If you don't understand their role or how to use them.

In the wrong situations, foreign key constraints can lead to waterfall replication of accidents. If somebody removes the wrong record, undoing it can become a mammoth task.

Also, conversely, when you need to remove something, if poorly designed, constraints can cause all sorts of locks that prevent you.


This is an issue of upbringing. If somewhere in your educational or professional career you spent time feeding and caring for databases (or worked closely with talented folks who did), then the fundamental tenets of entities and relationships are well-ingrained in your thought process. Among those rudiments is how/when/why to specify keys in your database (primary, foreign and perhaps alternate). It's second nature.

If, however, you've not had such a thorough or positive experience in your past with RDBMS-related endeavors, then you've likely not been exposed to such information. Or perhaps your past includes immersion in an environment that was vociferously anti-database (e.g., "those DBAs are idiots - we few, we chosen few java/c# code slingers will save the day"), in which case you might be vehemently opposed to the arcane babblings of some dweeb telling you that FKs (and the constraints they can imply) really are important if you'd just listen.

Most everyone was taught when they were kids that brushing your teeth was important. Can you get by without it? Sure, but somewhere down the line you'll have less teeth available than you could have if you had brushed after every meal. If moms and dads were responsible enough to cover database design as well as oral hygiene, we wouldn't be having this conversation. :-)


I have to second most of the comments here, Foreign Keys are necessary items to ensure that you have data with integrity. The different options for ON DELETE and ON UPDATE will allow you to get around some of the "down falls" that people mention here regarding their use.

I find that in 99% of all my projects I will have FK's to enforce the integrity of the data, however, there are those rare occasions where I have clients that MUST keep their old data, regardless of how bad it is....but then I spend a lot of time writing code that goes in to only get the valid data anyway, so it becomes pointless.


Examples related to database

Implement specialization in ER diagram phpMyAdmin - Error > Incorrect format parameter? Authentication plugin 'caching_sha2_password' cannot be loaded Room - Schema export directory is not provided to the annotation processor so we cannot export the schema SQL Query Where Date = Today Minus 7 Days MySQL Error: : 'Access denied for user 'root'@'localhost' SQL Server date format yyyymmdd How to create a foreign key in phpmyadmin WooCommerce: Finding the products in database TypeError: tuple indices must be integers, not str

Examples related to database-design

What are OLTP and OLAP. What is the difference between them? How to create a new schema/new user in Oracle Database 11g? What are the lengths of Location Coordinates, latitude and longitude? cannot connect to pc-name\SQLEXPRESS SQL ON DELETE CASCADE, Which Way Does the Deletion Occur? What are the best practices for using a GUID as a primary key, specifically regarding performance? "Prevent saving changes that require the table to be re-created" negative effects Difference between scaling horizontally and vertically for databases Using SQL LOADER in Oracle to import CSV file What is cardinality in Databases?

Examples related to foreign-keys

Migration: Cannot add foreign key constraint The ALTER TABLE statement conflicted with the FOREIGN KEY constraint Can a foreign key refer to a primary key in the same table? Introducing FOREIGN KEY constraint may cause cycles or multiple cascade paths - why? MySQL Error 1215: Cannot add foreign key constraint MySQL Cannot Add Foreign Key Constraint Delete rows with foreign key in PostgreSQL How to remove constraints from my MySQL table? MySQL - Cannot add or update a child row: a foreign key constraint fails How to remove foreign key constraint in sql server?

Examples related to referential-integrity

Create unique constraint with null columns What's wrong with foreign keys?

Examples related to data-integrity

What's wrong with foreign keys?