[sql] What is the difference between a candidate key and a primary key?

Is it that a primary key is the selected candidate key chosen for a given table?

This question is related to sql relational-database

The answer is


There is no difference. A primary key is a candidate key. By convention one candidate key in a relation is usually chosen to be the "primary" one but the choice is essentially arbitrary and a matter of convenience for database users/designers/developers. It doesn't make a "primary" key fundamentally any different to any other candidate key.


A table can have so many column which can uniquely identify a row. This columns are referred as candidate keys, but primary key should be one of them because one primary key is enough for a table. So selection of primary key is important among so many candidate key. Thats the main difference.


If superkey is a big set than candidate key is some smaller set inside big set and primary key any one element(one at a time or for a table) in candidate key set.


Think of a table of vehicles with an integer Primary Key.

The registration number would be a candidate key.

In the real world registration numbers are subject change so it depends somewhat on the circumstances what might qualify as a candidate key.


Primary key -> Any column or set of columns that can uniquely identify a record in the table is a primary key. (There can be only one Primary key in the table)

Candidate key -> Any column or set of columns that are candidate to become primary key are Candidate key. (There can be one or more candidate key(s) in the table, if there is only one candidate key, it can be chosen as Primary key)


A primary key is a column (or columns) in a table that uniquely identifies the rows in that table.

CUSTOMERS


CustomerNo  FirstName   LastName
1   Sally   Thompson
2   Sally   Henderson
3   Harry   Henderson
4   Sandra  Wellington

For example, in the table above, CustomerNo is the primary key.

The values placed in primary key columns must be unique for each row: no duplicates can be tolerated. In addition, nulls are not allowed in primary key columns.

So, having told you that it is possible to use one or more columns as a primary key, how do you decide which columns (and how many) to choose?

Well there are times when it is advisable or essential to use multiple columns. However, if you cannot see an immediate reason to use multiple columns, then use one. This isn't an absolute rule, it is simply advice. However, primary keys made up of single columns are generally easier to maintain and faster in operation. This means that if you query the database, you will usually get the answer back faster if the tables have single column primary keys.

Next question — which column should you pick? The easiest way to choose a column as a primary key (and a method that is reasonably commonly employed) is to get the database itself to automatically allocate a unique number to each row.

In a table of employees, clearly any column like FirstName is a poor choice since you cannot control employee's first names. Often there is only one choice for the primary key, as in the case above. However, if there is more than one, these can be described as 'candidate keys' — the name reflects that they are candidates for the responsible job of primary key.


First you have to know what is a determinant? the determinant is an attribute that used to determine another attribute in the same table. SO the determinant must be a candidate key. And you can have more than one determinant. But primary key is used to determine the whole record and you can have only one primary key. Both primary and candidate key can consist of one or more attributes


Primary key -> Any column or set of columns that can uniquely identify a record in the table is a primary key. (There can be only one Primary key in the table) and the candidate key-> the same as Primary key but the Primary Key chosen by DB administrator's prospective for example(the primary key the least candidate key in size)


A Primary key is a special kind of index in that:

there can be only one;
it cannot be nullable
it must be unique.

Candidate keys are selected from the set of super keys, the only thing we take care while selecting the candidate key is: It should not have any redundant attribute.

Example of an Employee table: Employee ( Employee ID, FullName, SSN, DeptID )

  1. Candidate Key: are individual columns in a table that qualifies for the uniqueness of all the rows. Here in Employee table EmployeeID & SSN are Candidate keys.

  2. Primary Key: are the columns you choose to maintain uniqueness in a table. Here in Employee table, you can choose either EmployeeID or SSN columns, EmployeeID is a preferable choice, as SSN is a secure value.

  3. Alternate Key: Candidate column other the Primary column, like if EmployeeID is PK then SSN would be the Alternate key.

  4. Super Key: If you add any other column/attribute to a Primary Key then it becomes a super key, like EmployeeID + FullName, is a Super Key.

  5. Composite Key: If a table does not have a single column that qualifies for a Candidate key, then you have to select 2 or more columns to make a row unique. Like if there is no EmployeeID or SSN columns, then you can make FullName + DateOfBirth as Composite primary Key. But still, there can be a narrow chance of duplicate row.


John Woo's answer is correct, as far as it goes. Here are a few additional points.

A primary key is always one of the candidate keys. Fairly often, it's the only candidate.

A table with no candidate keys does not represent a relation. If you're using the relational model to help you build a good database, then every table you design will have at least one candidate key.

The relational model would be complete without the concept of primary key. It wasn't in the original presentation of the relational model. As a practical matter, the use of foreign key references without a declared primary key leads to a mess. It could be a logically correct mess, but it's a mess nonetheless. Declaring a primary key lets the DBMS help you enforce the data rules. Most of the time, having the DBMS help you enforce the data rules is a good thing, and well worth the cost.

Some database designers and some users have some mental confusion about whether the primary key identifies a row (record) in a table or an instance of an entity in the subject matter that the table represents. In an ideal world, it's supposed to do both, and there should be a one-for-one correspondence between rows in an entity table and instances of the corresponding entity.

In the real world, things get screwed up. Somebody enters the same new employee twice, and the employee ends up with two ids. Somebody gets hired, but the data entry slips through the cracks in some manual process, and the employee doesn't get an id, until the omission is corrected. A database that does not collapse the first time things get screwed up is more robust than one that does.