[sql] How to create a unique index on a NULL column?

I am using SQL Server 2005. I want to constrain the values in a column to be unique, while allowing NULLS.

My current solution involves a unique index on a view like so:

CREATE VIEW vw_unq WITH SCHEMABINDING AS
    SELECT Column1
      FROM MyTable
     WHERE Column1 IS NOT NULL

CREATE UNIQUE CLUSTERED INDEX unq_idx ON vw_unq (Column1)

Any better ideas?

This question is related to sql sql-server indexing constraints unique

The answer is


Pretty sure you can't do that, as it violates the purpose of uniques.

However, this person seems to have a decent work around: http://sqlservercodebook.blogspot.com/2008/04/multiple-null-values-in-unique-index-in.html


The calculated column trick is widely known as a "nullbuster"; my notes credit Steve Kass:

CREATE TABLE dupNulls (
pk int identity(1,1) primary key,
X  int NULL,
nullbuster as (case when X is null then pk else 0 end),
CONSTRAINT dupNulls_uqX UNIQUE (X,nullbuster)
)

It is possible to use filter predicates to specify which rows to include in the index.

From the documentation:

WHERE <filter_predicate> Creates a filtered index by specifying which rows to include in the index. The filtered index must be a nonclustered index on a table. Creates filtered statistics for the data rows in the filtered index.

Example:

CREATE TABLE Table1 (
  NullableCol int NULL
)

CREATE UNIQUE INDEX IX_Table1 ON Table1 (NullableCol) WHERE NullableCol IS NOT NULL;

Using SQL Server 2008, you can create a filtered index: http://msdn.microsoft.com/en-us/library/cc280372.aspx. (I see Simon added this as a comment, but thought it deserved its own answer as the comment is easily missed.)

Another option is a trigger to check uniqueness, but this could affect performance.


Strictly speaking, a unique nullable column (or set of columns) can be NULL (or a record of NULLs) only once, since having the same value (and this includes NULL) more than once obviously violates the unique constraint.

However, that doesn't mean the concept of "unique nullable columns" is valid; to actually implement it in any relational database we just have to bear in mind that this kind of databases are meant to be normalized to properly work, and normalization usually involves the addition of several (non-entity) extra tables to establish relationships between the entities.

Let's work a basic example considering only one "unique nullable column", it's easy to expand it to more such columns.

Suppose we the information represented by a table like this:

create table the_entity_incorrect
(
  id integer,
  uniqnull integer null, /* we want this to be "unique and nullable" */
  primary key (id)
);

We can do it by putting uniqnull apart and adding a second table to establish a relationship between uniqnull values and the_entity (rather than having uniqnull "inside" the_entity):

create table the_entity
(
  id integer,
  primary key(id)
);

create table the_relation
(
  the_entity_id integer not null,
  uniqnull integer not null,

  unique(the_entity_id),
  unique(uniqnull),
  /* primary key can be both or either of the_entity_id or uniqnull */
  primary key (the_entity_id, uniqnull), 
  foreign key (the_entity_id) references the_entity(id)
);

To associate a value of uniqnull to a row in the_entity we need to also add a row in the_relation.

For rows in the_entity were no uniqnull values are associated (i.e. for the ones we would put NULL in the_entity_incorrect) we simply do not add a row in the_relation.

Note that values for uniqnull will be unique for all the_relation, and also notice that for each value in the_entity there can be at most one value in the_relation, since the primary and foreign keys on it enforce this.

Then, if a value of 5 for uniqnull is to be associated with an the_entity id of 3, we need to:

start transaction;
insert into the_entity (id) values (3); 
insert into the_relation (the_entity_id, uniqnull) values (3, 5);
commit;

And, if an id value of 10 for the_entity has no uniqnull counterpart, we only do:

start transaction;
insert into the_entity (id) values (10); 
commit;

To denormalize this information and obtain the data a table like the_entity_incorrect would hold, we need to:

select
  id, uniqnull
from
  the_entity left outer join the_relation
on
  the_entity.id = the_relation.the_entity_id
;

The "left outer join" operator ensures all rows from the_entity will appear in the result, putting NULL in the uniqnull column when no matching columns are present in the_relation.

Remember, any effort spent for some days (or weeks or months) in designing a well normalized database (and the corresponding denormalizing views and procedures) will save you years (or decades) of pain and wasted resources.


The calculated column trick is widely known as a "nullbuster"; my notes credit Steve Kass:

CREATE TABLE dupNulls (
pk int identity(1,1) primary key,
X  int NULL,
nullbuster as (case when X is null then pk else 0 end),
CONSTRAINT dupNulls_uqX UNIQUE (X,nullbuster)
)

It is possible to use filter predicates to specify which rows to include in the index.

From the documentation:

WHERE <filter_predicate> Creates a filtered index by specifying which rows to include in the index. The filtered index must be a nonclustered index on a table. Creates filtered statistics for the data rows in the filtered index.

Example:

CREATE TABLE Table1 (
  NullableCol int NULL
)

CREATE UNIQUE INDEX IX_Table1 ON Table1 (NullableCol) WHERE NullableCol IS NOT NULL;

Strictly speaking, a unique nullable column (or set of columns) can be NULL (or a record of NULLs) only once, since having the same value (and this includes NULL) more than once obviously violates the unique constraint.

However, that doesn't mean the concept of "unique nullable columns" is valid; to actually implement it in any relational database we just have to bear in mind that this kind of databases are meant to be normalized to properly work, and normalization usually involves the addition of several (non-entity) extra tables to establish relationships between the entities.

Let's work a basic example considering only one "unique nullable column", it's easy to expand it to more such columns.

Suppose we the information represented by a table like this:

create table the_entity_incorrect
(
  id integer,
  uniqnull integer null, /* we want this to be "unique and nullable" */
  primary key (id)
);

We can do it by putting uniqnull apart and adding a second table to establish a relationship between uniqnull values and the_entity (rather than having uniqnull "inside" the_entity):

create table the_entity
(
  id integer,
  primary key(id)
);

create table the_relation
(
  the_entity_id integer not null,
  uniqnull integer not null,

  unique(the_entity_id),
  unique(uniqnull),
  /* primary key can be both or either of the_entity_id or uniqnull */
  primary key (the_entity_id, uniqnull), 
  foreign key (the_entity_id) references the_entity(id)
);

To associate a value of uniqnull to a row in the_entity we need to also add a row in the_relation.

For rows in the_entity were no uniqnull values are associated (i.e. for the ones we would put NULL in the_entity_incorrect) we simply do not add a row in the_relation.

Note that values for uniqnull will be unique for all the_relation, and also notice that for each value in the_entity there can be at most one value in the_relation, since the primary and foreign keys on it enforce this.

Then, if a value of 5 for uniqnull is to be associated with an the_entity id of 3, we need to:

start transaction;
insert into the_entity (id) values (3); 
insert into the_relation (the_entity_id, uniqnull) values (3, 5);
commit;

And, if an id value of 10 for the_entity has no uniqnull counterpart, we only do:

start transaction;
insert into the_entity (id) values (10); 
commit;

To denormalize this information and obtain the data a table like the_entity_incorrect would hold, we need to:

select
  id, uniqnull
from
  the_entity left outer join the_relation
on
  the_entity.id = the_relation.the_entity_id
;

The "left outer join" operator ensures all rows from the_entity will appear in the result, putting NULL in the uniqnull column when no matching columns are present in the_relation.

Remember, any effort spent for some days (or weeks or months) in designing a well normalized database (and the corresponding denormalizing views and procedures) will save you years (or decades) of pain and wasted resources.


Using SQL Server 2008, you can create a filtered index: http://msdn.microsoft.com/en-us/library/cc280372.aspx. (I see Simon added this as a comment, but thought it deserved its own answer as the comment is easily missed.)

Another option is a trigger to check uniqueness, but this could affect performance.


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to sql-server

Passing multiple values for same variable in stored procedure SQL permissions for roles Count the Number of Tables in a SQL Server Database Visual Studio 2017 does not have Business Intelligence Integration Services/Projects ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database How to create temp table using Create statement in SQL Server? SQL Query Where Date = Today Minus 7 Days How do I pass a list as a parameter in a stored procedure? SQL Server date format yyyymmdd

Examples related to indexing

numpy array TypeError: only integer scalar arrays can be converted to a scalar index How to print a specific row of a pandas DataFrame? What does 'index 0 is out of bounds for axis 0 with size 0' mean? How does String.Index work in Swift Pandas KeyError: value not in index Update row values where certain condition is met in pandas Pandas split DataFrame by column value Rebuild all indexes in a Database How are iloc and loc different? pandas loc vs. iloc vs. at vs. iat?

Examples related to constraints

How to trap on UIViewAlertForUnsatisfiableConstraints? trying to animate a constraint in swift Remove all constraints affecting a UIView "Insert if not exists" statement in SQLite Unique Key constraints for multiple columns in Entity Framework SQL Server 2008- Get table constraints How to remove constraints from my MySQL table? Creating layout constraints programmatically Display names of all constraints for a table in Oracle SQL Add primary key to existing table

Examples related to unique

Count unique values with pandas per groups Find the unique values in a column and then sort them How can I check if the array of objects have duplicate property values? Firebase: how to generate a unique numeric ID for key? pandas unique values multiple columns Select unique values with 'select' function in 'dplyr' library Generate 'n' unique random numbers within a range SQL - select distinct only on one column Can I use VARCHAR as the PRIMARY KEY? Count unique values in a column in Excel