[sql-server] What datatype should be used for storing phone numbers in SQL Server 2005?

I need to store phone numbers in a table. Please suggest which datatype should I use? Wait. Please read on before you hit reply..

This field needs to be indexed heavily as Sales Reps can use this field for searching (including wild character search).

As of now, we are expecting phone numbers to come in a number of formats (from an XML file). Do I have to write a parser to convert to a uniform format? There could be millions of data (with duplicates) and I dont want to tie up the server resources (in activities like preprocessing too much) every time some source data comes through..

Any suggestions are welcome..

Update: I have no control over source data. Just that the structure of xml file is standard. Would like to keep the xml parsing to a minimum. Once it is in database, retrieval should be quick. One crazy suggestion going on around here is that it should even work with Ajax AutoComplete feature (so Sales Reps can see the matching ones immediately). OMG!!

This question is related to sql-server indexing

The answer is


We use varchar(15) and certainly index on that field.

The reason being is that International standards can support up to 15 digits

Wikipedia - Telephone Number Formats

If you do support International numbers, I recommend the separate storage of a World Zone Code or Country Code to better filter queries by so that you do not find yourself parsing and checking the length of your phone number fields to limit the returned calls to USA for example


Use SSIS to extract and process the information. That way you will have the processing of the XML files separated from SQL Server. You can also do the SSIS transformations on a separate server if needed. Store the phone numbers in a standard format using VARCHAR. NVARCHAR would be unnecessary since we are talking about numbers and maybe a couple of other chars, like '+', ' ', '(', ')' and '-'.


It is fairly common to use an "x" or "ext" to indicate extensions, so allow 15 characters (for full international support) plus 3 (for "ext") plus 4 (for the extension itself) giving a total of 22 characters. That should keep you safe.

Alternatively, normalise on input so any "ext" gets translated to "x", giving a maximum of 20.


SQL Server 2005 is pretty well optimized for substring queries for text in indexed varchar fields. For 2005 they introduced new statistics to the string summary for index fields. This helps significantly with full text searching.


Normalise the data then store as a varchar. Normalising could be tricky.

That should be a one-time hit. Then as a new record comes in, you're comparing it to normalised data. Should be very fast.


Use a varchar field with a length restriction.


I'm probably missing the obvious here, but wouldn't a varchar just long enough for your longest expected phone number work well?

If I am missing something obvious, I'd love it if someone would point it out...


We use varchar(15) and certainly index on that field.

The reason being is that International standards can support up to 15 digits

Wikipedia - Telephone Number Formats

If you do support International numbers, I recommend the separate storage of a World Zone Code or Country Code to better filter queries by so that you do not find yourself parsing and checking the length of your phone number fields to limit the returned calls to USA for example


Since you need to accommodate many different phone number formats (and probably include things like extensions etc.) it may make the most sense to just treat it as you would any other varchar. If you could control the input, you could take a number of approaches to make the data more useful, but it doesn't sound that way.

Once you decide to simply treat it as any other string, you can focus on overcoming the inevitable issues regarding bad data, mysterious phone number formating and whatever else will pop up. The challenge will be in building a good search strategy for the data and not how you store it in my opinion. It's always a difficult task having to deal with a large pile of data which you had no control over collecting.


We use varchar(15) and certainly index on that field.

The reason being is that International standards can support up to 15 digits

Wikipedia - Telephone Number Formats

If you do support International numbers, I recommend the separate storage of a World Zone Code or Country Code to better filter queries by so that you do not find yourself parsing and checking the length of your phone number fields to limit the returned calls to USA for example


SQL Server 2005 is pretty well optimized for substring queries for text in indexed varchar fields. For 2005 they introduced new statistics to the string summary for index fields. This helps significantly with full text searching.


nvarchar with preprocessing to standardize them as much as possible. You'll probably want to extract extensions and store them in another field.


Use data type long instead.. dont use int because it only allows whole numbers between -32,768 and 32,767 but if you use long data type you can insert numbers between -2,147,483,648 and 2,147,483,647.


Use SSIS to extract and process the information. That way you will have the processing of the XML files separated from SQL Server. You can also do the SSIS transformations on a separate server if needed. Store the phone numbers in a standard format using VARCHAR. NVARCHAR would be unnecessary since we are talking about numbers and maybe a couple of other chars, like '+', ' ', '(', ')' and '-'.


using varchar is pretty inefficient. use the money type and create a user declared type "phonenumber" out of it, and create a rule to only allow positive numbers.

if you declare it as (19,4) you can even store a 4 digit extension and be big enough for international numbers, and only takes 9 bytes of storage. Also, indexes are speedy.


I would use a varchar(22). Big enough to hold a north american phone number with extension. You would want to strip out all the nasty '(', ')', '-' characters, or just parse them all into one uniform format.

Alex


Use SSIS to extract and process the information. That way you will have the processing of the XML files separated from SQL Server. You can also do the SSIS transformations on a separate server if needed. Store the phone numbers in a standard format using VARCHAR. NVARCHAR would be unnecessary since we are talking about numbers and maybe a couple of other chars, like '+', ' ', '(', ')' and '-'.


Normalise the data then store as a varchar. Normalising could be tricky.

That should be a one-time hit. Then as a new record comes in, you're comparing it to normalised data. Should be very fast.


Use a varchar field with a length restriction.


Since you need to accommodate many different phone number formats (and probably include things like extensions etc.) it may make the most sense to just treat it as you would any other varchar. If you could control the input, you could take a number of approaches to make the data more useful, but it doesn't sound that way.

Once you decide to simply treat it as any other string, you can focus on overcoming the inevitable issues regarding bad data, mysterious phone number formating and whatever else will pop up. The challenge will be in building a good search strategy for the data and not how you store it in my opinion. It's always a difficult task having to deal with a large pile of data which you had no control over collecting.


Use CHAR(10) if you are storing US Phone numbers only. Remove everything but the digits.


I would use a varchar(22). Big enough to hold a north american phone number with extension. You would want to strip out all the nasty '(', ')', '-' characters, or just parse them all into one uniform format.

Alex


Normalise the data then store as a varchar. Normalising could be tricky.

That should be a one-time hit. Then as a new record comes in, you're comparing it to normalised data. Should be very fast.


I'm probably missing the obvious here, but wouldn't a varchar just long enough for your longest expected phone number work well?

If I am missing something obvious, I'd love it if someone would point it out...


Use CHAR(10) if you are storing US Phone numbers only. Remove everything but the digits.


Use a varchar field with a length restriction.


Use CHAR(10) if you are storing US Phone numbers only. Remove everything but the digits.


Normalise the data then store as a varchar. Normalising could be tricky.

That should be a one-time hit. Then as a new record comes in, you're comparing it to normalised data. Should be very fast.


SQL Server 2005 is pretty well optimized for substring queries for text in indexed varchar fields. For 2005 they introduced new statistics to the string summary for index fields. This helps significantly with full text searching.


I realize this thread is old, but it's worth mentioning an advantage of storing as a numeric type for formatting purposes, specifically in .NET framework.

IE

.DefaultCellStyle.Format = "(###)###-####" // Will not work on a string

I'm probably missing the obvious here, but wouldn't a varchar just long enough for your longest expected phone number work well?

If I am missing something obvious, I'd love it if someone would point it out...


It is always better to have separate tables for multi valued attributes like phone number.

As you have no control on source data so, you can parse the data from XML file and convert it into the proper format so that there will not be any issue with formats of a particular country and store it in a separate table so that indexing and retrieval both will be efficient.

Thank you.


It is fairly common to use an "x" or "ext" to indicate extensions, so allow 15 characters (for full international support) plus 3 (for "ext") plus 4 (for the extension itself) giving a total of 22 characters. That should keep you safe.

Alternatively, normalise on input so any "ext" gets translated to "x", giving a maximum of 20.


Use SSIS to extract and process the information. That way you will have the processing of the XML files separated from SQL Server. You can also do the SSIS transformations on a separate server if needed. Store the phone numbers in a standard format using VARCHAR. NVARCHAR would be unnecessary since we are talking about numbers and maybe a couple of other chars, like '+', ' ', '(', ')' and '-'.


Use data type long instead.. dont use int because it only allows whole numbers between -32,768 and 32,767 but if you use long data type you can insert numbers between -2,147,483,648 and 2,147,483,647.


Use a varchar field with a length restriction.


nvarchar with preprocessing to standardize them as much as possible. You'll probably want to extract extensions and store them in another field.


It is always better to have separate tables for multi valued attributes like phone number.

As you have no control on source data so, you can parse the data from XML file and convert it into the proper format so that there will not be any issue with formats of a particular country and store it in a separate table so that indexing and retrieval both will be efficient.

Thank you.


nvarchar with preprocessing to standardize them as much as possible. You'll probably want to extract extensions and store them in another field.


We use varchar(15) and certainly index on that field.

The reason being is that International standards can support up to 15 digits

Wikipedia - Telephone Number Formats

If you do support International numbers, I recommend the separate storage of a World Zone Code or Country Code to better filter queries by so that you do not find yourself parsing and checking the length of your phone number fields to limit the returned calls to USA for example


I'm probably missing the obvious here, but wouldn't a varchar just long enough for your longest expected phone number work well?

If I am missing something obvious, I'd love it if someone would point it out...


I realize this thread is old, but it's worth mentioning an advantage of storing as a numeric type for formatting purposes, specifically in .NET framework.

IE

.DefaultCellStyle.Format = "(###)###-####" // Will not work on a string

nvarchar with preprocessing to standardize them as much as possible. You'll probably want to extract extensions and store them in another field.


I would use a varchar(22). Big enough to hold a north american phone number with extension. You would want to strip out all the nasty '(', ')', '-' characters, or just parse them all into one uniform format.

Alex


SQL Server 2005 is pretty well optimized for substring queries for text in indexed varchar fields. For 2005 they introduced new statistics to the string summary for index fields. This helps significantly with full text searching.


Since you need to accommodate many different phone number formats (and probably include things like extensions etc.) it may make the most sense to just treat it as you would any other varchar. If you could control the input, you could take a number of approaches to make the data more useful, but it doesn't sound that way.

Once you decide to simply treat it as any other string, you can focus on overcoming the inevitable issues regarding bad data, mysterious phone number formating and whatever else will pop up. The challenge will be in building a good search strategy for the data and not how you store it in my opinion. It's always a difficult task having to deal with a large pile of data which you had no control over collecting.


using varchar is pretty inefficient. use the money type and create a user declared type "phonenumber" out of it, and create a rule to only allow positive numbers.

if you declare it as (19,4) you can even store a 4 digit extension and be big enough for international numbers, and only takes 9 bytes of storage. Also, indexes are speedy.