What are the main performance differences between varchar and nvarchar SQL Server data types

Question

I m working on a database for a small web app at my school using SQL Server 2005  I see a couple of schools of thought on the issue of varchar vs nvarchar    Use varchar unless you deal with a lot of internationalized data  then use nvarchar  Just use nvarchar for everything    I m beginning to see the merits of view 2   I know that nvarchar does take up twice as much space  but that isn t necessarily a huge deal since this is only going to store data for a few hundred students   To me it seems like it would be easiest not to worry about it and just allow everything to use nvarchar   Or is there something I m missing

User · Answer

For your application  nvarchar is fine because the database size is small   Saying  always use nvarchar  is a vast oversimplification   If you re not required to store things like Kanji or other crazy characters  use VARCHAR  it ll use a lot less space   My predecessor at my current job designed something using NVARCHAR when it wasn t needed   We recently switched it to VARCHAR and saved 15 GB on just that table  it was highly written to    Furthermore  if you then have an index on that table and you want to include that column or make a composite index  you ve just made your index file size larger   Just be thoughtful in your decision  in SQL development and data definitions there seems to rarely be a  default answer   other than avoid cursors at all costs  of course

User · Answer

Always use nvarchar    You may never need the double-byte characters for most applications   However  if you need to support double-byte languages and you only have single-byte support in your database schema it s really expensive to go back and modify throughout your application     The cost of migrating one application from varchar to nvarchar will be much more than the little bit of extra disk space you ll use in most applications

User · Answer

If you are using NVARCHAR just because a system stored procedure requires it  the most frequent occurrence being inexplicably sp executesql  and your dynamic SQL is very long  you would be better off from performance perspective doing all string manipulations  concatenation  replacement etc   in VARCHAR then converting the end result to NVARCHAR and feeding it into the proc parameter  So no  do not always use NVARCHAR

User · Answer

Generally speaking  Start out with the most expensive datatype that has the least constraints  Put it in production  If performance starts to be an issue  find out what s actually being stored in those nvarchar columns  Is there any characters in there that wouldn t fit into varchar  If not  switch to varchar  Don t try to pre-optimize before you know where the pain is  My guess is that the choice between nvarchar varchar is not what s going to slow down your application in the foreseable future  There will be other parts of the application where performance tuning will give you much more bang for the bucks

User · Answer

I can speak from experience on this  beware of nvarchar   Unless you absolutely require it this data field type destroys performance on larger database   I inherited a database that was hurting in terms of performance and space   We were able to reduce a 30GB database in size by 70   There were some other modifications made to help with performance but I m sure the varchar s helped out significantly with that as well   If your database has the potential for growing tables to a million   records stay away from nvarchar at all costs

User · Answer

There ll be exceptional instances when you ll want to deliberately restrict the data type to ensure it doesn t contain characters from a certain set  For example  I had a scenario where I needed to store the domain name in a database  Internationalisation for domain names wasn t reliable at the time so it was better to restrict the input at the base level  and help to avoid any potential issues

User · Answer

I hesitate to add yet another answer here as there are already quite a few  but a few points need to be made that have either not been made or not been made clearly   First  Do not always use NVARCHAR  That is a very dangerous  and often costly  attitude   approach  And it is no better to say  Never use cursors  since they are sometimes the most efficient means of solving a particular problem  and the common work-around of doing a WHILE loop will almost always be slower than a properly done Cursor   The only time you should use the term  always  is when advising to  always do what is best for the situation   Granted that is often difficult to determine  especially when trying to balance short-term gains in development time  manager   we need this feature -- that you didn t know about until just now -- a week ago    with long-term maintenance costs  manager who initially pressured team to complete a 3-month project in a 3-week sprint   why are we having these performance problems  How could we have possibly done X which has no flexibility  We can t afford a sprint or two to fix this  What can we get done in a week so we can get back to our priority items  And we definitely need to spend more time in design so this doesn t keep happening       Second   gbn s answer touches on some very important points to consider when making certain data modeling decisions when the path isn t 100  clear  But there is even more to consider    size of transaction log files time it takes to replicate  if using replication  time it takes to ETL  if ETLing  time it takes to ship logs to a remote system and restore  if using Log Shipping  size of backups length of time it takes to complete the backup length of time it takes to do a restore  this might be important some day  -  size needed for tempdb performance of triggers  for inserted and deleted tables that are stored in tempdb  performance of row versioning  if using SNAPSHOT ISOLATION  since the version store is in tempdb  ability to get new disk space when the CFO says that they just spent  1 million on a SAN last year and so they will not authorize another  250k for additional storage length of time it takes to do INSERT and UPDATE operations length of time it takes to do index maintenance etc  etc  etc    Wasting space has a huge cascade effect on the entire system  I wrote an article going into explicit detail on this topic  Disk Is Cheap  ORLY   free registration required  sorry I don t control that policy    Third  While some answers are incorrectly focusing on the  this is a small app  aspect  and some are correctly suggesting to  use what is appropriate   none of the answers have provided real guidance to the O P  An important detail mentioned in the Question is that this is a web page for their school  Great  So we can suggest that    Fields for Student and or Faculty names should probably be NVARCHAR since  over time  it is only getting more likely that names from other cultures will be showing up in those places  But for street address and city names  The purpose of the app was not stated  it would have been helpful  but assuming the address records  if any  pertain to just to a particular geographical region  i e  a single language   culture   then use VARCHAR with the appropriate Code Page  which is determined from the Collation of the field   If storing State and or Country ISO codes  no need to store INT   TINYINT since ISO codes are fixed length  human readable  and well  standard    use CHAR 2  for two letter codes and CHAR 3  if using 3 letter codes  And consider using a binary Collation such as Latin1 General 100 BIN2  If storing postal codes  i e  zip codes   use VARCHAR since it is an international standard to never use any letter outside of A-Z  And yes  still use VARCHAR even if only storing US zip codes and not INT since zip codes are not numbers  they are strings  and some of them have a leading  0   And consider using a binary Collation such as Latin1 General 100 BIN2  If storing email addresses and or URLs  use NVARCHAR since both of those can now contain Unicode characters  and so on       Fourth  Now that you have NVARCHAR data taking up twice as much space than it needs to for data that fits nicely into VARCHAR   fits nicely    doesn t turn into      and somehow  as if by magic  the application did grow and now there are millions of records in at least one of these fields where most rows are standard ASCII but some contain Unicode characters so you have to keep NVARCHAR  consider the following    If you are using SQL Server 2008 - 2016 RTM and are on Enterprise Edition  OR if using SQL Server 2016 SP1  which made Data Compression available in all editions  or newer  then you can enable Data Compression  Data Compression can  but won t  always   compress Unicode data in NCHAR and NVARCHAR fields  The determining factors are    NCHAR 1 - 4000  and NVARCHAR 1 - 4000  use the Standard Compression Scheme for Unicode  but only starting in SQL Server 2008 R2  AND only for IN ROW data  not OVERFLOW  This appears to be better than the regular ROW   PAGE compression algorithm  NVARCHAR MAX  and XML  and I guess also VARBINARY MAX   TEXT  and NTEXT  data that is IN ROW  not off row in LOB or OVERFLOW pages  can at least be PAGE compressed  but not ROW compressed  Of course  PAGE compression depends on size of the in-row value  I tested with VARCHAR MAX  and saw that 6000 character byte rows would not compress  but 4000 character byte rows did  Any OFF ROW data  LOB or OVERLOW   No Compression For You   If using SQL Server 2005  or 2008 - 2016 RTM and not on Enterprise Edition  you can have two fields  one VARCHAR and one NVARCHAR  For example  let s say you are storing URLs which are mostly all base ASCII characters  values 0 - 127  and hence fit into VARCHAR  but sometimes have Unicode characters  Your schema can include the following 3 fields           URLa VARCHAR 2048  NULL    URLu NVARCHAR 2048  NULL    URL AS  ISNULL CONVERT NVARCHAR  URLa      URLu       CONSTRAINT  CK TableName OneUrlMax  CHECK                         URLa  IS NOT NULL OR  URLu  IS NOT NULL                  AND   URLa  IS NULL OR  URLu  IS NULL        In this model you only SELECT from the  URL  computed column  For inserting and updating  you determine which field to use by seeing if converting alters the incoming value  which has to be of NVARCHAR type   INSERT INTO TableName       URLa  URLu  VALUES               IIF  CONVERT VARCHAR 2048    URL     URL   URL  NULL           IIF  CONVERT VARCHAR 2048    URL   lt  gt   URL  NULL   URL             You can GZIP incoming values into VARBINARY MAX  and then unzip on the way out    For SQL Server 2005 - 2014  you can use SQLCLR  SQL   a SQLCLR library that I wrote  comes with Util GZip and Util GUnzip in the Free version For SQL Server 2016 and newer  you can use the built-in COMPRESS and DECOMPRESS functions  which are also GZip   If using SQL Server 2017 or newer  you can look into making the table a Clustered Columnstore Index  While this is not a viable option yet  SQL Server 2019 introduces native support for UTF-8 in VARCHAR   CHAR datatypes  There are currently too many bugs with it for it to be used  but if they are fixed  then this is an option for some scenarios  Please see my post   Native UTF-8 Support in SQL Server 2019  Savior or False Prophet    for a detailed analysis of this new feature

User · Answer

For that last few years all of our projects have used NVARCHAR for everything  since all of these projects are multilingual  Imported data from external sources  e g  an ASCII file  etc   is up-converted to Unicode before being inserted into the database   I ve yet to encounter any performance-related issues from the larger indexes  etc  The indexes do use more memory  but memory is cheap   Whether you use stored procedures or construct SQL on the fly ensure that all string constants are prefixed with N  e g  SET  foo   N Hello world     so the constant is also Unicode  This avoids any string type conversion at runtime   YMMV

User · Answer

Be consistent  JOIN-ing a VARCHAR to NVARCHAR has a big performance hit

User · Answer

Disk space is not the issue    but memory and performance will be  Double the page reads  double index size  strange LIKE and   constant behaviour etc  Do you need to store Chinese etc script  Yes or no     And from MS BOL  Storage and Performance Effects of Unicode    Edit   Recent SO question highlighting how bad nvarchar performance can be     SQL Server uses high CPU when searching inside nvarchar strings

User · Answer

nvarchar is going to have significant overhead in memory  storage  working set and indexing  so if the specs dictate that it really will never be necessary  don t bother   I would not have a hard and fast  always nvarchar  rule because it can be a complete waste in many situations - particularly ETL from ASCII EBCDIC or identifiers and code columns which are often keys and foreign keys   On the other hand  there are plenty of cases of columns  where I would be sure to ask this question early and if I didn t get a hard and fast answer immediately  I would make the column nvarchar

User · Answer

I deal with this question at work often    FTP feeds of inventory and pricing - Item descriptions and other text were in nvarchar when varchar worked fine  Converting these to varchar reduced file size almost in half and really helped with uploads  The above scenario worked fine until someone put a special character in the item description  maybe trademark  can t remember    I still do not use nvarchar every time over varchar  If there is any doubt or potential for special characters  I use nvarchar  I find I use varchar mostly when I am in 100  control of what is populating the field

User · Answer

Since your application is small  there is essentially no appreciable cost increase to using nvarchar over varchar  and you save yourself potential headaches down the road if you have a need to store unicode data

User · Answer

Why  in all this discussion  has there been no mention of UTF-8    Being able to store the full unicode span of characters does not mean one has to always allocate two-bytes-per-character  or  code point  to use the UNICODE term    All of ASCII is UTF-8    Does SQL Server check for VARCHAR   fields that the text is strict ASCII  i e  top byte bit zero    I would hope not   If then you want to store unicode and want compatibility with older ASCII-only applications   I would think using VARCHAR   and UTF-8 would be the magic bullet   It only uses more space when it needs to   For those of you unfamiliar with UTF-8  might I recommend a primer

[sql-server] What are the main performance differences between varchar and nvarchar SQL Server data types?

Examples related to sql-server

Examples related to sql-server-2005

Examples related to storage

Examples related to varchar

Examples related to nvarchar