[sql-server] What are the main performance differences between varchar and nvarchar SQL Server data types?

I hesitate to add yet another answer here as there are already quite a few, but a few points need to be made that have either not been made or not been made clearly.

First: Do not always use NVARCHAR. That is a very dangerous, and often costly, attitude / approach. And it is no better to say "Never use cursors" since they are sometimes the most efficient means of solving a particular problem, and the common work-around of doing a WHILE loop will almost always be slower than a properly done Cursor.

The only time you should use the term "always" is when advising to "always do what is best for the situation". Granted that is often difficult to determine, especially when trying to balance short-term gains in development time (manager: "we need this feature -- that you didn't know about until just now -- a week ago!") with long-term maintenance costs (manager who initially pressured team to complete a 3-month project in a 3-week sprint: "why are we having these performance problems? How could we have possibly done X which has no flexibility? We can't afford a sprint or two to fix this. What can we get done in a week so we can get back to our priority items? And we definitely need to spend more time in design so this doesn't keep happening!").

Second: @gbn's answer touches on some very important points to consider when making certain data modeling decisions when the path isn't 100% clear. But there is even more to consider:

  • size of transaction log files
  • time it takes to replicate (if using replication)
  • time it takes to ETL (if ETLing)
  • time it takes to ship logs to a remote system and restore (if using Log Shipping)
  • size of backups
  • length of time it takes to complete the backup
  • length of time it takes to do a restore (this might be important some day ;-)
  • size needed for tempdb
  • performance of triggers (for inserted and deleted tables that are stored in tempdb)
  • performance of row versioning (if using SNAPSHOT ISOLATION, since the version store is in tempdb)
  • ability to get new disk space when the CFO says that they just spent $1 million on a SAN last year and so they will not authorize another $250k for additional storage
  • length of time it takes to do INSERT and UPDATE operations
  • length of time it takes to do index maintenance
  • etc, etc, etc.

Wasting space has a huge cascade effect on the entire system. I wrote an article going into explicit detail on this topic: Disk Is Cheap! ORLY? (free registration required; sorry I don't control that policy).

Third: While some answers are incorrectly focusing on the "this is a small app" aspect, and some are correctly suggesting to "use what is appropriate", none of the answers have provided real guidance to the O.P. An important detail mentioned in the Question is that this is a web page for their school. Great! So we can suggest that:

  • Fields for Student and/or Faculty names should probably be NVARCHAR since, over time, it is only getting more likely that names from other cultures will be showing up in those places.
  • But for street address and city names? The purpose of the app was not stated (it would have been helpful) but assuming the address records, if any, pertain to just to a particular geographical region (i.e. a single language / culture), then use VARCHAR with the appropriate Code Page (which is determined from the Collation of the field).
  • If storing State and/or Country ISO codes (no need to store INT / TINYINT since ISO codes are fixed length, human readable, and well, standard :) use CHAR(2) for two letter codes and CHAR(3) if using 3 letter codes. And consider using a binary Collation such as Latin1_General_100_BIN2.
  • If storing postal codes (i.e. zip codes), use VARCHAR since it is an international standard to never use any letter outside of A-Z. And yes, still use VARCHAR even if only storing US zip codes and not INT since zip codes are not numbers, they are strings, and some of them have a leading "0". And consider using a binary Collation such as Latin1_General_100_BIN2.
  • If storing email addresses and/or URLs, use NVARCHAR since both of those can now contain Unicode characters.
  • and so on....

Fourth: Now that you have NVARCHAR data taking up twice as much space than it needs to for data that fits nicely into VARCHAR ("fits nicely" = doesn't turn into "?") and somehow, as if by magic, the application did grow and now there are millions of records in at least one of these fields where most rows are standard ASCII but some contain Unicode characters so you have to keep NVARCHAR, consider the following:

  1. If you are using SQL Server 2008 - 2016 RTM and are on Enterprise Edition, OR if using SQL Server 2016 SP1 (which made Data Compression available in all editions) or newer, then you can enable Data Compression. Data Compression can (but won't "always") compress Unicode data in NCHAR and NVARCHAR fields. The determining factors are:

    1. NCHAR(1 - 4000) and NVARCHAR(1 - 4000) use the Standard Compression Scheme for Unicode, but only starting in SQL Server 2008 R2, AND only for IN ROW data, not OVERFLOW! This appears to be better than the regular ROW / PAGE compression algorithm.
    2. NVARCHAR(MAX) and XML (and I guess also VARBINARY(MAX), TEXT, and NTEXT) data that is IN ROW (not off row in LOB or OVERFLOW pages) can at least be PAGE compressed, but not ROW compressed. Of course, PAGE compression depends on size of the in-row value: I tested with VARCHAR(MAX) and saw that 6000 character/byte rows would not compress, but 4000 character/byte rows did.
    3. Any OFF ROW data, LOB or OVERLOW = No Compression For You!
  2. If using SQL Server 2005, or 2008 - 2016 RTM and not on Enterprise Edition, you can have two fields: one VARCHAR and one NVARCHAR. For example, let's say you are storing URLs which are mostly all base ASCII characters (values 0 - 127) and hence fit into VARCHAR, but sometimes have Unicode characters. Your schema can include the following 3 fields:

      ...
      URLa VARCHAR(2048) NULL,
      URLu NVARCHAR(2048) NULL,
      URL AS (ISNULL(CONVERT(NVARCHAR([URLa])), [URLu])),
      CONSTRAINT [CK_TableName_OneUrlMax] CHECK (
                        ([URLa] IS NOT NULL OR [URLu] IS NOT NULL)
                    AND ([URLa] IS NULL OR [URLu] IS NULL))
    );
    

    In this model you only SELECT from the [URL] computed column. For inserting and updating, you determine which field to use by seeing if converting alters the incoming value, which has to be of NVARCHAR type:

    INSERT INTO TableName (..., URLa, URLu)
    VALUES (...,
            IIF (CONVERT(VARCHAR(2048), @URL) = @URL, @URL, NULL),
            IIF (CONVERT(VARCHAR(2048), @URL) <> @URL, NULL, @URL)
           );
    
  3. You can GZIP incoming values into VARBINARY(MAX) and then unzip on the way out:

    • For SQL Server 2005 - 2014: you can use SQLCLR. SQL# (a SQLCLR library that I wrote) comes with Util_GZip and Util_GUnzip in the Free version
    • For SQL Server 2016 and newer: you can use the built-in COMPRESS and DECOMPRESS functions, which are also GZip.
  4. If using SQL Server 2017 or newer, you can look into making the table a Clustered Columnstore Index.

  5. While this is not a viable option yet, SQL Server 2019 introduces native support for UTF-8 in VARCHAR / CHAR datatypes. There are currently too many bugs with it for it to be used, but if they are fixed, then this is an option for some scenarios. Please see my post, "Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?", for a detailed analysis of this new feature.

Examples related to sql-server

Passing multiple values for same variable in stored procedure SQL permissions for roles Count the Number of Tables in a SQL Server Database Visual Studio 2017 does not have Business Intelligence Integration Services/Projects ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database How to create temp table using Create statement in SQL Server? SQL Query Where Date = Today Minus 7 Days How do I pass a list as a parameter in a stored procedure? SQL Server date format yyyymmdd

Examples related to sql-server-2005

Add a row number to result set of a SQL query SQL Server : Transpose rows to columns Select info from table where row has max date How to query for Xml values and attributes from table in SQL Server? How to restore SQL Server 2014 backup in SQL Server 2008 SQL Server 2005 Using CHARINDEX() To split a string Is it necessary to use # for creating temp tables in SQL server? SQL Query to find the last day of the month JDBC connection to MSSQL server in windows authentication mode How to convert the system date format to dd/mm/yy in SQL Server 2008 R2?

Examples related to storage

AWS EFS vs EBS vs S3 (differences & when to use?) "Not allowed to load local resource: file:///C:....jpg" Java EE Tomcat How to get my Android device Internal Download Folder path How to Get True Size of MySQL Database? Android saving file to external storage Django CharField vs TextField What column type/length should I use for storing a Bcrypt hashed password in a Database? How to create a file on Android Internal Storage? What are Transient and Volatile Modifiers? Android: Storing username and password?

Examples related to varchar

SQL Server date format yyyymmdd What does it mean when the size of a VARCHAR2 in Oracle is declared as 1 byte? Difference between VARCHAR and TEXT in MySQL PostgreSQL: ERROR: operator does not exist: integer = character varying Can I use VARCHAR as the PRIMARY KEY? Is the LIKE operator case-sensitive with MSSQL Server? How to convert Varchar to Double in sql? SQL Server : error converting data type varchar to numeric What is the MySQL VARCHAR max size? SQL Server Convert Varchar to Datetime

Examples related to nvarchar

Convert NVARCHAR to DATETIME in SQL Server 2008 nvarchar(max) vs NText When must we use NVARCHAR/NCHAR instead of VARCHAR/CHAR in SQL Server? What is the difference between varchar and nvarchar? What are the main performance differences between varchar and nvarchar SQL Server data types?