When To Use NVARCHAR

The Short Answer

Only when you need to store multi-lingual (unicode) data. 

The Nitty Gritty

The fundamental unit of storage in a SQL Server database is an 8 KB page.  To paraphrase the MSDN, SQL Server reads and writes whole data pages and as a result, the maximum amount of data that can be contained within a single row on a page is 8,060 bytes (8 KB). 

The page size in turn limits the maximum size of VARCHAR, a variable-length non-Unicode character datatype, to 8,000 bytes.  Take note!  The operative word here is non-Unicode.  A non-Unicode character is stored as UTF-8 (8 bits) and requires one byte of storage per character, meaning that the VARCHAR datatype may contain at most 8,000 characters. 

In contrast, NVARCHAR is a variable-length Unicode datatype.  And unlike non-Unicode characters, Unicode characters are stored as UTF-16 (16 bits) and require two bytes of storage per character.   Because of the SQL Server page size, NVARCHAR datatypes have the same length restrictions as their VARCHAR cousins, 8,000 bytes.  This means that an NVARCHAR datatype may contain, at most, 4,000 characters.  The net result is NVARCHAR dataypes take up twice as much space as a VARCHAR datatype.  Said another way, NVARCHAR(4000) is the same size as VARCHAR(8000).

But the NVARCHAR datatype shouldn’t be dismissed entirely.  The English language can be encoded in UTF-8, making VARCHAR the datatype of choice.  However, other languages, such as Japanese, Hebrew, Arabic, etc., have an extended set of character codes that are only found in UTF-16.  In order to store data in these languages you must use the NVARCHAR datatype, but that should be the only time.

Warehouse Beware

The rule for when to use an NVARCHAR datatype applies to both OLTP systems and OLAP systems.  If your OLTP system is riddled with unnecessary NVARCHAR datatypes you may want to consider converting them to VARCHAR in the data warehouse to save space.  It’s tempting just to convert all NVARCHARS to VARCHARS but you always have to ask yourself, was the column defined as an NVARCHAR for a reason?  This is a question you’ll need to have answered by the business users.  Is your company planning to expand their business globally?  If so, it might make sense to keep the NVARCHARs, but only for columns where it really matters.

This entry was posted in SQL Server and tagged , , , , , , . Bookmark the permalink.

4 Responses to When To Use NVARCHAR

  1. Hi Melinda!

    I’m so glad you posted your thoughts on this. I totally agree – but since it results in some SSIS conversions I second guessed myself quite recently. Am happy to have heard your opinion on it.

    Take care!
    Melissa

    • mcole says:

      Hi Melissa! :)

      I’ve seen many cases where a field is defined as NVARCHAR in a dimension simply because that is the way it is defined in the source. I am of the very strong opinion that the dimension transform logic, whether it be contained within SQL, SSIS, or something else, is your place to make things “right” with the source data, and that includes using appropriate datatypes.

  2. Pingback: Link Resource # 30 : Nov 01 – Nov 29 « Dactylonomy of Web Resource

  3. Pingback: Link Resource # 33 : DESC 03-DESC 08 « Dactylonomy of Web Resource

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>