UTF – Agile BI http://agilebi.com A community for sharing ideas about business intelligence development using agile methods. Sat, 14 Jan 2012 02:02:24 +0000 en-US hourly 1 https://wordpress.org/?v=6.2.2 When To Use NVARCHAR http://agilebi.com/blog/2011/09/09/when-to-use-nvarchar/ Thu, 08 Sep 2011 18:27:27 +0000 http://7.115 Continue reading ]]> The Short Answer

Only when you need to store multi-lingual (unicode) data. 

The Nitty Gritty

The fundamental unit of storage in a SQL Server database is an 8 KB page.  To paraphrase the MSDN, SQL Server reads and writes whole data pages and as a result, the maximum amount of data that can be contained within a single row on a page is 8,060 bytes (8 KB). 

The page size in turn limits the maximum size of VARCHAR, a variable-length non-Unicode character datatype, to 8,000 bytes.  Take note!  The operative word here is non-Unicode.  A non-Unicode character is stored as UTF-8 (8 bits) and requires one byte of storage per character, meaning that the VARCHAR datatype may contain at most 8,000 characters. 

In contrast, NVARCHAR is a variable-length Unicode datatype.  And unlike non-Unicode characters, Unicode characters are stored as UTF-16 (16 bits) and require two bytes of storage per character.   Because of the SQL Server page size, NVARCHAR datatypes have the same length restrictions as their VARCHAR cousins, 8,000 bytes.  This means that an NVARCHAR datatype may contain, at most, 4,000 characters.  The net result is NVARCHAR dataypes take up twice as much space as a VARCHAR datatype.  Said another way, NVARCHAR(4000) is the same size as VARCHAR(8000).

But the NVARCHAR datatype shouldn’t be dismissed entirely.  The English language can be encoded in UTF-8, making VARCHAR the datatype of choice.  However, other languages, such as Japanese, Hebrew, Arabic, etc., have an extended set of character codes that are only found in UTF-16.  In order to store data in these languages you must use the NVARCHAR datatype, but that should be the only time.

Warehouse Beware

The rule for when to use an NVARCHAR datatype applies to both OLTP systems and OLAP systems.  If your OLTP system is riddled with unnecessary NVARCHAR datatypes you may want to consider converting them to VARCHAR in the data warehouse to save space.  It’s tempting just to convert all NVARCHARS to VARCHARS but you always have to ask yourself, was the column defined as an NVARCHAR for a reason?  This is a question you’ll need to have answered by the business users.  Is your company planning to expand their business globally?  If so, it might make sense to keep the NVARCHARs, but only for columns where it really matters.

]]>