I found one single character from the French language that its binary representation is not the same for Unicode and Windows-1252. The oe ligature. Let me rant about it...
create table test_table
(name1 varchar (1) ,
name2 nvarchar (1) )
insert test_table
values ('œ', 'œ')
select * from test_table
select ASCII(name1), ASCII(name2), Unicode(name1), unicode(name2) from test_table
select COL_LENGTH('test_table','name1') as Length1, COL_LENGTH('test_table','name2') as Length2 from test_table
select char(156), char(339)
drop table test_table
This character is used in the word "eggs" in French
'des œufs' means some eggs...
These first two sets of characters from the Latin alphabet have identical Unicode and Windows-1252 (ASCII) character code.
(Basic Latin)
http://en.wikipedia.org/wiki/Basic_Latin_Unicode_block
and
Latin-1:
http://en.wikipedia.org/wiki/Latin-1_Supplement_Unicode_block
Their binary representation is the same as they all fit in one single byte or octet.
However, the extended Latin characters do not have the same Windows-1252 (ASCII) code and Unicode code.
http://en.wikipedia.org/wiki/Latin_Extended-A_Unicode_block
From this character set, French only uses the oe ligature though.
French accents and ligatures and how to type them with the number pad:
a with grave accent
à ALT + 133 À ALT + 0192
a with circumflex
â ALT + 131 Â ALT + 0194
a with tréma
ä ALT + 132 Ä ALT + 142
a e ligature
æ ALT + 145 Æ ALT + 146
c with cedilla
ç ALT + 135 Ç ALT + 128
e with acute accent
é ALT + 130 É ALT + 144
e with grave accent
è ALT + 138 È ALT + 0200
e with circumflex
ê ALT + 136 Ê ALT + 0202
e with tréma
ë ALT + 137 Ë ALT + 0203
i with circumflex
î ALT + 140 Î ALT + 0206
i with tréma
ï ALT + 139 Ï ALT + 0207
o with circumflex
ô ALT + 147 Ô ALT + 0212
o e ligature
œ ALT + 0156 Œ ALT + 0140
u with grave accent
ù ALT + 151 Ù ALT + 0217
u with circumflex
û ALT + 150 Û ALT + 0219
u with tréma
ü ALT + 129 Ü ALT + 154
French quotation marks
« ALT + 174 » ALT + 175
Euro symbol
€ ALT + 0128The Windows-1252 encoding can be seen here:
http://www.visibone.com/htmlref/char/webascii.htm
For more, see
Joel on Software rant :-p
As most questions in technology, the answer is: it depends.
http://stackoverflow.com/questions/35366/varchar-vs-nvarchar-performance
Performance wise, varchar is more efficient, less memory space, 20% to 30% smaller indexes.
Most database drivers will interpret the incoming stream and convert to Windows-1252 encoding, if the server code page is Windows 1252.
If you use way too many characters in the extended Latin group, you have no choice but using nvarchar...if you deal with other languages that are not Romance Languages, you have no choice but nvarchar.
Cheers!
Labels: globalization, localization