To nvarchar or to varchar in SQL Server to accept French text

I found a single character in the French language whose binary representation is not the same in Unicode and Windows-1252: the oe ligature (œ). Let me rant about it…

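-- Assumes a database whose default collation uses code page 1252 (e.g. Latin1_General_CI_AS)
-- varchar(1): one byte in the collation's code page; nvarchar(1): one UTF-16 code unit (2 bytes)
create table test_table
(name1 varchar (1) ,
name2 nvarchar (1) )

-- The N prefix makes the second literal a Unicode (nvarchar) string
insert test_table
values ('œ', N'œ')

select * from test_table

-- ASCII() gives the Windows-1252 code (156), UNICODE() the Unicode code point (339),
-- whichever column they are applied to: SQL Server converts between the two types implicitly
select ASCII(name1), ASCII(name2), Unicode(name1), unicode(name2) from test_table

-- Defined column size in bytes: 1 for varchar(1), 2 for nvarchar(1)
select COL_LENGTH('test_table','name1') as Length1, COL_LENGTH('test_table','name2') as Length2 from test_table

-- CHAR() only accepts 0-255 (CHAR(339) returns NULL); NCHAR() takes a Unicode code point
select char(156), nchar(339)

drop table test_table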

This character is used in the French word for “eggs”: “des œufs” means “some eggs”…

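The first two Unicode blocks of Latin characters have the same character codes in Unicode and in Windows-1252 (often loosely called “extended ASCII”), with one exception: Windows-1252 uses most of the range 0x80 to 0x9F (128 to 159) for printable characters such as €, œ and Œ, where Unicode has control codes.
Basic Latin:
http://en.wikipedia.org/wiki/Basic_Latin_Unicode_block
and
Latin-1 Supplement:
http://en.wikipedia.org/wiki/Latin-1_Supplement_Unicode_block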

In Windows-1252 each of these characters takes a single byte whose value equals its Unicode code point; an nvarchar column stores the same code point in two bytes (UTF-16).

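However, the few Latin Extended-A characters that Windows-1252 does support (such as Œ, œ, Š and Ÿ) are placed in the range 0x80 to 0x9F (128 to 159), so their Windows-1252 code and their Unicode code point differ.
http://en.wikipedia.org/wiki/Latin_Extended-A_Unicode_block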

From this block, French only uses the oe ligature (œ and Œ) in everyday words.
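One way to see the difference is to store the same characters in a one-byte column and in a Unicode column, then look at the raw bytes. This is a minimal sketch, assuming SQL Server; the table variable @t and its columns are made-up names, and the varchar column is pinned to the Latin1_General_CI_AS collation (code page 1252) so the result does not depend on the database's default collation.

```sql
-- Hypothetical table variable: one Windows-1252 column, one Unicode column
DECLARE @t TABLE (
    ch_1252    varchar(1) COLLATE Latin1_General_CI_AS, -- 1 byte, code page 1252
    ch_unicode nvarchar(1)                              -- UTF-16, 2 bytes
);

-- é is in Latin-1 Supplement; œ is in Latin Extended-A.
-- Inserting into ch_1252 converts each character to the column's code page.
INSERT INTO @t (ch_1252, ch_unicode)
VALUES (N'é', N'é'),
       (N'œ', N'œ');

SELECT ch_unicode,
       CAST(ch_1252 AS varbinary(1))    AS bytes_1252,    -- é: 0xE9, œ: 0x9C
       CAST(ch_unicode AS varbinary(2)) AS bytes_utf16le, -- é: 0xE900, œ: 0x5301
       UNICODE(ch_unicode)              AS code_point     -- é: 233, œ: 339
FROM @t;
```

For é the Windows-1252 byte (0xE9 = 233) equals the Unicode code point; for œ it does not (0x9C = 156 versus U+0153 = 339).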

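French accents and ligatures, and how to type them on Windows by holding Alt and typing the code on the numeric keypad. Codes without a leading 0 use the DOS (OEM) code page; codes with a leading 0 use Windows-1252: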
a with grave accent
à ALT + 133 À ALT + 0192

a with circumflex
â ALT + 131 Â ALT + 0194

a with diaeresis (tréma)
ä ALT + 132 Ä ALT + 142

ae ligature
æ ALT + 145 Æ ALT + 146

c with cedilla
ç ALT + 135 Ç ALT + 128

e with acute accent
é ALT + 130 É ALT + 144

e with grave accent
è ALT + 138 È ALT + 0200

e with circumflex
ê ALT + 136 Ê ALT + 0202

e with diaeresis (tréma)
ë ALT + 137 Ë ALT + 0203

i with circumflex
î ALT + 140 Î ALT + 0206

i with diaeresis (tréma)
ï ALT + 139 Ï ALT + 0207

o with circumflex
ô ALT + 147 Ô ALT + 0212

oe ligature
œ ALT + 0156 Œ ALT + 0140

u with grave accent
ù ALT + 151 Ù ALT + 0217

u with circumflex
û ALT + 150 Û ALT + 0219

u with diaeresis (tréma)
ü ALT + 129 Ü ALT + 154

French quotation marks
« ALT + 174 » ALT + 175

Euro symbol
€ ALT + 0128
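To check which characters in this list have a Windows-1252 code that differs from their Unicode code point, a query can run through all of them. This is a sketch, assuming SQL Server 2008 or later (for the multi-row VALUES constructor); @french and its columns are hypothetical names, and the varchar column is again pinned to Latin1_General_CI_AS (code page 1252).

```sql
-- Hypothetical table variable holding each character twice:
-- once converted to code page 1252, once as Unicode
DECLARE @french TABLE (
    ch_1252    varchar(1) COLLATE Latin1_General_CI_AS,
    ch_unicode nvarchar(1)
);

INSERT INTO @french (ch_1252, ch_unicode)
SELECT c, c
FROM (VALUES (N'à'), (N'À'), (N'â'), (N'Â'), (N'ä'), (N'Ä'),
             (N'æ'), (N'Æ'), (N'ç'), (N'Ç'), (N'é'), (N'É'),
             (N'è'), (N'È'), (N'ê'), (N'Ê'), (N'ë'), (N'Ë'),
             (N'î'), (N'Î'), (N'ï'), (N'Ï'), (N'ô'), (N'Ô'),
             (N'œ'), (N'Œ'), (N'ù'), (N'Ù'), (N'û'), (N'Û'),
             (N'ü'), (N'Ü'), (N'«'), (N'»'), (N'€')) AS v(c);

-- Keep only the characters whose Windows-1252 code and Unicode code point differ
SELECT ch_unicode,
       ASCII(ch_1252)      AS code_1252,
       UNICODE(ch_unicode) AS code_point
FROM @french
WHERE ASCII(ch_1252) <> UNICODE(ch_unicode);
```

Only three rows come back: œ (156 vs 339), Œ (140 vs 338) and € (128 vs 8364). Every accented letter and both quotation marks in the list have the same code in both encodings.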

The Windows-1252 encoding can be seen here:
http://www.visibone.com/htmlref/char/webascii.htm

For more, see the rant on Joel on Software :-p

As with most questions in technology, the answer is: it depends.

http://stackoverflow.com/questions/35366/varchar-vs-nvarchar-performance

Performance-wise, varchar is more efficient: it stores one byte per character instead of two, so it uses less memory and disk space, and indexes are 20% to 30% smaller.
Most database drivers will convert the incoming text to Windows-1252 when the server code page is Windows-1252.
If you need characters from the extended Latin blocks that Windows-1252 does not contain, you have no choice but nvarchar. Likewise, if you deal with languages whose characters are not in Windows-1252 (which covers French and the other Western European languages), you have no choice but nvarchar.
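Both halves of this trade-off can be checked directly: which code page the database uses for varchar, and how many bytes the same French text takes in each type. This is a minimal sketch, assuming SQL Server; @words and its columns are hypothetical names, and the varchar column is pinned to a code-page-1252 collation.

```sql
-- Which collation, and therefore which code page, does the current database use for varchar?
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS db_collation,
       COLLATIONPROPERTY(
           CAST(DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS nvarchar(128)),
           'CodePage') AS db_code_page;

-- Hypothetical table variable: the same French text stored both ways
DECLARE @words TABLE (
    word_varchar  varchar(50) COLLATE Latin1_General_CI_AS,
    word_nvarchar nvarchar(50)
);
INSERT INTO @words (word_varchar, word_nvarchar)
VALUES (N'des œufs', N'des œufs');

-- DATALENGTH returns bytes: 8 for varchar (1 per character), 16 for nvarchar (2 per character)
SELECT DATALENGTH(word_varchar)  AS bytes_varchar,
       DATALENGTH(word_nvarchar) AS bytes_nvarchar
FROM @words;
```

A db_code_page of 1252 means varchar data in that database is stored as Windows-1252. Note that a string literal without the N prefix is converted to the database code page before it is stored, even when the target column is nvarchar.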

Cheers!
