To nvarchar or to varchar in SQL Server to accept French text

I found one single character from the French language that its binary representation is not the same for Unicode and Windows-1252. The oe ligature. Let me rant about it…

create table test_table
(name1 varchar (1) ,
name2 nvarchar (1) )

insert test_table
values (‘œ’, ‘œ’)

select * from test_table

select ASCII(name1), ASCII(name2), Unicode(name1), unicode(name2) from test_table

select COL_LENGTH(‘test_table’,’name1′) as Length1, COL_LENGTH(‘test_table’,’name2′) as Length2 from test_table

select char(156), char(339)

drop table test_table

This character is used in the word “eggs” in French
‘des œufs’ means some eggs…

These first two sets of characters from the Latin alphabet have identical Unicode and Windows-1252 (ASCII) character code.
(Basic Latin)
http://en.wikipedia.org/wiki/Basic_Latin_Unicode_block
and
Latin-1:
http://en.wikipedia.org/wiki/Latin-1_Supplement_Unicode_block

Their binary representation is the same as they all fit in one single byte or octet.

However, the extended Latin characters do not have the same Windows-1252 (ASCII) code and Unicode code.
http://en.wikipedia.org/wiki/Latin_Extended-A_Unicode_block

From this character set, French only uses the oe ligature though.

French accents and ligatures and how to type them with the number pad:
a with grave accent
à ALT + 133 À ALT + 0192

a with circumflex
â ALT + 131 Â ALT + 0194

a with tréma
ä ALT + 132 Ä ALT + 142

a e ligature
æ ALT + 145 Æ ALT + 146

c with cedilla
ç ALT + 135 Ç ALT + 128

e with acute accent
é ALT + 130 É ALT + 144

e with grave accent
è ALT + 138 È ALT + 0200

e with circumflex
ê ALT + 136 Ê ALT + 0202

e with tréma
ë ALT + 137 Ë ALT + 0203

i with circumflex
î ALT + 140 Î ALT + 0206

i with tréma
ï ALT + 139 Ï ALT + 0207

o with circumflex
ô ALT + 147 Ô ALT + 0212

o e ligature
œ ALT + 0156 Œ ALT + 0140

u with grave accent
ù ALT + 151 Ù ALT + 0217

u with circumflex
û ALT + 150 Û ALT + 0219

u with tréma
ü ALT + 129 Ü ALT + 154

French quotation marks
« ALT + 174 » ALT + 175

Euro symbol
€ ALT + 0128

The Windows-1252 encoding can be seen here:
http://www.visibone.com/htmlref/char/webascii.htm

For more, see Joel on Software rant :-p

As most questions in technology, the answer is: it depends.

http://stackoverflow.com/questions/35366/varchar-vs-nvarchar-performance

Performance wise, varchar is more efficient, less memory space, 20% to 30% smaller indexes.
Most database drivers will interpret the incoming stream and convert to Windows-1252 encoding, if the server code page is Windows 1252.
If you use way too many characters in the extended Latin group, you have no choice but using nvarchar…if you deal with other languages that are not Romance Languages, you have no choice but nvarchar.

Cheers!