Encoding troubles, wait, your ANSI file is not the same as my ANSI file

Last week we built a utility for the release team that converts all the T-SQL script files from any encoding to ANSI. We now convert everything to Unicode instead, but the original request was for ANSI encoding.

The .NET code opens each file with a StreamReader that detects the encoding, opens a StreamWriter on a new file with Encoding.Default (now Encoding.Unicode), and writes out the content read by the StreamReader.
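A minimal sketch of that read-then-rewrite step, assuming one file is converted at a time. The class and method names (ScriptConverter, ConvertToUnicode) are hypothetical and are not the names used in our tool.

```csharp
using System;
using System.IO;
using System.Text;

static class ScriptConverter
{
    // Hypothetical helper: re-encodes one script file as UTF-16 (Encoding.Unicode).
    public static void ConvertToUnicode(string sourcePath, string targetPath)
    {
        string contents;

        // Passing true makes the reader check for a byte order mark (BOM).
        // A file without a BOM is decoded with the reader's default encoding.
        using (var reader = new StreamReader(sourcePath, true))
        {
            contents = reader.ReadToEnd();
        }

        // false = overwrite the target instead of appending to it.
        using (var writer = new StreamWriter(targetPath, false, Encoding.Unicode))
        {
            writer.Write(contents);
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("usage: ScriptConverter <source> <target>");
            return;
        }
        ConvertToUnicode(args[0], args[1]);
    }
}
```

Writing to a separate target file keeps the original script untouched, which makes it easy to compare the two when a conversion looks wrong.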

The problem started when some developers submitted files saved with ANSI encoding. The tool always detected the encoding as US-ASCII, which uses only 7 bits per character, so the accented letters in those files were lost in the conversion.

I was blaming StreamReader for not detecting the encoding properly until I found the article below on http://weblogs.asp.net/ahoffman/archive/2004/01/19/60094.aspx

A question posted on the Australian DOTNET Developer Mailing List …

I'm having a character encoding problem that surprises me. In my C# code I have a string "© 2004" (that's a copyright/space/2/0/0/4). When I convert this string to bytes using the ASCIIEncoding.GetBytes method I get (in hex):

3F 20 32 30 30 34

The first character (the copyright) is converted into a literal ‘?’ question mark. I need to get the result 0xA92032303034, which has 0xA9 for the copyright, just as happens when the text is saved in Notepad.

An ASCII encoding provides for 7-bit characters and therefore only supports the first 128 Unicode characters. All characters outside that range will display an unknown symbol, typically a “?” (0x3f) or “|” (0x7f) symbol.

That explains the first byte returned using ASCIIEncoding.GetBytes()

> 3F 20 32 30 30 34

What you're trying to achieve is an ANSI encoding of the string. To get an ANSI encoding you need to specify a “code page”, which prescribes the characters from 128 on up. For example, the following code will produce the result you expect…

string s = "© 2004";
Encoding targetEncoding = Encoding.GetEncoding(1252);
foreach (byte b in targetEncoding.GetBytes(s))
    Console.Write("{0:x} ", b);

> a9 20 32 30 30 34

1252 is the code page for Western European (Windows), which is probably what you're using (check Encoding.Default.EncodingName). Specifying a different code page, say the one for Simplified Chinese (54936), will produce a different result.

Ideally you should use the code page actually in use on the system as follows…

string s = "© 2004";
Encoding targetEncoding = Encoding.Default;
foreach (byte b in targetEncoding.GetBytes(s))
    Console.Write("{0:x} ", b);

> (can depend on where you are!)

All this is particularly important if your application uses streams to write to disk. Unless care is taken, someone in another country (represented by a different code page) could write text to disk via a Stream within your application and get unexpected results when reading back the text.

In short, always specify an encoding when creating a StreamReader or StreamWriter – for example…
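The example from the article is not reproduced here. As a stand-in, the sketch below writes a short string with one explicit encoding and reads it back twice, once with the same encoding and once with a different one. ISO-8859-1 (code page 28591) is used only as a convenient substitute for an ANSI code page; that choice, the file name and the sample text are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class ExplicitEncodingDemo
{
    // Reads the whole file using exactly the encoding the caller asks for.
    public static string ReadAll(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "encoding-demo.txt");
        string text = "caf\u00e9"; // "café", escaped so the source file's own encoding does not matter
        Encoding latin1 = Encoding.GetEncoding(28591); // ISO-8859-1, assumed stand-in for ANSI

        using (var writer = new StreamWriter(path, false, latin1))
        {
            writer.Write(text);
        }

        // Same encoding on both sides: the text round-trips.
        Console.WriteLine(ReadAll(path, latin1) == text);

        // Different encoding on the reading side: the accented letter is damaged.
        Console.WriteLine(ReadAll(path, Encoding.UTF8) == text);
    }
}
```

The second comparison fails because the single byte written for the accented letter is not a valid UTF-8 sequence, which is the same kind of mismatch described above for files that cross code pages.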

Our code was initially as follows:

StreamReader SR = new StreamReader(myfile, true);
String Contents = SR.ReadToEnd();
SR.Close();

The StreamReader always reported US-ASCII as the file encoding when the file was saved with ANSI encoding, so the text lost all of its accented characters as soon as it was read. Detection worked fine for every encoding other than ANSI. This might be due to the different code pages that the various ANSI encodings use…

We changed the code so that it no longer relies on the StreamReader’s ability to detect the ANSI code page:

Encoding e = GetFileEncoding(myfile);
StreamReader SR = new StreamReader(myfile, e, true);
String Contents = SR.ReadToEnd();
SR.Close();

GetFileEncoding is the helper that was published in this post.

Note that in the code above, any ANSI-encoded file is assumed to use the local ANSI encoding (Encoding.Default). If the file was saved on a machine whose ANSI code page differs from the one on the machine running the program, you might still get unexpected results.
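The helper itself is not shown here. The sketch below is a hypothetical reconstruction of what such a helper might do, assuming it looks for a byte order mark (BOM) and falls back to Encoding.Default when none is found. It is deliberately named GuessFileEncoding to make clear that it is not the published code.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingSniffer
{
    // Hypothetical helper: picks an encoding from the BOM, else Encoding.Default.
    public static Encoding GuessFileEncoding(string path)
    {
        byte[] bom = new byte[4];
        int read;
        using (var stream = File.OpenRead(path))
        {
            read = stream.Read(bom, 0, bom.Length);
        }

        // UTF-32 LE is tested before UTF-16 LE because both start with FF FE.
        if (read >= 4 && bom[0] == 0xFF && bom[1] == 0xFE && bom[2] == 0x00 && bom[3] == 0x00)
            return Encoding.UTF32;
        if (read >= 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF)
            return Encoding.UTF8;
        if (read >= 2 && bom[0] == 0xFF && bom[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 little endian
        if (read >= 2 && bom[0] == 0xFE && bom[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 big endian

        // No BOM: assume the local default. On .NET Framework this is the
        // system ANSI code page; on .NET Core and .NET 5+ it is UTF-8.
        return Encoding.Default;
    }

    static void Main(string[] args)
    {
        if (args.Length == 1)
            Console.WriteLine(GuessFileEncoding(args[0]).EncodingName);
    }
}
```

The fallback branch is exactly where the caveat above applies: a file without a BOM carries no marker of its code page, so the guess is only right when the file was saved with the same default as the machine reading it.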

Yay!!!! The team is on TechNet Innovation Awards 2008

Hi, developer wandering the internet in search of a solution to your bug. Take a brief moment and vote for us, a bunch of developers who also wander the internet searching for solutions to our bugs and blog about them to help others 🙂

Microsoft Canada and TechNet Innovation Code Awards

We’ll be good and post more on our blog, interesting, good stuff :-p
Kidding, let the code prevail!

Oh, we’re the Tablet PC team, good stuff with SQL Server 2005 and Smart Clients 😉

My short personal guide to CAB

We’re involved in extending an application with WinForms 2.0 and CAB. I thought I would put the resources I used to learn about this framework in a single place, my blog :-p, as opposed to having the URLs saved in my bookmarks.

Here it goes, in order of preference:

Later on, the online reference links for each pattern used in CAB.

Cheers!

Debugging Javascript in Visual Studio

I got this from a colleague:

  1. Open Microsoft Internet Explorer.
  2. On the Tools menu, click Internet Options.
  3. On the Advanced tab, locate the Browsing section, and uncheck the Disable script debugging check box, and then click OK.
  4. Close Internet Explorer.
  5. In your JavaScript function, add the debugger keyword. This causes VS.NET to switch to debug mode when it runs that line. Example:

function OnLookup()
{
    debugger; // execution breaks into the Visual Studio debugger here
    var xr = new XMLHttpRequest();
    // ... rest of the function ...
}

  6. Run your ASP.NET application in debug mode.

Regular expressions for validating currency

Embedded as I am in globalizing applications, I thought I would create this entry for future reference.

More than once I have had to validate user input, and more than once that input has been a currency value. The currency symbols are not part of the expressions here.

For en-US or en-CA:

^(\d{1,3})(,\d{3})*(\.\d{2})?$

fr-CA currency formatting regular expression:

^(\d{1,3})(\s{1}\d{3})*(,\d{2})?$
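A small sketch of how the two expressions could be applied, assuming the input has already been trimmed and stripped of its currency symbol. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class CurrencyValidator
{
    // The two expressions from this post, unchanged.
    static readonly Regex English = new Regex(@"^(\d{1,3})(,\d{3})*(\.\d{2})?$");
    static readonly Regex FrenchCanadian = new Regex(@"^(\d{1,3})(\s{1}\d{3})*(,\d{2})?$");

    public static bool IsValidEnglish(string input)
    {
        return English.IsMatch(input);
    }

    public static bool IsValidFrenchCanadian(string input)
    {
        return FrenchCanadian.IsMatch(input);
    }

    static void Main()
    {
        Console.WriteLine(IsValidEnglish("1,234.56"));        // True
        Console.WriteLine(IsValidEnglish("1234.56"));         // False: thousands must be grouped
        Console.WriteLine(IsValidFrenchCanadian("1 234,56")); // True
        Console.WriteLine(IsValidFrenchCanadian("1,234.56")); // False: en-US format
    }
}
```

Both expressions require the thousands separator whenever there are more than three digits before the decimal part, so an ungrouped value such as 1234.56 is rejected.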

Ttyl!

The mysterious get_aspx_ver.aspx page in Visual Studio 2003

I am currently working on a VS 2003 solution using VB.NET. Each time one of the team members loaded the web project, our exception handling block would log a weird exception for a requested page named get_aspx_ver.aspx. The handler then loaded our default error page and emailed us the usual notification.
Weird… we are not executing any code when loading the project… wrong, this mysterious page is called…

Visual Studio .NET 2003 makes an HTTP request to the nonexistent page http://yourserver/yourweb/get_aspx_ver.aspx in order to determine the ASP.NET version installed on your web server (this information is returned in the HTTP headers of the response). Since the page does not exist on your web server, your error handler is called.

One possible workaround is to have your error handler ignore failed requests for this page, which is exactly what we did.
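A sketch of the kind of check an error handler could make before logging or emailing. It is written in C# rather than the VB.NET of the project, and the class and method names are hypothetical; only the page name comes from the request described above.

```csharp
using System;

static class VersionProbeFilter
{
    const string ProbePage = "get_aspx_ver.aspx";

    // Returns true when the failed request is the Visual Studio version probe.
    public static bool IsVersionProbe(string requestPath)
    {
        if (string.IsNullOrEmpty(requestPath))
            return false;

        // Compare only the last path segment, ignoring case.
        int lastSlash = requestPath.LastIndexOf('/');
        string page = requestPath.Substring(lastSlash + 1);
        return string.Equals(page, ProbePage, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(IsVersionProbe("/yourweb/get_aspx_ver.aspx")); // True
        Console.WriteLine(IsVersionProbe("/yourweb/default.aspx"));      // False
    }
}
```

In an ASP.NET application the handler would typically pass the path of the current request to this check and return early when it matches, so that real errors are still logged and emailed.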

Related problems are described here:
http://support.microsoft.com/default.aspx?scid=kb;en-us;825792

and in this post on aspnetresources.com.

TTYL!

Localization Management Toolkit (beta)

A colleague and I wrote an article describing a toolkit we developed. This toolkit automates the management of resources for localization with .NET 2.0. The article made it to the DevX home page as a featured article.
The permalink is here
http://www.devx.com/dotnet/Article/33422
We are both eager to know if someone is using a similar tool or plans to use this Toolkit.
Thanks!

More on Oracle Client and the different providers for .NET

Remember this the next time you write a web service or DAL class that connects to an Oracle database: if you are using version 1.1 of the Framework with the Microsoft Oracle provider, the provider has a known bug when the service name is longer than 16 characters.

The exception that you will get when trying to open the connection is:
ORA-00162: external dbid length xx is greater than maximum (16)

More on the topic can be found here.

The workarounds are:
1) put the whole tnsnames.ora entry in the Data Source value of your connection string, or
2) use Oracle’s data provider (ODP.NET), or
3) set the TNS_ADMIN environment variable to point to a different directory where you keep a modified tnsnames.ora file.
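To illustrate the first workaround, here is a sketch of a connection string that carries the connect descriptor inline instead of a tnsnames.ora alias. The host, port, service name and credentials are all made up, and the exact descriptor should be copied from your own tnsnames.ora entry.

```csharp
using System;

static class OracleConnectionStrings
{
    // Builds a connection string whose Data Source is a full connect descriptor.
    public static string BuildInline(string host, int port, string serviceName,
                                     string user, string password)
    {
        string descriptor =
            "(DESCRIPTION=" +
              "(ADDRESS=(PROTOCOL=TCP)(HOST=" + host + ")(PORT=" + port + "))" +
              "(CONNECT_DATA=(SERVICE_NAME=" + serviceName + ")))";

        return "Data Source=" + descriptor +
               ";User Id=" + user +
               ";Password=" + password + ";";
    }

    static void Main()
    {
        // Hypothetical values, for illustration only.
        Console.WriteLine(BuildInline("dbhost.example.com", 1521,
                                      "a_rather_long_service_name.example.com",
                                      "app_user", "secret"));
    }
}
```

The resulting string would then be handed to the provider's connection class in place of one that uses the short alias.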

More on the troubles with Oracle’s native provider for .NET (ODP.NET) whenever I have a chance to write again…