Thursday, January 17, 2008

Merge replication issues in SQL Server 2005 and comments on an MS employee's blog

Hi all,
As I mentioned in my previous post, I would provide details of the data loss scenarios in merge replication topologies with SQL Server 2005. The details were published in this addendum to the original DevX article:

UPDATED SQL Server 2005 Bug Alert: Data Loss in Merge Replication

The article basically consists of two repros that illustrate the data loss when partition groups are not used in the publication. The workaround has not been published, and the definitive fix should be available in SQL Server 2005 SP3 or in the upcoming cumulative update in February 2008.

Today I came across a blog post about SQL Server bugs and how to provide information about them to tech support or to the Connect program.

Please read this blog post if you are reporting or investigating bugs. The more information you provide, the sooner the bug is likely to be scheduled for a fix:

Getting Your "Favorite" SQL Server Bug Fixed

What is interesting about that blog post is one of the comments, from an MS employee, "anna":

I'm sometimes amazed how personal people takes the bug issue. If you look at users you can sometimes think that the users think we have a complete bug list or that we have super powers to figure out what the problem are. And believing that a rotten attitude gets the problem fixed faster is so stupid.

But if you look at it from the other perspective, I often find developers taking pride in classifying something as a bug. In these days of agile and customer driven development, why taking so much pride into saying if something is a bug or a change request.

During the past two years we've gotten two synchronization bugs fixed in SQL Server 2005. My tips: be honest, give all information you have, understand that everyone wants to fix the bugs and don't forget that the guys fixing and confirming the stuff are people. Often really nice people. And also remember that reporting a bug is like going to the ER: sometimes there are people who are sicker than you and they need help first.


I know there is no excuse for using foul language or being rude to get your bug scheduled first; however, there is also no excuse for data loss caused by insufficient testing or a newly introduced bug.
I can only speak for myself, but in our case the merge topology had worked fault-free for over a year before we upgraded to the 2005 version.
We are human and we err, but we are also accountable for our code and our testing procedures. It's not a matter of taking pride in pointing out a fault. I would rather not have had the data loss issue at all, nor the overtime and stress that followed as we recovered the data and worked to prevent future losses.

Why take so much pride in the work we do or the code we write? Maybe the better question is: why not?
