Here I am still ranting about database design and PK selection

Following the thread from previous post (and mostly to have this as a personal reminder) I came across this article
Identity and Primary Keys @ sqlteam.com
The idea behind this article is to encourage Database developers to use Natural Keys as the Primary Keys on their entities design. I have seen so many designs like that and have been so much against them. I wanted to double check and posted a thread on sql-server-l at groups.ittoolbox.com.

Some of the brief comments here:
Natural key is not good as a primary key at all. It is usually much longer
than int identity(1,1) 4 bytes.

Natural key is very cumbersome to use as a foreign key.

I am amazed that this became a debate here.

Ahem, it might be a newbie question alright!

I also got another comment on how a uniqueidentifier field cannot guarantee uniqueness:

I have seen plenty of people create databases where the only unique ident=
ifier was a surrogate key, such as a sql server id column. These people o=
ften show up here asking how to delete duplicates. (haven’t you seen a to=
n of them here Kalman?) Duplicates? How can there be any duplicates if yo=
u have a sql server ID column as a primary key? LOL

This dba proposed to use Natural Keys as PK and got this reply back:

Take my advice all: int identity(1,1) is the best primary key. You can put
unique contraint or unique index on the natural key column.

Back in the glory days of sybase it was a pain to construct your own
identity(1,1) through the use of the max function.

But the best of all is this comment that was inserted into the original article. Original comment here (page two of the comments, nickname quazibubble)I know the comment is a little bit rude, but I liked how clear that explanation is:

A solid, mature OLTP database architecture has entities with a single identity or GUID PK defined, usually as a clustered index, that by definition is totally meaningless except for building relational structures behind the scenes. That is it’s ONLY purpose. Think of it as an undercover system column that has nothing to do with the entity it belongs to, you just happen to be able to see it as a developer. You’ll use design systems in the future that don’t even give YOU, the developer, access to this column or it’s values, but it will be there (they’ve already existed as CASE tools and their ancestors, as well as their simpler, more useful prodigy). Your job should be designing ENTITIES AND RELATIONSHIP, not PKs and FKs. If you find yourself pondering what to use for a PK (beyond the question of “should it be an identity or GUID?”), you are doing something terribly wrong and will be costing a whole team a great deal of money and headache, even long after you a fired.

Furthermore, if a PK is not meaningless–e.g. it is any attribute remotely helping to add human meaning to the entity (SSN, network login, etc.), there WILL be cascading updates throughout the entire system, many people are going to be very pissed, and you will look like a jacka**. Users make typos, people change their names. Any data that was entered by or derived from a human at any point is subject to change. To be perfectly compliant with the forms of normalization, the value in a PK should not even be used as an account number. But giving a system-defined value that later becomes meaningful to a human is not nearly as harmful as using an existing value of meaning for a PK.

End of reminder

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.