More about temporary tables and TempDB in SQL Server

Following the trend of my previous post, I found this explanation on Kimberly Tripp’s blog and it sure cleared up a few things for me:

…understanding [of] TempDB – what goes there?

  • Internal temporary objects needed by SQL Server in the midst of other complex operations. For example, worktables created by a hash aggregate or interim tables used in hash joins will be stored in TempDB (almost anything that shows up as “hash” something in your query plan output is likely to go to tempdb).
  • User objects created with either # (local temporary objects), ## (global temporary objects) or @ (table variables)
    • # = Local temporary object
      Local temp objects are objects accessible ONLY in the session that created it. These objects are also removed automatically when the session that created it ends (unless manually dropped).
    • ## = Global temporary object
      Global temporary objects are objects that are accessible to ANYONE who can login to your SQL Server. They will only persist as long as the user that created it lasts (unless manually dropped) but anyone who logs in during that time can directly query, modify or drop these temporary objects. These objects are also removed automatically when the session that created it ends (unless manually dropped) OR if being used by another session when the session that created it ends, when the session using it finishes using it (and it’s only as long as any locks are held). If other sessions need more permanent use of a temporary object you should consider creating a permanent object and dropping it manually.
    • @ = User-defined Table Variable
      User-defined Table Variables were introduced in SQL Server 2000 (or, wow – was it 7.0?) and provide an alternative to temporary tables by allowing you to create a variable defined as type TABLE and then you can populate and use it in a variety of ways. There has been A LOT of debate over whether or not you should always use table variables or always use temp tables. My response is that I ALWAYS avoid the word always! My point is that table variables are NOT always better nor are temp tables always better. There are key uses to each. I tend to like temp tables in scenarios where the object is used over a longer period of time – I can create non-key indexes on it and it’s more flexible to create to begin with (SELECT INTO can be used to create the temp table). I also have the ability to use the temporary table in nested subprocedures because it’s not local to the procedure in which it was created. However, if you don’t need any of those things then a table variable might be better. When it is likely to be better – when you have smaller objects that don’t need to be accessed outside of the procedure in which it was created and when you only need KEY indexes (a table variable ONLY supports the indexes created by a create table statement – meaning PRIMARY KEY and UNIQUE KEY).

The original post is here
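To make the three flavours concrete, here is a minimal sketch with one object of each kind (the names are made up for illustration):

-- # local temp table: visible only to the session that created it
CREATE TABLE #localPeople (id INT, name VARCHAR(32))

-- ## global temp table: visible to every session while the creating session lives
CREATE TABLE ##globalPeople (id INT, name VARCHAR(32))

-- @ table variable: scoped to the current batch, procedure or function
DECLARE @people TABLE (id INT, name VARCHAR(32))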
I’ve been following the series SQLSkills is publishing on SQL Server 2005…
More will come up soon folks!

Temporary tables/views go lightweight in T-SQL 2005

I promised myself that I would be blogging about CTEs in the near future.

Well, here I am, once again leaving a trace of what I’ve learned so far so I can use it as a reference later.
Unfortunately for the T-SQL lovers who come from the Oracle world, the T-SQL enhancements
in SQL Server 2005 do not include arrays among the data structures; the workaround for array use is
to get your hands on the SQLCLR and use the .NET Framework…
Ok, but this post was about Common Table Expressions, so what do arrays have to do with it?
Most of the time, T-SQL developers would use temporary tables wherever they could have used arrays,
to somehow keep a set of values for later use inside the batch.
The way of building temporary structures, namely tables, in your sprocs has evolved since SQL Server 7.0.
Initially you had temporary tables such as #people:

Local Temporary Tables

CREATE TABLE #people
(
id INT,
name VARCHAR(32)
)

A temporary table is created and populated on disk, in the system database tempdb — with a
session-specific identifier packed onto the name, to differentiate between similarly-named #temp tables
created from other sessions. The data in this #temp table (in fact, the table itself) is visible only to the current scope
(usually a stored procedure, or a set of nested stored procedures). The table gets cleared up automatically when
the current procedure goes out of scope, but you should manually clean up the data when you’re done with it:

DROP TABLE #people
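To illustrate the scope rule, here is a minimal sketch (the procedure names are made up): a #temp table created in an outer procedure is also visible to the procedures it calls.

CREATE PROCEDURE dbo.InnerProc
AS
SELECT id, name FROM #people -- resolves to the caller's #people at execution time
GO

CREATE PROCEDURE dbo.OuterProc
AS
CREATE TABLE #people (id INT, name VARCHAR(32))
INSERT INTO #people (id, name) VALUES (1, 'Ann')
EXEC dbo.InnerProc -- works: #people is still in scope here
DROP TABLE #people
GO

EXEC dbo.OuterProc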

Or the

Global Temporary Tables

CREATE TABLE ##people
(
id INT,
name VARCHAR(32)
)

Global temporary tables operate much like local temporary tables; they are created in tempdb and cause less locking and logging than permanent tables. However, they are visible to all sessions until the creating session goes out of scope (and the global ##temp table is no longer being referenced by other sessions). If two different sessions run the above code while the first session is still active, the second will receive the following error:

Server: Msg 2714, Level 16, State 6, Line 1
There is already an object named ‘##people’ in the database.

I have yet to see a valid justification for the use of a global ##temp table. If the data needs to persist for multiple users, then it makes much more sense, at least to me, to use a permanent table. You can make a global ##temp table slightly more permanent by creating it in an autostart procedure, but I still fail to see how this is advantageous over a permanent table. With a permanent table, you can deny permissions; you cannot deny users access to a global ##temp table.
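If you do end up sharing a global ##temp table anyway, a small guard like this sketch avoids the error above:

-- Only create the global temp table if no other session has created it yet
IF OBJECT_ID('tempdb..##people') IS NULL
BEGIN
CREATE TABLE ##people
(
id INT,
name VARCHAR(32)
)
END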

SQL Server 2000 introduced the concept of table variables:

Table Variables

DECLARE @people TABLE
(
id INT,
name VARCHAR(32)
)

A table variable is created in memory, and so performs slightly better than #temp tables (also because there is even less locking and logging in a table variable). A table variable might still perform I/O to tempdb (which is where the performance issues of #temp tables make themselves apparent), though the documentation is not very explicit about this.

Table variables are automatically cleared when the procedure or function goes out of scope, so you don’t have to remember to drop or clear the data (which can be a good thing or a bad thing; remember “release early”?). The tempdb transaction log is less impacted than with #temp tables; table variable log activity is truncated immediately, while #temp table log activity persists until the log hits a checkpoint, is manually truncated, or when the server restarts.

Then why not always use table variables? Performance is the answer. I’m not going to go deep into this now, but will just post the URL for the official article:
Limitations of table variables http://support.microsoft.com/default.aspx/kb/305977
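One of the limitations worth remembering from that article: the only indexes a table variable can have are the ones implied by constraints declared inline (PRIMARY KEY or UNIQUE); you cannot run CREATE INDEX against it afterwards. A minimal sketch:

DECLARE @people TABLE
(
id INT PRIMARY KEY, -- the only kind of index a table variable supports
name VARCHAR(32) NOT NULL UNIQUE
)

INSERT INTO @people (id, name) VALUES (1, 'Ann')
SELECT id, name FROM @people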

And finally, SQL Server 2005 Common Table Expressions; I guess I could call them the lightweight temporary tables. Their syntax is even lighter than
the syntax for table variables.

Pseudocode and Semantics

The recursive CTE structure must contain at least one anchor member and one recursive member. The following pseudocode shows the components of a simple recursive CTE that contains a single anchor member and single recursive member.

WITH cte_name ( column_name [ ,...n ] )
AS
(
CTE_query_definition -- Anchor member is defined.
UNION ALL
CTE_query_definition -- Recursive member is defined referencing cte_name.
)
-- Statement using the CTE
SELECT *
FROM cte_name

The semantics of the recursive execution are as follows (a small runnable example follows the steps):

1. Split the CTE expression into anchor and recursive members.
2. Run the anchor member(s) creating the first invocation or base result set (T0).
3. Run the recursive member(s) with Ti as an input and Ti+1 as an output.
4. Repeat step 3 until an empty set is returned.
5. Return the result set. This is a UNION ALL of T0 to Tn.
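As a runnable illustration of those steps, here is about the smallest recursive CTE I can think of; it generates the numbers 1 through 10 (the WHERE clause in the recursive member is what eventually returns an empty set and stops the recursion):

WITH Numbers (n) AS
(
SELECT 1 -- anchor member (T0)
UNION ALL
SELECT n + 1 -- recursive member, references Numbers itself
FROM Numbers
WHERE n < 10
)
SELECT n
FROM Numbers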

A simple CTE declaration (note that the WITH clause must be followed immediately by the statement that uses it):

USE AdventureWorks;
GO
WITH DirReps(ManagerID, DirectReports) AS
(
SELECT ManagerID, COUNT(*)
FROM HumanResources.Employee AS e
WHERE ManagerID IS NOT NULL
GROUP BY ManagerID
)

And the statement that uses the CTE, which completes the batch:

SELECT ManagerID, DirectReports
FROM DirReps
ORDER BY ManagerID;
GO

So with CTEs there is no need to formally declare your column types, only the column names, as the types are inferred from the
query definition. The syntax is for sure lighter. I’m not aware of performance issues related to CTEs; maybe some kind soul could add a comment here and let me know.
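In fact, you can even skip the column list when the query definition names its columns; the AdventureWorks example above can be written as:

WITH DirReps AS
(
SELECT ManagerID, COUNT(*) AS DirectReports
FROM HumanResources.Employee
WHERE ManagerID IS NOT NULL
GROUP BY ManagerID
)
SELECT ManagerID, DirectReports
FROM DirReps
ORDER BY ManagerID;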

So far we cannot pass table variables or CTEs as parameters to stored procedures; this is definitely something that should come in future versions.

Bye for now.

This is a work for Mensa!

A few years ago, there was a Mensa convention in San Francisco and a bunch of Mensa members were lunching at a local café. They discovered that their salt shaker contained pepper and their pepper shaker was full of salt. How could they swap the contents of the bottles without spilling, and using only the implements at hand? Clearly this was a job for Mensa! The group debated and presented ideas, and finally came up with a brilliant solution involving a napkin, a straw, and an empty saucer. They called the waitress over to dazzle her with their solution.

“Ma’am,” they said, “We couldn’t help but notice that the pepper shaker contains salt and the salt shaker–”

“Oh,” the waitress interrupted. “Sorry about that.” She unscrewed the caps of both bottles, switched them, and said, “Will that be one check or separate?”

XML and SQL Server 2000 don’t get along too well

(I had this post as a draft since January, it’s time to get over the blogging laziness)

I remember when SQL Server 2000 came out, the main ad campaigns were about its XML support. I remember that, at the time, I didn’t know what the heck XML was, or why all the buzz about it.
I corrected part of my ignorance right after; note, part of my ignorance…
Anyways, apart from retrieving result sets in XML format, with who knows what schema, I hadn’t used XML in SQL Server much, and thought that *that* XML support was all a programmer gal could wish for.
Oh, I was wrong!
My last project involved a great deal of importing xml into sql server 2000 and it was indeed interesting.
I noticed a few things I would like to blog about so I could keep a reminder for future projects.

  1. SQL Server 2000 won’t import XML using a DTS package unless you have the “proper mapping” between your XML schema and your database schema. I was never able to figure out the proper mapping… My first idea was inferring an xsd file from the XML (I created that with a Visual Studio console app, loading the XML into a dataset and inferring its schema, but that didn’t seem to work as the proper mapping). See this article: Importing XML into sql server 2000.
  2. What about Excel? You can theoretically import XML into Excel, associate an xsd file with your spreadsheet and have it all imported into SQL Server 2000. Voila! It should be just a couple of clicks. Don’t run that fast. Excel won’t import complex XML, that is, XML with a complex schema, where there are several child nodes with more child nodes (complex nested types indicating one-to-many relationships). I even tried to split the XML to import one complex type or table at a time and realized I had to save all of my .xls files as csv (comma separated text) to have them imported properly.

Ok, so we ended up writing a .NET utility that uses ADO.NET to load the XML and synchronize the dataset with the table in SQL Server 2000. Now let’s check if I can retrieve the XML out of the database, to be able to work with smaller chunks of data in XML format.

  1. This part of the project was also fun indeed. The T-SQL FOR XML clause does not seem to work well when some of the columns in your table are defined as text, and it will truncate the result set over a maximum length (see the small sketch after this list).
  2. SQL Server 2000 is unable to retrieve the XML with the schema you would like. Apart from the data truncation, you have to apply heavy XSLT to the data afterwards to have it properly comply with the schema you want.
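For reference, this is the kind of query I mean (the table and columns here are hypothetical); in our case it was the text column that came back truncated:

SELECT id, name, description
FROM people
FOR XML AUTO, ELEMENTS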

Hopefully SQL Server 2005 will answer some of our prayers. There is an xml data type, so worst case scenario, you can load the whole document into a column in a temporary table (I know it is not good practice to have a huge document in a table column), apply XQuery to it and retrieve the chunk of information you do want to import into your tables. The packages for importing XML should be better too; old SQL Server 2000 DTS relied on scripting languages for everything, with the known drawback that scripting languages are typeless.
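A minimal sketch of that idea, with made-up table and element names, using the SQL Server 2005 xml type and its nodes()/value() methods:

DECLARE @docs TABLE (id INT IDENTITY(1,1), doc XML)

INSERT INTO @docs (doc)
VALUES ('<people><person id="1"><name>Ann</name></person></people>')

-- shred just the fragment we care about into relational columns
SELECT p.value('@id', 'INT') AS person_id,
p.value('(name)[1]', 'VARCHAR(32)') AS person_name
FROM @docs
CROSS APPLY doc.nodes('/people/person') AS t(p)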

Having said all this, I might blog about SQL Server 2005 in a future post (I really like CTEs, Common Table Expressions)… if I get over this blogging laziness.
That’s all folks!

Here I am still ranting about database design and PK selection

Following the thread from my previous post (and mostly to have this as a personal reminder), I came across this article:
Identity and Primary Keys @ sqlteam.com
The idea behind this article is to encourage database developers to use natural keys as the primary keys in their entity designs. I have seen so many designs like that and have been very much against them. I wanted to double-check, so I posted a thread on sql-server-l at groups.ittoolbox.com.

Some of the brief comments here:
Natural key is not good as a primary key at all. It is usually much longer
than int identity(1,1) 4 bytes.

Natural key is very cumbersome to use as a foreign key.

I am amazed that this became a debate here.

Ahem, it might be a newbie question alright!

I also got another comment on how a surrogate key field (identity or uniqueidentifier) by itself cannot guarantee uniqueness of the data:

I have seen plenty of people create databases where the only unique identifier was a surrogate key, such as a sql server id column. These people often show up here asking how to delete duplicates. (haven’t you seen a ton of them here Kalman?) Duplicates? How can there be any duplicates if you have a sql server ID column as a primary key? LOL

This DBA proposed using natural keys as PKs and got this reply back:

Take my advice all: int identity(1,1) is the best primary key. You can put
unique constraint or unique index on the natural key column.

Back in the glory days of sybase it was a pain to construct your own
identity(1,1) through the use of the max function.

But the best of all is this comment, which was added to the original article. Original comment here (page two of the comments, nickname quazibubble). I know the comment is a little bit rude, but I liked how clear the explanation is:

A solid, mature OLTP database architecture has entities with a single identity or GUID PK defined, usually as a clustered index, that by definition is totally meaningless except for building relational structures behind the scenes. That is its ONLY purpose. Think of it as an undercover system column that has nothing to do with the entity it belongs to, you just happen to be able to see it as a developer. You’ll use design systems in the future that don’t even give YOU, the developer, access to this column or its values, but it will be there (they’ve already existed as CASE tools and their ancestors, as well as their simpler, more useful progeny). Your job should be designing ENTITIES AND RELATIONSHIPS, not PKs and FKs. If you find yourself pondering what to use for a PK (beyond the question of “should it be an identity or GUID?”), you are doing something terribly wrong and will be costing a whole team a great deal of money and headache, even long after you are fired.

Furthermore, if a PK is not meaningless–e.g. it is any attribute remotely helping to add human meaning to the entity (SSN, network login, etc.), there WILL be cascading updates throughout the entire system, many people are going to be very pissed, and you will look like a jacka**. Users make typos, people change their names. Any data that was entered by or derived from a human at any point is subject to change. To be perfectly compliant with the forms of normalization, the value in a PK should not even be used as an account number. But giving a system-defined value that later becomes meaningful to a human is not nearly as harmful as using an existing value of meaning for a PK.

End of reminder

Ok, another database design task and I found the reference article I had talked about

A few posts ago I mentioned why Primary Keys in a table shouldn’t have business meaning. I didn’t quote the article I usually re-read for reference. I ran into this article while looking for another subject in www.agiledata.org today.
The section of this article I like the most is:

There are two strategies for assigning keys to tables. The first is to simply use a natural key, one or more existing data attributes that are unique to the business concept. For the Customer table there were two candidate keys, in this case CustomerNumber and SocialSecurityNumber. The second strategy is to introduce a new column to be used as a key. This new column is called a surrogate key, a key that has no business meaning, an example of which is the AddressID column of the Address table… Addresses don’t have an “easy” natural key because you would need to use all of the columns of the Address table to form a key for itself, therefore introducing a surrogate key is a much better option in this case.

The primary advantage of natural keys is that they exist already; you don’t need to introduce a new “unnatural” value to your data schema. However, the primary disadvantage of natural keys is that because they have business meaning it is possible that they may need to change if your business requirements change. For example, if your users decide to make CustomerNumber alphanumeric instead of numeric then in addition to updating the schema for the Customer table (which is unavoidable) you would have to change every single table where CustomerNumber is used as a foreign key.

If the Customer table instead used a surrogate key then the change would have been localized to just the Customer table itself (CustomerNumber in this case would just be a non-key column of the table). Naturally, if you needed to make a similar change to your surrogate key strategy, perhaps adding a couple of extra digits to your key values because you’ve run out of values, then you would have the exact same problem.

The fundamental problem is that keys are a significant source of coupling within a relational schema, and as a result they are difficult to change. The implication is that you want to avoid keys with business meaning because business meaning changes.
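As a small T-SQL sketch of that point (the table names echo the article’s example, but the DDL is just my illustration):

CREATE TABLE Customer
(
CustomerID INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key, no business meaning
CustomerNumber VARCHAR(20) NOT NULL UNIQUE -- natural key kept unique, but not the PK
)

CREATE TABLE SalesOrder
(
OrderID INT IDENTITY(1,1) PRIMARY KEY,
CustomerID INT NOT NULL REFERENCES Customer (CustomerID) -- coupled to the surrogate key only
)

If CustomerNumber later has to become alphanumeric or longer, only the Customer table changes; SalesOrder keeps pointing at the surrogate CustomerID untouched.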

And the article can be found at Data Modeling 101

Enjoy!

I’m a MCP :-p

Today is my first day at the office after a nice week of vacation. I had such a hectic schedule last month that I spent a lot of time without blogging. The main reason was the time I spent studying for the cert exam 70-305. So yes, I passed the exam.
Advice for those who want to take the exam: do as many practice tests as you can. The Exam Cram book is a good guide; take the official documentation as a reference, but it shouldn’t be the only reference you use. Most of the questions will have tiny details in each possible answer, and some of them will offer more than one correct answer but you need to select the most efficient solution, so you basically have to train yourself apart from knowing the subject.
Ok, now I’m back to a hectic schedule again, but I promise to sleep more than 5 hours a day 😉

if only SQL Server 2005 would bring this feature…

Ok, add this to my Christmas wish list. SQL Server 2000 doesn’t allow WHERE … IN clauses with multiple fields, so the following construction fails:

DELETE FROM temp1
WHERE name, description IN
(SELECT name,description
FROM temp2
WHERE name LIKE '%computers%')

The workaround is a not-so-pretty concatenation, if the fields are strings:

DELETE FROM temp1
WHERE name + description IN
(SELECT name + description
FROM temp2
WHERE name LIKE '%computers%')

But you are forced to use joins if the fields are numeric.
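One such workaround, a correlated EXISTS rather than a literal join, avoids the concatenation and behaves the same for string or numeric columns (same temp1/temp2 tables as above):

DELETE FROM temp1
WHERE EXISTS
(SELECT *
FROM temp2
WHERE temp2.name = temp1.name
AND temp2.description = temp1.description
AND temp2.name LIKE '%computers%')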

I’ve been receiving training from INETA as part of the SQL Server 2005 launch this November. I’m planning to ask if they have included any extra features in the new T-SQL; that one would be pretty handy.

In the emergency room

I phoned my dad yesterday and tried to explain to him what my day is like working as tech support on Saturdays. He was concerned about all those hours of work, as he knows I work Mon-Fri as well. I said, well, time flies so quickly you have no time to find out what time it is, plus you learn a lot; despite how hectic things can be, you see lots of real-life troubles and you have to find a solution even if you haven’t heard about the problem before.
I tried to explain to him: well, people host their websites with us, some of them pretty complex applications; they use who knows what scripts and we give support for about 4 programming languages, not to mention three types of databases. The other day a customer was complaining his Ruby installation wouldn’t work, so the boss said, Lizet, write a hello world program in Ruby and show him his installation is working. But boss! Ruby? All I’ve seen is a Ruby site with funny comics, and all I remember is the comics… You remember more than I do, so give it a try… Ok, hello world should always be easy.
I also had to debug some Perl scripts for a customer, or at least show him the lines with errors; I had to import several databases and watch for server overload a couple of times. It was fun, Dad!
I haven’t mentioned that my dad is an MD, and although he doesn’t work “on call” anymore, he used to do a shift a week when he was my age. In the family we all know more than one story about it. He replied: “It seems to me you’re working in an emergency room.”
Hrm, I said, yup, I suppose it feels the same 🙂

This is from Joel on Software and yes, I had to laugh

“Custom development is that murky world where a customer tells you what to build, and you say, ‘are you sure?’ and they say yes, and you make an absolutely beautiful spec, and say, ‘is this what you want?’ and they say yes, and you make them sign the spec in indelible ink, nay, blood, and they do, and then you build that thing they signed off on, promptly, precisely and exactly, and they see it and they are horrified and shocked, and you spend the rest of the week reading up on whether your E&O insurance is going to cover the legal fees for the lawsuit you’ve gotten yourself into or merely the settlement cost. Or, if you’re really lucky, the customer will smile wanly and put your code in a drawer and never use it again and never call you back.”

Check link here

It has happened to me, hrm, lemme think, probably more than three times. One of the projects was a really expensive one, and after the very few deployments it was never used again. For me that was deeply saddening, as we had invested so many hours of work in it. The customer did pay, but sometimes you prefer seeing your software being used, even abused, rather than gone into oblivion.