Wednesday, March 29, 2006

More about temporary tables and TempDB in SQL Server

Following the trend of my previous post I found this explanation on Kimberly Tripp's blog and it sure cleared out a few things for me:

...understanding [of] TempDB - what goes there?
  • Internal temporary objects needed by SQL Server in the midst of other complex operations. For example, worktables created by a hash aggregate will be stored in TempDB or interim tables uses in hash joins (almost anything that shows as "hash" something in your query plan output is likely to go to tempdb).
  • User objects created with either # (for local temporary objects), ## (globabl temporary objects) or @ (table variables)
    • # = Local temporary object
      Local temp objects are objects accessible ONLY in the session that created it. These objects are also removed automatically when the session that created it ends (unless manually dropped).
    • ## = Globabl temporary object
      Global temporary objects are objects that are accessible to ANYONE who can login to your SQL Server. They will only persist as long as the user that created it lasts (unless manually dropped) but anyone who logs in during that time can directly query, modify or drop these temporary objects. These objects are also removed automatically when the session that created it ends (unless manually dropped) OR if being used by another session when the session that created it ends, when the session using it finishes using it (and it's only as long as any locks are held). If other sessions need more permanent use of a temporary object you should consider creating a permanent objects and dropping it manually.
    • @ = User-defined Table Variable
      User-defined Table Variables were introduced in SQL Server 2000 (or, wow - was it 7.0?) and provide an alternative to temporary tables by allowing you to create a variable defined as type TABLE and then you can populate and use it in a variety of ways. There has been A LOT of debate over whether or not you should always use table variables or always use temp tables. My response is that I ALWAYS avoid the word always! My point is that table variables are NOT always better nor are temp tables always better. There are key uses to each. I tend to like temp tables in scenarios where the object is used over a longer period of time - I can create non-key indexes on it and it's more flexible to create to begin with (SELECT INTO can be used to create the temp table). I also have the ability to use the temporary table in nested subprocedures because it's not local to the procedure in which it was created. However, if you don't need any of those things then a table variable might be better. When it is likely to be better - when you have smaller objects that don't need to be accessed outside of the procedure in which it was created and when you only need KEY indexes (a table variable ONLY supports the indexes created by a create table statement - meaning PRIMARY KEY and UNIQUE KEY).
The origininal post is here
I've been following up the series SQLSkills is publishing on Sql Server 2005...
More will come up soon folks!

Thursday, March 23, 2006

Temporary tables/views go light weight in T-SQL 2005

I promised myself that i would be blogging about CTEs in a near future.

Well, here I am, once again leaving a trace of what I've learned so far so I can use it as reference later.
Unfortunately for the T-SQL lovers that come from the Oracle world, the T-SQL enhancements
in SQL Server 2005 do not include arrays among the data structures, the workaround for array use is
to get hands on the SQLCLR and use the .NET framework...
Ok, but this post was about Common Table Expressions, what does arrays have to do with it?
Most of the time, T-SQL developers would use temporary tables whenever they could have used arrays,
to somehow keep a set of values for future use inside the batch.
The way of building temporary structures, name it tables, in your sprocs has evolved from sql server 7.0.
Initially you had the temporary tables such as #people:

Local Temporary Tables

CREATE TABLE #people
(
id INT,
name VARCHAR(32)
)

A temporary table is created and populated on disk, in the system database tempdb — with a
session-specific identifier packed onto the name, to differentiate between similarly-named #temp tables
created from other sessions. The data in this #temp table (in fact, the table itself) is visible only to the current scope
(usually a stored procedure, or a set of nested stored procedures). The table gets cleared up automatically when
the current procedure goes out of scope, but you should manually clean up the data when you're done with it:

DROP TABLE #people

Or the

Global Temporary Tables

CREATE TABLE ##people
(
id INT,
name VARCHAR(32)
)

Global temporary tables operate much like local temporary tables; they are created in tempdb and cause less locking and logging than permanent tables. However, they are visible to all sessions, until the creating session goes out of scope (and the global ##temp table is no longer being referenced by other sessions). If two different sessions try the above code, if the first is still active, the second will receive the following:

Server: Msg 2714, Level 16, State 6, Line 1
There is already an object named '##people' in the database.

I have yet to see a valid justification for the use of a global ##temp table. If the data needs to persist to multiple users, then it makes much more sense, at least to me, to use a permanent table. You can make a global ##temp table slightly more permanent by creating it in an autostart procedure, but I still fail to see how this is advantageous over a permanent table. With a permanent table, you can deny permissions; you cannot deny users from a global ##temp table.


In sql server 2000 it was introduced the concept of table variables:

Table Variables

DECLARE @people TABLE
(
id INT,
name VARCHAR(32)
)

A table variable is created in memory, and so performs slightly better than #temp tables (also because there is even less locking and logging in a table variable). A table variable might still perform I/O to tempdb (which is where the performance issues of #temp tables make themselves apparent), though the documentation is not very explicit about this.

Table variables are automatically cleared when the procedure or function goes out of scope, so you don't have to remember to drop or clear the data (which can be a good thing or a bad thing; remember "release early"?). The tempdb transaction log is less impacted than with #temp tables; table variable log activity is truncated immediately, while #temp table log activity persists until the log hits a checkpoint, is manually truncated, or when the server restarts.



Then why not using always tale variables... Performance is the answer. I'm not going to go deep into this now but just post the url for the official white paper:
Limitations of table variables http://support.microsoft.com/default.aspx/kb/305977

And finally SQL Server 2005 Common Table Expressions, I guess I could call them the light weight temporary tables. their syntax is even lighter than
the syntax in table variables.

Pseudocode and Semantics

The recursive CTE structure must contain at least one anchor member and one recursive member. The following pseudocode shows the components of a simple recursive CTE that contains a single anchor member and single recursive member.

WITH cte_name ( column_name [,...n] )

AS

(

CTE_query_definition –- Anchor member is defined.

UNION ALL

CTE_query_definition –- Recursive member is defined referencing cte_name.

)

-- Statement using the CTE

SELECT *

FROM cte_name

The semantics of the recursive execution is as follows:

1. Split the CTE expression into anchor and recursive members.
2. Run the anchor member(s) creating the first invocation or base result set (T0).
3. Run the recursive member(s) with Ti as an input and Ti+1 as an output.
4. Repeat step 3 until an empty set is returned.
5. Return the result set. This is a UNION ALL of T0 to Tn.


A simple CTE declaration:

USE AdventureWorks;
GO
WITH DirReps(ManagerID, DirectReports) AS
(
SELECT ManagerID, COUNT(*)
FROM HumanResources.Employee AS e
WHERE ManagerID IS NOT NULL
GROUP BY ManagerID
)

GO


An even simpler use:

USE AdventureWorks;
GO
SELECT ManagerID, DirectReports
FROM DirReps
ORDER BY ManagerID;
GO

So with CTEs there is no need of formally declaring your column types, only the column names, as the types are assumed from the
query definition. The syntax is for sure lighter. I'm not aware of performance issues related to CTEs, probably some kind soul could ad a comment here and let me know.

We so far cannot pass table variables and CTEs as parameters in stored procedures, this is definitely something that should come in future versions.


Bye for now.