I promised myself that i would be blogging about CTEs in a near future.
Well, here I am, once again leaving a trace of what I’ve learned so far so I can use it as reference later.
Unfortunately for the T-SQL lovers that come from the Oracle world, the T-SQL enhancements
in SQL Server 2005 do not include arrays among the data structures, the workaround for array use is
to get hands on the SQLCLR and use the .NET framework…
Ok, but this post was about Common Table Expressions, what does arrays have to do with it?
Most of the time, T-SQL developers would use temporary tables whenever they could have used arrays,
to somehow keep a set of values for future use inside the batch.
The way of building temporary structures, name it tables, in your sprocs has evolved from sql server 7.0.
Initially you had the temporary tables such as #people:
Local Temporary Tables
CREATE TABLE #people
A temporary table is created and populated on disk, in the system database tempdb — with a
session-specific identifier packed onto the name, to differentiate between similarly-named #temp tables
created from other sessions. The data in this #temp table (in fact, the table itself) is visible only to the current scope
(usually a stored procedure, or a set of nested stored procedures). The table gets cleared up automatically when
the current procedure goes out of scope, but you should manually clean up the data when you’re done with it:
DROP TABLE #people
Global Temporary Tables
CREATE TABLE ##people
Global temporary tables operate much like local temporary tables; they are created in tempdb and cause less locking and logging than permanent tables. However, they are visible to all sessions, until the creating session goes out of scope (and the global ##temp table is no longer being referenced by other sessions). If two different sessions try the above code, if the first is still active, the second will receive the following:
Server: Msg 2714, Level 16, State 6, Line 1
There is already an object named ‘##people’ in the database.
I have yet to see a valid justification for the use of a global ##temp table. If the data needs to persist to multiple users, then it makes much more sense, at least to me, to use a permanent table. You can make a global ##temp table slightly more permanent by creating it in an autostart procedure, but I still fail to see how this is advantageous over a permanent table. With a permanent table, you can deny permissions; you cannot deny users from a global ##temp table.
In sql server 2000 it was introduced the concept of table variables:
DECLARE @people TABLE
A table variable is created in memory, and so performs slightly better than #temp tables (also because there is even less locking and logging in a table variable). A table variable might still perform I/O to tempdb (which is where the performance issues of #temp tables make themselves apparent), though the documentation is not very explicit about this.
Table variables are automatically cleared when the procedure or function goes out of scope, so you don’t have to remember to drop or clear the data (which can be a good thing or a bad thing; remember “release early”?). The tempdb transaction log is less impacted than with #temp tables; table variable log activity is truncated immediately, while #temp table log activity persists until the log hits a checkpoint, is manually truncated, or when the server restarts.
Then why not using always tale variables… Performance is the answer. I’m not going to go deep into this now but just post the url for the official white paper:
Limitations of table variables http://support.microsoft.com/default.aspx/kb/305977
And finally SQL Server 2005 Common Table Expressions, I guess I could call them the light weight temporary tables. their syntax is even lighter than
the syntax in table variables.
Pseudocode and Semantics
The recursive CTE structure must contain at least one anchor member and one recursive member. The following pseudocode shows the components of a simple recursive CTE that contains a single anchor member and single recursive member.
WITH cte_name ( column_name [,…n] )
CTE_query_definition –- Anchor member is defined.
CTE_query_definition –- Recursive member is defined referencing cte_name.
— Statement using the CTE
The semantics of the recursive execution is as follows:
1. Split the CTE expression into anchor and recursive members.
2. Run the anchor member(s) creating the first invocation or base result set (T0).
3. Run the recursive member(s) with Ti as an input and Ti+1 as an output.
4. Repeat step 3 until an empty set is returned.
5. Return the result set. This is a UNION ALL of T0 to Tn.
A simple CTE declaration:
WITH DirReps(ManagerID, DirectReports) AS
SELECT ManagerID, COUNT(*)
FROM HumanResources.Employee AS e
WHERE ManagerID IS NOT NULL
GROUP BY ManagerID
An even simpler use:
SELECT ManagerID, DirectReports
ORDER BY ManagerID;
So with CTEs there is no need of formally declaring your column types, only the column names, as the types are assumed from the
query definition. The syntax is for sure lighter. I’m not aware of performance issues related to CTEs, probably some kind soul could ad a comment here and let me know.
We so far cannot pass table variables and CTEs as parameters in stored procedures, this is definitely something that should come in future versions.
Bye for now.