Performance Woes & Wows #6: I Pity The Spool

Actually, I pity the SQL Server instance that has to create spools in order to handle nasty queries, but that was too long for a title. The title is a lie. That being said, spools may appear in specific situations, e.g. when some repetitive operation is performed, or when Halloween Protection is required. There are four types of spools: Table Spools, RowCount Spools, Window Spools and Index Spools. In this post, we’ll discuss the last of these.

Anger – Use It, But Don’t Lose It!

When SQL Server needs to perform a repetitive action over a table, it might choose to cache its own temporary copy of the data, keeping only the columns relevant to that specific action. For this to happen, the following prerequisites usually need to be met:

  1. Fail successfully to create the appropriate indexes to support the operation.
  2. The operation implies that the rows from the spool will be consumed more than once.

I think we can all handle the first one. The second can be trickier, because it’s bound to the nature of the operation itself, but in its absence SQL Server will avoid the spool. If both conditions are met and the Optimizer chooses to create an Index Spool, it’s probably a good call that prevents a much greater catastrophe. Nevertheless, it has some major drawbacks, for example:

  • Execution is blocked while the Index Spool is built, and that build is single-threaded.
  • Index Spools are created and disposed of on each execution, which is very inefficient resource-wise and may incur significant overhead in tempdb.

That’s not cool. The good news is that this problem often comes with an easy solution.
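If you want to check whether Index Spools are already lurking in your own workload, one rough first pass is to search the plan cache for the operator name. This is a quick-and-dirty sketch, not a production-grade query — casting the plan XML to text is blunt, but it gets the job done for a first look:

```sql
-- Sketch: find cached plans that contain an Index Spool operator.
-- Casting the showplan XML to nvarchar is crude but good enough for triage.
SELECT TOP (20)
    cp.usecounts,
    st.[text]    AS query_text,
    qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)  AS st
WHERE CAST(qp.query_plan AS nvarchar(max)) LIKE N'%PhysicalOp="Index Spool"%';
```

Mind that this scans the whole plan cache, so run it off-hours on busy servers.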

I Got No Time For The Jibba-Jabba.

In order to see an Index Spool in the wild, we’ll execute the following query, in the StackOverflow2010 database:

CREATE INDEX IX_CloseDate_INC_OwnerUserId
ON Posts (ClosedDate) INCLUDE(OwnerUserId)
GO
SELECT 
	p.OwnerUserId,
	x.LastBadge
FROM Posts p
CROSS APPLY
	(
		SELECT MAX(Id) LastBadge
		FROM Badges b
		WHERE b.UserId = p.OwnerUserId
		AND b.Date < p.ClosedDate
	) x
WHERE p.ClosedDate IS NOT NULL

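For reference, the IO and time figures quoted throughout this post can be reproduced by enabling the session statistics before running the query; the numbers then show up in the Messages tab in SSMS:

```sql
-- Report logical reads, CPU time and elapsed time for each statement.
SET STATISTICS IO, TIME ON;
```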
The underlying execution plan will present the star of the show:

Single-Threaded Beauty

Just by looking at the plan, it’s clear that the Index Spool consumes most of the execution time. Digging deeper into the properties, it’s possible to confirm the serial zone associated with the Index Spool (presented in the lower right corner of the image), despite the parallelism icon on the operator. The IO and time statistics will be relevant for later comparison: 3,394,189 logical reads, 3.25 seconds of CPU time and 3.07 seconds of elapsed time.

In the operator properties, it’s also possible to see the Rebinds and Rewinds: 2 and 39,523 respectively, which is a pretty nice ratio (read this article for more details on that).

I Don’t Like Magic – But I’ve Made Spools Disappear

As stated earlier, an Index Spool is kind of a poor man’s index, so let’s act on that. Hovering over the operator in the execution plan shows that the columns of the Badges table used by the spool are UserId and Date.

So, let’s create a proper index to provide SQL Server a more efficient structure to rely on:

CREATE INDEX IX_UserId_Date ON Badges (UserId, Date)

With that, we’re ready to re-execute our query.

I Love When a Plan Comes Together.

This time, there’s no spool in the execution plan:

The Clustered Index Scan plus Index Spool combination is replaced by an Index Seek, which is much more efficient at returning the data and avoids the serial zone in the plan. This time, our query resulted in only 128,931 logical reads, 436 milliseconds of CPU time and 387 milliseconds of elapsed time.
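If you’re following along and want to reset the demo to its initial state, just drop the two indexes created above:

```sql
-- Remove the demo indexes so the original plan shape comes back.
DROP INDEX IX_CloseDate_INC_OwnerUserId ON Posts;
DROP INDEX IX_UserId_Date ON Badges;
```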

I love when a plan comes together.

Performance Woes & Wows #5: IN vs UNION

Querying the Same Thing. Faster.

In a previous post, we’ve seen that equivalent queries can produce distinct execution plans – yes, how nice and intuitive of you, SQL Server. But this happens for a logical reason: the Query Optimizer doesn’t have the means to explore every possible semantic equivalence. Hence, the shape of the execution plan will be intrinsically bound to the way we write our queries.

In this post, we’ll extend this notion to a more realistic example, which I found very interesting because it seemed kind of counterintuitive to me.

Querying

Let’s assume we want to count every Question, Wiki and TagWikiExcerpt post in the Posts table of the StackOverflow2010 database. In order to achieve that, we can create the following index:

CREATE INDEX IX_PostTypeId ON Posts (PostTypeId)

And then execute this query:

SELECT COUNT(1) 
FROM Posts 
WHERE PostTypeId IN (1,3,4)

We can confirm that the execution plan consists of a scan of the previously created index and has an estimated cost of 3.27277. The IO statistics show 1,915 logical reads, and CPU time and duration come in at 129ms and 123ms, respectively. These time statistics remained essentially identical across all executions.

the Same Thing.

Now let’s do the same thing, expecting different results. Well, almost the same thing. Let’s change the previous query a little bit, replacing the IN operator with this logically equivalent, albeit uglier and more complex, version:

SELECT SUM(x.c) FROM
(
	SELECT COUNT(1) c FROM Posts WHERE PostTypeId = 1
	UNION ALL
	SELECT COUNT(1) c FROM Posts WHERE PostTypeId = 3
	UNION ALL
	SELECT COUNT(1) c FROM Posts WHERE PostTypeId = 4
) x

These changes are reflected in the execution plan, as expected:

I’m so ugly // But that’s okay, ‘cause so are you

Faster.

The really interesting part about this plan is… That it’s better than the first one. Don’t believe me? Check for yourself:

This query consumed only 47ms of CPU time (-63.6%) and ran for a total of 76ms (-38.2%). The difference in logical reads was not significant, but a lower value was still achieved: 1,912. Accordingly, the estimated plan cost was also lower, at 2.68443. Sure, some assumptions behind the cost calculation may be outdated and not entirely accurate, but this time the prediction was spot on.

So What?

Rewriting queries using equivalent syntax seems like a valid approach when trying to improve query performance. Nevertheless, I don’t know of any set of rules, or even guidelines, that could help in this decision process. It’s more of an empirical process: one day, on the threshold of despair, you try something that unexpectedly works, and later, when a similar pattern emerges, you’ll know that the time has come. Why does it work? Because of reasons. But if it works, it ain’t stupid. Best advice of the year.