Actually, I pity the SQL Server that has to create Spools in order to handle nasty queries. But that is too long for a title. The title is a lie. With that being said, spools may appear in specific situations, e.g. where some repetitive operations are done, or Halloween Protection is required. There are four types of spools: Table Spools, RowCount Spools, Window Spools and Index Spools. In this post, we’ll discuss the latter.
Anger – Use It, But Don’t Lose It!
When SQL Server needs to perform a repetitve action over a table, it might choose to cache its own temporary copy, using the relevant columns to that specific action. Usually to make this happen, you need to ensure the following pre-requisites:
- Fail successfully to create the appropriate indexes to support the operation.
- The operation implies that the rows from the spool will be consumed more than once.
I think we all can handle the first one. The second can be trickier because it’s bounded to the nature of the operation itself, but its absence will incite SQL Server to avoid the spool. If both premises are assured and the Optimizer chooses to create an Index Spool, it’s probably a good call for preventing a much greater catastrophe. Nevertheless, this has some major drawbacks, for example:
- Execution will be blocked while the Index Spool is built – single-threaded.
- Index Spools are created and disposed on each execution, which is very inneficient resource-wise and may incur in significant overhead for the tempdb.
That’s not cool. The good news is that this problem often comes with an easy solution.
I Got No Time For The Jibba-Jabba.
In order to see an Index Spool in the wild, we’ll execute the following query, in the StackOverflow2010 database:
CREATE INDEX IX_CloseDate_INC_OwnerUserId ON Posts (ClosedDate) INCLUDE(OwnerUserId) GO SELECT p.OwnerUserId, x.LastBadge FROM Posts p CROSS APPLY ( SELECT MAX(Id) LastBadge From Badges b WHERE b.UserId = p.OwnerUserId AND b.Date < p.ClosedDate ) x WHERE p.ClosedDate IS NOT NULL
The underlying execution plan will present the star of the show:
Just by looking to the plan, it’s perceptible that the Index Spool consumes most of the execution time. Digging deeper into the properties, it’s possible to confirm the serial zone associated to the Index Spool (presented in the lower right corner of the image), despite the parallel icon in the operator. The IO and time statistics will be relevant for further comparison: 3 394 189 logical reads, 3.25 seconds of CPU time and 3.07 seconds of elapsed time.
Between the grids described previously, it’s also possible to see the properties regarding Rebinds and Rewinds: 2 and 39 523 respectively, which is a pretty nice ratio (read this article for more details on that).
I Don’t Like Magic – But I’ve Made Spools Disappear
As stated earlier, an Index Spool is kind of the poor man’s index, so let’s try to figure that out. Hovering the operator in the execution plan will show that the columns of the table Badges being used in the spool are UserId and Date.
So, let’s create a proper index to provide SQL Server a more efficient structure to rely on:
CREATE INDEX IX_UserId_Date ON Badges (UserId,Date)
With that, we’re ready to re-execute our query.
I Love When a Plan Comes Together.
This time, there’s no spool in the execution plan:
the Clustered Index Scan plus the Index Spool is replaced by an Index Seek, which is much more effective returning the data and avoids the serial zone in the plan. This instance, our query resulted in only 128 931 logical reads, 436 milliseconds of CPU and 387 milliseconds of elapsed time.
I love when a plan comes together.