Performance Woes & Wows #4: Filtered Indexes

Generally speaking, an index can improve the efficiency of read operations, by providing a data structure that SQL Server can use to seek or scan specific data pages, while skipping non-relevant pages in the process.

Regarding Non-Clustered Indexes, since they typically represent narrower copies of the underlying table, even if a complete index scan is required, the amount of data pages that end up being read can be significantly smaller in comparison to an equivalent Clustered Index scan.

Filtered indexes can improve this process even further, simply by reducing the amount of data comprehended by the index, hence reducing the number of nodes and data pages to be traversed.

In order to illustrate a scenario where the usage of a filtered index can be beneficial, let’s create a store procedure in the StackOverflow2010 database, that will retrieve all questions (PostTypeId=1) with a creation date and a score greater than or equal to the inputted values:

CREATE OR ALTER PROCEDURE GetQuestionsByDateAndMinimumScore
(@Score int,@CreationDate datetime) AS
BEGIN
SELECT 
	p.Id,p.creationDate,u.DisplayName, 
        p.title,p.Score,p.ViewCount,p.AnswerCount
FROM 
	Posts p
INNER JOIN 
	Users u ON u.id = p.OwnerUserId
WHERE 
	p.CreationDate >= @CreationDate AND
	Score >= @Score AND
	PostTypeId = 1 
ORDER BY 
	CreationDate DESC
END


Next, we’ll create a non-filtered index on the table Posts, in order to improve the performance of our query.

CREATE INDEX IX_CreationDate_PostTypeId_Score 
ON Posts (CreationDate,Score,PostTypeId)

After its creation, it’s time to execute the store procedure and analyze our baseline values.

EXEC GetQuestionsByMinimumScoreAndDate 
              500, '2009-07-31 00:00:00.000'

In the execution plan, we are able to see that SQL Server used our index to seek the Posts table and ended up reading 2 809 841 rows, which translated into 12 000 logical reads.

Now, the fun part. Since we care only about question posts and they only represent about one third of the table, we could create a filtered index instead.

CREATE INDEX IX_CreationDate_PostTypeId_Score_Filtered
ON Posts (CreationDate,Score,PostTypeId)
WHERE (PostTypeId = 1)

Finally, after executing the procedure with the same parameters, we should be able to see some improvements.

And indeed we do. This time SQL Server ended up reading 875 361 rows and the IO statistics show 5676 logical reads on the Posts table. Noice.

I’ll Only Create Filtered Indexes From Now On

I mean, you could. You probably shouldn’t though.

In our example we were interested only in querying a specific PostTypeId, but if we needed to address another type, e.g. Answers, our index would be pretty useless and even considering a filtered index would make no sense, because in that particular case, both types represent almost the entire table.

The takeaway is that filtered indexes can be great for performance, but we must always keep in mind the selectivity of what we want to filter and the gains versus costs, because although filtered indexes are lighter by concept, they aren’t exactly free.



Categorias:#sqlserverperformance

Deixe uma Resposta

Preencha os seus detalhes abaixo ou clique num ícone para iniciar sessão:

Logótipo da WordPress.com

Está a comentar usando a sua conta WordPress.com Terminar Sessão /  Alterar )

Google photo

Está a comentar usando a sua conta Google Terminar Sessão /  Alterar )

Imagem do Twitter

Está a comentar usando a sua conta Twitter Terminar Sessão /  Alterar )

Facebook photo

Está a comentar usando a sua conta Facebook Terminar Sessão /  Alterar )

Connecting to %s

<span>%d</span> bloggers like this: