Querying the Same Thing. Faster.
In a previous post, we’ve seen that equivalent queries can produce distinct execution plans – Yes, how nice and intuitive of you SQL Server. But this happens for a logical reason: The Query Optimizer
hates you doesn’t have the means to explore every possible semantic equivalence. Hence, the shape of the execution plan will be intrisically bounded to the way we write our queries.
In this post, we’ll extend this notion to a more real-world like example, which I found very interesting because it seemed kind of counterintuitive to me.
Let’s assume we want to count every Question, Wiki and TagWikiExerpt post, in the Posts table of the StackOverflow2010 database. In order to achieve that we can create the following index:
CREATE INDEX IX_PostTypeId ON Posts (PostTypeId)
And then execute this query:
SELECT COUNT(1) FROM Posts WHERE PostTypeId IN (1,3,4)
We can confirm that the execution plan is comprised by a scan to the previously created index and has an estimated cost of 3.27277. The IO statistics show 1915 logical reads and regarding CPU time and duration, 129ms and 123ms, respectively. These time statistics remained very identical on all the performed executions.
the Same Thing.
Now let’s do the same thing, expecting different results. Well, almost the same thing. Let’s just change the previous query a little bit, by replacing the IN operator for this logical equivalent, but uglier and more complex version:
SELECT SUM(x.c) FROM ( SELECT COUNT(1) c FROM Posts WHERE PostTypeId = 1 UNION ALL SELECT COUNT(1) c FROM Posts WHERE PostTypeId = 3 UNION ALL SELECT COUNT(1) c FROM Posts WHERE PostTypeId = 4 ) x
These changes are reflected in the execution plan, as expected:
The really interesting part about this plan is… That it’s better than the first one. Don’t believe me? Check for yourself:
This query only consumed 47ms of CPU time (-63.6%) and ran for a total of 76ms (-38.2%). The difference was not quite significant regarding logical reads, nevertheless a lower value was achieved with 1912. Accordingly, the estimated plan cost was also lower, 2.68443. Sure, some assumptions behind the cost calculation may be outdated and not entirely accurate, but this time, the prediction was entirely correct.
Rewriting queries using equivalent syntax seems like a valid approach when trying to improve query performance. Nevertheless, I don’t know any set of rules or even guidelines that may help in this decision process. It’s more like an empiric procedure, one day on the threshold of despair you try a thing that unexpectedly works and later in the future when a similar pattern emerges, you’ll know that the time has come. Why it works? Because of reasons. But if it works, it ain’t stupid. Best advice of the year.