The Execution Plan Saga #1: Getting Started

When we need to troubleshoot a query that is having performance issues, there are two possible routes: one is to empirically modify the query, pushing buttons and knobs until, eventually, things get better. The other is to analyze the execution plan and take specific, focused actions, based on the identified problematic areas. Sometimes both approaches can lead to similar outcomes and the first can even be faster, especially when we’re not used to analyze execution plans. Nevertheless, the latter is a much more consistent approach, simply because there will be times when blindly applying modifications won’t fix the problem and you’ll need to figure out the root cause, in order to solve it.

An execution plan is a blueprint, a set of algorithms and properties that SQL Server processes in order to retrieve the requested data. If we are able to understand these algorithms and their inherent properties, putting them into context, we will be incredibly more efficient solving problems.

Elements of an Execution Plan

As already mentioned, an execution plan comprehends operators, with their respective properties and data flows. Operators represent specific algorithms chosen by SQL Server to perform the logical operations that materialize the specified query. Hence, they serve multiple purposes:

  • Read Data
  • Combine Data
  • Group / Sort Data
  • Manipulate Data
  • Modify Data
  • Performance-Related

On the other hand, arrows represent the data flow between operators. But just these operators and arrows alone tell us little, we must analyze properties to have a proper understanding of what’s happening. Each operator has its own specificities, so, inevitably, they will hold distinct properties, that are relevant in the context of that specific operation.

Despite that, we can make some conclusions just by looking at the execution plan e.g. the width of data flow arrows, the cost of each operator, or the time spent on each operation. 

It’s important to understand that we can look to the plan in distinct moments:

  • Before Execution: Estimated Execution Plan
  • After / During Execution: Actual Execution Plan

The names may suggest that two separate plans are created, but that’s not the case. It’s actually the same plan, the only thing that differs is the amount of information available.

In the Estimated Execution Plan, we can see, for example, how many rows SQL Server estimates that a particular operator will return, but as this is prior to execution, we obviously won’t be able to see the actual number of rows returned.

To compare the estimated vs actual row count, we’ll need to look into the Actual Execution Plan. Since it comprehends runtime statistics, it’s evidently preferable to use it rather than the estimated plan to troubleshoot problems, however, in those dark moments when for some reason we’re unable to get the Actual Execution Plan, having the Estimate Execution Plan is much better than nothing.

Troubleshooting Guidelines

Examining an execution plan can be overwhelming if we don’t know what to look for. The following guidelines are based on a set of sequential steps proposed by Grant Fritchey on his book “SQL Server Execution Plans”

Warnings

When a yellow warning sign or a red exclamation mark arises attached to an operator, it means SQL Server is telling us that there is something we should check. Despite the fact that sometimes the behavior related to the warning message is not the root cause of the problem we’re trying to solve, often it is and it’s just there at plain sight – easy buck.

To view the warning message, you can hover the operator’s icon or open its properties.

First Operator

The first operator (typically SELECT, UPDATE, INSERT or DELETE) holds very relevant information about the entire plan. The following list include some properties that will frequently be helpful to us:

  • QueryTimeStats: Values for CPU time and elapsed time (in milliseconds).
  • MemoryGrantInfo: Information about query memory requirements. More details about each property can be viewed in a previous post.
  • WaitStats: Top wait stats on the query execution, including the number of waits (WaitCount) and the total wait time (WaitTimeMs).
  • Parameter List: All parameters, including data type, value at compile time (if the parameter was sniffed), and value at runtime (in an actual execution plan). These values will help us to understand how parameter sniffing impacted the query.
  • QueryHash: Hash value resultant from the query text. It is used by SQL Server to find plans for similar or equal queries.
  • QueryPlanHash: Hash value resultant from the execution plan. It is used by SQL Server to find plans for similar or equal execution plans.
  • Set Options: Values of specific SET options at compilation time. It’s important to look at these options because different value combinations generate different plans. Sometimes these variations are innocuous and almost invisible, but other times they might shape a completely different plan. For example, ARITHABORT is one of those options that can trigger very distinct behaviors on the execution plan, depending on its value.
  • Type of optimization: Can either be “TRIVIAL” or “FULL”. A “TRIVIAL” optimization means that there were no meaningful execution alternatives for the optimizer to evaluate. It is important to stress that this not related to workload volume. If we execute SELECT * FROM HugeTable, the query can run for minutes, still the optimization level will show “TRIVIAL”, just because SQL Server hadn’t any other choice but to do a full table/index scan. In a “FULL” optimization SQL Server will evaluate the maximum alternatives possible, given the time and resources limitations. This optimization level also unlocks other properties, e.g.:
    • Reason For Early Termination Of Statement Optimization: Specifies why the optimizer gave up on optimizing the plan. The possible values are:
      • Good Enough Plan Found: The Optimizer found a plan with a cost below the threshold.
      • Time Out: Compilation time exceeded the maximum optimization time, defined at the start of the optimization phase.
      • Memory Limit Exceeded: Compilation stopped due to memory shortage.

Operator Cost

Commonly, we’ll see expensive operators standing out from the rest. When the cost is mainly in one or a couple operators, if we manage to improve those specific areas, it will have a substantial impact on the whole execution. Just remember that these costs provide guidelines and aren’t always accurate, some operators even have a fixed cost, which can be very misleading. Another visual hint that is quite useful in the Actual Execution Plan is the time displayed below each operator. This value can be cumulative (including time spent on child operators) or individual when dealing with Batch Mode operators.

Data Flow

Still regarding these visual hints, the thickness of data flow arrows is one of the coolest. Their width is proportional to the row count (based on actual row count when available, or on estimated row count otherwise).

Additionally, we must analyze abrupt transitions, i.e. where wide arrows become narrow, because that may indicate that the filtering is occurring too late. We want to “push” filters as far to the right side of the plan as possible.

Estimates Vs Actuals

As already covered in a previous post, statistics are absolutely vital in the process of creating an execution plan. Estimates underpin some algorithm choices and consequently, bad estimates can lead to bad choices. A particular algorithm might be efficient when dealing with few rows, but extremely inefficient when dealing with a huge number of rows. Nested Loops is a great example of that. That’s why it is so important to keep an eye on estimated vs actual row counts. If a large gap between estimated and actual row count led to a catastrophic algorithm choice, we’ll need to figure out how to improve those estimates or how to hint SQL Server to choose differently.    

Read Operators

Analyzing read operations is an important part of troubleshooting execution plans. Althought there is no such thing as an evil operator and we can’t state that a scan is a bad thing or a seek is always good, patterns like a scan that returns few rows or a seek that returns a huge amount of rows might be worth checking.

Posteriorly, we’ll analyze read data operators in greater detail.🖖

Aventura-profissional PNG from pngtree.com

Performance Woes & Wows #2: LIKE ‘%Make This Query Go Slower Please%’

Sargability is a funny word that’s used to characterize the ability of the engine to take advantage of and index, in order to speed up the execution of a query. The legend says that it is derived from a contraction of Search ARGument ABLE.

Usually, when some writes a query, the goal it’s to retrieve the desired information, as fast as possible. What’s interesting is that “as fast as possible” can either refer to “execution time” or “time I will spend writing this query” and we all know which one is, most of the times. As a rule of thumb, queries must be written in the simplest way possible, being the underlying reason that if it’s easy for you to read, it will also be easier for the Optimizer to read and process it.

But that’s not always true, for example, we use functions because they save us effort and improve query readability, but typically they don’t make Optimizer’s life easier. Au contraire, because they tend to make predicates non-sargable. On other occasions, bighearted developers decide to allow the customer to search on every damn existing column, using LIKE with leading and trailing % wildcards and that’s also a sargable capital sin, as we’re going to look further.

Non-Sargable Predicates

So, let’s talk a bit more about bad ideas. Not those that start with “Hold my beer and watch this”, but instead “It runs fast on my machine, let’s take it to Production”.

Non-sargable predicates have a negative impact on query performance, e.g. by causing poor cardinality estimates or inability to perform index seeks, which consequently will lead to inappropriate plan choices and increased CPU consumption.

The following list include some of the most common patterns that kill predicate sargability. We’ll very briefly overview them and posteriorly focus on one specific case.

ColumnX = Function(ColumnX): As already pointed out, functions can be naughty. One might be tempted to use functions because they encapsulate logic in a very practical way to us humans, but this often implies performance loss. The query presented below, where we extract the year from a date column, comparing it against a specific value, represents one of those cases. SQL Server won’t be able to use any index efficiently, even if we have an index on that field. We’ll shortly discuss why this happens.

SELECT COUNT(1) FROM Matches WHERE YEAR(Date) = 1999

ColumnX = ColumnY + ColumnZ: A similar behaviour is observed when we use the result of arithmetic operations between columns to retrieve results of a particular column. It’s important to stress that a predicate containing an arithmetic operation between a column and a constant literal value would also be non-sargable.

SELECT * 
FROM Matches 
WHERE FullTimeGoals = HalfTimeHomeGoals + HalfTimeAwayGoals

@Parameter1 IS NULL OR ColumnX = @Parameter1: The heart of that single query where search dreams come true and someone got really proud of it. It’s possible to search for everything. Slowly. In runtime SQL Server won’t be able to figure this out, hence the non-sargability. There are a couple of ways to get around this, being one of them to add OPTION(RECOMPILE). This hint enables Parameter Embedding Optimization, which gives the Parser the ability to replace query parameters with literal constant values during query parsing and enables further complex simplifications, like removing contradictions. Nevertheless, as this hint forces the recompilation of a statement, it introduces some overhead that needs to be taken into account.

CREATE OR ALTER PROCEDURE SearchNonFriendlyMatch 
(
    @Hometeam INT = NULL,
    @Awayteam INT = NULL,
    @Date DATE    = NULL,
    @FTG TINYINT  = NULL
)AS
BEGIN
	SELECT 
		HomeTeam,AwayTeam,Date, FullTimeGoals
	FROM 
		Matches
	WHERE
		League IS NOT NULL AND
		(@HomeTeam IS NULL OR HomeTeam = @HomeTeam) OR 
		(@AwayTeam IS NULL OR HomeTeam = @HomeTeam) OR 
		(@Date IS NULL OR [Date] = @Date) OR 
		(@FTG IS NULL OR FullTimeGoals >= @FTG)
END

ColumnX LIKE ‘%Make This Query Go Slower Please%’

When we have this wildcard prefixing and suffixing a string, we’re telling SQL Server to retrieve all records that contain the specified text, regardless of what comes before or after it.

SELECT  TOP 5 * FROM Teams WHERE Name LIKE '%ar%'

The execution of the previous query returns these results:

IdCountryName
7PortugalSC Beira-Mar
8EnglandCardiff City FC
23EnglandArsenal FC
28SpainFC Barcelona
68ItalyParma

But why is this bad for performance?

Let’s put ourselves in SQL Server shoes and pretend that we’ve received a list containing names of football teams, ordered alphabetically. We need to count how many of these names contain the letters “ar”. We roll up our sleeves and start analyzing all characters from the first record, then the second, then the third, and so on, until we reach the last item in the list. We had no alternative but to scan through all values, simply because the order of this list wasn’t helpful at all. Recalling the table above it’s easy to understand why: the substring can be everywhere, there is no shortcut, no possibility to discard any subset of these names. SQL Server will have the same exact struggle scanning through all the index represented by our list.

Now, let’s execute a similar query:

SELECT  COUNT(1) FROM Teams WHERE t.Name LIKE '%Benf%'

And look into its execution plan:

This is a pretty small table but it’s enough to prove the point. This query ran in about 600 miliseconds and IO statistics show 1 Scan Count and 1226 logical reads. The cost is almost entirely in the Index Scan, that was chosen even thought SQL Server estimated just 9 rows to be retrieved. Well, that wasn’t really a choice, in this case an Index Scan was made because the non-sargable search prevented SQL Server from using an Index Seek, as we already know.

With that being said, let’s remove the leading wildcard and execute the query:

SELECT  COUNT(1) FROM Teams t WHERE t.Name LIKE 'Benf%'

And this time, it is possible to confirm by the execution plan that the query was much more efficient using the Index Seek.

It has executed in virtually no time and IO statistics show 1 Scan Count and 3 logical reads.

If we return to our ordered list, it’s very easy to understand why the query executed so much faster: Now we can benefit from that alphabetical order, going directly to the letter B, looking through a few lines and quickly return the row that matches the search.

What Should I Do, Then?

We saw that the sargable version of our query ran in virtually no time, while the non-sargable version took about 600ms and even though proportionally this gap is huge, users don’t complain about half a second queries. Nevertheless, the real problem with this type of queries is they won’t scale. When you try to run this query against a table with millions of rows instead of a couple thousand, it will be much more painful. You’ve been warned.

But what can we do to avoid it?

The first and obvious option would be to analyze if that leading wildcard is really needed.
If you can simply get rid of it, problem solved! If not, there’s still hope…

One way to improve these searches is by using Full Text Search (FTS). FTS is able to quickly find all matches for a term search in a table, because it doesn’t have to scan rows. Instead it takes advantage of text indexes that store positional information for all terms found in the columns where we create the text index on. Instead of using LIKE operator, FTS enables the use of CONTAINS/CONTAINSTABLE or FREETEXT/FREETEXTTABLE, as exemplified below:

SELECT COUNT(1) FROM Teams t WHERE CONTAINS((t.Name), @Name) 

FTS has a few limitations and requires some setup effort, but I often work with it and it can be pretty awesome.

Additionally, there are newer, hotter technologies designed to solve these issues. For example ElasticSearch, that allows fast text searches in a very scalable way. It also allows pretty complex searches but, to me, the nicest advantage comparing to Full-Text Search is the possibility to take the heavy processing completely outside SQL Server. CPU cores in SQL Server can be very expensive and should be treated with respect, my boss says.

Now you now why you should avoid queries LIKE ‘%these%’ and you have some ideias to improve them. Hope this was helpful to you 🖖