It’s All About Stats!

Statistics shape the way execution plans are built and have a massive impact on query performance. To better understand this relationship, we will start by briefly exploring what happens when a query is executed and where statistics come into play.

Query Execution Phases

When a query is executed, there's quite a lot happening in the Relational Engine inside SQL Server. At a high level, this process can be divided into four sequential phases: Parsing, Binding, Optimization and Execution. Each phase requires an input that is consumed and transformed into an output that is forwarded to the next phase.

Parsing

In the first step, a syntactic validation of the T-SQL is done, also known as query parsing. The example in the image below represents a query that failed to be parsed, in this case because the SELECT clause was miswritten.

The output of this phase is a parse tree, which represents the logical steps required to perform the query.

Binding

When the query is successfully parsed, query binding takes place. This process, owned by the algebrizer, resolves the parse tree, trying to bind the respective strings to specific objects within the database. This translates to, for instance, verifying whether referenced tables and columns exist under the current execution context. It's important to mention that the query can use names that do not match any database object directly, e.g. when aliases are being used, so it's also the algebrizer's job to resolve these cases. Additionally, it will identify data types and process aggregate functions to some extent (known as aggregate binding). The following example shows a query that could not be resolved, because it refers to a column that does not exist.

What do you mean, “invalid”? How am I supposed to fix slow queries?

The output of this phase is a query processor tree, which will be handed to the query optimizer.

Optimization

Well, in fact, the query processor tree isn't the only input received by the optimizer. Building a new plan from scratch consumes resources, specifically time and CPU. In order to manage these resources effectively, SQL Server keeps plans in cache, whenever possible, so they can be reused. So, the optimizer also receives the hash code of the query currently being executed as an input. This hash code is used to probe for a valid execution plan in cache that matches the query. If there is a match with a valid plan, the optimization process stops and the cached plan is reused. However, if changes were made to any table referenced in the query, or statistics were updated in the meantime, the plan is considered invalid and must be rebuilt.
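As a quick, hedged illustration of plan reuse, the plan cache can be inspected through DMVs; usecounts shows how many times each cached plan has been reused:

SELECT TOP 10
	cp.usecounts AS 'Use Count',
	cp.objtype AS 'Object Type',
	t.text AS 'Query'
FROM 
	sys.dm_exec_cached_plans AS cp
CROSS APPLY 
	sys.dm_exec_sql_text(cp.plan_handle) AS t
ORDER BY 
	cp.usecounts DESC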

If no valid plan is found, the optimizer creates a new plan, which in essence is an analysis of many alternative ways to achieve the expected result. The optimizer estimates a cost for each alternative approach and tries to find a plan that is cheap enough, in an incredibly short amount of time. Trivial plans (e.g. SELECT Col1 FROM Table1) don't go through this optimization process, as they typically don't have alternative approaches to be considered, due to their simplicity. More complex queries are subject to a full cost-based optimization process, resulting in a cost-based plan, calculated according to the cardinality estimations. After the cheapest possible plan is chosen and built, it is handed to the Execution Engine.

Execution

Finally, in the execution phase, the Execution Engine receives and processes the execution plan. Validations on the plan are reinforced at runtime, to confirm that it is still valid. If something happened in the meantime that changed the initial state, for example because statistics were updated, execution is paused, the compilation process is invoked and a new plan is built for the affected statement(s).

The Importance of Statistics

Statistics provide information about the distribution and selectivity of values across columns, using a sample that represents the entire universe of values. The selectivity of a value is the ratio between the rows that pass the selection criteria and the total number of rows. We can extrapolate this principle and think of the selectivity of a column as the ratio between the number of distinct values and the total number of rows. This basically defines the uniqueness of what you are trying to find, and it provides guidance to the optimizer on how to shape the plan. As a side note, keep in mind that statistics can be created manually (whether directly or through the creation of indexes) or automatically by SQL Server when a query is executed, so the optimizer can consume them.
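As a hedged side example, the statistics that exist on a table (including the auto-created _WA_Sys_* ones we'll meet later in this demo) can be listed through sys.stats; the Matches table name is taken from the demo below:

SELECT 
	s.name AS 'Statistics Name',
	s.auto_created AS 'Auto Created',
	s.user_created AS 'User Created',
	c.name AS 'Column Name'
FROM 
	sys.stats AS s
JOIN 
	sys.stats_columns AS sc ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id
JOIN 
	sys.columns AS c ON sc.object_id = c.object_id AND sc.column_id = c.column_id
WHERE 
	s.object_id = OBJECT_ID('Matches')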

This process is fundamental because we want our queries to run as fast as possible, which wouldn't happen if the optimizer had to scan all data in all referenced tables.

Going Under the Hood

For the purposes of this demo, we’ll only cover the usage of single column statistics, equality predicates and single table queries. Bear in mind that the behavior differs when using multiple column statistics, joining other tables or applying more complex predicates.

First, we’ll run the following query, which will retrieve the matches where Manchester City (Id=77) played at Etihad Stadium.

SELECT  * FROM Matches WHERE HomeTeam = 77

In the execution plan, we'll see that SQL Server correctly estimated that 390 rows would be retrieved, from a total of 535 983 rows.

If we open the XML execution plan, we'll be able to confirm which statistics were used to make the estimate, under "OptimizerStatsUsage".

Next, we use DBCC SHOW_STATISTICS to view those statistics, related to the HomeTeam column.

DBCC SHOW_STATISTICS(N'Matches', _WA_Sys_00000005_73BA3083)

The information that makes up statistics is divided into three sections:

  • Header: Metadata about the statistics.
  • Density Vector: Selectivity of the data, used to measure cross-column correlation.
  • Histogram: Distribution of values in the first key column of the statistics object.

A histogram is composed of the following columns:

  • RANGE_HI_KEY: Upper-bound value for a step (always based on leftmost column).
  • RANGE_ROWS: Number of rows with a value falling within a histogram step, excluding the upper bound.
  • EQ_ROWS: Number of rows whose value equals the RANGE_HI_KEY.
  • DISTINCT_RANGE_ROWS: Number of rows with a distinct column value within a histogram step, excluding the upper bound.
  • AVG_RANGE_ROWS: Average number of rows with duplicate column values within a histogram step, excluding the upper bound (calculated as RANGE_ROWS / DISTINCT_RANGE_ROWS).

In this case, because 77 hits a RANGE_HI_KEY in the histogram, we'll notice that the estimated value is in the EQ_ROWS column of that row.

The formulas to calculate the estimated number of rows and selectivity can be expressed as follows:
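For this simple equality case they boil down to the following (a paraphrase, using the numbers from the demo above):

Selectivity = EQ_ROWS / Total Rows = 390 / 535 983 ≈ 0.00073
Estimated Rows = Selectivity × Total Rows = 390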

Selectivity is the key to combining statistics. Next, we'll combine two single column statistics, by adding another equality predicate to the WHERE clause of the previous query. It's important to stress that the version of the Cardinality Estimator (CE) dictates how calculations are made. The legacy CE (prior to SQL Server 2014) assumes that there is no correlation between multiple columns. Therefore, the cardinality is calculated with the following formula, which involves these variables: estimated number of rows (E), selectivity of the first (S1) and second (S2) columns and the total number of rows (R).
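Written out (treat the notation as a paraphrase of the well-known independence formula):

E = R × S1 × S2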

Conversely, the new CE assumes some correlation between columns and estimates the number of rows as follows:
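The new CE applies what is usually called exponential backoff, softening the selectivity of the less selective predicate with a square root (again, a paraphrase of the two-predicate form):

E = R × S1 × SQRT(S2)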

Now, let’s run a new query that retrieves the games where Manchester City played at home and the full-time result was a win to the away team.

SELECT  * FROM Matches WHERE HomeTeam = 77 AND FTR = 'A'

We already know the values related to the predicate HomeTeam = 77, so let’s focus on the full-time result column (FTR) and the specific value “A”. If we open the XML execution plan, we’ll notice that the statistics of another column were added.

We then repeat the DBCC SHOW_STATISTICS command, changing it to match the above highlighted stats, related to FTR column.

DBCC SHOW_STATISTICS(N'Matches',_WA_Sys_00000009_73BA3083)

These statistics will show us the same 535 983 total rows and 9597 occurrences of "A" in the FTR column.

Ok, now we have all the data we need, so let’s calculate the estimated number of rows for our query.
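As a hedged back-of-the-envelope calculation (values rounded along the way), the new CE formula gives roughly:

S1 = 390 / 535 983 ≈ 0.00073
S2 = 9597 / 535 983 ≈ 0.0179
E = 535 983 × 0.00073 × SQRT(0.0179) ≈ 390 × 0.134 ≈ 52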

If we open the execution plan, we'll be able to confirm that our calculations match the ones made by SQL Server.

And as expected, if we change to the legacy CE, we will observe that the estimate has changed.

Accordingly, if we apply the formula used by the legacy CE, our results will also match:
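Again as a back-of-the-envelope calculation, under the independence assumption:

E = R × S1 × S2 = 535 983 × 0.00073 × 0.0179 ≈ 390 × 0.0179 ≈ 7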

In this case, the new CE did a better job estimating the workload, nevertheless, it doesn’t mean that the legacy CE is ready for retirement. In fact, in many cases the tables will turn and grandpa CE will do a better job. But, generally speaking, you won’t solve that many problems just by changing the CE version, despite the fact that it can have an observable impact, depending on the query specificities.

Conclusion

We've briefly looked into the phases of query execution and stated that the optimizer chooses the lowest-cost plan, based on an estimated cost that largely depends on the cardinality estimations. This implies that the quality of the plan choice is highly correlated with the quality of the statistics used by the optimizer. In a future post we'll see in practice the negative impacts of bad estimates and what can be done to avoid some tears.

Until then, stay safe 🖖

Performance Woes & Wows #1: Implicit Conversions

Implicit conversion… Sounds very clever, so it can’t be that bad, right? Well, in fact, implicit conversions can be huge performance killers and silent enough to stay off the radar, if we are not aware. The good news is that they tend to be very easy to solve, so let’s make an easy buck.

The story begins with mismatched data types on a JOIN operator or a WHERE clause that force SQL Server to implicitly convert values, on the fly. This can be as subtle as mistaking a VARCHAR for an NVARCHAR, although the consequences can be pretty awful. But, before diving into the problem itself, let’s talk about conversions and data type precedence.

Implicit vs Explicit Conversion

Data types can be either implicitly or explicitly converted.

Implicit conversions are nearly invisible to the user because they are automatically executed, i.e. they do not require the explicit use of any function.

Conversely, explicit conversions require the CAST or CONVERT functions. The recommendation is to use the CAST function if ISO compliance is desirable and the CONVERT function in order to take advantage of its style functionality.
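As a quick, self-contained example (not tied to the demo database), both statements below convert the current date to a string, but only CONVERT accepts a style code:

SELECT CAST(GETDATE() AS varchar(30));        -- ISO-compliant syntax, no formatting style available
SELECT CONVERT(varchar(10), GETDATE(), 103);  -- style 103 returns the date as dd/mm/yyyy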

Nonetheless, it’s not possible to convert every possible data type combination. The following matrix (provided by Microsoft here) shows all data type conversions for SQL Server system-supplied data types.

It’s important to notice that when an explicit conversion is done, the resulting data type is determined by the statement itself.

Implicit conversions in assignment statements result in the data type that was defined by the variable declaration or column definition. For comparison operators or other expressions, the resulting data type depends on the rule of data type precedence.

Data Type Precedence

This rule simply states that, when an operator combines expressions of different data types, the data type with the lower precedence is converted to the data type with the higher precedence (a short example follows the list). The order is defined as follows:

1. user-defined data types (highest)
2. sql_variant
3. XML
4. datetimeoffset
5. datetime2
6. datetime
7. smalldatetime
8. date
9. time
10. float
11. real
12. decimal
13. money
14. smallmoney
15. bigint
16. int
17. smallint
18. tinyint
19. bit
20. ntext
21. text
22. image
23. timestamp
24. uniqueidentifier
25. nvarchar
26. nchar
27. varchar
28. char
29. varbinary
30. binary (lowest)
Data type precedence (Transact-SQL)
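To make the rule a bit more tangible, here's a tiny hypothetical example: int has a higher precedence than varchar, so in the comparison below the string literal is converted to a number, not the other way around.

-- '9' (varchar) is implicitly converted to int, so this compares 10 > 9 and returns 'compared as numbers'
SELECT CASE WHEN 10 > '9' THEN 'compared as numbers' ELSE 'compared as strings' END;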

How to Uncover Implicit Conversions

There are a couple of ways to uncover implicit conversions. The most usual is probably by analyzing the execution plan. If we're inspecting the execution of a query where an implicit conversion that may affect the cardinality estimate happened, a warning will be displayed. We'll look further into this.

It's also possible to capture these occurrences through the use of DMVs. I've seen some examples using the sys.dm_exec_query_plan DMV to accomplish it, but we're going to use the brand new and awesome sys.dm_exec_query_plan_stats DMV brought by SQL Server 2019. We'll take a deeper look into it in a future post, but for now just keep in mind that this DMV will show us the last actual plan for a given query. Also, make sure to activate the scoped configuration as specified below:

ALTER DATABASE SCOPED CONFIGURATION SET LAST_QUERY_PLAN_STATS = ON;

--ImplicitConversions
DECLARE @DBName VARCHAR(20) = 'FStats';

SELECT TOP 30
	t.text AS 'Query',
	qs.total_worker_time/qs.execution_count AS 'Worker Time (Avg)', 
	qs.total_elapsed_time/qs.execution_count AS 'Elapsed Time (Avg)',
	qs.total_logical_reads/qs.execution_count AS 'Logical Reads (Avg)',
	qs.total_physical_reads/qs.execution_count AS 'Physical Reads (Avg)',
	qs.max_elapsed_time AS 'Max Elapsed Time',
	qs.max_logical_reads AS 'Max Logical Reads', 
	qs.max_physical_reads AS 'Max Physical Reads', 
	qs.execution_count AS 'Execution Count',
	qp.query_plan AS 'Query Plan'
FROM 
	sys.dm_exec_query_stats AS qs WITH (NOLOCK)
CROSS APPLY 
	sys.dm_exec_sql_text(plan_handle) AS t 
CROSS APPLY 
	sys.dm_exec_query_plan_stats(plan_handle) as qp
WHERE 
	t.text NOT LIKE '--ImplicitConversions%' AND
	CAST(query_plan AS NVARCHAR(MAX)) LIKE ('%PlanAffectingConvert%') AND
	t.[dbid] = DB_ID(@DBName)

Finally, another, more comprehensive approach might be setting up an Extended Event to collect implicit conversions happening in the database. Creating a plan_affecting_convert event will allow us to uncover these CONVERT_IMPLICIT warnings. For the purposes of this demo it will be configured as follows:

CREATE EVENT SESSION [xe_ImplicitConversions] ON SERVER 
	ADD EVENT sqlserver.plan_affecting_convert
	(
    	     ACTION(sqlserver.transaction_id)
    	     WHERE ([sqlserver].[database_name]=N'FStats')
        )
WITH (STARTUP_STATE=ON)

Implicit Conversions Woes

But why should we care about implicit conversions? Do they impact performance that much? Let's find out! The following query retrieves the disciplinary sheet of football matches where 4 or more players were sent off.

DECLARE @NumRedCards tinyint = 4;
SELECT  
	MatchDetails, 
	HF HomeFouls, AF AwayFouls, 
	HY HomeYellowCards, AY AwayYellowCards, 
	HR HomeRedCards, AR AwayRedCards, TRC TotalRedCards
FROM 
	MatchesDetails 
WHERE 
	TRC >= @NumRedCards

Everything looks fine, right? Despite that, if you sense evil lurking, you’re absolutely correct. Using the previous xe_ImplicitConversions session, we’ll be able to see this:

Accordingly, the set of DMVs specified above will reveal conversion problems that may affect the cardinality estimate in the query plan choice.

If we click on the query plan and then hover on both operators, the following information will pop up:

Besides the obvious warnings, it’s clear that a nasty Clustered Index Scan was done and it’s quite strange that SQL Server didn’t cry for an index, isn’t it? Let’s create the bloody thing and ignore the implicit conversion, for now.

CREATE NONCLUSTERED INDEX IDX_MatchesDetails_TRC_Includes ON MatchesDetails(TRC)
INCLUDE(MatchDetails,HF,AF,HY,AY,AR,HR)
WITH(DATA_COMPRESSION=PAGE)

Noice! Now, with an unstoppable confidence, we will rerun the query and the plan will unequivocally show us that we’ve…

Failed. Miserably. An ugly Clustered Index Scan is still happening and the logical reads remain at 161 516.

Ok, so let’s change the approach here. First by dropping the index:

DROP INDEX IDX_MatchesDetails_TRC_Includes ON MatchesDetails

Then by analyzing the warning presented in the execution plan. According to it, there’s a data type mismatch between TRC and @NumRedCards. If we look into the definition of MatchesDetails table, we’ll confirm that TRC is, in fact, a varchar – genius.

Solving the Case

Elementary, my dear Watson.

Although the correct path here would be to alter the column data type, for simplicity’s sake we’ll just change the data type of @NumRedCards to varchar(2) and rerun the query.

DECLARE @NumRedCards varchar(2) = 4;
SELECT  
	MatchDetails, 
	HF HomeFouls, AF AwayFouls, 
	HY HomeYellowCards, AY AwayYellowCards, 
	HR HomeRedCards, AR AwayRedCards, TRC TotalRedCards
FROM 
	MatchesDetails 
WHERE 
	 TRC >= @NumRedCards

This time, although the Clustered Index Scan hasn’t vanished, we got a missing index hint that corresponds exactly to the one created earlier. After recreating it, and rerunning the last query, the result will be the following:

Success! Just like a rainy cloud, the warning has disappeared and the day is sunny again. We’re now efficiently using the index with an Index Seek and the logical reads dropped to a glorious 4.

Cry Me a River

There isn't much left to say, in this context, regarding how implicit conversions can mess up cardinality estimates and consequently ruin SARGability and index usage, and burn CPU. Despite that, it's relevant to mention that implicit conversions aren't always a problem, even if SQL Server says so. One example is when the conversion occurs in the SELECT clause, after the rows are fetched. SQL Server will have an embarrassing outbreak, threatening you with the same old warnings, stating that the cardinality estimate can be affected, but it's all a lie, it certainly won't. Nevertheless, stay alert for the real performance killers! 🖖

MemoryGrantInfo Info

As suggested by the title, I like redundancy, bad titles and we’re going to have a brief look into MemoryGrantInfo.

In a previous post, we’ve talked about an awesome feature from the new SQL Server 2019 – Row Mode Adaptive Memory Grant Feedback. We saw this feature in action and briefly explored MemoryGrantInfo, but some fields were not explained. So, let’s recall the MemoryGrantInfo section and analyze its properties:

DesiredMemory: Amount of memory (in kilobytes) estimated to be needed to run the execution plan, based on the cardinality and data size estimation. This value will be cached as part of the plan and will remain the same – if Adaptive Memory Grant Feedback is not enabled.

GrantedMemory: Amount of memory that was actually granted at runtime. This value can be lower than DesiredMemory if the server doesn't have enough available memory at the moment of the grant. Due to this, its value can differ across similar executions, depending on the server's available workspace memory at the time. When the grant isn't high enough to keep all data in memory, the query will write data to tempdb – this is also known as a spill and it can degrade query performance. This doesn't mean that a lower grant will always cause a spill, nor that a spill will always have a substantial impact on performance; nevertheless, that's definitely something to keep an eye on.

GrantWaitTime: Time (in seconds) that the query had to wait before it was granted the memory to execute. This will translate into RESOURCE_SEMAPHORE waits.

IsMemoryGrantFeedbackAdjusted: Current state of Adaptive Memory Grant Feedback, that will be represented by one of the following values: NoFirstExecution, YesAdjusting, YesStable, NoAccurateGrant, NoFeedbackDisabled. For more details, check this post about Adaptive Memory Grant Feedback.

LastRequestedMemory: Indicates RequestedMemory value from the previous execution. While trying to calculate the ideal memory grant, Adaptive Memory Grant Feedback will use the requested value from an execution in order to improve the following. This property will allow us to easily check how the grant is being adapted.

MaxQueryMemory: Maximum memory allowed for a single query, according to Resource Governor’s MAX_MEMORY_PERCENT configuration, which by default is 25% of total query memory. In a case where operators are spilling data into tempdb and estimates are fairly correct, we might be experiencing memory starvation.

MaxUsedMemory: Amount of memory actually used by the query during execution.

RequestedMemory: Amount of memory requested, based on RequiredMemory, DesiredMemory, SerialRequiredMemory, SerialDesiredMemory, and MaxQueryMemory. When a query is executed, the SQL Server Engine checks whether the desired memory exceeds MaxQueryMemory; if it does, the requested memory is reduced by adjusting parameters such as the Degree Of Parallelism (DOP), until it fits.

RequiredMemory: This property represents the memory required to create the respective internal data structures, for a given DOP, when the query runs in parallel. If the query is running in serial mode, the RequiredMemory value will be the same as SerialRequiredMemory. It's mandatory that this amount of memory is available in order for the query to begin execution.

SerialDesiredMemory: Amount of memory estimated to be needed to run the execution plan on serial mode, without spilling to disk.

SerialRequiredMemory: Required memory for a serial query plan to execute. It's mandatory that this amount of memory is available in order for the query to begin execution.
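For completeness, most of these values can also be observed for currently executing queries through the sys.dm_exec_query_memory_grants DMV; a minimal sketch:

SELECT 
	session_id,
	dop,
	requested_memory_kb,
	granted_memory_kb,
	required_memory_kb,
	ideal_memory_kb,
	max_used_memory_kb,
	grant_time
FROM 
	sys.dm_exec_query_memory_grants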

And that’s it.

Rowstore Adaptive Joins

In this day and age, join operations in SQL Server can be performed using one of the following algorithm types: Nested Loops, Merge or Hash. In theory, each one provides better performance under different conditions (e.g. the amount of data involved).

  • Nested Loops: One of the tables is designated as the outer table and the other as the inner table. Each row of the outer table is validated against all rows of the inner table and, if a match is found, the row is included in the result. Nested Loops is typically preferred when tables are small. There are three variants of a Nested Loops join:
    • Naive: Scans an entire table or index;
    • Index: Uses an existing index to seek data;
    • Temporary: Index is built as part of the query plan (and destroyed upon completion of the query);
  • Hash Match: Hash Match consumes two inputs, corresponding to the joined tables. A hash table is created using the first input (also known as the Build input), then the same is done for the second input (the Probe input). The process of creating a hash table starts with hashing the joined key values of the Build table and placing them in the appropriate bucket, depending on the hash value. Afterwards, the Probe input is processed, the same hash function is applied, the target bucket is determined and finally the value is compared to the ones already inside. If there is a match, and after SQL Server verifies that no hash collision happened, the row is returned. The Build input should always be the smaller one, because it will be stored in memory.
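If you want to see these two algorithms side by side, one simple way (sketched against the demo tables used later in this post, so treat the column names as an assumption) is to force each one with a join hint and compare the resulting plans. Note that a join hint like this also tends to fix the join order, so use it for exploration rather than as a fix.

-- Force Nested Loops
SELECT m.Id
FROM Matches m
INNER LOOP JOIN MatchesDetails d ON m.Id = d.MatchDetails;

-- Force Hash Match
SELECT m.Id
FROM Matches m
INNER HASH JOIN MatchesDetails d ON m.Id = d.MatchDetails;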

And that's it for the algorithms. No, I did not forget about Merge Join; it happens that Merge was just ignored (like Internet Explorer or the Terms of Agreement). Adaptive Joins allow SQL Server to choose between Nested Loops and Hash joins within the same execution plan, deferring this choice until runtime, so it can understand which one will be better, based on the query input.

So, if until now (at least in rowstore) there wasn't a way to adapt to different workloads within the same execution plan, and specific join algorithms thrive under different conditions, how can Adaptive Joins, in theory, save the day?

The Horrors of Parameter Sniffing

Parameter sniffing is easily one of the most common problems that almost anyone who works with databases has to deal with. Well, to be fair, parameter sniffing is not a problem by itself. The problem is giving the power of choice to users. Choice is the root of all evil.

A stored procedure gets executed for the first time, the plan is created based on its input and then it is cached. If no recompile is done, the plan will remain the same and future executions will reuse it, regardless of the input. In theory that's good, because bad things happen when there isn't enough memory available – and creating new plans from scratch is quite expensive. The problem begins when, in a subsequent execution, the user changes the input in a way that the query returns a million rows, instead of ten. As mentioned in the previous section, each join type thrives under distinct conditions. If in the first execution a small amount of data was scanned, SQL Server has potentially used Nested Loops and done something like an Index Seek plus a couple of Key Lookups. In the later execution, when SQL Server tries to process a million rows using the same approach, it suddenly has a heart attack. Telephones start to ring and clients start to yell as its soul quickly ascends to heaven.

We can try defibrillating by using RECOMPILE, but if that query runs too often, CPU will quickly escalate to 100% and we lose the patient, again. We can also try to optimize the query for a particular case by specifying a join hint or using the OPTIMIZE FOR hint, but even if we get a plan that fits the majority of executions appropriately, the deviations will potentially cause SQL Server to die in agony. Dividing the stored procedure into multiple stored procedures according to the input can do the trick, but in some cases queries can be too complex, not to mention that business logic and magic numbers hardcoded in queries are generally not a good idea. So, what's left besides a bunch of bad ideas?
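Just to make those band-aids concrete, here's a hedged sketch of what the RECOMPILE and OPTIMIZE FOR approaches look like (the tables come from the demo later in this post; the parameter values are arbitrary):

DECLARE @TeamId int = 1;

-- Recompile on every execution: accurate plans, but burns CPU if the query runs often
SELECT m.Id
FROM Matches m
JOIN MatchesDetails d ON m.Id = d.MatchDetails
WHERE m.HomeTeam = @TeamId
OPTION (RECOMPILE);

-- Optimize for a "typical" value: outliers will still get a plan shaped for that value
SELECT m.Id
FROM Matches m
JOIN MatchesDetails d ON m.Id = d.MatchDetails
WHERE m.HomeTeam = @TeamId
OPTION (OPTIMIZE FOR (@TeamId = 2));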

Perhaps Adaptive Joins are the hero we deserve. Basically, SQL Server defines a threshold (related to the number of rows) that represents the tipping point from which it becomes more efficient to use Nested Loops instead of Hash Match. An Adaptive Join starts execution as a Hash Join; then, after processing the input from the Build phase, it compares the resulting number of rows to this threshold. If the number of rows is greater than or equal to the threshold, it proceeds with the Hash Join, otherwise the algorithm type is changed to Nested Loops.

Therefore, SQL Server becomes able to adapt a single plan to different workloads. If a large number of rows are being processed, it can use Hash Join, otherwise, if few rows are involved, Nested Loop Join will be chosen. No recompile. No hints. No Columnstore wizardry needed.

Now, let's check if Adaptive Joins live up to the expectations.

Improvise. Adapt. Overcome.

For the purposes of this demo, I’m using a sample of FootballData public stats. Additionally, I’ve added some data in order to facilitate the demonstration of some concepts and behaviors. Bear in mind that in Standard Edition you won’t be able to use this feature.

We'll first set the database compatibility level to 140 and execute a stored procedure with two different parameters, so we can understand the behavior prior to SQL Server 2019.

ALTER DATABASE [FStats] SET COMPATIBILITY_LEVEL = 140
GO
SET STATISTICS IO ON
GO
CREATE OR ALTER PROCEDURE GetHomeVictoriesByTeamId      
	@TeamId INT      
AS          
BEGIN          
SELECT  
	m.Id,  
	m.FTHG HomeGoals,  
	m.FTAG AwayGoals,    
	d.Att Attendance,  
	d.Stadium,  
	d.Att  
FROM           
	Matches m          
JOIN           
	MatchesDetails d ON m.id = d.MatchDetails       
WHERE      
	m.HomeTeam = @TeamId AND 
	FTR = 'H' AND 
	d.Att > 20000           
ORDER BY       
	d.att DESC        
END 
GO
EXEC GetHomeVictoriesByTeamId 2
EXEC GetHomeVictoriesByTeamId 1

Ouch! Both execution plans used Nested Loops and Index Seek to retrieve results. The plan was created and cached in the first execution that retrieved 18 rows. Then SQL Server reused the plan to process 500304 victories from Benfica (Id = 1) and things got nasty. We can confirm in Messages that the second plan was not efficient.

By executing the same statement again with a recompile, we can see how a more appropriate plan differs. MatchesDetails' scan count dropped from 500304 to 5 and logical reads from 1597472 to 1539. That's more like it! But, as we discussed, using RECOMPILE is far from always appropriate, since it is a heavy resource consumer.

EXEC GetHomeVictoriesByTeamId 1 WITH RECOMPILE;

Next, we'll set the database compatibility level up to 150 and run the stored procedure again, so we can see Adaptive Joins in action.

ALTER DATABASE [FStats] SET COMPATIBILITY_LEVEL = 150
GO
EXEC GetHomeVictoriesByTeamId 1;
EXEC GetHomeVictoriesByTeamId 2;
EXEC GetHomeVictoriesByTeamId 1;

By looking into the first execution plan, we’ll notice the “new” Adaptive Join operator (this looks intuitive), so it looks like we’re ready to adapt and overcome. There are a few things going on in this plan, so let’s analyze it step by step.

The properties of this operator will show us some interesting information, e.g. that the threshold for the algorithm to change to Nested Loops is 9607.83. We also can see that 500304 rows were returned, so the chosen join type was “HashMatch”. This is the expected behavior as the number of rows is higher than the threshold, hence not changing to Nested Loops.

If we recall the execution plan, it’s easy to corroborate this information. Since the number of rows exceeded the threshold and a Hash join was done, the execution went to the Index Scan branch. This can also be confirmed by the displayed number of rows that flowed through this operator, in this case 500304. Correspondingly, the lower Index Seek branch, had no rows.

The second plan, corresponding to an input that retrieves far less rows, shows a different execution branch. In the Adaptive Join operator info, it’s possible to notice that the chosen join algorithm type (NestedLoops) is appropriate to the number of rows – 18. It’s also visible that these rows were retrieved by an Index Seek instead of an Index Scan. So far, so good.

The third and last plan, with the same input as the first, will be exactly the same, right? Well… Sadly, no. Although the operators are the same and the execution went through the same branch, it looks like we landed in spillage village. Both the Sort and the Adaptive Join have warning signs on them, stating that data was spilled to tempdb, and we can see that the query took longer to run. Despite that, the IO statistics keep the same values as the first execution. Interestingly, this behavior is caused by another adaptive query processing feature that was explored in a previous post – Adaptive Memory Grant Feedback.

If you are lucky enough, after a couple of executions, SQL Server will be able to adapt the memory grant to its ideal value and, just like clouds, when these warnings disappear, it will be a beautiful day. Unfortunately, sometimes SQL Server isn't able to do that and gives up on adapting the grant (for more details, check the mentioned post). Let's execute the stored procedure a couple more times and check the outcome.

EXEC GetHomeVictoriesByTeamId 1
GO 30

Under these circumstances, SQL Server wasn't able to adapt the memory grant and we didn't get the stable ending we were looking for. Nevertheless, things could be worse: the initial memory grant is fairly adequate, so no spills this time. Furthermore, the chosen join algorithm is appropriate, and the IO statistics and execution time are pretty similar too.

In conclusion, looks like Adaptive Joins can really improve the performance of our servers, with virtually no effort needed. Nonetheless, it’s probably not a good idea to go all-in and change your production server compatibility level to 150, hoping that the robots will do what they were programmed to do. You must have seen enough movies to know that it’s not how it ends. But seriously, Adaptive Joins won’t fix all problems related to parameter sniffing and can actually cause problems that aren’t easy to troubleshoot. So, it’s reasonable to proceed cautiously and make use of tools like Query Store to fix regressions.

Row Mode Adaptive Memory Grant Feedback

Memory Grants

Throughout the phases of query processing there are several memory consumers that the memory broker and the memory manager inside SQL Server try to handle efficiently. Among those, compile, cache and memory grant are the most significant.

Compile requires memory to build and search for a proper plan out of all candidates, whereas cache represents the memory consumed to store compiled plans for reuse. Lastly and most importantly, the memory grant represents a segment of server memory used to store temporary data while sorting and joining rows. The grant is reserved prior to query execution and lasts for the query's lifetime; its purpose is to improve query reliability under server load. The rationale is that a query with reserved memory is not only less likely to run out of memory while executing, but also less likely to take over the entire server memory.

Grants and Estimates

When SQL Server creates a compiled plan it calculates two parameters that will define the amount of requested memory:

  • Required memory: The memory required to create the internal data structures that will handle sorts and hash joins. It is mandatory that this amount is available in order for the query to be executed. The wait type that accumulates when this memory isn't available is RESOURCE_SEMAPHORE.
  • Additional memory: It's used to store temporary rows in memory. This value depends intrinsically on the cardinality estimate of the expected number of rows and their average size. It's not required that this amount of memory is available (the data can be stored on disk instead), although, depending on the quantity of data, this can be as good as shooting yourself in the foot.

After deciding whether a memory grant is necessary, and after calculating the DOP and the memory limit for a single query, these parameters are used to calculate the ideal query memory, i.e. the amount of memory the query needs to run properly, based on its cardinality estimate. Finally, SQL Server verifies whether the ideal memory exceeds the per-query limit (reducing it if needed) and asks Resource Semaphore to grant the requested memory.

Now, if memory grants are deeply related to cardinality estimate, what happens if the numbers we get are not accurate, for example due to outdated statistics? The answer is… Pain. But as some pains are worse than others, we also might be more or less miserable depending on whether SQL Server overestimates or underestimates the work it thinks it has to do. 

When SQL Server overestimates the work to be done in a particular query it will grant more memory than necessary. Although one might think that like money, there is no such thing as too much memory, that will come at a cost. This means we’re stealing memory from a limited source that feeds cache, buffer pool and other queries. In extreme cases this might cause queries to lack required memory, hence not executing. And to make things worse, until now, in row mode, this behavior would just keep repeating itself, execution after execution, until, potentially, the server dies in agony. 

On the other hand, if SQL Server underestimates the workload, it may not grant enough memory to keep things out of tempdb. If it was expecting a few rows and ended up having to deal with millions, it will spill data to disk in order to handle the mess and this may have a significant impact on performance. As in overestimates, SQL Server would not learn from its mistakes and data would just continuously be spilled to disk in each execution.

But… Could inadequate memory grants be the root of all evil? Probably not, but if wait types like IO_COMPLETION or RESOURCE_SEMAPHORE start to accumulate that might just be the case.
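A quick way to check whether those waits are piling up is a glance at sys.dm_os_wait_stats:

SELECT 
	wait_type,
	waiting_tasks_count,
	wait_time_ms,
	max_wait_time_ms
FROM 
	sys.dm_os_wait_stats
WHERE 
	wait_type IN ('RESOURCE_SEMAPHORE', 'IO_COMPLETION')
ORDER BY 
	wait_time_ms DESC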

Doctor, My Memory Hurts

In such circumstances there are tools that can be used to diagnose and troubleshoot these issues. As SQL Server editions went by, the number of available tools has also risen.

In SQL Server 2012, Extended Events were added to help diagnose these cases: sort_warning and hash_spill_details to improve tempdb spill diagnostics, and query_memory_grant_usage to improve memory grant diagnostics. Also, two query memory grant options were made available (MIN_GRANT_PERCENT and MAX_GRANT_PERCENT) that allow specifying the minimum and maximum grant percentage (based on the memory grant specified in the Resource Governor configuration) on a per-query basis.
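For reference, these options are applied as query hints; a minimal, hypothetical example capping the grant of a single query would look like this:

SELECT m.Id, d.Att
FROM Matches m
JOIN MatchesDetails d ON m.Id = d.MatchDetails
ORDER BY d.Att DESC
OPTION (MIN_GRANT_PERCENT = 5, MAX_GRANT_PERCENT = 25);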

Later, SQL Server 2016 introduced the almighty Query Store and improved a couple of DMVs and Extended Events.

Lastly, in SQL Server 2017, the feature most closely related to the one we're about to see was released: Batch Mode Adaptive Memory Grant Feedback. In fact, it is basically the same feature applied to a different context. If you want to know more about Adaptive Memory Grant Feedback in Batch Mode, check Niko's explanation here.

So, finally we're in 2019, when SQL Server brought this feature to Row Mode. Let's take a walk!

Row Mode Adaptive Memory Grant Feedback

For the purposes of this demo, I’m using a sample of FootballData public stats that I stored in a very simple and small database (no Contoso or AdventureWorks for you today). Also, keep in mind that this is an Enterprise Edition feature.

We'll start with a lower compatibility level to confirm the behavior prior to this feature.

ALTER DATABASE [FStats] SET COMPATIBILITY_LEVEL = 140

Next, we'll create a stored procedure that intentionally causes an evil memory grant.

CREATE OR ALTER PROCEDURE GetHomeVictoriesByTeam  
@Team nvarchar(128)    
AS    
BEGIN    
 DECLARE @TeamID int = (SELECT Id FROM Teams WHERE Name = @Team)    
 SELECT     
    l.Country,l.Name, m.Date,   
    h.Name AS HomeTeam, a.Name as AwayTeam,   
    m.HTHG,m.HTAG,m.HTR,m.FTHG,m.FTAG,m.FTR,    
    d.Att, d.Ref, d.HS, d.[AS], d.HST, d.AST,   
    d.HHW, d.AHW, d.HC, d.AC, d.HF, d.AF,  
    d.HFKC, d.AFKC, d.HO, d.AO, d.HY,  
    d.AY, d.HR, d.AR, d.Preview    
 FROM     
    Matches m (NOLOCK)    
 INNER HASH JOIN     
    MatchesDetails d (NOLOCK) ON m.id = d.MatchDetails    
 INNER JOIN     
    Teams h (NOLOCK)  ON m.HomeTeam = h.id     
 INNER JOIN     
    Teams a (NOLOCK)   ON m.AwayTeam = a.Id    
 INNER JOIN     
    Stadiums s (NOLOCK)   ON s.Team = h.Id    
 INNER JOIN     
  Leagues l (NOLOCK)   ON m.League = l.Id    
 WHERE    
    m.hometeam = @TeamID AND m.FTR = 'H'    
 ORDER BY    
    l.Name, m.Date, m.FTHG, m.FTAG,    
    m.HTHG, m.HTAG, m.HTR, m.FTR, d.Preview    
END 

And execute it as follows.

USE FStats
EXEC GetHomeVictoriesByTeam 'Benfica'

When we open the query execution plan, a warning is visible, indicating an underestimate that resulted in a spill to disk. In order to view the MemoryGrantInfo, right-click on the SELECT operator, then choose Properties.

If we keep executing this statement, we’ll watch SQL Server suffering from anterograde amnesia, just like Dory. The value of DesiredMemory will not change and the query will spill to disk, execution after execution. Now, let’s set the compatibility level to 150 and see if things change.

ALTER DATABASE [FStats] SET COMPATIBILITY_LEVEL = 150
GO
EXEC GetHomeVictoriesByTeam 'Benfica'

At first glance everything looks similar, we get that same warning in the query execution plan, but in the MemoryGrantInfo a new field named IsMemoryGrantFeedbackAdjusted is now visible with the value “NoFirstExecution”. That looks promising…

And in fact, when we execute it again, black magic starts to happen. This time SQL Server remembers that the desired memory wasn’t enough to sort the huge amount of Benfica’s home victories (incomprehensible mistake though) and requests more memory. This avoids spilling out to tempdb, as we can verify by the absence of the previous warning. Nevertheless, SQL Server hasn’t yet found the ideal memory grant size and is still trying to get it, as we can see by the value of IsMemoryGrantFeedbackAdjusted – “YesAdjusting“.

After a couple executions, SQL Server is finally able to achieve the ideal memory grant as we can confirm by the value of IsMemoryGrantFeedbackAdjusted – “YesStable“. Awesome! From now on, executing this query is all about stability and happiness. Or is it?!

The premise here is that SQL Server will try to use the result from a previous execution in order to adjust the memory grant in the following one. In the previous example that was rather simple, because we've used the same parameter across executions. Conversely, the real world, full of evil users, tends to be more dynamic. So, what will happen if we change parameters between executions? To find out, we'll alternate the parameters of two executions and create an Extended Event to catch those spills. As Katy Perry said, "don't be afraid to catch spills, ride Extended Events and chase thrills".

CREATE EVENT SESSION DontBeAfraidToCatchSpills ON SERVER 
ADD EVENT sqlserver.sort_warning(
    ACTION(sqlos.task_time,sqlserver.database_name,sqlserver.plan_handle,sqlserver.sql_text))
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE=ON)
GO
--GOTO: Management > Extended Events > Sessions > DontBeAfraidToCatchSpills > Start Session
--GOTO: Management > Extended Events > Sessions > DontBeAfraidToCatchSpills > Watch Live Data

EXEC GetHomeVictoriesByTeam 'Benfica' 
EXEC GetHomeVictoriesByTeam 'Villareal'
GO 16

We've created an Extended Event using sqlserver.sort_warning because we know that, in this case, only the sort operation will spill to disk. Also note that we're executing this block 16 times, so 32 executions of the stored procedure are being done. The MemoryGrantInfo in the last execution plan will show us that SQL Server is still trying to adjust the memory grant.

Accordingly, the results of the Extended Event show queries spilling all over the place, as we can see by the various threads being displayed. When there isn't enough memory in SQL Server to execute a sort operation within a query, the sort table is written to disk and divided into multiple "passes". Each pass represents an overhead and increases the time required to complete the operation. The sort_warning_type indicates whether the sort was completed in a single pass or consumed multiple passes. The results show that, in these circumstances, the sort operation finished within one pass, i.e. it was divided into 2 steps.

After 32 executions SQL Server still hasn't figured out the ideal memory grant. Perhaps if we try again it'll succeed. After all, what could possibly go wrong?

EXEC GetHomeVictoriesByTeam 'Benfica' 

Well… Actually, things got worse. It turns out that SQL Server will try to adjust the DesiredMemory until the 32nd execution; after that, it just gives up on us. Note that the value of IsMemoryGrantFeedbackAdjusted changed to NoFeedbackDisabled. This means that, from now on, SQL Server will use the memory grant of the first execution, which in our case is far from good.

In conclusion, Row Mode Adaptive Memory Grant Feedback is quite awesome, particularly in scenarios where the query inputs are expected to have little or no variation. Otherwise, things can get unpleasant, as we saw, so it's wise to be cautious.

posts.First();

The first New Year’s resolution is officially completed: Create a blog to share contents and experiences on my current main area of interest – SQL Server.

I have this intention for a while now, I follow blogs from some of the most notorious and recognized professionals on the community and I love to read their posts. Interestingly, they were the reason why I didn’t create this blog sooner. I thought that it wouldn’t make much sense to post about something that was already covered by a bajillion people. I thought that it wouldn’t create any value, but I was seeing things from the wrong perspective. Revisiting problems and structuring them in a way that can be easily understood will definitely force me to take a deeper dive into concepts, so that’s value right there, at least for me. But despite my selfish intentions I really hope that it can be valuable to someone else too 😊