In order for you to have a happy life as someone responsible for a database, you need to ensure that it follows a strict diet consisting of well-designed queries and indexes, plenty of CPU, memory and disk capacity. Neglecting any of these aspects will compromise the entire system and consequently your happiness.
Regarding improving CPU usage, we can roll up our sleeves and start tuning our server, our queries and indexes, so they perform better and use less CPU. This is the preferable approach, because the alternative of adding processing power to SQL Server can be quite expensive. Nonetheless, it’s possible to reach a point where your CPUs are simply not enough to support your workload and tuning queries will feel like rearranging chairs on a sinking Titanic.
In theory there are two options when increasing CPU capacity:
- Adding more CPUs, hence increasing the thread pool and improving parallel queries.
- Get faster CPUs, which will improve serial query speed and thread consumption
In practice, the second option is becoming gradually less common because most people are in the cloud and using virtual machines, which typically imply less flexibility.
But, in order to confirm that the CPU is really the bottleneck, we need to have a closer look. Wait stats are a great starting point and in this specific matter, wait types like SOS_SCHEDULER_YIELD, CXPACKET, or THREADPOOL can give us pretty useful hints of what’s happening. It’s important to stress that waits are not intrinsically bad, they are an indivisible part of SQL Server functioning and we must be able to separate evil waits from benign waits.
In this post, we’re going to overview SOS_SCHEDULER_YIELD wait type. However, to have a clearer understanding about the underlying information, we must grasp some basic concepts, namely schedulers and the query lifecycle.
Schedulers and Query Lifecycle
A scheduler is the logical equivalent of a CPU, where worker threads are scheduled to actively carry out the tasks assigned to them. We can view scheduler information using the DMV sys.dm_os_schedulers. Inernally, schedulers have three components:
- Processor: Phisycal or logical CPU core that is responsible for processing the thread. Each core will process only a thread at a time.
- Waiter List: List that contains all threads that are waiting for a resource to became available. The threads in this list are in Suspended state (we’ll talk more about these states shortly). We can view information about these resource waits in the DMV sys.dm_os_waiting_tasks.
- Runnable Queue: Encompasses all threads that are waiting for processor time (Runnable state). When priorities are not assigned to workload groups the processing order on this queue is First-In-First-Out. (more info on sys.dm_exec_requests).
Accordingly, these components are directly related to the states encompassed by the query lifecycle:
- Running: Query is being processed on a CPU.
- Runnable: Query is ready to execute, but it’s waiting for CPU time.
- Suspended: Query is waiting on a particular resource to become available.
During one execution, a query can go through these states multiple times. This behavior is enforced by the cooperative (or non-preemptive) scheduling used by SQL OS. Basically, only one thread will be running on the scheduler at a given moment and each one has a quantum of 4 milliseconds, which represents the maximum consecutive CPU time for *any* of them. After these 4 milliseconds, the thread voluntarily yields the CPU and moves back into the Runnable state. Even if there aren’t other threads waiting, nor the necessity for the thread to wait for any resource (and consequently transitioning into Suspended state), the quantum is not exceeded.
It’s important to mention that preemptive function calls work differently, they are selfish and don’t like to cooperate. They are nothing more than calls to functions outside the realm of SQL OS, like Win32 API functions, where the waiting logic rules no longer apply. Under these circumstances, and before the external function is invoked, a sibling worker is signaled and given ownership of the scheduler, while the current thread continues to run. Then, when the thread goes preemptive, its quantum “ends” and no additional accounting is done for CPU usage until it reverts to non-preemptive scheduling. When the preemptive work is done and the thread transitions into cooperative scheduling again, it declares the end of its preemptive wait and goes into the runnable queue. If you want to know more about preemptive scheduling check this awesome blog post from Ewald Cress.
This wait type occurs when a thread voluntarily yields the scheduler for other tasks to execute, waiting for its quantum to be renewed.
Since the thread doesn’t wait for any resource and it just voluntarily goes into the runnable queue, SOS_SCHEDULER_YIELD won’t imply resource wait time. This will create the signal wait time that will show how long the thread has waited in the runnable queue to get back in the running state again.
Probably not. As we saw, SOS_SCHEDULER_YIELD just indicates that threads are exhausting their quantum, hence generally being a benign wait type. Even if this wait type is prevalent, if signal wait time is low, that just means that there are multiple threads running on the scheduler and you should look somewhere else in order to find the root case of your problems.
Nonetheless, when signal wait time is high things can get interesting. Frequent SOS_SCHEDULER_YIELD waits with high signal wait time may suggest that a substantial number of CPU intensive queries are fighting for CPU and each ends up waiting a long time to be signaled. In this case it’s probably a good idea to look for the top CPU consuming queries and analyze their execution plans.
Contrarily, less frequent waits with high signal time indicates that there are not many threads fighting for CPU time. In such case, an external source may be causing CPU pressure, either from preemptive function calls or something completely external to SQL Server.