What better time to talk about isolation than now…
SQL Server implements six distinct isolation levels, distinguished fundamentally by how they handle the potential problems caused by concurrency. Each isolation level strikes a specific balance between integrity and performance, which materializes in whether certain concurrency phenomena can or cannot occur.
- Dirty Read: Occurs when a transaction reads data that was modified, but not yet committed, by another concurrent transaction.
- Non-Repeatable Read: Occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.
- Phantom Read: Occurs when, in the course of a transaction, new rows are added to (or removed from) the set of records being read by another transaction.
The ANSI/ISO SQL-92 standard defines the rules for each isolation level, specifying which concurrency phenomena may, or may not, be observed in each.
|Isolation Level|Dirty Read|Non-Repeatable Read|Phantom Read|
|---|---|---|---|
|Read Uncommitted|Yes|Yes|Yes|
|Read Committed (Locking Read)|No|Yes|Yes|
|Read Committed (Snapshot Isolation)|No|Yes|Yes|
|Repeatable Read|No|No|Yes|
|Snapshot|No|No|No|
|Serializable|No|No|No|
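In SQL Server, each of these levels is selected per session. A minimal sketch of the syntax:

```sql
-- Applies to subsequent transactions in the current session.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Other options: READ UNCOMMITTED, REPEATABLE READ, SNAPSHOT, SERIALIZABLE
```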
Serializable

Serializable is the most isolated of the available isolation levels. A serializable execution is defined as an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. This implies that increased concurrency is allowed, as long as the effects of the transactions still correspond to some possible order of serial execution. Conceptually, the serializability requirement is sufficient to exclude the dirty read, non-repeatable read and phantom read phenomena.
One fun fact regarding this isolation level: if a transaction T1 started counting a table with 600 rows and, before it finished, another transaction T2 added 66 rows, both 600 and 666 would be logically correct results for T1. In SQL Server's implementation of the Serializable isolation level, transactions do not necessarily get a point-in-time view as of the start of the transaction, but rather see the latest committed data as of the moment the data was first locked for access.
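The counting scenario above can be sketched with two sessions (dbo.Numbers is a hypothetical table):

```sql
-- Session 1: count the table under SERIALIZABLE
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM dbo.Numbers;  -- key-range locks are taken as rows are read

    -- Session 2 (concurrently): INSERT INTO dbo.Numbers VALUES (...);
    -- The insert blocks if its keys fall into a range Session 1 has already
    -- locked; otherwise it succeeds, and a recount below would include it.

    SELECT COUNT(*) FROM dbo.Numbers;
COMMIT TRANSACTION;
```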
Snapshot

A transaction under the Snapshot isolation level behaves as if it were executed against its own private copy of the committed state of the database, taken at the moment the transaction started.
By providing a point-in-time view of committed data, snapshot isolation protects against all of the above-mentioned concurrency phenomena. Dirty reads are avoided by reading only committed data, and the static nature of the snapshot prevents both non-repeatable reads and phantoms. However, a snapshot may not reflect the current state of the database, as snapshot transactions ignore any changes committed by other concurrent transactions after the snapshot transaction began.
Regarding data changes, conflicts are handled automatically. A snapshot write conflict occurs when a snapshot transaction T1 attempts to modify a row that has been modified and committed by another transaction T2 after T1 began. When this happens, T1 fails with an update conflict error and its changes are rolled back; in other words, the first committer wins.
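An update conflict can be sketched as follows (dbo.Accounts is a hypothetical table; the database must have ALLOW_SNAPSHOT_ISOLATION enabled):

```sql
-- Session 1 (T1)
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT Balance FROM dbo.Accounts WHERE Id = 1;

-- Session 2 (T2), meanwhile: modifies and commits the same row
UPDATE dbo.Accounts SET Balance = Balance + 10 WHERE Id = 1;

-- Session 1, continued: fails with error 3960 (update conflict)
-- and the snapshot transaction is rolled back
UPDATE dbo.Accounts SET Balance = Balance - 10 WHERE Id = 1;
```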
Snapshot isolation is vulnerable to write skew, which occurs when two concurrent transactions each read data that the other modifies. No write conflict is detected because each transaction modifies a distinct row, and neither sees the changes made by the other, since both are reading from a point in time before those changes were made.
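A classic write-skew sketch, using a hypothetical dbo.Doctors table and an application-level rule that at least one doctor must stay on call:

```sql
-- Both sessions start with two doctors on call.

-- Session 1 (T1)
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM dbo.Doctors WHERE OnCall = 1;  -- sees 2, so it's "safe"
    UPDATE dbo.Doctors SET OnCall = 0 WHERE Name = 'Alice';
COMMIT TRANSACTION;

-- Session 2 (T2), concurrently
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM dbo.Doctors WHERE OnCall = 1;  -- also sees 2
    UPDATE dbo.Doctors SET OnCall = 0 WHERE Name = 'Bob';
COMMIT TRANSACTION;

-- Different rows were modified, so no update conflict is raised,
-- yet the invariant is now broken: nobody is on call.
```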
Repeatable Read

The repeatable read isolation level guarantees that data will not change for the life of the transaction once it has been read for the first time but, ironically, does not actually guarantee that reads are repeatable.
The potential issues with Repeatable Read transactions are fundamentally associated with the possibility of phantom reads. For example, while a transaction T1 is counting the rows of a table, another transaction T2 may insert new rows into the range already processed by T1. The shared locks taken by T1 on previously read rows prevent those rows from being changed, but do not prevent new rows from being inserted into the range of values already processed. A similar scenario can occur if T2 updates a row not yet processed by T1 so that its new value falls somewhere within the range already processed by T1.
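The phantom scenario can be sketched like this (dbo.Orders is a hypothetical table):

```sql
-- Session 1 (T1)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM dbo.Orders WHERE Amount BETWEEN 10 AND 20;
    -- shared locks are held on the rows just read...

-- Session 2 (T2): ...but nothing locks the gaps between them
INSERT INTO dbo.Orders (Amount) VALUES (15);  -- succeeds immediately

-- Session 1, continued: the second count includes the phantom row
    SELECT COUNT(*) FROM dbo.Orders WHERE Amount BETWEEN 10 AND 20;
COMMIT TRANSACTION;
```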
Read Committed (Locking Read)
A transaction running under this isolation level reads only committed data, which by definition excludes dirty reads.
Regarding the locking mechanism: at row level, each shared lock is released when the shared lock on the next row is taken; at page level, the shared lock is released when the first row on the next page is read, and so forth. This differs from Repeatable Read isolation, which also takes shared locks when reading data but holds them until the end of the enclosing transaction.
Although only committed data is read under the Read Committed isolation level, this does not mean it will be the most recently committed data; in fact, it may be out of date. Additionally, the standard itself states that non-repeatable reads and phantoms may occur.
Since only a single row is share-locked at any given moment in time, it is clearly possible for concurrent transactions to modify the unlocked rows in the index our query is traversing. If these concurrent modifications change index key values, they will cause rows to move around within the index structure. This means that there is no point-in-time view, and rows can be missed or counted twice.
Read Committed (Snapshot Isolation)
Read Committed Snapshot Isolation (RCSI) provides a point-in-time view of committed data as of the time the statement began, in contrast to Locking Read Committed, where the view corresponds to the moment each row is briefly locked and physically read. A point-in-time view cannot suffer from the problems of missing rows or encountering the same row multiple times, both of which are possible under locking read committed isolation.
In RCSI transactions, data is read from the row version store rather than being accessed directly, so no shared locks are acquired. This may improve concurrency by eliminating conflicts between incompatible locks, but also requires more resources, in terms of memory/disk space and tempdb.
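Unlike the other levels, RCSI is not selected with a SET statement; it is a database-level option that changes the behavior of ordinary READ COMMITTED sessions:

```sql
-- Once enabled, plain READ COMMITTED reads use row versioning instead of
-- shared locks. Enabling it requires no other active connections in the
-- database (or use WITH ROLLBACK IMMEDIATE to force them off).
ALTER DATABASE CURRENT SET READ_COMMITTED_SNAPSHOT ON;
```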
Due to the static nature of a snapshot, while a statement runs it will continue to see the committed state of the database as it was when the statement began, potentially missing any changes committed since then.
In RCSI transactions, each statement sees a static committed data set, but that set can change between statements inside the same transaction, because nothing prevents the underlying data from changing while an RCSI statement is executing: no shared locks are taken. This is why RCSI does not protect against non-repeatable reads and phantoms. The SQL standard defines these phenomena at the transaction level, and multiple statements within an RCSI transaction may see different data, because each statement sees a point-in-time view as of the moment that particular statement started.
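The per-statement nature of the snapshot can be sketched as follows (dbo.Accounts is a hypothetical table, in a database with RCSI enabled):

```sql
-- Session 1
BEGIN TRANSACTION;
    SELECT Balance FROM dbo.Accounts WHERE Id = 1;  -- snapshot of statement 1

-- Session 2: commits a change between the two statements
UPDATE dbo.Accounts SET Balance = 0 WHERE Id = 1;

-- Session 1, continued: a new statement gets a new snapshot, so it sees
-- Session 2's change, i.e. a non-repeatable read inside one transaction
    SELECT Balance FROM dbo.Accounts WHERE Id = 1;
COMMIT TRANSACTION;
```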
Read Uncommitted

Read Uncommitted offers the fewest isolation guarantees. The concurrency effects seen previously in Locking Read Committed are all due to the locking implementation taking only very short-term shared locks when reading data. The Read Uncommitted isolation level goes one step further by not taking shared locks at all, which adds the possibility of dirty reads.
Read Uncommitted isolation is often used to achieve better performance through reduced blocking and deadlocking. By avoiding the need to acquire and release shared locks, incompatible-lock problems are minimized.
But besides the obvious drawback of reading uncommitted data that may never be committed, using this isolation level tells SQL Server that we accept any inconsistencies that might arise from allocation-ordered scans, which are normally only used when there is a guarantee that the underlying data won't change during the scan. This may lead to rows being missed, counted more than once, or to results that are completely arbitrary. Paul White gives an interesting example:
> “(…) consider a single varchar(max) column that initially contains 10,000 ‘x’ characters. A concurrent transaction updates this value to 10,000 ‘y’ characters. A read uncommitted transaction can read ‘x’ characters from one page of the LOB, and ‘y’ characters from another, resulting in a final read value containing a mixture of ‘x’ and ‘y’ characters. It is hard to argue that this does not represent reading “corrupt” data.”
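A simple dirty-read sketch (dbo.Accounts is a hypothetical table):

```sql
-- Session 1: modifies a row but does not commit
BEGIN TRANSACTION;
UPDATE dbo.Accounts SET Balance = 999 WHERE Id = 1;

-- Session 2: reads the uncommitted value instead of blocking on the lock
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Balance FROM dbo.Accounts WHERE Id = 1;  -- returns 999, a dirty read
-- Per-table equivalent: SELECT Balance FROM dbo.Accounts WITH (NOLOCK);

-- Session 1: the value Session 2 read never officially existed
ROLLBACK TRANSACTION;
```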
That’s My Story And I’m Sticking To It
From what we briefly saw about SQL Server's isolation levels, the following can also be derived:
|Read Committed (SI)|Yes|No|Yes|
The most interesting aspect for me is that in every isolation level except Serializable, committed rows can be missed, and even this exception exists only because of implementation-specific details. As we saw earlier, in the case where the logically correct result could be either 600 or 666 rows, the first result would imply that subsequently committed rows were missed, and still no Serializable requirement would be violated.
With that, if you are okay with the fact that actions may be taken based on obsolete data, good for you. If not, go Serializable or go buy good sleeping pills. You’re welcome.