Introduction

While preparing some stuff around SQL Server 2005, I thought it was the ideal occasion to do some blog posts about the SQL Server 2000/2005 architecture and several concepts. In this post I'll cover transaction isolation levels, later I'll post more about transactions while talking about the new System.Transactions namespace. However, let's start by taking a look at transaction stuff inside the database itself.

Transactions: a short definition

Tons of people have given a definition of a transaction before me, so I won't cover this in much detail. I'll just try to give a definition for a transaction in order to create some context for further elaboration. Basically, a transaction is a set of statements (in our case database statements) that have to be executed in an atomic fashion (either all of the statements succeed, or all of them fail together). Next, there's the consistency property, which means that a transaction should bring the system from one consistent state to another consistent state. During the transaction, isolation has to be maintained. This means a transaction is out there on its own, it can't see what others are doing. Locking will be used to accomplish these requirements, as I'll explain in a moment. Last but not least, there's the durability property. A transaction can finish either by a commit (changes are allowed) or a rollback (changes are rolled back to the last consistent state). Durability means that when a transaction is committed, the changes made to the database should be persistent even a disaster occurs one processor cycle after the transaction commit.

The grouping of these four properties is abbreviated as the ACID properties. Transactions play a central role in database technology, especially in multi-user environments. A classic example of a transaction is the work that has to be done to transfer money from one bank account to the other. It's atomic, because the money has to be deleted from account number one and has to be added to account number two in an atomic way. It's consistent because we don't want to loose or create any money (the total sum of money aften and before the transfer has to be the same). It's isolated because others should not see an intermediate state when the money has left account number one and is not on the second account yet. Last, it's durable because we want to keep the committed state forever (on a database restart, the money should not be back on the old account).

SQL Server 2000 transaction isolation modes

In my definition of transactions, I mentioned the isolation property. The fact that transactions can't see anything from each other is simply an overstatement (or maybe it can even be called overengineering) in a lot of scenarios. Basically, there is one big tradeoff to be made: the level of concurrency versus the level of isolation. The higher the throughput you want to reach, the lower the level of isolation should be. Why is this? In one word, locking. In order to make sure others can't perform certain operations on certain objects (I'm keeping this vague intentionally for the moment) you should get a certain lock on these certain objects. Examples are shared locks and exclusive locks in database technology.

In SQL Server, a shared lock is acquired when data is read. More than one party can get a shared lock on the data, but nobody can get an exclusive lock on the data while there's still a shared lock on it. An exclusive lock is the result of a data modification (e.g. insert, update, delete). What's in a name? An exclusive lock is granted exclusively to one party (one process). Other parties can't get any type of lock on the data while there is such an exclusive lock on it. When the transaction completes, the lock can be released.

One other thing before we jump into transaction isolation modes. The SQL Server database engine contains a transaction manager which is responsible for everything around transactions, including cross-db transactions, distributed transactions, the support of savepoints, nested transactions, isolation level support, etc. A last concept you should have in mind is the concept of write-ahead logging. Simply stated, write-ahead logging ensures that rolling back work in process and rolling forward committed work to be written to the physical data pages is always possible. The idea is that the transaction log is populated with the changes that a transaction has made before the transaction is acknowledges as being committed. The system can't simply continue until the transaction's work has been logged to the transaction log (that is, synchronously).

Now back to the isolation modes. SQL Server 2000 has four different lock modi, as I'll explain now.

Uncommitted read

The weakest transaction isolation level is called uncommitted read. This means a transaction can read any data that's currently living on a data page, being committed or not. In fact, there is no isolation at all. There are simply no locks whatsoever when running in this mode, even no shared locks when reading data. So, you don't issue locks that can affect others and you don't have to wait for locks yourself either. Although this behavior affects concurrency positively, it's not the desired behavior for e.g. the financial application (money can get lost or can be created, because transactions can read intermediate states of other transactions, breaking the isolation). In data analysis scenarios this can be useful nevertheless.

Committed read

This is the default level. Data which has been changed can't ever be read by another transaction when it's not yet committed. During the data change (before the commit) an exclusive lock is held on the data. Other transactions have to wait for this lock to be released (and to acquire a shared lock) in order to continue. Also, data that is being read needs to hold a shared lock, because this blocks others from changing that data till the lock is released. However, shared locks can be released before the transaction is committed, when the read of the data has completed. This introduces the concept of non-repeatable reads. Basically this means that a transaction that is reading the same data more than once during the transaction can see different data on every read (because the shared lock doesn't need to be held in between two reads of the same data). Another concept is that of phantoms, which are new rows that appear in a table (or view) during a transaction. So, when you issue the same query more than once, new rows may have slipped in.

Repeatable read

As the name already tells us, this level ensures no non-repeatable reads can occur. The idea is that a shared lock is held till the transaction comes to an end, which negatively affects the overall performance because of possible lock contention (data which you are locking by a shared lock during a long transaction can't be modified by another transactions/user for the duration of your transaction; so it's key to reduce the length of transactions as much as you can). Therefore you can issue the same query multiple times in the context of the same transaction without having to worry about possible changes to the data you retrieved. However, phantoms still can occur.

Serializable

The summum of transaction isolation is called serializable. Basically serializable means that the result of running more than one serializable transaction concurrently is the same as running those transactions in a serialized fashion, one at a time. Next to the properties of repeatable read, phantoms can't occur too. This is done by locking on a higher level than the row/record level, e.g. the table level. The reason for this approach is the fact that we need to make sure no new rows can be added to the result sets we're using for the duration of our transaction. Assume you're querying a table for all registered members born between 1983 and 1985. If you do query this data for a second time in a serialized transaction, you don't want any phantom row to slip in. To accomplish this, SQL Server will acquire a key-range lock on the data(range) so that's impossible for another transaction to add rows to this data set until the lock is released. There are nine subtypes of key-range locks in SQL Server 2000, more info can be found in the Books Online.

How to set the transaction isolation level?

Of course, that's the last thing you should know in order to start playing around with transaction isolation levels. To set the isolation level, use the SET TRANSACTION ISOLATION LEVEL T-SQL statement. You can specify SERIALIZABLE, READ COMMITTED, REPEATABLE READ and READ UNCOMMITTED. More information can be found on MSDN.

What about SQL Server 2005?

The answer is simple: there are a couple of new modi concerning transaction isolation levels, called snapshot isolation. I won't cover this myself, but rather I want to point to an extract of the book "A First Look at SQL Server 2005 for Developers" which is available on http://www.informit.com/articles/article.asp?p=327394&seqNum=2&rl=1.