
Moving on to Advanced Topics

Having experimented with some simple transactions, we're now going to present further issues involved in writing advanced transaction code. The basics of transactions that you've looked at so far form the foundation for what you'll be looking at here as you consider the wider implications of executing transactions in the real world.

We'll discuss the different transaction isolation levels that you can use. These define certain rules that govern the degree of "interaction" between multiple transactions acting on the same data. We'll also discuss the use of database locks—one of the mechanisms by which you can control concurrent access to shared resources in the database.

There are several different isolation levels and many different types of lock that can be applied, depending on your specific RDBMS. It's way beyond the scope of this chapter to provide a definitive guide to transactions and locking for each RDBMS. Instead, we aim to provide a good general understanding of the requirements that apply to each level, how you often have to use locks to meet those requirements, and the potential performance consequences of locking database resources.

Concurrency and Transaction Isolation Levels

It's a fact that, most of the time, you can't guarantee that transactions will execute one at a time—on the contrary, in complex databases it's likely that many transactions will run concurrently. This leads to potential concurrency problems, which occur when many transactions try to interact with the same database object (access the same data) at the same time. The nature of the interaction depends on the actions each transaction performs: from simply reading data to inserting, updating, or deleting data.

You need to consider these issues because they're directly related to the isolation property of transactions, which dictates that changes made by a transaction shouldn't be visible to other concurrently running transactions. If a transaction modifies information in a data table, should other transactions be able to access that data table? If yes, in what way? What if the transaction only reads from the data table without modifying it?

The answer to these questions depends on the transaction isolation level. You can manually set the isolation level for each transaction, and this establishes the way transactions behave when they're trying to access the same piece of data.

Transactions ask for ownership of a particular piece of data by placing locks on it. There are many different types of locks, and they differ in the way they limit access to database resources. For example, a row can be locked in such a way that other transactions can't access it in any way or can read it but not modify it. Also, depending on the lock granularity, the resource they apply to can be an entire database, a data table, a row or a number of rows, and so on.

It's important to understand the difference between locks and the transaction isolation level. Locks limit access to database objects, and the transaction isolation level specifies how the active transaction places locks on the resources with which it works.

SQL-99 specifies four transaction isolation levels. With each level, a different balance is struck between the level of data integrity protection offered and the performance penalties imposed.

The four transaction isolation levels, listed from the one that offers the best performance to the one that offers the best protection, are as follows:

  • READ UNCOMMITTED

  • READ COMMITTED

  • REPEATABLE READ

  • SERIALIZABLE

The SQL-99 standard also categorizes transactions into read-only transactions and read-write transactions. Read-only transactions are a special kind of transaction, and you'll learn about them in a moment.

Note 

The isolation levels mentioned previously can only be applied to read-write transactions.

The SQL-99 command for setting the transaction type is SET TRANSACTION. The complete syntax is as follows:

SET [LOCAL] TRANSACTION { { READ ONLY | READ WRITE } [,...]
| ISOLATION LEVEL
   { READ COMMITTED
   | READ UNCOMMITTED
   | REPEATABLE READ
   | SERIALIZABLE } [,...]
| DIAGNOSTICS SIZE <number of conditions> };

Therefore, to choose between read-only and read-write transactions, the syntax is as follows:

SET TRANSACTION {READ ONLY | READ WRITE}

To set a transaction level, you can use a command such as the following. Note that setting the transaction isolation level automatically assumes a read-write transaction:

SET TRANSACTION ISOLATION LEVEL <isolation level name>

The default transaction isolation level in most of the databases covered in this book is READ COMMITTED (the notable exception is MySQL's InnoDB engine, which defaults to REPEATABLE READ).

Read-Only Transactions

Read-only transactions are so named because they can't contain any data modification statements—only plain SELECT statements are allowed.

The default state for read-write transactions is statement-level consistency. In other words, if, within the scope of a transaction, the same record is read twice, different values may be retrieved each time if the record was modified between readings.

Read-only transaction mode solves this problem by establishing transaction-level read consistency. In a read-only transaction, all queries can only see the changes committed before the transaction began.

To set a transaction as being read-only, you type the following command:

SET TRANSACTION READ ONLY;

Read-only transactions aren't supported by SQL Server, DB2, or MySQL. They are supported, however, by Oracle.

READ UNCOMMITTED Isolation Level

This is the first and most dangerous isolation level, but it's also the one that offers the best performance. The following command sets the READ UNCOMMITTED isolation level for the next transaction:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

The READ UNCOMMITTED isolation level isn't supported by Oracle but is supported by SQL Server, DB2, and MySQL.

When the transaction is in READ UNCOMMITTED mode, its isolation property isn't enforced in any way, and the transaction is able to read uncommitted changes from other concurrently running transactions (effectively breaking the isolation property of the ACID rules).

With READ UNCOMMITTED mode, the transaction is susceptible to dirty reads and to all the other kinds of consistency problems discussed in this chapter.

Data Consistency Problem: Dirty Reads

Dirty reads happen when a transaction reads data that was modified by another transaction but that hasn't yet been committed. If the other transaction rolls back, the first transaction ends up reading values that, theoretically, never existed in the database.

To understand dirty reads, imagine two concurrent transactions that work with the Student table. One transaction removes or adds students, and the second calculates the total number of students in the table, as shown in Table 10-1.

Table 10-1: Sequence of Actions That Demonstrate Dirty Reads

Time | Transaction 1                                      | Transaction 2
-----|----------------------------------------------------|--------------------------------------------
T1   | The transaction starts.                            |
T2   | Removes or adds records from/to the Student table. | Transaction starts.
T3   |                                                    | Calculates the total number of students based on the uncommitted results of Transaction 1.
T4   | Transaction rolls back.                            |

If Transaction 2 works with the READ UNCOMMITTED isolation level, it can read the uncommitted changes from Transaction 1. In this example, because Transaction 1 finally rolls back, Transaction 2 ends up calculating an erroneous number of students. This is a dirty read.

Another scenario of a dirty read would be if Transaction 1 changed the data of a student (say, the name), then Transaction 2 read that uncommitted data, and finally Transaction 1 rolled back.

Note 

Oracle doesn't support the READ UNCOMMITTED isolation level. Oracle always reads the last-committed values, even if the data is being changed by other ongoing transactions.

You can avoid dirty reads by forbidding data that's being modified by other transactions from being read.

However, there are certain situations in which you might want to allow dirty reads, the most common of which is when you want to get quick reports or statistics from your database, where it isn't critical to get very accurate data. In such situations, setting an isolation mode of READ UNCOMMITTED can help you because it locks fewer resources and results in better performance than the other isolation levels.

The default transaction isolation level, READ COMMITTED, doesn't permit dirty reads, so you explicitly need to set the isolation level to READ UNCOMMITTED in situations where you want to allow dirty reads. The other isolation levels (REPEATABLE READ and SERIALIZABLE) are even more stringent about enforcing data consistency, and they don't permit dirty reads either.

READ COMMITTED Isolation Level

The second isolation level in terms of enforcing transaction isolation is READ COMMITTED, which is the default mode for most database systems.

When in READ COMMITTED mode, all resources modified by the transaction are locked until the transaction completes—in other words, any updated, inserted, or deleted records won't be visible to other transactions (or other transactions will see only their last-committed values) until (and unless) you commit the transaction. Transactions that run in READ UNCOMMITTED mode are the exception to this rule, because they can read uncommitted changes from other transactions.

Also, other transactions aren't allowed to modify the rows that are being modified by your transaction but are able to modify the rows that are simply read by your transaction.

When in READ COMMITTED mode, no dirty reads will happen in your current transaction because you can't see or modify data that is being updated by other transactions. However, the READ COMMITTED mode doesn't guard against unrepeatable reads.

Data Consistency Problem: Unrepeatable Reads

An unrepeatable read happens when you read some data twice in a transaction and you get different values because it has been modified in the meantime by another transaction.

When in READ COMMITTED mode, other transactions are allowed to modify data that has only been accessed for reading by your transaction. So, if you start a transaction, read a record, do some other things, and then read the same record again, you might read a different value.

If this is acceptable and no serious problems can occur because of unrepeatable reads, it's best to stick with the READ COMMITTED isolation mode. Additionally, in some cases, you can avoid unrepeatable reads by saving pieces of information in variables or temporary tables and using the saved data instead of querying the database again.
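The save-the-value technique can be sketched with Python's sqlite3 module (the Student table and its data here are hypothetical). By fetching the rows once, both the sum and the count are derived from the same result set, so a concurrent change between two separate queries can't skew the calculation:

```python
import sqlite3

# Hypothetical in-memory Student table with a Mark column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StudentID INTEGER, Mark INTEGER)")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1, 8), (2, 10)])
conn.commit()

# Fetch the rows once; derive both the sum and the count from the same
# result set instead of issuing a second query that might see new data.
marks = [m for (m,) in conn.execute("SELECT Mark FROM Student")]
average = sum(marks) / len(marks)
print(average)  # 9.0
```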

Before moving on to the next transaction isolation level (which prevents unrepeatable reads), you'll see an example demonstrating unrepeatable reads (see Table 10-2). In this example, Transaction 1 tries to calculate the average mark for all students by summing up all their marks and then dividing by the number of students.

Table 10-2: Sequence of Actions That Demonstrate Unrepeatable Reads

Time | Transaction 1                                                                | Transaction 2
-----|------------------------------------------------------------------------------|------------------------------------------------
T1   | The transaction starts (READ COMMITTED).                                     |
T2   | Sums up the students' marks.                                                 | Transaction starts.
T3   | (Waits for Transaction 2 to release locks.)                                  | Deletes one student, locking the Student table.
T4   | (Waits for Transaction 2 to release locks.)                                  | Transaction commits, releases locks.
T5   | Retrieves the number of students.                                            |
T6   | Calculates the average mark by dividing the two numbers calculated earlier.  |

For the purposes of this example, assume the mark of each student is stored in the Student table (so working with marks implies working with Student).

Transaction 1 ends up calculating a wrong average mark because the students' data changes while the transaction is running. The students' data can be modified by other transactions because Transaction 1 is only reading it and doesn't place any locks on it.

In fact, after Transaction 2 updates the Student table (placing locks on it), Transaction 1 needs to wait until Transaction 2 finishes in order to calculate the number of students.

Note 

Oracle would behave as before, reading the last-committed values from the Student table without blocking Transaction 1 until Transaction 2 finishes executing.

If you don't find any solutions to avoid unrepeatable reads, SQL-99 has an isolation level that does the work for you, guarding against unrepeatable reads and dirty reads: the REPEATABLE READ isolation level.

REPEATABLE READ Isolation Level

This transaction isolation mode provides an extra level of concurrency protection by preventing not only dirty reads but also unrepeatable reads.

The way most databases (again, not Oracle) enforce repeatable reads is to place shared read locks on rows that are being read by the transaction, not just on the ones that are being updated (as the READ COMMITTED isolation mode does). This way, as soon as one record is read, you can guarantee you'll get the same value if you read it again during the same transaction.

Having a shared lock on a record permits other transactions to read the record but not to modify it. In the previous example, setting the first transaction to REPEATABLE READ would prevent the second transaction from removing the student, ensuring the calculated average mark is correct. Table 10-3 shows the same transactions running, but this time the first transaction runs in REPEATABLE READ mode.

Table 10-3: Sequence of Actions That Explain the REPEATABLE READ Isolation Mode

Time | Transaction 1                                                                        | Transaction 2
-----|--------------------------------------------------------------------------------------|--------------------------------------------------------
T1   | The transaction starts in REPEATABLE READ mode.                                      |
T2   | Sums up the students' marks (placing shared locks on the rows in the Student table). | Transaction starts.
T3   | Retrieves the number of students.                                                    | Tries to delete one student but finds the table locked.
T4   | Calculates the average mark by dividing the two numbers calculated earlier.          | (Waits.)
T5   | Transaction completes executing, releases locks.                                     | (Waits.)
T6   |                                                                                      | Deletes the student.

You get the right answer this time, but because other transactions need to wait for your transaction to finish before getting access to the requested data, this can result in significant performance penalties for your applications. The longer your transaction lasts, the longer the locks are held, effectively preventing other transactions from modifying the locked data.

With the REPEATABLE READ isolation level, you can be sure unrepeatable reads will be avoided. However, there's still one more concurrency-related problem that can happen: phantoms.

Data Consistency Problem: Phantoms

Phantoms are similar to unrepeatable reads, except that they involve new records being introduced into a data table by another transaction while the current transaction is working with the table.

Let's imagine another scenario, similar to the previous example, but this time Transaction 2 inserts a new student rather than removing an existing one. Table 10-4 shows the actions performed by the two transactions.

Table 10-4: Sequence of Actions That Demonstrate Phantoms

Time | Transaction 1                                                                                | Transaction 2
-----|----------------------------------------------------------------------------------------------|-----------------------------------------------------------
T1   | Transaction starts in REPEATABLE READ mode.                                                  |
T2   | Sums up the students' marks (placing shared locks on the rows in the Student table).         | Transaction starts.
T3   | Tries to count the number of students, but Transaction 2 has a lock on the newly created student. | Inserts one student (placing a lock on the newly created record).
T4   | (Waits.)                                                                                     | Transaction commits, releases locks.
T5   | Calculates the total number of students.                                                     |
T6   | Calculates the average mark by dividing the two numbers calculated earlier.                  |

Transaction 2 is allowed to insert new records into the Student table because the insert doesn't affect any of the existing rows in Student (on which Transaction 1 has placed shared locks). Phantoms refer to the situation in which other transactions insert new records that match the WHERE clause of a previous statement in the current transaction. The SERIALIZABLE isolation level guards your transaction against phantoms, as well as against the concurrency-related problems presented previously.

SERIALIZABLE Isolation Level

The SERIALIZABLE isolation level guarantees the transactions will run as if they were serialized—in other words, they're guaranteed not to interfere with each other, and they execute as if they were run in sequence.

When you set the transaction isolation level to SERIALIZABLE, you're guaranteed that other transactions cannot modify (with UPDATE, INSERT, or DELETE) any data that meets the WHERE clause of any statement in your transaction.

The SERIALIZABLE transaction isolation level is a dangerous one when it comes to performance. It does provide the highest level of consistency—indeed, transactions work the same as if they were executed one at a time.

However, while increasing consistency, you get much lower concurrency because other transactions are restricted in the actions they can do with database objects already used in other transactions—they need to wait one after the other to get access to shared data.

If Transaction 1 was set to SERIALIZABLE in the example presented in Table 10-4, Transaction 2 couldn't have inserted a new record to the Student table before Transaction 1 finished executing.

RDBMS-Specific Transaction Support

So far we've presented the SQL-99 features regarding a transaction's isolation level. Now you'll take a closer look at how they're supported by SQL Server, Oracle, MySQL, and DB2.

SQL Server

SQL Server supports the four specified isolation levels, but it doesn't support read-only transactions. After setting the transaction isolation level, it remains set as such for that connection, unless explicitly changed. The default isolation level is READ COMMITTED.

Here's the syntax:

SET TRANSACTION ISOLATION LEVEL
    { READ COMMITTED
    | READ UNCOMMITTED
    | REPEATABLE READ
    | SERIALIZABLE
    }

Oracle

Oracle supports read-only transactions, but it doesn't support the READ UNCOMMITTED and REPEATABLE READ isolation levels. The default isolation level is READ COMMITTED.

With Oracle, SET TRANSACTION affects only the current transaction—not other users, connections, or transactions. Here's its syntax:

SET TRANSACTION
{ { READ ONLY | READ WRITE }
  | ISOLATION LEVEL
    { READ COMMITTED
    | SERIALIZABLE } };

To change the default transaction isolation level for all the transactions on the current session, you use the ALTER SESSION command. Here's an example:

ALTER SESSION SET ISOLATION_LEVEL = SERIALIZABLE;

MySQL

MySQL supports transaction isolation levels with InnoDB tables. Here's the syntax for setting the transaction isolation level:

SET [GLOBAL | SESSION]
TRANSACTION ISOLATION LEVEL
{ READ UNCOMMITTED | READ COMMITTED |
  REPEATABLE READ | SERIALIZABLE }

By default, in MySQL, SET TRANSACTION affects only the next (not yet started) transaction.

The SESSION optional parameter changes the default transaction level for all the transactions on the current session.

The GLOBAL optional parameter changes the default transaction isolation level for all new connections created from that point on.

DB2

DB2 supports transaction isolation levels, but the way that the isolation levels are implemented is quite different in DB2 compared to the other RDBMS implementations. In DB2, you can set the isolation level for a transaction on each statement that you use, for example:

UPDATE <table name>
SET <assignment clause>
WHERE <search condition>
WITH <isolation level>

where isolation level can be one of the following:

  • RR: REPEATABLE READ

  • RS: Read Stability (similar to REPEATABLE READ—locks all rows being read and modified but doesn't completely isolate the application process, leaving it vulnerable to phantoms)

  • CS: Cursor Stability (similar to READ COMMITTED)

  • UR: Uncommitted Read

The default isolation level of any statement relates to the isolation level of the package containing the statement.

For example, referring back to an example you saw in Chapter 3, "Modifying Data" (where you added some new professors to the university but without titles), you could use the following:

UPDATE Professor
SET Name = 'Prof. ' || Name
WHERE ProfessorID > 6
WITH RS;

This method of applying an isolation level also works with other SQL statements, for example, SELECT INTO and INSERT INTO. For more information, refer to the DB2 documentation.

Playing with Concurrency

Let's do a short exercise now and test how the database enforces consistency, depending on how you set the transaction isolation level.

For this you'll need to open two connections to the same database. If you're running DB2, please first uncheck the Automatically Commit SQL Statements checkbox in the Command Center ➤ Options ➤ Execution window. This way you'll start multistatement transactions just as with Oracle (which starts transactions automatically).

For the purpose of this exercise you'll use the default isolation level, READ COMMITTED, for both transactions. This means that you don't need SET TRANSACTION statements to explicitly set the transaction isolation level.

First, you need to start new transactions on the two connections. If you're using Oracle or DB2, no additional statements are required. Use BEGIN TRANSACTION for SQL Server or BEGIN for MySQL.

Then, add a new record to the Student table on the first connection:

INSERT INTO Student (StudentID, Name) VALUES (115, 'Cristian');

After executing this command, while still in the first connection, test that the row was indeed successfully added:

SELECT * FROM Student

The results show the newly added student:

   StudentID   Name

   ----------- ---------------------------------------
   1            John Jones
   2            Gary Burton
   3            Emily Scarlett
   ...          ...
   12           Isabelle Jonsson
   115          Cristian

Now, while the first transaction is still active, switch to the second connection and read the Student table:

SELECT * FROM Student

Because the first transaction is in READ COMMITTED mode, it doesn't allow other transactions to read its uncommitted changes, in order to prevent dirty reads. However, the databases implement this protection differently.

With Oracle and MySQL, the second transaction simply ignores the changes made by the first one (which is still active because it hasn't been committed or rolled back yet). The list of students will be returned:

   StudentID   Name

   ----------- ---------------------------------------
   1            John Jones
   2            Gary Burton
   3            Emily Scarlett
   ...          ...
   12           Isabelle Jonsson

DB2 and SQL Server, on the other hand, have a different approach: They don't allow the second transaction to read the entire Student table until the first one decides to commit or roll back. The SELECT in the second transaction will be blocked until the first transaction (which keeps the Student table locked) finishes.

Note that a SELECT statement in the second transaction that refers strictly to rows that weren't added, modified, or deleted by the first transaction isn't affected in any way by that transaction (for example, it isn't blocked until the first transaction finishes, in SQL Server and DB2). An example of such a statement is as follows:

SELECT * FROM Student WHERE StudentID=10

Or even:

SELECT * FROM Student WHERE StudentID<100

Note that with SQL Server and MySQL, on the second connection you can set the transaction isolation level to READ UNCOMMITTED. This way, when you read the Student table, you'll see even the row that was inserted by the first transaction, even though that transaction hasn't been committed:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT StudentID, Name FROM Student WHERE StudentID > 100;

Now let's roll back the first transaction to get the database to its original form:

ROLLBACK;

Transaction Best Practices and Avoiding Deadlocks

Transactions are, in a way, a necessary evil in database programming. They provide the functionality you need to ensure data consistency, but they reduce concurrency. The more consistency a transaction isolation level provides, the less concurrency you have and hence the greater performance penalties for concurrently running transactions.

The higher the isolation level you set, the more the users accessing the database at the same time are affected. For this reason, it's important to always use the lowest possible transaction isolation level:

  • The lowest isolation level, READ UNCOMMITTED, doesn't provide any concurrency protection, and it should be avoided except for the times when you don't require the data you read to be very accurate.

  • The default isolation level, READ COMMITTED, is usually the best choice, because it protects you from dirty reads, which are the most common concurrency problem.

  • The higher isolation protection levels, REPEATABLE READ and SERIALIZABLE, can and should usually be avoided. Apart from hurting performance, they also increase the probability of deadlocks.

A deadlock is a situation in which two or more transactions have started processing, have locked some resources, and end up waiting for each other to release their locks in order to complete execution. When this happens, only one of them can finish processing. The database server chooses one of the transactions (called the deadlock victim) and rolls it back so that the other one can finish execution.

Deadlocks can rarely be entirely eliminated from a complex database system, but there are certain steps you can take to lower the probability of their happening. Of course, most of the time the rules depend on your particular system, but here are a few general rules to keep in mind:

  • Use the lowest possible transaction isolation level.

  • Keep the transactions as short as possible.

  • Inside transactions, access database objects in the same order.

  • Don't keep transactions open while waiting for user input (okay, this is common sense, but it had to be mentioned).
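The "access database objects in the same order" rule can be illustrated outside the database with two Python threads and two locks standing in for two tables (the names here are purely illustrative). Because both workers acquire the locks in the same fixed order, neither can end up holding one lock while waiting for the other:

```python
import threading

# Two locks standing in for two database tables.
lock_a, lock_b = threading.Lock(), threading.Lock()
results = []

def worker(name):
    # Every worker takes lock_a first, then lock_b: consistent ordering
    # means no circular wait, so no deadlock is possible.
    with lock_a:
        with lock_b:
            results.append(name)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['t1', 't2'] -- both workers finish
```

If one worker instead took lock_b first, the two could each grab one lock and wait on the other forever, which is exactly the circular wait a database deadlock victim breaks.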

Distributed Transactions and the Two-Phase Commit

It's not unusual these days for companies to have, say, SQL Server, Oracle, and MySQL installations on their servers. This leads to the need to conduct transactions that span multiple databases (say, a transaction that needs to update information on three different database servers).

This situation is a bit problematic because everything you've studied so far applies to transactions on a single database server only. Distributed transactions are possible through platform-specific protocols that use a two-phase commit system. The database originating the distributed transaction is called the Commit Coordinator, and it coordinates the transaction.

The first phase in the two-phase commit is when the Commit Coordinator instructs all participating databases to perform the required actions. The participating databases start individual transactions, and when they're ready to commit, they send a "ready to commit" signal to the Commit Coordinator.

After all participating databases send a "ready to commit" signal, the Commit Coordinator instructs all of them to commit the operations, and the distributed transaction is committed. If any of the participating databases report a failure, the Commit Coordinator instructs all the other databases to roll back and cancels the transaction.
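The coordinator's logic can be sketched in a few lines of Python. The Participant class and its method names are hypothetical stand-ins for whatever the platform's protocol actually exchanges; the point is the two-phase decision rule:

```python
class Participant:
    """A hypothetical participating database in a distributed transaction."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "active"
    def prepare(self):             # phase 1: vote "ready to commit" or not
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit
    def commit(self):              # phase 2, success branch
        self.state = "committed"
    def rollback(self):            # phase 2, failure branch
        self.state = "rolled back"

def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare and collect all the votes.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2: everyone signaled "ready to commit", so commit all.
        for p in participants:
            p.commit()
        return "committed"
    # At least one participant reported a failure: roll everyone back.
    for p in participants:
        p.rollback()
    return "rolled back"

print(two_phase_commit([Participant("oracle"), Participant("mysql")]))
print(two_phase_commit([Participant("oracle"),
                        Participant("sqlserver", can_commit=False)]))
```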

