COMP207-P2

2021-11-21

T3 transaction management

Introduction to transactions

Overview

Transaction

A sequence of queries

The complication arises from that many might happen at the same time and that computers can fail…

Executing SQL Queries in “Real Life”

In parctice, problems may arise due to

Concurrency(并发性): SQL statements that overlap in time
Partial execution of SQL statements(e.g., due to failures)

Problem 1: Concurrency

如果第一位乘客订过这个座位后面的人就没法再定这个座位了

Transactions to Rescue

SQL allows us to state that a group of SQL Statements must be executed so that no conflicts arise

Problem 2: Partial Execution

Acounts(accountNo, accountHolder, balance)
Goal: transfer 100 gbp from account ‘123’ to account ‘456’

Transactions to the Rescue

SQL allows to state that a transaction must be executed atomically(as a whole or not at all)

Transactions in SQL - Summary

Transaction in SQL: a sequence of SQL statements

Schedules, a low-level view and notation

Overview

Introduce a simplified and lower level view of transaction as well as schedules and notatios for all of these

Translating SQL into Low-level Operations

Employees(e_id, first_name, last_name, birth_day, salary)

Simplifying Low-level Operation

Basic Operations of Transactions

read(X): Reads a database item X into a program variable (also named X, for simplicity)

Find the address of the disk block(page) that contains item X
Copy that disk block into a buffer in main memory
- if that disk block is not already in some main memory buffer
Copy item X from the buffer to the program variable X

write(X): Writes the value of program variable X into the database item named X

Find the address of the disk block(page) that contains item X.
Copy that disk blok into a buffer in main memory
- if that disk block is not already in some main memory buffer
- Copy item X from the program variable X into its correct location in the buffer
- Store the updated block from the buffer back to disk either immediately or at some later point in time.

Transactions(in general)

A logical unit of processing using access operations

Begin
End (Begin/end are are omitted when the beginning/end of a transaction are understood)
read
write
+other non-databse operation

Schedules

Schedules hold many transactions for execution
The operations making up the transactions are then executed by the schedule in some order

It mus preserve that the operation in each transaction happens in the right order!

Two types:

Serial Schedules
- Exectutes the transactions one after another
- Concurrent Schedules
  - Can Interleave operation from the transactions (while still preserving that the operations in each transaction happens in the right order) - formally speaking, a serial schedule is therefore also concurrent…

A Serial Schedule

Executes all operations in transaction T1,
then all operations in transaction T2.

For simplicity we will typically ignore
the non-database operations…

Shorthand Notation for Schedules

Another Example

Sb: r1(x); w1(x); c1; r2(x); w2(x); c2

Order matter

Concurrent Schedule

Summary

ACID in details

Overview

Atomicity
Consistency
Isolation
Durability

A - Atomicity

A transaction is an atomic unit of processing

An indivisible unit of execution
Executed in its entirety or not at all

Deals with failure(“aborts”)

User aborts transaction
System aborts transaction
Transaction aborts itself
System crashes, network failure

Abort - an error prevented full execution

we UNDO the work done up to the error point
- System re-creates the database state as it was before the start of the aborted transaction

Commit - no error, entire transaction executes

The system is updated correctly

Atomicity is broken

C - Consistency

A correct execution of the transaction must take the database from one consistent state to another

Weak version: Transactions may not violate constraints(tyoical definition)
Strong version: It should correctly transfor the database to reflect the effect of a real world event(ISO/IEC 10026-1:1998)

Consistency is tricky

From definition:

A correct execution of the transaction must take the database from one consistent state to another

Transactions Preserce Consistency

I - Isolation

A schedule satisfies isolation iff it is serializable (I.e. like last slide: the effect of a serializable schedule is the same as some serial schedule)

Note: consistency is not really defined by serializable
A schedule is serializable iff it satisfies isolation

Weakening isolation

D - Durability

Once a transaction commits and changes the database, these changes cannot be lost because of subsequent failure

Once a transaction commits and changes the database, these changes cannot be lost because of
subsequent failure
But could be ovwewritten by a later update
We REDO the transaction if there are any problems after the update
Durability deals with things like media failure

Some relational DBMS Components

Transaction - ACID Properties

Schedules that behaves nearly serial

Overview

how much concurrency can be allowed while still being close enough to serial schedule

Schedules - Two Extremes

Serial Schedules

execute correctly
maintain consistency of the database
infficient in multi-user enviroment(because transactions have to wait until others are finished)

Concurrent Schedules(non-serial)

may not execute correctly
may not guarantee consistency of the database or isolation
efficient in multi-user environments(if you get asked to do something, just do it)

A Serial Schedule

A Concurrent Schedule

Efficiency: Say that this table denotes arrival time of
the commands (and that each commands takes 1 time
unit)

Concurrent Schedule not always equilent to a serial Schedual

Basic Operation of Transactions

read(X): Reads a database item X into program variable(also named X, for simplicity)

Find the address of the disk block(page) that contains item X
Copy that disk block into buffer in main memory
- if that disk block is not already in some main memory b
Copy item X from the buffer to the program variable X

write(X): Writes the value of program variable X into the database item named X

Find the address of the disk block (page) that contains item X.
Copy that disk block into a buffer in main memory
- if that disk block is not already in some main memory buffer.
- Copy item X from the program variable X into its correct location in the buffer
- Store the updated block from the buffer back to disk either immediately or at some later poin

Concurrent Schedules Do Not Guarantee Consistency or Isolation

Serializable Schedules

A schedule S is serializable if there is a serial schedule S’that has the same
effect as S on every initial database state.

Serializable schedules are essentially those schedules
that we are looking for!

Guarantee:

Correctness and consistency(because serial schedules do)
Does not satisfy isolation, but we could fix that

Problem: serializability is difficult to test

Does not only depend on reads, writes, and commits, but also on the non-database operations
Non-database operations can be complex

Conflict-serializable schedules

Overview

Serializable schedules are too complex, so what is DBMS doing instead?

Conflict-Serializability

Stronger form of serializability that is used/enforced by most commercial DBMS

Based on the notion of a conflict.

Conflict - Characterisation

A conflict in a schedule is a pair of operations

from different transactions
that access the same item
and at least one of them is a write operation

Conflict-Serializability

Two schedules S and S’ are conflict-equivalent if S’ can be obtained from S by swapping any number of (1) consecutive (2) non-conflicting operations from (3) different transactions

A schedule is conflict-serialisable if it is conflict-equivalentto a serial schedule.

Schedules

Example

Summary

Recognizing a conflict-serializable schedule

Overview

recognize conflict-serializable

Precedence Graph

The precedence graph for a schedule S is defined as follows:

It is a directed graph
Its nodes are the transactions that occur in S.
It has an edge from transaction Ti to transaction Tj if there is a conflicting pair of operations op1 and op2 in S such that
- op1 appears before op2 in S
- op1 belongs to transaction Ti
- op2 belongs to transaction Tj

Testing Conflict-Serializability

Implication of a cycle

Summary

T4

Locking conflict-serializable

Overview

how conflict-serializable schedules

Transaction Scheduling in a DBMS

Simple Locking Mechanism

A transaction has to lock an item before it accesses it.

Locks are requested from & granted by the scheduler:

Each item is locked by at most one transaction at a time
Transactions wait until a lock can be granted

Each lock has to be released(ublocked) eventually

Schedules With Simple Locks

Extend syntax for schedules by two operations:

I_i(X): transaction i requests a lock for item X
u_i(x): transaction i unlocks item X

Example:

May Not Be Serializable

Two-Phase Locking(2PL)

Simple modification of the simple locking mechanism that guarantees conflict-serializability

Two-phase locking(2PL) condition
In each transaction, all lock operations precede all unlocks (所有锁定操作优先于解锁操作)

Test a transaction for 2PL

Find first unlock

Phase 1 is up to, but not including, that unlock
Phase 2 is from unlock until the end

Transaction is 2PL iff phase 2 contains no lock

Summary

2PL ensures conflict-serializability

Overview

2PL schedules are conflict-serializability

Two-Phase Locking(2PL)

Simple modification of the simple locking mechanism that guarantees conflicy-serializability

Two-phase locking(2PL) condition:

In each transaction, all lock operations precede all unlocks

2PL Ensures Conflict-Serializability

If S is a schedule containing only 2PL transactions, then S is conflict-serializable
Proof idea

More flexible locks

Overview

Still Some Issues

2PL ensures conflict-serializability, but might lead to

Deadlocks: transactions might be forced to wait forever
other issues

Risk of Deadlocks

Different lock modes make 2PL more flexible

Shared & Exclusive locks

Schedules With Shared/Exclusive Locks

Problems With “Upgrading” Locks

Update Locks to the Rescue

Update lock:

Requested by transactions to read an item - Operation: u-lock(X) or ul_i(X)
May be upgraded later to an exlusive lock(shared locks can no longer be upgraded)
Granted to at most one transaction at a time

New upgrading policy:

Example 1: Avoiding the Deadlock

Example 2

Two-Phase Locking(2PL) with Shared/Exclusive/Update Locks

Straightforward generalisation:

In each transaction, all lock operations (i.e., shared, exclusive, or update lock requests) precede all unlocks

Locks With Miltiple Granularity

DBMS may use locks at different levels of granularity

May lock relations
May lock disk blocks
May lock tuples

Examples:

1 2	SELECT name FROM Student WHERE studentID = 123456; SELECT avg(salary) FROM Employee;

Trade-Offs

Locking at too coarse granularity:

Low overhead (don’t need to store too much information)
Less degree of concurrency: may cause unnecessary delays

Locking at too fine granularity:

High overhead: need to keep track of all locked items
High degree of concurrency: no unnecessary delays

Need to prevent issues such as the following
to guarantee (conflict-) serialisability:

A transactions holds shared lock for a tuple.
Another transaction holds exclusive lock for the relation.

Intention Locks

Policy for Granting Locks

Summary

Why Might a Transaction Abort

Some relation DBMS Components

Why Might a Transactino Abort

Errors while executing transactions

Violation of integrity constraints, other run-time errors

Problem 1: Concurrency

Why Might a Transaction Abort

Deadlocks

Why Might a Transaction Abort

Beyond the DBMS’s Control

Summary

Undo Logging for database

undo log是把所有没有COMMIT的事务回滚到事务开始前的状态，系统崩溃时，可能有些事务还没有COMMIT，在系统恢复时，这些没有COMMIT的事务就需要借助undo log来进行回滚
使用undo log时，要求：
1. 记录修改日志时，(T，x，v）中v为x修改前的值，这样才能借助这条日志来回滚；
2. 事务提交后，必须在事务的所有修改（包括记录的修改日志）都持久化后才能写COMMIT T日志；这样才能保证，宕机恢复时，已经COMMIT的事务的所有修改都已经持久化，不需要回滚。
使用undo log时事务执行顺序
1. 记录START T
2. 记录需要修改的记录的旧值（要求持久化）
3. 根据事务的需要更新数据库（要求持久化）
4. 记录COMMIT T
使用undo log进行宕机回滚
1. 扫描日志, 找出所有已经start, 还没有commit的事物
2. 针对所有commit的日志, 根据undo log来进行回滚
  Overview
  handle failure using undo log

Logging in DBMS

Idea: write important activities to a log so that a desired database state can be recovered later

Examples of important activities:

Starts of transactions, commits, aborts
Modification of database items

Should work even in case of system failures!

Techniques:

Undo logging (for maintaining Atomicity)
Redo logging (for maintaining Durability)
Combinations thereof (specifically Undo/Redo for doing both Atomicity and Durability)

Extension of Transaction Syntax

Undo Logging

Undo Logging: Procedure

If transaction T updates item X and the old value was v, then<T, X, v> must be written to the log on the disk before X is written to disk.
If transaction T commits, then must be written to disk as soon as all database
elements changed by T have been written to disk

Example

Scenario 1

Scenario 2

Scenario 3

Recovery With Undo Logs

Example of Unndo Logging

If a Transaction Aborts

Summary

Other kinds of logging

Overview

other kinds of logging how to get atomicity and durability from logging

Redo Logging

redo log是指在回放日志的时候把已经COMMIT的事务重做一遍，对于没有commit的事务按照abort处理，不进行任何操作。

使用redo log时, 要求
1. 记录redo log时，(T,x，v）中的v必须是x修改后的值，否则不能通过redo log来恢复已经COMMIT的事务。
2. 写COMMIT T日志之前，事务的修改不能进行持久化，否则恢复时，对于未COMMIT的操作，可能有数据已经修改，但重放redo log不会对该事务做任何处理，从而不能保证事务的原子性。
使用redo log时事物执行顺序:
1. 记录START T
2. 记录事物需要修改记录的新值(要求持久化)
3. 记录COMMIT T(需要持久化)
4. 将事物相关的修改写入数据库
使用redo log重做事务
1. 扫描日志, 找到所有commit事务
2. 对已经commit的事物, 根据redo log重做事务