T3 transaction management
Introduction to transactions
Overview
Transaction
- A sequence of queries
The complication arises from that many might happen at the same time and that computers can fail…
Executing SQL Queries in “Real Life”
In parctice, problems may arise due to
- Concurrency(并发性): SQL statements that overlap in time
- Partial execution of SQL statements(e.g., due to failures)
Problem 1: Concurrency
如果第一位乘客订过这个座位后面的人就没法再定这个座位了
Transactions to Rescue
SQL allows us to state that a group of SQL Statements must be executed so that no conflicts arise
Problem 2: Partial Execution
Acounts(accountNo, accountHolder, balance)
Goal: transfer 100 gbp from account ‘123’ to account ‘456’
Transactions to the Rescue
SQL allows to state that a transaction must be executed atomically(as a whole or not at all)
Transactions in SQL - Summary
Transaction in SQL: a sequence of SQL statements
Schedules, a low-level view and notation
Overview
Introduce a simplified and lower level view of transaction as well as schedules and notatios for all of these
Translating SQL into Low-level Operations
Employees(e_id, first_name, last_name, birth_day, salary)
Simplifying Low-level Operation
Basic Operations of Transactions
read(X): Reads a database item X into a program variable (also named X, for simplicity)
- Find the address of the disk block(page) that contains item X
- Copy that disk block into a buffer in main memory
- if that disk block is not already in some main memory buffer
- Copy item X from the buffer to the program variable X
write(X): Writes the value of program variable X into the database item named X
- Find the address of the disk block(page) that contains item X.
- Copy that disk blok into a buffer in main memory
- if that disk block is not already in some main memory buffer
- Copy item X from the program variable X into its correct location in the buffer
- Store the updated block from the buffer back to disk either immediately or at some later point in time.
Transactions(in general)
A logical unit of processing using access operations
- Begin
- End (Begin/end are are omitted when the beginning/end of a transaction are understood)
- read
- write
- +other non-databse operation
Schedules
Schedules hold many transactions for execution
The operations making up the transactions are then executed by the schedule in some order
- It mus preserve that the operation in each transaction happens in the right order!
Two types:
- Serial Schedules
- Exectutes the transactions one after another
- Concurrent Schedules
- Can Interleave operation from the transactions (while still preserving that the operations in each transaction happens in the right order) - formally speaking, a serial schedule is therefore also concurrent…
A Serial Schedule
Executes all operations in transaction T1,
then all operations in transaction T2.
For simplicity we will typically ignore
the non-database operations…
Shorthand Notation for Schedules
Another Example
Sb: r1(x); w1(x); c1; r2(x); w2(x); c2
Order matter
Concurrent Schedule
Summary
ACID in details
Overview
- Atomicity
- Consistency
- Isolation
- Durability
A - Atomicity
A transaction is an atomic unit of processing
- An indivisible unit of execution
- Executed in its entirety or not at all
Deals with failure(“aborts”)
- User aborts transaction
- System aborts transaction
- Transaction aborts itself
- System crashes, network failure
Abort - an error prevented full execution
- we UNDO the work done up to the error point
- System re-creates the database state as it was before the start of the aborted transaction
Commit - no error, entire transaction executes
- The system is updated correctly
Atomicity is broken
C - Consistency
A correct execution of the transaction must take the database from one consistent state to another
- Weak version: Transactions may not violate constraints(tyoical definition)
- Strong version: It should correctly transfor the database to reflect the effect of a real world event(ISO/IEC 10026-1:1998)
Consistency is tricky
From definition:
A correct execution of the transaction must take the database from one consistent state to another
Transactions Preserce Consistency
I - Isolation
A schedule satisfies isolation iff it is serializable (I.e. like last slide: the effect of a serializable schedule is the same as some serial schedule)
- Note: consistency is not really defined by serializable
- A schedule is serializable iff it satisfies isolation
Weakening isolation
D - Durability
Once a transaction commits and changes the database, these changes cannot be lost because of subsequent failure
- Once a transaction commits and changes the database, these changes cannot be lost because of
subsequent failure - But could be ovwewritten by a later update
- We REDO the transaction if there are any problems after the update
- Durability deals with things like media failure
Some relational DBMS Components
Transaction - ACID Properties
Schedules that behaves nearly serial
Overview
how much concurrency can be allowed while still being close enough to serial schedule
Schedules - Two Extremes
Serial Schedules
- execute correctly
- maintain consistency of the database
- infficient in multi-user enviroment(because transactions have to wait until others are finished)
Concurrent Schedules(non-serial)
- may not execute correctly
- may not guarantee consistency of the database or isolation
- efficient in multi-user environments(if you get asked to do something, just do it)
A Serial Schedule
A Concurrent Schedule
Efficiency: Say that this table denotes arrival time of
the commands (and that each commands takes 1 time
unit)
Concurrent Schedule not always equilent to a serial Schedual
Basic Operation of Transactions
read(X): Reads a database item X into program variable(also named X, for simplicity)
- Find the address of the disk block(page) that contains item X
- Copy that disk block into buffer in main memory
- if that disk block is not already in some main memory b
- Copy item X from the buffer to the program variable X
write(X): Writes the value of program variable X into the database item named X
- Find the address of the disk block (page) that contains item X.
- Copy that disk block into a buffer in main memory
- if that disk block is not already in some main memory buffer.
- Copy item X from the program variable X into its correct location in the buffer
- Store the updated block from the buffer back to disk either immediately or at some later poin
Concurrent Schedules Do Not Guarantee Consistency or Isolation
Serializable Schedules
A schedule S is serializable if there is a serial schedule S’that has the same
effect as S on every initial database state.
Serializable schedules are essentially those schedules
that we are looking for!
Guarantee:
- Correctness and consistency(because serial schedules do)
- Does not satisfy isolation, but we could fix that
Problem: serializability is difficult to test
- Does not only depend on reads, writes, and commits, but also on the non-database operations
- Non-database operations can be complex
Conflict-serializable schedules
Overview
Serializable schedules are too complex, so what is DBMS doing instead?
Conflict-Serializability
Stronger form of serializability that is used/enforced by most commercial DBMS
Based on the notion of a conflict.
Conflict - Characterisation
A conflict in a schedule is a pair of operations
- from different transactions
- that access the same item
- and at least one of them is a write operation
Conflict-Serializability
Two schedules S and S’ are conflict-equivalent if S’ can be obtained from S by swapping any number of (1) consecutive (2) non-conflicting operations from (3) different transactions
A schedule is conflict-serialisable if it is conflict-equivalentto a serial schedule.
Schedules
Example
Summary
Recognizing a conflict-serializable schedule
Overview
recognize conflict-serializable
Precedence Graph
The precedence graph for a schedule S is defined as follows:
- It is a directed graph
- Its nodes are the transactions that occur in S.
- It has an edge from transaction Ti to transaction Tj if there is a conflicting pair of operations op1 and op2 in S such that
- op1 appears before op2 in S
- op1 belongs to transaction Ti
- op2 belongs to transaction Tj
Testing Conflict-Serializability
Implication of a cycle
Summary
T4
Locking conflict-serializable
Overview
how conflict-serializable schedules
Transaction Scheduling in a DBMS
Simple Locking Mechanism
A transaction has to lock an item before it accesses it.
Locks are requested from & granted by the scheduler:
- Each item is locked by at most one transaction at a time
- Transactions wait until a lock can be granted
Each lock has to be released(ublocked) eventually
Schedules With Simple Locks
Extend syntax for schedules by two operations:
- Ii(X): transaction i requests a lock for item X
- ui(x): transaction i unlocks item X
Example:
May Not Be Serializable
Two-Phase Locking(2PL)
Simple modification of the simple locking mechanism that guarantees conflict-serializability
Two-phase locking(2PL) condition
In each transaction, all lock operations precede all unlocks (所有锁定操作优先于解锁操作)
Test a transaction for 2PL
Find first unlock
- Phase 1 is up to, but not including, that unlock
- Phase 2 is from unlock until the end
Transaction is 2PL iff phase 2 contains no lock
Summary
2PL ensures conflict-serializability
Overview
2PL schedules are conflict-serializability
Two-Phase Locking(2PL)
Simple modification of the simple locking mechanism that guarantees conflicy-serializability
Two-phase locking(2PL) condition:
In each transaction, all lock operations precede all unlocks
2PL Ensures Conflict-Serializability
If S is a schedule containing only 2PL transactions, then S is conflict-serializable
Proof idea
More flexible locks
Overview
Still Some Issues
2PL ensures conflict-serializability, but might lead to
- Deadlocks: transactions might be forced to wait forever
- other issues
Risk of Deadlocks
Different lock modes make 2PL more flexible
Shared & Exclusive locks
Schedules With Shared/Exclusive Locks
Problems With “Upgrading” Locks
Update Locks to the Rescue
Update lock:
- Requested by transactions to read an item - Operation: u-lock(X) or uli(X)
- May be upgraded later to an exlusive lock(shared locks can no longer be upgraded)
- Granted to at most one transaction at a time
New upgrading policy:
Example 1: Avoiding the Deadlock
Example 2
Two-Phase Locking(2PL) with Shared/Exclusive/Update Locks
Straightforward generalisation:
In each transaction, all lock operations (i.e., shared, exclusive, or update lock requests) precede all unlocks
Locks With Miltiple Granularity
DBMS may use locks at different levels of granularity
- May lock relations
- May lock disk blocks
- May lock tuples
Examples:
1 | SELECT name FROM Student WHERE studentID = 123456; |
Trade-Offs
Locking at too coarse granularity:
- Low overhead (don’t need to store too much information)
- Less degree of concurrency: may cause unnecessary delays
Locking at too fine granularity:
- High overhead: need to keep track of all locked items
- High degree of concurrency: no unnecessary delays
Need to prevent issues such as the following
to guarantee (conflict-) serialisability:
- A transactions holds shared lock for a tuple.
- Another transaction holds exclusive lock for the relation.
Intention Locks
Policy for Granting Locks
Summary
Why Might a Transaction Abort
Some relation DBMS Components
Why Might a Transactino Abort
Errors while executing transactions
- Violation of integrity constraints, other run-time errors
Problem 1: Concurrency
Why Might a Transaction Abort
Deadlocks
Why Might a Transaction Abort
Beyond the DBMS’s Control
Summary
Undo Logging for database
- undo log是把所有没有COMMIT的事务回滚到事务开始前的状态,系统崩溃时,可能有些事务还没有COMMIT,在系统恢复时,这些没有COMMIT的事务就需要借助undo log来进行回滚
- 使用undo log时,要求:
- 记录修改日志时,(T,x,v)中v为x修改前的值,这样才能借助这条日志来回滚;
- 事务提交后,必须在事务的所有修改(包括记录的修改日志)都持久化后才能写COMMIT T日志;这样才能保证,宕机恢复时,已经COMMIT的事务的所有修改都已经持久化,不需要回滚。
- 使用undo log时事务执行顺序
- 记录START T
- 记录需要修改的记录的旧值(要求持久化)
- 根据事务的需要更新数据库(要求持久化)
- 记录COMMIT T
- 使用undo log进行宕机回滚
Logging in DBMS
Idea: write important activities to a log so that a desired database state can be recovered later
Examples of important activities:
- Starts of transactions, commits, aborts
- Modification of database items
Should work even in case of system failures!
Techniques:
- Undo logging (for maintaining Atomicity)
- Redo logging (for maintaining Durability)
- Combinations thereof (specifically Undo/Redo for doing both Atomicity and Durability)
Extension of Transaction Syntax
Undo Logging
Undo Logging: Procedure
- If transaction T updates item X and the old value was v, then<T, X, v> must be written to the log on the disk before X is written to disk.
- If transaction T commits, then
must be written to disk as soon as all database
elements changed by T have been written to disk
Example
Scenario 1
Scenario 2
Scenario 3
Recovery With Undo Logs
Example of Unndo Logging
If a Transaction Aborts
Summary
Other kinds of logging
Overview
other kinds of logging how to get atomicity and durability from logging
Redo Logging
redo log是指在回放日志的时候把已经COMMIT的事务重做一遍,对于没有commit的事务按照abort处理,不进行任何操作。
使用redo log时, 要求
- 记录redo log时,(T,x,v)中的v必须是x修改后的值,否则不能通过redo log来恢复已经COMMIT的事务。
- 写COMMIT T日志之前,事务的修改不能进行持久化,否则恢复时,对于未COMMIT的操作,可能有数据已经修改,但重放redo log不会对该事务做任何处理,从而不能保证事务的原子性。
使用redo log时事物执行顺序:
- 记录START T
- 记录事物需要修改记录的新值(要求持久化)
- 记录COMMIT T(需要持久化)
- 将事物相关的修改写入数据库
使用redo log重做事务
- 扫描日志, 找到所有commit事务
- 对已经commit的事物, 根据redo log重做事务