Recovery Algorithm
- Today: ensure DBMS can recover from a failure (during normal TX processing)
- Idea: Recover to a state with ACID guarantee
Classifications
- Storage Types: Volatile Storage (DRAM/SRAM) => Non-volatile Storage (HDD, SSD, NVM) => Stable Storage (multiple storage devices)
- Failure Types: Transaction / System / Storage Media
- Transaction Errors: Logical errors / Internal State Errors
- System Errors: Software (Exceptions, e.g., divide by zero) / hardware (Power down)
- Fail-Stop Assumption: Non-volatile Storage will not be corrupted
- Storage Media Failure: Game Over! WTF
Use volatile memory for faster access:
- Procedures (Buffer Pool):
- Copy target record into memory
- Perform the writes in memory
- Write Dirty records back to disks
- Issue: There are two transactions: T1, T2. T1 is still uncommitted but T2 is already committed. At the same time, the dirty records is written back to non-volatile device. T1 needs to roll back. What’s happened?
- Steal + Force:
- Never have to undo changes of an aborted TX because the changes were not written to disk
- Never have to redo changes of a committed TX because all the changes are guaranteed to be written to disk at commit time
- Steal + Force:
Shadow Paging
- Automatically switch roles between master and shadow
- Undo: Remove the shadow pages, switch to the master
- Cons:
- Copying the entire page table is expensive (a index structure like B+Tree, Path copying)
- Commit Overhead is high
WAL (Write Ahead Log)
- Format
<Tid, Object Id, Previous Value, Current Value>
(Undo + Redo) - Trade-off:
- When should we write log entries to disk?
- When should we write dirty records to disk?
Log Styles
- Physical Log: Record the data changes in the database
- Logical Log: Higher level operations (e.g., UPDATE, INSERT)
- Cons: higher time costing to recover
- Pros: Less space needed
Checkpoint
- Actions:
- Output onto stable storage all log records currently residing in main memory
- Output to the disk all modified blocks
- Write a
<CHECKPOINT>
entry to the log and flush to stable storage
- Recovery: Redo committed TXs after checkpoint + Undo uncommitted TXs after checkpoint