Recovery Algorithm

Classifications

Storage Types: Volatile Storage (DRAM/SRAM) => Non-volatile Storage (HDD, SSD, NVM) => Stable Storage (multiple storage devices)
Failure Types: Transaction / System / Storage Media
- Transaction Errors: Logical errors / Internal State Errors
- System Errors: Software (Exceptions, e.g., divide by zero) / hardware (Power down)
- Fail-Stop Assumption: Non-volatile Storage will not be corrupted
- Storage Media Failure: Game Over! WTF

Procedures (Buffer Pool):
1. Copy target record into memory
2. Perform the writes in memory
3. Write Dirty records back to disks
Issue: There are two transactions: T1, T2. T1 is still uncommitted but T2 is already committed. At the same time, the dirty records is written back to non-volatile device. T1 needs to roll back. What’s happened?
- Steal + Force:
  1. Never have to undo changes of an aborted TX because the changes were not written to disk
  2. Never have to redo changes of a committed TX because all the changes are guaranteed to be written to disk at commit time

Automatically switch roles between master and shadow
Undo: Remove the shadow pages, switch to the master
Cons:
1. Copying the entire page table is expensive (a index structure like B+Tree, Path copying)
2. Commit Overhead is high

Format <Tid, Object Id, Previous Value, Current Value> (Undo + Redo)
Trade-off:
1. When should we write log entries to disk?
2. When should we write dirty records to disk?

Physical Log: Record the data changes in the database
Logical Log: Higher level operations (e.g., UPDATE, INSERT)
- Cons: higher time costing to recover
- Pros: Less space needed

Actions:
1. Output onto stable storage all log records currently residing in main memory
2. Output to the disk all modified blocks
3. Write a <CHECKPOINT> entry to the log and flush to stable storage
Recovery: Redo committed TXs after checkpoint + Undo uncommitted TXs after checkpoint