How to Build a Non-Volatile Memory DBMS (Survey)

This is a summary about a paper and a related talk in Sigmod 17, and this paper concludes the three main ways to make use of the special features in NVM when build up a DBMS.


NVM properties:

    1. Byte addressable: Loads and stores un`like SSD/HDD
    2. High random write throughput: Orders of magnitude higher than SSD/HDD, Smaller gap between sequential & random write throughput.
    3. Read-write asymmetry & wear-leveling: Writes might take longer to complete compared to reads. Excessive writes to a single NVM cell can destroy it.

Evaluation NVM

  • => HDD/SSD/Emulated NVM.
    • Tune DRAM latency to emulate NVM (increasing latency)
    • CLWB (Cache-line write back) instruction

Three levels in NVM DBMS:

1) Access Interfaces,
2) Storage Manager
3) Execution Engine.

Allocation interface

  • Standard functions

    • Dynamic memory allocation
    • Meta-data management
    • Durability mechanism
      • Two-step implementation
        • Allocation first by using CLWB
        • Issues a SFENCE to make sure changes are visible
    • Naming mechanism
      • Map NVM to a well know base address, Absolute pointer = Base address + Relative pointer
    • Recovery mechanism
      • Consider a crashing before linking the chunk to DS which occur to losing of this chunk (neither allocator nor application can reclaim the space).
  • File system interface

    • Device -> page cache -> app buffer => Device -> app buffer
    • 64 bytes cache line atomic

Storage Manager

  • Logging & Recovery
    • Write ahead log (LET’S TALK ABOUT STORAGE AND RECOVERY METHODS FOR NON-VOLATILE MEMORY DATABASE SYSTEMS SIGMOD' 2015)
      • Avoid duplicating data in the log and the checkpoint
        • logging the meta-data not the data updates
        • WAL + FS API vs. WA(meta-data)L + Allocation API
    • Write behind log (WRITE-BEHIND LOGGING VLDB' 17)
      • pros
        1. Write-behind logging enables instant recovery
        2. Improves device utilization by reducing data duplication
        3. Extends the device lifetime
  • Data placement
    • Three-tiers: DRAM (meta-data + data) + NVM (log + data) + SSD (data)
    • DRAM -> hot data (cache), NVM -> warm data, SSD -> cold data

Access Methods

  • Read-write asymmetry & wear-leveling
    – Writes might take longer to complete compared to reads
    – Excessive writes to a single NVM cell can destroy it
  • Write-limited access methods
    • NVM-aware B+Tree, hash table
      • NVM-aware B+Tree:
        • In leaf node, using linear scan (unsorted data) with fewer writes instead of binary search (sorted data) with more writes.
        • Using unbalanced tree (fewer writes)

Execution engine

  • Plan executor
    • reduce the number of writes (Sorting algorithm and join algorithm) by using 1) adjustment of write-intensivity knob and 2) write-limited algorithms.
      • Hybrid join/sort
      • Segment sort => adjust x (x% merge sort + (1-x)% selection sort)