This is a summary about a paper and a related talk in Sigmod 17
, and this paper concludes the three main ways to make use of the special features in NVM when build up a DBMS.
NVM properties:
- Byte addressable: Loads and stores un`like SSD/HDD
- High random write throughput: Orders of magnitude higher than SSD/HDD, Smaller gap between sequential & random write throughput.
- Read-write asymmetry & wear-leveling: Writes might take longer to complete compared to reads. Excessive writes to a single NVM cell can destroy it.
Evaluation NVM
- => HDD/SSD/Emulated NVM.
- Tune DRAM latency to emulate NVM (increasing latency)
- CLWB (Cache-line write back) instruction
Three levels in NVM DBMS:
1) Access Interfaces,
2) Storage Manager
3) Execution Engine.
Allocation interface
Standard functions
- Dynamic memory allocation
- Meta-data management
- Durability mechanism
- Two-step implementation
- Allocation first by using CLWB
- Issues a SFENCE to make sure changes are visible
- Two-step implementation
- Naming mechanism
- Map NVM to a well know base address,
Absolute pointer = Base address + Relative pointer
- Map NVM to a well know base address,
- Recovery mechanism
- Consider a crashing before linking the chunk to DS which occur to losing of this chunk (neither allocator nor application can reclaim the space).
File system interface
- Device -> page cache -> app buffer => Device -> app buffer
- 64 bytes cache line atomic
Storage Manager
- Logging & Recovery
- Write ahead log (
LET’S TALK ABOUT STORAGE AND RECOVERY METHODS FOR NON-VOLATILE MEMORY DATABASE SYSTEMS SIGMOD' 2015
)- Avoid duplicating data in the log and the checkpoint
- logging the meta-data not the data updates
WAL + FS API
vs.WA(meta-data)L + Allocation API
- Avoid duplicating data in the log and the checkpoint
- Write behind log (
WRITE-BEHIND LOGGING VLDB' 17
)- pros
- Write-behind logging enables instant recovery
- Improves device utilization by reducing data duplication
- Extends the device lifetime
- pros
- Write ahead log (
- Data placement
- Three-tiers: DRAM (meta-data + data) + NVM (log + data) + SSD (data)
- DRAM -> hot data (cache), NVM -> warm data, SSD -> cold data
Access Methods
- Read-write asymmetry & wear-leveling
– Writes might take longer to complete compared to reads
– Excessive writes to a single NVM cell can destroy it - Write-limited access methods
- NVM-aware B+Tree, hash table
- NVM-aware B+Tree:
- In leaf node, using linear scan (unsorted data) with fewer writes instead of binary search (sorted data) with more writes.
- Using unbalanced tree (fewer writes)
- NVM-aware B+Tree:
- NVM-aware B+Tree, hash table
Execution engine
- Plan executor
- reduce the number of writes (Sorting algorithm and join algorithm) by using 1) adjustment of write-intensivity knob and 2) write-limited algorithms.
- Hybrid join/sort
- Segment sort => adjust x (x% merge sort + (1-x)% selection sort)
- reduce the number of writes (Sorting algorithm and join algorithm) by using 1) adjustment of write-intensivity knob and 2) write-limited algorithms.