Algorithms Behind Modern Storage Systems [Reproduced]

The original article is here

A brief introduction:

As we know, there are two kinds of workloads in database: one is write-heavy and the other is read-heavy. With read-heavy workload, using B+Tree/B-Tree will bring with read optimization. But it will be expensive when doing modification operation (update, insert or deletion) due to we need to merge/split nodes or move items in nodes. B-Tree is 1) Sorted, 2) Self-balancing, 3) Logarithmic lookup time and 4) Mutable. With write-heavy workload, LSM-Trees (log-structured merge-tree) is an immutable disk-resident write-optimized data structure. It will transform most of the modification operations to a log entry and put it to SSTable (Sorted String Tables). The upper layer of LSM-Tree is an in-memory skip-list or binary search tree to accelerate read operations. When its size reaches to the threshold, this data structure will be flushed to disk with compaction. LSM-Tree will be good for point-query and scan operation, but will not be friendly to find operation.

In order to batch these operation together for writing to disk with one IO operation, we need to use WAL (write ahead log) to guarantee atomicity and durability.

In conclusion, there is no convenient to achieve higher performance for both write and read operations, so we need to choose our data structure carefully.