Replace Components in Libcds
Build up the source trees
TODO
With the apt-get tool in Ubuntu, C++17 cannot be used in the default environment; use update-alternatives to switch the new compiler version on.
Add my_test/, ctags, and 'CMakeFiles/' to .gitignore to ignore some unnecessary files.
Create a my_test directory, run cmake in this directory, and indicate the paths of hwloc and libboost.
Run make -j16 to build and generate the static library with CMake (add the -DCDS_DISABLE_128BIT_ATOMIC flag for 64-bit building).
In libcds, the core functions are all in the cds directory.
In algo, there are some basic algorithms used in the data structures, like flat combining (an OCC algorithm).
In most cases, libcds uses RCU (read-copy-update) or HP (hazard pointers) to handle concurrency. In summary, the main types of containers include MichaelList, the Feldman method, and so on. Run make test to show the benchmarks.
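As a quick orientation to how these containers are used in practice, here is a minimal usage sketch following the initialization pattern from the libcds documentation. I use the HP-based MSQueue as the example container; exact headers and method names may differ slightly between libcds versions.

```cpp
#include <cds/init.h>               // cds::Initialize, cds::Terminate
#include <cds/gc/hp.h>              // Hazard Pointer memory reclamation
#include <cds/container/msqueue.h>  // Michael-Scott lock-free queue

#include <iostream>

int main()
{
    cds::Initialize();                               // initialize the library
    {
        cds::gc::HP hpGC;                            // initialize the Hazard Pointer singleton
        cds::threading::Manager::attachThread();     // attach this thread to libcds

        // A lock-free queue protected by Hazard Pointers
        cds::container::MSQueue<cds::gc::HP, int> queue;
        queue.enqueue(42);

        int value = 0;
        if (queue.dequeue(value))
            std::cout << "dequeued " << value << '\n';
    }                                                // Hazard Pointer singleton destroyed here
    cds::Terminate();                                // terminate the library
    return 0;
}
```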
See also the related discussion on the libcds github page: https://github.com/khizmax/libcds/issues/121/

I am glad to see that my proposal was accepted by the STE||AR Group, and I am eager to make contributions to this community. To push myself to focus on my GSoC project, I have decided to write a series of articles giving a brief introduction to my project.
Firstly, I want to give my understanding of HPX. HPX is a library that provides everything you want in HPC (high performance computing). HPX includes the most commonly used parallel algorithms and containers. In particular, some standard STL containers are available in HPX.
STL containers such as vectors/maps/sets/etc. are not thread safe. One cannot safely add or remove elements from one of these containers in one thread while iterating or adding/removing in another thread without potentially catastrophic consequences (usually segmentation faults leading to eventual program failure). So these containers must support concurrent access. Some work has begun on implementing concurrent structures in HPX: a concurrent unordered map with a reader/writer lock and a partial implementation of a concurrent vector exist, but they have not all been completed, do not have unit tests, and need to be unified into an hpx::concurrent namespace. On the other hand, these lock-based implementations are inefficient, and we hope to make use of lock-free techniques to complete our libraries.
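To make the reader/writer-lock approach concrete, here is a minimal sketch of an unordered map wrapped with std::shared_mutex. The class name and interface are invented for illustration; this is not HPX's actual hpx::concurrent implementation.

```cpp
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Illustrative reader/writer-locked map: many concurrent readers, one writer.
template <typename Key, typename Value>
class rw_locked_unordered_map {
    mutable std::shared_mutex mtx_;
    std::unordered_map<Key, Value> map_;

public:
    void insert_or_assign(Key const& key, Value value) {
        std::unique_lock<std::shared_mutex> lock(mtx_);   // exclusive: writers serialize
        map_.insert_or_assign(key, std::move(value));
    }

    std::optional<Value> find(Key const& key) const {
        std::shared_lock<std::shared_mutex> lock(mtx_);   // shared: readers run in parallel
        auto it = map_.find(key);
        if (it == map_.end())
            return std::nullopt;
        return it->second;
    }

    bool erase(Key const& key) {
        std::unique_lock<std::shared_mutex> lock(mtx_);
        return map_.erase(key) != 0;
    }
};

int main() {
    rw_locked_unordered_map<std::string, int> m;
    m.insert_or_assign("answer", 42);
    auto v = m.find("answer");   // safe even if other threads call insert_or_assign
    return v ? 0 : 1;
}
```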
A number of libraries implementing thread-safe (sometimes lock-free) containers already exist that can be used for ideas, and where code uses a Boost-compatible license it can be integrated into HPX. The aim of the project is to collect as much information and as many implementations of thread-safe containers as possible, and to create or integrate them into the HPX library.
Fortunately, we found an open source library named libcds. Moreover, we got authorization to migrate these containers into our namespace.
libcds is a data structure library including queues, stacks, priority queues, binary trees, lists, hash tables, hash maps, etc. All these data structures are lock-free and have more than one implementation using different algorithms.
The original article is here
As we know, there are two kinds of workloads in databases: one is write-heavy and the other is read-heavy. For a read-heavy workload, using a B+Tree/B-Tree brings read optimization, but modification operations (update, insert, or delete) are expensive because we need to merge/split nodes or move items within nodes. A B-Tree is 1) sorted, 2) self-balancing, 3) logarithmic in lookup time, and 4) mutable. For a write-heavy workload, the LSM-Tree (log-structured merge-tree) is an immutable, disk-resident, write-optimized data structure. It transforms most modification operations into log entries and puts them into SSTables (Sorted String Tables). The upper layer of an LSM-Tree is an in-memory skip list or binary search tree that accelerates read operations. When its size reaches the threshold, this data structure is flushed to disk with compaction. An LSM-Tree is good for point-query and scan operations, but it is not friendly to find operations.
In order to batch these operations together and write them to disk with one IO operation, we need to use a WAL (write-ahead log) to guarantee atomicity and durability.
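As a toy illustration of this write path (append to the WAL, apply to the in-memory memtable, flush to a sorted file when a threshold is reached), here is a deliberately simplified sketch. The class, file names, and threshold are made up; a real system would use fsync, compaction, and WAL truncation.

```cpp
#include <fstream>
#include <map>
#include <string>

// Simplified LSM-style write path: WAL first, then memtable, then flush.
class TinyLSM {
    std::ofstream wal_{"wal.log", std::ios::app};   // write-ahead log for durability
    std::map<std::string, std::string> memtable_;   // sorted in-memory table
    std::size_t flush_threshold_;
    int sstable_id_ = 0;

public:
    explicit TinyLSM(std::size_t flush_threshold = 1024)
        : flush_threshold_(flush_threshold) {}

    void put(const std::string& key, const std::string& value) {
        wal_ << key << '\t' << value << '\n';        // 1. log the modification
        wal_.flush();                                //    (real systems call fsync here)
        memtable_[key] = value;                      // 2. apply it to the memtable
        if (memtable_.size() >= flush_threshold_)    // 3. flush when the threshold is hit
            flush();
    }

    void flush() {
        std::ofstream sst("sstable_" + std::to_string(sstable_id_++) + ".txt");
        for (const auto& [key, value] : memtable_)   // keys are already sorted
            sst << key << '\t' << value << '\n';
        memtable_.clear();                           // real systems also truncate the WAL
    }
};
```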
In conclusion, there is no convenient way to achieve high performance for both write and read operations, so we need to choose our data structures carefully.
A good allocator should:
A good NVM allocator should:
nvm_malloc (VLDB '16)
Makalu (OOPSLA '16) (based on the MAP_FIXED flag of mmap)
NVML (Intel Pmem)
PAllocator
There are several analysis tools that measure our programs and give suggestions to improve performance. Here we will introduce two commonly used tools: gprof and valgrind.
Running a program compiled with -pg generates gmon.out, and we can then run the profiling tool to analyze the time occupied by different functions:
Add -pg to your compile flags (e.g. in your CMake file or Makefile).
Run your executable exe to produce gmon.out.
Run gprof exe on the command line.
Use gprof2dot to visualize your analysis results. You can use gprof exe | gprof2dot | dot -Tpdf -o result.pdf to generate a pdf file to display the results. gprof2dot is a python tool, and you can use pip to install it.
For valgrind, run valgrind --tool=callgrind ./cpuload.
Using these out-of-the-box tools helps hint at where to optimize our code. Besides, other performance libraries like libpapi require us to insert code into our program, which is not good for programmability. For me, I prefer to use these profiling tools to analyze my code.
A good paper will:
Rejection. Why?
Innovation
Experiments cannot convince readers or need more experiments
Motivation: why?
Discussion or Conclusion is not rigorous
Fluent writing
Abundant figures and tables
Tips
A read-write lock is a synchronization technique for the multiple-readers and writers problem. There are three priority policies: read-preferring RW locks, write-preferring RW locks, and RW locks with unspecified priority.
There are some libraries that provide read-write lock functions, such as:
pthread_rwlock_t (this RW lock provides different priorities)
std::shared_mutex in C++17
boost::shared_mutex and boost::upgrade_mutex in the Boost library
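As a small usage sketch of the first option, pthread_rwlock_t (error handling is omitted, and the thread setup is only illustrative):

```cpp
#include <pthread.h>
#include <cstdio>

static pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
static int shared_value = 0;

void* reader(void*) {
    pthread_rwlock_rdlock(&rwlock);      // many readers may hold the lock at once
    std::printf("read %d\n", shared_value);
    pthread_rwlock_unlock(&rwlock);
    return nullptr;
}

void* writer(void*) {
    pthread_rwlock_wrlock(&rwlock);      // writers get exclusive access
    ++shared_value;
    pthread_rwlock_unlock(&rwlock);
    return nullptr;
}

int main() {
    pthread_t w, r;
    pthread_create(&w, nullptr, writer, nullptr);
    pthread_create(&r, nullptr, reader, nullptr);
    pthread_join(w, nullptr);
    pthread_join(r, nullptr);
    return 0;
}
```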
Read-copy-update (a.k.a. RCU) is another method to solve the multiple-readers and writers problem. Another solution is the seqlock, for cases with fewer readers.
A simple pseudo implementation is sketched below:
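Since the original code block is not reproduced here, the following is a minimal sketch of such an implementation, built from std::atomic<bool> variables used as spinlocks plus a reader counter; the class and member names are my own.

```cpp
#include <atomic>

// Read-preferring reader-writer lock: the first reader acquires the write
// gate, the last reader releases it, and writers take the gate exclusively.
class simple_rw_lock {
    std::atomic<bool> counter_lock_{false};  // spinlock protecting readers_
    std::atomic<bool> write_gate_{false};    // held by a writer or by the reader group
    int readers_ = 0;

    static void spin_acquire(std::atomic<bool>& flag) {
        bool expected = false;
        while (!flag.compare_exchange_weak(expected, true, std::memory_order_acquire))
            expected = false;                // reset and retry (busy wait)
    }
    static void spin_release(std::atomic<bool>& flag) {
        flag.store(false, std::memory_order_release);
    }

public:
    void read_lock() {
        spin_acquire(counter_lock_);
        if (++readers_ == 1)                 // first reader blocks writers
            spin_acquire(write_gate_);
        spin_release(counter_lock_);
    }
    void read_unlock() {
        spin_acquire(counter_lock_);
        if (--readers_ == 0)                 // last reader lets writers in again
            spin_release(write_gate_);
        spin_release(counter_lock_);
    }
    void write_lock()   { spin_acquire(write_gate_); }
    void write_unlock() { spin_release(write_gate_); }
};
```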
For more details about an implementation in C++11 (using an atomic variable as a spinlock), please see this.
By the way, this simple RW lock implementation does not guarantee that the writer has a higher priority.
Transactional memory is a concurrency mechanism which provides atomicity and isolation compared with normal memory accesses. Transactional memory is easy to use, and it simplifies concurrent programming without locks or other extra concurrency control methods.
C++ has a Transactional Memory TS (technical specification) that moves this feature towards becoming a standard part of the language. I think this is the inevitable trend given the rapid development of multi-core programming.
In gcc-7, programmers can use more than three ways to employ transactional memory, and I will introduce how to write code that takes advantage of it.
Below is an example using the synchronized keyword to achieve synchronization among multiple threads.
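A minimal sketch of such an example (the thread count and messages are illustrative); with gcc it should be built with transactional memory enabled, e.g. g++ -fgnu-tm -pthread example.cpp:

```cpp
#include <iostream>
#include <thread>
#include <vector>

int i = 0;   // shared counter protected by synchronized blocks

void work(int id)
{
    for (int n = 0; n < 3; ++n) {
        synchronized {                       // executes as if under a single global lock
            ++i;
            std::cout << "thread " << id << " sees i = " << i << '\n';
        }
    }
}

int main()
{
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back(work, t);
    for (auto& th : threads)
        th.join();
    return 0;
}
```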
The output on the screen shows that only one thread can access the variable i at any time.