Replace Components in Libcds
Build up the source trees
TODO
- Install the toolchain with the apt-get tool in Ubuntu. If C++17 cannot be used in the environment, use update-alternatives to switch a newer compiler version on.
- Add my_test/, ctags, and CMakeFiles/ to .gitignore to ignore unnecessary files.
- Create my_test, run cmake in this directory, and indicate the paths of hwloc and libboost.
- Run make -j16 to build everything and generate the static library.
- In CMake, add the -DCDS_DISABLE_128BIT_ATOMIC flag for 64-bit building.

The core functions of libcds are all in the cds directory.
In algo, there are some basic algorithms used by the data structures, such as flat combining (a combining-based synchronization technique).
In most cases, libcds uses RCU (read-copy-update) or HP (hazard pointers) to handle concurrency. The main types of container include:
MichaelList, the Feldman method, and so on. Run make test to show the benchmarks. See also the libcds GitHub homepage (https://github.com/khizmax/libcds/issues/121/).

I am glad that my proposal was accepted by the STE||AR Group, and I am eager to contribute to this community. To keep myself focused on my GSoC project, I have decided to write a series of articles giving a brief introduction to it.
First, I want to give my understanding of HPX. HPX is a library that provides everything you want for HPC (high-performance computing). HPX includes the most commonly used parallel algorithms and containers; in particular, some standard STL containers are available in HPX.
STL containers such as vectors, maps, and sets are not thread safe. One cannot safely add or remove elements from one of these containers in one thread while iterating, adding, or removing in another thread without potentially catastrophic consequences (usually segmentation faults leading to eventual program failure). So these containers must support concurrent access. Some work has begun on implementing concurrent structures in HPX: a concurrent unordered map with a reader/writer lock and a partial implementation of a concurrent vector exist, but they have not all been completed, do not have unit tests, and need to be unified into an hpx::concurrent namespace. On the other hand, these lock-based implementations are inefficient; we hope to make use of lock-free techniques to complete our libraries.
A number of libraries implementing thread-safe (sometimes lock-free) containers already exist that can be used for ideas, and where the code uses a Boost-compatible license it can be integrated into HPX. The aim of the project is to collect as much information and as many implementations of thread-safe containers as possible, and to create or integrate them into the HPX library.
Fortunately, we found an open-source library named libcds. Moreover, we obtained authorization to migrate its containers into our namespace.
libcds is a data structure library including queues, stacks, priority queues, binary trees, lists, hash tables, hash maps, etc. All these data structures are lock-free, and each has more than one implementation using different algorithms.
The original article is here
As we know, there are two kinds of workload in a database: write-heavy and read-heavy. With a read-heavy workload, using a B+Tree/B-Tree gives good read performance, but modification operations (update, insert, or delete) are expensive because we need to merge or split nodes, or move items within nodes. A B-Tree is 1) sorted, 2) self-balancing, 3) logarithmic in lookup time, and 4) mutable. With a write-heavy workload, the LSM-Tree (log-structured merge-tree) is an immutable, disk-resident, write-optimized data structure. It transforms most modification operations into log entries and puts them into SSTables (Sorted String Tables). The upper layer of an LSM-Tree is an in-memory skip list or binary search tree that accelerates read operations; when its size reaches a threshold, this structure is flushed to disk with compaction. An LSM-Tree is good for writes and scan operations, but less friendly to point lookups, since a read may need to check several levels.
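The in-memory layer described above can be sketched as a sorted map that is flushed into an immutable sorted run once it reaches a size threshold. The class and names below are illustrative only (the runs stand in for on-disk SSTables), a minimal sketch rather than a real LSM-Tree:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sketch of an LSM-style memtable: writes go into a sorted in-memory map;
// once the map reaches a size threshold it is flushed as one immutable,
// sorted run (standing in for an SSTable on disk).
class MemTable {
    std::map<std::string, std::string> table_;  // in-memory sorted layer
    std::vector<std::vector<std::pair<std::string, std::string>>> runs_; // flushed runs
    std::size_t threshold_;
public:
    explicit MemTable(std::size_t threshold) : threshold_(threshold) {}

    void put(const std::string& key, const std::string& value) {
        table_[key] = value;                    // an update is just an insert
        if (table_.size() >= threshold_) flush();
    }

    // A read consults the memtable first, then the runs from newest to oldest.
    bool get(const std::string& key, std::string& out) const {
        auto it = table_.find(key);
        if (it != table_.end()) { out = it->second; return true; }
        for (auto r = runs_.rbegin(); r != runs_.rend(); ++r)
            for (const auto& kv : *r)
                if (kv.first == key) { out = kv.second; return true; }
        return false;
    }

    void flush() {                              // emit one sorted, immutable run
        runs_.emplace_back(table_.begin(), table_.end());
        table_.clear();
    }
};
```

This also shows why the read path is the weak spot: a point lookup may have to search several runs, while a write only touches the in-memory map.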
In order to batch these operations together and write them to disk with one IO operation, we need a WAL (write-ahead log) to guarantee atomicity and durability.
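The WAL idea can be sketched in a few lines: append a log record before applying the change, so that replaying the surviving log reproduces every acknowledged update. This is a minimal in-memory model with hypothetical names, not a real durable log (a real one would fsync each append):

```cpp
#include <map>
#include <string>
#include <vector>

// Minimal write-ahead-log model: every update is first appended to the log,
// then applied to the table. Recovery replays the log from the start, so any
// update whose log record survived a crash is durable.
struct LogRecord { std::string key, value; };

class WalStore {
public:
    std::vector<LogRecord> log;                 // stands in for an on-disk log file
    std::map<std::string, std::string> table;   // the actual data structure

    void put(const std::string& k, const std::string& v) {
        log.push_back({k, v});                  // 1) append to the WAL first
        table[k] = v;                           // 2) then apply to the table
    }

    // Rebuild the table by replaying an old log, e.g. after a crash.
    static WalStore recover(const std::vector<LogRecord>& old_log) {
        WalStore s;
        for (const auto& r : old_log) s.put(r.key, r.value);
        return s;
    }
};
```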
In conclusion, it is hard to achieve high performance for both write and read operations at once, so we need to choose our data structures carefully.
A good allocator should:
A good NVM allocator should:
- nvm_malloc (VLDB '16)
- Makalu (OOPSLA '16) (MAP_FIXED flag of mmap)
- NVML (Intel Pmem)
- PAllocator

There are several analysis tools to measure our program and give suggestions to improve performance.
Now we will introduce two commonly used tools: gprof and valgrind.
For gprof:

- Add -pg to your compile flags (e.g. in your CMake or Makefile configuration) and rebuild the executable exe.
- Run the program once; it generates gmon.out.
- Run gprof exe on the command line, and the profiling tool analyzes the time occupied by the different functions.
- You can use gprof2dot to visualize your results: gprof exe | gprof2dot | dot -Tpdf -o result.pdf generates a PDF file displaying them. gprof2dot is a Python tool, and you can use pip to install it.

For valgrind, run valgrind --tool=callgrind ./cpuload.

Using these out-of-the-box tools gives us good hints for optimizing our code. By contrast, other performance libraries like libpapi require us to insert code into our program, which is not good for programmability. For me, I prefer these profiling tools to analyze my code.
A good paper will have fluent writing and abundant figures and tables.

Rejection. Why?

- Lack of innovation
- Experiments cannot convince readers, or more experiments are needed
- Motivation: why?
- The discussion or conclusion is not rigorous
Tips
Reference:
A read-write lock is a synchronization mechanism for the multiple-readers/single-writer problem. There are three priority policies: read-preferring RW locks, write-preferring RW locks, and RW locks with unspecified priority.
There are some libraries that provide read-write lock functionality, such as:

- pthread_rwlock_t (this RW lock provides different priorities)
- std::shared_mutex in C++17
- boost::shared_mutex and boost::upgrade_mutex in the Boost library

Read-copy-update (a.k.a. RCU) is another method for the multiple-readers/writer problem. Another solution is the seqlock, for cases with fewer readers.
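For example, with C++17's std::shared_mutex, many readers may hold the lock concurrently while a writer gets exclusive access. A minimal sketch (the Registry class is illustrative, not from any of the libraries above):

```cpp
#include <map>
#include <mutex>          // std::unique_lock
#include <shared_mutex>   // std::shared_mutex, std::shared_lock (C++17)
#include <string>

// A map protected by a read-write lock: lookups take a shared lock and can
// run concurrently with each other; updates take an exclusive lock.
class Registry {
    mutable std::shared_mutex mtx_;
    std::map<std::string, int> data_;
public:
    void set(const std::string& k, int v) {
        std::unique_lock<std::shared_mutex> w(mtx_);   // exclusive: one writer
        data_[k] = v;
    }
    bool get(const std::string& k, int& out) const {
        std::shared_lock<std::shared_mutex> r(mtx_);   // shared: many readers
        auto it = data_.find(k);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }
};
```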
A simple pseudo-implementation:
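As a sketch, a read-preferring RW lock can be built from an atomic flag used as the writer's spinlock plus a reader counter. This is my illustrative C++11-style version, not production code (busy-waiting, no fairness):

```cpp
#include <atomic>

// Read-preferring RW lock sketch: an atomic bool acts as the writer's
// spinlock, and an atomic counter tracks active readers. Note this does NOT
// give writers priority: a steady stream of readers can starve a writer.
class RWLock {
    std::atomic<bool> writer{false};  // spinlock held by the single writer
    std::atomic<int>  readers{0};     // number of active readers
public:
    void read_lock() {
        for (;;) {
            while (writer.load(std::memory_order_acquire)) { }  // wait out writer
            readers.fetch_add(1, std::memory_order_acquire);
            if (!writer.load(std::memory_order_acquire)) break; // no writer raced in
            readers.fetch_sub(1, std::memory_order_release);    // back off, retry
        }
    }
    void read_unlock() { readers.fetch_sub(1, std::memory_order_release); }

    void write_lock() {
        bool expected = false;        // spin until we own the writer flag
        while (!writer.compare_exchange_weak(expected, true,
                                             std::memory_order_acquire))
            expected = false;
        while (readers.load(std::memory_order_acquire) > 0) { } // drain readers
    }
    void write_unlock() { writer.store(false, std::memory_order_release); }
};
```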
For more details about an implementation in C++11 (using an atomic variable as a spinlock), please see this.
BTW, this simple RW-lock implementation does not guarantee that writers have higher priority.
Transactional memory is a concurrency mechanism that provides atomicity and isolation compared with normal memory accesses. Transactional memory is easy to use, and it simplifies concurrent programming without locks or other extra concurrency-control machinery.
C++ has specified transactional memory in the Transactional Memory TS, a step toward making it part of the standard. I think this is an inevitable trend with the rapid development of multi-core programming.
In gcc-7, programmers have more than three ways to use transactional memory, and I will introduce how to write code that takes advantage of it.
Below is an example using the synchronized keyword to synchronize multiple threads.
The output on the screen looks like this:

Only one thread can access the variable i at any time.