Atomic, SpinLock, Mutex, Thread_Lock, Which Is The Fastest One?

When we write a concurrent program, we have various methods to guarantee our program is linearizable and serializable, which means different threads will write or read a area of same memory or variable sequentially. But the efficiency of these methods is very different. Next I will introduce four methods: atomic, spin lock, mutex and thread lock, and compare these four methods to analysis which one is the fastest one (maybe is not the best one).

I use time ./exec to estimate the running time of programs.

Atomic

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#include <future>
#include <iostream>
#include <thread>
#include <atomic>
using namespace std;
atomic<int> counter(0);
int thread_num = 2;
const int limit_iter = 10000000;
void f() {
for (int i = 0; i < limit_iter; i ++ ) {
counter ++;
}
}
int main() {
thread a[thread_num];
for (int i = 0; i < thread_num; i ++ ) {
a[i] = thread(f);
}
for (int i = 0; i < thread_num; i ++ ) {
a[i].join();
}
cout << counter << endl;
}

Spin Lock

I use c++ atomic variable to implement a spin lock likely structure, because C++11 doesn’t provide a implementation of spin lock.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
#include <future>
#include <iostream>
#include <thread>
#include <atomic>
using namespace std;
atomic_flag locked = ATOMIC_FLAG_INIT;
inline void lock() {
locked.test_and_set(std::memory_order_acquire);
}
inline void unlock() {
locked.clear(memory_order_acquire);
}
int counter(0);
int thread_num = 2;
const int limit_iter = 10000000;
void f() {
for (int i = 0; i < limit_iter; i ++ ) {
lock();
counter ++;
unlock();
}
}
int main() {
thread a[thread_num];
for (int i = 0; i < thread_num; i ++ ) {
a[i] = thread(f);
}
for (int i = 0; i < thread_num; i ++ ) {
a[i].join();
}
cout << counter << endl;
}

Mutex

I use c++11 mutex, not the mutex in unix API, to doing test.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#include <future>
#include <iostream>
#include <thread>
#include <atomic>
using namespace std;
int counter(0);
mutex mu;
int thread_num = 2;
const int limit_iter = 10000000;
void f() {
for (int i = 0; i < limit_iter; i ++ ) {
// lock_guard<mutex> lock(mu); // Or this line can provide the same effect
mu.lock();
counter ++;
mu.unlock();
}
}
int main() {
thread a[thread_num];
for (int i = 0; i < thread_num; i ++ ) {
a[i] = thread(f);
}
for (int i = 0; i < thread_num; i ++ ) {
a[i].join();
}
cout << counter << endl;
}

thread lock

There is not existing a thread lock method in C++11, so I use pthread_mutex_lock and pthread_mutex_unlock in pthread lib to instead. But thread library in C++11 is based on pthread, so I think this is a equally and right testing.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
#include <future>
#include <iostream>
#include <thread>
#include <atomic>
#include <pthread.h>
#include <unistd.h>
using namespace std;
atomic<int> counter(0);
int thread_num = 2;
const int limit_iter = 10000000;
pthread_mutex_t locked;
void f() {
for (int i = 0; i < limit_iter; i ++ ) {
pthread_mutex_lock(&locked);
counter ++;
pthread_mutex_unlock(&locked);
}
}
int main() {
pthread_mutex_init(&locked, NULL);
thread a[thread_num];
for (int i = 0; i < thread_num; i ++ ) {
a[i] = thread(f);
}
for (int i = 0; i < thread_num; i ++ ) {
a[i].join();
}
cout << counter << endl;
}

evaluation

I use two threads in one process, and every thread do 10M times of increment operations.

methods time
without any lock 0.24s
atomic 1.08s
spin lock 2.90s
mutex 5.82s
thread lock 6.81s
  • In a nutshell, making variable becoming atomic is the fastest one.

  • This is the link of all codes.