Commit 97a89fe

doc: Update benchmark result
Signed-off-by: LIU Hao <lh_mouse@126.com>
1 parent a525b4a commit 97a89fe

4 files changed: 15 additions, 59 deletions

README.md

Lines changed: 8 additions & 53 deletions
@@ -10,13 +10,19 @@ to provide synchronization of initialization of local static objects, and by
 * **boost**: `boost::mutex`
 * **mcf0i**: `_MCF_mutex` without inlining
 
-![hyperfine](doc/hyperfine.png)
-
 > [!WARNING]
 > This project uses some undocumented NT system calls and is not guaranteed to
 > work on some Windows versions. The author gives no warranty for this project.
 > Use it at your own risk.
 
+## Benchmark Result
+
+This is the result of [a benchmark program](doc/mutex_benchmark.c) on Windows
+11 Insider Preview (Dev channel, Build 26300.7760) on an Intel i9 14900K
+processor:
+
+![result_win11_26300_i9_10900k](doc/result_win11_26300_i9_10900k.png)
+
 ## How to Build
 
 Compiling natively can be done in MSYS2. We take the UCRT64 shell as an example.
@@ -51,57 +57,6 @@ ninja test
 > `__cxa_finalize(&__dso_handle)` followed by `fflush(NULL)` upon receipt of
 > `DLL_PROCESS_DETACH` in your `DllMain()`.
 
-## Benchmarking
-
-* **#THREADS**: number of threads
-* **#ITERATIONS**: number of iterations per thread
-* **SRWLOCK**: Windows `SRWLOCK`
-* **CRITICAL_SECTION**: Windows `CRITICAL_SECTION`
-* **WINPTHREAD**: winpthread `pthread_mutex_t`
-* **MCFGTHREAD**: mcfgthread `_MCF_mutex` without inlining
-
-These are results of [the test program](doc/mutex_performance.c) on an x86-64
-*Windows 10* machine with a 10-core *Intel i9 10900K* processor:
-
-| #THREADS | #ITERATIONS | SRWLOCK | CRITICAL_SECTION | WINPTHREAD | MCFGTHREAD |
-|---------:|------------:|--------------:|-----------------:|--------------:|--------------:|
-| 1 | 20,000,000 | 1541.035 ms | 1684.556 ms |**1537.788 ms**| 1539.504 ms |
-| 2 | 10,000,000 | 1410.687 ms | 1916.520 ms | 2135.853 ms |**1377.103 ms**|
-| 4 | 5,000,000 | 2070.238 ms | 4613.832 ms | 2979.166 ms |**1553.278 ms**|
-| 6 | 3,000,000 | 2500.003 ms | 5016.650 ms | 3159.182 ms |**1409.130 ms**|
-| 10 | 1,500,000 | 2416.953 ms | 6239.123 ms | 3004.653 ms |**1177.269 ms**|
-| 20 | 600,000 | 2266.024 ms | 8687.350 ms | 2559.691 ms |**1001.314 ms**|
-| 60 | 200,000 |**2831.348 ms**| 10164.012 ms | 3814.880 ms | 3299.509 ms |
-| 200 | 60,000 |**2849.850 ms**| 10544.007 ms | 3825.518 ms | 3579.925 ms |
-
-And these are results of the same program on *Wine 6.0.3* on an x86-64
-*Ubuntu 22.04* virtual machine with a 16-core *AMD EPYC2* processor:
-
-| #THREADS | #ITERATIONS | SRWLOCK | CRITICAL_SECTION | WINPTHREAD | MCFGTHREAD |
-|---------:|------------:|--------------:|-----------------:|--------------:|--------------:|
-| 1 | 10,000,000 | 2466.983 ms | 2574.892 ms |**2444.599 ms**| 3167.704 ms |
-| 2 | 5,000,000 | 1940.147 ms | **1918.091 ms**| 2078.076 ms | 2213.607 ms |
-| 4 | 2,000,000 | 3717.442 ms | 5356.369 ms | 3859.484 ms |**1974.007 ms**|
-| 6 | 1,000,000 | 3517.333 ms | 4519.209 ms | 2474.208 ms |**1582.614 ms**|
-| 10 | 500,000 | 3105.191 ms | 4706.027 ms | 2388.662 ms |**1363.926 ms**|
-| 20 | 200,000 | 2721.077 ms | 4262.151 ms | 1966.195 ms |**1340.997 ms**|
-| 60 | 60,000 | 2397.048 ms | 3807.141 ms | 1530.147 ms |**1511.931 ms**|
-| 200 | 20,000 | 2632.933 ms | 4148.604 ms |**1615.904 ms**| 1784.553 ms |
-
-And these are results of the same program on an ARM *Windows 11* machine with
-an 8-core *Qualcomm Snapdragon 8cx Gen 3* processor, compiled with Clang:
-
-| #THREADS | #ITERATIONS | SRWLOCK | CRITICAL_SECTION | WINPTHREAD | MCFGTHREAD |
-|---------:|------------:|--------------:|-----------------:|--------------:|--------------:|
-| 1 | 10,000,000 | 2105.027 ms | 2164.209 ms | 2122.998 ms |**2033.915 ms**|
-| 2 | 5,000,000 | 1701.007 ms | 1620.484 ms | 1547.963 ms |**1496.309 ms**|
-| 4 | 2,000,000 |**1395.439 ms**| 3067.075 ms | 2583.215 ms | 1525.453 ms |
-| 6 | 1,000,000 |**1181.352 ms**| 4334.280 ms | 2167.916 ms | 1354.046 ms |
-| 10 | 500,000 | 2738.153 ms | 2799.624 ms |**2687.904 ms**| 2739.022 ms |
-| 20 | 100,000 | 3259.999 ms | **3220.732 ms**| 3287.581 ms | 3291.146 ms |
-| 60 | 30,000 | 2931.157 ms | 2934.896 ms | 2938.784 ms |**2922.015 ms**|
-| 200 | 10,000 |**3197.414 ms**| 3216.323 ms | 3221.090 ms | 3229.249 ms |
-
 ## Implementation details
 
 ### The condition variable

doc/hyperfine.png

-64.9 KB
Binary file not shown.
Lines changed: 7 additions & 6 deletions
@@ -77,13 +77,13 @@ thread_proc(void* arg)
 int
 main(void)
 {
-  start = CreateEventW(NULL, TRUE, FALSE, NULL);
-  assert(start);
+  SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
 
   my_init(&mutex);
-  fprintf(stderr, "using `%s`:\n # of threads = %d\n # of iterations = %d\n",
-          _CRT_STRINGIZE(my_mutex_t), NTHRD, NITER);
+  start = CreateEventW(NULL, TRUE, FALSE, NULL);
+  assert(start);
 
+  fprintf(stderr, "running %d threads with %s\n", NTHRD, _CRT_STRINGIZE(my_mutex_t));
   for(intptr_t k = 0; k < NTHRD; ++k) {
     threads[k] = CreateThread(NULL, 0, thread_proc, NULL, 0, NULL);
     assert(threads[k]);
@@ -100,6 +100,7 @@ main(void)
 
   QueryPerformanceCounter(&t1);
   QueryPerformanceFrequency(&tf);
-  fprintf(stderr, "total time:\n %.3f milliseconds\n",
-          (double) (t1.QuadPart - t0.QuadPart) * 1000 / tf.QuadPart);
+  double result = (double) (t1.QuadPart - t0.QuadPart) * 1.0e9 / tf.QuadPart / NITER;
+  fprintf(stderr, "result: %.3f ns / iteration\n", result);
+  printf("%.3f\n", result);
 }
37.7 KB