-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[hist] Ensure consistency of RBinWithError::AtomicAdd
#21105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Only have a single (private) function that invokes lower-level atomic operations on the two members fSum and fSum2.
Instead of using separate atomic operations on the two members, use the sign bit of fSum2 to implement a cheap spin lock. This allows another thread to detect inconsistent reads and implement a retry loop. According to the microbenchmarks, this is even faster with a single thread / for the uncontended case. A possible explanation is that the code needs only one compare-exchange operation instead of two. With multiple threads and heavy contention, the implementation is slightly slower. An alternative would be double-width compare-and-swap instructions, as found on x86 (cmpxchg16b). Setting aside the portability issues, GCC generates a library call to __atomic_compare_exchange_16 which is slower than two separate compare-exchange instructions. If a direct instruction is emitted (for example with Clang), it beats the separate compare-exchange instructions in the uncontended scenario and is tied in case of contention.
|
|
||
| void AtomicInc() | ||
| private: | ||
| void AtomicAdd(double a, double a2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| void AtomicAdd(double a, double a2) | |
| void SumAtomicAdd(double a, double a2) |
As this is specifically to atomic at the sum/sum2.
or
| void AtomicAdd(double a, double a2) | |
| void AtomicAddToSum(double a, double a2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would prefer to keep the current naming scheme: User-facing functions, such as FillAtomic, are implemented in terms of increasingly lower-level functions AtomicAdd. Given that it's a private function that is used to implemented other AtomicAdd functions just below, I think the current name is sufficient.
pcanal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Test Results 22 files 22 suites 3d 12h 49m 9s ⏱️ Results for commit f639911. |
hageboeck
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general, but for the future, should one think about preventing undesired modifications of the read-only arguments?
Instead of using separate atomic operations on the two members, use the sign bit of
fSum2to implement a cheap spin lock. This allows another thread to detect inconsistent reads and implement a retry loop.According to the microbenchmarks, this is even faster with a single thread / for the uncontended case. A possible explanation is that the code needs only one compare-exchange operation instead of two. With multiple threads and heavy contention, the implementation is slightly slower.
An alternative would be double-width compare-and-swap instructions, as found on x86 (
cmpxchg16b). Setting aside the portability issues, GCC generates a library call to__atomic_compare_exchange_16which is slower than two separate compare-exchange instructions. If a direct instruction is emitted (for example with Clang), it beats the separate compare-exchange instructions in the uncontended scenario and is tied in case of contention.