Conversation
The alert store workload is read-heavy, so use RWMutex to allow reads to proceed in parallel.

Store usage analysis:

provider/mem/mem.go:
  Reads:
  - List() x 4 (Subscribe, SlurpAndSubscribe, GetPending, countByState)
  - Get() x 2
  Writes:
  - Set() x 1
  - GC() via gcLoop

dispatch/dispatch.go (aggrGroup):
  Reads:
  - List() x 2
  - Get() x 1
  - Empty() x 1
  Writes:
  - Set() x 1
  - DeleteIfNotModified() x 1

inhibit/inhibit.go:
  Reads:
  - Get() x 2
  Writes:
  - Set() x 1

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
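A minimal sketch of the kind of change this describes, using a simplified stand-in for the alert store (the type, field names, and return values here are illustrative and may differ from the real store): read paths take the shared read lock, write paths keep the exclusive lock.

// Illustrative sketch only: a simplified alert store switching from
// sync.Mutex to sync.RWMutex. Read paths use RLock so they can run in
// parallel; write paths still take the exclusive lock.
package store

import (
	"sync"

	"github.com/prometheus/alertmanager/types"
	"github.com/prometheus/common/model"
)

type alertStore struct {
	mtx    sync.RWMutex // previously sync.Mutex
	alerts map[model.Fingerprint]*types.Alert
}

// Get is a read path: RLock allows many concurrent readers.
func (s *alertStore) Get(fp model.Fingerprint) (*types.Alert, bool) {
	s.mtx.RLock()
	defer s.mtx.RUnlock()
	a, ok := s.alerts[fp]
	return a, ok
}

// List is also a read path: it snapshots the alerts under RLock.
func (s *alertStore) List() []*types.Alert {
	s.mtx.RLock()
	defer s.mtx.RUnlock()
	out := make([]*types.Alert, 0, len(s.alerts))
	for _, a := range s.alerts {
		out = append(out, a)
	}
	return out
}

// Set is a write path: it still requires the exclusive lock.
func (s *alertStore) Set(a *types.Alert) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	s.alerts[a.Fingerprint()] = a
}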
Spaceman1701 left a comment
My intuition is that this will improve performance, but I'm not sure. RWMutex can be slower when there's a lot of write contention, which could happen in the alert ingestion flow.
Do you have any benchmarks or profiles of this change?
Generated some benchmarks:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/alertmanager/store
cpu: Apple M3 Pro
│ mutex.txt │ rwmutex.txt │
│ sec/op │ sec/op vs base │
Get-12 5.865n ± 1% 5.565n ± 1% -5.12% (p=0.001 n=7)
Set-12 102.2n ± 2% 103.6n ± 1% +1.37% (p=0.026 n=7)
List/100_alerts-12 172.0n ± 3% 169.6n ± 4% ~ (p=0.090 n=7)
List/1000_alerts-12 174.2n ± 5% 170.5n ± 5% ~ (p=0.710 n=7)
List/10000_alerts-12 173.1n ± 7% 172.6n ± 5% ~ (p=0.557 n=7)
ConcurrentReads-12 76.64n ± 1% 130.30n ± 40% +70.02% (p=0.001 n=7)
ConcurrentWrites-12 258.9n ± 4% 251.6n ± 10% -2.82% (p=0.026 n=7)
ConcurrentList/100_alerts-12 286.3n ± 1% 146.0n ± 27% -49.00% (p=0.001 n=7)
ConcurrentList/1000_alerts-12 286.0n ± 1% 199.4n ± 28% -30.28% (p=0.001 n=7)
MixedReadWrite/50pct_reads-12 169.6n ± 2% 147.9n ± 3% -12.79% (p=0.001 n=7)
MixedReadWrite/90pct_reads-12 92.64n ± 1% 57.49n ± 2% -37.94% (p=0.001 n=7)
MixedReadWrite/99pct_reads-12 78.78n ± 1% 37.83n ± 11% -51.98% (p=0.001 n=7)
geomean 116.5n 98.97n -15.05%

Interestingly, concurrent reads are 70% slower on a read-only workload.

Benchmark code:

// Benchmarks for comparing Mutex vs RWMutex performance.
package store

import (
	"fmt"
	"testing"
	"time"

	"github.com/prometheus/alertmanager/types"
	"github.com/prometheus/common/model"
)

func newBenchmarkAlert(i int) *types.Alert {
return &types.Alert{
Alert: model.Alert{
Labels: model.LabelSet{
"alertname": model.LabelValue("test"),
"instance": model.LabelValue(string(rune('a' + i%26))),
},
},
UpdatedAt: time.Now(),
}
}
// BenchmarkGet measures single-threaded Get performance.
func BenchmarkGet(b *testing.B) {
s := NewAlerts()
// Pre-populate with alerts.
for i := range 1000 {
_ = s.Set(newBenchmarkAlert(i))
}
alert := newBenchmarkAlert(500)
fp := alert.Fingerprint()
b.ResetTimer()
for range b.N {
_, _ = s.Get(fp)
}
}
// BenchmarkSet measures single-threaded Set performance.
func BenchmarkSet(b *testing.B) {
s := NewAlerts()
alert := newBenchmarkAlert(0)
b.ResetTimer()
for range b.N {
_ = s.Set(alert)
}
}
// BenchmarkList measures single-threaded List performance.
func BenchmarkList(b *testing.B) {
for _, size := range []int{100, 1000, 10000} {
b.Run(fmt.Sprintf("%d_alerts", size), func(b *testing.B) {
s := NewAlerts()
for i := range size {
_ = s.Set(newBenchmarkAlert(i))
}
b.ResetTimer()
for range b.N {
_ = s.List()
}
})
}
}
// BenchmarkConcurrentReads measures concurrent read performance.
// This is where RWMutex should outperform Mutex.
func BenchmarkConcurrentReads(b *testing.B) {
s := NewAlerts()
for i := range 1000 {
_ = s.Set(newBenchmarkAlert(i))
}
alert := newBenchmarkAlert(500)
fp := alert.Fingerprint()
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
_, _ = s.Get(fp)
}
})
}
// BenchmarkConcurrentWrites measures concurrent write performance.
func BenchmarkConcurrentWrites(b *testing.B) {
s := NewAlerts()
alerts := make([]*types.Alert, 100)
for i := range alerts {
alerts[i] = newBenchmarkAlert(i)
}
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
i := 0
for pb.Next() {
_ = s.Set(alerts[i%len(alerts)])
i++
}
})
}
// BenchmarkConcurrentList measures concurrent List performance.
func BenchmarkConcurrentList(b *testing.B) {
for _, size := range []int{100, 1000} {
b.Run(fmt.Sprintf("%d_alerts", size), func(b *testing.B) {
s := NewAlerts()
for i := range size {
_ = s.Set(newBenchmarkAlert(i))
}
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
_ = s.List()
}
})
})
}
}
// BenchmarkMixedReadWrite simulates realistic workloads with different read/write ratios.
func BenchmarkMixedReadWrite(b *testing.B) {
for _, readPct := range []int{50, 90, 99} {
b.Run(fmt.Sprintf("%dpct_reads", readPct), func(b *testing.B) {
s := NewAlerts()
for i := range 1000 {
_ = s.Set(newBenchmarkAlert(i))
}
alert := newBenchmarkAlert(500)
fp := alert.Fingerprint()
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
i := 0
for pb.Next() {
if i%100 < readPct {
_, _ = s.Get(fp)
} else {
_ = s.Set(alert)
}
i++
}
})
})
}
}
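For reference, a comparison table like the one above is the kind of output benchstat produces. Assuming the benchmarks above live in the store package (matching the pkg: line in the output), it could be reproduced roughly as follows; the -count=7 flag corresponds to the n=7 column:

# on the baseline (sync.Mutex) code
go test -bench=. -count=7 ./store > mutex.txt
# on the branch (sync.RWMutex)
go test -bench=. -count=7 ./store > rwmutex.txt
# compare the two runs
benchstat mutex.txt rwmutex.txt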
After further analysis, it seems we can switch.
That's an interesting result!

As far as how to move forward here, I don't know for sure. My concern is that we could accidentally introduce a performance regression without realizing it. I think we need some "real world" data before we can be confident that this helps more than it hurts.
I can test this in a local lab environment next week, and in production with the canary instance next year, and compare performance.
That'd make me feel a lot more comfortable with the change. Thanks!