Description
Currently, the SM Agent is entirely stateless, which makes it great for operational purposes. However, this comes with some downsides. One of them is that, on startup, the agent has no way to tell how long ago a check was last executed, or whether it has run at all. The agent currently handles this by assuming the check has never run, and schedules its first execution within 2 minutes of startup:
synthetic-monitoring-agent/internal/scraper/scraper.go, lines 239 to 241 in 8b9b10b:

    if offset == 0 {
        offset = randDuration(min(frequency, maxPublishInterval))
    }

This has a number of small problems:
- Agents may run checks more often than expected if restarted. This can happen in both public and private probes.
- Agent restarts cause surges in load in public probes.
I propose that we add an optional, pluggable interface that allows the agent to store the last execution time of each check externally. The primary candidate for that external store is a key/value database such as Valkey/Redict.
Providing a URI to this kv store will be optional. If it is not supplied, the agent will behave as it does now: assuming on startup that each check has never been executed, and scheduling it immediately (with jitter).
Errors connecting to the kv store will be fatal for debuggability reasons. If a check has no last execution time recorded in the kv store, the agent will assume it has never been executed, which means the agent will degrade gracefully if the kv store is wiped or restarted.