Batching the probes to spread the workload and more exact time readings #15
fmotrifork wants to merge 2 commits into Lesterpig:master
Conversation
Lesterpig left a comment:
Thanks for this pull request!
It actually changes a lot of things in the project (Dockerfile, Makefile, logging, Prometheus...).
I would only consider changing the manager.go and probe/http.go files in this PR.
If needed, please open new pull requests for the other proposed changes.
I have also put a few comments in the modified files.
If you have some time, could you please update your pull request? Otherwise I can do it for you 😃
```go
for category, services := range manager.Services {
	for _, service := range services {
		// Batching the probes to spread the workload
		if math.Mod(i, 8) == 0 {
```
Question: is there a reason to use a float for `i`?
We could use integers and check with `if i % 8 == 0`.
Moreover, the `i = 0` line might not be necessary (or the condition can just be `i >= 8`).
```diff
-	TLSClientConfig: &tls.Config{InsecureSkipVerify: !opts.VerifyCertificate},
-}
+tr := http.DefaultTransport.(*http.Transport).Clone()
+tr.TLSClientConfig = &tls.Config{InsecureSkipVerify: true}
```
It looks like the `opts.VerifyCertificate` option has been forgotten?
When having many probes on a single board instance, the current implementation results in probes waiting for each other and not for the actual services.
This results in many probes being marked as "slow" even though the problem stems from too many open probes at once.
This PR batches the probes in 8 at a time.
It also splits the HTTP probe time in two.
This allows us to better understand whether a slow reading comes from the infrastructure / load balancers or from a slow HTTP service.