-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Introduce synctest-based framework for deterministic testing #9099
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Welcome @GaetanoMar96! |
|
Hi @GaetanoMar96. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
35ac017 to
eb540b2
Compare
| var nodeInfoComparator nodegroupset.NodeInfoComparator | ||
| if len(autoscalingOptions.BalancingLabels) > 0 { | ||
| nodeInfoComparator = nodegroupset.CreateLabelNodeInfoComparator(autoscalingOptions.BalancingLabels) | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pull it outside, to main?
I just noticed that the core package calls the NewCloudProvider() -_-
My proposal of target state:
- move that branch outside, to main
- move initialization from core to main as well:
opts.CloudProvider = cloudBuilder.NewCloudProvider(opts, informerFactory) - in the core repo (and in the builder) just error out if CloudProvider is nil
- split the cloudprovider/, the interface should be clearly separated from provider-specific stuff (including cloudprovider/builder)
- Add some form of dependency enforcement, e.g. unit test or play with dependency visibility: at minimum, the core repo cannot have dependency on any of the cloud-specific providers.
I think 4 and 5 would require some larger discussion, but in this CL I think 1-3 are achievable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cloud provider depends on informer factory and core autoscaler options, which are initialized only later. If we move all this code in main then in test config we should handle all of them injecting them at build initialization. Are we sure we want to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right... that might be a bit unsafe, since we don't know 100% which options does the NewCloudProvider really depend on.
Happy case: we can just move it out and see if any of the tests fail. But I'm not too confident that we have sufficient test coverage to detect all issues.
Another solution would be to leave it as is. IMO in this case we should at least move this initialization out of the core package, e.g. to the builder package. And ideally add a comment in the code that a target state is initialization in main.go, but it wasn't done because we don't have ability to test it thoroughly.
I'd like to get some feedback from other CA maintainers @mtrqq @towca
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And regarding the nodeInfoComparator initialization, I think it's better to leave it in main.go for now, since the code is different depending on cloud provider (also it's in the spirit of the TODO below to eventually migrate it somewhere in the CloudProvider package).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrated the nodeInfoComparator. I will leave the conversation unresolved for now waiting from CA maintainers feedback
cluster-autoscaler/builder/bigunittest/staticautoscaler_test.go
Outdated
Show resolved
Hide resolved
cluster-autoscaler/builder/bigunittest/staticautoscaler_test.go
Outdated
Show resolved
Hide resolved
cluster-autoscaler/cloudprovider/builder/cloud_provider_builder.go
Outdated
Show resolved
Hide resolved
cluster-autoscaler/test/integration/inmemory/staticautoscaler_test.go
Outdated
Show resolved
Hide resolved
5180f1d to
58e27de
Compare
58e27de to
5233df5
Compare
kawych
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added couple more comments, but they're increasingly minor. The biggest one is what we already agreed on offline. Thanks for the great work!
cluster-autoscaler/test/integration/inmemory/staticautoscaler_test.go
Outdated
Show resolved
Hide resolved
cluster-autoscaler/test/integration/inmemory/staticautoscaler_test.go
Outdated
Show resolved
Hide resolved
|
|
||
| c.Autoscaler, _, err = autoscalerbuilder.New(*c.Options). | ||
| WithKubeClient(c.Fakes.K8s.Client). | ||
| WithInformerFactory(c.Fakes.K8s.InformerFactory). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to initialize that inside the builder? Or does the additional option defined in main.go make it harder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not entirely sure about this question, can you make it a bit more clear?
5233df5 to
d529731
Compare
mtrqq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes, it will greatly simplify E2E testing of some autoscaler features while not going into expensive E2E tests territory
I like the direction that you've picked, having a set of reusable components without locking outselves in a specific framework or even synctest. It is good choice to keep direct interaction with synctest puts you in the right mindset when writing the test while also not forcing new tests to pick between framework and manual labour or mocking all the things around
Another data point to consider - while synctest is a great tool to verify correctness of the autoscaler, it's not optimal to use it with benchmarks, see testing.B reference in golang/go#73567
| WithKubeClient(kubeClient). | ||
| WithInformerFactory(fakeK8s.InformerFactory). | ||
| WithCloudProvider(fakeCloud). | ||
| WithPodObserver(&loop.UnschedulablePodObserver{}). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it something which we may default in the builder and add an override if needed? So far I don't see observer value being changed in the call sites
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is what we are actually doing, see this code in the builder
if b.podObserver == nil { b.podObserver = loop.StartPodObserver(ctx, b.kubeClient) }
| unneededTime = 1 * time.Minute | ||
| ) | ||
|
|
||
| func TestStaticAutoscaler_FullLifecycle(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add add some baseline integration test which devs can refer to when adding new ones. The main point of which would obviously be to provide a skeleton for such a test, but also it may serve as a decent source of documentation for using synctest.
For me the current most lacking point is to outline a boundary between what autoscaler components can be configured in the test bauble and what can't or are not supposed to. Adding this information may steer new tests into the correct direction straight away resulting in a lot of time saved when debugging failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added it in the last commit
|
i think this is a great contribution. i am still reviewing but i like what i see so far. |
|
/ok-to-test |
d529731 to
919781f
Compare
919781f to
5c82692
Compare
5c82692 to
ae452ea
Compare
elmiko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is making sense to me, no specific request for changes at the moment.
kawych
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | ||
|
|
||
| // MustCreateManager creates a controller-runtime manager with metrics and health probes disabled. | ||
| func MustCreateManager(t *testing.T) manager.Manager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "Manager" is one of the most generic and unhelpful terms, can you add some substance to the naming here?
NewControllerRuntimeMgr? Or even NewControllerRuntime if that doesn't collide with anything?
|
|
||
| // Config is the "blueprint" for a test. It defines the entire | ||
| // initial state of the world before the test runs. | ||
| type Config struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider OptionsBuilder
| // Initialize fakes and configuration options. | ||
| // This happens outside the synctest bubble to keep the setup clean. | ||
| config := integration.NewConfig().WithOverrides() // override CA options if needed | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uber-nit: consider removing excessive whitespaces
| unneededTime = 1 * time.Minute | ||
| ) | ||
|
|
||
| func TestStaticAutoscaler_Template(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider adding a top-level comment explaining the purpose of this test.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: GaetanoMar96, kawych The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces a new testing framework for Cluster Autoscaler based on synctest.
The new framework utilizes a "bubble" environment and virtual time to:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
The PR is structured into three logical commits to provide a clear path from architecture to implementation:
Refactor for Dependency Injection (Builder Pattern): I introduced a Builder Pattern in the main initialization logic. This allows us to inject fakes (KubeClient, CloudProvider, PodObservers) during setup. This decoupling is essential for moving away from heavyweight integration tests toward lightweight, controlled simulations.
Lifecycle & Leak Management: To support synctest's requirement for a clean "bubble" exit, I refactored internal components to properly accept and propagate context.Context. This ensures that background goroutines observe shutdown signals immediately, preventing the "durable sleep" leaks common in components that previously relied on static timers.
Framework & Lifecycle Tests: This commit introduces the TestContext and TestBuilder. It includes a full-lifecycle regression test (Scale-up -> Stabilization -> Scale-down) that demonstrates the framework's ability to orchestrate complex state transitions deterministically using the encapsulated state of our fake K8s and Cloud Provider implementations.
Usage: See staticautoscaler_test.go for an example of the new RunTest and ctx.Step pattern.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: