cnf ran ptp: Migrate PTP log-reduction test to eco-gotests #1170
hhassid wants to merge 1 commit into rh-ecosystem-edge:main from
Conversation
📝 Walkthrough
This PR introduces a new test suite for PTP log reduction behavior, adding a label constant and comprehensive test cases that validate log reduction with thresholds, intervals, and enhanced options.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Warning: there were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0): command failed.
Force-pushed from f82b6f0 to 6c2976f.
```go
// validateLogWithInterval validates that the log reduction is working correctly with the specified interval.
// It waits for a 1-minute window, then counts log entries matching the pattern. The expected count is
// (windowDuration/interval * numberOfProcesses), with adaptive tolerance to account for timing variations.
// For short intervals (≤5s), uses higher tolerance as they're more sensitive to timing variations.
func validateLogWithInterval(client *clients.Settings, nodeName string, interval time.Duration, numberOfProcesses int) {
```
While testing this function, I encountered an issue when the interval was short (under 5 seconds).
The latest version of this function has not been tested yet.
Force-pushed from 6c2976f to 5559577.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@tests/cnf/ran/ptp/tests/ptp-log-reduction.go`:
- Around line 239-240: The comment is correct: interval :=
time.Duration(GinkgoRandomSeed()%30+1) * time.Second produces 1–30s not 1–31s;
update the random range expression used to compute interval (the line defining
interval using GinkgoRandomSeed()) so it yields numbers 1 through 31 (e.g.,
change the modulus to 31 or use an equivalent method such as rand.Intn(31)+1) to
match the stated "between 1 and 31 seconds".
- Around line 318-352: In validateLogWithInterval, the computed lower bound for
Expect can be zero or negative when numberOfProcesses==1, allowing zero matches
to pass; compute cycles := int(windowDuration/interval), then compute lower :=
cycles*(numberOfProcesses-1)-numberOfProcesses and set lower = max(1, lower)
before using it in the Expect range (and keep the existing upper bound logic);
also implement the doc-comment note by adding special handling when interval <=
5*time.Second (e.g., increase tolerance by reducing the lower bound slightly but
never below 1) so short intervals get a looser but still meaningful lower bound;
update references to windowDuration, interval, numberOfProcesses, and
validateLogWithInterval accordingly.
```go
// Use a random interval between 1 and 31 seconds
interval := time.Duration(GinkgoRandomSeed()%30+1) * time.Second
```
Comment says "between 1 and 31 seconds" but the code produces 1–30.
GinkgoRandomSeed()%30+1 yields values in [1, 30], not [1, 31].
Proposed fix

```diff
-	// Use a random interval between 1 and 31 seconds
-	interval := time.Duration(GinkgoRandomSeed()%30+1) * time.Second
+	// Use a random interval between 1 and 30 seconds
+	interval := time.Duration(GinkgoRandomSeed()%30+1) * time.Second
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-	// Use a random interval between 1 and 31 seconds
-	interval := time.Duration(GinkgoRandomSeed()%30+1) * time.Second
+	// Use a random interval between 1 and 30 seconds
+	interval := time.Duration(GinkgoRandomSeed()%30+1) * time.Second
```
```go
// validateLogWithInterval validates that the log reduction is working correctly with the specified interval.
// It waits for a 1-minute window, then counts log entries matching the pattern. The expected count is
// (windowDuration/interval * numberOfProcesses), with adaptive tolerance to account for timing variations.
// For short intervals (≤5s), uses higher tolerance as they're more sensitive to timing variations.
func validateLogWithInterval(client *clients.Settings, nodeName string, interval time.Duration, numberOfProcesses int) {
	windowDuration := 1 * time.Minute

	By(fmt.Sprintf("getting the PTP daemon pod for node %s", nodeName))
	ptpDaemonPod, err := ptpdaemon.GetPtpDaemonPodOnNode(client, nodeName)
	Expect(err).ToNot(HaveOccurred(), "Failed to get PTP daemon pod for node %s", nodeName)

	startTime := time.Now()
	time.Sleep(time.Until(startTime.Add(windowDuration))) // wait until end of window

	By(fmt.Sprintf("getting logs for node %s since %v", nodeName, startTime))
	logs, err := ptpDaemonPod.GetLogsWithOptions(&corev1.PodLogOptions{
		SinceTime: &metav1.Time{Time: startTime},
		Container: ranparam.PtpContainerName,
	})
	Expect(err).ToNot(HaveOccurred(), "Failed to get logs")

	pattern := `(phc|master)\soffset\ssummary:\scnt=[0-9]+,\smin=-?[0-9]+,\smax=-?[0-9]+,\savg=-?` +
		`[0-9]+.?[0-9]*,\sSD=-?[0-9]+.?[0-9]*`

	regexLog, err := regexp.Compile(pattern)
	Expect(err).ToNot(HaveOccurred(), "Failed to compile regex")

	numberOfLogs := len(regexLog.FindAllString(string(logs), -1))

	klog.V(tsparams.LogLevel).Infof("Found %d log entries matching pattern for node %s", numberOfLogs, nodeName)

	Expect(numberOfLogs).To(And(
		BeNumerically(">=", int(windowDuration/interval)*(numberOfProcesses-1)-numberOfProcesses),
		BeNumerically("<", int(windowDuration/interval)*(numberOfProcesses+1)+numberOfProcesses)),
		"Log count should be within expected range for interval %s", interval)
```
Tolerance bounds can make the lower bound zero or negative when numberOfProcesses is 1, allowing the test to pass with no matching logs.

When numberOfProcesses == 1:
- Lower bound = cycles * (1-1) - 1 = -1, effectively 0. This means the test succeeds even if zero log entries are found, defeating the purpose of validating that logs appear at the configured interval.

Also, the doc comment mentions special handling for short intervals (≤5s) with "higher tolerance," but the formula is actually uniform: there's no conditional logic for short intervals.

Consider either:
- Adding an explicit minimum lower bound (e.g., max(1, lowerBound)), or
- Adjusting the formula so the lower bound stays meaningful regardless of process count.
Possible fix sketch

```diff
+	expectedCount := int(windowDuration/interval) * numberOfProcesses
+	tolerance := numberOfProcesses + 1 // or a percentage-based tolerance
+	lowerBound := expectedCount - tolerance
+	if lowerBound < 1 {
+		lowerBound = 1
+	}
+	upperBound := expectedCount + tolerance
+
 	Expect(numberOfLogs).To(And(
-		BeNumerically(">=", int(windowDuration/interval)*(numberOfProcesses-1)-numberOfProcesses),
-		BeNumerically("<", int(windowDuration/interval)*(numberOfProcesses+1)+numberOfProcesses)),
+		BeNumerically(">=", lowerBound),
+		BeNumerically("<", upperBound)),
 		"Log count should be within expected range for interval %s", interval)
```
klaskosk left a comment
Added a few comments, most are minor changes. There are a couple of potential improvements to the tests we could make now that we're not rushing for GA, but feel free to disagree if they seem unnecessary.
| By("checking node info map") | ||
| nodeInfoMap, err = profiles.GetNodeInfoMap(RANConfig.Spoke1APIClient) | ||
| Expect(err).ToNot(HaveOccurred(), "Failed to get node info map") | ||
|
|
||
| var gmCount uint | ||
| for _, nodeInfo := range nodeInfoMap { | ||
| gmCount += nodeInfo.Counts[profiles.ProfileTypeGM] | ||
| gmCount += nodeInfo.Counts[profiles.ProfileTypeMultiNICGM] | ||
| } | ||
|
|
||
| if gmCount > 0 { | ||
| Skip("Test does not support Grandmaster configurations") | ||
| } |
it's a little bit of duplication, but let's move the profile check inside the node loop of each test case. That way we can support scenarios where one node has a GM profile but the other an OC profile.
Here's an example; at some point I'd like to be able to create a helper since it's such a common pattern.
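The linked example isn't reproduced here, so purely as an illustration of the per-node pattern (the nodeNames slice, map keying by node name, and the choice to skip per node rather than fail are assumptions, not the reviewer's code):

```go
for _, nodeName := range nodeNames {
	nodeInfo := nodeInfoMap[nodeName]

	// Skip only this node when it carries a Grandmaster profile, instead of skipping the whole spec.
	if nodeInfo.Counts[profiles.ProfileTypeGM]+nodeInfo.Counts[profiles.ProfileTypeMultiNICGM] > 0 {
		klog.V(tsparams.LogLevel).Infof("skipping node %s: Grandmaster profiles are not supported", nodeName)

		continue
	}

	// per-node setup and assertions go here
}
```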
```go
	})

	AfterEach(func() {
		if CurrentSpecReport().State.String() == "skipped" {
```
```diff
-		if CurrentSpecReport().State.String() == "skipped" {
+		if CurrentSpecReport().State == types.SpecStateSkipped {
```
using github.com/onsi/ginkgo/v2/types, it's good to re-use constants if possible
```go
err = metrics.EnsureClocksAreLocked(prometheusAPI)
Expect(err).ToNot(HaveOccurred(), "Failed to assert clock state is locked")

testRanAtLeastOnce = false
```
it's pretty minor, but I have a slight preference towards keeping this var inside the test. If we're not using the variable in between BeforeEach/AfterEach and the It blocks, it's better to keep it scoped to the individual tests
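A minimal sketch of that scoping, assuming a DeferCleanup inside the It can replace the shared AfterEach logic (the spec text and restore step are placeholders, not the PR's actual code):

```go
It("applies log reduction after a config change", func() {
	testRanAtLeastOnce := false

	DeferCleanup(func() {
		if !testRanAtLeastOnce {
			return
		}

		// restore the original logReduce value here
	})

	// test body: set testRanAtLeastOnce = true once the config has actually been modified
})
```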
| "logReduce to \"enhanced\"", reportxml.ID("83456"), func() { | ||
| // master offset summary log pattern for 'enhanced' logReduction option | ||
| // An example of a log with the pattern: master offset summary: cnt=2, min=-1, max=1, avg=0.00, SD=1.41 | ||
| pattern := `master\soffset\ssummary:\scnt=[0-9]+,\smin=-?[0-9]+,\smax=-?[0-9]+,\savg=-?` + |
since the pattern is a constant, please create a global variable in the file and use regexp.MustCompile. It's technically faster, but more importantly, it'll get caught by ginkgo dry-run if there's an issue. That way you don't have to run the test to ensure it can be compiled
same goes for the other regexp if they're constant. If you need to change them in the test, this pattern works fine then
rereading the old test since I remembered this coming up, I forgot I made the matcher accept the compiled version in eco-gotests to allow you to use MustCompile 🙂
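A sketch of the package-level form (the variable name is made up, and the pattern tail is copied from the phc|master regexp quoted above, so it should be checked against the actual test):

```go
// Compiled at package load time, so an invalid pattern already fails a ginkgo dry-run.
var masterOffsetSummaryRegex = regexp.MustCompile(
	`master\soffset\ssummary:\scnt=[0-9]+,\smin=-?[0-9]+,\smax=-?[0-9]+,\savg=-?[0-9]+.?[0-9]*,\sSD=-?[0-9]+.?[0-9]*`)
```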
```go
})

// setLogReduceValForNode sets the logReduce value for all PTP profiles in all PTP configs.
// It updates all profiles across all configs (not just those applying to the specified node).
```
I thought about this a little since I think only considering the ones on a single node would be ideal, but I'm pretty sure the complexity isn't worth it. As long as we're covering at least the profiles on the node, it's fine
all that to say I think the current system is good, no changes needed
```go
		updated = true
	}

	if updated {
```
we need to update the startTime in this if statement to ensure that we're waiting for the first profile load message after all the configs have been updated
```go
// After updating configs, it waits for the profile to reload on the specified node. If no
// configs are updated (all already have the desired value), the wait is skipped.
func setLogReduceValForNode(client *clients.Settings, nodeName string, newLogReduceVal string) error {
	startTime := time.Now()
```
the "fancy" solution here could be using var startTime time.Time and then using startTime.IsZero() instead of configsUpdated, since you'd set the startTime when a config is updated
optional though, using a separate variable works too. the startTime needs updated either way
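A rough sketch of the zero-value variant (ptpConfigs and updateProfilesForConfig are stand-ins for the existing loop and per-config update logic, not real helpers from the PR):

```go
var startTime time.Time

for _, ptpConfig := range ptpConfigs {
	updated := updateProfilesForConfig(ptpConfig, newLogReduceVal) // hypothetical helper for the existing update step

	if updated {
		// Refresh the reference point on every change so the reload wait measures
		// from the last config that was actually updated.
		startTime = time.Now()
	}
}

if !startTime.IsZero() {
	// wait for the profile reload on nodeName, using startTime as the lower bound for log matching
}
```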
```go
Expect(err).ToNot(HaveOccurred(), "Failed to convert number of processes to int")

// Use a random interval between 1 and 31 seconds
interval := time.Duration(GinkgoRandomSeed()%30+1) * time.Second
```
I'd be fine with making the interval constant to simplify the validation here. With a 10 second interval and a 1 minute window, we just need to check that 6*processes <= log count <= 7*processes.
This would also address the issue that CodeRabbit has with the check potentially accepting 0 occurrences.
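A sketch of what that fixed-interval check could look like (constant names and the inclusive upper bound are assumptions):

```go
const (
	logReduceInterval = 10 * time.Second
	logWindow         = 1 * time.Minute
)

// 60s / 10s = 6 summaries per process are expected within the window.
expectedPerProcess := int(logWindow / logReduceInterval)

Expect(numberOfLogs).To(And(
	BeNumerically(">=", expectedPerProcess*numberOfProcesses),
	BeNumerically("<=", (expectedPerProcess+1)*numberOfProcesses)),
	"Log count should be within the expected range for a %s interval", logReduceInterval)
```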
```go
By(fmt.Sprintf("getting the PTP daemon pod for node %s", nodeName))
ptpDaemonPod, err := ptpdaemon.GetPtpDaemonPodOnNode(RANConfig.Spoke1APIClient, nodeName)
Expect(err).ToNot(HaveOccurred(), "Failed to get PTP daemon pod for node %s", nodeName)

By(fmt.Sprintf("getting number of ptp4l and phc2sys processes on node %s", nodeName))
cmd := []string{"/bin/bash", "-c", "ps -C ptp4l -C phc2sys -o pid= | wc -l"}
numberOfProcessesBytes, err := ptpDaemonPod.ExecCommand(cmd, ranparam.PtpContainerName)
Expect(err).ToNot(HaveOccurred(), "Failed to execute command on pod")
```
we should use ptpdaemon.ExecuteCommandInPtpDaemonPod since it provides options for retries and handles finding the pod automatically
```go
// Wait for PTP processes to stabilize with new configuration
time.Sleep(10 * time.Second)
```
I'm curious if there's a better way to wait for the processes to stabilize. It may not be worth the complexity, but it'd be nice if we could wait on a condition rather than a simple sleep
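One possibility, sketched as an assumption rather than a drop-in replacement (it reuses the ps command from the snippet above, assumes ExecCommand returns a bytes.Buffer as shown there, requires strconv and strings imports, and expectedProcessCount is a placeholder for whatever readiness condition is chosen):

```go
// Poll for the expected number of ptp4l/phc2sys processes instead of sleeping a fixed 10 seconds.
Eventually(func() (int, error) {
	out, err := ptpDaemonPod.ExecCommand(
		[]string{"/bin/bash", "-c", "ps -C ptp4l -C phc2sys -o pid= | wc -l"},
		ranparam.PtpContainerName)
	if err != nil {
		return 0, err
	}

	return strconv.Atoi(strings.TrimSpace(out.String()))
}, 1*time.Minute, 5*time.Second).Should(BeNumerically(">=", expectedProcessCount),
	"PTP processes did not stabilize after the config change")
```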
This change is part of the ongoing test migration to eco-gotests, specifically covering the PTP log-reduction flow and related updates for the new framework integration.
Tracking issue: CNF-20474.