
cnf ran ptp: Migrate PTP log-reduction test to eco-gotests #1170

Open
hhassid wants to merge 1 commit into rh-ecosystem-edge:main from hhassid:log_reduction

Conversation

@hhassid
Collaborator

@hhassid hhassid commented Feb 11, 2026

This change is part of the ongoing test migration to eco-gotests, specifically covering the PTP log-reduction flow and related updates for the new framework integration.
Tracking issue: CNF-20474.

Summary by CodeRabbit

  • Tests
    • Added comprehensive test suite for PTP log reduction functionality, including validation of log threshold summaries, interval-based logging patterns, and enhanced logging options with configurable thresholds and intervals.

@coderabbitai
Contributor

coderabbitai bot commented Feb 11, 2026

📝 Walkthrough

This PR introduces a new test suite for PTP log reduction behavior, adding a label constant and comprehensive test cases that validate log reduction with thresholds, intervals, and enhanced options.

Changes

Constants (tests/cnf/ran/ptp/internal/tsparams/consts.go)
  • Adds new public string constant LabelLogReduction with value "log-reduction" for labeling test cases.

New Test Suite (tests/cnf/ran/ptp/tests/ptp-log-reduction.go)
  • Introduces comprehensive test suite for PTP log reduction functionality with three main test cases validating threshold-based logging, interval-based logging, and enhanced option behaviors. Includes helper functions for updating log reduction settings and validating log output intervals.
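For context, the new label constant described in the summary presumably looks something like this (a sketch based on the summary above, not the actual diff):

// tests/cnf/ran/ptp/internal/tsparams/consts.go
const (
    // LabelLogReduction labels the PTP log-reduction test cases.
    LabelLogReduction = "log-reduction"
)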

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #940: Adds additional exported label constants to the same tsparams const block, indicating related PTP test infrastructure changes.
  • PR #941: Provides PtpConfig save/restore and pod log-waiting helper utilities that the new log reduction tests depend on and extend.

Suggested reviewers

  • klaskosk
🚥 Pre-merge checks | ✅ 3 passed

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately reflects the main change: migrating a PTP log-reduction test to the eco-gotests framework, which is demonstrated by the addition of a new test file and supporting constants.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Command failed



@hhassid hhassid force-pushed the log_reduction branch 10 times, most recently from f82b6f0 to 6c2976f on February 11, 2026 at 16:25
@hhassid hhassid requested a review from klaskosk February 11, 2026 16:27
@hhassid hhassid marked this pull request as ready for review February 11, 2026 16:28
@hhassid hhassid self-assigned this Feb 11, 2026
@hhassid
Collaborator Author

hhassid commented Feb 11, 2026

@klaskosk @dpopsuev please take a look

Comment on lines +321 to +325
// validateLogWithInterval validates that the log reduction is working correctly with the specified interval.
// It waits for a 1-minute window, then counts log entries matching the pattern. The expected count is
// (windowDuration/interval * numberOfProcesses), with adaptive tolerance to account for timing variations.
// For short intervals (≤5s), uses higher tolerance as they're more sensitive to timing variations.
func validateLogWithInterval(client *clients.Settings, nodeName string, interval time.Duration, numberOfProcesses int) {
Collaborator Author

While testing this function I encountered an issue when the interval was short (under 5 seconds).
The latest version of this function has not been tested yet.

@hhassid hhassid requested a review from Bonnie-Block February 11, 2026 16:37
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tests/cnf/ran/ptp/tests/ptp-log-reduction.go`:
- Around line 239-240: The comment is correct: interval :=
time.Duration(GinkgoRandomSeed()%30+1) * time.Second produces 1–30s not 1–31s;
update the random range expression used to compute interval (the line defining
interval using GinkgoRandomSeed()) so it yields numbers 1 through 31 (e.g.,
change the modulus to 31 or use an equivalent method such as rand.Intn(31)+1) to
match the stated "between 1 and 31 seconds".
- Around line 318-352: In validateLogWithInterval, the computed lower bound for
Expect can be zero or negative when numberOfProcesses==1, allowing zero matches
to pass; compute cycles := int(windowDuration/interval), then compute lower :=
cycles*(numberOfProcesses-1)-numberOfProcesses and set lower = max(1, lower)
before using it in the Expect range (and keep the existing upper bound logic);
also implement the doc-comment note by adding special handling when interval <=
5*time.Second (e.g., increase tolerance by reducing the lower bound slightly but
never below 1) so short intervals get a looser but still meaningful lower bound;
update references to windowDuration, interval, numberOfProcesses, and
validateLogWithInterval accordingly.

Comment on lines +239 to +240
// Use a random interval between 1 and 31 seconds
interval := time.Duration(GinkgoRandomSeed()%30+1) * time.Second
Contributor

⚠️ Potential issue | 🟡 Minor

Comment says "between 1 and 31 seconds" but the code produces 1–30.

GinkgoRandomSeed()%30+1 yields values in [1, 30], not [1, 31].

Proposed fix
-		// Use a random interval between 1 and 31 seconds
-		interval := time.Duration(GinkgoRandomSeed()%30+1) * time.Second
+		// Use a random interval between 1 and 30 seconds
+		interval := time.Duration(GinkgoRandomSeed()%30+1) * time.Second

Comment on lines +318 to +352
// validateLogWithInterval validates that the log reduction is working correctly with the specified interval.
// It waits for a 1-minute window, then counts log entries matching the pattern. The expected count is
// (windowDuration/interval * numberOfProcesses), with adaptive tolerance to account for timing variations.
// For short intervals (≤5s), uses higher tolerance as they're more sensitive to timing variations.
func validateLogWithInterval(client *clients.Settings, nodeName string, interval time.Duration, numberOfProcesses int) {
    windowDuration := 1 * time.Minute

    By(fmt.Sprintf("getting the PTP daemon pod for node %s", nodeName))
    ptpDaemonPod, err := ptpdaemon.GetPtpDaemonPodOnNode(client, nodeName)
    Expect(err).ToNot(HaveOccurred(), "Failed to get PTP daemon pod for node %s", nodeName)

    startTime := time.Now()
    time.Sleep(time.Until(startTime.Add(windowDuration))) // wait until end of window

    By(fmt.Sprintf("getting logs for node %s since %v", nodeName, startTime))
    logs, err := ptpDaemonPod.GetLogsWithOptions(&corev1.PodLogOptions{
        SinceTime: &metav1.Time{Time: startTime},
        Container: ranparam.PtpContainerName,
    })
    Expect(err).ToNot(HaveOccurred(), "Failed to get logs")

    pattern := `(phc|master)\soffset\ssummary:\scnt=[0-9]+,\smin=-?[0-9]+,\smax=-?[0-9]+,\savg=-?` +
        `[0-9]+.?[0-9]*,\sSD=-?[0-9]+.?[0-9]*`

    regexLog, err := regexp.Compile(pattern)
    Expect(err).ToNot(HaveOccurred(), "Failed to compile regex")

    numberOfLogs := len(regexLog.FindAllString(string(logs), -1))

    klog.V(tsparams.LogLevel).Infof("Found %d log entries matching pattern for node %s", numberOfLogs, nodeName)

    Expect(numberOfLogs).To(And(
        BeNumerically(">=", int(windowDuration/interval)*(numberOfProcesses-1)-numberOfProcesses),
        BeNumerically("<", int(windowDuration/interval)*(numberOfProcesses+1)+numberOfProcesses)),
        "Log count should be within expected range for interval %s", interval)
Contributor

⚠️ Potential issue | 🟠 Major

Tolerance bounds can make the lower bound zero or negative when numberOfProcesses is 1, allowing the test to pass with no matching logs.

When numberOfProcesses == 1:

  • Lower bound = cycles * (1-1) - 1 = -1 → effectively 0
  • This means the test succeeds even if zero log entries are found, defeating the purpose of validating that logs appear at the configured interval.

Also, the doc comment mentions special handling for short intervals (≤5s) with "higher tolerance," but the formula is actually uniform — there's no conditional logic for short intervals.

Consider either:

  1. Adding an explicit minimum lower bound (e.g., max(1, lowerBound)), or
  2. Adjusting the formula so the lower bound stays meaningful regardless of process count.
Possible fix sketch
+	expectedCount := int(windowDuration/interval) * numberOfProcesses
+	tolerance := numberOfProcesses + 1 // or a percentage-based tolerance
+	lowerBound := expectedCount - tolerance
+	if lowerBound < 1 {
+		lowerBound = 1
+	}
+	upperBound := expectedCount + tolerance
+
 	Expect(numberOfLogs).To(And(
-		BeNumerically(">=", int(windowDuration/interval)*(numberOfProcesses-1)-numberOfProcesses),
-		BeNumerically("<", int(windowDuration/interval)*(numberOfProcesses+1)+numberOfProcesses)),
+		BeNumerically(">=", lowerBound),
+		BeNumerically("<", upperBound)),
 		"Log count should be within expected range for interval %s", interval)

Collaborator

@klaskosk klaskosk left a comment

Added a few comments; most are minor changes. There are a couple of potential improvements to the tests we could make now that we're not rushing for GA, but feel free to disagree if they seem unnecessary.

Comment on lines +48 to +60
By("checking node info map")
nodeInfoMap, err = profiles.GetNodeInfoMap(RANConfig.Spoke1APIClient)
Expect(err).ToNot(HaveOccurred(), "Failed to get node info map")

var gmCount uint
for _, nodeInfo := range nodeInfoMap {
gmCount += nodeInfo.Counts[profiles.ProfileTypeGM]
gmCount += nodeInfo.Counts[profiles.ProfileTypeMultiNICGM]
}

if gmCount > 0 {
Skip("Test does not support Grandmaster configurations")
}
Collaborator

It's a little bit of duplication, but let's move the profile check inside the node loop of each test case. That way we can support scenarios where one node has a GM profile but the other an OC profile.

Here's an example; at some point I'd like to create a helper, since it's such a common pattern.
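A minimal sketch of that per-node check, assuming the node name is the map key of nodeInfoMap and the check sits at the top of each test's node loop (placement and logging are illustrative, not the actual code):

for nodeName, nodeInfo := range nodeInfoMap {
    // Skip only this node when it carries a Grandmaster profile, instead of
    // skipping the whole test because any node in the cluster is a GM.
    gmCount := nodeInfo.Counts[profiles.ProfileTypeGM] +
        nodeInfo.Counts[profiles.ProfileTypeMultiNICGM]
    if gmCount > 0 {
        klog.V(tsparams.LogLevel).Infof("Skipping node %s: Grandmaster configurations are not supported", nodeName)

        continue
    }

    // ... per-node log-reduction checks ...
}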

})

AfterEach(func() {
    if CurrentSpecReport().State.String() == "skipped" {
Collaborator

Suggested change
-    if CurrentSpecReport().State.String() == "skipped" {
+    if CurrentSpecReport().State == types.SpecStateSkipped {

This uses github.com/onsi/ginkgo/v2/types; it's good to re-use constants when possible.

err = metrics.EnsureClocksAreLocked(prometheusAPI)
Expect(err).ToNot(HaveOccurred(), "Failed to assert clock state is locked")

testRanAtLeastOnce = false
Collaborator

it's pretty minor, but I have a slight preference towards keeping this var inside the test. If we're not using the variable in between BeforeEach/AfterEach and the It blocks, it's better to keep it scoped to the individual tests

"logReduce to \"enhanced\"", reportxml.ID("83456"), func() {
// master offset summary log pattern for 'enhanced' logReduction option
// An example of a log with the pattern: master offset summary: cnt=2, min=-1, max=1, avg=0.00, SD=1.41
pattern := `master\soffset\ssummary:\scnt=[0-9]+,\smin=-?[0-9]+,\smax=-?[0-9]+,\savg=-?` +
Collaborator

Since the pattern is a constant, please create a global variable in the file and use regexp.MustCompile. It's technically faster, but more importantly, an invalid pattern will get caught by a ginkgo dry-run, so you don't have to run the test to make sure the pattern compiles.

The same goes for the other regexps if they're constant. If you need to change them inside the test, the current pattern works fine.


Rereading the old test since I remembered this coming up: I forgot I made the matcher accept the compiled version in eco-gotests, so you can use MustCompile 🙂
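A sketch of that approach, reusing the summary pattern that already appears in validateLogWithInterval (the variable name is illustrative):

// offsetSummaryRegex matches phc/master offset summary lines, e.g.
// master offset summary: cnt=2, min=-1, max=1, avg=0.00, SD=1.41
var offsetSummaryRegex = regexp.MustCompile(
    `(phc|master)\soffset\ssummary:\scnt=[0-9]+,\smin=-?[0-9]+,\smax=-?[0-9]+,\savg=-?` +
        `[0-9]+.?[0-9]*,\sSD=-?[0-9]+.?[0-9]*`)

// Inside the test, the compile step disappears:
numberOfLogs := len(offsetSummaryRegex.FindAllString(string(logs), -1))

Since MustCompile panics at package initialization, a broken pattern surfaces during a ginkgo dry-run instead of mid-test.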

})

// setLogReduceValForNode sets the logReduce value for all PTP profiles in all PTP configs.
// It updates all profiles across all configs (not just those applying to the specified node).
Collaborator

I thought about this a little, since only considering the profiles on a single node would be ideal, but I'm pretty sure the complexity isn't worth it. As long as we're covering at least the profiles on the node, it's fine.

All that to say, I think the current system is good; no changes needed.

updated = true
}

if updated {
Collaborator

we need to update the startTime in this if statement to ensure that we're waiting for the first profile load message after all the configs have been updated

// After updating configs, it waits for the profile to reload on the specified node. If no
// configs are updated (all already have the desired value), the wait is skipped.
func setLogReduceValForNode(client *clients.Settings, nodeName string, newLogReduceVal string) error {
    startTime := time.Now()
Collaborator

the "fancy" solution here could be using var startTime time.Time and then using startTime.IsZero() instead of configsUpdated, since you'd set the startTime when a config is updated

Optional though; using a separate variable works too. The startTime needs to be updated either way.
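A rough sketch of that zero-value approach, assuming setLogReduceValForNode loops over the fetched PtpConfigs (ptpConfigs and applyLogReduceValue are illustrative names, not the actual helpers):

func setLogReduceValForNode(client *clients.Settings, nodeName string, newLogReduceVal string) error {
    // Zero until a config actually changes; reset on every change so it marks
    // the moment of the last update.
    var startTime time.Time

    for _, config := range ptpConfigs {
        changed, err := applyLogReduceValue(config, newLogReduceVal)
        if err != nil {
            return err
        }

        if changed {
            startTime = time.Now()
        }
    }

    if startTime.IsZero() {
        // Nothing was updated, so there is no profile reload to wait for.
        return nil
    }

    // ... wait for the first profile-load message on nodeName emitted after startTime ...
    return nil
}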

Expect(err).ToNot(HaveOccurred(), "Failed to convert number of processes to int")

// Use a random interval between 1 and 31 seconds
interval := time.Duration(GinkgoRandomSeed()%30+1) * time.Second
Collaborator

I'd be fine with making the interval constant to simplify the validation here. With a 10-second interval and a 1-minute window, we just need to check that 6*processes <= log count <= 7*processes.

This would also address the issue CodeRabbit raised about the check potentially accepting 0 occurrences.
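A sketch of that simplified bound check, assuming a fixed 10-second interval and the existing 1-minute window (constant names are illustrative):

const (
    logReduceInterval = 10 * time.Second
    logWindowDuration = 1 * time.Minute
)

// 60s / 10s = 6 cycles, so the count should land between 6*processes and
// 7*processes inclusive; a zero count can no longer pass.
cycles := int(logWindowDuration / logReduceInterval)
Expect(numberOfLogs).To(And(
    BeNumerically(">=", cycles*numberOfProcesses),
    BeNumerically("<=", (cycles+1)*numberOfProcesses)),
    "Log count should be within expected range for interval %s", logReduceInterval)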

Comment on lines +227 to +234
By(fmt.Sprintf("getting the PTP daemon pod for node %s", nodeName))
ptpDaemonPod, err := ptpdaemon.GetPtpDaemonPodOnNode(RANConfig.Spoke1APIClient, nodeName)
Expect(err).ToNot(HaveOccurred(), "Failed to get PTP daemon pod for node %s", nodeName)

By(fmt.Sprintf("getting number of ptp4l and phc2sys processes on node %s", nodeName))
cmd := []string{"/bin/bash", "-c", "ps -C ptp4l -C phc2sys -o pid= | wc -l"}
numberOfProcessesBytes, err := ptpDaemonPod.ExecCommand(cmd, ranparam.PtpContainerName)
Expect(err).ToNot(HaveOccurred(), "Failed to execute command on pod")
Collaborator

we should use ptpdaemon.ExecuteCommandInPtpDaemonPod since it provides options for retries and handles finding the pod automatically

Comment on lines +198 to +199
// Wait for PTP processes to stabilize with new configuration
time.Sleep(10 * time.Second)
Collaborator

I'm curious if there's a better way to wait for the processes to stabilize. It may not be worth the complexity, but it'd be nice if we could wait on a condition rather than a simple sleep
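One option, sketched under the assumption that some observable readiness signal exists (ptpProcessesStable is a hypothetical predicate, not an existing helper): poll with Gomega's Eventually instead of sleeping.

// Poll a condition instead of sleeping a fixed 10 seconds. Any signal works:
// a fresh profile-load log line, a metric, or the ps output used later in the test.
Eventually(func() bool {
    return ptpProcessesStable(RANConfig.Spoke1APIClient, nodeName)
}, 1*time.Minute, 5*time.Second).Should(BeTrue(),
    "PTP processes did not stabilize after the configuration change")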
