
Conversation

Collaborator

@javanlacerda javanlacerda commented Feb 1, 2026

SHOULD BE MERGED AFTER #5150.

This PR updates the logic for scheduling fuzz tasks. Instead of aiming to load the GCP Batch infrastructure based on the available CPUs, it looks only at the preprocess queue size.

By default, it aims to keep the preprocess queue at 10k messages, creating new tasks based on the difference between that target and the current queue size.
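The rule described above can be sketched as follows; the names (`tasks_to_schedule`, `PREPROCESS_TARGET_SIZE`) are illustrative assumptions, not identifiers from the PR:

```python
# Illustrative sketch of the scheduling rule, not the PR's actual code.
PREPROCESS_TARGET_SIZE = 10_000  # target number of messages in the preprocess queue


def tasks_to_schedule(current_queue_size: int,
                      target: int = PREPROCESS_TARGET_SIZE) -> int:
  """Schedules the difference between the target and the current queue size."""
  # Never schedule a negative number of tasks when the queue is over target.
  return max(0, target - current_queue_size)
```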


@javanlacerda javanlacerda marked this pull request as ready for review February 4, 2026 16:51
@jonathanmetzman
Collaborator

I'm not going to do a thorough review but will share context to avoid a production incident.
I think it's likely there will be infinite queuing if #5140 is not landed before this.


-def count_unacked(creds, project_id, subscription_id):
   """Counts the unacked messages in |subscription_id|."""
+def get_queue_size(creds, project_id, subscription_id):
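For readers outside the diff: in Pub/Sub, the "queue size" here is the subscription's backlog of undelivered messages, typically read from the Cloud Monitoring metric `pubsub.googleapis.com/subscription/num_undelivered_messages`. A minimal sketch of the selection logic, with the Monitoring query stubbed out (`fetch_points` is a hypothetical stand-in, not the PR's code):

```python
from typing import Callable, List, Tuple

# (unix timestamp, num_undelivered_messages) samples, newest not necessarily last.
Point = Tuple[float, int]


def get_queue_size(fetch_points: Callable[[], List[Point]]) -> int:
  """Returns the most recent undelivered-message count, or 0 if no data."""
  points = fetch_points()
  if not points:
    return 0
  # Monitoring returns a time series; the newest sample is the queue size.
  return max(points, key=lambda p: p[0])[1]
```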
Collaborator

It's probably worth noting somewhere that the queue size metric is delayed by about 5 minutes.

Collaborator Author

Not sure I got your point.

Collaborator

Sorry, I mean that we should mention the need for delays. If the cron runs too often, it might check the queue size before the metric has actually updated to reflect the jobs it just added. Does this make sense?

Collaborator Author

That's a good point. But the idea is to tune this via feature flags, balancing the target queue size against the frequency of the cron execution.
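One way to picture the balance being discussed (purely illustrative, not code from the PR): the cron should only act on the queue-size metric once the metric can reflect the previous scheduling run.

```python
METRIC_DELAY_SECONDS = 5 * 60  # the queue-size metric lags by roughly 5 minutes


def should_schedule(seconds_since_last_run: float,
                    metric_delay: float = METRIC_DELAY_SECONDS) -> bool:
  """Skips runs that would read a metric predating the last scheduling pass."""
  return seconds_since_last_run >= metric_delay
```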

@jonathanmetzman
Collaborator

I'd run this locally using butler scripts to make sure it behaves nicely.

@javanlacerda
Collaborator Author

javanlacerda commented Feb 4, 2026

I'd run this locally using butler scripts to make sure it behaves nicely.

It's been running in dev for 4 days already :)

@javanlacerda
Collaborator Author

I'm not going to do a thorough review but will share context to avoid a production incident. I think it's likely there will be infinite queuing if #5140 is not landed before this.

It will not, because this should be landed only after #5150. And even though we don't have the job limiter for Batch yet, we can control how many tasks are forwarded there via the RemoteTaskGate frequencies.

@@ -223,7 +167,7 @@ def get_fuzz_tasks(self) -> Dict[str, tasks.Task]:
weights.append(fuzz_task_candidate.weight)

# TODO(metzman): Handle high-end jobs correctly.
Contributor

@decoNR decoNR Feb 4, 2026

Just for clarification, how do these new changes relate to this comment?

Collaborator Author

I don't know what @jonathanmetzman meant here, but AFAIK we don't support high-end jobs anymore.


 weights = [candidate.weight for candidate in fuzz_task_candidates]
-num_instances = int(self.num_cpus / self._get_cpus_per_fuzz_job(None))
+num_instances = self.num_tasks
Contributor

@decoNR decoNR Feb 4, 2026

Maybe change this variable name?

Collaborator Author

Moving it to fuzz_tasks.
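For context on this diff: the renamed count drives a weighted draw over the fuzz task candidates. A hedged sketch using `random.choices` (the scheduler's actual sampler may differ; the function name here is an assumption):

```python
import random


def pick_fuzz_tasks(candidates, weights, fuzz_tasks, seed=None):
  """Draws `fuzz_tasks` candidates with probability proportional to weight."""
  rng = random.Random(seed)
  # choices() samples with replacement, so heavy candidates can repeat.
  return rng.choices(candidates, weights=weights, k=fuzz_tasks)
```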

subconf['name'] for subconf in batch_config.get(
'mapping.LINUX-PREEMPTIBLE-UNPRIVILEGED.subconfigs')
}
PREPROCESS_TARGET_SIZE_DEFAULT = 10000
Contributor

I think it would be easier to read if this were at the top of the file. Is there a specific reason for it to be here?

Collaborator Author

It should be at the top. Moving it.


 # TODO(metzman): Handle high-end jobs correctly.
-num_instances = int(self.num_cpus / self._get_cpus_per_fuzz_job(None))
+num_instances = self.num_tasks
Contributor

@decoNR decoNR Feb 4, 2026

ditto.

Collaborator Author

It sounds good. I'll move it to fuzz_tasks

conf = local_config.ProjectConfig()
max_cpus_per_schedule = conf.get('max_cpus_per_schedule')
if max_cpus_per_schedule:
  max_tasks = int(max_cpus_per_schedule / CPUS_PER_FUZZ_JOB)
Contributor

Can this be deprecated? It appears to be used only here.

Collaborator Author

I think so. I'll remove it


@jonathanmetzman
Collaborator

jonathanmetzman commented Feb 4, 2026

I'm not going to do a thorough review but will share context to avoid a production incident. I think it's likely there will be infinite queuing if #5140 is not landed before this.

Will not because this should be landed only after #5150. And even we don't have the job limiter for batch yet, we can control how many tasks will be forward to there by the RemoteTaskGate frequencies.

Great. I haven't had a chance to review/understand that PR, but are you relying on sending everything to k8s to avoid infinite queuing in Batch? Otherwise I don't see how we would ever know whether Batch is full or queuing without querying the Batch API. I trust you!


Signed-off-by: Javan Lacerda <javanlacerda@google.com>

fixes

Signed-off-by: Javan Lacerda <javanlacerda@google.com>
@javanlacerda javanlacerda force-pushed the javan.schedule-fuzz-queue-size branch from c58e4bb to ffb5cc7 on February 4, 2026 20:47