schedule fuzz based on preprocess queue size #5153
base: master
Conversation
jonathanmetzman: I'm not going to do a thorough review but will share context to avoid a production incident. I think it's likely there will be infinite queuing if #5140 is not landed before this.
```diff
-def count_unacked(creds, project_id, subscription_id):
+def get_queue_size(creds, project_id, subscription_id):
   """Counts the unacked messages in |subscription_id|."""
```
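For context, the unacked-message count of a Pub/Sub subscription is usually read from the Cloud Monitoring metric `pubsub.googleapis.com/subscription/num_undelivered_messages`. A minimal sketch of what a helper like `get_queue_size` could look like; the lookback window and filter below are assumptions for illustration, not the PR's actual code:

```python
import time

from google.cloud import monitoring_v3


def get_queue_size(creds, project_id, subscription_id):
  """Returns the number of unacked messages in |subscription_id|."""
  client = monitoring_v3.MetricServiceClient(credentials=creds)
  now = int(time.time())
  # The metric is reported with a lag of a few minutes, so look back far
  # enough to find at least one data point.
  interval = monitoring_v3.TimeInterval({
      'end_time': {'seconds': now},
      'start_time': {'seconds': now - 10 * 60},
  })
  results = client.list_time_series(
      request={
          'name': f'projects/{project_id}',
          'filter': ('metric.type = "pubsub.googleapis.com/subscription/'
                     'num_undelivered_messages" AND '
                     f'resource.labels.subscription_id = "{subscription_id}"'),
          'interval': interval,
          'view': monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
      })
  for series in results:
    # Points are returned newest-first; use the most recent value.
    return series.points[0].value.int64_value
  return 0
```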
It's probably worth noting somewhere that the queue size metric is delayed by about 5 minutes.
Not sure I got your point.
Sorry, I mean that we should mention the need for delays. If the cron runs too often, it might check the queue size before the metric has actually updated to reflect the jobs it just added. Does this make sense?
That's a good point. The idea, though, is to tune this via feature flags, balancing the target queue size against the frequency of the cron execution.
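One way to respect that delay (purely illustrative; the constant and helper below are hypothetical, not part of the PR) is to never let two scheduling runs happen closer together than the metric lag:

```python
METRIC_DELAY_SECONDS = 5 * 60  # the queue-size metric lags ~5 minutes


def should_schedule(last_schedule_timestamp, now):
  """Returns True only if the metric had time to reflect the last run."""
  return now - last_schedule_timestamp >= METRIC_DELAY_SECONDS
```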
I'd run this locally using butler scripts to make sure it behaves nicely.

It's been running in dev for 4 days already :)

That won't happen, because this should land only after #5150. And even though we don't have the job limiter for Batch yet, we can control how many tasks are forwarded there via the RemoteTaskGate frequencies.
```diff
@@ -223,7 +167,7 @@ def get_fuzz_tasks(self) -> Dict[str, tasks.Task]:
       weights.append(fuzz_task_candidate.weight)

   # TODO(metzman): Handle high-end jobs correctly.
```
Just for clarification, how do these new changes relate to this comment?
I don't know what @jonathanmetzman meant here, but AFAIK we don't support high-end jobs anymore.
```diff
   weights = [candidate.weight for candidate in fuzz_task_candidates]
-  num_instances = int(self.num_cpus / self._get_cpus_per_fuzz_job(None))
+  num_instances = self.num_tasks
```
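For context, this is the spot where candidate weights become concrete fuzz tasks. A rough sketch of weighted sampling under an assumed candidate shape (the `FuzzTaskCandidate` fields and `pick_fuzz_tasks` helper are illustrative, not the PR's code):

```python
import random
from dataclasses import dataclass


@dataclass
class FuzzTaskCandidate:  # assumed shape, for illustration only
  job: str
  fuzzer: str
  weight: float


def pick_fuzz_tasks(fuzz_task_candidates, num_tasks):
  """Samples |num_tasks| candidates proportionally to their weights."""
  weights = [candidate.weight for candidate in fuzz_task_candidates]
  # Sampling is with replacement: a heavily weighted job can be picked
  # more than once. With this change, num_tasks comes from the preprocess
  # queue deficit rather than from the number of available CPUs.
  return random.choices(fuzz_task_candidates, weights=weights, k=num_tasks)
```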
Maybe change this variable name?
Renaming it to fuzz_tasks.
```diff
       subconf['name'] for subconf in batch_config.get(
           'mapping.LINUX-PREEMPTIBLE-UNPRIVILEGED.subconfigs')
   }
+PREPROCESS_TARGET_SIZE_DEFAULT = 10000
```
I think it would be easier to read if this is at the top of the file. Is there a specific reason for it to be here?
It should be at the top, yes. Moving it.
```diff
   # TODO(metzman): Handle high-end jobs correctly.
-  num_instances = int(self.num_cpus / self._get_cpus_per_fuzz_job(None))
+  num_instances = self.num_tasks
```
ditto.
Sounds good. I'll rename it to fuzz_tasks.
```diff
   conf = local_config.ProjectConfig()
   max_cpus_per_schedule = conf.get('max_cpus_per_schedule')
   if max_cpus_per_schedule:
     max_tasks = int(max_cpus_per_schedule / CPUS_PER_FUZZ_JOB)
```
Can this be deprecated? It appears to be used only here.
I think so. I'll remove it.
Great. I haven't had a chance to review/understand that PR, but are you relying on sending everything to k8s to avoid infinite queuing in Batch? Otherwise I don't see how we would ever know whether Batch is full/queuing without querying the Batch API. I trust you!
Hi Javan,

Thank you for your guidance and help over the past 4 days regarding the technical issues in this thread.

I am writing to let you know that I am closing my requests and will be leaving GitHub entirely. As the founder of *Hill Tope Europe*, I have realized that this platform is not the right fit for my current business needs. I am concerned about the privacy of my data and the public exposure of my ideas, so I have decided to reset my repositories and deactivate my account to protect my startup's content.

I apologize for any inconvenience this quick departure causes, but I must prioritize the security of my intellectual property. Thank you for your expertise and for your efforts to prevent production incidents during our discussion.

Best regards,
*Shakib Sadman*, Founder, Hill Tope Europe (GitHub: cemon721-a11y)
Commits (force-pushed from c58e4bb to ffb5cc7):
- Signed-off-by: Javan Lacerda <javanlacerda@google.com>
- fixes. Signed-off-by: Javan Lacerda <javanlacerda@google.com>
SHOULD BE MERGED AFTER #5150.

This PR updates the logic for scheduling fuzz tasks. Instead of aiming to load the GCP Batch infrastructure based on the available CPUs, it looks only at the preprocess queue size. By default it aims to keep the preprocess queue at 10k messages, scheduling a number of new tasks equal to the difference between that target and the current queue size.
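In other words, each cron run computes a deficit against the target queue size. A minimal sketch of that computation, reusing `PREPROCESS_TARGET_SIZE_DEFAULT` from the diff above and assuming a `get_queue_size` helper like the one discussed earlier (the function name and wiring are illustrative):

```python
PREPROCESS_TARGET_SIZE_DEFAULT = 10000


def num_fuzz_tasks_to_schedule(creds, project_id, subscription_id,
                               target_size=PREPROCESS_TARGET_SIZE_DEFAULT):
  """Returns how many fuzz tasks to create on this cron run."""
  current_size = get_queue_size(creds, project_id, subscription_id)
  # Schedule only the deficit; never a negative number of tasks.
  return max(0, target_size - current_size)
```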