Improve dynamic pipeline step monitoring#4369

Open
schustmi wants to merge 2 commits into develop from feature/improved-step-monitoring

@schustmi schustmi (Contributor) commented Dec 18, 2025

Describe changes

This PR changes how isolated steps are executed and monitored in dynamic pipelines.

Previously, when the user ran an isolated step, we called an orchestrator method that launched the step and then waited for it to finish before returning. The calling thread was therefore blocked for the entire duration of the step.

With this PR, submitting an isolated step is separated from monitoring it. When an isolated step needs to run, the orchestrator submits that step to its backend. The submitting thread is then freed up and can launch other steps. All isolated steps are monitored in a single thread that is solely responsible for monitoring.
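
The split between submitting and monitoring can be illustrated with a minimal Python sketch. This is not the code from this PR: `SubmittedStep`, `StepMonitor`, `submit`, and the `is_finished` callback are hypothetical names chosen for illustration, and the "backend" is simulated with a simple deadline check.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubmittedStep:
    """Handle for a step handed to a (hypothetical) backend."""

    name: str
    is_finished: Callable[[], bool]  # non-blocking status check
    done: threading.Event = field(default_factory=threading.Event)


class StepMonitor:
    """Polls every submitted step from one dedicated thread."""

    def __init__(self, poll_interval: float = 0.01) -> None:
        self._poll_interval = poll_interval
        self._lock = threading.Lock()
        self._pending: Dict[str, SubmittedStep] = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, name: str, is_finished: Callable[[], bool]) -> SubmittedStep:
        """Register a step and return immediately, without waiting for it."""
        step = SubmittedStep(name, is_finished)
        with self._lock:
            self._pending[name] = step
        return step

    def _run(self) -> None:
        # Single monitoring loop shared by all submitted steps.
        while not self._stop.is_set():
            with self._lock:
                steps = list(self._pending.values())
            for step in steps:
                if step.is_finished():
                    step.done.set()
                    with self._lock:
                        self._pending.pop(step.name, None)
            self._stop.wait(self._poll_interval)

    def shutdown(self) -> None:
        self._stop.set()
        self._thread.join()


def demo() -> List[str]:
    """Submit three simulated steps, then wait for all of them."""
    monitor = StepMonitor()
    start = time.monotonic()
    handles = [
        # Each simulated step "finishes" once its deadline has passed.
        monitor.submit(
            f"step_{i}",
            lambda deadline=start + 0.05 * (i + 1): time.monotonic() >= deadline,
        )
        for i in range(3)
    ]
    for handle in handles:
        handle.done.wait(timeout=5)
    monitor.shutdown()
    return [handle.name for handle in handles if handle.done.is_set()]


if __name__ == "__main__":
    print(demo())
```

In this sketch the caller of `submit` never blocks on a step. Only the monitoring thread polls step status, which mirrors the behavior described above.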

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.
  • IMPORTANT: I made sure that my changes are reflected properly in the following resources:
    • ZenML Docs
    • Dashboard: Needs to be communicated to the frontend team.
    • Templates: Might need adjustments (that are not reflected in the template tests) in case of non-breaking changes and deprecations.
    • Projects: Depending on the version dependencies, different projects might get affected.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

@schustmi schustmi added the no-release-notes label (Release notes will NOT be attached and used publicly for this PR.) Dec 18, 2025
@github-actions github-actions bot added the internal (To filter out internal PRs and issues) and enhancement (New feature or request) labels Dec 18, 2025
@github-actions github-actions bot added the stale label Jan 2, 2026
@schustmi schustmi force-pushed the feature/improved-step-monitoring branch 3 times, most recently from 1db80b5 to 4a7a8fc Compare January 2, 2026 16:47
@github-actions github-actions bot removed the stale label Jan 3, 2026
@schustmi schustmi changed the title Improved dynamic pipeline step monitoring Improve dynamic pipeline step monitoring Jan 5, 2026
@schustmi schustmi force-pushed the feature/improved-step-monitoring branch 21 times, most recently from 4edf3e5 to a7f4029 Compare January 9, 2026 13:47
@schustmi schustmi force-pushed the feature/improved-step-monitoring branch 5 times, most recently from 926ac37 to 8eeb6f7 Compare January 14, 2026 16:07
@schustmi schustmi marked this pull request as ready for review January 16, 2026 12:47
@schustmi schustmi force-pushed the feature/improved-step-monitoring branch from 8eeb6f7 to 6dc63a3 Compare January 16, 2026 12:57
@schustmi schustmi force-pushed the feature/improved-step-monitoring branch 4 times, most recently from 5ea8bea to aa6f2b8 Compare January 20, 2026 11:58
@zenml-io zenml-io deleted a comment from github-actions bot Jan 20, 2026
@schustmi schustmi added the release-notes label (Release notes will be attached and used publicly for this PR.) and removed the no-release-notes label (Release notes will NOT be attached and used publicly for this PR.) Jan 20, 2026
@schustmi schustmi force-pushed the feature/improved-step-monitoring branch 2 times, most recently from 525d61b to 3093812 Compare January 20, 2026 16:39
@bcdurak bcdurak linked an issue Jan 22, 2026 that may be closed by this pull request
1 task
@bcdurak bcdurak (Contributor) left a comment

What a complicated but beautifully crafted piece of code <3

I am leaving a few comments to start with, and I will take another look into the runner tomorrow. Since it touches upon the main orchestrators, I feel like a small test session would be nice. Which ones have you tested so far?

logger.info("Waiting for AzureML job `%s` to finish...", job.name)
ml_client.jobs.stream(job.name)
logger.info("AzureML job `%s` completed.", job.name)
publish_step_run_metadata(

I am just going to leave a small comment here. In our case, I don't think it is a big problem, but wanted to mention just in case. There are scenarios where this call might skip the creation of certain metadata without failing (data too large, or unsupported type), in which case, there might be a drift between the step_run in the info and in the DB.

@schustmi schustmi force-pushed the feature/improved-step-monitoring branch 2 times, most recently from 01aef69 to 96ab3f8 Compare January 30, 2026 15:47
@schustmi schustmi force-pushed the feature/improved-step-monitoring branch 2 times, most recently from 3bde83c to 4995575 Compare February 2, 2026 08:03
@schustmi schustmi force-pushed the feature/improved-step-monitoring branch 2 times, most recently from cb8f1d2 to 327433d Compare February 5, 2026 08:41
@schustmi schustmi force-pushed the feature/improved-step-monitoring branch from 327433d to 8f66546 Compare February 6, 2026 10:23

Labels

  • enhancement: New feature or request
  • internal: To filter out internal PRs and issues
  • release-notes: Release notes will be attached and used publicly for this PR.

Development

Successfully merging this pull request may close these issues.

Improve isolated step monitoring for dynamic pipelines

2 participants