Add model manager that automatically manages models across processes #37113

damccorm merged 23 commits into apache:master
Conversation
Summary of Changes

Hello @AMOOOMA, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a sophisticated model management system for Apache Beam's ML inference capabilities. The core `ModelManager` class, supported by `GPUMonitor` and `ResourceEstimator`, intelligently handles the lifecycle of machine learning models, particularly in GPU-accelerated environments. It aims to prevent out-of-memory errors by dynamically estimating model memory requirements, isolating unknown models for profiling, and implementing a demand-aware eviction strategy. This system ensures efficient and concurrent execution of diverse ML models within Beam pipelines, optimizing GPU resource utilization and improving overall stability.
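The summary above mentions dynamically estimating model memory requirements, and the API listing at the end of this thread notes that `ResourceEstimator.add_observation` updates its cost model via an NNLS solver. As a rough illustration only (not the PR's code; the names and numbers below are made up), per-model memory costs can be fit with non-negative least squares over observed memory peaks:

```python
# Hypothetical sketch of per-model memory attribution via NNLS.
import numpy as np
from scipy.optimize import nnls

# Each row records which models were active; the target is the peak GPU
# memory (MB) observed while that combination was loaded.
model_tags = ["model_a", "model_b"]
active_snapshots = np.array([
    [1, 0],   # only model_a loaded
    [0, 1],   # only model_b loaded
    [1, 1],   # both loaded concurrently
], dtype=float)
peak_memory_mb = np.array([2100.0, 3400.0, 5600.0])

# Non-negative least squares keeps each per-model cost >= 0, which is why
# it is a natural fit for memory attribution.
costs, _residual = nnls(active_snapshots, peak_memory_mb)
print(dict(zip(model_tags, costs)))  # e.g. {'model_a': ~2133, 'model_b': ~3433}
```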
R: @damccorm
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment
damccorm
left a comment
I've been looking at it for a while today and I am still having a hard time understanding the full scope of this PR (even with this being the second long look), and it will probably take a few more passes.
For now, added some things that will help me review better, but if there are pieces we can separate out further to make this more reviewable (either by splitting large functions apart or by pulling classes out of the PR), that would be quite helpful.
```python
with self._load_lock:
  logger.info("Loading Model: %s (Unknown: %s)", tag, is_unknown)
  isolation_baseline_snap, _, _ = self._monitor.get_stats()
  instance = TrackedModelProxy(loader_func())
```
What is loader_func here - is this spawning the model in process?
Yep! This depends on what the user passes in, which in the RunInference case would be spawning a model in process with the new MultiProcessShared util.
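For readers unfamiliar with `MultiProcessShared`, here is a minimal sketch of what such a `loader_func` might look like; `MyModel` and the tag are hypothetical, and the exact wiring to the manager may differ from this PR:

```python
# Illustrative only: a loader_func that spawns the model in a shared process
# via MultiProcessShared, so a single instance is reused across processes.
from apache_beam.utils.multi_process_shared import MultiProcessShared


class MyModel:
  def predict(self, batch):
    return [len(x) for x in batch]


def loader_func():
  # acquire() returns a proxy to one shared instance of MyModel.
  return MultiProcessShared(MyModel, tag='my_model').acquire()


# The manager would invoke loader_func lazily, e.g.:
# instance = model_manager.acquire_model('my_model', loader_func)
```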
```python
with self._cv:
  # FAST PATH
  if self._pending_isolation_count == 0 and not self._isolation_mode:
```
Is self._pending_isolation_count ever non-zero?
Good catch! This is leftover code from earlier iterations. Removed now.
```
It handles:
1. LRU Caching of idle models.
2. Resource estimation and admission control (preventing OOM).
3. Dynamic eviction of low-priority models when space is needed.
```
Could you add more info on what makes a model low-priority here?
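For context, the docstring is later updated to say that priority is determined by the count of pending requests. A minimal sketch of that kind of demand-aware eviction choice (not the PR's exact logic; names below are invented) could look like:

```python
# Hedged sketch: among idle models, evict the one with the fewest pending
# requests, breaking ties by least-recent use.
from collections import OrderedDict


def pick_eviction_victim(idle_models: OrderedDict, pending_requests: dict):
  """idle_models is ordered oldest-first (LRU); pending_requests maps
  tag -> number of requests currently waiting on that model."""
  if not idle_models:
    return None
  # min() keeps the first minimal entry, so least recently used wins ties.
  return min(idle_models, key=lambda tag: pending_requests.get(tag, 0))


idle = OrderedDict.fromkeys(["bert", "resnet", "whisper"])
pending = {"bert": 4, "resnet": 0, "whisper": 2}
print(pick_eviction_victim(idle, pending))  # -> 'resnet'
```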
…code and cleanup some code logics
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@             Coverage Diff              @@
##             master   #37113       +/-   ##
=============================================
+ Coverage     35.95%   54.88%   +18.92%
  Complexity     1676     1676
=============================================
  Files          1062     1063        +1
  Lines        166037   166450      +413
  Branches       1208     1208
=============================================
+ Hits          59705    91354    +31649
+ Misses       104151    72915    -31236
  Partials       2181     2181
```
damccorm
left a comment
Didn't get to the tests yet, but overall things generally look good. Left some comments below and I'll ask gemini to review as well
/gemini review
Code Review
This pull request introduces a sophisticated ModelManager for handling ML models, with a strong focus on GPU memory management. The implementation is well-structured, separating concerns into GPUMonitor, ResourceEstimator, and the ModelManager itself. The logic for concurrency, resource contention, and model eviction is complex but appears robust. The accompanying tests are thorough and cover a wide range of scenarios. My review includes several suggestions to enhance security, code clarity, and efficiency in specific areas.
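As a rough sketch of the `GPUMonitor` role mentioned above (the API summary later in this thread lists `start`, `stop`, `reset_peak`, and `get_stats`), a background memory poller could be built on pynvml as below. This is illustrative only, assuming pynvml is available; the PR's implementation may collect stats differently:

```python
# Hypothetical background GPU-memory poller with a GPUMonitor-like shape.
import threading
import time

import pynvml


class SimpleGpuMonitor:
  def __init__(self, poll_interval_s=0.5):
    self._interval = poll_interval_s
    self._current_mb = 0.0
    self._peak_mb = 0.0
    self._total_mb = 0.0
    self._stop_event = threading.Event()
    self._thread = None

  def start(self):
    pynvml.nvmlInit()
    self._thread = threading.Thread(target=self._poll, daemon=True)
    self._thread.start()

  def stop(self):
    self._stop_event.set()
    if self._thread:
      self._thread.join()
    pynvml.nvmlShutdown()

  def reset_peak(self):
    self._peak_mb = self._current_mb

  def get_stats(self):
    return self._current_mb, self._peak_mb, self._total_mb

  def _poll(self):
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    while not self._stop_event.is_set():
      info = pynvml.nvmlDeviceGetMemoryInfo(handle)
      self._current_mb = info.used / 1e6
      self._total_mb = info.total / 1e6
      self._peak_mb = max(self._peak_mb, self._current_mb)
      time.sleep(self._interval)
```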
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
```
1. LRU Caching of idle models.
2. Resource estimation and admission control (preventing OOM).
3. Dynamic eviction of low-priority models, determined by count of
   pending requests, when space is needed.
```
From the doc precommit:

```
/runner/_work/beam/beam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-docs/docs/lib/python3.10/site-packages/apache_beam/ml/inference/model_manager.py:docstring of apache_beam.ml.inference.model_manager.ModelManager:8: ERROR: Unexpected indentation.
```

I think this needs an extra indent to handle lists, like beam/sdks/python/apache_beam/transforms/core.py (line 1071 in 183ecd3).
Thanks for the catch! Updated.
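For reference, one way the docstring list might be indented so the Sphinx doc build accepts it (a sketch following the pattern referenced from core.py, not necessarily the exact change made here):

```python
class ModelManager:
  """Manages model instances and GPU memory across processes.

  It handles:

    1. LRU Caching of idle models.
    2. Resource estimation and admission control (preventing OOM).
    3. Dynamic eviction of low-priority models, determined by count of
       pending requests, when space is needed.
  """
```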
Added Model Manager as a util class that offers managed access to models; the client can request models without having to worry about managing GPU OOMs.
Also added various tests that check the functionality of all classes.
Classes
**GPUMonitor**
- `start()`: Begins background memory polling.
- `stop()`: Stops polling.
- `reset_peak()`: Resets peak usage tracking.
- `get_stats() -> (current, peak, total)`: Returns memory stats.

**ResourceEstimator**
- `is_unknown(model_tag: str) -> bool`: Checks if model needs profiling.
- `get_estimate(model_tag: str, default_mb: float) -> float`: Returns memory cost.
- `set_initial_estimate(model_tag: str, cost: float)`: Manually sets cost.
- `add_observation(active_snapshot, peak_memory)`: Updates cost model via NNLS solver.

**ModelManager**
- `acquire_model(tag: str, loader_func: Callable) -> Any`: Gets model instance (handles isolation/concurrency).
- `release_model(tag: str, instance: Any)`: Returns model to pool.
- `force_reset()`: Clears all models and caches.
- `shutdown()`: Cleans up resources.
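Based on the API summary above, a typical call sequence might look like the following sketch. The module path matches the doc-build error earlier in the thread; the constructor arguments, tag, and loader below are assumptions for the sake of the example:

```python
# Illustrative usage of the ModelManager API listed above.
from apache_beam.ml.inference.model_manager import ModelManager


def load_my_model():
  # Stand-in loader; in RunInference this could wrap MultiProcessShared.
  return object()


manager = ModelManager()
instance = manager.acquire_model("my_model", load_my_model)
try:
  # ... run inference with `instance` here ...
  pass
finally:
  manager.release_model("my_model", instance)
manager.shutdown()
```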