[OVEP] ORT 1.24 Release Patch #27238
* Reuse weight files across shared contexts.
* fix format

@adrianlizarraga & @HectorSVC Please review & merge. FYI @MayureshV1

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 4 pipeline(s).

@adrianlizarraga, @yuslepukhin: Can you please review and have this merged?

### Description
Re-use weight files and their underlying memory maps across shared contexts.

### Motivation and Context
This reduces resident memory when different ep shared context sets reference the same weight file.

Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
Pull request overview
This PR introduces memory optimization for the OpenVINO Execution Provider by implementing a WeightFileManager singleton that enables sharing of weight file instances and their underlying memory maps across multiple SharedContext instances. This reduces memory footprint when different execution provider shared contexts reference the same weight files.
Changes:
- Introduced `WeightFileManager` singleton to globally manage weight file instances
- Changed `WeightsFile` storage from `unique_ptr` to `shared_ptr` to enable sharing across contexts
- Modified `SharedContext` constructor to accept `bin_path` by const reference instead of by value
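
The sharing described in the changes above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `WeightsFile` stand-in, the `GetOrCreate` name, the `std::weak_ptr` cache, and keying by normalized path are all assumptions made for the example.

```cpp
#include <filesystem>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the EP's WeightsFile; the real type also owns the memory map.
struct WeightsFile {
  explicit WeightsFile(std::filesystem::path p) : path(std::move(p)) {}
  std::filesystem::path path;
};

// Sketch of a process-wide registry that hands out one shared WeightsFile per path.
class WeightFileManager {
 public:
  static WeightFileManager& Get() {
    static WeightFileManager instance;  // function-local static: thread-safe init since C++11
    return instance;
  }

  // Returns the live instance for `path` if one exists, otherwise creates it.
  std::shared_ptr<WeightsFile> GetOrCreate(const std::filesystem::path& path) {
    std::lock_guard<std::mutex> lock(mutex_);
    // Assumption: weak_ptr entries let a file be released once no context holds it.
    auto& slot = files_[path.lexically_normal().string()];
    if (auto existing = slot.lock()) return existing;
    auto created = std::make_shared<WeightsFile>(path);
    slot = created;
    return created;
  }

 private:
  WeightFileManager() = default;
  std::mutex mutex_;
  std::map<std::string, std::weak_ptr<WeightsFile>> files_;
};
```

With this shape, two `SharedContext` instances that pass the same `bin_path` would receive the same `shared_ptr<WeightsFile>`, so the file is opened and mapped only once.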
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/openvino/ov_shared_context.h | Added WeightFileManager singleton class, updated SharedContext to use shared weight files, and adjusted constructor signature |
| onnxruntime/core/providers/openvino/ov_shared_context.cc | Updated constructor to initialize weight_file_manager_ and modified weight file acquisition to use the global manager |
    void
    LoadTensorFromFile(
The return type and function name are split across two lines, which is inconsistent with the rest of the codebase. All other function declarations in this file (e.g., lines 62, 76-79) keep the return type and function name on the same line. This should be reformatted to match the existing code style.
Suggested change:

    - void
    - LoadTensorFromFile(
    + void LoadTensorFromFile(
    @@ -104,7 +106,9 @@ class SharedContext : public std::enable_shared_from_this<SharedContext> {
      std::map<std::string, MappingContainer> imported_device_tensors_;
    };
The WeightsFile struct lacks thread synchronization for its member access. With the introduction of WeightFileManager, WeightsFile instances are now shared across multiple SharedContext objects (line 128-137). This means multiple threads can concurrently call LoadWeights() and TryGetOrCreateDeviceMapping() on the same WeightsFile instance. The LoadWeights() method performs non-thread-safe operations on the file_ member (seekg and read), and TryGetOrCreateDeviceMapping() modifies imported_device_tensors_ without synchronization. Add a mutex member to WeightsFile and protect all member accesses to prevent race conditions.
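
One way to act on this comment is sketched below. It is a simplified illustration under stated assumptions: the member signatures, the `int` mapping value, and the `next_id_` counter are hypothetical, since the real `WeightsFile` declaration is not shown in this thread. Only the names `LoadWeights`, `TryGetOrCreateDeviceMapping`, `file_`, and `imported_device_tensors_` come from the review.

```cpp
#include <cstddef>
#include <fstream>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, simplified WeightsFile with a single mutex guarding all members.
class WeightsFile {
 public:
  explicit WeightsFile(const std::string& path) : file_(path, std::ios::binary) {}

  // seekg and read share the stream position, so they must form one critical section.
  bool LoadWeights(std::size_t offset, std::vector<char>& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!file_.is_open()) return false;
    file_.clear();  // reset eof/fail bits left by an earlier short read
    file_.seekg(static_cast<std::streamoff>(offset));
    file_.read(out.data(), static_cast<std::streamsize>(out.size()));
    return static_cast<std::size_t>(file_.gcount()) == out.size();
  }

  // Returns the mapping id for a device, creating it on first use.
  int TryGetOrCreateDeviceMapping(const std::string& device) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto [it, inserted] = imported_device_tensors_.try_emplace(device, next_id_);
    if (inserted) ++next_id_;
    return it->second;
  }

 private:
  std::mutex mutex_;
  std::ifstream file_;
  std::map<std::string, int> imported_device_tensors_;  // value type is a placeholder
  int next_id_ = 0;
};
```

A single coarse lock keeps the example short. If file reads and device-mapping lookups turn out to contend, separate mutexes for `file_` and `imported_device_tensors_` would be a reasonable refinement.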