
Conversation

@PawelPeczek-Roboflow
Collaborator

What does this PR do?

Related Issue(s):

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Other:

Testing

  • I have tested this change locally
  • I have added/updated tests for this change

Test details:

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context

Comment on lines +103 to +113
        np_images: List[np.ndarray] = [
            load_image_bgr(
                v,
                disable_preproc_auto_orient=kwargs.get(
                    "disable_preproc_auto_orient", False
                ),
            )
            for v in images
        ]
        mapped_kwargs = self.map_inference_kwargs(kwargs)
        return self._model.pre_process(np_images, **mapped_kwargs)
Contributor

⚡️Codeflash found 12% (0.12x) speedup for InferenceModelsObjectDetectionAdapter.preprocess in inference/core/models/inference_models_adapters.py

⏱️ Runtime: 4.82 milliseconds → 4.28 milliseconds (best of 39 runs)

📝 Explanation and details

The optimized code achieves a 12% speedup by eliminating redundant operations in the preprocess method:

Key Optimization:
The critical change is hoisting the kwargs.get("disable_preproc_auto_orient", False) call outside the list comprehension. In the original code, this dictionary lookup was performed once per image (747 times in the profiler results), taking ~1.48ms total. The optimized version performs this lookup just once before the loop, reducing it to a negligible ~46μs.

Why This Works:

  • Dictionary lookups in Python have overhead (hash computation, key comparison)
  • The disable_preproc_auto_orient value is constant across all images in a batch
  • By extracting it to a variable, we eliminate 746 redundant lookups per batch
  • This is particularly impactful when processing larger batches (see the 200-image test showing similar gains); a standalone sketch of the pattern follows this list
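
To see the effect in isolation, here is a minimal, self-contained sketch (not part of the PR; the function names and batch size are illustrative only) that times the per-item lookup against the hoisted variant:

import timeit

kwargs = {"disable_preproc_auto_orient": False}
images = list(range(1000))  # stand-ins for a batch of decoded images

def per_item_lookup():
    # dict lookup repeated once per element, as in the original code
    return [(v, kwargs.get("disable_preproc_auto_orient", False)) for v in images]

def hoisted_lookup():
    # dict lookup performed once, then reused across the comprehension
    flag = kwargs.get("disable_preproc_auto_orient", False)
    return [(v, flag) for v in images]

print("per-item:", timeit.timeit(per_item_lookup, number=2000))
print("hoisted: ", timeit.timeit(hoisted_lookup, number=2000))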

Additional Cleanup:
The map_inference_kwargs method was removed as it simply returned kwargs unchanged. This eliminates an unnecessary method call (taking ~3.66ms in the original) and simplifies the code path. The kwargs are now passed directly to self._model.pre_process().
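
A similarly minimal sketch (again illustrative, not from the PR) of the overhead of routing kwargs through a pass-through method versus forwarding them directly:

import timeit

def pre_process(**kwargs):
    return kwargs

def identity_mapper(kwargs):
    # pass-through, like the removed map_inference_kwargs
    return kwargs

call_kwargs = {"confidence": 0.5}

print("via mapper:", timeit.timeit(lambda: pre_process(**identity_mapper(call_kwargs)), number=1_000_000))
print("direct:   ", timeit.timeit(lambda: pre_process(**call_kwargs), number=1_000_000))

Note that dropping the call assumes no subclass overrides map_inference_kwargs to rewrite kwargs; one of the generated tests below exercises exactly that hook.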

Performance Profile:

  • Line profiler shows the list comprehension time dropped from 33.1ms to 31.7ms (4.2% faster at the loop level)
  • The overall preprocess method improved from 42.8ms to 37.1ms (13% faster)
  • Test results confirm consistent 5-15% speedups across single images, batches, and edge cases

This optimization is most beneficial when preprocess is called frequently with batches of images, as the per-image overhead reduction compounds with batch size.
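
Rough arithmetic from the profile above: ~1.48 ms across 747 lookups is about 2 μs per lookup, so for a batch of N images the hoisting alone saves on the order of 2 μs × (N − 1) per call.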

Correctness verification report:

Test                          | Status
⏪ Replay Tests                | 🔘 None Found
⚙️ Existing Unit Tests         | 🔘 None Found
🔎 Concolic Coverage Tests     | 🔘 None Found
🌀 Generated Regression Tests  | 6 Passed
📊 Tests Coverage              | 100.0%
🌀 Generated Regression Tests
import types

import inference.core.models.inference_models_adapters as adapters_module
import numpy as np

# imports
import pytest  # used for our unit tests
from inference.core.models.inference_models_adapters import (
    InferenceModelsObjectDetectionAdapter,
)


# Helper lightweight model used by tests to capture calls to pre_process.
# This is not a mock from unittest.mock; it's a tiny concrete object used only
# to observe how preprocess forwards its data to the model.
class MinimalModel:
    def __init__(self, return_value=None):
        self.last_called_with = None
        self.return_value = return_value if return_value is not None else {"ok": True}

    def pre_process(self, np_images, **kwargs):
        # Record what we received for assertions in tests
        self.last_called_with = (list(np_images), dict(kwargs))
        return self.return_value


def test_preprocess_single_image_invokes_load_and_model(monkeypatch):
    # Prepare a deterministic ndarray to be returned by the patched load_image_bgr.
    fake_image = np.zeros((8, 8, 3), dtype=np.uint8)

    # Track calls to the patched loader to assert disable_preproc_auto_orient usage.
    called = {"count": 0, "last_flag": None}

    def fake_load_image_bgr(value, disable_preproc_auto_orient=False):
        # Ensure the value passed through correctly (we don't assert a specific type)
        called["count"] += 1
        called["last_flag"] = disable_preproc_auto_orient
        # return a copy to ensure preprocess can't mutate the original easily
        return fake_image.copy()

    # Patch the adapter module's load_image_bgr function (it was imported there)
    monkeypatch.setattr(adapters_module, "load_image_bgr", fake_load_image_bgr)

    # Build an adapter instance without calling __init__ to avoid heavy external dependencies.
    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)
    # Provide a minimal model that records calls and returns a known value.
    model = MinimalModel(return_value={"result": "single"})
    adapter._model = model

    # Call preprocess with a single "image" (could be any sentinel value)
    sentinel = "SINGLE_IMAGE_SENTINEL"
    codeflash_output = adapter.preprocess(sentinel, some_kw=1)
    out = codeflash_output  # 12.4μs -> 12.3μs (0.819% faster)
    np_images_passed, kwargs_passed = model.last_called_with
    assert out == {"result": "single"}
    assert called["last_flag"] is False
    assert kwargs_passed.get("some_kw") == 1
    assert all(np.array_equal(arr, fake_image) for arr in np_images_passed)


def test_preprocess_batch_images_and_disable_flag(monkeypatch):
    # Prepare a small batch
    batch = ["img0", "img1", "img2"]
    returned_images = [
        np.full((4, 4, 3), fill_value=i, dtype=np.uint8) for i in range(len(batch))
    ]

    # A loader that returns distinct arrays per invocation and records the flags
    call_info = {"flags": []}

    def fake_load_image_bgr(value, disable_preproc_auto_orient=False):
        # choose an image from returned_images based on the invocation count
        idx = len(call_info["flags"])
        call_info["flags"].append(disable_preproc_auto_orient)
        # Return a copy to avoid accidental shared-state modifications in tests
        return returned_images[idx].copy()

    monkeypatch.setattr(adapters_module, "load_image_bgr", fake_load_image_bgr)

    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)
    model = MinimalModel(return_value={"result": "batch"})
    adapter._model = model

    # Call preprocess with the batch and force disable_preproc_auto_orient True
    codeflash_output = adapter.preprocess(batch, disable_preproc_auto_orient=True)
    out = codeflash_output  # 8.49μs -> 7.35μs (15.4% faster)
    np_images_passed, kwargs_passed = model.last_called_with
    # Each image received by model should match the distinct arrays returned by our fake loader
    for i, arr in enumerate(np_images_passed):
        assert np.array_equal(arr, returned_images[i])
    # The forced flag should have reached the loader on every call
    assert all(call_info["flags"])


def test_preprocess_empty_list_calls_model_with_empty_list(monkeypatch):
    # Ensure loader would raise if called (it should not be called for an empty list)
    def loader_should_not_be_called(value, disable_preproc_auto_orient=False):
        raise AssertionError(
            "load_image_bgr must not be called for an empty input list"
        )

    monkeypatch.setattr(adapters_module, "load_image_bgr", loader_should_not_be_called)

    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)
    model = MinimalModel(return_value={"result": "empty"})
    adapter._model = model

    codeflash_output = adapter.preprocess([], someflag=True)
    out = codeflash_output  # 3.30μs -> 3.00μs (10.1% faster)
    np_images_passed, kwargs_passed = model.last_called_with
    assert np_images_passed == []
    assert kwargs_passed.get("someflag") is True
    assert out == {"result": "empty"}


def test_preprocess_uses_map_inference_kwargs(monkeypatch):
    # Simple loader that always returns the same array
    monkeypatch.setattr(
        adapters_module,
        "load_image_bgr",
        lambda v, disable_preproc_auto_orient=False: np.zeros(
            (2, 2, 3), dtype=np.uint8
        ),
    )

    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)
    model = MinimalModel(return_value={"ok": "mapped"})
    adapter._model = model

    # Attach a custom map_inference_kwargs bound method to this instance that transforms kwargs.
    def custom_mapper(self, kwargs):
        # Return a new dict that intentionally alters/filters incoming kwargs
        return {"mapped_key": "mapped_value"}

    adapter.map_inference_kwargs = types.MethodType(custom_mapper, adapter)

    # Call preprocess with arbitrary kwargs; they should be replaced by custom_mapper output
    codeflash_output = adapter.preprocess("dummy_input", original="value")
    out = codeflash_output  # 7.82μs -> 7.43μs (5.13% faster)
    _, kwargs_passed = model.last_called_with


def test_preprocess_propagates_loader_exceptions(monkeypatch):
    # Patch loader to raise a ValueError for the first element
    def failing_loader(value, disable_preproc_auto_orient=False):
        raise ValueError("invalid image data")

    monkeypatch.setattr(adapters_module, "load_image_bgr", failing_loader)

    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)
    adapter._model = MinimalModel()

    with pytest.raises(ValueError) as excinfo:
        adapter.preprocess("bad_input")  # 3.35μs -> 3.41μs (1.73% slower)
    assert "invalid image data" in str(excinfo.value)


def test_preprocess_large_batch_handles_many_images(monkeypatch):
    # Create a batch of 200 small "images" to test scaling of the preprocessing step.
    batch_size = 200
    batch = [f"img_{i}" for i in range(batch_size)]

    # Loader that returns the same small ndarray for each call and counts calls
    call_count = {"n": 0}

    def generic_loader(value, disable_preproc_auto_orient=False):
        call_count["n"] += 1
        # return a tiny array unique by filling with the call index modulo 256 to keep memory low
        return np.full((1, 1, 3), fill_value=call_count["n"] % 256, dtype=np.uint8)

    monkeypatch.setattr(adapters_module, "load_image_bgr", generic_loader)

    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)

    # Model returns the number of images it received to make verification simple
    class ReturnCountModel:
        def __init__(self):
            self.last_called_with = None

        def pre_process(self, np_images, **kwargs):
            self.last_called_with = (list(np_images), dict(kwargs))
            return {"received": len(np_images)}

    model = ReturnCountModel()
    adapter._model = model

    codeflash_output = adapter.preprocess(batch)
    out = codeflash_output  # 356μs -> 351μs (1.40% faster)
    np_images_passed, _ = model.last_called_with
    assert len(np_images_passed) == batch_size
    assert out == {"received": batch_size}


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally, run: git merge codeflash/optimize-pr1959-2026-02-04T11.48.16

Suggested change
-        np_images: List[np.ndarray] = [
-            load_image_bgr(
-                v,
-                disable_preproc_auto_orient=kwargs.get(
-                    "disable_preproc_auto_orient", False
-                ),
-            )
-            for v in images
-        ]
-        mapped_kwargs = self.map_inference_kwargs(kwargs)
-        return self._model.pre_process(np_images, **mapped_kwargs)
+        disable_preproc_auto_orient = kwargs.get("disable_preproc_auto_orient", False)
+        np_images: List[np.ndarray] = [
+            load_image_bgr(v, disable_preproc_auto_orient=disable_preproc_auto_orient)
+            for v in images
+        ]
+        return self._model.pre_process(np_images, **kwargs)

