
Conversation

@PawelPeczek-Roboflow
Collaborator

What does this PR do?

Related Issue(s):

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Other:

Testing

  • I have tested this change locally
  • I have added/updated tests for this change

Test details:

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context

Comment on lines +103 to +113
        np_images: List[np.ndarray] = [
            load_image_bgr(
                v,
                disable_preproc_auto_orient=kwargs.get(
                    "disable_preproc_auto_orient", False
                ),
            )
            for v in images
        ]
        mapped_kwargs = self.map_inference_kwargs(kwargs)
        return self._model.pre_process(np_images, **mapped_kwargs)
Contributor

⚡️Codeflash found 12% (0.12x) speedup for InferenceModelsObjectDetectionAdapter.preprocess in inference/core/models/inference_models_adapters.py

⏱️ Runtime: 4.82 milliseconds → 4.28 milliseconds (best of 39 runs)

📝 Explanation and details

The optimized code achieves a 12% speedup by eliminating redundant operations in the preprocess method:

Key Optimization:
The critical change is hoisting the kwargs.get("disable_preproc_auto_orient", False) call outside the list comprehension. In the original code, this dictionary lookup was performed once per image (747 times in the profiler results), taking ~1.48ms total. The optimized version performs this lookup just once before the loop, reducing it to a negligible ~46μs.

Why This Works:

  • Dictionary lookups in Python have overhead (hash computation, key comparison)
  • The disable_preproc_auto_orient value is constant across all images in a batch
  • By extracting it to a variable, we eliminate 746 redundant lookups per batch
  • This is particularly impactful when processing larger batches (see the 200-image test showing similar gains); a standalone sketch of the pattern follows this list
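
To see the effect in isolation, here is a minimal, self-contained sketch (not part of the PR; the function names and batch size are illustrative only) that times the per-item lookup against the hoisted variant:

import timeit

kwargs = {"disable_preproc_auto_orient": False}
images = list(range(1000))  # stand-ins for a batch of decoded images

def per_item_lookup():
    # dict lookup repeated once per element, as in the original code
    return [(v, kwargs.get("disable_preproc_auto_orient", False)) for v in images]

def hoisted_lookup():
    # dict lookup performed once, then reused across the comprehension
    flag = kwargs.get("disable_preproc_auto_orient", False)
    return [(v, flag) for v in images]

print("per-item:", timeit.timeit(per_item_lookup, number=2000))
print("hoisted: ", timeit.timeit(hoisted_lookup, number=2000))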

Additional Cleanup:
The map_inference_kwargs method was removed as it simply returned kwargs unchanged. This eliminates an unnecessary method call (taking ~3.66ms in the original) and simplifies the code path. The kwargs are now passed directly to self._model.pre_process().
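
A similarly minimal sketch (again illustrative, not from the PR) of the overhead of routing kwargs through a pass-through method versus forwarding them directly:

import timeit

def pre_process(**kwargs):
    return kwargs

def identity_mapper(kwargs):
    # pass-through, like the removed map_inference_kwargs
    return kwargs

call_kwargs = {"confidence": 0.5}

print("via mapper:", timeit.timeit(lambda: pre_process(**identity_mapper(call_kwargs)), number=1_000_000))
print("direct:   ", timeit.timeit(lambda: pre_process(**call_kwargs), number=1_000_000))

Note that dropping the call assumes no subclass overrides map_inference_kwargs to rewrite kwargs; one of the generated tests below exercises exactly that hook.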

Performance Profile:

  • Line profiler shows the list comprehension time dropped from 33.1ms to 31.7ms (4.2% faster at the loop level)
  • The overall preprocess method improved from 42.8ms to 37.1ms (13% faster)
  • Test results confirm consistent 5-15% speedups across single images, batches, and edge cases

This optimization is most beneficial when preprocess is called frequently with batches of images, as the per-image overhead reduction compounds with batch size.
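
Rough arithmetic from the profile above: ~1.48 ms across 747 lookups is about 2 μs per lookup, so for a batch of N images the hoisting alone saves on the order of 2 μs × (N − 1) per call.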

Correctness verification report:

Test                          | Status
⏪ Replay Tests                | 🔘 None Found
⚙️ Existing Unit Tests         | 🔘 None Found
🔎 Concolic Coverage Tests     | 🔘 None Found
🌀 Generated Regression Tests  | 6 Passed
📊 Tests Coverage              | 100.0%
🌀 Generated Regression Tests
import types

import inference.core.models.inference_models_adapters as adapters_module
import numpy as np

# imports
import pytest  # used for our unit tests
from inference.core.models.inference_models_adapters import (
    InferenceModelsObjectDetectionAdapter,
)


# Helper lightweight model used by tests to capture calls to pre_process.
# This is not a mock from unittest.mock; it's a tiny concrete object used only
# to observe how preprocess forwards its data to the model.
class MinimalModel:
    def __init__(self, return_value=None):
        self.last_called_with = None
        self.return_value = return_value if return_value is not None else {"ok": True}

    def pre_process(self, np_images, **kwargs):
        # Record what we received for assertions in tests
        self.last_called_with = (list(np_images), dict(kwargs))
        return self.return_value


def test_preprocess_single_image_invokes_load_and_model(monkeypatch):
    # Prepare a deterministic ndarray to be returned by the patched load_image_bgr.
    fake_image = np.zeros((8, 8, 3), dtype=np.uint8)

    # Track calls to the patched loader to assert disable_preproc_auto_orient usage.
    called = {"count": 0, "last_flag": None}

    def fake_load_image_bgr(value, disable_preproc_auto_orient=False):
        # Ensure the value passed through correctly (we don't assert a specific type)
        called["count"] += 1
        called["last_flag"] = disable_preproc_auto_orient
        # return a copy to ensure preprocess can't mutate the original easily
        return fake_image.copy()

    # Patch the adapter module's load_image_bgr function (it was imported there)
    monkeypatch.setattr(adapters_module, "load_image_bgr", fake_load_image_bgr)

    # Build an adapter instance without calling __init__ to avoid heavy external dependencies.
    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)
    # Provide a minimal model that records calls and returns a known value.
    model = MinimalModel(return_value={"result": "single"})
    adapter._model = model

    # Call preprocess with a single "image" (could be any sentinel value)
    sentinel = "SINGLE_IMAGE_SENTINEL"
    codeflash_output = adapter.preprocess(sentinel, some_kw=1)
    out = codeflash_output  # 12.4μs -> 12.3μs (0.819% faster)
    np_images_passed, kwargs_passed = model.last_called_with
    assert out == {"result": "single"}
    assert called["last_flag"] is False
    assert kwargs_passed.get("some_kw") == 1
    assert all(np.array_equal(arr, fake_image) for arr in np_images_passed)


def test_preprocess_batch_images_and_disable_flag(monkeypatch):
    # Prepare a small batch
    batch = ["img0", "img1", "img2"]
    returned_images = [
        np.full((4, 4, 3), fill_value=i, dtype=np.uint8) for i in range(len(batch))
    ]

    # A loader that returns distinct arrays per invocation and records the flags
    call_info = {"flags": []}

    def fake_load_image_bgr(value, disable_preproc_auto_orient=False):
        # choose an image from returned_images based on the invocation count
        idx = len(call_info["flags"])
        call_info["flags"].append(disable_preproc_auto_orient)
        # Return a copy to avoid accidental shared-state modifications in tests
        return returned_images[idx].copy()

    monkeypatch.setattr(adapters_module, "load_image_bgr", fake_load_image_bgr)

    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)
    model = MinimalModel(return_value={"result": "batch"})
    adapter._model = model

    # Call preprocess with the batch and force disable_preproc_auto_orient True
    codeflash_output = adapter.preprocess(batch, disable_preproc_auto_orient=True)
    out = codeflash_output  # 8.49μs -> 7.35μs (15.4% faster)
    np_images_passed, kwargs_passed = model.last_called_with
    # Each image received by model should match the distinct arrays returned by our fake loader
    for i, arr in enumerate(np_images_passed):
        assert np.array_equal(arr, returned_images[i])
    # The forced flag should have reached the loader on every call
    assert all(call_info["flags"])


def test_preprocess_empty_list_calls_model_with_empty_list(monkeypatch):
    # Ensure loader would raise if called (it should not be called for an empty list)
    def loader_should_not_be_called(value, disable_preproc_auto_orient=False):
        raise AssertionError(
            "load_image_bgr must not be called for an empty input list"
        )

    monkeypatch.setattr(adapters_module, "load_image_bgr", loader_should_not_be_called)

    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)
    model = MinimalModel(return_value={"result": "empty"})
    adapter._model = model

    codeflash_output = adapter.preprocess([], someflag=True)
    out = codeflash_output  # 3.30μs -> 3.00μs (10.1% faster)
    np_images_passed, kwargs_passed = model.last_called_with
    assert np_images_passed == []
    assert kwargs_passed.get("someflag") is True
    assert out == {"result": "empty"}


def test_preprocess_uses_map_inference_kwargs(monkeypatch):
    # Simple loader that always returns the same array
    monkeypatch.setattr(
        adapters_module,
        "load_image_bgr",
        lambda v, disable_preproc_auto_orient=False: np.zeros(
            (2, 2, 3), dtype=np.uint8
        ),
    )

    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)
    model = MinimalModel(return_value={"ok": "mapped"})
    adapter._model = model

    # Attach a custom map_inference_kwargs bound method to this instance that transforms kwargs.
    def custom_mapper(self, kwargs):
        # Return a new dict that intentionally alters/filters incoming kwargs
        return {"mapped_key": "mapped_value"}

    adapter.map_inference_kwargs = types.MethodType(custom_mapper, adapter)

    # Call preprocess with arbitrary kwargs; they should be replaced by custom_mapper output
    codeflash_output = adapter.preprocess("dummy_input", original="value")
    out = codeflash_output  # 7.82μs -> 7.43μs (5.13% faster)
    _, kwargs_passed = model.last_called_with


def test_preprocess_propagates_loader_exceptions(monkeypatch):
    # Patch loader to raise a ValueError for the first element
    def failing_loader(value, disable_preproc_auto_orient=False):
        raise ValueError("invalid image data")

    monkeypatch.setattr(adapters_module, "load_image_bgr", failing_loader)

    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)
    adapter._model = MinimalModel()

    with pytest.raises(ValueError) as excinfo:
        adapter.preprocess("bad_input")  # 3.35μs -> 3.41μs (1.73% slower)
    assert "invalid image data" in str(excinfo.value)


def test_preprocess_large_batch_handles_many_images(monkeypatch):
    # Create a batch of 200 small "images" to test scaling of the preprocessing step.
    batch_size = 200
    batch = [f"img_{i}" for i in range(batch_size)]

    # Loader that returns the same small ndarray for each call and counts calls
    call_count = {"n": 0}

    def generic_loader(value, disable_preproc_auto_orient=False):
        call_count["n"] += 1
        # return a tiny array unique by filling with the call index modulo 256 to keep memory low
        return np.full((1, 1, 3), fill_value=call_count["n"] % 256, dtype=np.uint8)

    monkeypatch.setattr(adapters_module, "load_image_bgr", generic_loader)

    adapter = object.__new__(InferenceModelsObjectDetectionAdapter)

    # Model returns the number of images it received to make verification simple
    class ReturnCountModel:
        def __init__(self):
            self.last_called_with = None

        def pre_process(self, np_images, **kwargs):
            self.last_called_with = (list(np_images), dict(kwargs))
            return {"received": len(np_images)}

    model = ReturnCountModel()
    adapter._model = model

    codeflash_output = adapter.preprocess(batch)
    out = codeflash_output  # 356μs -> 351μs (1.40% faster)
    np_images_passed, _ = model.last_called_with
    assert len(np_images_passed) == batch_size
    assert out == {"received": batch_size}


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally, run: git merge codeflash/optimize-pr1959-2026-02-04T11.48.16

Suggested change
-        np_images: List[np.ndarray] = [
-            load_image_bgr(
-                v,
-                disable_preproc_auto_orient=kwargs.get(
-                    "disable_preproc_auto_orient", False
-                ),
-            )
-            for v in images
-        ]
-        mapped_kwargs = self.map_inference_kwargs(kwargs)
-        return self._model.pre_process(np_images, **mapped_kwargs)
+        disable_preproc_auto_orient = kwargs.get("disable_preproc_auto_orient", False)
+        np_images: List[np.ndarray] = [
+            load_image_bgr(v, disable_preproc_auto_orient=disable_preproc_auto_orient)
+            for v in images
+        ]
+        return self._model.pre_process(np_images, **kwargs)

