Description
Prerequisite
- I have searched Issues and Discussions but cannot get the expected help.
- The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmengine).
Environment
OrderedDict([
('sys.platform', 'linux'),
('Python', '3.11.14'),
('CUDA available', True),
('GPU 0,1', 'NVIDIA GeForce RTX 3090'),
('CUDA_HOME', '/usr/local/cuda'),
('NVCC', 'Cuda compilation tools, release 12.6, V12.6.85'),
('GCC', '9.4.0'),
('PyTorch', '2.1.0+cu121'),
('TorchVision', '0.16.0+cu121'),
('OpenCV', '4.11.0'),
('MMEngine', '0.10.7'),
])
Reproduces the problem - code sample
🧩 Minimal config to reproduce
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(
        type='MLflowVisBackend',
        tracking_uri="http://localhost:2222",
        exp_name='vitdet_coco',
        run_name=None,
    ),
]
visualizer = dict(
    type='DetLocalVisualizer',
    vis_backends=vis_backends,
    name='visualizer',
)
default_hooks = dict(
    logger=dict(type='LoggerHook', interval=50),
    visualization=dict(
        type='DetVisualizationHook',
        draw=True,
        interval=1,
    ),
)
During validation, DetVisualizationHook calls:
visualizer.add_image('val_img', image, step)
- LocalVisBackend → saves val_img_{step}.png ✅
- MLflowVisBackend → passes val_img to mlflow.log_image() ❌
Below is a minimal end-to-end reproduction using a tiny COCO subset so it runs quickly, but still triggers a validation visualization step and crashes in MLflowVisBackend.add_image() due to an extension-less image name.
1) Prerequisites
- MMDetection checkout with COCO available at data/coco/:
  - data/coco/annotations/instances_train2017.json
  - data/coco/annotations/instances_val2017.json
  - data/coco/train2017/*.jpg
  - data/coco/val2017/*.jpg
- MLflow tracking server running locally (example):
  mlflow server --host 0.0.0.0 --port 2222
2) Create a minimal debug config
Save this as configs/_debug/mlflow_visbackend_no_ext_repro.py:
# Repro config: triggers MLflowVisBackend crash when name has no extension
_base_ = ['./vitdet_mask-rcnn_vit-b-dinov3.py']  # any base detector config that runs on COCO
# Make it fast: 10 train iters then run val once.
train_cfg = dict(type="IterBasedTrainLoop", max_iters=10, val_interval=10)
# Ensure validation is very small (2 iters) but still calls visualization hook.
train_dataloader = dict(
    dataset=dict(
        indices=64,
        # avoid empty dataset when taking first N
        filter_cfg=dict(filter_empty_gt=False, min_size=0),
    )
)
val_dataloader = dict(dataset=dict(indices=2))
test_dataloader = val_dataloader
# Enable visualization every iter.
default_hooks = dict(
    logger=dict(type='LoggerHook', interval=1),
    checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=10, save_last=True),
    visualization=dict(
        type='DetVisualizationHook',
        draw=True,
        interval=1,
        # show=False is default; when show=False, DetVisualizationHook uses an extension-less name
    ),
)
# Use both LocalVisBackend (works) and MLflowVisBackend (crashes).
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(
        type='MLflowVisBackend',
        tracking_uri="http://localhost:2222",
        exp_name='vitdet_coco',
        run_name=None,
    ),
]
visualizer = dict(type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer')
Reproduces the problem - command or script
3) Run training
python tools/train.py configs/_debug/mlflow_visbackend_no_ext_repro.py
Reproduces the problem - error message
Traceback (most recent call last):
File "/home/kpysanyi/xraivision-backbone/.venv/lib/python3.11/site-packages/PIL/Image.py", line 2526, in save
format = EXTENSION[ext]
~~~~~~~~~^^^^^
KeyError: ''
The above exception was the direct cause of the following exception:
...
File "/home/kpysanyi/xraivision-backbone/.venv/lib/python3.11/site-packages/mmengine/visualization/vis_backend.py", line 784, in add_image
self._mlflow.log_image(image, name)
File "/home/kpysanyi/xraivision-backbone/.venv/lib/python3.11/site-packages/mlflow/tracking/fluent.py", line 1473, in log_image
MlflowClient().log_image(run_id, image, artifact_file, key, step, timestamp, synchronous)
File "/home/kpysanyi/xraivision-backbone/.venv/lib/python3.11/site-packages/mlflow/tracking/client.py", line 2797, in log_image
image.save(tmp_path)
File "/home/kpysanyi/xraivision-backbone/.venv/lib/python3.11/site-packages/PIL/Image.py", line 2529, in save
raise ValueError(msg) from e
ValueError: unknown file extension:
Additional information
🔍 Root cause
Inconsistency between backends:
- LocalVisBackend.add_image() forces a valid filename
- MLflowVisBackend.add_image() assumes name already includes an extension
- The step argument is ignored in the MLflow backend
This makes MLflowVisBackend fragile and incompatible with existing hooks.
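The KeyError: '' in the traceback follows directly from the name having no extension: the image is saved to a path built from that name, and the save format is looked up from the path's extension. The sketch below imitates that lookup using only the standard library. EXTENSION here is a two-entry stand-in and guess_format is a hypothetical helper, not Pillow's actual API.

```python
import os

# Illustrative stand-in for an extension-to-format registry (not Pillow's real table).
EXTENSION = {'.png': 'PNG', '.jpg': 'JPEG'}


def guess_format(path):
    """Hypothetical helper mimicking the lookup seen in the traceback."""
    # An extension-less name such as 'val_img' yields ext == ''.
    ext = os.path.splitext(path)[1].lower()
    try:
        return EXTENSION[ext]
    except KeyError as e:
        raise ValueError(f'unknown file extension: {ext}') from e
```

With this stand-in, guess_format('val_img_10.png') resolves to a format, while guess_format('val_img') raises ValueError, matching the shape of the failure above.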
✅ Proposed solutions
Either of the following would fix the issue cleanly:
Option A (configurable)
Add an argument to MLflowVisBackend, e.g. auto_append_ext=True, which would automatically transform names like val_img → val_img_{step}.png.
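For illustration, this is roughly how Option A could look from the user's side. Note that auto_append_ext is the argument proposed in this issue; it does not exist in MMEngine today, so this config would not be accepted by the current MLflowVisBackend.

```python
# Hypothetical config, assuming MLflowVisBackend gained the proposed argument.
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(
        type='MLflowVisBackend',
        tracking_uri="http://localhost:2222",
        exp_name='vitdet_coco',
        run_name=None,
        auto_append_ext=True,  # proposed flag, not part of the current API
    ),
]
```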
Option B (default behavior)
Make MLflowVisBackend.add_image() mirror LocalVisBackend behavior:
- If name has no extension, append _{step}.png by default.
Example logic:
if '.' not in os.path.basename(name):
    name = f'{name}_{step}.png'
self._mlflow.log_image(image, name)
This would:
- Make behavior consistent across backends
- Respect the existing step argument
- Avoid hard-to-debug runtime crashes
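The name handling from both options can be sketched as a small standalone function that is testable without MLflow or MMEngine. normalize_image_name and its auto_append_ext parameter are hypothetical names used only for this illustration; in a real patch the same check would sit inside MLflowVisBackend.add_image() before the call to log_image.

```python
import os


def normalize_image_name(name: str, step: int = 0,
                         auto_append_ext: bool = True) -> str:
    """Hypothetical helper: return an artifact name that has a file extension.

    Uses the same check as the example logic in this issue: a name whose
    basename contains no '.' is treated as extension-less.
    """
    if auto_append_ext and '.' not in os.path.basename(name):
        # Mirror the LocalVisBackend-style naming described above.
        return f'{name}_{step}.png'
    return name


# Usage: the hook passes 'val_img' plus the current step.
artifact_name = normalize_image_name('val_img', step=10)
```

Checking only the basename means a dot in a directory component (for example a.b/val_img) does not count as an extension. With auto_append_ext=False the name is passed through unchanged, which preserves the current behavior for callers that rely on it.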
I’d be happy to submit a PR implementing this fix (either as a default behavior or a configurable option), if that aligns with the maintainers’ preferences.