Request: Prioritize multimodal image recognition release

Hi shantoislamdev, I appreciate the ongoing work. The README notes that this feature is in development. Could you please prioritize multimodal image recognition (image + text) and share an ETA? It’s critical for upcoming workflows (OCR, chart and screenshot understanding). Thanks!