multimodal-models

Star

Here are 22 public repositories matching this topic...

uncbiag / Awesome-Foundation-Models

Star

A curated list of foundation models for vision and language tasks

transformer-models vision-transformer multimodal-models foundation-models large-language-models

Updated Jun 23, 2025

AIDC-AI / Awesome-Unified-Multimodal-Models

Star

Awesome Unified Multimodal Models

multimodal-models text-to-image-generation vision-language-model multimodal-large-language-models unified-multimodal-models

Updated Feb 6, 2026

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

Star

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

text-to-speech multimodality text-to-image text-to-audio text-to-video text-to-music multimodal-models aigc large-language-models llm text-to-3d multimodal-generation mllm text-to-sound large-vision-language-models multimodal-large-language-models lvlm

Updated Apr 4, 2025
HTML

zli12321 / Vision-Language-Models-Overview

Star

A most Frontend Collection and survey of vision-language model papers, and models GitHub repository. Continuous updates.

reinforcement-learning clip claude world-models multimodal-models sota-model llava blip2 gpt-4v gemini-pro deepseek vision-language-models qwen-vl llama-vision-model multimodal-benchmarks vision-language-model-applications finevision-pretrain-dataset

Updated Feb 5, 2026

uni-medical / Project-Imaging-X

Star

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

survey open-science ultrasound radiology ophthalmology pathology medical-image-analysis endoscopy fundus dermoscopy multimodal-models foundation-models

Updated Jan 7, 2026

OpenSenseNova / SenseNova-SI

Star

Scaling Spatial Intelligence with Multimodal Foundation Models

multimodal-models mllm spatial-intelligence mllm-for-3d

Updated Feb 6, 2026
Python

EvolvingLMMs-Lab / EASI

Star

Holistic Evaluation of Multimodal LLMs on Spatial Intelligence

multimodal-models mllm spatial-intelligence mllm-evaluation

Updated Jan 21, 2026
Dockerfile

thaoshibe / awesome-personalized-lmms

Star

A curated list of Awesome Personalized Large Multimodal Models resources

awesome personalization awesome-list personalized multimodal-models large-language-models personalized-generation large-multimodal-models

Updated Feb 4, 2026

arman-aminian / video-search

Star

Video Search with CLIP

nlp image-search clip zero-shot video-search multimodal multilingual-models multimodal-models

Updated Aug 13, 2023
Jupyter Notebook

Shwai-He / VLM-Compression

Star

The official implementation of the paper "Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity".

sparsity model-compression multimodal-models lora-fine-tuning

Updated Jul 2, 2024
Python

CASE-Lab-UMD / Capacity-Aware-MoE

Star

The official implementation of the paper "Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts" (ICLR 2026).

language-models load-balancing mixture-of-experts inference-acceleration multimodal-models test-time-optimization

Updated Nov 14, 2025
Python

AmitPeleg / CLIC

Star

Implementation of the paper "Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning", arXiv, 2025

retrieval clip compositionality multimodal-models

Updated Oct 22, 2025
Python

Shwai-He / Capacity-Aware-MoE

Star

The official implementation of the paper "Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts" (ICLR 2026).

language-models load-balancing mixture-of-experts inference-acceleration multimodal-models test-time-optimization

Updated Oct 27, 2025
Python

pokarats / LAP-final-project

Star

Multimodal Bi-Transformers (MMBT) in Biomedical Text/Image Classification

text-classification transformer image-classification transfer-learning attention-mechanism bert biomedical-image-processing attention-visualization multimodal-representation huggingface-transformers sparse-data-learning multimodal-models mmbt-model

Updated Apr 13, 2021
Jupyter Notebook

NanoOWL Detection System enables real-time open-vocabulary object detection in ROS 2 using a TensorRT-optimized OWL-ViT model. Describe objects in natural language and detect them instantly on panoramic images. Optimized for NVIDIA GPUs with .engine acceleration.

computer-vision natural-language transformers object-detection multimodal-models

Updated May 13, 2025
C++

rayford295 / GeoAgent4Disaster

Star

A Multi-Agent GeoAI Framework for Multimodal Disaster Perception, Restoration, Damage Recognition, and Reasoning

remote-sensing geoai multimodal-models street-view-imagery disaster-assessment

Updated Dec 10, 2025
Jupyter Notebook

nitya / model-mondays

Star

Model Mondays is a weekly livestreamed series on Microsoft Reactor that helps you make informed model choice decisions with timely updates and model deep-dives. Watch live for the content. Join Discord for the discussions.

python model-choice multilingual-models multimodal-models large-language-models model-catalog generative-ai small-language-models github-models azure-ai-foundry reasoning-models model-mondays

Updated Jan 27, 2026
Jupyter Notebook

MikyLanfra / Latent-Space-Learning-for-GPI

Star

Repository containing experiments relative to latent space temporal structure enforcement, performed as part of a project at EPFL University in collaboration with the IDIAP Lab

machine-learning ai robotics ml artificial-intelligence autoencoder bilevel-optimization pretrained-models imitation-learning blip latent-space contrastive-learning multimodal-models dinov3

Updated Jan 9, 2026
Jupyter Notebook

sitammeur / gemma3-litserve

Star

Leverage Gemma 3's capabilities using LitServe.

multilingual python deep-learning transformers artificial-intelligence fastapi multimodal-models lightning-ai litserve gemma3

Updated Mar 17, 2025
Python

sitammeur / videollama3-litserve

Star

Leverage VideoLLaMA 3's capabilities using LitServe.

python deep-learning transformers pytorch artificial-intelligence video-understanding fastapi multimodal-models lightning-ai litserve

Updated Feb 19, 2025
Python

Improve this page

Add a description, image, and links to the multimodal-models topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the multimodal-models topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

multimodal-models

Here are 22 public repositories matching this topic...

uncbiag / Awesome-Foundation-Models

AIDC-AI / Awesome-Unified-Multimodal-Models

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

zli12321 / Vision-Language-Models-Overview

uni-medical / Project-Imaging-X

OpenSenseNova / SenseNova-SI

EvolvingLMMs-Lab / EASI

thaoshibe / awesome-personalized-lmms

arman-aminian / video-search

Shwai-He / VLM-Compression

CASE-Lab-UMD / Capacity-Aware-MoE

AmitPeleg / CLIC

Shwai-He / Capacity-Aware-MoE

pokarats / LAP-final-project

RubenCasal / owl_vit_detector

rayford295 / GeoAgent4Disaster

nitya / model-mondays

MikyLanfra / Latent-Space-Learning-for-GPI

sitammeur / gemma3-litserve

sitammeur / videollama3-litserve

Improve this page

Add this topic to your repo