[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Updated Oct 9, 2024 - Python
Open-source generative process automation (i.e., generative RPA). AI-first process automation with large language models (LLMs), large action models (LAMs), large multimodal models (LMMs), and visual language models (VLMs)
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
A Framework of Small-scale Large Multimodal Models
A collection of resources on applications of multi-modal learning in medical imaging.
An open-source implementation for training LLaVA-NeXT.
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Open Platform for Embodied Agents
The official evaluation suite and dynamic data release for MixEval.
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v etc.
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
A curated list of awesome Multimodal studies.
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models