large-multimodal-models

Here are 46 public repositories matching this topic...

ShareGPT4Omni / ShareGPT4Video

[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

gpt sora text-to-video large-language-models chatgpt large-vision-language-models large-multimodal-models gpt-4v large-video-language-models

Updated Oct 9, 2024
Python

OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

Updated Nov 15, 2024
Python

VITA-MLLM / VITA

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

multimodal-large-language-models large-multimodal-models

Updated Oct 24, 2024
Python

LLaVA-VL / LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

agent tool-use large-language-models multimodal-large-language-models large-multimodal-models

Updated Feb 1, 2024
Python

TinyLLaVA / TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

nlp transformers llama vision-language llava large-multimodal-models tinyllama

Updated Oct 16, 2024
Python

richard-peng-xia / awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

medical-imaging multimodal-learning visual-question-answering multimodal-deep-learning large-language-models medical-report-generation multimodal-large-language-models large-multimodal-models

Updated Nov 11, 2024

xiaoachen98 / Open-LLaVA-NeXT

An open-source implementation for training LLaVA-NeXT.

chatbot llama multimodal multi-modality gpt-4 visual-language-learning chatgpt vision-language-model llava large-multimodal-models llama3 gpt4o llava-next

Updated Oct 23, 2024
Python

MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

machine-learning natural-language-processing deep-neural-networks computer-vision deep-learning evaluation question-answering stem multimodality multimodal-learning visual-question-answering multimodal multimodal-deep-learning foundation-models large-language-models llm llms large-multimodal-models

Updated Nov 10, 2024
Python

shikiw / OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

chatbot llama multimodal gpt-4 chatgpt vision-language-model vision-language-learning large-multimodal-models

Updated Aug 24, 2024
Python

thunlp / LEGENT

Open Platform for Embodied Agents

physics-engine robot-simulator language-grounding embodied-ai large-multimodal-models

Updated Oct 13, 2024
Python

Psycoy / MixEval

The official evaluation suite and dynamic data release for MixEval.

benchmark evaluation benchmarking-suite evaluation-framework benchmarking-framework foundation-models large-language-models large-language-model llm-inference llm-evaluation large-multimodal-models llm-evaluation-framework benchmark-mixture mixeval

Updated Nov 10, 2024
Python

zjysteven / lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v, etc.