ORYX

16 repositories

VideoGLaMM
Public
A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
vision-and-language lmm foundation-models vision-language-model llm-agent
0 • 33 • 3 • 0 • Updated Nov 8, 2024
Camel-Bench
Public
CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
benchmark vqa arabic multimodal-learning visual-question-answering mbzuai large-multimodal-models
Python • MIT License • 1 • 19 • 0 • 0 • Updated Oct 27, 2024
BiMediX
Public
Bilingual Medical Mixture of Experts LLM
Other • 1 • 25 • 1 • 0 • Updated Sep 25, 2024
ClimateGPT
Public
[EMNLP'23] ClimateGPT: a specialized LLM for conversations related to Climate Change and Sustainability topics in both English and Arabic languages.
Python • 9 • 75 • 0 • 0 • Updated Sep 24, 2024
PALO
Public
(WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.
Python • Apache License 2.0 • 5 • 81 • 5 • 0 • Updated Sep 10, 2024
Video-ChatGPT
Public
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
chatbot llama clip mulit-modal vision-language vicuna gpt-4 vision-language-pretraining llava video-chatboat
Python • Creative Commons Attribution 4.0 International • 108 • 1.2k • 19 • 0 • Updated Aug 27, 2024
CVRR-Evaluation-Suite
Public
Official repository of the paper "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs".
Python • Creative Commons Attribution 4.0 International • 3 • 42 • 0 • 0 • Updated Aug 23, 2024
BiMediX2
Public
Bio-Medical EXpert LMM with English and Arabic Language Capabilities
0 • 2 • 0 • 0 • Updated Aug 12, 2024
VideoGPT-plus
Public
Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding".
chatbot clip image-encoder video-encoder multimodal dual-encoder vision-language vicuna gpt4 vision-language-pretraining
Python • Creative Commons Attribution 4.0 International • 15 • 217 • 15 • 1 • Updated Aug 11, 2024
XrayGPT
Public
[BIONLP@ACL 2024] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.
Python • 56 • 469 • 17 • 2 • Updated Aug 8, 2024
GeoChat
Public
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
remote-sensing vlm
Python • 36 • 447 • 31 • 1 • Updated Jul 25, 2024
LLaVA-pp
Public
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
conversation lmms vision-language llm llava llama3 phi3 llava-llama3 llava-phi3 llama3-llava
Python • 60 • 813 • 16 • 2 • Updated Jul 10, 2024
groundingLMM
Public
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
vision-and-language lmm foundation-models vision-language-model llm-agent
Python • 37 • 781 • 22 • 0 • Updated Jun 2, 2024
MobiLlama
Public
MobiLlama: a small language model tailored for edge devices
slm llm efficient-llm mobile-llm tiny-llm
Python • Apache License 2.0 • 44 • 595 • 13 • 2 • Updated Mar 3, 2024
Video-LLaVA
Public
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
video transcription lmm grounding video-grounding llm video-conversation
Python • 11 • 245 • 14 • 0 • Updated Jan 2, 2024
Awesome-CV-Foundational-Models
Public
27 • 7 • 0 • 0 • Updated Jul 31, 2023