👋 Hi, I’m Teo Wu (officially Haoning Wu). I work on LMMs at Rhymes AI, closely advised by Dongxu Li and Junnan Li. Prior to this, I was a PhD candidate (preparing my thesis defense) at Nanyang Technological University 🇸🇬, supervised by Prof. Weisi Lin. I obtained my B.S. degree in computer science from Peking University (北京大学).
I am currently focusing on LMM pre-training and evaluation (video, longer context, and better instruction-following). See our LongVideoBench, the first video benchmark on which LMMs are proven to improve when given more input frames (>=256). I have also contributed to the video and long-context training of Aria (Model, Paper, GitHub), an excellent open-source native MoE LMM with abilities matching GPT-4o-mini/Gemini-1.5-Flash with only 3.9B activated parameters.
🌱 I have also been the lead of project Q-Future: Visual Evaluation with LMMs📹, with 7 first-authored papers accepted at top conferences and journals including ICML, ICLR, NeurIPS, TPAMI, CVPR, ECCV and ACMMM. The flagship scorer, OneAlign, has been downloaded more than 238K times (as of Jul 25, 2024) on HuggingFace.
Prior to LMMs, my PhD topic was video quality assessment, a traditional area that tries to gauge quality scores (and more) for videos. Among my 6 papers published in that area (in ECCV, ICCV, TPAMI, etc.), the two representative works are FAST-VQA and DOVER, which have been the most-used baselines in that field.
📫 Reach me by e-mail: [email protected]/[email protected], Twitter: @HaoningTimothy
-
Rhymes AI Singapore
- A seat that sees all tourists taking photos with Merlion
- UTC +08:00
- teowu.github.io
- @HaoningTimothy
Pinned
- rhymes-ai/Aria: Codebase for Aria - an Open Multimodal Native MoE
- longvideobench/LongVideoBench: [NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
- Q-Future/Co-Instruct: ④[ECCV 2024 Oral, Comparison among Multiple Images!] A study on open-ended multi-image quality comparison: a dataset, a model and a benchmark.
- Q-Future/Q-Align: ③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can be efficiently fine-tuned to downstream datasets.
- Q-Future/Q-Instruct: ②[CVPR 2024] Low-level visual instruction tuning, with a 200K dataset and a model zoo for fine-tuned checkpoints.
- Q-Future/Q-Bench: ①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.