v0.2.4 add `generate_until_multi_round` to support interactive and multi-round evaluations; add models and fix glitches

@Luodian released this 03 Oct 15:33
· 29 commits to main since this release
af395ae

What's Changed

  • [Fix] Fix bugs in returning result dict and bring back anls metric by @kcz358 in #221
  • fix: fix wrong args in wandb logger by @Luodian in #226
  • [feat] Add check for existence of accelerator before waiting by @Luodian in #227
  • add more language tasks and fix fewshot evaluation bugs by @Luodian in #228
  • Remove unnecessary LM object removal in evaluator by @Luodian in #229
  • [fix] Shallow copy issue by @pufanyi in #231
  • [Minor] Fix max_new_tokens in video llava by @kcz358 in #237
  • Update LMMS evaluation tasks for various subjects by @Luodian in #240
  • [Fix] Fix async append result in different order issue by @kcz358 in #244
  • Update the version requirement for transformers by @zhijian-liu in #235
  • Add new LMMS evaluation task for wild vision benchmark by @Luodian in #247
  • Add raw score to wildvision bench by @Luodian in #250
  • [Fix] Strict video to be single processing by @kcz358 in #246
  • Refactor wild_vision_aggregation_raw_scores to calculate average score by @Luodian in #252
  • [Fix] Bring back process result pbar by @kcz358 in #251
  • [Minor] Update utils.py by @YangYangGirl in #249
  • Refactor distributed gathering of logged samples and metrics by @Luodian in #253
  • Refactor caching module and fix serialization issue by @Luodian in #255
  • [Minor] Bring back fix for metadata by @kcz358 in #258
  • [Model] support minimonkey model by @white2018 in #257
  • [Feat] add regression test and change saving logic related to output_path by @Luodian in #259
  • [Feat] Add support for llava_hf video, better loading logic for llava_hf ckpt by @kcz358 in #260
  • [Model] support cogvlm2 model by @white2018 in #261
  • [Docs] Update and sort current_tasks.md by @pbcong in #262
  • fix error name with infovqa task by @ZhaoyangLi-nju in #265
  • [Task] Add MMT and MMT_MI (Multiple Image) Task by @ngquangtrung57 in #270
  • mme-realworld by @yfzhang114 in #266
  • [Model] support Qwen2 VL by @abzb1 in #268
  • Support new task mmworld by @jkooy in #269
  • Update current tasks.md by @pbcong in #272
  • [feat] support video evaluation for qwen2-vl and add mix-evals-video2text by @Luodian in #275
  • [Feat][Task] Add multi-round evaluation in llava-onevision; Add MMSearch Benchmark by @CaraJ7 in #277
  • [Fix] Model name None in Task manager, mix eval model specific kwargs, claude retrying fix by @kcz358 in #278
  • [Feat] Add support for evaluation of Oryx models by @dongyh20 in #276
  • [Fix] Fix the error when running models caused by generate_until_multi_round by @pufanyi in #281
  • [fix] Refactor GeminiAPI class to add video pooling and freeing by @pufanyi in #287
  • add jmmmu by @AtsuMiyai in #286
  • [Feat] Add support for evaluation of InternVideo2-Chat && Fix evaluation for mvbench by @yinanhe in #280

Full Changelog: v0.2.3...v0.2.4