Releases: PaddlePaddle/PaddleMIX
v2.1.0
Updates
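- Released PP-InsCapTagger, a self-developed tagging model for multimodal data capabilities. It can be used to analyze and filter data; experiments show it can cut the amount of data by 50% while maintaining model performance, greatly improving training efficiency.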
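- The multimodal models InternVL2, LLaVA, SD3, and SDXL are adapted to the Ascend 910B, providing training and inference on domestically produced (Chinese) compute chips.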
What's Changed
- 【pir 】modify dy2static Sd and 3. Grounding DINO model by @xiaoguoguo626807 in #689
- fix llava pretrain config by @pkhk-1 in #685
- Re-network the DIT, fix some parameters, and simplify the model networking code by @chang-wenbin in #632
- update DIT doc by @chang-wenbin in #693
- [NPU] Add llava npu doc by @Birdylx in #694
- SD3 inference optimization: avoid synchronization by @chang-wenbin in #695
- Reduce redundant copies and fix bugs by @chang-wenbin in #699
- Add Qwen2-VL infer codes by @nemonameless in #698
- [doc] Update requirements by @nemonameless in #703
- Llava bug by @LokeZhou in #704
- Fix is inference mode by @zhoutianzi666 in #711
- update readme by @lyuwenyu in #705
- update opensora video save method by @westfish in #712
- Limit the installed version of paddlenlp and fix bugs of llava-next. by @luyao-cv in #716
- Optimize the SD3 transformer by @zhoutianzi666 in #713
- [wip] add mix scheme by @lyuwenyu in #664
- [NPU] InternVL2 supports npu training by @Birdylx in #714
- Add SD3 DreamBooth by @westfish in #686
- remove phi3 in internvl2 and refine format by @nemonameless in #715
- add flash_atten for qw2vl by @luyao-cv in #723
- [NPU] sdxl support NPU training by @wangna11BD in #719
- [NPU] sdxl lora support NPU training by @warrentdrew in #718
- Adapt fa for npu by @LielinJiang in #706
- [NPU] fix readme doc for SDXL LoRA training by @warrentdrew in #724
- [npu]sd3 dreambooth adapt for npu by @LielinJiang in #726
- add pp-inscaptagger by @pkhk-1 in #727
- ADD SD3 batch_parallel by @chang-wenbin in #731
- support auto parallel in dit and largedit by @jeff41404 in #551
- add env_run.sh and correct packages version by @luyao-cv in #733
- [NPU] Fix typo by @Birdylx in #696
- paddlemix v2.1 readme by @lyuwenyu in #734
- Fix adaptation errors with the paddlenlp develop version (10-11) by @Xiaobin-Lu in #735
- Fix qwen2vl video and image preprocessing by @luyao-cv in #737
- [wip] update v2.1 readme by @lyuwenyu in #736
- fix internvl2 minimonkey dataset docs by @nemonameless in #741
- fix tests of evaclip and internvl2 by @nemonameless in #746
- image2text_generation rm use_fast by @LokeZhou in #744
- fix readme for llava_next_interleave by @luyao-cv in #748
- support Qwen2-VL sft training by @nemonameless in #739
- fix dit training by @nemonameless in #752
- fix tests by @nemonameless in #753
- remove use_fast in AutoTokenizer by @warrentdrew in #747
- fix dit weights convert to ppdiffusers by @nemonameless in #759
- [PPDiffusers]fix bugs and release 0.29.0 by @westfish in #742
- autolabel fix nltk download by @LokeZhou in #763
- [NPU] fix npu llava infer by @Birdylx in #757
- Add npu model list by @nepeplwu in #758
- Fix docs of by @nemonameless in #767
- merge upstream readme by @luyao-cv in #766
- correct huggingface_hub version by @luyao-cv in #771
- [NPU] Refine doc by @Birdylx in #774
New Contributors
- @xiaoguoguo626807 made their first contribution in #689
- @chang-wenbin made their first contribution in #632
- @wangna11BD made their first contribution in #719
- @LielinJiang made their first contribution in #706
- @jeff41404 made their first contribution in #551
- @Xiaobin-Lu made their first contribution in #735
- @nepeplwu made their first contribution in #758
Full Changelog: https://github.com/PaddlePaddle/PaddleMIX/commits/v2.1.0
v2.0.0
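Multimodal Understanding
- New models: LLaVA (v1.5-7b, v1.5-13b, v1.6-7b), CogAgent, CogVLM, Qwen-VL, InternLM-XComposer2
- Dataset enhancements: added the chatml_dataset scheme for reading image-text conversation data, which can be adapted through a custom chat_template file and supports mixed datasets
- Toolchain upgrades: added the Auto module, which unifies the SFT training workflow and is compatible with both full-parameter and LoRA training. Added the mixtoken training strategy, giving a 5.6x improvement in SFT throughput. Added inference deployment for Qwen-VL and LLaVA, with a 2.38x improvement in inference performance over torch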
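Multimodal Generation
- Video generation: supports Sora-related techniques, with training and inference for DiT, SiT, and UViT, plus the new NaViT and MAGVIT-v2 models; adds the video generation models SVD and Open Sora, with fine-tuning and inference support; adds the pose-controllable video generation model AnimateAnyone, the plug-and-play video generation model AnimateDiff, and the GIF video generation model Hotshot-XL.
- Text-to-image model library: adds the fast-inference text-to-image model LCM, adapted for SD/SDXL training and inference.
- Toolchain upgrades: released ppdiffusers 0.24.1, adding peft and accelerate backends; weight loading and saving are fully upgraded to support distributed setups, model sharding, safetensors, and other scenarios.
- Ecosystem compatibility: provides a ComfyUI plugin built on ppdiffusers, supporting common tasks such as model loading and conversion, text-to-image, image-to-image, and local image editing. Adds Stable Diffusion 1.5 series nodes and Stable Diffusion XL series nodes, plus 4 image generation workflow examples.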
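DataCopilot (multimodal data processing toolbox)
- MMDataset, a multimodal dataset type that supports loading and exporting several storage formats such as JSON, H5, and JSONL, with built-in concurrent (map, filter) data processing interfaces, among others
- Multimodal data format tools, supporting custom data structures, data conversion, and offline format checking
- Multimodal data analysis tools, supporting basic statistics, data visualization, and registration of custom functions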
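
To make the load, concurrent map/filter, and export pattern described for DataCopilot more concrete, here is a minimal standard-library sketch. It is not the MMDataset API: the `ToyDataset` class, its method names, and the sample records are all hypothetical and only illustrate the general idea under those assumptions.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class ToyDataset:
    """Hypothetical stand-in for a multimodal dataset; NOT the MMDataset API."""

    def __init__(self, items: List[Record]):
        self.items = list(items)

    @classmethod
    def from_jsonl(cls, text: str) -> "ToyDataset":
        # JSONL: one JSON object per non-empty line.
        return cls([json.loads(line) for line in text.splitlines() if line.strip()])

    def map(self, fn: Callable[[Record], Record], max_workers: int = 4) -> "ToyDataset":
        # Apply fn to every record concurrently; pool.map preserves input order.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return ToyDataset(list(pool.map(fn, self.items)))

    def filter(self, pred: Callable[[Record], bool], max_workers: int = 4) -> "ToyDataset":
        # Evaluate the predicate concurrently, then keep the matching records.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            keep = list(pool.map(pred, self.items))
        return ToyDataset([rec for rec, ok in zip(self.items, keep) if ok])

    def to_jsonl(self) -> str:
        # Export back to JSONL, keeping non-ASCII characters readable.
        return "\n".join(json.dumps(rec, ensure_ascii=False) for rec in self.items)


# Made-up image-caption records, used only for illustration.
raw = '{"image": "a.jpg", "caption": "a cat"}\n{"image": "b.jpg", "caption": ""}\n'
ds = ToyDataset.from_jsonl(raw)

# Drop records with empty captions, then normalize the remaining captions.
cleaned = ds.filter(lambda r: bool(r["caption"])).map(
    lambda r: {**r, "caption": r["caption"].strip().capitalize()}
)
print(cleaned.to_jsonl())
```

Threads are used here only to keep the sketch short; a real toolbox may well use processes or another backend for CPU-bound transforms.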