Releases · facebookresearch/fairscale
v0.3.3: [chore] 0.3.3 release (#568)
- releasing 0.3.3
- Needed in vissl for the auto_wrap_bn change
v0.3.2
[chore] 0.3.2 release (#535)
v0.3.1: [chore] 0.3.1 release (#504)
* [chore] 0.3.1 release
- mainly because vissl needs the new version
- added a doc on release steps
* Update CHANGELOG.md
Co-authored-by: anj-s <[email protected]>
* review comments
Co-authored-by: anj-s <[email protected]>
v0.3.0
[0.3.0] - 2021-02-22
Added
- FullyShardedDataParallel (FSDP) (#413); see the usage sketch after this release's notes
- ShardedDDP fp16 grad reduction option (#402)
- Expose experimental algorithms within the pip package (#410)
Fixed
- Catch corner case when the model is too small with respect to the world size, and shards are empty (#406)
- Memory leak in checkpoint_wrapper (#412)
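A minimal sketch of how the FullyShardedDataParallel wrapper introduced in this release is typically used. It assumes torch.distributed is already initialized (e.g. launched via torchrun) and a CUDA device is available; the model, data, and hyperparameters are toy placeholders, not part of the release notes.

```python
# Hedged sketch: wrapping a model with FullyShardedDataParallel (added in 0.3.0, #413).
# Assumes torch.distributed is already initialized and a GPU is available;
# model and data below are toy values.
import torch
from fairscale.nn import FullyShardedDataParallel as FSDP

model = FSDP(torch.nn.Linear(128, 10).cuda())          # parameters sharded across ranks
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

for inputs in [torch.randn(8, 128) for _ in range(4)]:  # toy stand-in for a DataLoader
    optimizer.zero_grad()
    loss = model(inputs.cuda()).sum()
    loss.backward()                                      # grads reduced and re-sharded by FSDP
    optimizer.step()
```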
v0.1.7
Fixed
- ShardedDDP and OSS handle model trainability changes during training (#369)
- ShardedDDP state dict load/save bug (#386)
- ShardedDDP handle train/eval modes (#393)
- AdaScale handling custom scaling factors (#401)
Added
- ShardedDDP manual reduce option for checkpointing (#389)
v0.1.6
Added
- Checkpointing model wrapper (#376); see the sketch after this release's notes
- Faster OSS, flatbuffers (#371)
- Small speedup in OSS clipgradnorm (#363)
Fixed
- Bug in ShardedDDP with 0.1.5 depending on how it was initialized (KeyError / OSS)
- Much refactoring in Pipe (#357, #358, #360, #362, #370, #373)
- Better pip integration / resident PyTorch (#375)
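The checkpointing wrapper added in this release trades compute for memory by recomputing the wrapped module's activations during the backward pass. A minimal sketch, assuming the checkpoint_wrapper import from fairscale.nn; layer sizes are arbitrary.

```python
# Hedged sketch: activation checkpointing via the wrapper added in 0.1.6 (#376).
# Layer sizes are arbitrary; only the checkpoint_wrapper call is the point here.
import torch
from fairscale.nn import checkpoint_wrapper

block = checkpoint_wrapper(                      # activations recomputed in backward
    torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())
)
model = torch.nn.Sequential(block, torch.nn.Linear(1024, 10))

out = model(torch.randn(8, 1024, requires_grad=True))
out.sum().backward()
```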
v0.1.5
Added
- PyTorch compatibility for OSS checkpoints (#310); see the OSS/ShardedDDP sketch after this release's notes
- Elastic checkpoints for OSS, the world size can vary between save and load (#310)
- Tensor views for OSS bucketing, reduced CPU use (#300)
- Bucket calls in ShardedDDP, for faster inter-node communications (#327)
- FlattenParamsWrapper, which flattens module parameters into a single tensor seamlessly (#317)
- AMPnet experimental support (#304)
Fixed
- ShardedDDP properly handles device changes via .to() (#353)
- Add a new interface for AdaScale, AdaScaleWrapper, which makes it compatible with OSS (#347)
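Several 0.1.x entries above concern the OSS optimizer and its ShardedDDP companion. A minimal sketch of the documented pairing, assuming torch.distributed is already initialized and a GPU is available; the model and data are toy placeholders.

```python
# Hedged sketch: sharding optimizer state with OSS and pairing it with ShardedDDP
# so gradient reduction matches the shards (items refined in 0.1.5).
# Assumes torch.distributed is already initialized; model and data are toy values.
import torch
from fairscale.optim.oss import OSS
from fairscale.nn.data_parallel import ShardedDataParallel as ShardedDDP

model = torch.nn.Linear(128, 10).cuda()
optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=1e-3)
model = ShardedDDP(model, optimizer)                     # reduce grads to the shard owners

for inputs in [torch.randn(8, 128) for _ in range(4)]:   # toy stand-in for a DataLoader
    optimizer.zero_grad()
    model(inputs.cuda()).sum().backward()
    optimizer.step()
```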
v0.1.4
Fixed
- Missing cu files in the pip package
v0.1.3
Same as 0.1.2, but with the correct numbering in the source code (see __init__.py)
v0.1.2
Added
- AdaScale: Added gradient accumulation feature (#202); see the usage sketch after this release's notes
- AdaScale: Added support for torch.lr_scheduler (#229)
Fixed
- AdaScale: smoothing factor value fixed when using gradient accumulation (#235)
- Pipe: documentation on balancing functions (#243)
- ShardedDDP: handle typical NLP models
- ShardedDDP: better partitioning when finetuning
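The AdaScale entries in 0.1.2 concern learning-rate scaling for large-batch data-parallel training. A minimal sketch of the wrapper pattern using AdaScale's gain() to count effective progress; it assumes a multi-process data-parallel job is already launched with torch.distributed initialized, and the model, data, and hyperparameters are toy placeholders.

```python
# Hedged sketch: AdaScale wrapping a base SGD optimizer (gradient accumulation and
# LR-scheduler support landed in 0.1.2, #202/#229). Assumes a multi-process
# data-parallel run with torch.distributed already initialized; values are toys.
import torch
from fairscale.optim import AdaScale

model = torch.nn.parallel.DistributedDataParallel(torch.nn.Linear(128, 10).cuda())
optimizer = AdaScale(torch.optim.SGD(model.parameters(), lr=0.1))

step, max_steps = 0, 100
while step < max_steps:
    for inputs in [torch.randn(8, 128) for _ in range(4)]:  # toy stand-in for a DataLoader
        optimizer.zero_grad()
        model(inputs.cuda()).sum().backward()
        step += optimizer.gain()        # effective progress grows with AdaScale's gain
        optimizer.step()
        if step >= max_steps:
            break
```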