Releases: bghira/SimpleTuner

v0.8.1 - fix for bucketing

16 Dec 01:50

What's Changed

  • min-snr-gamma is now calculated by adding 1 only to the divisor (see the sketch below)
  • fix for a bucketing error (#240)
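
As a rough illustration (not SimpleTuner's exact code), a v-prediction-style min-SNR-γ weight with the +1 applied only to the divisor looks like this:

```python
import torch

def min_snr_gamma_weights(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    # Clamp the SNR at gamma, then divide by (snr + 1): the +1 appears
    # only in the divisor, not in the clamped numerator.
    return torch.minimum(snr, torch.full_like(snr, gamma)) / (snr + 1.0)
```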

Full Changelog: v0.8.0...v0.8.1-fix

v0.8.0 - improved quality and crop-training support

08 Dec 20:11
dcba15e


Changelog

Breaking Changes

  • SDXL Launch Script Format: Updated launch script format, set new defaults, and rearranged options, introducing breaking changes.

New Features

  • BucketManager: Enhanced to remove images that are too small, now controlled via --minimum_image_size (see the sketch after this list).
  • Captioning Toolkit: An advanced CogVLM captioner and a basic LLaVA captioner are now available.
  • Crop Options: Added --crop_style and --crop_aspect options for improved control over cropping behavior.
  • Validation Negative Prompt: Added --validation_negative_prompt option.
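
A minimal sketch of the size filter behind --minimum_image_size, assuming a check on the image's smaller edge; the helper name is illustrative, not SimpleTuner's API:

```python
from PIL import Image

def filter_small_images(paths: list[str], minimum_image_size: int) -> list[str]:
    # Keep only images whose smaller edge meets the threshold, so
    # undersized images never reach the aspect buckets.
    kept = []
    for path in paths:
        with Image.open(path) as img:
            width, height = img.size
            if min(width, height) >= minimum_image_size:
                kept.append(path)
    return kept
```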

Enhancements and Refinements

  • Learning Rate Scheduler: Distinguished between cosine and cosine_with_restarts schedulers. The default LR scheduler is now cosine.
  • DeepSpeed for SD 2.x: Integrated DeepSpeed for improved performance in Stable Diffusion 2.x models.
  • Downsampling Method: Switched to LANCZOS for downsampling, which produces fewer artifacts than BICUBIC (see the sketch after this list).
  • Diffusers Update: Adapted to the new version of the diffusers library and fixed issues related to the refactored config style.
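
In Pillow terms, the resampling change amounts to something like the following illustrative helper:

```python
from PIL import Image

def downsample(image: Image.Image, target_size: tuple[int, int]) -> Image.Image:
    # LANCZOS generally preserves more detail and introduces fewer
    # artifacts than BICUBIC when shrinking images.
    return image.resize(target_size, resample=Image.Resampling.LANCZOS)
```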

Bug Fixes and Improvements

  • Captioning Dropout: Enhanced to also drop conditioning inputs, ensuring a more consistent dropout mechanism (see the sketch after this list).
  • Unit Tests: Added unit tests for random cropping within image boundaries and updated VAE Cache to accommodate random crop coordinates.
  • EMA Model Params: Optimized logging to not print EMA (Exponential Moving Average) model parameters.
  • Dropout Code Conflict: Removed unnecessary conflicting dropout code.
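
A hypothetical sketch of dropping a caption and its conditioning inputs together rather than independently (the names and tensors here are illustrative, not SimpleTuner's internals):

```python
import random

import torch

def apply_conditioning_dropout(prompt_embeds: torch.Tensor,
                               pooled_embeds: torch.Tensor,
                               dropout_p: float = 0.1):
    # When the caption is dropped, zero the pooled/added conditioning in
    # the same step, keeping the dropout mechanism consistent.
    if random.random() < dropout_p:
        prompt_embeds = torch.zeros_like(prompt_embeds)
        pooled_embeds = torch.zeros_like(pooled_embeds)
    return prompt_embeds, pooled_embeds
```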

Full Changelog: v0.7.4...v0.8.0

v0.7.4 - crop conditional bugfix

26 Nov 00:42
de88262

What's Changed

  • CSV Downloader: MJ dataset compatibility improvements
  • BucketManager: periodically saves image metadata during aspect bucket caching, every hour by default
  • Crop conditional fix: only alter the image size if it is smaller than the target, before we crop, fixing a mismatched tensor size error by @bghira in #220 (see the sketch after this list)
  • MultiAspectImage: remove center_crop arg by @bghira in #221
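
A minimal, illustrative sketch of the resize-then-crop order (a centre crop is assumed here; the real crop style may differ):

```python
import math

from PIL import Image

def resize_if_smaller_then_crop(image: Image.Image,
                                target: tuple[int, int]) -> Image.Image:
    target_w, target_h = target
    width, height = image.size
    # Only alter the image size when it is smaller than the crop target,
    # so the subsequent crop always returns exactly the requested shape.
    if width < target_w or height < target_h:
        scale = max(target_w / width, target_h / height)
        image = image.resize(
            (math.ceil(width * scale), math.ceil(height * scale)),
            resample=Image.Resampling.LANCZOS,
        )
        width, height = image.size
    left = (width - target_w) // 2
    top = (height - target_h) // 2
    return image.crop((left, top, left + target_w, top + target_h))
```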

Full Changelog: v0.7.3...v0.7.4

v0.7.3

05 Nov 01:40
ffdd057

Model release

Available on Huggingface Hub for use with any v-prediction/zero-terminal SNR capable platform, such as Diffusers:

Gamma model: [image]

What's Changed

  • EMA decay option --ema_decay
  • Captioner: offload BLIP + CLIP models
  • Renamed --learning_rate_end to --lr_end and --scale_lr to --lr_scale
  • Fix env file parsing to use LR_WARMUP_STEPS
  • Update dependencies
  • Fix bug with DictProxy failure on local filesystem training
  • Fix cosine annealing with warm restarts so that it actually follows a cosine curve and allows changing the LR on startup
  • Fix eta_min value
  • Offset noise and input perturbations are now probabilistically applied 25% of the time when they are enabled (see the sketch after this list)
  • VAECache: fix a crash when encountering a skipped future in multiprocessing setups or multi-node training, where one node gets ahead of another
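
A minimal sketch of probabilistically applied offset noise; the function and parameter names are illustrative, not SimpleTuner's:

```python
import torch

def maybe_apply_offset_noise(noise: torch.Tensor,
                             offset: float = 0.1,
                             probability: float = 0.25) -> torch.Tensor:
    # Only mix the per-channel offset into the noise on ~25% of steps.
    if torch.rand(1).item() < probability:
        noise = noise + offset * torch.randn(
            noise.shape[0], noise.shape[1], 1, 1, device=noise.device
        )
    return noise
```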

Contributors

All work by @bghira in #216

Full Changelog: v0.7.2...v0.7.3

v0.7.2 - speed fix for large datasets

23 Oct 02:31
27fa5b4

What's Changed

  • VAECache optimization & refactor by @bghira in #210
  • Fix README hyphens by @Beinsezii in #213
  • SDXL Checkpoint saving is now atomic
  • Fix inefficiency in aspect bucket sampler is_seen check
  • Updated README (minor) by @bghira in #215

New Contributors

  • @Beinsezii made their first contribution in #213

Full Changelog: v0.7.1...v0.7.2

v0.7.1 - maintenance release

15 Oct 19:45
f37884b

What's Changed

  • Fix an error when a zero-step training run merely exports the pipeline
  • Reduced wasteful CPU use during aspect bucketing; it should now spend more of the available CPU on meaningful progress
  • Reduce more warnings from PIL, especially about transparent (RGBA) images, and quiet the noisy Transformers/Diffusers log output
  • Improve the efficiency of VAE encoding on many-GPU systems. It is still problematic, especially for cloud storage backends
  • A weak attempt at resolving the memory use of text embed caching, which causes OOMs for large datasets. Probably still unresolved, but the issue goes away after multiple execution passes, so this is a low-priority concern.

Full Changelog: v0.7.0...v0.7.1

v0.7.0 - tune SDXL in just 12G VRAM!

13 Oct 01:28
7a54f38


DeepSpeed lives!

Now correctly integrated, DeepSpeed allows for training SDXL's full U-net at 8 seconds per iteration on just ~12G of VRAM. See the documentation for more information!
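
As a rough, hypothetical sketch (not the configuration shipped with SimpleTuner), the kind of DeepSpeed ZeRO setup that yields this sort of VRAM saving typically looks something like:

```python
# Hypothetical DeepSpeed configuration expressed as a Python dict; ZeRO
# stage 2 with CPU optimizer offload is one common way to shrink per-GPU
# VRAM requirements. This is an assumption, not SimpleTuner's shipped config.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
}
```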

Features

  • Added --validation_torch_compile option so that you can try using the PyTorch inductor purely at inference time. Your mileage may vary.
  • The Parquet/CSV to S3 dataset script now treats LFS repositories as thin, downloading a single parquet catalogue just before parsing it rather than requiring the entire repository to be pulled. This makes it possible to process LAION-COCO with just a few GB of disk space.
    • The S3 dataset script now also uses multiprocessing rather than multithreading. If you have been using it with a large worker size, it was likely bottlenecked on luminance value calculations. Now, it will happily use all available CPU.
    • There is a new --minimum_pixel_area option, measured in megapixels, which selects SDXL-compatible training images by default. Set this value to 0 and instead supply, for example, --minimum_resolution=768 to revert to the previous behaviour (see the sketch below).
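
A minimal illustration of a megapixel-area check like --minimum_pixel_area, assuming one megapixel is counted as 1,000,000 pixels (roughly a 1024x1024 image, SDXL's native area):

```python
def meets_minimum_pixel_area(width: int, height: int,
                             minimum_pixel_area: float = 1.0) -> bool:
    # Compare the image's pixel area, in megapixels, against the threshold.
    return (width * height) / 1_000_000 >= minimum_pixel_area
```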

Changes

  • SD 2.x always uses zero-terminal SNR now. There's no reason not to use it.
  • The final validations of the model's training are now in line with the in-training validations; they use the same code.
  • Progress bars now tend to disappear once their job is done, unless the output is redirected to a log file.
  • More statistics are being logged to Weights & Biases now - including the current epoch, EMA decay rate (if in use) and timestep selection bias, which needs to be configured as a historyTable to show all previous steps.

Bugfixes

  • EMA model optimization_step was going out of sync with global_step; they are now always kept in sync.
    • The impact of this was that the EMA decay rate was greatly under-calculated, reducing the normalisation factor of the EMA model (see the sketch after this list).
  • SD 2.x trainer fixes / improvements. Tested on a 14x 4090 24G system. Does not implement DeepSpeed yet.
  • Better logging for a couple of scenarios where things hit the proverbial fan due to residual VAE caches from previous runs.
  • Providing --disable_compel no longer breaks anything; Compel is now properly disabled.
  • During final validation, the shortnames in Weights & Biases are reflected correctly.
  • The range option for --timestep_bias_strategy has now been fixed. It was missing from the list of available choices.
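
An illustrative decay schedule in the style of diffusers' EMAModel shows why a lagging optimization_step under-counts the decay (this is a sketch, not SimpleTuner's exact code):

```python
def ema_decay(optimization_step: int, max_decay: float = 0.9999) -> float:
    # The effective decay ramps up with the step count, so an
    # optimization_step that lags behind global_step averages the EMA
    # weights with a smaller decay than intended.
    return min(max_decay, (1 + optimization_step) / (10 + optimization_step))
```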

Full Changelog: v0.6.1...v0.7.0

v0.6.1 - melodramatic robot edition

06 Oct 21:41
aa6b486


What's Changed

  • Random bucket sampling fix by @bghira in #205
  • Fix an "arguments not known" error
  • Fix unloading of text encoders when --fully_unload_text_encoder is given
  • Fix SD 2.x trainer
  • Reduce multi-GPU logging
  • Log errored-out PyTorch tensor files when we fail to load one
  • Adding more prompts to the built-in library to demonstrate difficult compositions/contrast by @bghira in #206

Full Changelog: v0.6.0...v0.6.1

v0.6.0 - robust multiGPU training, long validation prompts

02 Oct 00:27
5c1bb64


What's Changed

  • Compel for long prompt handling, opt-out via --disable_compel by @bghira in #196
  • Bugfix/adafactor 24gb by @bghira in #197
  • Min-SNR Gamma fix for v_prediction (round twelve thousand) by @bghira in #198
  • Release by @bghira in #199
  • v0.5.2 by @bghira in #200
  • Use a seed per GPU by default, and allow persistent workers by @bghira in #201 (see the sketch after this list)
  • Merge pull request #200 from bghira/main by @bghira in #202
  • v0.6.0 by @bghira in #203
  • Fix Epoch tracking by trimming aspect buckets to remove inaccessible samples
  • Fix directory creation on local backend
  • Fix luminance tracking
  • Fix multi-GPU training hang on epoch rollover by @bghira in #204
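
A minimal sketch of per-GPU seeding: offset the base seed by the process rank so each GPU samples differently while staying reproducible (the names here are illustrative):

```python
import random

import torch

def seed_per_rank(base_seed: int, rank: int) -> None:
    # Each GPU/process gets base_seed + rank, so dataloader shuffling and
    # noise sampling differ across ranks but remain deterministic.
    random.seed(base_seed + rank)
    torch.manual_seed(base_seed + rank)
```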

Potential issues

  • Long prompt weighting in Compel might be wonky; if any tensor size issues arise, remove the cache/ directory entries and try again with --disable_compel.

Full Changelog: v0.5.1...v0.6.0

v0.6.0-beta2 - more AWS backend optimisations

30 Sep 22:20

What's Changed

  • Training speed-up: from 266 seconds per iteration at batch size 15 across 5 GPUs, down to 26 seconds per iteration
  • Substantial cost reduction in the use of AWS S3 as a backend for training

Full Changelog: v0.6.0-beta...v0.6.0-beta2