Releases: bghira/SimpleTuner

v0.8.1 - fix for bucketing

16 Dec 01:50

What's Changed

  • min-snr-gamma is now calculated by adding 1 only to the divisor (see the sketch below)
  • fix for a bucketing error (#240)
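
As a rough illustration (not SimpleTuner's exact code), a v-prediction-style min-SNR-γ weight with the +1 applied only to the divisor looks like this:

```python
import torch

def min_snr_gamma_weights(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    # Clamp the SNR at gamma, then divide by (snr + 1): the +1 appears
    # only in the divisor, not in the clamped numerator.
    return torch.minimum(snr, torch.full_like(snr, gamma)) / (snr + 1.0)
```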

Full Changelog: v0.8.0...v0.8.1-fix

v0.8.0 - improved quality and crop-training support

08 Dec 20:11
dcba15e


Changelog

Breaking Changes

  • SDXL Launch Script Format: Updated launch script format, set new defaults, and rearranged options, introducing breaking changes.

New Features

  • BucketManager: Enhanced to remove images that are too small, now controlled via --minimum_image_size (see the sketch after this list).
  • Captioning Toolkit: An advanced CogVLM captioner and a basic LLaVA captioner are now available.
  • Crop Options: Added --crop_style and --crop_aspect options for improved control over cropping behavior.
  • Validation Negative Prompt: Added --validation_negative_prompt option.
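
A minimal sketch of the size filter behind --minimum_image_size, assuming a check on the image's smaller edge; the helper name is illustrative, not SimpleTuner's API:

```python
from PIL import Image

def filter_small_images(paths: list[str], minimum_image_size: int) -> list[str]:
    # Keep only images whose smaller edge meets the threshold, so
    # undersized images never reach the aspect buckets.
    kept = []
    for path in paths:
        with Image.open(path) as img:
            width, height = img.size
            if min(width, height) >= minimum_image_size:
                kept.append(path)
    return kept
```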

Enhancements and Refinements

  • Learning Rate Scheduler: Distinguished between cosine and cosine_with_restarts schedulers. The default LR scheduler is now cosine.
  • DeepSpeed for SD 2.x: Integrated DeepSpeed for improved performance in Stable Diffusion 2.x models.
  • Downsampling Method: Switched to LANCZOS for downsampling, which produces fewer artifacts than BICUBIC (see the sketch after this list).
  • Diffusers Update: Adapted to the new version of the diffusers library and fixed issues related to the refactored config style.
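
In Pillow terms, the resampling change amounts to something like the following illustrative helper:

```python
from PIL import Image

def downsample(image: Image.Image, target_size: tuple[int, int]) -> Image.Image:
    # LANCZOS generally preserves more detail and introduces fewer
    # artifacts than BICUBIC when shrinking images.
    return image.resize(target_size, resample=Image.Resampling.LANCZOS)
```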

Bug Fixes and Improvements

  • Captioning Dropout: Enhanced to also drop conditioning inputs, ensuring a more consistent dropout mechanism (see the sketch after this list).
  • Unit Tests: Added unit tests for random cropping within image boundaries and updated VAE Cache to accommodate random crop coordinates.
  • EMA Model Params: Optimized logging to not print EMA (Exponential Moving Average) model parameters.
  • Dropout Code Conflict: Removed unnecessary conflicting dropout code.
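
A hypothetical sketch of dropping a caption and its conditioning inputs together rather than independently (the names and tensors here are illustrative, not SimpleTuner's internals):

```python
import random

import torch

def apply_conditioning_dropout(prompt_embeds: torch.Tensor,
                               pooled_embeds: torch.Tensor,
                               dropout_p: float = 0.1):
    # When the caption is dropped, zero the pooled/added conditioning in
    # the same step, keeping the dropout mechanism consistent.
    if random.random() < dropout_p:
        prompt_embeds = torch.zeros_like(prompt_embeds)
        pooled_embeds = torch.zeros_like(pooled_embeds)
    return prompt_embeds, pooled_embeds
```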

Full Changelog: v0.7.4...v0.8.0

v0.7.4 - crop conditional bugfix

26 Nov 00:42
de88262

What's Changed

  • CSV Downloader: MJ dataset compatibility improvements
  • BucketManager: periodically saves image metadata during aspect bucket caching, every hour by default
  • Crop conditional fix: only alter the image size if it is smaller than the target, before we crop, fixing a mismatched tensor size error by @bghira in #220 (see the sketch after this list)
  • MultiAspectImage: remove center_crop arg by @bghira in #221
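
A minimal, illustrative sketch of the resize-then-crop order (a centre crop is assumed here; the real crop style may differ):

```python
import math

from PIL import Image

def resize_if_smaller_then_crop(image: Image.Image,
                                target: tuple[int, int]) -> Image.Image:
    target_w, target_h = target
    width, height = image.size
    # Only alter the image size when it is smaller than the crop target,
    # so the subsequent crop always returns exactly the requested shape.
    if width < target_w or height < target_h:
        scale = max(target_w / width, target_h / height)
        image = image.resize(
            (math.ceil(width * scale), math.ceil(height * scale)),
            resample=Image.Resampling.LANCZOS,
        )
        width, height = image.size
    left = (width - target_w) // 2
    top = (height - target_h) // 2
    return image.crop((left, top, left + target_w, top + target_h))
```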

Full Changelog: v0.7.3...v0.7.4

v0.7.3

05 Nov 01:40
ffdd057

Model release

Available on Huggingface Hub for use with any v-prediction/zero-terminal SNR capable platform, such as Diffusers:

Gamma model: [image]

What's Changed

  • EMA decay option --ema_decay
  • Captioner: offload BLIP + CLIP models
  • Renamed --learning_rate_end to --lr_end and --scale_lr to --lr_scale
  • Fix env file parsing to use LR_WARMUP_STEPS
  • Update dependencies
  • Fix bug with DictProxy failure on local filesystem training
  • Fix cosine annealing with warm restarts so that it actually follows a cosine curve and allows changing the LR on startup
  • Fix eta_min value
  • Offset noise and input perturbations are now probabilistically applied 25% of the time when they are enabled (see the sketch after this list)
  • VAECache: fix a crash when encountering a skipped future in multiprocessing setups or multi-node training, where one node gets ahead of another
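
A minimal sketch of probabilistically applied offset noise; the function and parameter names are illustrative, not SimpleTuner's:

```python
import torch

def maybe_apply_offset_noise(noise: torch.Tensor,
                             offset: float = 0.1,
                             probability: float = 0.25) -> torch.Tensor:
    # Only mix the per-channel offset into the noise on ~25% of steps.
    if torch.rand(1).item() < probability:
        noise = noise + offset * torch.randn(
            noise.shape[0], noise.shape[1], 1, 1, device=noise.device
        )
    return noise
```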

Contributors

All work by @bghira in #216

Full Changelog: v0.7.2...v0.7.3

v0.7.2 - speed fix for large datasets

23 Oct 02:31
27fa5b4

What's Changed

  • VAECache optimization & refactor by @bghira in #210
  • Fix README hyphens by @Beinsezii in #213
  • SDXL Checkpoint saving is now atomic
  • Fix inefficiency in aspect bucket sampler is_seen check
  • Updated README (minor) by @bghira in #215

New Contributors

  • @Beinsezii made their first contribution in #213

Full Changelog: v0.7.1...v0.7.2

v0.7.1 - maintenance release

15 Oct 19:45
f37884b

What's Changed

  • Fix an error when a zero-step training run merely exports the pipeline
  • Reduced wasteful CPU use during aspect bucketing; it should now spend more of the available CPU on meaningful progress
  • Reduce more warnings from PIL, especially about transparent (RGBA) images, and quiet the noisy Transformers/Diffusers log output
  • Improve the efficiency of VAE encoding on many-GPU systems. It is still problematic, especially for cloud storage backends
  • A weak attempt at resolving the memory use of text embed caching, which causes OOMs for large datasets. Probably still unresolved, but the issue goes away after multiple execution passes, so this is a low-priority concern.

Full Changelog: v0.7.0...v0.7.1

v0.7.0 - tune SDXL in just 12G VRAM!

13 Oct 01:28
7a54f38


DeepSpeed lives!

Now correctly integrated, DeepSpeed allows for training SDXL's full U-net at 8 seconds per iteration on just ~12G of VRAM. See the documentation for more information!
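
As a rough, hypothetical sketch (not the configuration shipped with SimpleTuner), the kind of DeepSpeed ZeRO setup that yields this sort of VRAM saving typically looks something like:

```python
# Hypothetical DeepSpeed configuration expressed as a Python dict; ZeRO
# stage 2 with CPU optimizer offload is one common way to shrink per-GPU
# VRAM requirements. This is an assumption, not SimpleTuner's shipped config.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
}
```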

Features

  • Added --validation_torch_compile option so that you can try using the PyTorch inductor purely at inference time. Your mileage may vary.
  • The Parquet/CSV to S3 dataset script now treats LFS repositories as thin, downloading a single parquet catalogue just before parsing it rather than requiring the entire repository to be pulled. This makes it possible to process LAION-COCO with just a few GB of disk space.
    • The S3 dataset script now also uses multiprocessing rather than multithreading. If you have been using it with a large worker size, it was likely bottlenecked on luminance value calculations. Now, it will happily use all available CPU.
    • There is a new --minimum_pixel_area option, measured in megapixels, which selects SDXL-compatible training images by default. Set this value to 0 and instead supply, for example, --minimum_resolution=768 to revert to the previous behaviour (see the sketch below).
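
A minimal illustration of a megapixel-area check like --minimum_pixel_area, assuming one megapixel is counted as 1,000,000 pixels (roughly a 1024x1024 image, SDXL's native area):

```python
def meets_minimum_pixel_area(width: int, height: int,
                             minimum_pixel_area: float = 1.0) -> bool:
    # Compare the image's pixel area, in megapixels, against the threshold.
    return (width * height) / 1_000_000 >= minimum_pixel_area
```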

Changes

  • SD 2.x always uses zero-terminal SNR now. There's no reason not to use it.
  • The final validations of the model's training are now in line with the in-training validations; they use the same code.
  • Progress bars now tend to disappear once their job is done, unless the output is redirected to a log file.
  • More statistics are being logged to Weights & Biases now - including the current epoch, EMA decay rate (if in use) and timestep selection bias, which needs to be configured as a historyTable to show all previous steps.

Bugfixes

  • EMA model optimization_step was going out of sync with global_step; they are now always kept in sync.
    • The impact of this was that the EMA decay rate was greatly under-calculated, reducing the normalisation factor of the EMA model (see the sketch after this list).
  • SD 2.x trainer fixes / improvements. Tested on a 14x 4090 24G system. Does not implement DeepSpeed yet.
  • Better logging for a couple of scenarios where things hit the proverbial fan due to residual VAE caches from previous runs.
  • Providing --disable_compel no longer breaks anything; Compel is now properly disabled.
  • During final validation, the shortnames in Weights & Biases are reflected correctly.
  • The range option for --timestep_bias_strategy has now been fixed. It was missing from the list of available choices.
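
An illustrative decay schedule in the style of diffusers' EMAModel shows why a lagging optimization_step under-counts the decay (this is a sketch, not SimpleTuner's exact code):

```python
def ema_decay(optimization_step: int, max_decay: float = 0.9999) -> float:
    # The effective decay ramps up with the step count, so an
    # optimization_step that lags behind global_step averages the EMA
    # weights with a smaller decay than intended.
    return min(max_decay, (1 + optimization_step) / (10 + optimization_step))
```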

Full Changelog: v0.6.1...v0.7.0

v0.6.1 - melodramatic robot edition

06 Oct 21:41
aa6b486


What's Changed

  • Random bucket sampling fix by @bghira in #205
  • Fix an "arguments not known" error
  • Fix unloading of text encoders when --fully_unload_text_encoder is given
  • Fix SD 2.x trainer
  • Reduce multi-GPU logging
  • Log errored-out PyTorch tensor files when we fail to load one
  • Adding more prompts to the built-in library to demonstrate difficult compositions/contrast by @bghira in #206

Full Changelog: v0.6.0...v0.6.1

v0.6.0 - robust multiGPU training, long validation prompts

02 Oct 00:27
5c1bb64


What's Changed

  • Compel for long prompt handling, opt-out via --disable_compel by @bghira in #196
  • Bugfix/adafactor 24gb by @bghira in #197
  • Min-SNR Gamma fix for v_prediction (round twelve thousand) by @bghira in #198
  • Release by @bghira in #199
  • v0.5.2 by @bghira in #200
  • Use a seed per GPU by default, and allow persistent workers by @bghira in #201 (see the sketch after this list)
  • Merge pull request #200 from bghira/main by @bghira in #202
  • v0.6.0 by @bghira in #203
  • Fix Epoch tracking by trimming aspect buckets to remove inaccessible samples
  • Fix directory creation on local backend
  • Fix luminance tracking
  • Fix multi-GPU training hang on epoch rollover by @bghira in #204
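
A minimal sketch of per-GPU seeding: offset the base seed by the process rank so each GPU samples differently while staying reproducible (the names here are illustrative):

```python
import random

import torch

def seed_per_rank(base_seed: int, rank: int) -> None:
    # Each GPU/process gets base_seed + rank, so dataloader shuffling and
    # noise sampling differ across ranks but remain deterministic.
    random.seed(base_seed + rank)
    torch.manual_seed(base_seed + rank)
```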

Potential issues

  • Long prompt weighting in Compel might be wonky; if any tensor size issues arise, remove the cache/ directory entries and try again with --disable_compel.

Full Changelog: v0.5.1...v0.6.0

v0.6.0-beta2 - more AWS backend optimisations

30 Sep 22:20

What's Changed

  • Training speed-up: from 266 seconds per iteration at batch size 15 across 5 GPUs, down to 26 seconds per iteration
  • Substantial cost reduction in the use of AWS S3 as a backend for training

Full Changelog: v0.6.0-beta...v0.6.0-beta2