Releases: bghira/SimpleTuner

v1.1.4

22 Oct 15:07
71bea97

Support for SD 3.5 fine-tuning.

Stability AI has published a tutorial on using SimpleTuner for this task, and SimpleTuner provides its own SD3 quickstart.

Full Changelog: v1.1.3...v1.1.4

v1.1.3

18 Oct 17:31
8bf644f
  • Nested subdir datasets will now have their caches nested in matching subdirectories, which unfortunately means these cache entries will most likely need to be regenerated. Sorry - it was not feasible to keep the old structure working in parallel.
  • FlashAttention3 fixes for H100 nodes by downgrading default torch version to 2.4.1
  • Resume fixes for multi-gpu/multi-node state/epoch tracking
  • Other misc bugfixes
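
As a rough illustration of the nested cache layout described above, the sketch below mirrors an image's position inside the dataset under a cache root. The function name, the `.pt` suffix and the example paths are all assumptions made for illustration; this is not SimpleTuner's actual cache code.

```python
from pathlib import PurePosixPath


def nested_cache_path(dataset_root: str, image_path: str, cache_root: str,
                      suffix: str = ".pt") -> str:
    """Mirror an image's subdirectory inside the cache root (hypothetical helper)."""
    # Position of the image relative to the dataset root, e.g. cats/tabby/001.jpg
    rel = PurePosixPath(image_path).relative_to(PurePosixPath(dataset_root))
    # Same relative position under the cache root, with a cache-file suffix.
    return str(PurePosixPath(cache_root) / rel.with_suffix(suffix))


print(nested_cache_path("/data/photos", "/data/photos/cats/tabby/001.jpg", "/cache/vae"))
# → /cache/vae/cats/tabby/001.pt
```

A flat cache would have written that entry directly under `/cache/vae`, which is why older entries would no longer be found after this change.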

Full Changelog: v1.1.2...v1.1.3

v1.1.2 - masked loss and strong prior preservation

13 Oct 04:36
dddaf4f

New stuff

  • New is_regularisation_data option for datasets, works great
  • H100 or greater now has better torch compile support
  • SDXL ControlNet training is back, now with quantised base model (int8)
  • Multi-node training works now, with a guide to deploy it easily
  • configure.py can now generate a very rudimentary user prompt library for you if you are in a hurry
  • Flux model cards now have more useful information about your Flux training setup
  • Masked loss training & a demo script in the toolkit dir for generating a folder of image masks
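
For anyone unfamiliar with masked loss, here is a minimal sketch of the general idea: the per-element error only counts where the mask is non-zero. It uses plain Python lists and a hypothetical `masked_mse` helper; it is not the trainer's implementation, which works on tensors.

```python
def masked_mse(pred, target, mask):
    """Mean squared error averaged over the elements where mask is non-zero."""
    weighted_error = 0.0
    total_weight = 0.0
    for p, t, m in zip(pred, target, mask):
        weighted_error += m * (p - t) ** 2
        total_weight += m
    if total_weight == 0.0:
        # Nothing is selected by the mask, so there is nothing to learn from.
        return 0.0
    return weighted_error / total_weight


pred = [1.0, 2.0, 3.0, 4.0]
target = [1.0, 0.0, 3.0, 0.0]
print(masked_mse(pred, target, [1, 1, 0, 0]))  # → 2.0
print(masked_mse(pred, target, [1, 1, 1, 1]))  # → 5.0
```

With the mask covering only the first two elements, the large error on the last element is ignored entirely.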

Full Changelog: v1.1.1...v1.1.2

v1.1.1 - bring on the potato models

05 Oct 00:37
01de5d0

[Image: trained with NF4 via PagedLion8Bit.]

  • New custom timestep distribution for Flux via --flux_use_beta_schedule, --flux_beta_schedule_alpha, --flux_beta_schedule_beta (#1023)
  • The trendy AdEMAMix, its 8bit and paged counterparts are all now available as bnb-ademamix, bnb-ademamix-8bit, and bnb-ademamix8bit-paged
  • All low-bit optimisers from Bits n Bytes are now included for NVIDIA and ROCm systems
  • NF4 training on NVIDIA systems down to 9090M total using Lion8Bit and 512px training at 1.5 sec/iter on a 4090
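
To give a feel for what a beta-distributed timestep schedule does, the sketch below draws timesteps in [0, 1] from a Beta distribution. It assumes the two `--flux_beta_schedule_*` values act as the usual alpha and beta shape parameters; the values used here are placeholders rather than the trainer's defaults, and `sample_timesteps` is a hypothetical helper.

```python
import random


def sample_timesteps(n, alpha=2.0, beta=2.0, seed=0):
    """Draw n timesteps in [0, 1] from Beta(alpha, beta) (illustrative only)."""
    rng = random.Random(seed)
    return [rng.betavariate(alpha, beta) for _ in range(n)]


# alpha < beta pushes samples toward 0, alpha > beta pushes them toward 1.
low = sample_timesteps(10000, alpha=2.0, beta=5.0)
high = sample_timesteps(10000, alpha=5.0, beta=2.0)
print(sum(low) / len(low), sum(high) / len(high))
```

The mean of Beta(alpha, beta) is alpha / (alpha + beta), so the two settings above concentrate training on opposite ends of the timestep range.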

Full Changelog: v1.1...v1.1.1

v1.1 - API-friendly edition

01 Oct 20:51
696760e

Features

Performance

  • Improved launch speed for large datasets (>1M samples)
  • Improved speed for quantising on CPU
  • Optional support for directly quantising on GPU near-instantly (--quantize_via)

Compatibility

  • SDXL, SD1.5 and SD2.x compatibility with LyCORIS training
  • Updated documentation to make multiGPU configuration a bit more obvious.
  • Improved support for torch.compile(), including automatically disabling it when e.g. fp8-quanto is enabled
    • Enable via accelerate config or config/config.env via TRAINER_DYNAMO_BACKEND=inductor
  • TorchAO for quantisation as an alternative to Optimum Quanto for int8 weight-only quantisation (int8-torchao)
  • f8uz-quanto, a compatibility level for AMD users to experiment with FP8 training dynamics
  • Support for multigpu PEFT LoRA training with Quanto enabled (not fp8-quanto)
    • Previously, only LyCORIS would reliably work with quantised multigpu training sessions.
  • Ability to quantise models when full-finetuning, without warning or error. Previously, this configuration was blocked. Your mileage may vary; it's an experimental configuration.
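
As background on what int8 weight-only quantisation means, here is a minimal sketch of symmetric absmax quantisation on a plain list of weights. It shows the general technique only; TorchAO and Optimum Quanto use their own (per-channel, tensor-based) implementations, and the helper names below are made up for this example.

```python
def quantize_int8(weights):
    """Symmetric absmax quantisation of a list of floats to int8 values."""
    absmax = max(abs(w) for w in weights)
    # One scale for the whole list; real libraries usually keep one per channel.
    scale = absmax / 127.0 if absmax > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and their scale."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, restored)
```

Only the weights are stored in 8 bits; they are scaled back to floats before use, which is where the memory saving and the small rounding error both come from.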

Integrations

  • Images now get logged to tensorboard (thanks @anhi)
  • FastAPI endpoints for integrations (undocumented)
  • "raw" webhook type that sends a large number of HTTP requests containing events, useful for push-notification-type services
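
If you want to experiment with the "raw" webhook type, a tiny receiver like the one below may be enough to see what arrives. It assumes each request is a POST with a JSON body, which these notes do not actually state; the handler, port and the example field name are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received_events = []


def record_event(body: bytes) -> dict:
    """Decode one request body as JSON (an assumption) and remember it."""
    event = json.loads(body.decode("utf-8"))
    received_events.append(event)
    return event


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            record_event(self.rfile.read(length))
            self.send_response(204)  # accepted, nothing to return
        except ValueError:
            # Covers both invalid JSON and invalid UTF-8.
            self.send_response(400)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet; events are kept in received_events


def serve(host="127.0.0.1", port=8080):
    """Block forever, collecting events posted to http://host:port/."""
    HTTPServer((host, port), EventHandler).serve_forever()
```

Calling `serve()` and pointing the webhook URL at `http://127.0.0.1:8080/` would collect events in `received_events` for inspection.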

Optims

  • SOAP optimiser support
    • Uses fp32 gradients: nice and accurate, but uses more memory than other optims. By default it slows down every 10 steps as it preconditions.
  • New 8bit and 4bit optimiser options from TorchAO (ao-adamw8bit, ao-adamw4bit etc)

Full Changelog: v1.0.1...v1.1

v1.0.1

14 Sep 18:45
a5ca5a2

This is a maintenance release with not many new features.

Full Changelog: v1.0...v1.0.1

v1.0 the total recall edition

02 Sep 20:57
bebcbee

Everything has changed! And yet, nothing has. Some defaults may. No, will - be different. It's hard to know which ones.

For those who can do so, it's recommended to use configure.py to reconfigure your environment on this new release.

It should go without saying, but for those in the middle of a training run, do not upgrade to this release until you finish.

Refactoring and Enhancements:

  1. Refactor train.py into a Trainer Class:

    • The core logic of train.py has been restructured into a Trainer class, improving modularity and maintainability.
    • Exposes an SDK for reuse elsewhere.
  2. Model Family Unification:

    • References to specific model types (--sd3, --flux, etc.) have been replaced with a unified --model_family argument, streamlining model specification and reducing clutter in configurations.
  3. Configuration System Overhaul:

    • Switched from .env configuration files to JSON (config.json), with multiple backends supporting JSON configuration loading. This allows more flexible and readable configuration management.
    • Updated the configuration loader to auto-detect the best backend when launching.
  4. Enhanced Argument Handling:

    • Deprecated old argument references and moved argument parsing to helpers/configuration/cmd_args.py for better organization.
    • Introduced support for new arguments such as --model_card_safe_for_work, --flux_schedule_shift, and --disable_bucket_pruning.
  5. Improved Hugging Face Integration:

    • Modified configure.py to avoid asking for Hugging Face model name details unless required.
    • Added the ability to pass the SFW (safe-for-work) argument into the training script.
  6. Optimizations and Bug Fixes:

    • Fixed several references to learning rate (lr) initialization and corrected --optimizer usage.
    • Addressed issues with attention masks swapping and fixed the persistence of text encoders in RAM after refactoring.
  7. Training and Validation Enhancements:

    • Added better dataset examples with support for multiple resolutions and mixed configurations.
    • Configured training scripts to disable gradient accumulation steps by default and provided better control over training options via the updated config files.
  8. Enhanced Logging and Monitoring:

    • Improved the handling of Weights & Biases (wandb) logs and updated tracker argument references.
  9. Documentation Updates:

    • Revised documentation to reflect changes in model family handling, argument updates, and configuration management.
    • Added guidance on setting up the new configuration files and examples for multi-resolution datasets.
  10. Miscellaneous Improvements:

    • NSFW tags in model cards are now enabled by default.
    • Updated train.sh to minimal requirements, reducing complexity and streamlining the training process.
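
For those migrating old launch arguments by hand, the model family change in item 2 can be pictured with the sketch below. It rewrites the legacy per-model flags into a single `--model_family` argument. The family values `sd3` and `flux` are assumed to match the old flag names, and `migrate_args` and `--example_flag` are hypothetical; configure.py remains the recommended route.

```python
# Legacy flag -> assumed --model_family value.
LEGACY_MODEL_FLAGS = {"--sd3": "sd3", "--flux": "flux"}


def migrate_args(argv):
    """Replace legacy per-model flags with one --model_family argument."""
    migrated = []
    family = None
    for arg in argv:
        if arg in LEGACY_MODEL_FLAGS:
            family = LEGACY_MODEL_FLAGS[arg]
        else:
            migrated.append(arg)
    if family is not None:
        migrated.append(f"--model_family={family}")
    return migrated


print(migrate_args(["--flux", "--example_flag=1"]))
# → ['--example_flag=1', '--model_family=flux']
```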

More detailed change log

  • lycoris model card updates by @bghira in #820
  • Generate and store attention masks for T5 for flux by @AmericanPresidentJimmyCarter in #821
  • Fix validation by @AmericanPresidentJimmyCarter in #822
  • backwards-compatible flux embedding cache masks by @bghira in #823
  • merge by @bghira in #824
  • parquet add width and height columns by @frankchieng in #825
  • quanto: remove warnings about int8/fp8 confusion as it happened so long ago now; add warning about int4 by @bghira in #826
  • remove clip warning by @bghira in #827
  • update lycoris to dev branch, 3.0.1dev3 by @bghira in #828
  • Fix caption_with_blip3.py on CUDA by @anhi in #833
  • fix quanto resuming by @bghira in #834
  • lycoris: resume should use less vram now by @bghira in #835
  • (#644) temporarily block training on multi-gpu setup with quanto + PEFT, inform user to go with lycoris instead by @bghira in #837
  • quanto + deepspeed minor fixes for multigpu training by @bghira in #839
  • deepspeed sharding by @bghira in #840
  • fix: only run save full model on main process by @ErwannMillon in #838
  • merge by @bghira in #841
  • clean-up by @bghira in #842
  • follow-up fixes for quanto limitation on multigpu by @bghira in #846
  • merge by @bghira in #850
  • (#851) remove shard merge code on load hook by @bghira in #853
  • csv backend updates by @williamzhuk in #645
  • csv fixes by @bghira in #856
  • add schedulefree optim w/ kahan summation by @bghira in #857
  • merge by @bghira in #858
  • merge by @bghira in #861
  • schedulefree: return to previous stable settings and add a new preset for aggressive training by @bghira in #862
  • fix validation image filename only using resolution from first img, and, unreadable/untypeable parenthesis by @bghira in #863
  • (#519) add side by side comparison with base model by @bghira in #865
  • merge fixes by @bghira in #870
  • (#864) add flux final export for full tune by @bghira in #871
  • wandb gallery mode by @bghira in #872
  • sdxl: dtype inference followup fix by @bghira in #873
  • merge by @bghira in #878
  • combine the vae cache clear logic with bucket rebuild logic by @bghira in #879
  • flux: mobius-style training via augmented guidance scale by @bghira in #880
  • track flux cfg via wandb by @bghira in #881
  • multigpu VAE cache rebuild fixes; random crop auto-rebuild; mobius flux; json backend now renamed to discovery ; wandb guidance tracking by @bghira in #888
  • fixing typo in flux document for preserve_data_backend_cache key by @riffmaster-2001 in #882
  • reintroduce timestep dependent shift as an option during flux training for dev and schnell, disabled by default by @bghira in #892
  • adding SD3 timestep-dependent shift for Flux training by @bghira in #894
  • fix: set optimizer details to empty dict w/ deepspeed by @ErwannMillon in #895
  • fix: make sure wandb_logs is always defined by @ErwannMillon in #896
  • merge by @bghira in #900
  • Dataloader Docs - Correct caption strategy for instance prompt by @barakyo in #902
  • refactor train.py into Trainer class by @bghira in #899
  • Update TRAINER_EXTRA_ARGS for model_family by @barakyo in #903
  • Fix text encoder nuking regression by @mhirki in #906
  • added lokr lycoris init_lora by @flotos in #907
  • Fix Flux schedule shift and add resolution-dependent schedule shift by @mhirki in #905
  • Swap the attention mask location, because Flux swapped text and image… by @AmericanPresidentJimmyCarter in #908
  • support toml, json, env config backend, and multiple config environments by @bghira in #909
  • Add "none" to --report_to argument by @twri in #911
  • Add support for tiny PEFT-based Flux LoRA based on TheLastBen's post on Reddit by @mhirki in #912
  • Update lycoris_config.json.example with working defaults by @mhirki in #918
  • fix constant_with_warmup not being so constant or warming up by @bghira in #919
  • follow-up fix for setting last_epoch by @bghira in #920
  • fix multigpu schedule issue with LR on resume by @bghira in #921
  • multiply the resume state step by the number of GPUs in an attempt to overcome accelerate v0.33 issue by @bghira in #922
  • default to json/toml before the env file in case multigpu is configured by @bghira in #923
  • fix json/toml configs str bool values by @bghira in #924
  • bypass some "helpful" diffusers logic that makes random decisions to run on CPU by @bghira in #925
  • v1.0 merge by @bghira in #910
    *...
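
Several entries above concern the Flux schedule shift. As a point of reference, the static timestep shift introduced with SD3 is usually written as the formula in the sketch below. Whether `--flux_schedule_shift` maps onto exactly this expression is an assumption, and the timestep-dependent and resolution-dependent variants are not shown.

```python
def shift_timestep(t: float, shift: float) -> float:
    """Static SD3-style shift of a timestep (or sigma) t in [0, 1]."""
    return shift * t / (1.0 + (shift - 1.0) * t)


print(shift_timestep(0.5, 3.0))  # → 0.75
print(shift_timestep(0.5, 1.0))  # → 0.5
```

A shift of 1 leaves the schedule unchanged, while larger values move the midpoint toward the noisier end; the endpoints 0 and 1 stay fixed.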

v0.9.3.9 - bugfixes, better defaults

27 Aug 13:50
202c437

What's Changed

LyCORIS

  • lycoris model card updates by @bghira in #820
  • lycoris: resume should use less vram now by @bghira in #835
  • update lycoris to dev branch, 3.0.1dev3 by @bghira in #828
  • (#644) temporarily block training on multi-gpu setup with quanto + PEFT, inform user to go with lycoris instead by @bghira in #837

Misc features

  • add schedulefree optim w/ kahan summation by @bghira in #857
  • schedulefree: return to previous stable settings and add a new preset for aggressive training by @bghira in #862
  • (#519) add side by side comparison with base model by @bghira in #865
  • wandb gallery mode by @bghira in #872
  • combine the vae cache clear logic with bucket rebuild logic by @bghira in #879
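
For context on the Kahan summation mentioned with the schedulefree optimiser: it keeps a running compensation term so that tiny updates are not lost when added to a much larger value. The sketch below shows the general technique on Python floats; it is not the optimiser's code, which applies the idea to low-precision weight updates.

```python
import math


def kahan_sum(values):
    """Sum floats while carrying the rounding error forward."""
    total = 0.0
    compensation = 0.0  # error lost by earlier additions
    for value in values:
        y = value - compensation
        t = total + y
        # (t - total) is what was actually added; subtracting y leaves the error.
        compensation = (t - total) - y
        total = t
    return total


values = [1.0] + [1e-16] * 1000

# Plain accumulation in a loop: every 1e-16 is rounded away against 1.0.
naive = 0.0
for v in values:
    naive += v

print(naive, kahan_sum(values), math.fsum(values))
```

The plain loop stays at exactly 1.0, while the compensated sum tracks the exact result from `math.fsum` closely.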

Misc bug-fixes

  • parquet add width and height columns by @frankchieng in #825
  • quanto: remove warnings about int8/fp8 confusion as it happened so long ago now; add warning about int4 by @bghira in #826
  • Fix caption_with_blip3.py on CUDA by @anhi in #833
  • quanto + deepspeed minor fixes for multigpu training by @bghira in #839
  • deepspeed sharding fixes by @bghira in #840
  • fix: only run save full model on main process by @ErwannMillon in #838
  • (#851) remove shard merge code on load hook by @bghira in #853
  • csv backend updates by @williamzhuk in #645
  • csv fixes by @bghira in #856
  • fix validation image filename only using resolution from first img, and, unreadable/untypeable parenthesis by @bghira in #863
  • sdxl: dtype inference followup fix by @bghira in #873

Full Changelog: v0.9.8.3.2...v0.9.3.9

v0.9.8.3.2 - 2 releases a day keeps the bugs away

19 Aug 19:23
4f3d545

What's Changed

  • fix merged model being exported by accident instead of normal lora safetensors after LyCORIS was introduced by @bghira in #817

Full Changelog: v0.9.8.3.1...v0.9.8.3.2

v0.9.8.3.1 - state dict fix for final resulting safetensors

19 Aug 17:52
3172256

Minor, but important: the intermediary checkpoints were not affected; only the final export ended up with part of its weights mis-labeled.

What's Changed

  • state dict export for final pipeline by @bghira in #812
  • lycoris: disable text encoder training (it doesn't work here, yet)
  • state dict fix for the final pipeline export after training by @bghira in #813
  • lycoris: final export fix, correctly save weights by @bghira in #814
  • update lycoris doc by @bghira in #815
  • lycoris updates by @bghira in #816

Full Changelog: v0.9.8.3...v0.9.8.3.1