Releases: bghira/SimpleTuner
v1.1.4
Support for SD 3.5 fine-tuning.
Stability AI has provided a tutorial on using SimpleTuner for this task here, and SimpleTuner's own SD3 quickstart is available here
What's Changed
- update to diffusers v0.31 for SD3.5 by @bghira in #1082
- merge masked loss + reg image fixes by @bghira in #1080
- update rocm, mps and nvidia to torch 2.5 by @bghira in #1081
- merge by @bghira in #1083
Full Changelog: v1.1.3...v1.1.4
v1.1.3
- Datasets in nested subdirectories now have their caches nested in matching subdirectories, which unfortunately will most likely require regenerating these cache entries. Sorry - it was not feasible to keep the old structure working in parallel.
- FlashAttention3 fixes for H100 nodes by downgrading default torch version to 2.4.1
- Resume fixes for multi-gpu/multi-node state/epoch tracking
- Other misc bugfixes
What's Changed
- fix flux attn masked transformer modeling code by @bghira in #1055
- merge by @bghira in #1056
- fix rope function for FA3 by @bghira in #1057
- merge by @bghira in #1058
- lokr: resume by default training state if not found by @bghira in #1060
- merge by @bghira in #1061
- Restore init_lokr_norm functionality by @imit8ed in #1065
- refactor how masks are retrieved by @bghira in #1066
- nvidia dependency update for pytorch-triton / aiohappyeyeballs by @bghira in #1062
- downgrade cuda to pt241 by default by @bghira in #1067
- add nightly build for pt26 by @bghira in #1068
- Add recropping script for image JSON metadata backends by @AmericanPresidentJimmyCarter in #1063
- merge by @bghira in #1069
- bugfix: restore sampler state on rank 0 correctly by @bghira in #1071
- merge by @bghira in #1072
- fix vae cache dir creation for subdirs by @bghira in #1076
- fix for nested image subdirs w/ duplicated filenames across subdirs by @bghira in #1078
New Contributors
Full Changelog: v1.1.2...v1.1.3
v1.1.2 - masked loss and strong prior preservation
New stuff
- New `is_regularisation_data` option for datasets, works great (see the sketch after this list)
- H100 or greater now has better torch compile support
- SDXL ControlNet training is back, now with quantised base model (int8)
- Multi-node training works now, with a guide to deploy it easily
- `configure.py` can now generate a very rudimentary user prompt library for you if you are in a hurry
- Flux model cards now have more useful information about your Flux training setup
- Masked loss training & a demo script in the toolkit dir for generating a folder of image masks
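For context, `is_regularisation_data` is toggled per dataset in the dataloader configuration rather than on the command line. Below is a minimal sketch of a `multidatabackend.json` with one training set and one regularisation set; apart from `is_regularisation_data` itself, the key names and values are illustrative assumptions, so keep whatever your existing dataloader config already uses:

```jsonc
[
  {
    // ordinary training dataset (keys here are illustrative)
    "id": "subject-data",
    "type": "local",
    "instance_data_dir": "/data/subject",
    "caption_strategy": "textfile",
    "resolution": 1024
  },
  {
    // prior-preservation / regularisation dataset
    "id": "regularisation-data",
    "type": "local",
    "instance_data_dir": "/data/class-images",
    "caption_strategy": "textfile",
    "resolution": 1024,
    "is_regularisation_data": true
  }
]
```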
What's Changed
- quanto: improve support for SDXL training by @bghira in #1027
- Fix attention masking transformer for flux by @AmericanPresidentJimmyCarter in #1032
- merge by @bghira in #1036
- H100/H200/B200 FlashAttention3 for Flux + TorchAO improvements by @bghira in #1033
- utf8 fix for emojis in dataset configs by @bghira in #1037
- fix venv instructions and edge case for aspect crop bucket list by @bghira in #1038
- merge by @bghira in #1039
- multi-node training fixes for state tracker by @bghira in #1040
- merge bugfixes by @bghira in #1041
- configure.py can configure caption strategy by @bghira in #1042
- regression by @bghira in #1043
- fix multinode state resumption by @bghira in #1044
- merge by @bghira in #1045
- validations can crash when sending updates to wandb by @bghira in #1046
- aws: do not give up on fatal errors during exists() by @bghira in #1047
- merge by @bghira in #1048
- add prompt expander based on 1B Llama model by @bghira in #1049
- implement regularisation dataset parent-student loss for LyCORIS training by @bghira in #1050
- metadata: add more flux model card details by @bghira in #1051
- merge by @bghira in #1052
- fix controlnet training for sdxl and introduce masked loss preconditioning by @bghira in #1053
- merge by @bghira in #1054
Full Changelog: v1.1.1...v1.1.2
v1.1.1 - bring on the potato models
Trained with NF4 via PagedLion8Bit.
- New custom timestep distribution for Flux via `--flux_use_beta_schedule`, `--flux_beta_schedule_alpha`, `--flux_beta_schedule_beta` (#1023); see the sketch after this list
- The trendy AdEMAMix, its 8bit and paged counterparts are all now available as `bnb-ademamix`, `bnb-ademamix-8bit`, and `bnb-ademamix8bit-paged`
- All low-bit optimisers from Bits n Bytes are now included for NVIDIA and ROCm systems
- NF4 training on NVIDIA systems down to 9090M total using Lion8Bit and 512px training at 1.5 sec/iter on a 4090
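As a rough illustration of the new flags, the Flux beta timestep schedule and the AdEMAMix optimiser can be enabled from the config file. This is a hedged sketch only: it assumes your `config.json` keys mirror the CLI argument names (some setups prefix each key with `--`), and the alpha/beta values are illustrative, not recommendations:

```jsonc
{
  // assumption: keys mirror the CLI flags; your version may expect "--flux_use_beta_schedule" etc.
  "flux_use_beta_schedule": true,
  "flux_beta_schedule_alpha": 2.0,  // illustrative value
  "flux_beta_schedule_beta": 2.0,   // illustrative value
  "optimizer": "bnb-ademamix"       // or "bnb-ademamix-8bit" / "bnb-ademamix8bit-paged"
}
```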
What's Changed
- int8-quanto followup fixes (batch size > 1) by @bghira in #1016
- merge by @bghira in #1018
- update doc by @bghira in #1019
- update docs by @bghira in #1025
- Add the ability to use a Beta schedule to select Flux timesteps by @AmericanPresidentJimmyCarter in #1023
- AdEMAMix, 8bit Adam/AdamW/Lion/Adagrad, Paged optimisers by @bghira in #1026
- Bits n Bytes NF4 training by @bghira in #1028
- merge by @bghira in #1029
Full Changelog: v1.1...v1.1.1
v1.1 - API-friendly edition
Features
Performance
- Improved launch speed for large datasets (>1M samples)
- Improved speed for quantising on CPU
- Optional support for directly quantising on GPU near-instantly (`--quantize_via`)
Compatibility
- SDXL, SD1.5 and SD2.x compatibility with LyCORIS training
- Updated documentation to make multiGPU configuration a bit more obvious.
- Improved support for `torch.compile()`, including automatically disabling it when e.g. `fp8-quanto` is enabled
  - Enable via `accelerate config` or `config/config.env` via `TRAINER_DYNAMO_BACKEND=inductor`
- TorchAO for quantisation as an alternative to Optimum Quanto for int8 weight-only quantisation (`int8-torchao`)
- `f8uz-quanto`, a compatibility level for AMD users to experiment with FP8 training dynamics
- Support for multigpu PEFT LoRA training with Quanto enabled (not `fp8-quanto`)
  - Previously, only LyCORIS would reliably work with quantised multigpu training sessions.
- Ability to quantise models when full-finetuning, without warning or error. Previously, this configuration was blocked. Your mileage may vary; it's an experimental configuration. (A minimal quantisation config sketch follows this list.)
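Tying the quantisation options above together, here is a hedged sketch of the relevant config keys. `--quantize_via` and the precision level names (`int8-torchao`, `fp8-quanto`, `f8uz-quanto`) come from these notes, but the `base_model_precision` key name and the `accelerator` value are assumptions on my part; check OPTIONS.md for the exact spellings. Note that `TRAINER_DYNAMO_BACKEND=inductor` for `torch.compile()` is set through `accelerate config` or `config/config.env`, not in this file:

```jsonc
{
  // assumption: the precision level is chosen via "base_model_precision"
  "base_model_precision": "int8-torchao",
  // assumption: "accelerator" selects near-instant on-GPU quantisation instead of CPU
  "quantize_via": "accelerator"
}
```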
Integrations
- Images now get logged to tensorboard (thanks @anhi)
- FastAPI endpoints for integrations (undocumented)
- "raw" webhook type that sends a large number of HTTP requests containing events, useful for push notification type service
Optims
- SOAP optimiser support
- uses fp32 gradients; nice and accurate, but uses more memory than other optims and, by default, slows down every 10 steps as it preconditions
- New 8bit and 4bit optimiser options from TorchAO (`ao-adamw8bit`, `ao-adamw4bit`, etc.); see the sketch below
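For completeness, switching optimisers is a single config key. A hedged sketch, assuming the key mirrors the `--optimizer` argument and that SOAP is selected with the literal string shown (verify the exact values against the optimiser documentation):

```jsonc
{
  "optimizer": "ao-adamw8bit"  // or "ao-adamw4bit"; assumption: SOAP would be "soap"
}
```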
Pull Requests
- Fix flux cfg sampling bug by @AmericanPresidentJimmyCarter in #981
- merge by @bghira in #982
- FastAPI endpoints for managing trainer as a service by @bghira in #969
- constant lr resume fix for optimi-stableadamw by @bghira in #984
- clear data backends before configuring new ones by @bghira in #992
- update to latest quanto main by @bghira in #994
- log images in tensorboard by @anhi in #998
- merge by @bghira in #999
- torchao: add int8; quanto: add NF4; torch compile fixes + ability to compile optim by @bghira in #986
- update flux quickstart by @bghira in #1000
- compile optimiser by @bghira in #1001
- optimizer compile step only by @bghira in #1002
- remove optimiser compilation arg by @bghira in #1003
- remove optim compiler from options by @bghira in #1004
- remove optim compiler from options by @bghira in #1005
- SOAP optimiser; int4 fixes for 4090 by @bghira in #1006
- torchao: install 0.5.0 from pytorch source by @bghira in #1007
- update safety check warning with guidance toward cache clear interval for OOM issues by @bghira in #1008
- fix webhook contents for discord by @bghira in #1011
- fp8-quanto fixes, unblocking of PEFT multigpu LoRA training for other precision levels by @bghira in #1013
- quanto: activations sledgehammer by @bghira in #1014
- 1.1 merge window by @bghira in #1010
Full Changelog: v1.0.1...v1.1
v1.0.1
This is a maintenance release with few new features.
What's Changed
- fix reference error to use_dora by @bghira in #929
- fix merge error by @bghira in #930
- fix use of --num_train_epochs by @bghira in #932
- merge fixes by @bghira in #934
- documentation updates, deepspeed config reference error fix by @bghira in #935
- Fix caption_with_cogvlm.py for cogvlm2 + textfile strategy by @burgalon in #936
- dependency updates, cogvlm fixes, peft/lycoris resume fix by @bghira in #939
- feature: zero embed padding for t5 on request by @bghira in #941
- merge by @bghira in #942
- comet_ml validation images by @burgalon in #944
- Allow users to init their LoKr with perturbed normal w2 by @AmericanPresidentJimmyCarter in #943
- merge by @bghira in #948
- fix typo in PR by @bghira in #949
- update arg name for norm init by @bghira in #950
- configure script should not set dropout by default by @bghira in #955
- VAECache: improve startup speed for large sets by @bghira in #956
- Update FLUX.md by @anae-git in #957
- mild bugfixes by @bghira in #963
- fix bucket worker not waiting for all queue worker to finish by @burgalon in #967
- merge by @bghira in #968
- fix DDP for PEFT LoRA & minor exit error by @bghira in #974
New Contributors
Full Changelog: v1.0...v1.0.1
v1.0 the total recall edition
Everything has changed! And yet, nothing has. Some defaults may - no, will - be different. It's hard to know which ones.
For those who can do so, it's recommended to use `configure.py` to reconfigure your environment on this new release.
It should go without saying, but for those in the middle of a training run, do not upgrade to this release until you finish.
Refactoring and Enhancements:
- Refactor `train.py` into a Trainer class:
  - The core logic of `train.py` has been restructured into a `Trainer` class, improving modularity and maintainability.
  - Exposes an SDK for reuse elsewhere.
- Model Family Unification:
  - References to specific model types (`--sd3`, `--flux`, etc.) have been replaced with a unified `--model_family` argument, streamlining model specification and reducing clutter in configurations.
- Configuration System Overhaul:
  - Switched from `.env` configuration files to JSON (`config.json`), with multiple backends supporting JSON configuration loading. This allows more flexible and readable configuration management (a minimal example follows this list).
  - Updated the configuration loader to auto-detect the best backend when launching.
- Enhanced Argument Handling:
  - Deprecated old argument references and moved argument parsing to `helpers/configuration/cmd_args.py` for better organization.
  - Introduced support for new arguments such as `--model_card_safe_for_work`, `--flux_schedule_shift`, and `--disable_bucket_pruning`.
- Improved Hugging Face Integration:
  - Modified `configure.py` to avoid asking for Hugging Face model name details unless required.
  - Added the ability to pass the SFW (safe-for-work) argument into the training script.
- Optimizations and Bug Fixes:
  - Fixed several references to learning rate (lr) initialization and corrected `--optimizer` usage.
  - Addressed issues with attention mask swapping and fixed the persistence of text encoders in RAM after refactoring.
- Training and Validation Enhancements:
  - Added better dataset examples with support for multiple resolutions and mixed configurations.
  - Configured training scripts to disable gradient accumulation steps by default and provided better control over training options via the updated config files.
- Enhanced Logging and Monitoring:
  - Improved the handling of Weights & Biases (wandb) logs and updated tracker argument references.
- Documentation Updates:
  - Revised documentation to reflect changes in model family handling, argument updates, and configuration management.
  - Added guidance on setting up the new configuration files and examples for multi-resolution datasets.
- Miscellaneous Improvements:
  - Support for NSFW tags in model cards, enabled by default.
  - Updated `train.sh` to minimal requirements, reducing complexity and streamlining the training process.
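To make the configuration migration concrete: settings that used to live in `config.env` now become keys in `config.json`, and the per-model flags collapse into `--model_family`. The sketch below is a minimal, hedged example rather than a complete config; it assumes keys mirror the CLI argument names (your version may expect a `--` prefix on each key), the values are purely illustrative, and `configure.py` will generate a full file for you:

```jsonc
{
  // assumption: config.json keys mirror CLI argument names; values are illustrative only
  "model_family": "flux",
  "model_card_safe_for_work": true,
  "flux_schedule_shift": 3,
  "disable_bucket_pruning": false,
  "data_backend_config": "config/multidatabackend.json"
}
```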
More detailed change log
- lycoris model card updates by @bghira in #820
- Generate and store attention masks for T5 for flux by @AmericanPresidentJimmyCarter in #821
- Fix validation by @AmericanPresidentJimmyCarter in #822
- backwards-compatible flux embedding cache masks by @bghira in #823
- merge by @bghira in #824
- parquet add width and height columns by @frankchieng in #825
- quanto: remove warnings about int8/fp8 confusion as it happened so long ago now; add warning about int4 by @bghira in #826
- remove clip warning by @bghira in #827
- update lycoris to dev branch, 3.0.1dev3 by @bghira in #828
- Fix caption_with_blip3.py on CUDA by @anhi in #833
- fix quanto resuming by @bghira in #834
- lycoris: resume should use less vram now by @bghira in #835
- (#644) temporarily block training on multi-gpu setup with quanto + PEFT, inform user to go with lycoris instead by @bghira in #837
- quanto + deepspeed minor fixes for multigpu training by @bghira in #839
- deepspeed sharding by @bghira in #840
- fix: only run save full model on main process by @ErwannMillon in #838
- merge by @bghira in #841
- clean-up by @bghira in #842
- follow-up fixes for quanto limitation on multigpu by @bghira in #846
- merge by @bghira in #850
- (#851) remove shard merge code on load hook by @bghira in #853
- csv backend updates by @williamzhuk in #645
- csv fixes by @bghira in #856
- add schedulefree optim w/ kahan summation by @bghira in #857
- merge by @bghira in #858
- merge by @bghira in #861
- schedulefree: return to previous stable settings and add a new preset for aggressive training by @bghira in #862
- fix validation image filename only using resolution from first img, and, unreadable/untypeable parenthesis by @bghira in #863
- (#519) add side by side comparison with base model by @bghira in #865
- merge fixes by @bghira in #870
- (#864) add flux final export for full tune by @bghira in #871
- wandb gallery mode by @bghira in #872
- sdxl: dtype inference followup fix by @bghira in #873
- merge by @bghira in #878
- combine the vae cache clear logic with bucket rebuild logic by @bghira in #879
- flux: mobius-style training via augmented guidance scale by @bghira in #880
- track flux cfg via wandb by @bghira in #881
- multigpu VAE cache rebuild fixes; random crop auto-rebuild; mobius flux; json backend now renamed to discovery ; wandb guidance tracking by @bghira in #888
- fixing typo in flux document for preserve_data_backend_cache key by @riffmaster-2001 in #882
- reintroduce timestep dependent shift as an option during flux training for dev and schnell, disabled by default by @bghira in #892
- adding SD3 timestep-dependent shift for Flux training by @bghira in #894
- fix: set optimizer details to empty dict w/ deepspeed by @ErwannMillon in #895
- fix: make sure wandb_logs is always defined by @ErwannMillon in #896
- merge by @bghira in #900
- Dataloader Docs - Correct caption strategy for instance prompt by @barakyo in #902
- refactor train.py into Trainer class by @bghira in #899
- Update TRAINER_EXTRA_ARGS for model_family by @barakyo in #903
- Fix text encoder nuking regression by @mhirki in #906
- added lokr lycoris init_lora by @flotos in #907
- Fix Flux schedule shift and add resolution-dependent schedule shift by @mhirki in #905
- Swap the attention mask location, because Flux swapped text and image… by @AmericanPresidentJimmyCarter in #908
- support toml, json, env config backend, and multiple config environments by @bghira in #909
- Add `"none"` to --report_to argument by @twri in #911
- Add support for tiny PEFT-based Flux LoRA based on TheLastBen's post on Reddit by @mhirki in #912
- Update lycoris_config.json.example with working defaults by @mhirki in #918
- fix constant_with_warmup not being so constant or warming up by @bghira in #919
- follow-up fix for setting last_epoch by @bghira in #920
- fix multigpu schedule issue with LR on resume by @bghira in #921
- multiply the resume state step by the number of GPUs in an attempt to overcome accelerate v0.33 issue by @bghira in #922
- default to json/toml before the env file in case multigpu is configured by @bghira in #923
- fix json/toml configs str bool values by @bghira in #924
- bypass some "helpful" diffusers logic that makes random decisions to run on CPU by @bghira in #925
- v1.0 merge by @bghira in #910
v0.9.3.9 - bugfixes, better defaults
What's Changed
LyCORIS
- lycoris model card updates by @bghira in #820
- lycoris: resume should use less vram now by @bghira in #835
- update lycoris to dev branch, 3.0.1dev3 by @bghira in #828
- (#644) temporarily block training on multi-gpu setup with quanto + PEFT, inform user to go with lycoris instead by @bghira in #837
Flux
- Generate and store attention masks for T5 for flux by @AmericanPresidentJimmyCarter in #821
- backwards-compatible flux embedding cache masks by @bghira in #823
- remove clip warning by @bghira in #827
- fix quanto resuming by @bghira in #834
- (#864) add flux final export for full tune by @bghira in #871
- flux: mobius-style training via augmented guidance scale by @bghira in #880
- track flux cfg via wandb by @bghira in #881
Misc features
- add schedulefree optim w/ kahan summation by @bghira in #857
- schedulefree: return to previous stable settings and add a new preset for aggressive training by @bghira in #862
- (#519) add side by side comparison with base model by @bghira in #865
- wandb gallery mode by @bghira in #872
- combine the vae cache clear logic with bucket rebuild logic by @bghira in #879
Misc bug-fixes
- parquet add width and height columns by @frankchieng in #825
- quanto: remove warnings about int8/fp8 confusion as it happened so long ago now; add warning about int4 by @bghira in #826
- Fix caption_with_blip3.py on CUDA by @anhi in #833
- quanto + deepspeed minor fixes for multigpu training by @bghira in #839
- deepspeed sharding fixes by @bghira in #840
- fix: only run save full model on main process by @ErwannMillon in #838
- (#851) remove shard merge code on load hook by @bghira in #853
- csv backend updates by @williamzhuk in #645
- csv fixes by @bghira in #856
- fix validation image filename only using resolution from first img, and, unreadable/untypeable parenthesis by @bghira in #863
- sdxl: dtype inference followup fix by @bghira in #873
New Contributors
- @anhi made their first contribution in #833
- @ErwannMillon made their first contribution in #838
- @williamzhuk made their first contribution in #645
Full Changelog: v0.9.8.3.2...v0.9.3.9
v0.9.8.3.2 - 2 releases a day keeps the bugs away
What's Changed
- fix merged model being exported by accident instead of normal lora safetensors after LyCORIS was introduced by @bghira in #817
Full Changelog: v0.9.8.3.1...v0.9.8.3.2
v0.9.8.3.1 - state dict fix for final resulting safetensors
Minor, but important - intermediary checkpoints weren't impacted; just part of the weights in the final safetensors ended up mislabeled.
What's Changed
- state dict export for final pipeline by @bghira in #812
- lycoris: disable text encoder training (it doesn't work here, yet)
- state dict fix for the final pipeline export after training by @bghira in #813
- lycoris: final export fix, correctly save weights by @bghira in #814
- update lycoris doc by @bghira in #815
- lycoris updates by @bghira in #816
Full Changelog: v0.9.8.3...v0.9.8.3.1