Releases: bghira/SimpleTuner
v0.9.0-rc4
What's Changed
- adjust epochs so we always train at least 1
- encode the --validation_prompt value too
- attach a Queue to each thread, instead of a shared one
- logging reduction
- str byte conversion error for .txt captions
- delete text-embed backend caches at startup by @bghira in #285
Full Changelog: v0.9.0-rc3...v0.9.0-rc4
v0.9.0-rc3 - the case of the mysteriously missing text embeds
What's Changed
- fix missing text embed issue
- fix validations dtype param usage
- cogvlm: upload to s3 directly
- maximum_image_size and target_downsample_size dataset parameters for very large images by @bghira in #281
- text embed cache bug 2: electric bugaloo, fix reference to incorrect value when caching by @bghira in #282
Full Changelog: v0.9.0-rc2...v0.9.0-rc3
v0.9.0-rc2
What's Changed
- bugfix: validations cache load failure
- cleanup: remove sd 2.x collate_fn
- bugfix: ignore repeats and ignore_epochs in config value comparison at startup
- bugfix: performance regression in text embed cache initialisation
- bugfix: torch_load encountering empty files should delete them
- bugfix: exit cleanly all threads at shutdown
- bugfix: final validations should not crash for LoRA
- bugfix: unload text encoder after validations when it is moved to GPU for Compel
- documentation: update install / tutorial / readme by @bghira in #280
Full Changelog: v0.9.0-rc1...v0.9.0-rc2
v0.9.0-rc1
Release Candidate
This release has been mostly tested in a variety of situations, and two models are training with it.
It's most likely safe for production use, and from this point the v0.9.0 code is frozen to bugfixes only.
A massive number of breaking changes since v0.8 are included. See the TUTORIAL for more information.
What's Changed
- VAECache: fix for jpg files not being detected/processed and then erroring out later by @bghira in #247
- Multi-dataset sampler by @bghira in #235
- v0.9.0-alpha by @bghira in #248
- Feature/multi dataset sampler by @bghira in #253
- allow disabling backends
- default noise scheduler should be euler
- fix state tracker IDs by @bghira in #254
- CogVLM: 4bit inference by default
- Diffusers: bump to 0.26.0
- MultiDataBackend: better support for epoch tracking across datasets.
- MultiDataBackend: throw error and end training when global epoch != dataset epoch.
- Logging: major reduction in debug noise
- SDXL: fix num update steps per epoch calculations
- SDXL: Fix number of batch display
- SDXL: Correctness fixes for global_step handling by @bghira in #255
- v0.9.0-alpha3 fixes for logging and probability config / epochs not continuing by @bghira in #256
- multidatabackend fix for non-square image training, data bucket config override by @bghira in #257
- LoRA trainer via --model_type by @bghira in #259
- Remove unnecessary code, simplify commandline args by @bghira in #260
- VAE cache rebuild and dataset repeats by @bghira in #261
- torch compile fixes | DeepSpeed save state fixes by @bghira in #263
- updates for next release by @bghira in #264
- collate_fn: multi-threaded retrieval of SDXL text embeds by @bghira in #265
- text embedding cache should write embeds in parallel by @bghira in #266
- text embedding cache should stop writing and kill the thread when we finish by @bghira in #267
- text embedding cache: optimise the generation of embeds by @bghira in #268
- multiple text embed caches | cache the text embed lists and only process meaningful prompts by @bghira in #269
- text embedding cache speed-up for slow backends (eg. S3 or spinning disks) by @bghira in #271
Full Changelog: v0.8.2...v0.9.0-rc1
v0.9.0-alpha5 - text embed cache data backend configuration
Breaking changes
Your text embed cache location must now be specified in the `multidatabackend.json`
configuration file. There is an example in the repository. The current text embed cache directory can still be used; it should work as it did before.
Text embeds can now be stored on S3 buckets, leaving next to zero local storage in use for training other than model checkpoints.
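For illustration only, here is a rough sketch of what a text embed cache entry could look like alongside an image dataset. The field names and values below are assumptions, not the documented schema; treat `multidatabackend.json.example` in the repository as the authoritative reference.

```bash
# Rough sketch only: field names/values are assumptions, not the documented schema.
# Consult multidatabackend.json.example in the repository for the real format.
cat > multidatabackend.json <<'EOF'
[
  {
    "id": "my-images",
    "type": "local",
    "instance_data_dir": "/training/images"
  },
  {
    "id": "text-embeds",
    "dataset_type": "text_embeds",
    "type": "aws",
    "aws_bucket_name": "my-embed-cache"
  }
]
EOF
```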
What's Changed
- torch compile fixes
- DeepSpeed save state fixes by @bghira in #263
- updates for next release by @bghira in #264
- collate_fn: multi-threaded retrieval of SDXL text embeds by @bghira in #265
- text embedding cache should write embeds in parallel by @bghira in #266
- text embedding cache should stop writing and kill the thread when we finish by @bghira in #267
- text embedding cache: optimise the generation of embeds by @bghira in #268
- multiple text embed caches
- cache the text embed lists and only process meaningful prompts by @bghira in #269
Full Changelog: v0.9.0-alpha4...v0.9.0-alpha5
v0.9.0-alpha4 - we have the technology, we can rebuild him
What's Changed
- Added `repeats` and `vae_cache_clear_each_epoch` to the data backend config. See this document for more information on dataloader configuration options.
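As a hedged sketch of where the two new keys might sit in a dataset entry (the surrounding fields are placeholder assumptions; the dataloader documentation linked above is authoritative):

```bash
# Sketch only: "repeats" and "vae_cache_clear_each_epoch" are the keys added in this release;
# the other fields are placeholder assumptions -- see the dataloader documentation for the real schema.
cat > multidatabackend.json <<'EOF'
[
  {
    "id": "my-images",
    "type": "local",
    "instance_data_dir": "/training/images",
    "repeats": 2,
    "vae_cache_clear_each_epoch": true
  }
]
EOF
```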
Code changes
- Remove unnecessary code, simplify commandline args by @bghira in #260
- VAE cache rebuild and dataset repeats by @bghira in #261
Full Changelog: v0.9.0-alpha3...v0.9.0-alpha4
v0.9.0-alpha3 - LoRA now, LoRA forever
What's Changed
LoRA training
- Use `--model_type=lora` and `--rank=...` to configure LoRA training. Everything else is drop-in! Use the same datasets, configs, etc. A minimal sketch follows this list.
- v0.9.0-alpha3 fixes for logging and probability config / epochs not continuing by @bghira in #256
- multidatabackend fix for non-square image training, data bucket config override by @bghira in #257
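A minimal sketch of the new flags in use, assuming an accelerate-based launch; the launch command and rank value are illustrative assumptions, not prescriptions:

```bash
# Sketch only: the launch command and rank value are assumptions -- adjust to your setup.
# Everything else (datasets, configs) stays exactly as it was for full fine-tuning.
accelerate launch train_sdxl.py \
  --model_type=lora \
  --rank=16
```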
Full Changelog: v0.9.0-alpha2...v0.9.0-alpha3
v0.9.0-alpha2 - fixes for multi-backend training
What's Changed
This is another alpha release following up on the multi-databackend work from v0.9.0-alpha.
As it is a prerelease, it is recommended to use caution and keep backups of sensitive data.
Great care has been taken to ensure correctness in this release. It might be wise to start a new training run for this release series due to the extensive changes in how checkpoints are saved and loaded.
- allow disabling backends
- default noise scheduler should be euler
- fix state tracker IDs by @bghira in #254
- CogVLM: 4bit inference by default
- Diffusers: bump to 0.25.0
- MultiDataBackend: better support for epoch tracking across datasets.
- MultiDataBackend: throw error and end training when global epoch != dataset epoch.
- Logging: major reduction in debug noise
- SDXL: fix num update steps per epoch calculations
- SDXL: Fix number of batch display
- SDXL: Correctness fixes for global_step handling by @bghira in #255
Full Changelog: v0.9.0-alpha...v0.9.0-alpha2
v0.9.0-alpha - multi-dataset edition
Changes
There's more info about these in OPTIONS.md and TUTORIAL.md.
New behaviour
- When a dataset config entry has `"scan_for_errors": true`, it will be read entirely at startup, any bad images will be removed if `"delete_problematic_images": true`, and any outdated cache entries will be removed.
- Datasets are now defined by a config file, and this file is mandatory. A dataset can be removed from training by setting `"disabled": true` in its config entry, as in the sketch after this list.
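To make the new keys concrete, a hedged sketch of dataset entries using them follows; fields other than `scan_for_errors`, `delete_problematic_images`, and `disabled` are assumptions, and `multidatabackend.json.example` remains the authoritative reference.

```bash
# Sketch only: keys other than scan_for_errors / delete_problematic_images / disabled
# are placeholder assumptions about the surrounding schema.
cat > multidatabackend.json <<'EOF'
[
  {
    "id": "my-images",
    "type": "local",
    "instance_data_dir": "/training/images",
    "scan_for_errors": true,
    "delete_problematic_images": true
  },
  {
    "id": "old-dataset",
    "type": "local",
    "instance_data_dir": "/training/old",
    "disabled": true
  }
]
EOF
```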
Removed arguments
- All of the `--aws_*` commandline arguments were removed for privacy reasons; they now live in `multidatabackend.json`.
- `--data_backend` is now `--data_backend_config` and is a path to a dataset config; see `multidatabackend.json.example` for help converting your existing configuration over.
New arguments
`--data_backend_config`
- What: Path to your SimpleTuner dataset configuration, set as `DATALOADER_CONFIG` in `sdxl-env.sh`.
- Why: Multiple datasets on different storage media may be combined into a single training session.
- Example: See [multidatabackend.json.example](/multidatabackend.json.example) for an example configuration.
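A minimal sketch of wiring this up in `sdxl-env.sh`; the path shown is an arbitrary example:

```bash
# Sketch: point the trainer at your dataset config from sdxl-env.sh.
# The path below is an arbitrary example.
export DATALOADER_CONFIG="/workspace/SimpleTuner/multidatabackend.json"
```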
`--override_dataset_config`
- What: When provided, allows SimpleTuner to ignore differences between the cached config inside the dataset and the current values.
- Why: When SimpleTuner is run for the first time on a dataset, it creates a cache document containing information about everything in that dataset, including the dataset config and its "crop"- and "resolution"-related values. Changing these arbitrarily or by accident could cause your training jobs to crash randomly, so it's highly recommended not to use this parameter and instead resolve the differences in your dataset some other way.
`--vae_cache_behaviour`
- What: Configure the behaviour of the integrity scan check.
- Why: A dataset could have incorrect settings applied at multiple points of training, e.g. if you accidentally delete the `.json` cache files from your dataset and switch the data backend config to use square images rather than aspect-crops. This will result in an inconsistent data cache, which can be corrected by setting `scan_for_errors` to `true` in your `multidatabackend.json` configuration file. When this scan runs, it relies on the setting of `--vae_cache_behaviour` to determine how to resolve the inconsistency: `recreate` (the default) will remove the offending cache entry so that it can be recreated, and `sync` will update the bucket metadata to reflect the real training sample. Recommended value: `recreate`.
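A hedged sketch of choosing between the two values; the launch command is a placeholder assumption, and only the flag itself comes from this release:

```bash
# Sketch only: the launch command is an assumption -- substitute your own.
LAUNCH="accelerate launch train_sdxl.py"
$LAUNCH --vae_cache_behaviour=recreate   # default: drop offending cache entries so they are rebuilt
# $LAUNCH --vae_cache_behaviour=sync     # alternative: update bucket metadata to match the real sample
```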
Full Changelog: v0.8.2...v0.9.0-alpha
v0.8.2 - fix for "area" based measurements
What's Changed
Full Changelog: v0.8.1-fix...v0.8.2