Releases: bghira/SimpleTuner

v0.9.0-rc4

24 Jan 23:41
1b2a482
Pre-release

What's Changed

  • adjust epochs so we always train at least 1
  • encode the --validation_prompt value too
  • attach a Queue to each thread, instead of shared one
  • logging reduction
  • str byte conversion error for .txt captions
  • delete text-embed backend caches at startup by @bghira in #285

Full Changelog: v0.9.0-rc3...v0.9.0-rc4

v0.9.0-rc3 - the case of the mysteriously missing text embeds

22 Jan 22:44
0c25f88

What's Changed

  • fix missing text embed issue
  • fix validations dtype param usage
  • cogvlm: upload to s3 directly
  • maximum_image_size and target_downsample_size dataset parameters for very large images by @bghira in #281
  • text embed cache bug 2: electric bugaloo, fix reference to incorrect value when caching by @bghira in #282

Full Changelog: v0.9.0-rc2...v0.9.0-rc3

v0.9.0-rc2

19 Jan 23:46
21fec44
Pre-release

What's Changed

  • bugfix: validations cache load failure
  • cleanup: remove sd 2.x collate_fn
  • bugfix: ignore repeats and ignore_epochs in config value comparison at startup
  • bugfix: performance regression in text embed cache initialisation
  • bugfix: torch_load encountering empty files should delete them
  • bugfix: exit cleanly all threads at shutdown
  • bugfix: final validations should not crash for LoRA
  • bugfix: unload text encoder after validations when it is moved to GPU for Compel
  • documentation: update install / tutorial / readme by @bghira in #280

Full Changelog: v0.9.0-rc1...v0.9.0-rc2

v0.9.0-rc1

16 Jan 14:14
0a7da2d
Pre-release

Release Candidate

This release has been mostly tested in a variety of situations, and two models are training with it.

It is most likely safe for production use; from here until v0.9.0, the code is frozen except for bugfixes.

A massive number of breaking changes since v0.8 are included. See the TUTORIAL for more information.

What's Changed

  • VAECache: fix for jpg files not being detected/processed and then erroring out later by @bghira in #247
  • Multi-dataset sampler by @bghira in #235
  • v0.9.0-alpha by @bghira in #248
  • Feature/multi dataset sampler by @bghira in #253
  • allow disabling backends
  • default noise scheduler should be euler
  • fix state tracker IDs by @bghira in #254
  • CogVLM: 4bit inference by default
  • Diffusers: bump to 0.26.0
  • MultiDataBackend: better support for epoch tracking across datasets.
  • MultiDataBackend: throw error and end training when global epoch != dataset epoch.
  • Logging: major reduction in debug noise
  • SDXL: fix num update steps per epoch calculations
  • SDXL: Fix number of batch display
  • SDXL: Correctness fixes for global_step handling by @bghira in #255
  • v0.9.0-alpha3 fixes for logging and probability config / epochs not continuing by @bghira in #256
  • multidatabackend fix for non-square image training, data bucket config override by @bghira in #257
  • LoRA trainer via --model_type by @bghira in #259
  • Remove unnecessary code, simplify commandline args by @bghira in #260
  • VAE cache rebuild and dataset repeats by @bghira in #261
  • torch compile fixes | DeepSpeed save state fixes by @bghira in #263
  • updates for next release by @bghira in #264
  • collate_fn: multi-threaded retrieval of SDXL text embeds by @bghira in #265
  • text embedding cache should write embeds in parallel by @bghira in #266
  • text embedding cache should stop writing and kill the thread when we finish by @bghira in #267
  • text embedding cache: optimise the generation of embeds by @bghira in #268
  • multiple text embed caches | cache the text embed lists and only process meaningful prompts by @bghira in #269
  • text embedding cache speed-up for slow backends (eg. S3 or spinning disks) by @bghira in #271

Full Changelog: v0.8.2...v0.9.0-rc1

v0.9.0-alpha5 - text embed cache data backend configuration

15 Jan 21:18
776c38f

Breaking changes

Your text embed cache location must now be specified in the multidatabackend.json configuration file; there is an example in the repository. Your existing text embed cache directory can still be used, and it should work as it did before.

Text embeds can now be stored on S3 buckets, leaving next to zero local storage in use for training other than model checkpoints.
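For illustration, a text embed cache entry stored on S3 might look like the following sketch. The field names are modeled on the repository's multidatabackend.json.example and may differ in your version; the bucket name, region, and credentials are placeholders:

```json
{
  "id": "text-embeds",
  "dataset_type": "text_embeds",
  "default": true,
  "type": "aws",
  "aws_bucket_name": "my-training-bucket",
  "aws_region_name": "us-east-1"
}
```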

What's Changed

  • torch compile fixes
  • DeepSpeed save state fixes by @bghira in #263
  • updates for next release by @bghira in #264
  • collate_fn: multi-threaded retrieval of SDXL text embeds by @bghira in #265
  • text embedding cache should write embeds in parallel by @bghira in #266
  • text embedding cache should stop writing and kill the thread when we finish by @bghira in #267
  • text embedding cache: optimise the generation of embeds by @bghira in #268
  • multiple text embed caches
  • cache the text embed lists and only process meaningful prompts by @bghira in #269

Full Changelog: v0.9.0-alpha4...v0.9.0-alpha5

v0.9.0-alpha4 - we have the technology, we can rebuild him

13 Jan 04:07
5e2a6a0

What's Changed

  • Added repeats and vae_cache_clear_each_epoch to the data backend config.

See this document for more information on dataloader configuration options.
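As a sketch, a dataset entry using the two new options might look like this; field names other than repeats and vae_cache_clear_each_epoch are illustrative, so check the repository's example configuration for the exact schema:

```json
{
  "id": "my-dataset",
  "type": "local",
  "instance_data_dir": "/data/images",
  "repeats": 4,
  "vae_cache_clear_each_epoch": true
}
```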

Code changes

  • Remove unnecessary code, simplify commandline args by @bghira in #260
  • VAE cache rebuild and dataset repeats by @bghira in #261

Full Changelog: v0.9.0-alpha3...v0.9.0-alpha4

v0.9.0-alpha3 - LoRA now, LoRA forever

07 Jan 06:25
135c12a
Pre-release

What's Changed

LoRA training

  • Use --model_type=lora and --rank=... to configure LoRA training. Everything else is drop-in! Use the same datasets, configs, etc.

  • v0.9.0-alpha3 fixes for logging and probability config / epochs not continuing by @bghira in #256

  • multidatabackend fix for non-square image training, data bucket config override by @bghira in #257

  • LoRA trainer via --model_type by @bghira in #259
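The drop-in flags above can be sketched as an sdxl-env.sh excerpt. This is a config-fragment sketch only: the TRAINER_EXTRA_ARGS variable name is an assumption, so check your own sdxl-env.sh for the exact mechanism your version uses to pass extra trainer flags:

```shell
# sdxl-env.sh excerpt (sketch — variable name is an assumption):
# everything else (datasets, configs) stays exactly as before.
export TRAINER_EXTRA_ARGS="--model_type=lora --rank=16"
```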

Full Changelog: v0.9.0-alpha2...v0.9.0-alpha3

v0.9.0-alpha2 - fixes for multi-backend training

31 Dec 02:21
3b5781a

What's Changed

This is another alpha release following up on the multi-databackend work from v0.9.0-alpha.

As it is a prerelease, it is recommended to use caution and keep backups of sensitive data.

Great care has been taken to ensure the correctness of this release. Because of the extensive changes in how checkpoints are saved and loaded, it may be wise to start a new training run for this release series.

  • allow disabling backends
  • default noise scheduler should be euler
  • fix state tracker IDs by @bghira in #254
  • CogVLM: 4bit inference by default
  • Diffusers: bump to 0.25.0
  • MultiDataBackend: better support for epoch tracking across datasets.
  • MultiDataBackend: throw error and end training when global epoch != dataset epoch.
  • Logging: major reduction in debug noise
  • SDXL: fix num update steps per epoch calculations
  • SDXL: Fix number of batch display
  • SDXL: Correctness fixes for global_step handling by @bghira in #255

Full Changelog: v0.9.0-alpha...v0.9.0-alpha2

v0.9.0-alpha - multi-dataset edition

25 Dec 23:53
f11ebad
Pre-release


Changes

There's more info about these in OPTIONS.md and TUTORIAL.md.

New behaviour

  • When a dataset config entry has "scan_for_errors": true, the dataset will be read in full at startup; any bad images will be removed if "delete_problematic_images": true is also set, and any outdated cache entries will be removed.
  • Datasets are now defined by a config file; this is mandatory. A dataset can be removed from training by setting "disabled": true in its config entry.
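A sketch of a dataset entry using the new behaviour. Key names besides scan_for_errors, delete_problematic_images, and disabled are modeled on the repository's multidatabackend.json.example and may differ:

```json
{
  "id": "photos",
  "type": "local",
  "instance_data_dir": "/data/photos",
  "scan_for_errors": true,
  "delete_problematic_images": true,
  "disabled": false
}
```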

Removed arguments

  • All of the --aws_* commandline arguments were removed for privacy reasons; these values now live in multidatabackend.json.
  • --data_backend is now --data_backend_config and takes a path to a dataset config; see multidatabackend.json.example for help converting your existing configurations.

New arguments

--data_backend_config

  • What: Path to your SimpleTuner dataset configuration, set as DATALOADER_CONFIG in sdxl-env.sh
  • Why: Multiple datasets on different storage media may be combined into a single training session.
  • Example: See [multidatabackend.json.example](/multidatabackend.json.example) for an example configuration.
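As a hedged sketch of combining a local dataset with an S3 dataset in one multidatabackend.json, the entry fields below are illustrative; consult multidatabackend.json.example for the exact schema in your version:

```json
[
  {
    "id": "local-photos",
    "type": "local",
    "instance_data_dir": "/data/photos"
  },
  {
    "id": "s3-dataset",
    "type": "aws",
    "aws_bucket_name": "my-training-bucket",
    "aws_region_name": "us-east-1"
  }
]
```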

--override_dataset_config

  • What: When provided, allows SimpleTuner to ignore differences between the cached config inside the dataset and the current values.
  • Why: When SimpleTuner runs on a dataset for the first time, it creates a cache document containing information about everything in that dataset. This includes the dataset config, including its "crop"- and "resolution"-related values. Changing these arbitrarily or by accident could cause your training jobs to crash at random, so it is highly recommended not to use this parameter; instead, resolve the differences you'd like to apply in your dataset some other way.
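The startup comparison this flag bypasses can be sketched as follows. This is a simplified, hypothetical model — the function name and config keys are invented for illustration and are not SimpleTuner's actual API:

```python
def check_dataset_config(cached: dict, current: dict, override: bool = False) -> dict:
    """Hypothetical sketch of the startup config comparison that
    --override_dataset_config bypasses."""
    # Keys like "crop" and "resolution" are cached alongside the dataset;
    # drift between cached and current values is normally fatal at startup.
    changed = sorted(k for k in cached if current.get(k) != cached[k])
    if changed and not override:
        raise ValueError(
            f"Dataset config changed since caching: {changed}. "
            "Pass --override_dataset_config to proceed anyway."
        )
    return current

cached = {"crop": True, "resolution": 1024}
current = {"crop": False, "resolution": 1024}
print(check_dataset_config(cached, current, override=True))  # proceeds with new values
```

Without override=True, the same mismatch raises instead of proceeding, which models why the flag is discouraged for routine use.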

--vae_cache_behaviour

  • What: Configure the behaviour of the integrity scan check.
  • Why: A dataset can end up with incorrect settings applied at multiple points of training, eg. if you accidentally delete the .json cache files from your dataset and switch the data backend config to use square images rather than aspect-crops. This results in an inconsistent data cache, which can be corrected by setting scan_for_errors to true in your multidatabackend.json configuration file. When this scan runs, --vae_cache_behaviour determines how an inconsistency is resolved: recreate (the default) removes the offending cache entry so that it can be recreated, while sync updates the bucket metadata to reflect the reality of the real training sample. Recommended value: recreate.
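The two resolution modes can be sketched as below. This is a simplified, hypothetical model of the scan's decision — the function and field names are invented for illustration, not SimpleTuner's internals:

```python
def resolve_inconsistency(cache_entry, actual_metadata, behaviour="recreate"):
    """Hypothetical sketch of how --vae_cache_behaviour could resolve a
    mismatch between a cached VAE entry and the real training sample."""
    if cache_entry["metadata"] == actual_metadata:
        return cache_entry  # consistent: nothing to do
    if behaviour == "recreate":
        # Drop the offending entry so the VAE cache rebuilds it next pass.
        return None
    if behaviour == "sync":
        # Keep the cached latents, but rewrite bucket metadata to match reality.
        return {**cache_entry, "metadata": actual_metadata}
    raise ValueError(f"unknown behaviour: {behaviour}")

entry = {"path": "img1.png", "metadata": {"crop": True}}
print(resolve_inconsistency(entry, {"crop": False}))          # recreate: drops the entry -> None
print(resolve_inconsistency(entry, {"crop": False}, "sync"))  # sync: metadata rewritten
```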

Full Changelog: v0.8.2...v0.9.0-alpha

v0.8.2 - fix for "area" based measurements

22 Dec 20:35
351171f

What's Changed

  • VAECache: fix "area" resolution_type mode incorrectly resizing to 1x1 by @bghira in #245

Full Changelog: v0.8.1-fix...v0.8.2