Implementation of Token Downsampling for stable-diffusion-webui
Based on the reference implementation by Ethan Smith: https://github.com/ethansmith2000/ImprovedTokenMerge
Token Downsampling (ToDo) is an optimization that builds on token merging (ToMe), focusing on improving performance while preserving output quality.
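Conceptually, ToDo downsamples the keys and values of self-attention on their 2D token grid while leaving the queries untouched, so the output keeps one token per position and no "unmerge" step is needed. A minimal PyTorch sketch of that idea (simplified, not this extension's actual code; function names are illustrative):

```python
import torch
import torch.nn.functional as F

def downsample_tokens(x, h, w, factor):
    # x: (batch, h*w, channels) token sequence laid out on an h-by-w grid
    b, n, c = x.shape
    x = x.transpose(1, 2).reshape(b, c, h, w)
    x = F.interpolate(x, scale_factor=1.0 / factor, mode="nearest")
    return x.flatten(2).transpose(1, 2)  # (batch, fewer tokens, channels)

def todo_self_attention(q, k, v, h, w, factor=2):
    # Only K and V are shrunk; Q keeps every token, so the attention output
    # still has one vector per input position and nothing has to be unmerged.
    k = downsample_tokens(k, h, w, factor)
    v = downsample_tokens(v, h, w, factor)
    return F.scaled_dot_product_attention(q, k, v)

# Example: a 64x64 token grid with downsampling factor 2
q = torch.randn(1, 64 * 64, 320)
k, v = torch.randn_like(q), torch.randn_like(q)
out = todo_self_attention(q, k, v, h=64, w=64, factor=2)  # (1, 4096, 320)
```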
This extension is compatible with aria1th's DeepCache extension.
Settings are found under Settings > Token Downsampling.
- Token downsampling factor: Set higher than 1 to enable ToDo.
  - Recommended: 2-3
- Token downsampling max depth: Raising this applies ToDo to more layers, but it reduces quality for little additional speedup.
  - Recommended: 1 (default)
- Token downsampling disable after: Disable ToDo after this fraction of steps to improve fine details (see the sketch after this list).
  - Recommended: 0.6-0.8
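A hypothetical illustration of how the "disable after" fraction could gate ToDo per sampler step (the helper and parameter names here are made up, not this extension's API):

```python
def should_apply_todo(factor, current_step, total_steps, disable_after):
    # Hypothetical helper: apply ToDo only while the sampler is before the
    # "disable after" fraction of its total steps, so the final steps run
    # full-resolution attention and recover fine detail.
    if factor <= 1:
        return False  # a factor of 1 means ToDo is off entirely
    return current_step < disable_after * total_steps

# e.g. with 30 steps and disable_after=0.7, ToDo is active for steps 0-20
# and the remaining 9 steps refine details without downsampling.
```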
Downsampling factor and max depth can be raised a bit at higher resolutions (>1536px) because there's more redundant information.
In addition to inference, ToDo (and ToMe) can also massively speed up training. I find that LoRAs trained this way maintain image quality much better at higher downsampling factors, so I highly recommend training with ToDo if it's an option (pending PR in kohya/sd-scripts).
Higher settings greatly reduce quality and don't offer much speed improvement.
Improve detail at higher settings (max depth 2) by disabling ToDo near the end.
Bonus: LoRA trained with ToDo
High downsampling factor and max depth with much less quality loss. Combined with DeepCache, I get roughly a 2.3x speedup.