I have lately been testing the effects of training a LoRA with several different seeds over the course of a single training run. So far the results have been pretty good, as long as training is resumed from a saved training state after each seed change (i.e. `--save_state` / `--resume`), rather than by loading just the network weights (`--network_weights`).
However, while looking through the various options and settings, I read more about Gradient Accumulation, and it gave me an idea for my multi-seed training:
What would be the effect of a variant of Gradient Accumulation where the gradients accumulated before each optimizer step come not from multiple different images noised with the same seed, but from the same image noised with a number of different seeds?
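To make the idea concrete, here is a rough, standalone PyTorch sketch of what I mean. None of this is actual Kohya_ss code; `unet`, `noise_scheduler`, `latents`, `text_cond`, and `optimizer` are placeholders for whatever the real training loop provides, and the diffusers-style calls are just for illustration:

```python
import torch
import torch.nn.functional as F

def multi_seed_step(unet, noise_scheduler, optimizer, latents, text_cond, seeds):
    """One optimizer step whose gradient is accumulated over several noise
    seeds applied to the SAME batch of latents (instead of over several
    different batches, as in normal gradient accumulation)."""
    optimizer.zero_grad()
    for seed in seeds:
        gen = torch.Generator(device=latents.device).manual_seed(seed)
        noise = torch.randn(latents.shape, generator=gen,
                            device=latents.device, dtype=latents.dtype)
        timesteps = torch.randint(
            0, noise_scheduler.config.num_train_timesteps,
            (latents.shape[0],), generator=gen, device=latents.device,
        )
        noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
        pred = unet(noisy_latents, timesteps, encoder_hidden_states=text_cond).sample
        # Divide by len(seeds) so the accumulated gradient is the mean over seeds.
        loss = F.mse_loss(pred.float(), noise.float()) / len(seeds)
        loss.backward()  # gradients from each seed accumulate in .grad
    optimizer.step()
```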
One complication would be regularization images, if they are used: for each seed used in this kind of training, a regularization image set generated with that seed would presumably need to be defined.
Now, this is just an idea, so I'm not asking for it to be added to the project. Rather, I'm asking for help on how I could implement it in my own copy of Kohya_ss so I can test the effect it has on the results.
I unfortunately don't have much experience with this kind of thing, so some pointers would be greatly appreciated. For example, I'd like to find the code where normal Gradient Accumulation is implemented in Kohya_ss, so I could try modifying it directly for testing purposes.
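From skimming the repository, it looks like the Kohya_ss GUI calls kohya-ss/sd-scripts under the hood, and that sd-scripts delegates gradient accumulation to HuggingFace Accelerate: `--gradient_accumulation_steps` appears to be passed when the `Accelerator` is created, and the LoRA training loop in `train_network.py` wraps each step in a `with accelerator.accumulate(...):` block. If I understand that pattern correctly, it behaves roughly like this toy example (not the actual training code):

```python
import torch
import torch.nn.functional as F
from accelerate import Accelerator

# Toy demonstration of Accelerate's accumulation pattern: optimizer.step()
# only takes effect once every `gradient_accumulation_steps` micro-steps.
accelerator = Accelerator(gradient_accumulation_steps=4)
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

for step in range(8):
    x, y = torch.randn(2, 8), torch.randn(2, 1)
    with accelerator.accumulate(model):
        loss = F.mse_loss(model(x), y)
        accelerator.backward(loss)
        optimizer.step()       # skipped by Accelerate until the 4th micro-step
        optimizer.zero_grad()  # likewise a no-op during accumulation steps
```

So if I'm reading this right, one crude way to test my idea might be to set `gradient_accumulation_steps` to the number of seeds and re-seed the noise generation inside each accumulation micro-step while feeding it the same batch each time. I'd appreciate confirmation (or correction) on whether that's the right place to hook in.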