This repository has been archived by the owner on Dec 14, 2023. It is now read-only.

Add InfiNet module for DiffusionOverDiffusion training to allow for extremely (minutes!) long video creation #27

Open
wants to merge 12 commits into
base: main

Conversation

kabachuha
Contributor

@kabachuha kabachuha commented Apr 2, 2023

Hi, Exponential-ML!

As you probably know, a little over a week ago Microsoft published a paper describing the novel DiffusionOverDiffusion technique (https://arxiv.org/abs/2303.12346). It works by first outlining coarse keyframes, then picking a pair of them as starting points and filling in the in-between frames (with different, more local prompts!)
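
To make the coarse-to-fine idea concrete, here is a minimal sketch of how the frame indices could be scheduled. It is not taken from the paper or from this PR: `keyframe_positions`, `dod_schedule`, `keyframes_per_clip` and `depth` are hypothetical names, and the even spacing of keyframes is an assumption made for illustration.

```python
def keyframe_positions(start, end, count):
    """Return `count` evenly spaced frame indices from start to end inclusive."""
    if count < 2:
        raise ValueError("need at least two keyframes (the two endpoints)")
    if end - start < count - 1:
        raise ValueError("not enough frames between start and end")
    # Integer arithmetic keeps the endpoints exact and avoids rounding issues.
    return [start + (i * (end - start)) // (count - 1) for i in range(count)]


def dod_schedule(start, end, keyframes_per_clip, depth):
    """List the clips to generate, as (level, frame_indices) pairs.

    Level 0 is the coarse pass over the whole range. Each deeper level fills
    in the gap between one pair of neighbouring keyframes from the level above.
    """
    frames = keyframe_positions(start, end, keyframes_per_clip)
    clips = [(0, frames)]
    if depth > 0:
        for left, right in zip(frames, frames[1:]):
            if right - left < keyframes_per_clip - 1:
                continue  # gap too small to subdivide further (assumption)
            for level, sub in dod_schedule(left, right, keyframes_per_clip, depth - 1):
                clips.append((level + 1, sub))
    return clips


if __name__ == "__main__":
    for level, frames in dod_schedule(0, 16, keyframes_per_clip=5, depth=1):
        print(level, frames)
```

In a real pipeline each clip would also carry its own prompt, with the deeper levels using the more local ones.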

image

Using it, they were able to train on and generate whole 11-minute-long Flintstones episodes https://www.reddit.com/r/StableDiffusion/comments/11zwaxx/microsofts_nuwaxl_creates_an_11_minute/

Seeing their impressive results, I couldn't restrain myself from trying to replicate them.

Having read the paper, I noticed that the model structure is extremely similar to the ModelScope one. The only difference is the 'video conditioning' layer (in green), whose information is passed into the pre-existing U-net3D through a set of Conv-down cells.

image

Because they use so-called zero-convolutions, I implemented that layer as a ControlNet-like network (https://github.com/kabachuha/InfiNet), which makes it possible to introduce the new layers without altering the behavior of the existing model. (See DoDBlock in the code.)
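
The reason zero-convolutions leave the existing model untouched can be shown with a small, framework-free sketch. This is not the DoDBlock code from InfiNet: the function names are hypothetical, and a 1x1 convolution at a single position is written out with plain lists. In PyTorch the same idea would typically be a convolution layer whose weight and bias are initialized to zero.

```python
def pointwise_conv(features, weight, bias):
    """1x1 convolution over channels at one position.

    out[o] = bias[o] + sum_i weight[o][i] * features[i]
    """
    return [
        b + sum(w * x for w, x in zip(row, features))
        for row, b in zip(weight, bias)
    ]


def make_zero_conv(in_channels, out_channels):
    """Weight and bias both start at zero, so the layer initially outputs zeros."""
    weight = [[0.0] * in_channels for _ in range(out_channels)]
    bias = [0.0] * out_channels
    return weight, bias


def inject_condition(base_hidden, condition, zero_conv):
    """Add the projected condition to the base model's hidden state."""
    weight, bias = zero_conv
    residual = pointwise_conv(condition, weight, bias)
    return [h + r for h, r in zip(base_hidden, residual)]


if __name__ == "__main__":
    base = [0.5, -1.0, 2.0]   # hidden state of the pretrained model
    cond = [3.0, 4.0]         # features from the new conditioning branch
    zero_conv = make_zero_conv(in_channels=2, out_channels=3)
    # Before any training step the output equals the base model's output.
    print(inject_condition(base, cond, zero_conv) == base)
```

Once training updates the weights away from zero, the conditioning branch starts contributing, but the starting point is exactly the pretrained model.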

image

I have already tested inference with diffusion_depth=0 and diffusion_depth=1 (any diffusion_depth>0 turns on the DoD blocks), so the model definitely works at inference time

image

I'll start training experiments as soon as I figure out the dataset and the system requirements for it

P.S. @ExponentialML, contact me on Discord. I'd really appreciate closer communication

@ExponentialML
Owner

This is great @kabachuha! Thanks for this PR, and sure we can get in touch.

@sergiobr
Contributor

sergiobr commented Apr 2, 2023

@kabachuha thanks for your contribution!
I agree it would be nice to have a Discord server or channel for txt2video showcases and technical discussion. I'll ping you there.

@kabachuha
Contributor Author

@sergiobr hi, we have a sort of text2video team on the Deforum Discord server, join it :) https://discord.gg/deforum

@kabachuha
Contributor Author

@ExponentialML training works, btw

@ExponentialML
Owner

@ExponentialML training works, btw

Great! Let me know if you need any assistance getting things up to speed with the new repository changes.

@kabachuha
Contributor Author

kabachuha commented Apr 9, 2023

Yeah, I'd really appreciate help in carrying it over, since you know the mainline changes much better

@ExponentialML
Owner

Yeah, I'd really appreciate help in carrying it over, since you know much better about the mainline changes

By all means. Just let me know when it's ready to merge. If you don't want to resolve the conflicts yourself, I'm more than willing to do it 👍.

@ExponentialML ExponentialML added the enhancement New feature or request label Apr 10, 2023
@kabachuha kabachuha marked this pull request as ready for review April 22, 2023 11:53
@kabachuha
Contributor Author

Sampling to a video folder dataset now works correctly

image

@ExponentialML ExponentialML mentioned this pull request May 6, 2023
@Gitterman69

bump bump

@kabachuha
Contributor Author

So, I'm going to write an automatic DoD captioner using OpenAI's API (or another LLM provider, maybe a local oobabooga).

How it will work:

  1. Multilevel DoD-splitting is done with the current script
  2. The lowest-level subclips are captioned with BLIP2 (see @ExponentialML's repo)
  3. The LLM forms the upper-level descriptions, given just one global prompt for the whole video

This eliminates the difficulty of writing the mid-level captions
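
The planned flow can be sketched roughly as follows. This is only an illustration of the three steps, not the actual captioner: `caption_levels`, `caption_clip` and `describe_group` are hypothetical names, the two callables stand in for BLIP2 and the LLM (no real API is called here), and grouping a fixed number of captions per parent is an assumption.

```python
from typing import Callable, List


def caption_levels(
    leaf_clips: List[str],
    group_size: int,
    global_prompt: str,
    caption_clip: Callable[[str], str],
    describe_group: Callable[[str, List[str]], str],
) -> List[List[str]]:
    """Build captions for every DoD level, coarsest level first.

    leaf_clips     -- paths of the lowest-level subclips (step 1 already done)
    caption_clip   -- stand-in for an image/video captioner such as BLIP2
    describe_group -- stand-in for an LLM call that merges child captions
                      into one description, guided by the global prompt
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2")

    # Step 2: caption the lowest-level subclips.
    current = [caption_clip(clip) for clip in leaf_clips]
    levels = [current]

    # Step 3: let the LLM write each upper level from the level below it.
    while len(current) > group_size:
        current = [
            describe_group(global_prompt, current[i:i + group_size])
            for i in range(0, len(current), group_size)
        ]
        levels.append(current)

    # The top level is the single global prompt supplied by the user.
    levels.append([global_prompt])
    levels.reverse()
    return levels


if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs without any model or API key.
    fake_captioner = lambda path: "caption of " + path
    fake_llm = lambda prompt, captions: prompt + ": " + "; ".join(captions)
    clips = ["clip0.mp4", "clip1.mp4", "clip2.mp4", "clip3.mp4"]
    for level in caption_levels(clips, 2, "a cartoon episode", fake_captioner, fake_llm):
        print(level)
```

Passing the captioner and the LLM in as callables keeps the sketch independent of any provider, so OpenAI or a local model could be swapped in behind the same interface.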

@Maki9009

sooo any updates on this?

@Gitterman69

bump
bump

Labels
enhancement New feature or request
5 participants