
Stable Diffusion deep dive notebook can't be run on 8GB GPUs #19

Open

jarandaf opened this issue Oct 25, 2022 · 5 comments
@jarandaf

OOM errors pop up when running the notebook on an 8GB GPU. I managed to run it successfully by switching to half-precision (fp16) tensors instead.
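For reference, a sketch of what that could look like with the notebook's independently loaded components (model ids and subfolders as in the notebook; torch_dtype is the standard diffusers/transformers loading argument, and any inputs such as latents and text embeddings then need to be torch.float16 as well):

import torch
from diffusers import AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel

# Load the memory-heavy components in half precision
vae = AutoencoderKL.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="vae", torch_dtype=torch.float16
)
text_encoder = CLIPTextModel.from_pretrained(
    "openai/clip-vit-large-patch14", torch_dtype=torch.float16
)
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet", torch_dtype=torch.float16
)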

@kevinbird15
Contributor

You can also try pipe.enable_attention_slicing() after you create pipe. I have had decent luck with this when running on the 4GB GPU in my laptop, though I haven't been able to run the full notebook; I was getting stuck when generating the image grid.
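For anyone using the pipeline version of the notebook, a minimal sketch of that suggestion (model id assumed; enable_attention_slicing is the documented diffusers call):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Compute attention in slices: slightly slower, but much lower peak memory
pipe.enable_attention_slicing()

image = pipe("a photograph of an astronaut riding a horse").images[0]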

@jarandaf
Author

Hi @kevinbird15. There is no pipe in the Stable Diffusion Deep Dive notebook as it stands (the components are loaded independently). I think you may be thinking of a different notebook (perhaps this one?).

@kevinbird15
Contributor

You're right, my bad. I didn't see that you were referring to the deep dive notebook in the title!

@kevinbird15
Contributor

@jarandaf what if you add this before the "# To the GPU we go!" comment:

slice_size = unet.config.attention_head_dim // 2
unet.set_attention_slice(slice_size)

This is what is inside the enable_attention_slicing function:

Signature:
pipe.enable_attention_slicing(
    slice_size: Union[str, int, NoneType] = 'auto',
)
Source:   
    def enable_attention_slicing(self, slice_size: Optional[Union[str, int]] = "auto"):
        r"""
        Enable sliced attention computation.

        When this option is enabled, the attention module will split the input tensor in slices, to compute attention
        in several steps. This is useful to save some memory in exchange for a small speed decrease.

        Args:
            slice_size (`str` or `int`, *optional*, defaults to `"auto"`):
                When `"auto"`, halves the input to the attention heads, so attention will be computed in two steps. If
                a number is provided, uses as many slices as `attention_head_dim // slice_size`. In this case,
                `attention_head_dim` must be a multiple of `slice_size`.
        """
        if slice_size == "auto":
            # half the attention head size is usually a good trade-off between
            # speed and memory
            slice_size = self.unet.config.attention_head_dim // 2
        self.unet.set_attention_slice(slice_size)
File:      ~/.local/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py

@cgoldammer

cgoldammer commented Nov 20, 2022

Running on Paperspace with an 8GB GPU, I get the following error on the line vae = vae.to(torch_device):

RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB 
(GPU 0; 7.80 GiB total capacity; 6.00 GiB already allocated; 
2.44 MiB free; 6.61 GiB reserved in total by PyTorch) 
If reserved memory is >> allocated memory try setting max_split_size_mb 
to avoid fragmentation.  See documentation for Memory 
Management and PYTORCH_CUDA_ALLOC_CONF
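As an aside, the max_split_size_mb suggestion from the error message can be tried through the PYTORCH_CUDA_ALLOC_CONF environment variable (128 below is just an example value):

import os
# The caching allocator reads this when CUDA memory is first allocated,
# so set it before any tensors are moved to the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value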

It looks like this can be worked around by loading the models in float16:

model = "CompVis/stable-diffusion-v1-4"
vae = AutoencoderKL.from_pretrained(model, subfolder="vae", torch_dtype=torch.float16)
unet = UNet2DConditionModel.from_pretrained(model, subfolder="unet", torch_dtype=torch.float16)

And then further down, image = vae.decode(latents).sample should become image = vae.decode(latents.type(torch.float16)).sample.
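Putting it together, the decode step under fp16 looks roughly like this (the 1 / 0.18215 scaling is the one the notebook applies before decoding):

# Scale the latents and decode with the fp16 VAE
latents = 1 / 0.18215 * latents
with torch.no_grad():
    image = vae.decode(latents.type(torch.float16)).sample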
