
Stable Diffusion deep dive notebook can't be run on 8GB GPUs #19

Open

jarandaf opened this issue Oct 25, 2022 · 5 comments
@jarandaf

OOM errors pop up when running the notebook on an 8GB GPU. I managed to run it successfully by switching to half-precision (fp16) tensors instead.
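For reference, a sketch of what that could look like with the notebook's independently loaded components (model ids and subfolders as in the notebook; torch_dtype is the standard diffusers/transformers loading argument, and any inputs such as latents and text embeddings then need to be torch.float16 as well):

import torch
from diffusers import AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel

# Load the memory-heavy components in half precision
vae = AutoencoderKL.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="vae", torch_dtype=torch.float16
)
text_encoder = CLIPTextModel.from_pretrained(
    "openai/clip-vit-large-patch14", torch_dtype=torch.float16
)
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet", torch_dtype=torch.float16
)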

@kevinbird15
Contributor

You can also try pipe.enable_attention_slicing() after you create pipe. I have had decent luck with this when running on the 4GB GPU in my laptop, though I haven't been able to run the full notebook; I was getting stuck when generating the image grid.
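For anyone using the pipeline version of the notebook, a minimal sketch of that suggestion (model id assumed; enable_attention_slicing is the documented diffusers call):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Compute attention in slices: slightly slower, but much lower peak memory
pipe.enable_attention_slicing()

image = pipe("a photograph of an astronaut riding a horse").images[0]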

@jarandaf
Author

Hi @kevinbird15. There is no pipe in the Stable Diffusion Deep Dive notebook as it stands (the components are loaded independently). I think you may be thinking of a different notebook (perhaps this one?).

@kevinbird15
Contributor

You're right, my bad. I didn't see that you were referring to the deep dive notebook in the title!

@kevinbird15
Contributor

@jarandaf what if you add this before the "# To the GPU we go!" comment:

slice_size = unet.config.attention_head_dim // 2
unet.set_attention_slice(slice_size)

This is what is inside the enable_attention_slicing function:

Signature:
pipe.enable_attention_slicing(
    slice_size: Union[str, int, NoneType] = 'auto',
)
Source:   
    def enable_attention_slicing(self, slice_size: Optional[Union[str, int]] = "auto"):
        r"""
        Enable sliced attention computation.

        When this option is enabled, the attention module will split the input tensor in slices, to compute attention
        in several steps. This is useful to save some memory in exchange for a small speed decrease.

        Args:
            slice_size (`str` or `int`, *optional*, defaults to `"auto"`):
                When `"auto"`, halves the input to the attention heads, so attention will be computed in two steps. If
                a number is provided, uses as many slices as `attention_head_dim // slice_size`. In this case,
                `attention_head_dim` must be a multiple of `slice_size`.
        """
        if slice_size == "auto":
            # half the attention head size is usually a good trade-off between
            # speed and memory
            slice_size = self.unet.config.attention_head_dim // 2
        self.unet.set_attention_slice(slice_size)
File:      ~/.local/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py

@cgoldammer

cgoldammer commented Nov 20, 2022

Running on Paperspace with an 8GB GPU, I get the following error on the line vae = vae.to(torch_device):

RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB 
(GPU 0; 7.80 GiB total capacity; 6.00 GiB already allocated; 
2.44 MiB free; 6.61 GiB reserved in total by PyTorch) 
If reserved memory is >> allocated memory try setting max_split_size_mb 
to avoid fragmentation.  See documentation for Memory 
Management and PYTORCH_CUDA_ALLOC_CONF
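As an aside, the max_split_size_mb suggestion from the error message can be tried through the PYTORCH_CUDA_ALLOC_CONF environment variable (128 below is just an example value):

import os
# The caching allocator reads this when CUDA memory is first allocated,
# so set it before any tensors are moved to the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value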

It looks like this can be worked around by loading the models in float16:

model = "CompVis/stable-diffusion-v1-4"
vae = AutoencoderKL.from_pretrained(model, subfolder="vae", torch_dtype=torch.float16)
unet = UNet2DConditionModel.from_pretrained(model, subfolder="unet", torch_dtype=torch.float16)

And then further down, image = vae.decode(latents).sample should become image = vae.decode(latents.type(torch.float16)).sample.
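Putting it together, the decode step under fp16 looks roughly like this (the 1 / 0.18215 scaling is the one the notebook applies before decoding):

# Scale the latents and decode with the fp16 VAE
latents = 1 / 0.18215 * latents
with torch.no_grad():
    image = vae.decode(latents.type(torch.float16)).sample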
