[Issue]: High VRAM usage during Vae step #3416

zaxwashere · 2024-09-10T17:52:34Z

Issue Description

Vram usage during the VAE step is inconsistent and will spike to >12 gb for an sdxl model. This is atypical for my usage, where an SDXL model will stay at 10gb or less during the vae step with my settings all applied:

fp16 mode, vae slicing and vae tiling = true, vae upcast = false
1024x1024 10 steps dpm ++2m, sdxl timestep presets used, cfg = 3, no attention guidance, no loras applied.

Disabling "use cached model config when available" removes the issue, and generation speeds will be 8 -10 seconds.

VRAM usage in the console does not reflect the usage as seen in task manager or in the webui, attached is a screenshot of the vram usage during a run.
sdnext (1).log

Version Platform Description

13:30:50-670748 INFO Logger: file="C:\Users\zaxof\OneDrive\Documents\GitHub\nvidia_sdnext\sdnext.log" level=DEBUG
size=65 mode=create
13:30:50-672246 INFO Python version=3.10.6 platform=Windows
bin="C:\Users\zaxof\OneDrive\Documents\GitHub\nvidia_sdnext\venv\Scripts\python.exe"
venv="C:\Users\zaxof\OneDrive\Documents\GitHub\nvidia_sdnext\venv"
13:30:50-859782 INFO Version: app=sd.next updated=2024-09-10 hash=91bdd3b3 branch=dev
url=https://github.com/vladmandic/automatic.git/tree/dev ui=dev
13:30:51-186334 INFO Updating main repository
13:30:52-008006 INFO Upgraded to version: 91bdd3b Tue Sep 10 19:20:49 2024 +0300
13:30:52-015505 INFO Platform: arch=AMD64 cpu=AMD64 Family 25 Model 33 Stepping 2, AuthenticAMD system=Windows
release=Windows-10-10.0.22631-SP0 python=3.10.6
13:30:52-017006 DEBUG Setting environment tuning
13:30:52-018506 INFO HF cache folder: C:\Users\zaxof.cache\huggingface\hub
13:30:52-019506 DEBUG Torch allocator: "garbage_collection_threshold:0.80,max_split_size_mb:512"
13:30:52-026016 DEBUG Torch overrides: cuda=False rocm=False ipex=False diml=False openvino=False
13:30:52-027513 DEBUG Torch allowed: cuda=True rocm=True ipex=True diml=True openvino=True
13:30:52-037517 INFO nVidia CUDA toolkit detected: nvidia-smi present

Extensions : Extensions all: ['a1111-sd-webui-tagcomplete', 'adetailer', 'OneButtonPrompt',
'sd-civitai-browser-plus_fix', 'sd-webui-infinite-image-browsing', 'sd-webui-inpaint-anything',
'sd-webui-prompt-all-in-one']

Windows 11, RTX 3060 12gb, 5700x3d, 64gb ddr4, dev branch SDNEXT, firefox browser on desktop, chrome on android for remote access.

Relevant log output

No response

Backend

Diffusers

UI

Standard

Branch

Dev

Model

StableDiffusion XL

Acknowledgements

I have read the above and searched for existing issues
I confirm that this is classified correctly and its not an extension issue

The text was updated successfully, but these errors were encountered:

vladmandic · 2024-09-11T12:27:44Z

i cannot reproduce. i've added some extra logging, please set env variable SD_VAE_DEBUG=true and run. note that sdnext should be restarted after changing use cached model config.
post logs starting with TRACE for both runs.

zaxwashere · 2024-09-11T18:06:42Z

re-ran it with the env variable active.

13:51:08-388186 DEBUG    Sampler: sampler="DPM++ 2M" config={'num_train_timesteps': 1000, 'beta_start': 0.00085,
                         'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon',
                         'thresholding': False, 'sample_max_value': 1.0, 'algorithm_type': 'sde-dpmsolver++',
                         'solver_type': 'midpoint', 'lower_order_final': True, 'use_karras_sigmas': False,
                         'final_sigmas_type': 'zero', 'timestep_spacing': 'leading', 'solver_order': 2}
13:51:08-390185 DEBUG    Sampler: steps=10 timesteps=[999, 845, 730, 587, 443, 310, 193, 116, 53, 13]
13:51:08-392183 DEBUG    Torch generator: device=cuda seeds=[1158966623]
13:51:08-393184 DEBUG    Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE batch=1/1x1
                         set={'timesteps': [999, 845, 730, 587, 443, 310, 193, 116, 53, 13], 'prompt_embeds':
                         torch.Size([1, 154, 2048]), 'pooled_prompt_embeds': torch.Size([1, 1280]),
                         'negative_prompt_embeds': torch.Size([1, 154, 2048]), 'negative_pooled_prompt_embeds':
                         torch.Size([1, 1280]), 'guidance_scale': 3, 'num_inference_steps': 20, 'eta': 1.0,
                         'guidance_rescale': 0.7, 'denoising_end': None, 'output_type': 'latent', 'width': 1024,
                         'height': 1024, 'parser': 'Full parser'}
Progress  1.49it/s █████████████████████████████████ 100% 10/10 00:06 00:00 Base
13:51:15-342317 DEBUG    GC: utilization={'gpu': 71, 'ram': 3, 'threshold': 80} gc={'collected': 386, 'saved': 0.66}
                         before={'gpu': 8.55, 'ram': 2.16} after={'gpu': 7.89, 'ram': 2.16, 'retries': 0, 'oom': 0}
                         device=cuda fn=full_vae_decode time=0.22
13:51:15-857910 TRACE    VAE config: FrozenDict([('in_channels', 3), ('out_channels', 3), ('down_block_types',
                         ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D']),
                         ('up_block_types', ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D',
                         'UpDecoderBlock2D']), ('block_out_channels', [128, 256, 512, 512]), ('layers_per_block', 2),
                         ('act_fn', 'silu'), ('latent_channels', 4), ('norm_num_groups', 32), ('sample_size', 1024),
                         ('scaling_factor', 0.13025), ('shift_factor', None), ('latents_mean', None), ('latents_std',
                         None), ('force_upcast', True), ('use_quant_conv', True), ('use_post_quant_conv', True),
                         ('mid_block_add_attention', True), ('_use_default_values', ['latents_std',
                         'use_post_quant_conv', 'mid_block_add_attention', 'latents_mean', 'shift_factor',
                         'use_quant_conv']), ('_class_name', 'AutoencoderKL'), ('_diffusers_version', '0.20.0.dev0'),
                         ('_name_or_path', '../sdxl-vae/')])
13:51:15-861410 TRACE    VAE memory: defaultdict(<class 'int'>, {'retries': 0, 'oom': 0, 'free': 137363456, 'total':
                         12884377600, 'active': 2672, 'active_peak': 9631253504, 'reserved': 11605639168,
                         'reserved_peak': 11725176832, 'used': 12747014144})
13:51:15-863413 TRACE    VAE decode: name=fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors dtype=torch.float16
                         upcast=False images=1 latents=torch.Size([1, 4, 128, 128]) time=0.741
13:51:16-051942 DEBUG    Profile: VAE decode: 0.93
13:51:16-298983 DEBUG    GC: utilization={'gpu': 99, 'ram': 3, 'threshold': 80} gc={'collected': 254, 'saved': 3.97}
                         before={'gpu': 11.87, 'ram': 2.16} after={'gpu': 7.9, 'ram': 2.16, 'retries': 0, 'oom': 0}
                         device=cuda fn=vae_decode time=0.25
13:51:16-343487 INFO     Save: image="outputs\text\06720-novaAnimeXL_ponyV40-Score 9 score 8 up score 7 up.jpg"
                         type=JPEG width=1024 height=1024 size=133251
13:51:16-345488 INFO     Processed: images=1 time=7.97 its=1.25 memory={'ram': {'used': 2.16, 'total': 63.9}, 'gpu':
                         {'used': 7.9, 'total': 12.0}, 'retries': 0, 'oom': 0}
13:51:22-375787 INFO     Base: class=StableDiffusionXLPipeline
13:51:22-377290 DEBUG    Sampler: sampler="DPM++ 2M" config={'num_train_timesteps': 1000, 'beta_start': 0.00085,
                         'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon',
                         'thresholding': False, 'sample_max_value': 1.0, 'algorithm_type': 'sde-dpmsolver++',
                         'solver_type': 'midpoint', 'lower_order_final': True, 'use_karras_sigmas': False,
                         'final_sigmas_type': 'zero', 'timestep_spacing': 'leading', 'solver_order': 2}
13:51:22-379288 DEBUG    Sampler: steps=10 timesteps=[999, 845, 730, 587, 443, 310, 193, 116, 53, 13]
13:51:22-381287 DEBUG    Torch generator: device=cuda seeds=[2858245960]
13:51:22-382287 DEBUG    Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE batch=1/1x1
                         set={'timesteps': [999, 845, 730, 587, 443, 310, 193, 116, 53, 13], 'prompt_embeds':
                         torch.Size([1, 154, 2048]), 'pooled_prompt_embeds': torch.Size([1, 1280]),
                         'negative_prompt_embeds': torch.Size([1, 154, 2048]), 'negative_pooled_prompt_embeds':
                         torch.Size([1, 1280]), 'guidance_scale': 3, 'num_inference_steps': 20, 'eta': 1.0,
                         'guidance_rescale': 0.7, 'denoising_end': None, 'output_type': 'latent', 'width': 1024,
                         'height': 1024, 'parser': 'Full parser'}
Progress  1.19it/s █████████████████████████████████ 100% 10/10 00:08 00:00 Base
13:51:31-083449 DEBUG    GC: utilization={'gpu': 71, 'ram': 3, 'threshold': 80} gc={'collected': 385, 'saved': 0.57}
                         before={'gpu': 8.46, 'ram': 2.16} after={'gpu': 7.89, 'ram': 2.16, 'retries': 0, 'oom': 0}
                         device=cuda fn=full_vae_decode time=0.22
13:51:36-590122 TRACE    VAE config: FrozenDict([('in_channels', 3), ('out_channels', 3), ('down_block_types',
                         ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D']),
                         ('up_block_types', ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D',
                         'UpDecoderBlock2D']), ('block_out_channels', [128, 256, 512, 512]), ('layers_per_block', 2),
                         ('act_fn', 'silu'), ('latent_channels', 4), ('norm_num_groups', 32), ('sample_size', 1024),
                         ('scaling_factor', 0.13025), ('shift_factor', None), ('latents_mean', None), ('latents_std',
                         None), ('force_upcast', True), ('use_quant_conv', True), ('use_post_quant_conv', True),
                         ('mid_block_add_attention', True), ('_use_default_values', ['latents_std',
                         'use_post_quant_conv', 'mid_block_add_attention', 'latents_mean', 'shift_factor',
                         'use_quant_conv']), ('_class_name', 'AutoencoderKL'), ('_diffusers_version', '0.20.0.dev0'),
                         ('_name_or_path', '../sdxl-vae/')])
13:51:36-593621 TRACE    VAE memory: defaultdict(<class 'int'>, {'retries': 0, 'oom': 0, 'free': 3857711104, 'total':
                         12884377600, 'active': 2672, 'active_peak': 9631253504, 'reserved': 7885291520,
                         'reserved_peak': 11366563840, 'used': 9026666496})
13:51:36-595622 TRACE    VAE decode: name=fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors dtype=torch.float16
                         upcast=False images=1 latents=torch.Size([1, 4, 128, 128]) time=5.727
13:51:36-606121 DEBUG    Profile: VAE decode: 5.74
13:51:36-646633 INFO     Save: image="outputs\text\06721-novaAnimeXL_ponyV40-Score 9 score 8 up score 7 up.jpg"
                         type=JPEG width=1024 height=1024 size=150607
13:51:36-648635 INFO     Processed: images=1 time=14.29 its=0.70 memory={'ram': {'used': 2.16, 'total': 63.9}, 'gpu':
                         {'used': 8.41, 'total': 12.0}, 'retries': 0, 'oom': 0}```
                         
It is inconsistent sometimes the vae is fast, other times it takes almost as long as the whole generation.


here is after a restart with "cached config" unchecked.


```14:01:59-877365 INFO     Base: class=StableDiffusionXLPipeline
14:01:59-879361 DEBUG    Sampler: sampler="DPM++ 2M" config={'num_train_timesteps': 1000, 'beta_start': 0.00085,
                         'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon',
                         'thresholding': False, 'sample_max_value': 1.0, 'algorithm_type': 'sde-dpmsolver++',
                         'solver_type': 'midpoint', 'lower_order_final': True, 'use_karras_sigmas': False,
                         'final_sigmas_type': 'zero', 'timestep_spacing': 'leading', 'solver_order': 2}
14:01:59-880862 DEBUG    Sampler: steps=10 timesteps=[999, 845, 730, 587, 443, 310, 193, 116, 53, 13]
14:01:59-882861 DEBUG    Torch generator: device=cuda seeds=[1523340005]
14:01:59-883862 DEBUG    Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE batch=1/1x1
                         set={'timesteps': [999, 845, 730, 587, 443, 310, 193, 116, 53, 13], 'prompt_embeds':
                         torch.Size([1, 154, 2048]), 'pooled_prompt_embeds': torch.Size([1, 1280]),
                         'negative_prompt_embeds': torch.Size([1, 154, 2048]), 'negative_pooled_prompt_embeds':
                         torch.Size([1, 1280]), 'guidance_scale': 3, 'num_inference_steps': 10, 'eta': 1.0,
                         'guidance_rescale': 0.7, 'denoising_end': None, 'output_type': 'latent', 'width': 1024,
                         'height': 1024, 'parser': 'Full parser'}
Progress ?it/s                                              0% 0/10 00:00 ? Base14:02:00-413434 DEBUG    Server: alive=True jobs=0 requests=352 uptime=313 memory=1.91/63.9 backend=Backend.DIFFUSERS
                         state=idle
Progress  1.19it/s █████████████████████████████████ 100% 10/10 00:08 00:00 Base
14:02:08-625569 DEBUG    GC: utilization={'gpu': 66, 'ram': 3, 'threshold': 80} gc={'collected': 393, 'saved': 0.0}
                         before={'gpu': 7.9, 'ram': 1.91} after={'gpu': 7.9, 'ram': 1.91, 'retries': 0, 'oom': 0}
                         device=cuda fn=full_vae_decode time=0.22
14:02:12-525005 TRACE    VAE config: FrozenDict([('in_channels', 3), ('out_channels', 3), ('down_block_types',
                         ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D']),
                         ('up_block_types', ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D',
                         'UpDecoderBlock2D']), ('block_out_channels', [128, 256, 512, 512]), ('layers_per_block', 2),
                         ('act_fn', 'silu'), ('latent_channels', 4), ('norm_num_groups', 32), ('sample_size', 1024),
                         ('scaling_factor', 0.13025), ('shift_factor', None), ('latents_mean', None), ('latents_std',
                         None), ('force_upcast', True), ('use_quant_conv', True), ('use_post_quant_conv', True),
                         ('mid_block_add_attention', True), ('_use_default_values', ['use_quant_conv', 'latents_mean',
                         'mid_block_add_attention', 'use_post_quant_conv', 'latents_std', 'shift_factor']),
                         ('_class_name', 'AutoencoderKL'), ('_diffusers_version', '0.21.0.dev0'), ('_name_or_path',
                         '/home/patrick/.cache/huggingface/hub/models--lykon-models--dreamshaper-8/snapshots/7e855e3f481
                         832419503d1fa18d4a4379597f04b/vae')])
14:02:12-528508 TRACE    VAE memory: defaultdict(<class 'int'>, {'retries': 0, 'oom': 0, 'free': 4344250368, 'total':
                         12884377600, 'active': 2673, 'active_peak': 8134968320, 'reserved': 7390363648,
                         'reserved_peak': 8545894400, 'used': 8540127232})
14:02:12-530504 TRACE    VAE decode: name=fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors dtype=torch.float16
                         upcast=False images=1 latents=torch.Size([1, 4, 128, 128]) time=4.116
14:02:12-538504 DEBUG    Profile: VAE decode: 4.13
14:02:12-582516 INFO     Save: image="outputs\text\06727-novaAnimeXL_ponyV40-Score 9 score 8 up score 7 up.jpg"
                         type=JPEG width=1024 height=1024 size=166048
14:02:12-584518 INFO     Processed: images=1 time=12.72 its=0.79 memory={'ram': {'used': 1.92, 'total': 63.9}, 'gpu':
                         {'used': 7.95, 'total': 12.0}, 'retries': 0, 'oom': 0}
14:03:31-516232 INFO     Base: class=StableDiffusionXLPipeline
14:03:31-518232 DEBUG    Sampler: sampler="DPM++ 2M" config={'num_train_timesteps': 1000, 'beta_start': 0.00085,
                         'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon',
                         'thresholding': False, 'sample_max_value': 1.0, 'algorithm_type': 'sde-dpmsolver++',
                         'solver_type': 'midpoint', 'lower_order_final': True, 'use_karras_sigmas': False,
                         'final_sigmas_type': 'zero', 'timestep_spacing': 'leading', 'solver_order': 2}
14:03:31-520232 DEBUG    Sampler: steps=10 timesteps=[999, 845, 730, 587, 443, 310, 193, 116, 53, 13]
14:03:31-522235 DEBUG    Torch generator: device=cuda seeds=[434621457]
14:03:31-523232 DEBUG    Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE batch=1/1x1
                         set={'timesteps': [999, 845, 730, 587, 443, 310, 193, 116, 53, 13], 'prompt_embeds':
                         torch.Size([1, 154, 2048]), 'pooled_prompt_embeds': torch.Size([1, 1280]),
                         'negative_prompt_embeds': torch.Size([1, 154, 2048]), 'negative_pooled_prompt_embeds':
                         torch.Size([1, 1280]), 'guidance_scale': 3, 'num_inference_steps': 10, 'eta': 1.0,
                         'guidance_rescale': 0.7, 'denoising_end': None, 'output_type': 'latent', 'width': 1024,
                         'height': 1024, 'parser': 'Full parser'}
Progress  1.19it/s █████████████████████████████████ 100% 10/10 00:08 00:00 Base
14:03:40-225062 DEBUG    GC: utilization={'gpu': 66, 'ram': 3, 'threshold': 80} gc={'collected': 399, 'saved': 0.0}
                         before={'gpu': 7.9, 'ram': 1.9} after={'gpu': 7.9, 'ram': 1.9, 'retries': 0, 'oom': 0}
                         device=cuda fn=full_vae_decode time=0.22
14:03:44-130741 TRACE    VAE config: FrozenDict([('in_channels', 3), ('out_channels', 3), ('down_block_types',
                         ['DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D']),
                         ('up_block_types', ['UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D',
                         'UpDecoderBlock2D']), ('block_out_channels', [128, 256, 512, 512]), ('layers_per_block', 2),
                         ('act_fn', 'silu'), ('latent_channels', 4), ('norm_num_groups', 32), ('sample_size', 1024),
                         ('scaling_factor', 0.13025), ('shift_factor', None), ('latents_mean', None), ('latents_std',
                         None), ('force_upcast', True), ('use_quant_conv', True), ('use_post_quant_conv', True),
                         ('mid_block_add_attention', True), ('_use_default_values', ['use_quant_conv', 'latents_mean',
                         'mid_block_add_attention', 'use_post_quant_conv', 'latents_std', 'shift_factor']),
                         ('_class_name', 'AutoencoderKL'), ('_diffusers_version', '0.21.0.dev0'), ('_name_or_path',
                         '/home/patrick/.cache/huggingface/hub/models--lykon-models--dreamshaper-8/snapshots/7e855e3f481
                         832419503d1fa18d4a4379597f04b/vae')])
14:03:44-134241 TRACE    VAE memory: defaultdict(<class 'int'>, {'retries': 0, 'oom': 0, 'free': 4344250368, 'total':
                         12884377600, 'active': 2673, 'active_peak': 8134968320, 'reserved': 7390363648,
                         'reserved_peak': 8545894400, 'used': 8540127232})
14:03:44-136242 TRACE    VAE decode: name=fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors dtype=torch.float16
                         upcast=False images=1 latents=torch.Size([1, 4, 128, 128]) time=4.121
14:03:44-144242 DEBUG    Profile: VAE decode: 4.13
14:03:44-186753 INFO     Save: image="outputs\text\06728-novaAnimeXL_ponyV40-Score 9 score 8 up score 7 up.jpg"
                         type=JPEG width=1024 height=1024 size=139766
14:03:44-189254 INFO     Processed: images=1 time=12.69 its=0.79 memory={'ram': {'used': 1.92, 'total': 63.9}, 'gpu':
                         {'used': 7.95, 'total': 12.0}, 'retries': 0, 'oom': 0}```

vladmandic · 2024-09-11T22:02:32Z

i can see some the difference with vs without config: 8.1gb vs 9.6gb
but i also see absolutely zero differences in the config itself.
and there is no proof of vram spike above 12gb as originally reported.

also, no matter what i do, i cannot reproduce this.
if someone has an idea or is able to reproduce separately, i'm really curious.

zaxwashere · 2024-09-12T13:15:34Z

I did a fresh installation and the issue persisted. I realized that the configs are cached in users/myusername/.cache/huggingface. I just deleted all of that, but are there any other shared locations for cached data to hide that might be contributing to my problem?

vladmandic · 2024-09-12T14:50:57Z

downloaded config is in users/myusername/.cache/huggingface
if you use "cached config" option, its exactly so this download is not required and it will use config in configs/ (for sdxl, it would be configs/sdxl)
also, you say that issue persists - but none of the logs you've uploaded with SD_VAE_DEBUG enabled show the spike above 10gb.

zaxwashere · 2024-09-12T19:58:51Z

(had to delete my prior comment, formatting got jumbled)

My vram usage spikes above 10 gb per task manager and the webui under the preview image (labeled as GPU active). Vram usage is a bit inconsistent overall, there's probably some GC tweaking that I need to do.

My hunch is that vae tiling isn't being applied, but that's based only on the pattern I see. Vram usage is identical with it on or off when using the cached configuration. Let me know if there's anything else I can try.

SDNext Dev Branch
RTX 3060 12gb		Driver 555.99
Windows 11 Pro 23H2		Torch 2.4.1+cu124
64 gb DDR4 3600mhz cl 18		RainponyXL
Ryzen 5700x3d		sdxl fp16 fixed vae

3 Run averages, 1024x Resolution
	Cached Config Off		Cached config On
Vae Tiling	On	Off	On	Off
Vae decode (secs)	3.08	3.67	3.89	3.91
Active	8103	10983	11001	10935
reserved	7368	7357	7476	7410
used	8452	8444	8560	8494
free	3836	3844	3728	3794

vladmandic · 2024-09-13T13:01:10Z

ah, i may have found it. seems like vae was not typecast to fp16 if config was specified. so even if upcast is disabled, its pointless since its loaded as fp32.

update and try to reproduce. if issue persists, update here and i'll reopen.
and upload full log for both runs with and without config.
before running test, set env variable SD_VAE_DEBUG=true

zaxwashere · 2024-10-01T20:08:25Z

cached config OFF.log
cached config ON.log

Issue still persists. I've attached screenshots of the webui generation info + screenshots of task manager during each run. Cached config uses significantly more vram and starts using shared memory.

I used a fresh instance of sdnext dev without extensions. I ran 2 generations and attached the logs with --debug and sd_vae_debug=true env variable.

vladmandic · 2024-10-01T21:49:48Z

i've reopened if someone wants to take a shot at it.
i consider this very low priority since its not reproducible AND workaround is well known.

tampadesignr · 2024-10-04T00:05:59Z

my system spiked twice and crashed my system, just saying hes not the only one.
when i tested same model in invoke system stayed stable.

vladmandic · 2024-10-04T00:59:23Z

my system spiked twice and crashed my system, just saying hes not the only one.
when i tested same model in invoke system stayed stable.

general statements without logs or any info on platform or settings are not helpful.

tampadesignr · 2024-10-04T01:09:59Z

#3471
couldnt find any info to any of those questions and i feel like the answer to those holds some info related to this
answer those questions in detail and well come back to this

vladmandic · 2024-10-04T01:15:08Z

#3471 couldnt find any info to any of those questions and i feel like the answer to those holds some info related to this answer those questions in detail and well come back to this

that item not related at all.

tampadesignr · 2024-10-04T01:16:57Z

there is an issue with how your system is handling diffusers.

vladmandic · 2024-10-04T01:19:32Z

there is an issue with how your system is handling diffusers.

maybe there is. create an issue and document it. do not post random comments on completely unrelated issues.

vladmandic added question Further information is requested cannot reproduce Reported issue cannot be easily reproducible labels Sep 11, 2024

vladmandic added help wanted Extra attention is needed and removed question Further information is requested labels Sep 11, 2024

vladmandic closed this as completed Sep 13, 2024

vladmandic reopened this Oct 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Issue]: High VRAM usage during Vae step #3416

[Issue]: High VRAM usage during Vae step #3416

zaxwashere commented Sep 10, 2024

vladmandic commented Sep 11, 2024

zaxwashere commented Sep 11, 2024

vladmandic commented Sep 11, 2024

zaxwashere commented Sep 12, 2024

vladmandic commented Sep 12, 2024 •

edited

Loading

zaxwashere commented Sep 12, 2024

vladmandic commented Sep 13, 2024

zaxwashere commented Oct 1, 2024

vladmandic commented Oct 1, 2024 •

edited

Loading

tampadesignr commented Oct 4, 2024

vladmandic commented Oct 4, 2024

tampadesignr commented Oct 4, 2024

vladmandic commented Oct 4, 2024

tampadesignr commented Oct 4, 2024

vladmandic commented Oct 4, 2024

[Issue]: High VRAM usage during Vae step #3416

[Issue]: High VRAM usage during Vae step #3416

Comments

zaxwashere commented Sep 10, 2024

Issue Description

Version Platform Description

Relevant log output

Backend

UI

Branch

Model

Acknowledgements

vladmandic commented Sep 11, 2024

zaxwashere commented Sep 11, 2024

vladmandic commented Sep 11, 2024

zaxwashere commented Sep 12, 2024

vladmandic commented Sep 12, 2024 • edited Loading

zaxwashere commented Sep 12, 2024

vladmandic commented Sep 13, 2024

zaxwashere commented Oct 1, 2024

vladmandic commented Oct 1, 2024 • edited Loading

tampadesignr commented Oct 4, 2024

vladmandic commented Oct 4, 2024

tampadesignr commented Oct 4, 2024

vladmandic commented Oct 4, 2024

tampadesignr commented Oct 4, 2024

vladmandic commented Oct 4, 2024

vladmandic commented Sep 12, 2024 •

edited

Loading

vladmandic commented Oct 1, 2024 •

edited

Loading