Can you provide inference time data for Stable Diffusion (w4a4 vs full precision fp32) on GPU/CPU? #8

badhri-intel · 2024-04-23T18:41:58Z

The paper says that w4a4 quantization can theoretically yield an 8x inference speedup. Could you please confirm this for SD, or share what speedup (inference latency) you observed? Thanks
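For context, the 8x figure presumably comes from the bit-width ratio alone (32-bit fp32 vs 4-bit weights and activations). The sketch below only does that arithmetic. It is an idealized upper bound, not a measurement. The function names and the parameter count are hypothetical, and the sketch ignores quantization scales/zero-points, dequantization overhead, and whether the target GPU/CPU actually has fast 4-bit kernels.

```python
def theoretical_ratio(full_bits: int = 32, quant_bits: int = 4) -> float:
    """Idealized speedup/compression from bit width alone (hypothetical helper)."""
    return full_bits / quant_bits


def weight_memory_mib(num_params: int, bits: int) -> float:
    """Weight storage in MiB at a given bit width; ignores scales and zero-points."""
    return num_params * bits / 8 / 2**20


# Hypothetical parameter count, chosen only for illustration.
params = 100_000_000

ratio = theoretical_ratio()  # fp32 -> 4-bit
fp32_mib = weight_memory_mib(params, 32)
w4_mib = weight_memory_mib(params, 4)
print(f"ideal ratio: {ratio:.1f}x, fp32: {fp32_mib:.1f} MiB, 4-bit: {w4_mib:.1f} MiB")
```

Measured end-to-end latency on real hardware can fall well short of this ratio, which is why measured numbers for SD would be useful.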
