Tara Suzuki

Posted on Jul 1

How to Run Flux on 8GB VRAM in 2026: The GGUF Low-VRAM Guide

#ai #imagegen #flux #comfyui

Short answer (2026): Yes, you can run Flux on an 8GB GPU. The trick is GGUF quantization in ComfyUI — it shrinks Flux.1-dev from ~23GB down to 5–7GB. Use Q4_K_S (~6.8GB) for the best speed/quality balance on 8GB, grab the quantized GGUF T5 text encoder (not the fp16 one), and launch with --lowvram. For the smallest, fastest option, Flux.2 [klein] 4B runs in ~2.6GB.

Best quant for 8GB: Q4_K_S (sweet spot) or Q5_K_S (higher quality)
Must-do: use the quantized T5, not fp16 (the fp16 T5 alone is ~9GB)
Launch flag: --lowvram
Lightest option: Flux.2 [klein] 4B (~2.6GB at Q4_K_M)

Why GGUF is the answer

Full-precision Flux.1-dev needs ~23GB of VRAM — far beyond an 8GB card. GGUF quantization compresses the model's weights with minimal quality loss, bringing it down to a size that fits. As of early 2026, GGUF support in the image world is primarily a ComfyUI feature via city96's ComfyUI-GGUF extension.

Pick the right quant level

The number after Q is roughly the number of bits stored per weight, so it trades size for quality: a higher number means better images but more VRAM.

Quant	Approx size	Notes
Q4_K_S	~6.8 GB	Sweet spot for 8GB — leaves headroom for computation
Q5_K_S	~7+ GB	~95% of original quality; tighter fit
Q8	largest	Highest quality, usually too big for 8GB
Flux.2 [klein] 4B (Q4_K_M)	~2.6 GB	Smallest/fastest, license-friendly, ~4 steps

Q4 already produces genuinely usable images; the jump to Q5 is a small quality gain for a tighter fit.
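The sizes in the table follow roughly from parameter count times bits per weight. Here is a minimal sketch of that arithmetic, assuming Flux.1-dev has about 12 billion parameters and using the nominal bits-per-weight figures of the llama.cpp block formats (4.5 for Q4_K, 5.5 for Q5_K, 8.5 for Q8_0). Real GGUF files keep some tensors at higher precision, so treat the result as a ballpark, not an exact file size.

```python
# Rough GGUF size estimate: parameters * bits-per-weight / 8.
# Assumptions (not from the article): ~12e9 parameters for Flux.1-dev and
# nominal bits-per-weight values for the llama.cpp block formats.
PARAMS = 12e9

BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_S": 5.5,
    "Q4_K_S": 4.5,
}


def estimate_gb(params: float, bits_per_weight: float) -> float:
    """Return the approximate size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name:7s} ~{estimate_gb(PARAMS, bpw):.1f} GB")
```

Under these assumptions Q4_K_S works out to about 6.75 GB, close to the ~6.8 GB quoted above. Q5_K_S comes out larger still, which is consistent with it being the tighter fit.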

Step-by-step (ComfyUI)

1. Install ComfyUI-GGUF. In ComfyUI, open the Custom Nodes Manager, search "GGUF," install ComfyUI-GGUF, and restart. (Or git clone https://github.com/city96/ComfyUI-GGUF into ComfyUI/custom_nodes and install its requirements.txt.)
2. Download the GGUF model (e.g., Flux.1-dev Q4_K_S) into ComfyUI/models/unet/ (or diffusion_models/).
3. Download the quantized T5 encoder. This is the step everyone gets wrong: grab the GGUF T5, not the fp16 one. The fp16 T5 alone is ~9GB and won't fit alongside the model on 8GB. Put it in ComfyUI/models/text_encoders/.
4. Add the VAE (ae.safetensors) to ComfyUI/models/vae/ and the CLIP-L encoder to ComfyUI/models/text_encoders/.
5. Load a low-VRAM GGUF workflow. ComfyUI Manager's workflow browser ships pre-built Flux GGUF workflows; use one instead of wiring from scratch. The workflow uses the Unet Loader (GGUF) node in place of the standard diffusion loader, and the GGUF T5 likewise needs the DualCLIPLoader (GGUF) node from the same extension.
6. Launch with --lowvram. This enables partial/sequential loading so the whole model never has to sit in VRAM at once, which makes it the workhorse flag for 6–8GB cards.
7. Queue a test prompt. If an image appears in the preview node, you're running Flux on 8GB.
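The folder layout from the steps above can be sanity-checked before launching. This is a minimal sketch, assuming ComfyUI lives in a directory you pass in and is started through its main.py entry point. The helper names (EXPECTED_DIRS, missing_dirs, launch_command) are hypothetical, and the check only looks for folders, not for specific model files.

```python
import sys
from pathlib import Path

# Folders named in the steps above, relative to the ComfyUI root.
# Each entry lists acceptable alternatives for one requirement.
EXPECTED_DIRS = [
    ("custom_nodes/ComfyUI-GGUF",),               # the GGUF loader nodes
    ("models/unet", "models/diffusion_models"),   # quantized Flux .gguf model
    ("models/text_encoders",),                    # GGUF T5 and CLIP-L
    ("models/vae",),                              # ae.safetensors
]


def missing_dirs(comfy_root: Path) -> list[str]:
    """Return a description of each requirement with no matching folder."""
    missing = []
    for alternatives in EXPECTED_DIRS:
        if not any((comfy_root / d).is_dir() for d in alternatives):
            missing.append(" or ".join(alternatives))
    return missing


def launch_command(comfy_root: Path) -> list[str]:
    """Build the command that starts ComfyUI with --lowvram."""
    return [sys.executable, str(comfy_root / "main.py"), "--lowvram"]


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("ComfyUI")
    missing = missing_dirs(root)
    if missing:
        print("Missing folders:", "; ".join(missing))
    else:
        print("Launch with:", " ".join(launch_command(root)))
```

Run it with the path to your ComfyUI folder as the only argument. It prints either the folders still missing or the launch command to use.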

Common mistakes

Using the fp16 T5. The single most common out-of-memory cause on 8GB. Always use the GGUF T5.
fp8 confusion. weight_dtype = fp8_e4m3fn lives in the Load Diffusion Model node and applies to fp8 .safetensors, not GGUF files. The --fp8_e4m3fn-unet command-line flag is often ignored by Flux's loader — set fp8 in the node, not the CLI.
Over-reaching on quant. If you OOM at Q5, drop to Q4_K_S before touching anything else.
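The fp16 T5 mistake is easy to catch by file size. This is a minimal sketch, assuming the encoders sit in models/text_encoders/ as in the steps above and that any non-GGUF file above a size threshold deserves a second look. The function name and the 6 GB default are illustrative choices, not ComfyUI settings.

```python
from pathlib import Path

GB = 1_000_000_000


def oversized_encoders(encoder_dir: Path, limit_bytes: int = 6 * GB) -> list[str]:
    """List non-GGUF files in encoder_dir that are larger than limit_bytes."""
    flagged = []
    for path in sorted(encoder_dir.iterdir()):
        # Skip subfolders and anything already in GGUF format.
        if not path.is_file() or path.suffix.lower() == ".gguf":
            continue
        if path.stat().st_size > limit_bytes:
            flagged.append(path.name)
    return flagged


if __name__ == "__main__":
    folder = Path("ComfyUI/models/text_encoders")
    if folder.is_dir():
        for name in oversized_encoders(folder):
            print(f"Large non-GGUF encoder, possibly the fp16 T5: {name}")
```

A small CLIP-L file stays under the default threshold, while a ~9GB fp16 T5 would be flagged.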

New to ComfyUI? First decide whether it's even the right tool with our Fooocus vs ComfyUI guide, then follow our full walkthrough on installing Flux in ComfyUI.

Frequently asked questions

Can you really run Flux on 8GB VRAM?

Yes. With GGUF quantization (Q4_K_S is the 8GB sweet spot) plus the quantized T5 encoder and --lowvram, Flux.1-dev runs on an 8GB card. Flux.2 [klein] 4B runs in as little as ~2.6GB.

What's the best Flux GGUF quant for 8GB?

Q4_K_S (~6.8GB) balances quality and headroom best. Q5_K_S keeps ~95% of original quality but fits more tightly.

Why do I keep running out of memory?

The usual culprit is loading the fp16 T5 text encoder (~9GB) instead of the quantized GGUF T5. Swap it and most 8GB OOM errors disappear.

Is GGUF quality noticeably worse?

At Q5 the loss is minimal for most uses, and even Q4 produces usable images. The size savings far outweigh the small quality dip on consumer GPUs.

Conclusion

An 8GB GPU is no longer a barrier to Flux. Install city96's GGUF nodes, pick Q4_K_S, use the quantized T5, and launch with --lowvram — that's the whole game. Running Flux on a low-VRAM card? Share your GPU and steps/sec in the comments.

PromptZone - Leading AI Community for Prompt Engineering and AI Enthusiasts
