Short answer (2026): Yes, you can run Flux on an 8GB GPU. The trick is GGUF quantization in ComfyUI — it shrinks Flux.1-dev from ~23GB down to 5–7GB. Use Q4_K_S (~6.8GB) for the best speed/quality balance on 8GB, grab the quantized GGUF T5 text encoder (not the fp16 one), and launch with --lowvram. For the smallest, fastest option, Flux.2 [klein] 4B runs in ~2.6GB.
- Best quant for 8GB: Q4_K_S (sweet spot) or Q5_K_S (higher quality)
- Must-do: use the quantized T5, not fp16 (the fp16 T5 alone is ~9GB)
- Launch flag: --lowvram
- Lightest option: Flux.2 [klein] 4B (~2.6GB at Q4_K_M)
Why GGUF is the answer
Full-precision Flux.1-dev needs ~23GB of VRAM — far beyond an 8GB card. GGUF quantization compresses the model's weights with minimal quality loss, bringing it down to a size that fits. As of early 2026, GGUF support in the image world is primarily a ComfyUI feature via city96's ComfyUI-GGUF extension.
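A back-of-envelope estimate shows where that reduction comes from. The sketch below is illustrative only: it assumes the ~23GB figure corresponds to 16-bit weights and ignores file metadata, and the function name is made up for this example.

```python
def effective_bits_per_weight(quant_gb, full_gb=23.0, full_bits=16):
    """Estimate average bits per weight from the ratio of file sizes.

    Assumes the full-size file stores every weight in `full_bits` bits,
    which is an assumption of this sketch, not a measured property.
    """
    if quant_gb <= 0 or full_gb <= 0:
        raise ValueError("sizes must be positive")
    return full_bits * quant_gb / full_gb


if __name__ == "__main__":
    # 6.8 GB is the approximate Q4_K_S size quoted in this article.
    bits = effective_bits_per_weight(6.8)
    print(f"Q4_K_S: roughly {bits:.1f} bits per weight")
```

Under those assumptions, Q4_K_S works out to a little under 5 bits per weight, compared with 16 for the full-size model.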
Pick the right quant level
The number after Q is roughly the bits stored per weight, so it trades size for quality: higher means better images but more VRAM:
| Quant | Approx size | Notes |
|---|---|---|
| Q4_K_S | ~6.8 GB | Sweet spot for 8GB — leaves headroom for computation |
| Q5_K_S | ~7+ GB | ~95% of original quality; tighter fit |
| Q8 | largest | Highest quality, usually too big for 8GB |
| Flux.2 [klein] 4B (Q4_K_M) | ~2.6 GB | Smallest/fastest, license-friendly, ~4 steps |
Q4 already produces genuinely usable images; the jump to Q5 is a small quality gain for a tighter fit.
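To make "headroom" concrete, the sketch below subtracts each file size from an 8GB budget. It uses only the approximate sizes quoted in this article and treats file size as a stand-in for VRAM use, which is a simplification: real usage also depends on resolution, the text encoder, the VAE, and how much --lowvram offloads. The names are illustrative.

```python
VRAM_GB = 8.0

# Approximate file sizes in GB, as quoted in this article.
SIZES_GB = {
    "Flux.1-dev Q4_K_S": 6.8,
    "Flux.2 [klein] 4B Q4_K_M": 2.6,
    "T5 encoder fp16": 9.0,
}


def headroom_gb(size_gb, vram_gb=VRAM_GB):
    """Return VRAM left over after loading a file of `size_gb`, rounded to 0.1 GB.

    A negative result means the file alone is larger than the card's VRAM.
    """
    return round(vram_gb - size_gb, 1)


if __name__ == "__main__":
    for name, size in SIZES_GB.items():
        left = headroom_gb(size)
        verdict = "fits" if left > 0 else "does not fit on its own"
        print(f"{name}: {left:+.1f} GB headroom ({verdict})")
```

The fp16 T5 comes out negative by itself, which is why swapping it for the quantized encoder matters more than the choice between Q4 and Q5.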
Step-by-step (ComfyUI)
- Install ComfyUI-GGUF. In ComfyUI, open the Custom Nodes Manager, search "GGUF," install ComfyUI-GGUF, and restart. (Or git clone https://github.com/city96/ComfyUI-GGUF into ComfyUI/custom_nodes.)
- Download the GGUF model (e.g., Flux.1-dev Q4_K_S) into ComfyUI/models/unet/ (or diffusion_models/).
- Download the quantized T5 encoder. This is the step everyone gets wrong: grab the GGUF T5, not the fp16 one. The fp16 T5 alone is ~9GB and won't fit alongside the model on 8GB. Put it in models/text_encoders/.
- Add the VAE (ae.safetensors) to models/vae/ and the CLIP-L encoder to models/text_encoders/.
- Load a low-VRAM GGUF workflow. ComfyUI Manager's workflow browser ships pre-built Flux GGUF workflows; use one instead of wiring from scratch. It uses the Unet Loader (GGUF) node in place of the standard diffusion loader.
- Launch with --lowvram. This enables partial/sequential loading so the whole model never has to sit in VRAM at once. It is the workhorse flag for 6–8GB cards.
- Queue a test prompt. If an image appears in the preview node, you're running Flux on 8GB.
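Before launching, it can help to confirm that every file landed in the folder the steps above name. The following is a minimal sketch, not part of ComfyUI: the folder names come from the steps, while the model and encoder file names are hypothetical placeholders to replace with whatever you actually downloaded.

```python
from pathlib import Path

# Paths relative to the ComfyUI folder. The file names below are
# hypothetical examples; only ae.safetensors is named in the steps above.
EXPECTED_PATHS = [
    "custom_nodes/ComfyUI-GGUF",
    "models/unet/flux1-dev-Q4_K_S.gguf",
    "models/text_encoders/t5-xxl-encoder-Q4_K_S.gguf",
    "models/text_encoders/clip_l.safetensors",
    "models/vae/ae.safetensors",
]


def missing_paths(comfy_root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("ComfyUI")
    if missing:
        print("Missing before launch:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected files found.")
```

Once nothing is reported missing, start ComfyUI with the flag from the last steps, typically by running `python main.py --lowvram` from the ComfyUI folder.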
Common mistakes
- Using the fp16 T5. The single most common out-of-memory cause on 8GB. Always use the GGUF T5.
- fp8 confusion. weight_dtype = fp8_e4m3fn lives in the Load Diffusion Model node and applies to fp8 .safetensors files, not GGUF files. The --fp8_e4m3fn-unet command-line flag is often ignored by Flux's loader, so set fp8 in the node, not the CLI.
- Over-reaching on quant. If you OOM at Q5, drop to Q4_K_S before touching anything else.
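If you are unsure which kind of file you have, and so which loader node it belongs in, the first bytes settle it: GGUF files begin with the four ASCII bytes "GGUF". The sketch below is a hypothetical helper, not part of ComfyUI or ComfyUI-GGUF. It identifies .safetensors files by extension only, and the strings it returns are the node names used in this article.

```python
GGUF_MAGIC = b"GGUF"  # GGUF files start with these four ASCII bytes


def loader_for(header, filename):
    """Suggest a loader node from a file's first bytes and its name."""
    if header[:4] == GGUF_MAGIC:
        return "Unet Loader (GGUF)"
    if filename.lower().endswith(".safetensors"):
        # fp8 or fp16 safetensors: weight_dtype is set on this node.
        return "Load Diffusion Model"
    return "unknown"


def loader_for_file(path):
    """Read the first four bytes of `path` and suggest a loader node."""
    with open(path, "rb") as f:
        header = f.read(4)
    return loader_for(header, str(path))
```

For example, `loader_for_file("ComfyUI/models/unet/your-model.gguf")` should suggest the GGUF loader for a genuine GGUF file, where the path shown is a placeholder.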
New to ComfyUI? First decide whether it's even the right tool with our Fooocus vs ComfyUI guide, then follow the full "Install Flux in ComfyUI" walkthrough.
Frequently asked questions
Can you really run Flux on 8GB VRAM?
Yes. With GGUF quantization (Q4_K_S is the 8GB sweet spot) plus the quantized T5 encoder and --lowvram, Flux.1-dev runs on an 8GB card. Flux.2 [klein] 4B runs in as little as ~2.6GB.
What's the best Flux GGUF quant for 8GB?
Q4_K_S (~6.8GB) balances quality and headroom best. Q5_K_S keeps ~95% of original quality but fits more tightly.
Why do I keep running out of memory?
The usual culprit is loading the fp16 T5 text encoder (~9GB) instead of the quantized GGUF T5. Swap it and most 8GB OOM errors disappear.
Is GGUF quality noticeably worse?
At Q5 the loss is minimal for most uses, and even Q4 produces usable images. The size savings far outweigh the small quality dip on consumer GPUs.
Conclusion
An 8GB GPU is no longer a barrier to Flux. Install city96's GGUF nodes, pick Q4_K_S, use the quantized T5, and launch with --lowvram — that's the whole game. Running Flux on a low-VRAM card? Share your GPU and steps/sec in the comments.