FLUX.1 Review: The Open-Source Image Model That Can Actually Spell
FLUX.1 is the best open-source image generation model available right now. Built by Black Forest Labs, a startup founded by key creators of Stable Diffusion, it generates photorealistic images with accurate text rendering, something that eluded image models for years. If you’re a developer building image generation into a product, FLUX.1 is your first stop.
Why FLUX.1 Matters
Black Forest Labs launched FLUX.1 in August 2024 and immediately reset expectations for what open-source image models could do. The founding team includes Robin Rombach and other key architects of Stable Diffusion, and they took everything they learned and built something dramatically better.
The model comes in three variants:
- FLUX.1 [pro] — Commercial API, highest quality, best prompt adherence
- FLUX.1 [dev] — Open-source (non-commercial license), 12B parameters, great for research
- FLUX.1 [schnell] — Open-source (Apache 2.0), optimized for speed, 1-4 inference steps
ELI5: Diffusion Models — Imagine starting with a TV screen full of static noise, then slowly cleaning it up until a clear picture appears. That’s diffusion — the AI learns to turn random noise into images, one small step at a time. More steps usually means more detail.
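The "clean up the static, one step at a time" idea can be shown with a toy loop. This is an illustration only, not FLUX.1's actual sampler (FLUX uses a flow-matching formulation, and a real model predicts each correction with a neural network rather than knowing the target):

```python
import numpy as np

def toy_denoise(noise, target, steps):
    """Illustrative only: walk from pure noise toward a 'clean image'
    in small increments, like a diffusion sampler's denoising loop."""
    x = noise.copy()
    for i in range(steps):
        # Each step removes a fraction of the remaining noise.
        # A real diffusion model predicts this correction per step.
        x = x + (target - x) / (steps - i)
    return x

rng = np.random.default_rng(0)
noise = rng.normal(size=(8, 8))
target = np.ones((8, 8))  # stand-in for a clean image
out = toy_denoise(noise, target, steps=20)
```

With more steps, each correction is smaller and the path is smoother, which is the intuition behind "more steps, more detail" and behind Schnell's trick of getting away with only 1-4 steps.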
Image Quality
In our testing, FLUX.1 [pro] produces images that are frequently indistinguishable from professional photography. Skin textures look natural, lighting is physically plausible, and compositions feel intentional rather than AI-generated.
The standout feature is text rendering. Ask FLUX.1 to generate a coffee shop storefront with a sign reading “The Daily Grind” and it will actually spell it correctly. Ask for a protest sign, a book cover, a product label — it gets the text right the vast majority of the time. This was essentially impossible with Stable Diffusion XL just a year earlier.
Resolution and output: Default output is 1024x1024, with support for common aspect ratios (16:9, 4:3, 9:16 for mobile). Upscaling pipelines on platforms like fal.ai can push output to 2048x2048 and beyond.
ELI5: CFG Scale (Classifier-Free Guidance) — Think of CFG scale as a “how literally should I follow your instructions?” dial. Low values give the AI more creative freedom. High values make it stick closely to your prompt — but crank it too high and the image starts looking oversaturated and weird.
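The "how literally" dial corresponds to a simple formula: the sampler computes one prediction with your prompt and one without, then extrapolates from the unconditional one toward the conditional one. A minimal sketch:

```python
import numpy as np

def cfg_combine(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: push the model's prediction away from
    its unconditional output and toward the prompt-conditioned one.
    scale = 1.0 just follows the prompt prediction; higher values
    exaggerate the prompt's influence (and, eventually, the artifacts)."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

uncond = np.array([0.0, 0.0])
cond = np.array([1.0, 2.0])
low = cfg_combine(uncond, cond, 1.0)   # equals the conditional prediction
high = cfg_combine(uncond, cond, 7.5)  # extrapolated well past it
```

The "oversaturated and weird" failure mode at high values is this extrapolation overshooting.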
Where to Use FLUX.1
FLUX.1 is available across every major inference platform, a key advantage over Midjourney, which offers no public API:
| Platform | Model | Price per Image | Notes |
|---|---|---|---|
| fal.ai | Pro, Dev, Schnell | $0.03-0.05 | Fastest inference, great API |
| Replicate | Dev, Schnell | ~$0.03 | Simple API, pay-per-prediction |
| Together AI | Schnell | ~$0.02 | Cheapest for bulk generation |
| BFL API | Pro | ~$0.05 | Direct from Black Forest Labs |
| Local (ComfyUI) | Dev, Schnell | Free (your GPU) | Full control, no API limits |
In our testing, fal.ai consistently delivered the fastest generation times — under 3 seconds for Schnell, about 10 seconds for Dev at default settings.
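Calling a hosted FLUX endpoint is an ordinary JSON-over-HTTP request. The sketch below builds a request body for a [schnell] endpoint; the field names and endpoint URL are illustrative, since fal.ai, Replicate, and the BFL API each use slightly different schemas. Check your provider's docs before wiring this up:

```python
import json
import urllib.request

def build_flux_request(prompt, width=1024, height=1024, steps=4):
    """Request body for a hosted FLUX.1 [schnell] endpoint.
    Field names are illustrative; consult your provider's API reference."""
    return {
        "prompt": prompt,
        "image_size": {"width": width, "height": height},
        "num_inference_steps": steps,  # schnell targets 1-4 steps
    }

def send(payload, url, api_key):
    """POST the payload to a provider endpoint (hypothetical URL/auth
    scheme; not called here so the sketch stays offline)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Key {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_flux_request("a coffee shop sign reading 'The Daily Grind'")
```

Most providers return either a URL to the generated image or base64-encoded bytes; the response shape is provider-specific.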
FLUX.1 vs the Competition
FLUX.1 vs Midjourney v6.1: Midjourney produces more stylistically distinctive images with stronger artistic direction. FLUX.1 wins on photorealism, text rendering, API availability, and price. If you’re building a product, FLUX.1 is the obvious choice. If you’re creating concept art by hand, Midjourney still has an edge.
FLUX.1 vs DALL-E 3: FLUX.1 produces higher-fidelity images with fewer artifacts. DALL-E 3 is more convenient if you’re already in ChatGPT and has stronger safety guardrails. FLUX.1’s API pricing is significantly cheaper at scale.
FLUX.1 vs Stable Diffusion 3: FLUX.1 is what SD3 should have been. Same DNA (literally the same founding researchers), but FLUX.1 has better prompt adherence, better text rendering, and better photorealism out of the box. SD3 Medium was a disappointment; FLUX.1 was the response.
ELI5: LoRA (Low-Rank Adaptation) — A LoRA is like a small add-on brain for an AI model. Instead of retraining the whole model (expensive), you train a small adapter that teaches it one new thing — like drawing in a specific art style or generating a specific person’s face. LoRAs are tiny files you can swap in and out.
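The "small add-on brain" is concrete in the matrix shapes: instead of updating a large weight matrix W, LoRA trains two thin matrices whose product is a low-rank correction. A numpy sketch with illustrative layer sizes (not FLUX.1's actual dimensions):

```python
import numpy as np

d_in, d_out, rank = 4096, 4096, 16  # illustrative sizes, not FLUX's

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, starts at 0

def lora_forward(x):
    # Base output plus the low-rank correction. With B = 0 the adapter
    # is a no-op, so training starts exactly from the base model.
    return W @ x + B @ (A @ x)

full_params = W.size
lora_params = A.size + B.size
ratio = lora_params / full_params  # 0.0078: under 1% of the layer's weights
```

That parameter ratio is why a LoRA is a tiny file you can swap in and out, and why training one takes minutes instead of the weeks needed for the base model.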
Fine-Tuning and Customization
The open-source variants of FLUX.1 support LoRA training, which means you can customize the model for your specific use case. Common applications:
- Brand consistency: Train on your brand’s visual style for consistent marketing assets
- Product photography: Train on your product for instant mockups and lifestyle shots
- Character consistency: Maintain a consistent character across multiple generations
- Style transfer: Lock in a specific artistic style
Training a FLUX.1 LoRA typically takes 15-30 images and about 30 minutes on a single A100 GPU. Platforms like Replicate and fal.ai offer one-click fine-tuning APIs.
Limitations
FLUX.1 is not perfect. The open-source [dev] model has a non-commercial license, which means you need [pro] for production use. Image generation can produce anatomical oddities — extra fingers still happen, though far less frequently than with older models. And while text rendering is much improved, very long passages of text or small font sizes can still break down.
The model also requires significant GPU resources to run locally. If you don’t have a 12GB+ VRAM GPU, you’re relying on API providers.
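The VRAM floor follows from back-of-envelope arithmetic on the 12B-parameter transformer (weights only; the text encoders, VAE, and activations add several more GB on top, so treat these as lower bounds):

```python
params = 12e9  # FLUX.1 [dev]/[schnell] transformer, ~12B parameters

def weight_gb(bytes_per_param):
    """Memory for the model weights alone at a given precision."""
    return params * bytes_per_param / 1024**3

bf16_gb = weight_gb(2)  # ~22.4 GB: why 24GB cards run [dev] comfortably
fp8_gb = weight_gb(1)   # ~11.2 GB: why 12GB cards need quantization
```

This is why quantized or CPU-offloaded setups are the norm on consumer GPUs.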
ELI5: Checkpoints — A checkpoint is a saved snapshot of a model’s brain at a specific point during training. Think of it like a save file in a video game. The community shares custom checkpoints that have been trained further for specific styles or capabilities.
Who Should Use FLUX.1
Developers building products: FLUX.1’s API ecosystem is unmatched among open-source models. If you’re adding image generation to an app, start here.
Designers and marketers: The text rendering capability alone makes FLUX.1 the best model for marketing assets, social media graphics, and mockups.
Hobbyists and tinkerers: The open-source variants and ComfyUI integration make FLUX.1 the most hackable high-quality image model available.
Not ideal for: Purely artistic/creative exploration (Midjourney’s aesthetic sense is stronger) or users who want a simple chat-based interface (DALL-E 3 via ChatGPT is easier).
Frequently Asked Questions
What is the difference between FLUX.1 Pro, Dev, and Schnell?
FLUX.1 [pro] is the commercial API model with the highest quality. FLUX.1 [dev] is open-source under a non-commercial license — great for research and experimentation. FLUX.1 [schnell] is also open-source and optimized for speed, generating images in 1-4 steps instead of 20-50. For production apps, use Pro. For tinkering, use Dev. For speed, use Schnell.
How much does FLUX.1 cost to use?
FLUX.1 [dev] and [schnell] are free to run locally if you have the hardware (a GPU with at least 12GB VRAM). For API access, pricing varies by platform: fal.ai charges roughly $0.03-0.05 per image, Replicate charges about $0.03 per image, and Together AI offers competitive per-step pricing. FLUX.1 [pro] via the BFL API starts around $0.05 per image.
Can FLUX.1 render text in images accurately?
Yes — FLUX.1 is one of the best image models for text rendering. It can reliably spell words on signs, t-shirts, book covers, and logos. It’s not perfect with very long text or unusual fonts, but it’s a massive leap over Stable Diffusion and earlier models that mangled text almost every time.
What hardware do I need to run FLUX.1 locally?
FLUX.1 [dev] requires a GPU with at least 12GB VRAM (RTX 4070 or better). FLUX.1 [schnell] can run on 8GB VRAM GPUs with optimizations. For the full [dev] model without quantization, 24GB VRAM (RTX 4090) is recommended. On Apple Silicon Macs, the M2 Pro/Max and later can run it with decent performance.