Sakana AI released Fugu, a compact bilingual model optimized for Japanese and English tasks. The project first gained traction on Hacker News with 142 points and 83 comments.
Model: Sakana Fugu | Parameters: 7B | Speed: 38 tokens/s (RTX 4090, 4-bit) | License: Apache 2.0
What It Is and How It Works
Fugu combines a 7B transformer backbone with Sakana’s evolutionary model merging technique. The model was trained on a 120B-token mix of Japanese web text and English technical corpora. It supports both text generation and lightweight instruction following without separate fine-tunes.
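To give a feel for what "evolutionary merging" means in general, here is a toy sketch. It is not Sakana's actual recipe: it assumes two tiny "models" represented as plain lists of weights, merges them by linear interpolation with a single coefficient, and searches for that coefficient with a simple mutate-and-keep-the-best loop. The names merge, toy_loss, and evolve_alpha are hypothetical, and the squared-error loss stands in for a real benchmark score.

```python
import random


def merge(weights_a, weights_b, alpha):
    """Linearly interpolate two weight vectors: alpha * a + (1 - alpha) * b."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(weights_a, weights_b)]


def toy_loss(weights, target):
    """Squared error against a target vector (stand-in for a benchmark score)."""
    return sum((w - t) ** 2 for w, t in zip(weights, target))


def evolve_alpha(weights_a, weights_b, target,
                 generations=50, children=8, sigma=0.1, seed=0):
    """Hill-climbing evolutionary search over one merge coefficient.

    Each generation samples mutated coefficients around the current best
    and keeps a mutation only if it lowers the loss.
    """
    rng = random.Random(seed)
    best_alpha = 0.5  # start from an even blend of the two parents
    best_loss = toy_loss(merge(weights_a, weights_b, best_alpha), target)
    for _ in range(generations):
        for _ in range(children):
            # Gaussian mutation, clamped to the valid range [0, 1]
            candidate = min(1.0, max(0.0, best_alpha + rng.gauss(0.0, sigma)))
            loss = toy_loss(merge(weights_a, weights_b, candidate), target)
            if loss < best_loss:
                best_alpha, best_loss = candidate, loss
    return best_alpha, best_loss


# Hypothetical parents and a target that an 80/20 blend would match exactly.
parent_a = [1.0, 0.0]
parent_b = [0.0, 1.0]
target = merge(parent_a, parent_b, 0.8)
alpha, loss = evolve_alpha(parent_a, parent_b, target)
print(f"best alpha={alpha:.3f}, loss={loss:.6f}")
```

Real merging pipelines search over many coefficients (for example per layer) and score candidates on held-out tasks, but the loop has the same shape: propose a merge, evaluate it, keep it if it is better.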
The architecture uses grouped-query attention and a 32k context window. No external retrieval is required for standard prompts.
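Grouped-query attention matters most at long context, because the key/value cache grows with the number of KV heads. The sketch below works through that arithmetic. The layer count, head counts, and head dimension are hypothetical placeholders (roughly the shape of other 7B to 8B models), not Fugu's published configuration, and "32k" is assumed to mean 32,768 tokens with 16-bit cache values.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Size of the key/value cache for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


CONTEXT = 32 * 1024  # assumption: "32k" means 32,768 tokens

# Hypothetical config: 32 layers, head_dim 128, 32 query heads.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=CONTEXT)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=CONTEXT)

print(mha / 2**30, gqa / 2**30)  # 16.0 4.0 (GiB)
```

Under these assumed numbers, sharing each KV head across four query heads cuts the full-context cache from 16 GiB to 4 GiB, which is the kind of saving that makes a 32k window practical on a single consumer GPU.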
Benchmarks and Performance Numbers
Early testers report 38 tokens per second on an RTX 4090 at 4-bit quantization. Memory footprint sits at 4.1 GB. On Japanese-to-English translation, Fugu scores 41.2 BLEU on the JESC test set.
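The reported footprint is easy to sanity-check with back-of-the-envelope arithmetic. This sketch assumes exactly 7 billion parameters, exactly 4 bits per weight, and decimal gigabytes; real 4-bit formats store extra scale data, so treat the result as a lower bound rather than a measurement.

```python
def quantized_weight_bytes(n_params, bits_per_weight):
    """Bytes needed to store the weights alone at a given bit width."""
    return n_params * bits_per_weight / 8


weights_gb = quantized_weight_bytes(7e9, 4) / 1e9
overhead_gb = round(4.1 - weights_gb, 2)  # 4.1 GB is the figure reported above

print(weights_gb, overhead_gb)  # 3.5 0.6
```

The weights alone come to about 3.5 GB, leaving roughly 0.6 GB of the reported 4.1 GB for everything else. That remainder is plausibly quantization scales, a short-context KV cache, and runtime buffers, though the source does not break it down.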
| Metric | Sakana Fugu | Llama-3-8B | Qwen2-7B |
|---|---|---|---|
| Tokens/s (4090) | 38 | 31 | 34 |
| Japanese-to-English BLEU (JESC) | 41.2 | 28.7 | 37.9 |
| VRAM (4-bit) | 4.1 GB | 5.2 GB | 4.8 GB |
| License | Apache 2.0 | Llama 3 | Apache 2.0 |
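To put the table's gaps in relative terms, the short script below computes the percentage difference between Fugu and each alternative. The numbers are copied from the table; the helper name relative_change is just illustrative.

```python
# Figures copied from the comparison table above.
TABLE = {
    "Sakana Fugu": {"tokens_per_s": 38, "bleu": 41.2, "vram_gb": 4.1},
    "Llama-3-8B": {"tokens_per_s": 31, "bleu": 28.7, "vram_gb": 5.2},
    "Qwen2-7B": {"tokens_per_s": 34, "bleu": 37.9, "vram_gb": 4.8},
}


def relative_change(value, baseline):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


fugu = TABLE["Sakana Fugu"]
for name, row in TABLE.items():
    if name == "Sakana Fugu":
        continue
    for metric in ("tokens_per_s", "bleu", "vram_gb"):
        delta = relative_change(fugu[metric], row[metric])
        print(f"Fugu vs {name}, {metric}: {delta:+.1f}%")
```

Positive values mean Fugu's figure is higher, so for VRAM a negative number is the favorable direction.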
How to Try It
Download the weights from the official repository and run with llama.cpp or vLLM.
```shell
git clone https://github.com/sakana-ai/fugu
cd fugu && pip install -r requirements.txt
python -m fugu.chat --model fugu-7b-q4
```
An Ollama tag is also available: ollama run sakana/fugu.
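If you go the Ollama route, you can also call the model from a script through Ollama's local HTTP API. This is a minimal sketch, assuming an Ollama server is running on its default port (11434) and that the sakana/fugu tag mentioned above has been pulled. The helper names build_request and generate are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is serving locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="sakana/fugu"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="sakana/fugu", timeout=120):
    """Send the prompt and return the generated text.

    Raises urllib.error.URLError if no server is listening.
    """
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

With the server up, calling generate with a prompt such as "Translate into English: 今日は良い天気ですね。" returns the completion as a plain string. Setting stream to False keeps the example simple; for chat-style UIs you would likely want streaming instead.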
Pros and Cons
Pros:
- Strong Japanese performance at small size
- Apache 2.0 license allows commercial use
- Runs on consumer GPUs with low VRAM

Cons:
- Limited English reasoning compared with larger models
- No built-in tool-calling or agent scaffolding yet
Alternatives and Comparisons
Llama-3-8B and Qwen2-7B remain the main local alternatives. Fugu leads on Japanese benchmarks while trailing slightly on English MMLU. Developers who need bilingual output on a single consumer GPU now have a third option.
Who Should Use This
Researchers and developers building Japanese-facing chatbots or translation tools will benefit most. Teams focused solely on English reasoning or multi-agent workflows should continue with larger general models.
Bottom Line / Verdict
Fugu gives practitioners a practical, Apache-licensed model that closes the Japanese performance gap at 7B scale.
Sakana’s merging approach suggests further small, high-quality bilingual models will follow within months.