Zuri O'Brien

Sakana Fugu: New Model from Sakana AI

Sakana AI released Fugu, a compact bilingual model optimized for Japanese and English tasks. The project first gained traction on Hacker News with 142 points and 83 comments.

Model: Sakana Fugu | Parameters: 7B | Speed: 38 tokens/s (RTX 4090, 4-bit) | License: Apache 2.0

What It Is and How It Works

Fugu combines a 7B transformer backbone with Sakana’s evolutionary model merging technique. The model was trained on a 120B token mix of Japanese web text and English technical corpora. It supports both text generation and lightweight instruction following without separate fine-tunes.

The architecture uses grouped-query attention and a 32k context window. No external retrieval is required for standard prompts.
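
To see why grouped-query attention matters at a 32k context, here is a rough sketch of how the key/value cache scales with the number of KV heads. The layer count, head dimension, and head counts below are hypothetical values typical of models in this size class, not Fugu's published configuration, and "32k" is assumed to mean 32,768 tokens.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of the KV cache for one sequence, in bytes (fp16 by default)."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


SEQ_LEN = 32 * 1024   # assumption: "32k" means 32,768 tokens
N_LAYERS = 32         # hypothetical, not from the Fugu release
HEAD_DIM = 128        # hypothetical, not from the Fugu release

# Full multi-head attention: every one of 32 query heads has its own KV head.
mha = kv_cache_bytes(N_LAYERS, 32, HEAD_DIM, SEQ_LEN)
# Grouped-query attention: 32 query heads share 8 KV heads (hypothetical split).
gqa = kv_cache_bytes(N_LAYERS, 8, HEAD_DIM, SEQ_LEN)

print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB")
```

Under these assumptions the cache shrinks by the ratio of query heads to KV heads, which is what makes a long context practical on a consumer GPU.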

Benchmarks and Performance Numbers

Early testers report 38 tokens per second on an RTX 4090 at 4-bit quantization, with a VRAM footprint of 4.1 GB. On Japanese-to-English translation, Fugu scores 41.2 BLEU on the JESC test set.
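
As a sanity check on the reported footprint, the arithmetic below estimates the size of the weights alone. It assumes "7B" means exactly 7 billion parameters and exactly 4 bits per weight; real 4-bit formats also store scaling metadata, so treat this as a lower bound rather than a measurement.

```python
def quantized_weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Storage needed for the weights alone, ignoring quantization metadata."""
    return n_params * bits_per_weight / 8


N_PARAMS = 7e9      # assumption: "7B" taken as exactly 7 billion
REPORTED_GB = 4.1   # footprint reported by early testers

weights_gb = quantized_weight_bytes(N_PARAMS, 4) / 1e9
remainder_gb = REPORTED_GB - weights_gb

print(f"weights: {weights_gb:.2f} GB, everything else: {remainder_gb:.2f} GB")
```

The remainder would have to cover quantization scales, the KV cache, and runtime buffers, so a reported 4.1 GB is plausible for short prompts.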

Feature         | Sakana Fugu | Llama-3-8B | Qwen2-7B
Tokens/s (4090) | 38          | 31         | 34
Japanese BLEU   | 41.2        | 28.7       | 37.9
VRAM (4-bit)    | 4.1 GB      | 5.2 GB     | 4.8 GB
License         | Apache 2.0  | Llama 3    | Apache 2.0
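
The gaps in the table are easier to judge as relative differences. The sketch below only restates the reported figures. It is not a new measurement, and the source does not say whether all three models were benchmarked under identical settings.

```python
# Figures copied from the comparison table above.
RESULTS = {
    "Sakana Fugu": {"tokens_per_s": 38, "bleu": 41.2, "vram_gb": 4.1},
    "Llama-3-8B": {"tokens_per_s": 31, "bleu": 28.7, "vram_gb": 5.2},
    "Qwen2-7B": {"tokens_per_s": 34, "bleu": 37.9, "vram_gb": 4.8},
}


def compare(name: str, baseline: str = "Sakana Fugu") -> dict:
    """Return how the baseline model differs from the named model."""
    a, b = RESULTS[baseline], RESULTS[name]
    return {
        "speedup_pct": (a["tokens_per_s"] / b["tokens_per_s"] - 1) * 100,
        "bleu_gain": a["bleu"] - b["bleu"],
        "vram_saved_gb": b["vram_gb"] - a["vram_gb"],
    }


for name in ("Llama-3-8B", "Qwen2-7B"):
    c = compare(name)
    print(f"vs {name}: {c['speedup_pct']:+.1f}% tokens/s, "
          f"{c['bleu_gain']:+.1f} BLEU, {c['vram_saved_gb']:.1f} GB less VRAM")
```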

How to Try It

Download the weights from the official repository and run with llama.cpp or vLLM.

git clone https://github.com/sakana-ai/fugu
cd fugu && pip install -r requirements.txt
python -m fugu.chat --model fugu-7b-q4

An Ollama tag is also available: ollama run sakana/fugu.
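
Once the Ollama tag is pulled, the model can also be called from a script through Ollama's local HTTP API. The sketch below is a minimal example that assumes an Ollama server is running on its default port and that the sakana/fugu tag resolves as stated above. The helper names are illustrative and not part of any Fugu tooling.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "sakana/fugu") -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "sakana/fugu") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(generate("Translate into English: 今日は良い天気ですね。"))
```

Setting "stream" to False returns a single JSON object instead of a sequence of chunks, which keeps the example short.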

Pros and Cons

Pros:

  • Strong Japanese performance at small size
  • Apache 2.0 license allows commercial use
  • Runs on consumer GPUs with low VRAM

Cons:

  • Limited English reasoning compared with larger models
  • No built-in tool-calling or agent scaffolding yet

Alternatives and Comparisons

Llama-3-8B and Qwen2-7B remain the main local alternatives. Fugu leads on Japanese benchmarks while trailing slightly on English MMLU. Developers needing bilingual output without 20+ GB VRAM now have a clear third option.

Who Should Use This

Researchers and developers building Japanese-facing chatbots or translation tools will benefit most. Teams focused solely on English reasoning or multi-agent workflows should continue with larger general models.

Bottom Line / Verdict

Fugu gives practitioners a practical, Apache-licensed model that closes the Japanese performance gap at 7B scale.

Sakana’s merging approach suggests further small, high-quality bilingual models will follow within months.
