Run Claude Code Offline on M3 Pro with Qwen3.6

#ai #llm #tutorial #machinelearning

A Hacker News thread flagged a practical guide for running Claude Code workflows completely offline on an M3 Pro Mac using Qwen3.6. The setup targets air-gapped environments where no external API calls are allowed.

Model: Qwen3.6 | Hardware: M3 Pro (36 GB unified) | Context: 128k tokens
Environment: Air-gapped macOS | License: Apache 2.0

What It Is and How It Works

The handbook details an air-gapped deployment in which Qwen3.6, running locally via MLX on Apple Silicon, stands in for the hosted Claude model behind Claude Code. The system loads quantized weights into unified memory and exposes a local endpoint that accepts the same prompt formats used by Claude Code.

No network access occurs after the initial model download. The M3 Pro's 36 GB of unified memory holds the 32B-parameter model at 4-bit quantization while leaving headroom for the IDE and terminal processes.
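To see where that headroom comes from, here is a rough memory budget built only from the numbers quoted in this article. It is a back-of-the-envelope sketch, not a measurement: it assumes exactly 4 bits per weight and ignores quantization scales, so the real weight files will be somewhat larger.

```python
# Back-of-the-envelope memory budget using figures quoted in the article.
PARAMS = 32e9             # 32B parameters
BITS_PER_WEIGHT = 4       # 4-bit quantization (assumes no per-group scale overhead)
UNIFIED_MEMORY_GB = 36.0  # M3 Pro configuration
PEAK_USAGE_GB = 21.4      # reported peak during generation


def weight_footprint_gb(params: float, bits_per_weight: int) -> float:
    """Raw size of the quantized weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


weights_gb = weight_footprint_gb(PARAMS, BITS_PER_WEIGHT)
# Whatever the peak uses beyond the weights: KV cache, activations, runtime.
runtime_overhead_gb = PEAK_USAGE_GB - weights_gb
# What is left for macOS, the IDE, and terminal processes.
headroom_gb = UNIFIED_MEMORY_GB - PEAK_USAGE_GB

print(f"weights:  {weights_gb:.1f} GB")
print(f"overhead: {runtime_overhead_gb:.1f} GB")
print(f"headroom: {headroom_gb:.1f} GB")
```

Under these assumptions the weights account for about 16 GB of the 21.4 GB peak, which leaves roughly 14.6 GB for everything else on the machine.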

Hardware Benchmarks on M3 Pro

The guide reports concrete numbers from the M3 Pro configuration:

42 tokens per second at 4-bit quantization with an 8k-token context
28 tokens per second with an 8k-token context and 4k tokens of output
Peak memory usage: 21.4 GB during generation
Cold start time: 8 seconds from launch to first token

These figures come from direct tests on the 12-core M3 Pro with 36 GB RAM.
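To translate those throughput figures into waiting time, a quick estimate helps. The sketch below treats the quoted rates as steady decode speeds and takes "4k" to mean 4096 tokens; both are assumptions, and prompt processing time is ignored.

```python
# Rough wall-clock estimate from the reported throughput figures.
COLD_START_S = 8.0       # launch to first token, from the article
RATE_SHORT_OUTPUT = 42.0 # tokens/s reported at 8k context
RATE_LONG_OUTPUT = 28.0  # tokens/s reported at 8k context with 4k output


def generation_seconds(output_tokens: int,
                       tokens_per_second: float,
                       cold_start_s: float = 0.0) -> float:
    """Estimated seconds to produce output_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return cold_start_s + output_tokens / tokens_per_second


# A long 4096-token answer on an already-warm server.
warm = generation_seconds(4096, RATE_LONG_OUTPUT)
# The same answer when the server has to start first.
cold = generation_seconds(4096, RATE_LONG_OUTPUT, COLD_START_S)

print(f"warm: {warm:.0f} s, cold: {cold:.0f} s")
```

At 28 tokens per second, a full 4096-token response takes a little under two and a half minutes, so the 8-second cold start is a small part of the total for long outputs.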

Model	Tokens/s (M3 Pro)	Quantization	Peak RAM	Max context
Qwen3.6	42	4-bit	21.4 GB	128k
Llama 3.1 70B	19	4-bit	38 GB	128k
Qwen2.5-Coder 32B	37	4-bit	19.8 GB	32k

How to Try It

The handbook provides exact steps for the air-gapped setup:

Download the MLX-compatible Qwen3.6 weights on a connected machine.
Transfer the model files via USB to the target M3 Pro.
Install MLX and the required Python environment using the provided requirements.txt.
Launch the local server with the command python server.py --model qwen3.6-4bit --port 8080.
Configure the Claude Code client to point at http://localhost:8080.
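For the last step, Claude Code can be pointed at a different endpoint through the ANTHROPIC_BASE_URL environment variable. Whether the handbook uses that mechanism, and whether its server.py speaks the API Claude Code expects, are assumptions here. The sketch below only builds the environment setting and refuses any URL that is not a loopback address, which is a cheap guard for an air-gapped machine. The helper names are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True only if the URL's host is localhost or a loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname would need DNS, so treat it as non-local.
        return False


def client_env(base_url: str) -> dict[str, str]:
    """Environment overrides for the client (hypothetical helper).

    Assumes the client reads ANTHROPIC_BASE_URL; the handbook may
    configure the endpoint differently.
    """
    if not is_loopback_url(base_url):
        raise ValueError(f"refusing non-local endpoint: {base_url}")
    return {"ANTHROPIC_BASE_URL": base_url}


print(client_env("http://localhost:8080"))
```

The returned mapping can be merged into os.environ before launching the client, or exported by hand in the shell.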

Full commands and environment files are listed in the source handbook.
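The steps above do not say how to confirm that the model files survived the USB transfer intact. One common approach, which is an addition here rather than something taken from the handbook, is to record SHA-256 checksums on the connected machine and compare them on the target. The function names and directory layout below are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weights never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to model_dir) to its SHA-256 digest."""
    return {
        p.relative_to(model_dir).as_posix(): sha256_of(p)
        for p in sorted(model_dir.rglob("*"))
        if p.is_file()
    }


def mismatches(expected: dict[str, str], model_dir: Path) -> list[str]:
    """List files that are missing or whose digest differs from expected."""
    actual = build_manifest(model_dir)
    return sorted(name for name, digest in expected.items()
                  if actual.get(name) != digest)
```

Run build_manifest on the connected machine, carry the resulting mapping across alongside the weights (for example as a JSON file), and call mismatches on the M3 Pro before launching the server. An empty list means every file matched.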

Pros and Cons

Pros: Full offline operation, 42 tokens/s on consumer hardware, Apache 2.0 license, 128k context support.
Cons: Requires manual model transfer, no automatic updates, 21+ GB RAM needed for comfortable use, limited ecosystem compared with hosted Claude.

Alternatives and Comparisons

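Developers can also use Llama 3.1 70B or Qwen2.5-Coder 32B in the same air-gapped setup. Going by the benchmark table, Qwen3.6 offers the best speed-to-memory ratio of the three on M3 Pro silicon: about 2.0 tokens/s per GB of peak RAM, versus about 1.9 for Qwen2.5-Coder 32B and 0.5 for Llama 3.1 70B. The 38 GB peak reported for Llama 3.1 70B also exceeds the 36 GB of unified memory on this machine.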
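The speed-to-memory comparison can be reproduced from the benchmark table. The article does not define the ratio, so the sketch below assumes it means tokens per second divided by peak RAM in GB.

```python
# (tokens/s, peak RAM in GB) taken from the benchmark table in this article.
benchmarks = {
    "Qwen3.6": (42.0, 21.4),
    "Llama 3.1 70B": (19.0, 38.0),
    "Qwen2.5-Coder 32B": (37.0, 19.8),
}


def tokens_per_second_per_gb(tokens_per_second: float, peak_ram_gb: float) -> float:
    """Throughput per GB of peak memory (assumed meaning of the ratio)."""
    return tokens_per_second / peak_ram_gb


# Rank models from best to worst on the assumed metric.
ranked = sorted(
    benchmarks,
    key=lambda name: tokens_per_second_per_gb(*benchmarks[name]),
    reverse=True,
)

for name in ranked:
    ratio = tokens_per_second_per_gb(*benchmarks[name])
    print(f"{name}: {ratio:.2f} tokens/s per GB")
```

On this metric Qwen3.6 comes out only slightly ahead of Qwen2.5-Coder 32B, while Llama 3.1 70B trails well behind both.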

Who Should Use This

Teams working in classified or regulated environments that prohibit external API calls will find this setup useful. Individual developers who already own an M3 Pro with 36 GB RAM can adopt it for consistent offline coding assistance. Skip this route if you need the absolute highest coding benchmark scores or frequent model updates.

Bottom line: Qwen3.6 on an M3 Pro is fast enough to stand in for the hosted model behind Claude Code in fully air-gapped conditions, without requiring enterprise hardware.

The approach shows that current 32B-class models already meet practical thresholds for local software engineering work on Apple Silicon.
