A Hacker News thread flagged a practical guide for running Claude Code workflows completely offline on an M3 Pro Mac using Qwen3.6. The setup targets air-gapped environments where no external API calls are allowed.
Model: Qwen3.6 | Hardware: M3 Pro (36 GB unified) | Context: 128k tokens
Environment: Air-gapped macOS | License: Apache 2.0
What It Is and How It Works
The handbook details an air-gapped deployment that replaces the hosted model behind Claude Code with Qwen3.6 running locally via MLX on Apple Silicon. The system loads quantized weights into unified memory and exposes a local endpoint that accepts the same prompt formats used by Claude Code.
No network access occurs after the initial model download. The M3 Pro's 36 GB of unified memory holds the 32B-parameter model at 4-bit quantization while leaving headroom for the IDE and terminal processes.
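The memory claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative and not taken from the handbook: the function name is hypothetical, and it counts only the raw 4-bit weights. Real quantized formats also store per-group scaling metadata, and inference needs room for the KV cache and activations, so actual usage lands higher (the guide reports a 21.4 GB peak).

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes).

    Counts raw weight bits only; quantization metadata, KV cache and
    activations are deliberately left out of this estimate.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


UNIFIED_MEMORY_GB = 36.0   # M3 Pro configuration from the article
REPORTED_PEAK_GB = 21.4    # peak usage reported by the guide

weights_gb = weight_footprint_gb(32, 4)             # 32B parameters at 4 bits
headroom_gb = UNIFIED_MEMORY_GB - REPORTED_PEAK_GB  # left for IDE, terminal, OS

print(f"raw 4-bit weights: {weights_gb:.1f} GB")
print(f"headroom at reported peak: {headroom_gb:.1f} GB")
```

The gap between the raw weight estimate and the reported peak is the part taken up by everything else the runtime keeps in memory during generation.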
Hardware Benchmarks on M3 Pro
The guide reports concrete numbers from the M3 Pro configuration:
- 42 tokens per second at 4-bit quantization for 8k context
- 28 tokens per second at 8k context with 4k output
- Peak memory usage: 21.4 GB during generation
- Cold start time: 8 seconds from launch to first token
These figures come from direct tests on the 12-core M3 Pro with 36 GB RAM.
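To put the throughput figures in perspective, a rough wall-clock estimate can be derived from them. This is a simplified model, not something the guide provides: it assumes a constant decode rate, treats "4k output" as 4096 tokens, and simply adds the reported cold start on top. The function name is hypothetical.

```python
def generation_seconds(output_tokens: int,
                       tokens_per_second: float,
                       cold_start_s: float = 0.0) -> float:
    """Rough wall-clock time to produce `output_tokens` at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return cold_start_s + output_tokens / tokens_per_second


# Assumption: "4k output" means 4096 tokens, generated at the reported 28 tokens/s.
warm = generation_seconds(4096, 28)
cold = generation_seconds(4096, 28, cold_start_s=8)

print(f"warm server: {warm:.0f} s, including cold start: {cold:.0f} s")
```

Under these assumptions a long 4k-token response takes roughly two and a half minutes, which is worth keeping in mind when judging whether the setup fits an interactive workflow.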
| Model | Tokens/s (M3 Pro) | Quant | Peak RAM | Context |
|---|---|---|---|---|
| Qwen3.6 | 42 | 4-bit | 21.4 GB | 128k |
| Llama 3.1 70B | 19 | 4-bit | 38 GB | 128k |
| Qwen2.5-Coder 32B | 37 | 4-bit | 19.8 GB | 32k |
How to Try It
The handbook provides exact steps for the air-gapped setup:
- Download the MLX-compatible Qwen3.6 weights on a connected machine.
- Transfer the model files via USB to the target M3 Pro.
- Install MLX and the required Python environment using the provided requirements.txt.
- Launch the local server with the command `python server.py --model qwen3.6-4bit --port 8080`.
- Configure the Claude Code client to point at `http://localhost:8080`.
Full commands and environment files are listed in the source handbook.
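Because the weights cross the air gap on removable media, it can be worth confirming that the files arrived intact before launching the server. The following is a minimal sketch of one way to do that with SHA-256 checksums. It is not part of the handbook, and the function names are hypothetical; only the Python standard library is used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB weight shards never load fully into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to model_dir) to its SHA-256 digest."""
    return {
        p.relative_to(model_dir).as_posix(): sha256_of(p)
        for p in sorted(model_dir.rglob("*"))
        if p.is_file()
    }


def find_mismatches(model_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing, unexpected, or whose digest differs."""
    actual = build_manifest(model_dir)
    return sorted(
        name
        for name in manifest.keys() | actual.keys()
        if manifest.get(name) != actual.get(name)
    )
```

One possible workflow: run `build_manifest` on the connected machine, save the result as JSON next to the weights on the USB drive, then call `find_mismatches` on the M3 Pro after copying. An empty list means every file matches the manifest.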
Pros and Cons
- Pros: Full offline operation, 42 tokens/s on consumer hardware, Apache 2.0 license, 128k context support.
- Cons: Requires manual model transfer, no automatic updates, 21+ GB RAM needed for comfortable use, limited ecosystem compared with hosted Claude.
Alternatives and Comparisons
Developers can also use Llama 3.1 70B or Qwen2.5-Coder 32B in the same air-gapped setup. Qwen3.6 offers the best speed-to-memory ratio on M3 Pro silicon among the three.
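The speed-to-memory comparison can be reproduced from the benchmark table. The sketch below defines the ratio as tokens per second divided by peak RAM in GB; that definition is an assumption, since the guide does not spell out how it computes the ratio.

```python
# (model, tokens/s on M3 Pro, peak RAM in GB), copied from the benchmark table
benchmarks = [
    ("Qwen3.6", 42, 21.4),
    ("Llama 3.1 70B", 19, 38.0),
    ("Qwen2.5-Coder 32B", 37, 19.8),
]

# Assumed metric: throughput per GB of peak memory (higher is better).
ratios = {name: tps / ram_gb for name, tps, ram_gb in benchmarks}
best = max(ratios, key=ratios.get)

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.2f} tokens/s per GB")
print(f"best ratio: {best}")
```

Under this definition Qwen3.6 comes out ahead, with Qwen2.5-Coder 32B close behind and Llama 3.1 70B a distant third.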
Who Should Use This
Teams working in classified or regulated environments that prohibit external API calls will find this setup useful. Individual developers who already own an M3 Pro with 36 GB RAM can adopt it for consistent offline coding assistance. Skip this route if you need the absolute highest coding benchmark scores or frequent model updates.
Bottom line: Qwen3.6 on an M3 Pro runs fast enough to stand in for Claude Code in fully air-gapped conditions, without requiring enterprise hardware.
The approach shows that current 32B-class models already meet practical thresholds for local software engineering work on Apple Silicon.
