Source linked

Qwen 3.6 27B Hits 32 tok/s on a Laptop, Beats Mid-2025 Frontier Models

On a Macbook Max M5, the 8-bit quantized 27B model delivers 32 tokens/s with MTP, scoring 37 on Artificial Analysis-matching GPT-5 and Claude Sonnet 4.5.

qwenalibabalocal modelsllama cppartificial analysislarge language models

30 tokens per second on a laptop. That's Qwen 3.6 27B running 8-bit quantized with multi-token prediction on a Macbook Max M5—solidly within typical frontier model API speeds. Piotr Migdał of Quesma calls it the first local model that actually works as general intelligence, and the numbers back him up.

Why 27B Beats the MoE Variant for Quality

Qwen 3.6 ships in two flavors: a 35B mixture-of-experts (A3B) and a dense 27B. The MoE variant cranks 93 tok/s but skimps on reasoning depth. Migdał found it ignored instructions—he asked for a Node package and got a single index.html. The 27B, slower at 32 tok/s with MTP, produces outputs that stand up to real work. He prompted it to build a hexagonal minesweeper with pnpm and it worked on the first go, proper package and all.

Benchmarks: Matching GPT-5 on a Laptop

Artificial Analysis scores Qwen 3.6 27B at 37, placing it alongside mid-2025 GPT-5 and Claude Sonnet 4.5. The 35B MoE scores 32 (early 2025 o3/Claude 4 Sonnet territory). That's a massive jump for a model that fits in 28 GB RAM at 8-bit. For reference, DeepSeek V4 Flash at Q2-Q4 needs 103 GB and only pushes 33 tok/s. Qwen 3.6 27B hits near parity while running on consumer hardware—one Hacker News user reports 50 tok/s on an RTX 5090 at Q6_K with 123k context.

Running It Locally: llama.cpp and Quantization

No need for Ollama. Grab the GGUF from unsloth on Hugging Face (8-bit quantized, MTP support) and fire up llama-server with -ngl 999 -fa on -c 65536 --jinja. Migdał recommends this over Ollama on ethical grounds. The MLX variant for Apple Silicon actually underperforms llama.cpp here—llama.cpp uses 95% GPU efficiently. For vibe coding with OpenCode, point it at http://127.0.0.1:8080/v1 and you're set.

Qwen 3.6 27B makes cloud API subscriptions optional for many development workflows. For the first time, a local model competes with the best cloud offerings on quality while delivering usable latency.


Source: Qwen 3.6 27B is the sweet spot for local development
Domain: quesma.com

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.