Claude-Level AI, Running Locally

🎯 Why Qwen3.6-27B? This model beats Claude 4.5 Opus on vision tasks (82.9 vs 80.7) and comes close on math (94.1 vs 95.1) - all while running locally on your Mac!

What You'll Get

  • 27B parameters - Serious AI power
  • 262K context - Upload entire books
  • ~16GB RAM footprint - fits comfortably on a 32GB Mac mini
  • 100% local - No cloud, no tracking
  • Free forever - No API costs

Prerequisites

  • Mac with 32GB RAM (16GB can work with the smaller Q3_K_M quantization)
  • 20GB free disk space
  • llama.cpp installed (see previous guide)
  • Stable internet (for initial download)
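Before downloading, you can sanity-check the RAM and disk prerequisites from the terminal with two standard macOS commands:

```shell
# Total installed RAM (macOS reports it in bytes; convert to GB)
sysctl -n hw.memsize | awk '{printf "%.0f GB RAM installed\n", $1/1024/1024/1024}'

# Free space on the volume that will hold the model
df -h ~ | awk 'NR==2 {print $4 " free"}'
```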

Step 1: Download the Model

The model is hosted on Hugging Face. We'll use the Q4_K_M quantization (best balance of quality and size).

# Create models directory
mkdir -p ~/.openclaw/workspace/models
cd ~/.openclaw/workspace/models

# Download using curl (16.8GB)
curl -L "https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/Qwen3.6-27B-Q4_K_M.gguf" \
  -o Qwen3.6-27B-Q4_K_M.gguf
⏱️ Download time: ~30-60 minutes on typical home internet. This is a one-time download!
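If the download gets interrupted, curl can pick up where it left off instead of starting over. Re-run the same command with -C - added:

```shell
# -C - tells curl to continue a partial download from its current size
curl -L -C - "https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/Qwen3.6-27B-Q4_K_M.gguf" \
  -o Qwen3.6-27B-Q4_K_M.gguf
```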

Step 2: Verify Download

ls -lh Qwen3.6-27B-Q4_K_M.gguf
# Should show ~16G file size

Step 3: Run the Model

llama-server \
  -m ~/.openclaw/workspace/models/Qwen3.6-27B-Q4_K_M.gguf \
  -c 262144 \
  --port 8080 \
  -ngl 99

Flags explained:

  • -m: Path to model file
  • -c 262144: 262K context window
  • --port 8080: Serve the web UI and API on port 8080
  • -ngl 99: Offload all layers to GPU (Metal)
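Besides the web UI, llama-server exposes an OpenAI-compatible API, so you can script against the model. A minimal example, assuming the default port from the command above:

```shell
# Send a chat request to the local server's OpenAI-compatible endpoint
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "max_tokens": 64
  }'
```

The response comes back as JSON in the same shape the OpenAI API uses, so existing client libraries can point at http://localhost:8080/v1 unchanged.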

Step 4: Access the Web UI

Open http://localhost:8080 in your browser. You'll see a chat interface where you can:

  • Chat with the model
  • Upload documents (PDF, TXT)
  • Adjust temperature, max tokens
  • View generation stats

Step 5: Test the Model

Try these prompts to see what it can do:

Writing:
"Write a 300-word intro for my AI coaching business. Tone: warm, confident, not salesy."
Analysis:
"Summarize this article in 5 bullet points. Focus on actionable insights."
Coding:
"Write a Python function that scrapes a website and extracts all links."

Performance Benchmarks

Benchmark            Qwen3.6-27B   Claude 4.5 Opus
Math (AIME)          94.1          95.1
Vision (MMMU)        82.9          80.7
Coding (SWE-Bench)   77.2          ~80

Alternative Quantizations

  • Q3_K_M (12GB): If you have less RAM, slightly lower quality
  • Q5_K_M (20GB): Better quality, needs more RAM
  • Q6_K (22GB): Even better, diminishing returns
  • Q8_0 (28GB): Best quality, not worth the extra size for most
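Each quantization follows the same download pattern. Assuming the repo names its files the same way as the Q4_K_M build (check the Files tab to confirm the exact filenames), swapping the suffix is all it takes:

```shell
# Example: grab the 12GB Q3_K_M build instead of Q4_K_M
QUANT=Q3_K_M
curl -L "https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/Qwen3.6-27B-${QUANT}.gguf" \
  -o "Qwen3.6-27B-${QUANT}.gguf"
```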

Troubleshooting

Model loads slowly?
The first load takes time to initialize; subsequent loads are faster.
Out of memory errors?
Close other apps, or use the Q3_K_M quantization (12GB).
Generation is slow?
Reduce the context size (-c 131072) or use a smaller model.
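For memory-constrained setups, the two knobs that matter most are the context size (-c) and GPU offload (-ngl). A lower-footprint launch might look like this; the right -ngl value is machine-dependent, so treat 40 as a starting point and adjust:

```shell
# Halved context plus partial GPU offload to reduce memory pressure;
# layers not offloaded by -ngl run on the CPU instead
llama-server \
  -m ~/.openclaw/workspace/models/Qwen3.6-27B-Q4_K_M.gguf \
  -c 131072 \
  --port 8080 \
  -ngl 40
```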

Resources