What You'll Get
- 27B parameters - Enough capacity for serious writing, analysis, and coding work
- 262K context - Large enough to hold entire books in a single prompt
- ~16GB RAM footprint - Fits comfortably on a 32GB Mac mini
- 100% local - No cloud, no tracking
- Free forever - No API costs
Prerequisites
- Mac with 32GB RAM (16GB works with the smaller Q3_K_M quantization; see Alternative Quantizations below)
- 20GB free disk space
- llama.cpp installed (see previous guide)
- Stable internet (for initial download)
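Before downloading anything, it's worth confirming you have the headroom. Two standard macOS checks (hw.memsize reports installed RAM in bytes):
# Free disk space on your home volume
df -h ~
# Installed RAM in bytes (divide by 1073741824 for GB)
sysctl hw.memsize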
Step 1: Download the Model
The model is hosted on Hugging Face. We'll use the Q4_K_M quantization (best balance of quality and size).
# Create models directory
mkdir -p ~/.openclaw/workspace/models
cd ~/.openclaw/workspace/models
# Download using curl (16.8GB)
curl -L "https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/Qwen3.6-27B-Q4_K_M.gguf" \\
-o Qwen3.6-27B-Q4_K_M.gguf
⏱️ Download time: ~30-60 minutes on typical home internet. This is a one-time download!
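If the download gets interrupted, there's no need to start over: curl's -C - flag resumes from the existing partial file.
# Resume an interrupted download from where it left off
curl -C - -L "https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/Qwen3.6-27B-Q4_K_M.gguf" \
  -o Qwen3.6-27B-Q4_K_M.gguf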
Step 2: Verify Download
ls -lh Qwen3.6-27B-Q4_K_M.gguf
# Should show ~16G file size
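File size alone won't catch a corrupted download. For extra confidence, compare the file's SHA-256 checksum against the one Hugging Face shows when you click the file on the repo's Files tab:
# Compute the SHA-256 checksum (takes a minute or two for a 16GB file)
shasum -a 256 Qwen3.6-27B-Q4_K_M.gguf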
Step 3: Run the Model
llama-server \
  -m ~/.openclaw/workspace/models/Qwen3.6-27B-Q4_K_M.gguf \
  -c 262144 \
  --port 8080 \
  -ngl 99
Flags explained:
- -m: Path to model file
- -c 262144: 262K context window
- --port 8080: Serve the web UI and API on port 8080
- -ngl 99: Offload all layers to GPU (Metal)
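Once the server logs that it's listening, you can confirm from another terminal that the model finished loading. llama-server exposes a /health endpoint that returns a small JSON status (and an error while the model is still loading):
# Should print a status like {"status":"ok"} once the model is ready
curl http://localhost:8080/health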
Step 4: Access the Web UI
Open http://localhost:8080 in your browser. You'll see a chat interface where you can:
- Chat with the model
- Upload documents (PDF, TXT)
- Adjust temperature, max tokens
- View generation stats
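The web UI isn't the only way in: llama-server also exposes an OpenAI-compatible API on the same port, so anything you do in the browser can be scripted. A minimal request might look like this:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Give me three uses for a 262K context window."}],
    "temperature": 0.7,
    "max_tokens": 256
  }'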
Step 5: Test the Model
Try these prompts to see what it can do:
Writing:
"Write a 300-word intro for my AI coaching business. Tone: warm, confident, not salesy."
Analysis:
"Summarize this article in 5 bullet points. Focus on actionable insights."
Coding:
"Write a Python function that scrapes a website and extracts all links."
Performance Benchmarks
| Benchmark | Qwen3.6-27B | Claude 4.5 Opus |
|---|---|---|
| Math (AIME) | 94.1 | 95.1 |
| Vision (MMMU) | 82.9 | 80.7 |
| Coding (SWE-Bench) | 77.2 | ~80 |
Alternative Quantizations
- Q3_K_M (12GB): For machines with less RAM; slightly lower quality
- Q5_K_M (20GB): Better quality, needs more RAM
- Q6_K (22GB): Better still, but diminishing returns
- Q8_0 (28GB): Best quality; not worth the extra size for most users
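To use one of these instead, swap the filename in the Step 1 download command. Assuming the repo publishes each quantization under the matching filename (worth confirming on its Files tab), the smaller build would be:
# Assumed filename based on the repo's naming pattern - verify on the Files tab
curl -L "https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/Qwen3.6-27B-Q3_K_M.gguf" \
  -o Qwen3.6-27B-Q3_K_M.gguf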
Troubleshooting
Model loads slowly?
The first load takes time to initialize; subsequent loads are faster.
Out of memory errors?
Close other apps, or switch to the Q3_K_M quantization (12GB).
Generation is slow?
Reduce the context size (-c 131072) or use a smaller model.
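Putting those last two tips together, a lower-footprint launch (assuming you've downloaded the Q3_K_M file as shown above) might look like:
llama-server \
  -m ~/.openclaw/workspace/models/Qwen3.6-27B-Q3_K_M.gguf \
  -c 131072 \
  --port 8080 \
  -ngl 99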
Resources