llama.cpp is the easiest way to run powerful AI models locally on your Mac. No cloud, no API costs, 100% private.
Step 1: Install llama.cpp
brew install llama.cpp
This installs the llama.cpp tools, including llama-server, which can run any GGUF-format model.
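A quick sanity check that the binaries landed on your PATH (recent Homebrew builds ship llama-server and llama-cli; exact flag names can vary by version):
# Confirm the install
which llama-server
llama-server --version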
Step 2: Download a Model
We recommend Qwen3.6-27B-Q4_K_M (16GB): Claude-level performance that runs on a Mac mini with 32GB of RAM.
# Create models folder
mkdir -p ~/.openclaw/workspace/models
cd ~/.openclaw/workspace/models
# Download Qwen3.6-27B (16GB)
# Use Hugging Face CLI or download from:
# https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
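If you prefer to download the file yourself rather than letting llama-server fetch it (the -hf flag in Step 3 does that automatically), here is a sketch using the Hugging Face CLI. The exact .gguf filename inside the repo may differ, so treat the include pattern as an assumption:
# One-time setup: pip install -U "huggingface_hub[cli]"
huggingface-cli download unsloth/Qwen3.6-27B-GGUF \
  --include "*Q4_K_M*.gguf" \
  --local-dir ~/.openclaw/workspace/models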
Step 3: Run the Model
llama-server -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
  -c 262144 \
  --port 8080
This starts a local AI server at http://localhost:8080 with:
- -c 262144: a 256K-token (262,144) context window (huge!)
- --port 8080: serves the web interface and API on port 8080
- -hf: auto-downloads the model from Hugging Face on first run
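Once the model finishes loading, you can sanity-check the server from another terminal; llama-server exposes a simple health endpoint:
# Should report an OK status once the model is ready
curl http://localhost:8080/health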
Step 4: Use the Web Interface
Open your browser to http://localhost:8080 and start chatting with your local AI!
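The same server also speaks an OpenAI-compatible API under /v1, which is what OpenClaw connects to in the next step. A minimal request from the terminal (default port assumed):
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello! One sentence, please."}]}'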
Step 5: Connect OpenClaw (Optional)
To use this model with OpenClaw, add it to your config:
# In your OpenClaw config, add:
models:
  - id: ollama/qwen3.6-27b-local
    provider: ollama
    model: qwen3.6-27b
    baseUrl: http://localhost:8080/v1
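After saving the config, you can confirm the endpoint OpenClaw will call is up; llama-server lists its loaded model on the standard models route:
curl http://localhost:8080/v1/models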
Performance Tips
- Q4_K_M (4-bit): Best balance - 16GB, fast (this is the tag used in Step 3; see the example after these tips to switch)
- Q6_K (6-bit): Better quality - 22GB, slower
- Q8_0 (8-bit): Best quality - 28GB, slowest
- Close other apps to free up RAM
- First run downloads the model (one-time)
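To switch quants, reuse the Step 3 command and change the tag after the colon (this assumes the Hugging Face repo publishes that quant):
# Q6_K: better quality, ~22GB
llama-server -hf unsloth/Qwen3.6-27B-GGUF:Q6_K -c 262144 --port 8080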
Troubleshooting
Model too slow?
Try a smaller model like Qwen2.5-7B or Gemma-7B
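The command pattern is the same as in Step 3, and a 7B model at Q4_K_M only needs around 4-5GB. The repo name and quant tag below are illustrative, so check that the GGUF repo you pick actually exists and ships that quant:
llama-server -hf unsloth/Qwen2.5-7B-Instruct-GGUF:Q4_K_M --port 8080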
Out of memory?
Use a lower quant (Q3_K_M = 12GB) or close other apps
Port already in use?
Change --port 8080 to --port 8081
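To see which process is holding the port (macOS ships lsof):
lsof -i :8080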
Resources