Run AI Models Locally on Your Mac

llama.cpp is the easiest way to run powerful AI models locally on your Mac. No cloud, no API costs, 100% private.

Step 1: Install llama.cpp

brew install llama.cpp

This installs llama.cpp's command-line tools, including llama-server, which can run any model in GGUF format.
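To confirm the install worked, ask the binary for its version; recent llama.cpp builds print their build number with:

# Should print the llama.cpp version/build
llama-server --version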

Step 2: Download a Model

We recommend Qwen3.6-27B at Q4_K_M quantization (about 16GB) - Claude-level performance that runs on a Mac mini with 32GB of RAM.

# Create models folder
mkdir -p ~/.openclaw/workspace/models
cd ~/.openclaw/workspace/models

# Download Qwen3.6-27B (16GB)
# Use Hugging Face CLI or download from:
# https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
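If you prefer to pre-download instead of letting llama-server fetch the file in Step 3, here is a sketch using the Hugging Face CLI. The exact .gguf filename below is an assumption - copy the real one from the repo's file listing:

# One-time: install the Hugging Face CLI
pip install -U "huggingface_hub[cli]"

# Download only the Q4_K_M file into the models folder
# (filename is illustrative - check the repo page for the actual name)
huggingface-cli download unsloth/Qwen3.6-27B-GGUF \
  Qwen3.6-27B-Q4_K_M.gguf --local-dir ~/.openclaw/workspace/models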

Step 3: Run the Model

llama-server -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
  -c 262144 \
  --port 8080

This starts a local AI server at http://localhost:8080 with:

  • -c 262144: 262,144-token (256K) context window (huge!)
  • --port 8080: Web interface and OpenAI-compatible API on port 8080
  • -hf: Auto-download from Hugging Face on first run; the file goes to llama.cpp's own cache, so the manual download in Step 2 is optional
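The same port also serves an OpenAI-compatible API, so you can smoke-test the server from another terminal with curl:

# Send one chat message to the local server
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in five words."}]}'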

Step 4: Use the Web Interface

Open your browser to http://localhost:8080 and start chatting with your local AI!

Step 5: Connect OpenClaw (Optional)

To use this model with OpenClaw, add it to your config. llama-server exposes an OpenAI-compatible API under /v1, which is what baseUrl points at:

# In your OpenClaw config, add:
models:
  - id: ollama/qwen3.6-27b-local
    provider: ollama
    model: qwen3.6-27b
    baseUrl: http://localhost:8080/v1
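Before restarting OpenClaw, it's worth confirming the endpoint that baseUrl points at is answering; llama-server lists its loaded model at the standard OpenAI models route:

# Should return a JSON list containing the loaded model
curl http://localhost:8080/v1/models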

Performance Tips

  • Q4_K_M (4-bit): Best balance - 16GB, fast
  • Q6_K (6-bit): Better quality - 22GB, slower
  • Q8_0 (8-bit): Best quality - 28GB, slowest (to switch quants, see the sketch after this list)
  • Close other apps to free up RAM
  • First run downloads the model (one-time)
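Switching quants only requires changing the tag after the colon in the -hf argument. A sketch for the 6-bit build, assuming the repo publishes a Q6_K file:

# Same server, higher-quality 6-bit weights (~22GB)
llama-server -hf unsloth/Qwen3.6-27B-GGUF:Q6_K \
  -c 262144 \
  --port 8080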

Troubleshooting

Model too slow? Try a smaller model like Qwen2.5-7B or Gemma-7B.
Out of memory? Use a lower quant (Q3_K_M = 12GB) or close other apps.
Port already in use? Change --port 8080 to --port 8081 (the sketch below shows how to find what's holding the port).
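If a port conflict keeps coming back, you can check which process is listening before picking a new port (standard macOS tooling, nothing llama.cpp-specific):

# Show the process currently listening on port 8080
lsof -i :8080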

Resources