Weaponize the Scout
After completing this drill, you will have used the strongest available frontier model to audit your Harness Spec and map it to at least three different backends — cloud, local, and agentic coding environments.
Why this matters
This drill teaches the most important meta-skill in the course: using the frontier to stay current with the frontier. The specific models named in this drill will be different in six months. Grok 5 will have shipped. The next Claude generation will be out. Some local model you've never heard of will be eating benchmarks for breakfast. That's fine. You won't need to retake a course. You'll run this drill pattern: feed your Harness Spec to whatever the strongest current model is, ask it to audit your setup and tell you the best current implementations, apply judgment, and update your spec and routing table. Done in 30 minutes.

This is not a one-time exercise. It's a quarterly habit. The scout is the mechanism that keeps your harness permanently current.

**April 2026 landscape:**

Cloud:
- **Claude Sonnet/Opus 4.6** — deepest reasoning, best for synthesis and complex architecture decisions
- **Gemini 3.1 Pro** — excellent reasoning + math + multimodal, strong long-context, best value tier
- **Grok 4.20** — real-time web access, fast agentic, strong for current-events and research tasks with live data
- **GPT-5.4 (Pro/Thinking variants)** — all-rounder breadth, best for unfamiliar domains

Local:
- **Qwen3.5-72B+** — flagship open-weight, runs on Apple Silicon M3 Max/Ultra or consumer GPUs, comparable to mid-tier cloud for most tasks
- **GLM-5** — strong multilingual, excellent IS + EN + research
- **DeepSeek R2 variants** — code-specialized, exceptional efficiency
- **Stack:** Ollama (simplest setup, Mac/Linux), vLLM (production GPU), LM Studio (desktop GUI), OpenWebUI (browser front-end)

Agentic environments:
- **Claude Code** — terminal-native agentic sessions (what you've been using)
- **Cursor** — best full-IDE experience with Claude 4.6 backend
- **Continue.dev** — VS Code + local model integration
- **Windsurf** — team-grade agentic IDE
How to do it
1. **Choose your scout: the strongest available frontier model today**
   April 2026: Claude Opus 4.6 and Gemini 3.1 Pro are both excellent scouts. Use whichever you have access to. The scout's job is to have broader awareness of the current landscape than you do.
2. **Run the Scout Audit prompt with your Harness Spec**
   Feed the scout your HARNESS_SPEC.md from Drill 6.1. The prompt below asks it to map your spec to the current best implementations across cloud, local, and agentic environments.
3. **Apply judgment to the scout's recommendations**
   The scout proposes. You decide. Not every recommendation is right for your context. If you handle sensitive client data, the scout saying 'use Grok for research' needs to be filtered through 'but Grok is a cloud API with data retention policies.' Use a local model for that task instead.
4. **Set up at least one local model**
   Install Ollama (ollama.com, 5 minutes). Pull Qwen3.5-72B or a smaller variant that fits your hardware. Test your Harness Spec against it. This is your privacy-first backend.
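A minimal sketch of step 4 in the terminal. Note that the exact model tags below are assumptions about what the Ollama registry will carry; check `ollama list` and the registry for what actually exists before pulling:

```shell
# Install Ollama (macOS/Linux; Windows has a separate installer on ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model sized for your hardware. Tag names are assumptions:
ollama pull qwen3.5:7b        # small variant: ~8GB RAM, no GPU needed
# ollama pull qwen3.5:72b     # flagship: needs high-memory Apple Silicon or a big GPU

# Smoke-test your Harness Spec against the local model
ollama run qwen3.5:7b "$(cat HARNESS_SPEC.md)

Task: Summarize my routing rules in three bullets."
```

If the summary comes back coherent, the local backend understands your spec and you can start routing privacy-sensitive tasks to it.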
5. **Update your routing table in HARNESS_SPEC.md**
   Add a "Current Backend Map" section: task type → recommended tool → rationale → date last reviewed. This is your living routing table. Update it quarterly (or when a major model drops).
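As a sketch, the "Current Backend Map" section in HARNESS_SPEC.md might look like this — the routings shown are illustrative placeholders, not recommendations for your context:

```markdown
## Current Backend Map (last reviewed: 2026-04)

| Task type                | Recommended tool     | Rationale                                 | Last reviewed |
| ------------------------ | -------------------- | ----------------------------------------- | ------------- |
| Deep synthesis           | Claude Opus 4.6      | Strongest reasoning for architecture work | 2026-04       |
| Live-data research       | Grok 4.20            | Real-time web access                      | 2026-04       |
| Client-sensitive drafts  | Qwen3.5-72B (local)  | Data never leaves the machine             | 2026-04       |
| Agentic coding           | Claude Code          | Terminal-native sessions                  | 2026-04       |
```

The "Last reviewed" column is what makes the table living rather than permanent: a stale date is your cue to re-run this drill.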
The prompt
I have a personal AI harness specification (HARNESS_SPEC.md) that defines how I work. I need you to act as my intelligence officer and audit this spec against the current AI landscape.

Here is my spec:

[PASTE HARNESS_SPEC.md]

Please provide:

1. **Cloud model recommendations** — For each task type in my routing logic, which current frontier model (Claude 4.6 family, Gemini 3.1 Pro, Grok 4.20, GPT-5.4 variants) is best suited, and why? Be specific about model strengths vs my specific task types.
2. **Local model options** — Which tasks in my spec would be better served by local models for privacy, cost, or offline reasons? What is the current best open-weight option (Qwen3.5-72B, GLM-5, DeepSeek variants), and what hardware do I need to run it?
3. **Agentic environment fit** — Given my working patterns, should I be using Claude Code, Cursor, Continue.dev, Windsurf, or a combination? What does each give me that the others don't?
4. **Gaps in my spec** — Are there task types common to my domain that my spec doesn't cover? What routing logic am I missing?
5. **Top 3 actions** — The three changes to my setup that would have the highest impact this week. Be specific. Use actual model names, actual tool names. No generic advice.
A second prompt, for the local-model setup in step 4:

I want to set up a local AI model for privacy-sensitive work.

My hardware: [DESCRIBE YOUR MACHINE — e.g. "MacBook Pro M3 Max 96GB" or "Windows PC, RTX 4090 24GB"]

Based on my Harness Spec: [PASTE HARNESS_SPEC.md relevant sections]

Please recommend:

1. The best local model for my hardware and use case
2. The exact Ollama commands to install and run it
3. How to test that my Harness Spec works against it
4. What I gain vs lose compared to cloud models for my specific tasks
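Item 3 of that prompt can also be checked directly: Ollama serves a local REST API on port 11434, so once a model is pulled you can hit `/api/generate` yourself. The model tag below is an assumption; substitute whatever `ollama list` actually shows on your machine:

```shell
# Query the local Ollama server directly. With "stream": false the server
# returns a single JSON object whose "response" field holds the answer.
# Paste the relevant part of your Harness Spec into the prompt string.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen3.5:7b",
       "prompt": "Given my harness spec (pasted here), summarize my routing rules.",
       "stream": false}'
```

The same endpoint is what desktop front-ends like OpenWebUI talk to, so a working curl call means any of those tools will work too.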
Success criteria
- ✓ You ran the Scout Audit prompt and received specific backend recommendations
- ✓ At least one local model is installed and running (via Ollama or equivalent)
- ✓ HARNESS_SPEC.md has a "Current Backend Map" section with task → tool → rationale
- ✓ You applied your own judgment to at least one scout recommendation (accepted, rejected, or modified)
- ✓ You know the quarterly cadence: when a major model drops, you run this drill again
Common mistakes
Accepting all scout recommendations without judgment
→ The scout knows the landscape. You know your constraints. Privacy requirements, client agreements, cost limits, offline needs — these filter which recommendations are actually right for you. The scout proposes. You decide.
Skipping local model setup because "it seems complicated"
→ Ollama installs in 5 minutes. Pulling Qwen3.5:7b (fits on any machine with 8GB RAM) takes another 5 minutes. You do not need a GPU. The smallest model is a proof of concept for the capability — upgrade to 72B when your hardware allows.
Treating the routing table as permanent after this drill
→ The routing table has a 'date last reviewed' field for a reason. The model landscape changes every 3–6 months. Set a calendar reminder: every major new release → run Drill 6.2 again → update the table. 30 minutes. No new course needed.