Skip to main content
6.4

Live Stress Test

Estimated time: 1 week of real work (30 min admin overhead)Tool: 3+ AI backends of your choice + your normal work tools
After this drill, you can:

After this drill, you have one week of real professional work on record — run across at least three different backends. The harness is proven portable. You graduate self-reliant.

Why this matters

This is the graduation proof. Not a demo. Not a test case. One actual week of your real work, using your real projects, for your real clients or goals — across at least three different backends. This matters because the only way to know your harness works is to use it when something is actually at stake. When Anna is on deadline for a policy brief and the tool she usually uses is down. When Jón needs to deliver client code and his primary backend has a rate limit. When Linh's brand harness needs to run on a model that doesn't log her clients' content. The stress test breaks things. That's the point. Every breakage is a gap in the harness. You fix it. The harness gets stronger. At the end of the week, you write GRADUATION.md — not a report card but a statement of what you built, how it held up, and what you'd do differently. Then you are done with courses. You have a system. The only limit is your imagination.

How to do it

  1. 1

    Define your three backends before the week starts

    Pick three backends you will actively use: (1) Your primary cloud model for standard work, (2) Your local model for privacy-sensitive tasks, (3) A second cloud model for comparison/fallback. Write them down in HARNESS_SPEC.md under "Active Backends."

  2. 2

    Run your real work through all three during the week

    Do not stage tasks. Use your actual Monday morning workload. Route each task through your router. Note when it works, when it fails, when the output quality difference is significant.

  3. 3

    Log breakage points daily (3 minutes per day)

    A task that the router misrouted. A backend that failed at a specific task type. A quality gap that surprised you. A speed difference that changed your behavior. Log it in a simple STRESS_TEST_LOG.md — date, task type, what broke, what you did.

  4. 4

    Run a mid-week router update

    On Wednesday or Thursday, look at your log. Are any routing decisions consistently wrong? Update the router. Deploy the update. Run the rest of the week with the improved routing.

  5. 5

    Write GRADUATION.md

    End of week. Write it honestly. What you built. What backends you ran. What broke. What you fixed. What you would do differently. One paragraph at the end: what "self-reliant" means for your specific professional context.

The prompt

PROMPT — Pre-Week Advisor (anticipate failure modes)Model: Claude Opus 4.6
I'm running a week-long stress test of my multi-backend AI harness. Here is my current setup:

Harness Spec: [PASTE HARNESS_SPEC.md]
Active backends: [LIST YOUR THREE BACKENDS]
Router: [PASTE YOUR ROUTING TABLE]

I need you to act as my week-one advisor. Before I start:
1. What are the 3 most likely failure modes given my task types and my chosen backends?
2. What should I be watching for that I probably won't notice on my own?
3. What is the single most important thing to get right in the first 2 days?

After I complete the week, I'll share my breakage log and you'll help me write GRADUATION.md.
PROMPT — Write GRADUATION.mdModel: Claude Opus 4.6
I have completed my one-week live stress test of my multi-backend harness. Here are my results:

My Harness Spec: [PASTE HARNESS_SPEC.md]
Stress test log: [PASTE STRESS_TEST_LOG.md]
Changes I made mid-week: [DESCRIBE ROUTER UPDATES]

Help me write GRADUATION.md. It should include:
1. What I built (the harness — what it contains, how it works)
2. What I ran it on (the three backends, how each performed)
3. What broke and what I fixed
4. What I would do differently if starting today
5. One paragraph: what "self-reliant" means for my specific professional context

Make it honest, specific, and something I'll actually want to read in a year.

Success criteria

  • One week of real professional work completed across at least 3 backends
  • STRESS_TEST_LOG.md exists with at least 2 logged breakage events
  • Router was updated at least once mid-week based on breakage findings
  • GRADUATION.md written — honest, specific, includes what "self-reliant" means for your context
  • You can teach your harness setup to someone else without notes

Common mistakes

Using staged/artificial tasks instead of real work

Use Monday's actual email backlog. Use the report that is due Friday. The stress test only works if something is at stake. Artificial tasks produce artificial results. Real work breaks real things.

Writing a generic GRADUATION.md ("the course was great, I learned so much")

GRADUATION.md is a professional document, not a course review. Write it as if you're handing a new colleague your harness and explaining how you use it. Specific models, specific task types, specific tradeoffs. Generic is a symptom of not having done the real work.

Not updating the router mid-week when it routes incorrectly

A router that misroutes and is not corrected is worse than no router — it builds false confidence. If you notice a bad routing decision, fix it immediately. The mid-week update is not optional.