# Designing agent guardrails at the frontier
Powerful agents demand powerful safeguards. This manifesto proposes a layered safety architecture for frontier-scale systems.
## 1) Separate powers by process
- Isolate high-risk capabilities into dedicated MCP servers with their own policies.
## 2) Escalate scope explicitly
- Default to least privilege; escalate with evidence and approvals.
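
As a rough illustration of escalation with evidence and approvals, here is a minimal sketch. The `Session` and `EscalationRequest` names, the default `read` scope, and the two-approver rule are all assumptions made for the example, not part of any particular framework.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EscalationRequest:
    """Hypothetical request to widen an agent's scope."""
    scope: str
    evidence: str          # e.g. a ticket reference or a justification
    approvers: frozenset   # identities that signed off


@dataclass
class Session:
    """Hypothetical agent session that starts with least privilege."""
    granted: set = field(default_factory=lambda: {"read"})

    def can(self, scope: str) -> bool:
        return scope in self.granted

    def escalate(self, req: EscalationRequest, required_approvals: int = 2) -> bool:
        # Refuse escalation without evidence or without enough approvers.
        if not req.evidence.strip():
            return False
        if len(req.approvers) < required_approvals:
            return False
        self.granted.add(req.scope)
        return True
```

A session built this way can read by default and gains `write` only after a request carrying both evidence and enough approvers succeeds.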
## 3) Observe everything
- Stream events, decisions, and state deltas; sample traces and retain them for postmortems.
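
One way to sample traces is to hash the trace ID, so every event in a trace gets the same keep-or-drop decision. The sketch below assumes a string `trace_id` and a JSON event shape; both are illustrative choices, not a prescribed schema.

```python
import hashlib
import json
import time


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of all traces."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")       # integer in [0, 2**64)
    return bucket < int(sample_rate * 2**64)         # integer compare avoids float edge cases


def make_event(trace_id: str, kind: str, payload: dict) -> str:
    """Serialize one event (hypothetical shape) as a JSON line."""
    return json.dumps(
        {"ts": time.time(), "trace_id": trace_id, "kind": kind, "payload": payload},
        sort_keys=True,
    )
```

Because the decision depends only on the trace ID, separate processes agree on which traces to retain without coordinating.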
## 4) Train for refusals
- Reward intentional non-action when risk thresholds are exceeded.
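
A toy reward-shaping function can make this idea concrete. The threshold, the bonus and penalty values, and the `"refuse"` action label below are assumptions chosen for illustration; a real training setup would tune or replace them.

```python
def shaped_reward(
    action: str,
    risk: float,
    task_reward: float,
    threshold: float = 0.7,             # hypothetical risk threshold
    refusal_bonus: float = 1.0,         # hypothetical reward for a justified refusal
    over_refusal_penalty: float = 0.2,  # hypothetical cost of refusing safe work
) -> float:
    """Reward refusing above the risk threshold; mildly penalize needless refusals."""
    refused = action == "refuse"
    if risk > threshold:
        # Above the threshold, non-action is the rewarded behavior.
        return refusal_bonus if refused else -refusal_bonus
    # Below the threshold, acting earns the task reward and refusing costs a little.
    return -over_refusal_penalty if refused else task_reward
```

The small penalty for refusing low-risk work is there so the policy does not learn to refuse everything.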
## 5) Human-in-the-loop as design, not bolt-on
- Route ambiguous acts to human review queues; capture rationales.
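
A review queue that requires a rationale might look like the sketch below. The confidence threshold and the `ReviewQueue` and `ReviewItem` names are hypothetical; the idea is that anything under the threshold waits for a person, and no decision is recorded without a reason.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewItem:
    action: str
    confidence: float
    approved: Optional[bool] = None
    rationale: str = ""


class ReviewQueue:
    """Hypothetical queue: confident actions pass, ambiguous ones wait for a human."""

    def __init__(self, auto_threshold: float = 0.9):
        self.auto_threshold = auto_threshold
        self.pending = deque()
        self.decided = []

    def submit(self, action: str, confidence: float) -> str:
        if confidence >= self.auto_threshold:
            return "auto"
        self.pending.append(ReviewItem(action, confidence))
        return "queued"

    def decide(self, approved: bool, rationale: str) -> ReviewItem:
        # Reviewers must explain the decision so it can be audited later.
        if not rationale.strip():
            raise ValueError("a rationale is required")
        item = self.pending.popleft()
        item.approved, item.rationale = approved, rationale
        self.decided.append(item)
        return item
```

Captured rationales can later feed policy updates or evaluation sets, which is one reason to make them mandatory and not optional.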
## 6) Kill-switch and recovery drills
- Practice failure: inject faults, then rehearse rollback and credential revocation.
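
A drill can be rehearsed in miniature. In the sketch below, a fault in any step trips a hypothetical `KillSwitch`, which revokes the credentials it holds, and completed steps are then undone in reverse order. The credential names and the `(do, undo)` step shape are assumptions for the example.

```python
import threading


class KillSwitch:
    """Hypothetical kill switch that also revokes the credentials it holds."""

    def __init__(self, credentials):
        self._tripped = threading.Event()
        self.credentials = set(credentials)
        self.reason = None

    @property
    def tripped(self) -> bool:
        return self._tripped.is_set()

    def trip(self, reason: str) -> None:
        self.reason = reason
        self.credentials.clear()   # revoke everything
        self._tripped.set()


def run_with_rollback(steps, switch: KillSwitch) -> bool:
    """Run (do, undo) pairs; on a fault, trip the switch and undo completed steps.

    Returns True if every step completed, False if the switch was tripped.
    """
    undo_stack = []
    for do, undo in steps:
        if switch.tripped:
            break
        try:
            do()
            undo_stack.append(undo)
        except Exception as exc:   # a drill treats any fault as a trip condition
            switch.trip(f"fault: {exc}")
    if switch.tripped:
        while undo_stack:
            undo_stack.pop()()     # roll back in reverse order
        return False
    return True
```

Injecting a step that raises on purpose exercises the whole path: trip, revoke, rollback.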
## 7) Align incentives in code
- Encode SLOs, budgets, and policy hints into tool contracts.
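
One possible shape for such a contract is sketched below. The field names (`latency_slo_ms`, `budget_usd`, `policy_hint`) and the violation labels are hypothetical; the point is that limits live next to the tool definition, not in a separate document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolContract:
    """Hypothetical contract attached to a tool definition."""
    name: str
    latency_slo_ms: int   # SLO: a single call should finish within this time
    budget_usd: float     # budget: total spend allowed for this tool
    policy_hint: str      # e.g. "requires_review" or "read_only"


@dataclass
class ToolUsage:
    contract: ToolContract
    spent_usd: float = 0.0

    def record(self, cost_usd: float, latency_ms: int) -> list:
        """Record one call and return the names of any violated limits."""
        violations = []
        if self.spent_usd + cost_usd > self.contract.budget_usd:
            violations.append("budget")
        if latency_ms > self.contract.latency_slo_ms:
            violations.append("latency_slo")
        self.spent_usd += cost_usd
        return violations
```

A caller can treat a non-empty result as a signal to throttle, escalate, or stop calling the tool.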
---
### Reference pattern
- Present an **Action Proposal** → run **Static Checks** → launch **Execution** → stream **Telemetry** → compute **Risk Score** → gate with **Policy** → emit **Audit**.
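
The stages above can be sketched end to end as follows. This is a toy under stated assumptions: execution is taken to be sandboxed or staged, so the policy gate decides whether its effects are accepted; the risk score is simply the fraction of telemetry events flagged as anomalous; and all names (`Proposal`, `AuditRecord`, `run_pipeline`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    tool: str
    args: dict


@dataclass
class AuditRecord:
    tool: str
    stage: str      # last stage reached
    risk: float
    allowed: bool


def static_checks(proposal: Proposal, allowed_tools) -> bool:
    # Toy static check: the tool must be on an allowlist.
    return proposal.tool in allowed_tools


def risk_score(telemetry) -> float:
    # Toy score: fraction of events flagged as anomalous.
    if not telemetry:
        return 0.0
    return sum(1 for e in telemetry if e.get("anomaly")) / len(telemetry)


def run_pipeline(proposal, execute, allowed_tools, max_risk: float = 0.5) -> AuditRecord:
    """Proposal -> static checks -> execution -> telemetry -> risk -> policy -> audit."""
    if not static_checks(proposal, allowed_tools):
        return AuditRecord(proposal.tool, "static_checks", 0.0, False)
    telemetry = list(execute(proposal))   # execution streams telemetry events
    risk = risk_score(telemetry)
    allowed = risk <= max_risk            # policy gate on the computed risk
    return AuditRecord(proposal.tool, "policy", risk, allowed)
```

Every path returns an `AuditRecord`, so a rejected proposal leaves the same kind of trail as an accepted one.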
---
This is not about slowing teams down. It's about unlocking speed safely.