$clawproof --check-all --verbose
Your agent is in production.
The question is whether it should be.
Clawproof is a reliability and governance practice for teams shipping AI agents. Ten checks, distilled from real production incidents. A free assessment that takes five minutes. An open-source toolkit you can run today.
5 minutes · 20 questions · no email required
example.report · illustrative · generated 2026-04-25 14:21 UTC
Clawproof Score
0
/ 100
Production-ready
Most checks are real. Two gaps below would block a clean audit.
Per-check breakdown
01 · 80%
02 · 60%
03 · 40%
04 · 80%
05 · 100%
06 · 80%
07 · 20%
08 · 60%
09 · 80%
10 · 70%
Run yours →
This is a sample report. Yours will be specific to your stack.
The operator
Werner Plutat
Legal Engineer. LL.M. Big 4 + DAX. I run AI agents in production every day. These checks come from real operations, not papers.
example.log · illustrative failure patterns · 3 entries
[ERROR] rate_limiter: request burst exceeded cap → circuit open → Prevented by Check #09 Cost Controls
[WARN ] tool_permissions: agent requested scope beyond grant, denied by policy → Prevented by Check #01 Tool Permissions
[INFO ] kill_switch: manual halt triggered, in-flight actions cancelled → Prevented by Check #05 Rollback & Kill Switches
illustrative. not real production logs. each entry maps to the check that catches the failure mode.
The 10 checks
What separates an agent that survives from one that doesn't.
01 · Security
Tool Permissions & Least Privilege
Your agent starts with read access. Within a month, it has keys to everything.
02 · Quality
Logging & Audit Trails
When something breaks, can you trace what happened, why, and who approved it?
03 · Security
Prompt Injection & Data Exfil
Untrusted content in, secrets out. The attack surface nobody tests.
04 · Governance
Human-in-the-Loop & Escalation
When should an agent stop and ask? Define the boundaries before production.
05 · Operations
Rollback & Kill Switches
Can you stop your agent in 30 seconds? If not, you're not production-ready.
06 · Security
Secrets Management
API keys in prompts, tokens in logs. Zero secret sprawl or bust.
07 · Quality
Evaluation & Regression Testing
You upgrade the model. Agent behavior changes. No tests caught it.
08 · Governance
Data Boundaries & RAG Governance
Your agent can read everything it retrieves. Should it be allowed to read all of it?
09 · Operations
Cost Controls & Rate Limiting
A runaway loop at 3 AM. No spending cap. Monday morning surprise.
10 · Quality
Multi-Agent Coordination
Two agents, one resource, zero coordination. Race conditions aren't just for code.
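What three of these checks can look like in code. A minimal sketch, not Clawproof's own implementation: it wraps every tool call in the guards behind the three illustrative log entries (Check #01 allowlist, Check #09 spend cap, Check #05 kill switch). Every name here (`GuardedRunner`, `PolicyDenied`, `granted_tools`, `spend_cap_usd`) is hypothetical, and per-call cost is assumed to be estimated by the caller.

```python
import threading
from dataclasses import dataclass, field


class PolicyDenied(Exception):
    """Raised when a guard refuses to run a tool call."""


@dataclass
class GuardedRunner:
    # Hypothetical names throughout; none of this is a Clawproof API.
    granted_tools: frozenset   # Check #01: explicit allowlist of tool names
    spend_cap_usd: float       # Check #09: hard spending cap for this run
    spent_usd: float = 0.0
    # Check #05: a flag an operator (or another thread) can set to halt the agent.
    halt: threading.Event = field(default_factory=threading.Event)

    def call(self, tool_name, fn, est_cost_usd, *args):
        # Kill switch first: once set, no new action starts.
        if self.halt.is_set():
            raise PolicyDenied("halted by kill switch")
        # Least privilege: anything outside the grant is denied.
        if tool_name not in self.granted_tools:
            raise PolicyDenied(f"tool '{tool_name}' is outside the grant")
        # Spending cap: refuse the call that would push spend past the cap.
        if self.spent_usd + est_cost_usd > self.spend_cap_usd:
            raise PolicyDenied("spend cap would be exceeded")
        self.spent_usd += est_cost_usd
        return fn(*args)


# Usage: one granted tool, a one-dollar cap.
runner = GuardedRunner(granted_tools=frozenset({"read_file"}), spend_cap_usd=1.00)
result = runner.call("read_file", lambda path: f"contents of {path}", 0.25, "notes.txt")
```

The guards run before the tool does, so a denied call costs nothing and leaves no side effects. This sketch only blocks new calls after a halt; cancelling in-flight actions, as in the log entry, would need cooperation from the tools themselves.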
Open source
Ten checks you can run on your own agent in five minutes.
install (Claude Code)
# add the skill to Claude Code
claude skill install clawproof-checks

# run all ten checks against your repo
clawproof --check-all --verbose
install (yaml only)
# clone the repo
git clone https://github.com/lexbeam-software/clawproof-checks
cd clawproof-checks

# inspect any check
cat checks/03-prompt-injection.yaml
$clawproof --assess
Twenty questions. Five minutes. A score that tells you what to fix first.
The free assessment is the same questionnaire we use with paying clients. The score is generated locally in your browser. No email, no calendar invite, no follow-up.
Newsletter · monthly · ~600 words
One operator's notes on running agents in production. No funnel, no upsell.