$ clawproof --check-all --verbose

Your agent is in production.
The question is whether it should be.

Clawproof is a reliability and governance practice for teams shipping AI agents. Ten checks, distilled from real production incidents. A free assessment that takes five minutes. An open-source toolkit you can run today.

Score your agent → · Read the 10 checks

5 minutes · 20 questions · no email required

example.report · illustrative · generated 2026-04-25 14:21 UTC

Clawproof Score

/ 100

Production-ready

Most checks are real. Two gaps below would block a clean audit.

Per-check breakdown

Tool Permissions · 80%
Logging & Audit · 60%
Prompt Injection · 40%
Human-in-the-Loop · 80%
Kill-Switches · 100%
Secrets Management · 80%
Evaluation & Regression · 20%
Data Boundaries · 60%
Cost Controls · 80%
Multi-Agent Coordination · 70%

Run yours →

this is a sample report. yours will be specific to your stack.

The operator

Werner Plutat

Legal Engineer. LL.M. Big 4 + DAX. I run AI agents in production every day. These checks come from real operations, not papers.

linkedin.com/in/wplu ↗

example.log · illustrative failure patterns · 3 entries

[ERROR] rate_limiter: request burst exceeded cap → circuit open
        → Prevented by Check #09 Cost Controls

[WARN ] tool_permissions: agent requested scope beyond grant, denied by policy
        → Prevented by Check #01 Tool Permissions

[INFO ] kill_switch: manual halt triggered, in-flight actions cancelled
        → Prevented by Check #05 Rollback & Kill Switches

illustrative. not real production logs. each entry maps to the check that catches the failure mode.
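
The second entry above (a scope request beyond the grant, denied by policy) can be sketched as a deny-by-default check. This is a minimal illustration, not Clawproof's implementation: `Grant`, `authorize`, `call_tool`, and the scope strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """Scopes an agent was explicitly granted (hypothetical structure)."""
    agent_id: str
    scopes: frozenset = field(default_factory=frozenset)


def authorize(grant: Grant, requested_scope: str) -> bool:
    """Deny by default: only scopes listed in the grant are allowed."""
    return requested_scope in grant.scopes


def call_tool(grant: Grant, tool_scope: str) -> str:
    """Return a log-style line describing the policy decision."""
    if not authorize(grant, tool_scope):
        # Same shape as the illustrative log entry: denied by policy.
        return (f"[WARN ] tool_permissions: {grant.agent_id} requested "
                f"{tool_scope!r} beyond grant, denied by policy")
    return f"[INFO ] tool_permissions: {grant.agent_id} allowed {tool_scope!r}"


if __name__ == "__main__":
    grant = Grant("agent-1", frozenset({"tickets:read"}))
    print(call_tool(grant, "tickets:read"))
    print(call_tool(grant, "tickets:write"))
```

The point of the sketch is that the grant is immutable and the check is an allowlist, so a scope the agent was never given fails closed instead of being silently added over time.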

The 10 checks

What separates an agent that survives from one that doesn't.

view all ten →

01 · Security

Tool Permissions & Least Privilege

Your agent starts with read access. Within a month, it has keys to everything.

02 · Quality

Logging & Audit Trails

When something breaks, can you trace what happened, why, and who approved it?

03 · Security

Prompt Injection & Data Exfil

Untrusted content in, secrets out. The attack surface nobody tests.

04 · Governance

Human-in-the-Loop & Escalation

When should an agent stop and ask? Define the boundaries before production.

05 · Operations

Rollback & Kill Switches

Can you stop your agent in 30 seconds? If not, you're not production-ready.

06 · Security

Secrets Management

API keys in prompts, tokens in logs. Zero secret sprawl or bust.

07 · Quality

Evaluation & Regression Testing

You upgrade the model. Agent behavior changes. No tests caught it.

08 · Governance

Data Boundaries & RAG Governance

Your agent can read everything it retrieves. Can it read everything it should?

09 · Operations

Cost Controls & Rate Limiting

A runaway loop at 3 AM. No spending cap. Monday morning surprise.

10 · Quality

Multi-Agent Coordination

Two agents, one resource, zero coordination. Race conditions aren't just for code.
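
The failure mode in check 09, a runaway loop with no spending cap, can be sketched as a hard budget that opens a circuit once the cap would be exceeded. This is a rough illustration under stated assumptions, not the toolkit's code: `SpendingCap`, `BudgetExceeded`, and the cent amounts are hypothetical.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would exceed the cap or the circuit is open."""


class SpendingCap:
    """Hard spending cap in integer cents (hypothetical helper)."""

    def __init__(self, cap_cents: int) -> None:
        if cap_cents <= 0:
            raise ValueError("cap_cents must be positive")
        self.cap_cents = cap_cents
        self.spent_cents = 0
        self.circuit_open = False  # open circuit = all further calls blocked

    def charge(self, cost_cents: int) -> None:
        """Record a cost, or open the circuit and refuse the call."""
        if self.circuit_open:
            raise BudgetExceeded("circuit open")
        if self.spent_cents + cost_cents > self.cap_cents:
            self.circuit_open = True
            raise BudgetExceeded("spending cap reached, circuit open")
        self.spent_cents += cost_cents


def run_agent_loop(cap: SpendingCap, cost_per_call_cents: int, max_calls: int) -> int:
    """Simulate a loop of paid calls; return how many completed."""
    completed = 0
    for _ in range(max_calls):
        try:
            cap.charge(cost_per_call_cents)
        except BudgetExceeded:
            break  # stop the loop instead of spending through the night
        completed += 1
    return completed


if __name__ == "__main__":
    cap = SpendingCap(cap_cents=100)
    print(run_agent_loop(cap, cost_per_call_cents=30, max_calls=1000))
```

Charging before the call rather than after is a deliberate choice in this sketch: the loop stops at the last call that fits under the cap instead of discovering the overrun afterwards.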

Open source

Ten checks you can run on your own agent in five minutes.

★ on GitHub →

install (Claude Code)

# add the skill to Claude Code
claude skill install clawproof-checks

# run all ten checks against your repo
clawproof --check-all --verbose

install (yaml only)

# clone the repo
git clone https://github.com/lexbeam-software/clawproof-checks
cd clawproof-checks

# inspect any check
cat checks/03-prompt-injection.yaml

Read the project page →

$ clawproof --assess

Twenty questions. Five minutes. A score that tells you what to fix first.

The free assessment is the same questionnaire we use with paying clients. The score is generated locally in your browser. No email, no calendar invite, no follow-up.

Start the assessment → · Browse the checks first

Newsletter · monthly · ~600 words

One operator's notes on running agents in production. No funnel, no upsell.
