AI agents are deployed in production. The systems governing them have not kept up. This paper describes an infrastructure layer that defines, enforces, and audits what agents are allowed to do — without trusting the agent to cooperate.
In the past two years, AI agents have moved from research demos into production. They write code, send email, move money, file tickets, and increasingly take action on behalf of people and companies. The agents have become more capable. The systems around them have not.
Today, when something goes wrong, the answer is almost always the same: retrain the model, rewrite the prompt, add a human reviewer. These are relational interventions — they depend on trust between developer and model, or between user and agent.
AI agent safety is, at its core, a governance problem rather than an alignment problem. Alignment is necessary. It is not sufficient.
What is missing is the layer between the agent’s decision to act and the system’s commitment to that action.
RLHF and post-training safety make outputs more likely to be acceptable. But they operate inside the model and offer no record of why a specific decision was made.
Input and output filters work for the app that owns the prompt. They fail when agents act across systems the prompt author does not control.
Human review works until volume increases, until the reviewer starts approving by reflex, or until the agent acts in ways the reviewer cannot evaluate.
The common failure: all three trust the agent, the developer, or the reviewer to do the right thing. This is relational oversight. It scales linearly with attention and breaks under load.
Covenant sits between an agent and the systems it acts on. Every action passes through it. The framework does five things, in order.
Every actor has a verifiable identity. Agents cannot impersonate other agents. Capabilities are bound to identities, not sessions or tokens.
Each identity has a defined set of capabilities. An agent that has not been granted the ability to send email cannot send email. Deny by default.
Policies are written in a DSL with temporal semantics. They express what is forbidden across sequences of actions, not just single calls. Compiled to deterministic monitors.
Violations get graduated responses. Warning, then throttle, then suspend. Each tier narrows what the agent can do per unit time, bounding damage.
Every decision is logged in W3C provenance format. A graph of causes and effects that can be queried, audited, and presented as evidence.
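The capability and enforcement steps above can be sketched in a few lines. This is a minimal illustration, not Covenant's actual API: the `Identity` class, the `Tier` names, the capability strings, and the per-tier rate limits are all assumptions made for the example. It shows capabilities bound to an identity, a deny-by-default check, and a ladder that moves one tier per violation.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical enforcement tiers, ordered from least to most restrictive."""
    NORMAL = 0
    WARNING = 1
    THROTTLE = 2
    SUSPEND = 3


# Assumed action budgets per minute for each tier; the source gives no numbers.
RATE_LIMIT_PER_MINUTE = {
    Tier.NORMAL: 60,
    Tier.WARNING: 30,
    Tier.THROTTLE: 5,
    Tier.SUSPEND: 0,
}


@dataclass
class Identity:
    agent_id: str
    # Capabilities are granted explicitly and live on the identity,
    # not on a session or token.
    capabilities: frozenset = frozenset()
    tier: Tier = Tier.NORMAL


def is_permitted(identity: Identity, capability: str) -> bool:
    """Deny by default: allow only capabilities that were explicitly granted."""
    if identity.tier is Tier.SUSPEND:
        return False
    return capability in identity.capabilities


def record_violation(identity: Identity) -> Tier:
    """Move the identity one step up the ladder, stopping at SUSPEND."""
    identity.tier = Tier(min(identity.tier + 1, Tier.SUSPEND))
    return identity.tier
```

An agent created as `Identity("support-bot", frozenset({"refund.issue"}))` can issue refunds but fails the check for `"email.send"`, because that capability was never granted. Three recorded violations move it from warning to throttle to suspend, after which every check fails.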
Intercepts every action at the boundary between agents and resources. No agent cooperation required.
Text-based policies compiled into deterministic monitors. Runtime cost is a state transition, not an LLM call.
Every event signed and chained. Auditors reconstruct the causal chain behind any decision. Tamper-evident.
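One common way to make a log tamper-evident is a hash chain: each entry commits to the hash of the entry before it, so changing any past event invalidates everything after it. The sketch below illustrates that idea under stated assumptions. It is not Covenant's log format; the field names are hypothetical, and it uses an HMAC from the Python standard library as a stand-in for a real digital signature, which a production system would more likely produce with an asymmetric key.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _payload(prev_hash: str, event: dict) -> bytes:
    """Canonical bytes covering both the event and the link to its predecessor."""
    return json.dumps({"prev": prev_hash, "event": event}, sort_keys=True).encode()


def append_event(log: list, event: dict, key: bytes) -> dict:
    """Append an event that is chained to the previous entry and authenticated."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    payload = _payload(prev_hash, event)
    entry = {
        "event": event,
        "prev": prev_hash,
        "hash": hashlib.sha256(payload).hexdigest(),
        "sig": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }
    log.append(entry)
    return entry


def verify_chain(log: list, key: bytes) -> bool:
    """Recompute every link; any edited, reordered, or removed entry fails."""
    prev_hash = GENESIS
    for entry in log:
        payload = _payload(prev_hash, entry["event"])
        expected_sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != hashlib.sha256(payload).hexdigest():
            return False
        if not hmac.compare_digest(entry["sig"], expected_sig):
            return False
        prev_hash = entry["hash"]
    return True
```

In this sketch, editing a field of an earlier event after the fact causes `verify_chain` to return `False`, which is the property an auditor relies on when reconstructing a causal chain.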
A customer service agent attempts a third refund to the same customer within 24 hours. The agent’s prompt does not forbid this. From the agent’s perspective, it is being helpful.
The agent’s identity is valid. Refunds are within its capabilities, and its quota is not exceeded. So far, the action proceeds.
A rule fires: “no agent may issue more than two refunds to the same customer within 24 hours.” Two prior events found. Verdict: escalate.
The refund enters a human review queue. The provenance layer records the agent, the operation, the triggering events, the policy, and the verdict.
Six weeks later, when someone asks why the refund was held, the answer is in the log: the policy, the human who decided, and the timestamp of every step. No retraining needed.
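The policy check in the walkthrough above can be sketched as a small deterministic monitor. This is an illustration only. The class name, the verdict strings, and the choice to key history by agent and customer are assumptions, and the source does not show how the policy DSL compiles. The point is that evaluating the rule takes a bounded state update and a comparison, with no model call.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)
MAX_REFUNDS = 2  # "no more than two refunds to the same customer within 24 hours"


class RefundMonitor:
    """Hypothetical monitor for the refund rule in the walkthrough."""

    def __init__(self):
        # (agent_id, customer_id) -> timestamps of refunds that were allowed
        self._history = defaultdict(deque)

    def check(self, agent_id: str, customer_id: str, at: datetime) -> str:
        events = self._history[(agent_id, customer_id)]
        # Drop refunds that have aged out of the window. Treating an event
        # exactly 24 hours old as expired is an assumption of this sketch.
        while events and at - events[0] >= WINDOW:
            events.popleft()
        if len(events) >= MAX_REFUNDS:
            # Escalated attempts are not recorded as issued refunds here.
            return "escalate"
        events.append(at)
        return "allow"
```

With two refunds already recorded for a customer, a third attempt a few hours later returns `"escalate"`, while a refund for a different customer is still allowed. If a human reviewer later approves the escalated refund, a fuller implementation would record that refund in the history as well.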
Covenant does not make models better. It does not reduce hallucination or improve reasoning. It governs what the model can do once it tries.
It makes human oversight tractable. The human role shifts from approving routine actions to reviewing edge cases and audit trails.
What it provides is a structural defense that does not depend on agent cooperation, and an evidentiary record that does not depend on agent honesty.
Guarantees hold over observed events. An agent that acts through channels Covenant does not mediate falls outside the enforcement surface.
Company A’s agent acts on company B’s systems with company C’s data. Without common governance, disputes reduce to taking the agent’s word.
The EU AI Act, US state-level laws, and sectoral regimes all converge on the same requirement: demonstrable controls and audit trails for automated decisions.
A governance layer built into the model is captive to one vendor. A layer at the system boundary is portable across all of them.
The paper covers the problem, the architecture, a concrete walkthrough, honest limitations, and the research program ahead.