Agent Policy

Editing the policy that governs your agent.

Agent Policy is the cockpit. Every rule that decides what your agent can do without asking, what it must ask about, and what it cannot do, lives here. Open it at /dashboard/agent-policy.

What Agent Policy is

A list of plain-language rules. Each rule says: when my agent tries to do this, Guardian should respond like this. The cockpit is the human-facing surface where you read, understand, change, and add rules.

The complexity of Guardian lives in its engine: 56 risks, five matcher classes, capability manifests. None of that surfaces here. The cockpit’s whole job is to make a ruleset something you read top to bottom and understand without a manual.

How a rule reads

Every rule reads as one sentence:

When my agent tries to [do X]  ->  [Allow / Ask me first / Block]

Concrete examples from the default policy:

When my agent tries to delete many files at once       ->  Ask me first
When my agent tries to run a command as administrator  ->  Ask me first
When my agent tries to install a package system-wide   ->  Allow
When my agent tries to alter or delete ORBIT's audit
  records                                              ->  Block

That is the whole vocabulary. You never see a rule ID, a regex, a matcher name, or any other engine artifact in the cockpit. If you do, that is a bug; tell us.
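One way to picture the sentence form is as a two-field record. The sketch below is illustrative only: the names Verdict, Rule, and render are hypothetical and say nothing about how Guardian stores rules internally.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # The three verdict labels used on this page.
    ALLOW = "Allow"
    ASK = "Ask me first"
    BLOCK = "Block"


@dataclass(frozen=True)
class Rule:
    action: str       # plain-language action, e.g. "delete many files at once"
    verdict: Verdict


def render(rule: Rule) -> str:
    """Render a rule in the one-sentence form shown in the cockpit."""
    return f"When my agent tries to {rule.action}  ->  {rule.verdict.value}"


print(render(Rule("run a command as administrator", Verdict.ASK)))
```

Nothing in the record carries a rule ID, regex, or matcher name, which is the point of the vocabulary.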

The 50 default rules

Target launch behavior, not yet live in this build. The current /dashboard/agent-policy surface is a smaller preview. The full Guardian default set lands before public launch; this section describes the launch behavior you should expect.

At signup, your policy is seeded with the Guardian default set: one rule for every v1-eligible risk in the catalog. They are grouped by category (Files and folders, Administrator access, Passwords and keys, Git history, and so on; 14 groups in total).

Default rules are always present. You can change a default rule’s verdict freely; you cannot delete it. That is on purpose: deleting a default rule would leave a real risk ungoverned and invisible. Guardian does not do that. Every v1 risk is always in the list. The only question you answer is how Guardian responds to it.

The list is the safety floor. It is never shorter than the floor.
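The floor can be sketched as a list that accepts verdict changes but refuses to drop a default rule. This is a minimal model under assumptions: the Policy class, its method names, and the choice of PermissionError are all hypothetical.

```python
from dataclasses import dataclass, field

VERDICTS = ("Allow", "Ask me first", "Block")


@dataclass
class Rule:
    action: str
    verdict: str
    is_default: bool = False   # True for rules seeded from the Guardian default set


@dataclass
class Policy:
    rules: list = field(default_factory=list)

    def _find(self, action: str) -> Rule:
        for rule in self.rules:
            if rule.action == action:
                return rule
        raise KeyError(action)

    def set_verdict(self, action: str, verdict: str) -> None:
        """Changing a verdict is always allowed, for default rules too."""
        if verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {verdict}")
        self._find(action).verdict = verdict

    def delete(self, action: str) -> None:
        """Deleting is refused for default rules, so the list never shrinks below the floor."""
        rule = self._find(action)
        if rule.is_default:
            raise PermissionError("default rules cannot be deleted")
        self.rules.remove(rule)
```

In this model, calling set_verdict on a default rule succeeds, while calling delete on it raises and leaves the list unchanged.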

Allow / Ask me first / Block

Target launch behavior, not yet live in this build. The verdict vocabulary below is the launch model. The current /dashboard/agent-policy preview supports the same three states but uses the labels “Allow,” “Ask me,” and “Block,” and any rule change (add, edit, remove, verdict change) goes through a “Confirm policy change” review step before taking effect. SMS confirmation is simulated in this build, not yet wired.

Three verdicts. One control per rule, three states.

  • Allow. Your agent does this without asking. The action is still recorded; you can review it in /audit later. Allow is the supported way to say “stop bothering me about this.” The rule stays in the list and you can flip it back any time.
  • Ask me first. Your agent pauses, Guardian texts you, and the action runs (or does not) based on your reply. If you do not reply within 120 seconds, the action is blocked. Fail-closed by design.
  • Block. Your agent is not allowed to do this. Guardian tells the agent the action was blocked, and the action is recorded as blocked.
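The three verdicts, including the fail-closed timeout, can be sketched as a single decision function. This is a sketch under assumptions: the function name outcome and the reply values "approve" and "deny" are hypothetical, and only the 120-second window comes from the text above.

```python
from typing import Optional

REPLY_TIMEOUT_SECONDS = 120  # reply window stated in the text above


def outcome(verdict: str, reply: Optional[str] = None, seconds_waited: float = 0.0) -> str:
    """Return "run" or "blocked" for one attempted action.

    reply is a hypothetical "approve" / "deny" value; None means no reply arrived.
    """
    if verdict == "Allow":
        return "run"
    if verdict == "Block":
        return "blocked"
    if verdict != "Ask me first":
        raise ValueError(f"unknown verdict: {verdict}")
    # Ask me first: fail closed on silence, lateness, or anything but approval.
    if reply is None or seconds_waited > REPLY_TIMEOUT_SECONDS:
        return "blocked"
    return "run" if reply == "approve" else "blocked"
```

Only an explicit approval inside the window lets the action run; every other path ends in a block. Recording the action in the audit trail happens in all cases and is left out of the sketch.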

At launch, you change a default rule’s verdict from the rule row and the change applies after a single confirm step. Inline rule-row verdict toggles are part of the launch cockpit and are not yet in this preview.

Custom rules: narrowing a default

Target launch behavior, not yet live in this build. The narrowing model below describes how custom rules work at launch, on top of the full default ruleset. The current /dashboard/agent-policy preview is a simpler rule editor: the + Add rule button opens a form for one rule (action, plain-language description, verdict, and the fallback if you do not reply in time), and any add/edit/remove goes through a Confirm policy change review before taking effect. Narrowing on top of a default rule and adapter-evaluability fallback are part of the launch cockpit, not this preview.

A custom rule narrows an existing default to a condition you care about and gives that condition its own verdict. Example:

When my agent tries to delete many files at once,
  and the files are outside my project folder        ->  Block

The default rule for “delete many files at once” is still Ask me first. The custom rule is an extra row that says, “and if the files are outside the project folder, do not even ask me, just block.”

Custom rules can be created and deleted freely. Deleting one is safe: the base default rule is still there governing everything the custom rule did not cover. No risk goes ungoverned.
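Narrowing can be sketched as a lookup that prefers a matching custom rule and otherwise lands on the default. The sketch rests on assumptions: the Rule fields, the verdict_for function, and the outside_project context key are hypothetical, and "first matching custom rule wins" is an assumed precedence that this page does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    action: str
    verdict: str
    # None marks the default rule; a callable marks a custom (narrowed) rule.
    condition: Optional[Callable[[dict], bool]] = None


def verdict_for(rules: list, action: str, context: dict) -> str:
    """Pick the verdict for an action: a matching custom rule first, else the default."""
    default = None
    for rule in rules:
        if rule.action != action:
            continue
        if rule.condition is None:
            default = rule.verdict
        elif rule.condition(context):
            return rule.verdict
    if default is None:
        raise LookupError(f"no default rule for: {action}")
    return default


ACTION = "delete many files at once"
rules = [
    Rule(ACTION, "Ask me first"),                                       # default
    Rule(ACTION, "Block", lambda ctx: ctx.get("outside_project", False)),  # custom
]
```

Removing the custom row from the list leaves the default in place, so the same action still resolves to Ask me first.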

At launch, you use the + Rule button on the cockpit. Three short steps: pick the action, pick the condition (from a fixed plain-language menu), pick the verdict. The cockpit shows you the finished sentence before you confirm.

The condition menu at launch is the same fixed list regardless of which adapter you installed. If you pick a condition your installed adapter cannot evaluate, the rule still saves; Guardian falls back to the underlying default verdict for that action and the receipt records the condition as not-evaluable. Adapter-aware menu gating (only offering conditions the installed adapter can evaluate) is planned but not shipped. Both the fixed-condition menu and the not-evaluable receipt path are part of the launch cockpit and are not in this preview.
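The not-evaluable fallback can be sketched as follows. Everything named here is an assumption for illustration: the Receipt fields, the apply_custom_rule function, the condition key, and the "matched" and "not-matched" statuses are hypothetical. Only the "not-evaluable" outcome and the fall back to the default verdict come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Receipt:
    verdict: str
    condition_status: str   # "matched", "not-matched", or "not-evaluable"


def apply_custom_rule(
    condition: str,
    custom_verdict: str,
    default_verdict: str,
    adapter_conditions: set,
    context: dict,
) -> Receipt:
    """Apply one custom rule, falling back to the default when the adapter cannot evaluate it."""
    if condition not in adapter_conditions:
        # The rule was saved, but this adapter cannot check the condition.
        return Receipt(default_verdict, "not-evaluable")
    if context.get(condition, False):
        return Receipt(custom_verdict, "matched")
    return Receipt(default_verdict, "not-matched")
```

With an adapter that does not list the condition, the call returns the default verdict and the receipt carries "not-evaluable" rather than failing silently.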

Letting your agent edit the policy

Ships at GA, gated on adversarial self-protection testing (ADR 0006). AI-assisted policy editing, where your agent proposes rule changes in plain language with a human-review banner before anything moves, is a launch surface, not post-launch work. It turns on for your tenant once the human-review path has been built and adversarially tested as its own governed action. Until that gate clears, you edit the Agent Policy yourself through the cockpit using the steps in the previous section.

The reason this path is gated and not the default: AI-assisted policy editing is a social-engineering surface. An agent that has been manipulated, prompt-injected, or simply confused can write a plain-language diff that sounds reasonable and quietly weakens your protection. That is the exact path an attacker would take. We will not enable it on your tenant until the adversarial self-protection tests in ADR 0006 pass. Until then, your Agent Policy only changes when you change it.

Reset and start over

Target launch behavior, not yet live in this build. The reset controls described below land alongside the full default-rules set.

Two reset operations:

  • Reset a single default rule. Returns its verdict to the Guardian default. The rule’s row shows a small “changed” dot when its verdict differs from the Guardian default; clicking that dot resets it.
  • Reset all to defaults. Available from the cockpit overflow menu. Restores every default rule’s verdict and, with a clear second confirmation, removes all custom rules. This is the “I want to start over” button.

No reset destroys past audit records. Your history of governed actions stays intact.
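The two reset operations can be sketched over a simple verdict table. The function names and data shapes are hypothetical, and the sketch assumes that custom rules are kept when the second confirmation is not given, which this page does not state outright.

```python
# Guardian default verdicts for two rules quoted earlier on this page.
GUARDIAN_DEFAULTS = {
    "delete many files at once": "Ask me first",
    "install a package system-wide": "Allow",
}


def is_changed(verdicts: dict, action: str) -> bool:
    """True when the rule row would show the "changed" dot."""
    return verdicts[action] != GUARDIAN_DEFAULTS[action]


def reset_rule(verdicts: dict, action: str) -> None:
    """Return one default rule's verdict to the Guardian default."""
    verdicts[action] = GUARDIAN_DEFAULTS[action]


def reset_all(verdicts: dict, custom_rules: list, confirmed_remove_custom: bool = False) -> None:
    """Restore every default verdict; remove custom rules only after the second confirmation."""
    verdicts.update(GUARDIAN_DEFAULTS)
    if confirmed_remove_custom:
        custom_rules.clear()
```

Neither function takes the audit history as an argument, mirroring the guarantee that a reset never touches past records.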

Mobile is read-only at launch

Target launch behavior, not yet live in this build. Mobile-specific behavior described below is part of launch and not yet shipped.

On mobile, the cockpit shows the rule list and rule detail, including recent activity per rule. You can review approval requests and reply to them; you cannot create, edit, or delete rules from mobile at launch.

Editing on mobile is post-launch.