
One Policy Gate for an Autonomous Agent

June 22, 2026 · 7 min read

Autonomous Agent Guardrails: One Gate, One Log

The problem is not a model suggesting a pull request. The problem is a model that can open the PR, comment on the ticket, and approve someone else's change before anyone notices the missing permission check.

TL;DR

The agent loop wraps an off-the-shelf agent core running on Amazon Bedrock, with the model's credentials and context resolved from the specific project rather than shared across tenants. The autonomous agent guardrails are a beforeToolCall hook that checks requested actions against the project's allowed_actions allowlist. Actions not on the list are hard-blocked and audit-logged; allowed calls are logged too. The architecture rests on three separated concerns: what the agent is capable of, what it is permitted to do, and how well it does it. The weak point is tool classification: today an unmapped tool passes as read-only, so mutating tools should be required to declare an action before they can run.

This is Part 3. Part 1 was the path from Slack mention to PR; Part 2 was the queue underneath it. This is what runs when a worker claims a task: the agent loop, and the function that governs what the agent can do in the world.

The mental model worth holding onto before the code: capability, permission, and judgment are three separate controls. The agent core supplies capability. The beforeToolCall gate enforces permission. A downstream human review handles judgment. Conflating any two of those is where agent safety tends to break.

The loop is boring on purpose

The agent runner wraps a general-purpose agent core (the tool-calling loop, message history, and model plumbing are not where the value is, so they are a dependency). What the runner owns is the policy: which model, whose credentials, what context, and what the agent is allowed to do. The hosted model side runs through Amazon Bedrock, while the tool loop still has to obey the same local policy boundary.

export async function runAgent(ctx: RunContext, flow: { ... }): Promise<RunResult> {
  if (process.env.USE_CLI_AGENT === "1") {
    // swap the API loop for a local coding-CLI agent; same gate applies
  }
  const { tenant } = ctx;
  const agent = new Agent({
    model: ctx.config.model,              // a cross-region Claude inference profile on Bedrock
    credentials: tenant.credentials,      // field name illustrative; resolved for THIS project, never shared
    beforeToolCall: async ({ toolCall }) => {
      const verdict = isToolAllowed(toolCall.name, tenant.allowedActions);
      if (!verdict.allowed) {
        await audit.blocked(tenant.id, toolCall.name);
        return { block: true, reason: verdict.reason };
      }
      await audit.allowed(tenant.id, toolCall.name);
    },
  });
  // run the flow's instructions through the loop and return the result
}

Two things to notice. The credentials and context are resolved from tenant (the project), so the model that triages the payments channel sees payments' repos and tokens and nothing else. There is also a single escape hatch, USE_CLI_AGENT, that swaps the Bedrock API loop for a local coding-CLI agent without changing anything about the gate. The model is swappable. The policy is not.
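To make "the model is swappable, the policy is not" concrete, here is a minimal sketch under stated assumptions. `Tenant`, `ToolGate`, `AgentBackend`, `makeGate`, `stubBackend`, `pickBackend`, and the tool names are all hypothetical, not the runner's real code, and the backends are stubs that only report whether a tool call would run. The point is the shape: the gate is built once from the tenant's allowlist, and whichever backend USE_CLI_AGENT selects receives that same function.

```typescript
// Hypothetical shapes: the post does not show the runner's real types.
type AllowedActions = {
  raise_pr: boolean;
  comment_on_jira: boolean;
  comment_on_pr: boolean;
  approve_pr: boolean;
  request_changes_on_pr: boolean;
};
type Action = keyof AllowedActions;
type Tenant = { id: string; allowedActions: AllowedActions };
type GateResult = { block: true; reason: string } | undefined;
type ToolGate = (toolName: string) => GateResult;

// Illustrative subset of a tool-to-action map (tool names are hypothetical).
const ACTION_FOR_TOOL = new Map<string, Action>([
  ["open_pull_request", "raise_pr"],
  ["approve_pull_request", "approve_pr"],
]);

// The gate is built once, from THIS tenant's allowlist.
function makeGate(tenant: Tenant): ToolGate {
  return (toolName) => {
    const action = ACTION_FOR_TOOL.get(toolName);
    if (action !== undefined && !tenant.allowedActions[action]) {
      return { block: true, reason: `${action} not permitted for ${tenant.id}` };
    }
    return undefined;
  };
}

// Both backends receive the same gate; neither carries policy of its own.
interface AgentBackend {
  name: "bedrock-api" | "local-cli";
  tryTool(toolName: string, gate: ToolGate): string;
}

function stubBackend(name: AgentBackend["name"]): AgentBackend {
  return {
    name,
    // A real backend would execute the tool; this stub only reports the outcome.
    tryTool: (toolName, gate) => gate(toolName)?.reason ?? `${name} ran ${toolName}`,
  };
}

function pickBackend(env: Record<string, string | undefined>): AgentBackend {
  return env["USE_CLI_AGENT"] === "1" ? stubBackend("local-cli") : stubBackend("bedrock-api");
}
```

Because the gate is a plain function handed to the backend, adding a third backend cannot weaken the policy unless that backend skips the call, which is a much smaller surface to review than a per-backend copy of the rules.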

beforeToolCall is the intended door

Actions the agent can take in the world (open a PR, comment on an issue, approve a PR, request changes) are all tools. The agent core invokes beforeToolCall before tool calls, making it the intended place where "is this allowed?" gets asked. Whether it is genuinely the only door depends on the tool-to-action map being complete (more on that below).
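The post does not name the agent core, so the following is a toy dispatcher, not that library's API. It shows the contract the gate relies on: the hook runs before every tool invocation, and a `{ block: true }` result means the tool body never executes. `ToolCall`, `BeforeToolCall`, `runToolCalls`, and the tool names are hypothetical.

```typescript
// Toy model of the hook contract; NOT the API of any real agent library.
type ToolCall = { name: string; args: Record<string, unknown> };
type HookResult = { block: true; reason: string } | undefined;
type BeforeToolCall = (ctx: { toolCall: ToolCall }) => Promise<HookResult>;
type ToolImpl = (args: Record<string, unknown>) => Promise<string>;

// Executes requested calls in order, asking the hook before each one.
async function runToolCalls(
  calls: ToolCall[],
  tools: Record<string, ToolImpl>,
  beforeToolCall: BeforeToolCall,
): Promise<string[]> {
  const results: string[] = [];
  for (const toolCall of calls) {
    const verdict = await beforeToolCall({ toolCall });
    if (verdict !== undefined && verdict.block) {
      // Blocked: the tool body never runs; the refusal goes back to the model.
      results.push(`blocked: ${verdict.reason}`);
      continue;
    }
    const impl = tools[toolCall.name];
    results.push(impl ? await impl(toolCall.args) : `unknown tool: ${toolCall.name}`);
  }
  return results;
}
```

The property that matters is ordering: the hook is awaited before the tool implementation is even looked up, so a blocked call has no side effects to undo.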

The allowlist is per project, a small explicit shape stored as JSON on the project row:

type AllowedActions = {
  raise_pr: boolean;
  comment_on_jira: boolean;
  comment_on_pr: boolean;
  approve_pr: boolean;
  request_changes_on_pr: boolean;
};
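Because the allowlist arrives as JSON from the project row, something has to turn it into an `AllowedActions` value. The post does not show that step. One reasonable sketch, assuming the column is read as a raw string (or null), is a parser that fails closed: a missing, malformed, or non-boolean flag becomes `false`, so a bad row can only reduce what the agent may do. `parseAllowedActions` and `ACTION_KEYS` are hypothetical names.

```typescript
type AllowedActions = {
  raise_pr: boolean;
  comment_on_jira: boolean;
  comment_on_pr: boolean;
  approve_pr: boolean;
  request_changes_on_pr: boolean;
};

const ACTION_KEYS: (keyof AllowedActions)[] = [
  "raise_pr",
  "comment_on_jira",
  "comment_on_pr",
  "approve_pr",
  "request_changes_on_pr",
];

// Fail closed: anything other than a literal JSON `true` means "not permitted".
function parseAllowedActions(raw: string | null): AllowedActions {
  let parsed: unknown = null;
  try {
    parsed = raw === null ? null : JSON.parse(raw);
  } catch {
    parsed = null; // unparseable JSON grants nothing
  }
  const obj =
    typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)
      : {};
  const result = {} as AllowedActions;
  for (const key of ACTION_KEYS) {
    result[key] = obj[key] === true; // the string "true" or the number 1 does not count
  }
  return result;
}
```

Treating only a strict boolean `true` as permission keeps a typo in the stored JSON from silently granting an action.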

The check maps a tool name to the action it represents and looks it up:

export function isToolAllowed(tool: string, allowed: AllowedActions): Verdict {
  const action = ACTION_FOR_TOOL[tool];      // e.g. "open_pull_request" -> raise_pr
  if (!action) return { allowed: true };      // read-only tools aren't gated
  return allowed[action]
    ? { allowed: true }
    : { allowed: false, reason: `${action} not permitted for this project` };
}

Read-only tools (reading a file, fetching a stack trace) are not in ACTION_FOR_TOOL and pass straight through. Mutations are gated. A project that may investigate but not act simply has mutation flags set to false, and the agent can look at everything and change nothing.
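The isToolAllowed snippet above leaves `Verdict` and `ACTION_FOR_TOOL` undefined. A self-contained version follows, assuming a `Map` for the tool-to-action lookup. Only `open_pull_request -> raise_pr` comes from the post; the other tool names and `investigateOnly` are illustrative. It shows the investigate-but-not-act project: reads pass, every mutation is refused.

```typescript
type AllowedActions = {
  raise_pr: boolean;
  comment_on_jira: boolean;
  comment_on_pr: boolean;
  approve_pr: boolean;
  request_changes_on_pr: boolean;
};
type Action = keyof AllowedActions;
type Verdict = { allowed: true } | { allowed: false; reason: string };

// Mutating tools only; read-only tools are deliberately absent.
// Tool names other than open_pull_request are illustrative.
const ACTION_FOR_TOOL = new Map<string, Action>([
  ["open_pull_request", "raise_pr"],
  ["comment_on_issue", "comment_on_jira"],
  ["comment_on_pull_request", "comment_on_pr"],
  ["approve_pull_request", "approve_pr"],
  ["request_pr_changes", "request_changes_on_pr"],
]);

function isToolAllowed(tool: string, allowed: AllowedActions): Verdict {
  const action = ACTION_FOR_TOOL.get(tool);
  if (!action) return { allowed: true }; // read-only tools aren't gated
  return allowed[action]
    ? { allowed: true }
    : { allowed: false, reason: `${action} not permitted for this project` };
}

// A project that may investigate but not act: every mutation flag is false.
const investigateOnly: AllowedActions = {
  raise_pr: false,
  comment_on_jira: false,
  comment_on_pr: false,
  approve_pr: false,
  request_changes_on_pr: false,
};
```

A `Map` is used here instead of a plain object so that a tool named, say, `toString` cannot resolve to an inherited prototype property.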

Why the gate needs to be centralized

The tempting alternative is to check permissions where each action happens: a guard in the open-PR tool, another in the comment tool, another in the approve tool. That spreads security-critical logic across tools, and the day someone adds a new tool and forgets the guard is the day the allowlist has a hole nobody notices.

Routing tool calls through beforeToolCall avoids that scatter. Here is the sharp edge worth being honest about, though: today, an unmapped tool passes as read-only. Consider merge_pull_request or write_file_to_repo added without a corresponding entry in ACTION_FOR_TOOL. Both would sail through the gate unblocked. Audit would record the call, but nothing would stop it.

Flipping the default to block unmapped tools and require each to declare its action would close that gap. The cost is real: any genuinely read-only tool not yet mapped would also be blocked until registered, adding friction during development. Because the gate is centralized, though, that fix is a one-line change in one place rather than an audit of every tool.
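Here is a minimal sketch of that flipped default, assuming a separate, explicit registry of read-only tools (`READ_ONLY_TOOLS`, `isToolAllowedStrict`, and the tool names are hypothetical). A tool must appear either in the read-only set or in the action map, and anything else is blocked with a reason saying it is unregistered. Under this version the merge_pull_request scenario above is refused, even for a project with every flag enabled.

```typescript
type Action = "raise_pr" | "comment_on_jira" | "comment_on_pr" | "approve_pr" | "request_changes_on_pr";
type AllowedActions = Record<Action, boolean>;
type Verdict = { allowed: true } | { allowed: false; reason: string };

const ACTION_FOR_TOOL = new Map<string, Action>([
  ["open_pull_request", "raise_pr"],
  ["approve_pull_request", "approve_pr"],
]);

// Read-only tools must now be registered too; the default is "block".
const READ_ONLY_TOOLS = new Set<string>(["read_file", "fetch_stack_trace"]);

function isToolAllowedStrict(tool: string, allowed: AllowedActions): Verdict {
  if (READ_ONLY_TOOLS.has(tool)) return { allowed: true };
  const action = ACTION_FOR_TOOL.get(tool);
  if (!action) {
    // The flipped default: unmapped tools are blocked, not waved through.
    return { allowed: false, reason: `${tool} is not registered with the gate` };
  }
  return allowed[action]
    ? { allowed: true }
    : { allowed: false, reason: `${action} not permitted for this project` };
}
```

The friction the text describes shows up directly: a new read-only tool fails until someone adds it to `READ_ONLY_TOOLS`, which is the intended trade.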

The audit log is the other half. Calls (allowed or blocked) are recorded against the project: what the model tried, whether it was permitted, when. "What has this agent done, and what did it try to do that we stopped?" becomes a query rather than a forensic reconstruction. For an autonomous agent acting on real repos, that record is not optional. It is how trust gets established after the fact. NIST's AI Risk Management Framework makes the same basic argument at scale: trust follows from governable systems, not only capable ones.
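The post does not show the audit schema. A minimal in-memory sketch follows, assuming each entry carries project, tool, outcome, reason, and timestamp (`AuditEntry`, `AuditLog`, `history`, and `stopped` are hypothetical; a real deployment would write to a durable table, not an array). It shows why "what did it try to do that we stopped?" becomes a filter rather than a reconstruction.

```typescript
type AuditEntry = {
  projectId: string;
  tool: string;
  outcome: "allowed" | "blocked";
  reason: string | null;
  at: Date;
};

class AuditLog {
  private readonly entries: AuditEntry[] = [];

  // Mirrors the audit.allowed / audit.blocked calls in runAgent.
  async allowed(projectId: string, tool: string): Promise<void> {
    this.entries.push({ projectId, tool, outcome: "allowed", reason: null, at: new Date() });
  }

  async blocked(projectId: string, tool: string, reason: string | null = null): Promise<void> {
    this.entries.push({ projectId, tool, outcome: "blocked", reason, at: new Date() });
  }

  // "What has this agent done?" for one project, optionally since a point in time.
  history(projectId: string, since?: Date): AuditEntry[] {
    return this.entries.filter(
      (e) => e.projectId === projectId && (!since || e.at.getTime() >= since.getTime()),
    );
  }

  // "What did it try to do that we stopped?"
  stopped(projectId: string): AuditEntry[] {
    return this.history(projectId).filter((e) => e.outcome === "blocked");
  }
}
```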

The honest limits

The allowlist is coarse. It is per project and per action, not per repo-path or per file, so a project permitted to raise_pr may raise a PR touching anything in its repos. The gate's correctness depends on the tool-to-action map being complete (the gap described above, and the argument for blocking unmapped tools by default). beforeToolCall governs what the agent does, not how well. Quality is a separate review step (the reaction-based gate from Part 1).
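One way to catch an incomplete map before it matters, assuming the runner can enumerate its registered tool names at startup (`unclassifiedTools`, `assertToolsClassified`, and the tool names are hypothetical), is to refuse to start when any tool is neither mapped to an action nor explicitly marked read-only. This complements a fail-closed gate rather than replacing it.

```typescript
type Action = "raise_pr" | "comment_on_jira" | "comment_on_pr" | "approve_pr" | "request_changes_on_pr";

// Returns the names of registered tools the gate has no classification for.
function unclassifiedTools(
  registered: readonly string[],
  actionForTool: ReadonlyMap<string, Action>,
  readOnly: ReadonlySet<string>,
): string[] {
  return registered.filter((t) => !actionForTool.has(t) && !readOnly.has(t));
}

// Fail at startup rather than let an unmapped tool pass as read-only at runtime.
function assertToolsClassified(
  registered: readonly string[],
  actionForTool: ReadonlyMap<string, Action>,
  readOnly: ReadonlySet<string>,
): void {
  const missing = unclassifiedTools(registered, actionForTool, readOnly);
  if (missing.length > 0) {
    throw new Error(`Unclassified tools: ${missing.join(", ")}`);
  }
}
```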

That separation is intentional, and it maps back to the three-part model from the top: capability, permission, judgment. It shows up in "Building Press, Part 4: Review is where the human gates the irreversible" and in "A Slack Mention Becomes a Pull Request." The gate delivers one narrow guarantee: there is one place where "can it?" is decided, and one log of everything the agent asked.

FAQ

What is beforeToolCall and why is it the center of the safety model?

It's a hook the agent core runs before tool invocations. Routing the agent's actions through that one function means mutations are checked against the project's allowlist in a single place, and when a new tool is added you change one gate rather than hunting for scattered guards.

How does the allowlist work?

Each project stores an allowed_actions object (raise_pr, comment_on_jira, comment_on_pr, approve_pr, request_changes_on_pr). The gate maps a tool name to one of those actions and permits the call only if the flag is true. Read-only tools aren't gated.

Can it run a different model or a local coding agent?

Yes. A USE_CLI_AGENT toggle swaps the Bedrock API loop for a local coding-CLI agent, and the model is a configurable Bedrock inference profile. The policy gate is identical either way, so changing the model doesn't change the guardrails.

What are the gate's weaknesses?

It's coarse (per-project, per-action, not per-path), it depends on the tool-to-action map being complete (an unmapped tool can slip through as read-only, which argues for blocking unmapped tools by default), and it governs permission rather than quality. Output quality is handled by the separate human reaction-based review.