Agents don't need better prompts. They need better permissions

For the last year, most “agent” demos have had the same vibe. Give the model a tool, watch it call the tool, clap when it completes the task, then quietly skip the part where that tool touches anything important.

That framing is getting harder to sustain.

On 5 March, GitHub put Copilot coding agent into Jira as an assignee that can read the issue, ask questions, work independently, and open a draft pull request. In the same March window, Atlassian made its Rovo MCP server generally available, with domain allowlists, IP allowlist support, audit logs, and browser-based OAuth 2.1. The MCP roadmap, updated on 5 March, now explicitly calls out audit trails, enterprise-managed auth, and gateway patterns as enterprise readiness priorities, with least-privilege security work tracked as a separate community interest area.

That’s the market telling you something.

The big learning is simple: once the model can read tickets, browse the web, touch internal systems, and write back into your workflow, the hard engineering problem is no longer “how do I prompt it better?” It’s “what exactly is this thing allowed to do when it reads something malicious, misleading, or simply wrong?”

The industry finally looks like it believes this

Look closely at the recent agent launches and the interesting bits are the boring admin bits, not the “reasoning” claims.

GitHub’s Jira integration is built around installation, repository scoping, existing review rules, and agent visibility inside the workflow. Atlassian’s GA write-up talks like enterprise software: trusted domains, security policies, allowlists, audit logs, OAuth flows, token reliability for long-running sessions.

Anthropic shipped Claude Code’s Auto Mode on 12 March with the same energy. The headline was sandboxing: filesystem isolation, network isolation, and a risk classifier that decides which actions need human approval. They reported an 84% reduction in permission prompts without removing guardrails, because actions are classified by consequence before they execute.

That’s not accidental. Once an agent stops being fancy autocomplete and starts acting like a semi-autonomous teammate, you end up rebuilding the same controls you would use for an employee, a contractor, or an integration with too many privileges. Who can access what? For how long? Through which boundary? With what logs? Under whose approval?

In other words: the minute your agent becomes useful, it also becomes governance.

Prompt injection is not really a prompt problem

OpenAI’s 11 March security post made the clearest version of this argument I’ve seen from a frontier lab so far: the most effective real-world prompt injection attacks increasingly look more like social engineering than simple prompt overrides.

That framing matters because it kills a very persistent fantasy in agent design: that you can solve the whole problem with better instructions at the top of the context window.

You can’t.

Even the best public numbers are sobering. Anthropic’s prompt injection benchmarks show a 1% attack success rate on their strongest model against an adaptive attacker. That sounds low until you scale it: a production agent that faces thousands of injection attempts a day at that rate is eating dozens of successful injections every day.
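To make the scaling concrete, here is a back-of-the-envelope sketch. The 1% rate is the figure quoted above; the attempt counts are hypothetical, and treating attempts as independent is my simplifying assumption, not something the benchmark claims.

```python
def expected_successes(attempts: int, success_rate: float) -> float:
    """Average number of injections that land, given a per-attempt success rate."""
    return attempts * success_rate


def prob_at_least_one(attempts: int, success_rate: float) -> float:
    """Chance that at least one attempt lands, assuming independent attempts."""
    return 1.0 - (1.0 - success_rate) ** attempts


# Hypothetical daily volumes of hostile inputs.
for attempts in (100, 1_000, 5_000):
    print(
        attempts,
        round(expected_successes(attempts, 0.01), 2),
        round(prob_at_least_one(attempts, 0.01), 4),
    )
```

Under those assumptions, even 100 hostile inputs a day gives roughly a 63% chance that at least one gets through. A low per-attempt rate is not the same as a low daily risk.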

And this is already happening. In January 2026, researchers found three prompt injection vulnerabilities in Anthropic’s own official MCP Git server. An attacker who could influence what the agent reads (say, through a crafted commit message or README) could weaponise those flaws without ever touching the system directly. The model does not need to be stupid for this to work. It just needs to be trusting.

If an agent reads external email, issue comments, documentation, websites, and uploaded files, then it lives in an adversarial environment. That’s true even if your users are lovely and your team means well. So the question stops being “can the model perfectly distinguish good instructions from bad ones?” and becomes “what happens when it can’t?”

That’s how we already think about humans in risky workflows. Customer support staff can be manipulated. Finance staff can be phished. Ops staff can be rushed into bad calls. We don’t respond to that by writing a more inspirational handbook and hoping for the best. We add spending limits, approval steps, audit logs, policy checks, scoped access, and monitoring.

Agents deserve the same treatment. Probably stricter.

The real agent stack is auth, policy, and runtime controls

Another OpenAI post from the same day, “From model to agent: Equipping the Responses API with a computer environment”, is revealing for a different reason. The security story is isolation, not “our model is so aligned it can be trusted with anything.”

Their description of the runtime amounts to a security architecture doc in disguise:

  • the model proposes tool calls, but the platform executes them
  • the agent runs inside an isolated container
  • outbound traffic goes through a central policy layer
  • secrets are injected only for approved destinations
  • network access is explicitly restricted

That’s the right mental model. The model is a planner operating inside a box. The platform holds the authority.
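A minimal sketch of that split, as I read it. Everything here is hypothetical (the host names, the `ToolCall` shape, the `execute` function) and it is not OpenAI's implementation. The point is only that the model's output is a proposal, and the platform decides whether to act on it and which secrets to attach.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical policy data: hosts the agent may reach, and the secret each one gets.
ALLOWED_HOSTS = {"api.internal.example", "docs.example.com"}
SECRETS_BY_HOST = {"api.internal.example": "token-from-vault"}


@dataclass(frozen=True)
class ToolCall:
    """A request proposed by the model. It carries no authority by itself."""
    tool: str
    url: str


class EgressDenied(Exception):
    """Raised when a proposed call falls outside the egress policy."""


def execute(call: ToolCall) -> dict:
    """Platform-side executor: check policy first, then build the outbound request."""
    host = urlparse(call.url).hostname
    if call.tool != "http_get" or host not in ALLOWED_HOSTS:
        raise EgressDenied(f"blocked: {call.tool} -> {host}")
    headers = {}
    secret = SECRETS_BY_HOST.get(host)
    if secret is not None:
        # The secret is attached here, outside the model's context window.
        headers["Authorization"] = f"Bearer {secret}"
    # A real platform would send the request from inside the sandbox at this point.
    return {"url": call.url, "headers": headers}
```

The model never sees `token-from-vault`, so a poisoned context can't talk it into pasting the token somewhere else.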

The same pattern shows up in OpenAI’s 28 January post on link safety. The stronger guarantee they aimed for was “this exact URL is already known to be public from an independent web index”, rather than “this domain feels reputable”. If a URL can’t be verified that way, the system falls back to user control.

I like that a lot because it’s so unglamorous: no magical classifier that promises to detect all bad intent, no heroic claim that the model just “knows” when a link is safe. Just a tighter safety property, narrower automation, and a checkpoint when certainty drops.

That’s the kind of engineering that ages well.
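The exact-URL property from the link safety post can be sketched in a few lines. This is my illustration, not OpenAI's code: `KNOWN_PUBLIC_URLS` is a hypothetical stand-in for an independent web index, and `ask_user` is a hypothetical callback for the fallback.

```python
from typing import Callable

# Hypothetical stand-in for an independent index of publicly known URLs.
KNOWN_PUBLIC_URLS = frozenset({"https://example.com/docs/getting-started"})


def may_fetch(url: str, ask_user: Callable[[str], bool]) -> bool:
    """Fetch automatically only on an exact URL match; otherwise defer to the user."""
    if url in KNOWN_PUBLIC_URLS:
        return True
    # Same domain is not enough: a new path or query string could carry data out.
    return ask_user(url)
```

Matching the whole URL rather than the domain is the design choice that matters. An attacker who gets the agent to append `?d=<secret>` to a reputable URL produces a string the index has never seen, so the request stops at the user.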

What too many teams are still getting wrong

I keep seeing teams talk about agents as if the core challenge is orchestration. Which planner should we use? How many tools should be exposed? Should the model reflect before acting? Should we give it scratchpads, tasks, memory, sub-agents?

Fine. Some of that matters. But I’m genuinely tired of sitting in architecture reviews where the entire security model is “the system prompt says not to do bad things.”

If your agent can read a Jira issue, open a repo, inspect docs, browse a site, and post back into your systems, then your real design problem is the blast radius when the plan is based on poisoned context.

There are already numbers on this. Autonomous agents now account for more than one in eight reported AI breaches. And 31% of organisations don’t even know whether they had an AI security breach last year. That’s the state of things.

That blast radius expands very quickly:

  • A malicious web page can try to get the agent to exfiltrate data through a link.
  • An issue comment can smuggle instructions that look operationally reasonable.
  • A connected tool can expose far more data than the current task needs.
  • A write-capable integration can turn a bad inference into a real customer-facing change.

If your answer to all of that is “we have a strong system prompt”, you’re not building a robust agent. You’re building a breach with good copy.

What I would actually prioritise

If I were shipping an agent into a real workflow today, I would care less about giving it more freedom and more about making its freedom legible.

Separate the instruction plane from the content plane. User intent, system policy, and untrusted external content should not sit in one undifferentiated soup and then compete for the model’s attention like they’re morally equivalent.

[Figure: How agent inputs should flow. Panels: Trusted (user requests, system policy); Untrusted (external email, issue comments, web pages, uploaded files); Policy layer (auth, scope, validate, audit, approve); High-consequence actions (send data, merge code, close ticket, follow link, publish). External content must never bypass the policy layer. There is no direct path from untrusted content to actions, and the model finding content persuasive is not authorisation. Caption: Separate the instruction plane from the content plane. Put deterministic controls at every sink.]
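One way to keep the two planes apart is to tag every piece of context with where it came from, and to let only one plane count as instructions. The names below (`Plane`, `ContextItem`, and both functions) are hypothetical. Treat this as a sketch of the idea rather than a complete defence.

```python
from dataclasses import dataclass
from enum import Enum


class Plane(Enum):
    INSTRUCTION = "instruction"  # user requests, system policy
    CONTENT = "content"          # email, issue comments, web pages, uploads


@dataclass(frozen=True)
class ContextItem:
    plane: Plane
    source: str
    text: str


def authorised_instructions(items: list[ContextItem]) -> list[str]:
    """Only instruction-plane items may be treated as commands by the policy layer."""
    return [item.text for item in items if item.plane is Plane.INSTRUCTION]


def render_for_model(items: list[ContextItem]) -> str:
    """Label each item so content is presented as data to analyse, not orders."""
    lines = []
    for item in items:
        label = "INSTRUCTION" if item.plane is Plane.INSTRUCTION else "UNTRUSTED CONTENT"
        lines.append(f"[{label} from {item.source}] {item.text}")
    return "\n".join(lines)
```

The labels in the rendered prompt are only a hint to the model. The enforcement lives in `authorised_instructions`: whatever the model ends up proposing, the policy layer compares it against what the instruction plane actually asked for.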

Put deterministic controls at the sinks. The dangerous moment isn’t only when the agent reads hostile content. It’s when it sends data out, mutates state, merges code, closes a ticket, or follows a link silently in the background.
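Here is roughly what I mean by a deterministic control at a sink: ordinary code that runs before the side effect and does not care how convincing the model's reasoning was. The decorator, the domain set, and `send_email` are all hypothetical names for illustration.

```python
import functools

# Hypothetical allowlist of recipient domains for outbound mail.
TRUSTED_RECIPIENT_DOMAINS = {"example.com"}


class PolicyViolation(Exception):
    """Raised when a sink is called with arguments the policy rejects."""


def sink(policy):
    """Wrap a side-effecting function so the policy runs before every call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not policy(*args, **kwargs):
                raise PolicyViolation(fn.__name__)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


def recipient_is_internal(recipient: str, body: str) -> bool:
    """Plain string check on the recipient's domain. No model involved."""
    return recipient.rsplit("@", 1)[-1].lower() in TRUSTED_RECIPIENT_DOMAINS


@sink(policy=recipient_is_internal)
def send_email(recipient: str, body: str) -> str:
    # A real implementation would hand off to a mail service here.
    return f"sent to {recipient}"
```

With this in place, `send_email("alice@example.com", "hi")` goes through and a call addressed to any other domain raises `PolicyViolation`, whatever the context window said.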

Scope access like you’re onboarding a new contractor, not enabling autocomplete. Read-only by default. Narrow repo or project access. Short-lived credentials. Explicit elevation for writes.
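As a sketch, a grant that encodes those defaults might look like this. `Grant`, `issue_grant`, and `allows` are hypothetical names, and the 15-minute lifetime is an arbitrary example value, not a recommendation from any of the vendors above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    repo: str
    can_write: bool
    expires_at: datetime


def issue_grant(repo: str, *, write: bool = False, ttl_minutes: int = 15, now=None) -> Grant:
    """Read-only and short-lived unless the caller explicitly asks for more."""
    now = now or datetime.now(timezone.utc)
    return Grant(repo, write, now + timedelta(minutes=ttl_minutes))


def allows(grant: Grant, repo: str, *, write: bool, now=None) -> bool:
    """True only for the granted repo, before expiry, within the granted mode."""
    now = now or datetime.now(timezone.utc)
    if now >= grant.expires_at or repo != grant.repo:
        return False
    return grant.can_write or not write
```

Writes need a separate `issue_grant(..., write=True)` call, which gives you one obvious place to hang an approval step or a log line.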

Make the behaviour inspectable. Session logs, action traces, and audit trails are part of the product now. If you can’t answer “why did the agent do that?” quickly, you don’t really control the system.
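A minimal version of an action trace is just one structured record per action, including which inputs were in play when the agent acted. The field names and both functions below are hypothetical, and a real system would write to durable storage rather than a Python list.

```python
import json
from datetime import datetime, timezone


def record_action(log: list[str], *, session_id: str, action: str,
                  reason: str, sources: list[str]) -> None:
    """Append one JSON line describing what the agent did and what it had read."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "action": action,
        "reason": reason,
        "sources": sources,
    }
    log.append(json.dumps(entry, sort_keys=True))


def why(log: list[str], session_id: str, action: str) -> list[dict]:
    """Answer 'why did the agent do that?' by returning the matching entries."""
    entries = (json.loads(line) for line in log)
    return [e for e in entries if e["session_id"] == session_id and e["action"] == action]
```

Recording `sources` is the part that pays off after an incident: it tells you whether the action traced back to the user's request or to something the agent read along the way.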

Keep human checkpoints where consequence outruns convenience. If an action can leak sensitive data, spend money, publish something, or affect production, the friction is the control surface.
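A checkpoint like that can be very small. In this sketch the effect labels, `HIGH_CONSEQUENCE`, and `run_with_checkpoint` are all hypothetical, and I'm assuming each action arrives already tagged with its possible effects, which is the hard part in practice.

```python
from typing import Callable

# Hypothetical effect labels, mirroring the four cases named above.
HIGH_CONSEQUENCE = frozenset(
    {"leaks_sensitive_data", "spends_money", "publishes", "affects_production"}
)


def run_with_checkpoint(
    action: Callable[[], str],
    effects: set[str],
    approve: Callable[[set[str]], bool],
) -> str:
    """Run low-consequence actions directly; ask a human before anything risky."""
    risky = set(effects) & HIGH_CONSEQUENCE
    if risky and not approve(risky):
        return "blocked: approval denied"
    return action()
```

Low-consequence actions never reach the approver, which is how you keep the prompt count down without removing the checkpoint where it matters.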

OWASP published a Top 10 specifically for agentic applications late last year. The list reads like a summary of everything above: goal hijacking, tool misuse, privilege abuse, memory poisoning, cascading failures. Their core principle is “least agency.” Give the agent enough capability to do the job. Not a byte more.

The boring teams are going to win this phase

I don’t think the winners of the next year of agent tooling will be the teams with the most dramatic demos. I think they’ll be the teams that treat agents like capable but fallible operators inside a tightly designed environment.

The strongest agent stack won’t just be model plus prompt plus tools. It’ll be model plus prompt plus tools plus auth plus policy plus observability plus approvals plus scoped execution.

[Figure: The real agent stack. What most teams ship (enough for a demo, not for production): model, prompt, tools. What production agents need (the boring parts are the product): model, prompt, tools, auth, policy, observability, approvals, scoped execution. The gap between the columns is the actual engineering work. Three layers get you a demo. Eight get you a system you can trust.]

That sounds less exciting. It’s also a lot closer to the truth.

The industry is slowly admitting this now. GitHub is wiring agents into issue workflows with review rules intact. Atlassian is shipping MCP with audit logs and allowlists. Anthropic is shipping sandboxed agent runtimes with risk classification. The MCP spec community is prioritising enterprise auth and observability. OpenAI is publicly saying prompt injection looks like social engineering, and its concrete mitigations are things like runtime isolation, restricted egress, URL verification, and monitoring.

Good. That’s the adult version of agent engineering.

Because once a model can actually do things, the question is whether the rest of your system assumes it will eventually be fooled, and contains the damage when that happens.

That’s the real product.