OpenAI runs Codex with sandboxing, approvals, network policies and telemetry
Enterprise
OpenAI has published how it runs Codex, its coding agent, without spraying secrets across the network or letting the model go walkabout. The controls are what you would expect from any productionised Large Language Model (LLM) agent: put it in a sandbox, gate the risky actions, lock down the network, and watch everything it does. Sensible, not sexy.
What they built
The deployment pattern is built on four families of control. Sandboxing isolates where code runs and narrows what the agent can touch. Approval workflows put a human in the loop for higher-risk operations. Network policies restrict where data can flow and what the agent can talk to. Agent-native telemetry records what the agent did and why, giving operations and audit something concrete to inspect.
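To make the sandboxing concrete, here is a minimal Python sketch of the capability-narrowing idea: run agent-proposed commands with a stripped environment, a confined working directory and a hard timeout. The workspace path and environment allowlist are illustrative assumptions, not OpenAI's implementation, and a production sandbox would add container or seccomp isolation on top of this.

```python
import os
import subprocess
from pathlib import Path

# Hypothetical workspace root the agent is confined to.
WORKSPACE = Path("/srv/agent-workspace")

# Only these variables survive into the sandbox; inherited
# secrets (tokens, cloud credentials) are stripped.
ENV_ALLOWLIST = {"PATH", "LANG", "HOME"}


def run_sandboxed(cmd: list[str], timeout_s: int = 30) -> subprocess.CompletedProcess:
    """Run an agent-proposed command with minimal ambient authority."""
    env = {k: v for k, v in os.environ.items() if k in ENV_ALLOWLIST}
    return subprocess.run(
        cmd,
        cwd=WORKSPACE,        # never executes outside its jail
        env=env,              # no inherited secrets
        timeout=timeout_s,    # runaway processes are killed
        capture_output=True,  # stdout/stderr captured for telemetry
        text=True,
    )
```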
They frame this as support for threat modelling, governance and containment. In plain terms: expect the model to make bad decisions sometimes, limit the blast radius, and make sure you can see and explain its behaviour when it does.
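The approval workflow is easiest to see as a default-deny gate. A sketch, with illustrative action names rather than Codex's real ones: anything not explicitly known to be low risk stops and waits for a human yes.

```python
from typing import Callable

# Hypothetical tiers; the point is the default, not the list:
# an action the gate has never seen counts as high risk.
LOW_RISK = {"read_file", "list_dir", "run_tests"}


def gated_execute(
    action: str,
    run: Callable[[], object],
    approve: Callable[[str], bool],
) -> object | None:
    """Default-deny: unknown or high-risk actions need explicit approval."""
    if action not in LOW_RISK and not approve(action):
        return None  # declined: the action never runs
    return run()


# Example: pushing a branch is gated behind a console prompt.
# gated_execute("push_branch", do_push, lambda a: input(f"allow {a}? [y/N] ") == "y")
```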
How it breaks
If you are attacking this setup, you try to expand capability or hide intent. That means prompting the agent to perform actions its sandbox nominally forbids, or chaining allowed actions to reach the same end. It means leaning on the narrow set of permitted network destinations to stage exfiltration through whatever is still open. It means gaming the approval workflow, either by crafting outputs that look routine or by social-engineering the human gatekeeper. And it means degrading or dodging telemetry, for example by triggering high-volume, low-value events so the meaningful ones get missed.
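The network-policy half of that attack surface is worth a concrete look. A minimal egress check, with an assumed allowlist of package registries, shows why "whatever is still open" matters: a host the agent can write to (an upload endpoint, a paste service behind a CDN) is an exfiltration channel even when it is on the list for legitimate reasons.

```python
from urllib.parse import urlparse

# Assumed allowlist: registries the agent legitimately needs.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}


def egress_permitted(url: str) -> bool:
    """Network policy check: only allowlisted hosts are reachable."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS
```

The check itself is trivial; the hard part is judging each allowlisted host by what it lets you write, not just what it lets you read.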
Telemetry itself is a double-edged sword. It enables oversight, but it also collects sensitive material: code, prompts, context, maybe customer data. Retention rules, access controls and data minimisation become part of the attack surface and the compliance headache. Get them wrong and you have built a very tidy evidence trail for the wrong people.
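Data minimisation at the point of emission is the usual mitigation. A sketch: redact secret-shaped strings before an event is retained. The two patterns here are illustrative stand-ins; a real deployment would use a vetted secret scanner.

```python
import json
import re
import time

# Illustrative patterns for secret-like strings.
SECRET_PATTERNS = [
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key id
]


def redact(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def emit_event(action: str, detail: str) -> str:
    """Minimise before you retain: redact, then record."""
    event = {"ts": time.time(), "action": action, "detail": redact(detail)}
    return json.dumps(event)
```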
There are limits here. This is an outline of controls, not a results paper. No red-team data, no measurable trade-offs, no operational cost curves. As an enterprise pattern, it is the baseline auditors will soon expect for coding agents. As research, it is mostly a signpost: the interesting work is in how these controls are tuned, penetrated and monitored in the messy middle of real deployments. Watch this space.