/goal: The Six-Hour Codex Run That Survived a Five-Hour Pause
TL;DR
- /goal shipped in Codex CLI v0.128.0 on April 30, 2026 as a named headline feature.
- It introduces persisted goals: a goal state that survives terminal restarts, laptop sleeps, and multi-hour pauses without re-prompting.
- Runtime continuation means Codex injects a developer message on resume rather than waiting for you to type anything.
- I ran a real session on a TypeScript monorepo. Wall time: about 6h 44min. Actual model compute: about 41 minutes. Final status: TASK_COMPLETE.
- The session burned roughly 6.8M cumulative input tokens at a ~94% cache hit rate. Auto-context-compaction fired once, configurable via model_auto_compact_token_limit.
I did not plan to run Codex overnight. I started a session at 9:19 PM Berlin time on April 30, watched one turn run for 57 seconds, then closed the laptop and went to bed. When I came back five and a half hours later, /goal was already running again. It had picked up exactly where it left off. I had not re-prompted anything.
That is the thing about /goal that does not come through in a changelog entry. It is not just a new command. It is a different contract between you and the agent.
What Shipped on April 30
Codex CLI v0.128.0 (tagged rust-v0.128.0) dropped on April 30, 2026. The headline from the release notes: “Added persisted /goal workflows with app-server APIs, model tools, runtime continuation, and TUI controls for create, pause, resume, and clear.”
That one sentence packs a lot in, so let me pull it apart.
Persisted goals are the core idea. Previous Codex sessions were ephemeral. Close the terminal, lose the thread. /goal stores the active goal in app-server state, so it outlives the process.
App-server APIs is the plumbing behind that persistence. Codex now talks to a local server layer that tracks goal state.
Model tools means the model itself gets tools for interacting with the goal lifecycle. It can signal completion, request continuation, and inspect goal state as part of its reasoning.
Runtime continuation is the behavior I saw that night. When you resume (or when Codex detects the session is alive again), it injects a developer message prompting the model to continue working. You do not have to type anything.
TUI controls rounds out the surface area. The terminal UI gets explicit create, pause, resume, and clear actions for goal management. You can pause a running goal intentionally, not just by closing the lid.
The rest of v0.128.0 is worth a quick mention. Scrollback reflow now works on terminal resize instead of the text getting mangled. A new codex update command handles CLI self-updates. The composer shows plan-mode nudges when a task seems like a good candidate for planning. TUI keymaps are now configurable. Permission profiles are expanded. The --full-auto flag is deprecated in favor of explicit approval profiles. The desktop app also got polish improvements the same week, though the focus of this post is the CLI. Plan mode itself landed earlier, in v0.122.0 on April 20, 2026. /goal builds on top of that foundation.
What /goal Actually Does
The basic mechanic is straightforward. You type /goal followed by your prompt. Codex stores the goal and starts working. If the session is interrupted (network hiccup, closed laptop, deliberate pause), the goal persists. When the session comes back, Codex resumes automatically via runtime continuation.
The model signals completion with TASK_COMPLETE or the task_complete tool. Until that happens, the goal stays active.
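In the TUI composer, that is a single command: everything after /goal becomes the stored goal prompt. A minimal sketch (the task text here is invented for illustration):

```
/goal Make all four end-to-end voice scenarios pass verification. Read SPEC.md first. Do not declare TASK_COMPLETE until every scenario passes.
```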
What actually makes this different from a long-running --continue session is the persistence layer. Before /goal, a closed terminal meant a dead session. You could approximate continuity by carefully managing context files and re-injecting prompts, which is basically what the Ralph Wiggum Loop does in a scrappier way. /goal makes continuity a first-class feature.
A few config knobs matter here. In ~/.codex/config.toml, the model_auto_compact_token_limit key sets the threshold for automatic context compaction. The [features] block is where feature flags live. The model_reasoning_effort key sets reasoning effort for the session. If you want hands-off autonomous runs, you will also need approval_policy and sandbox_mode configured correctly. I will get to that.
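For reference, here is a sketch of what that block of ~/.codex/config.toml can look like. The keys are the ones named above; the values are illustrative, not recommendations:

```toml
# ~/.codex/config.toml — illustrative values, not recommendations
model_reasoning_effort = "high"    # reasoning effort for the session

# threshold for automatic context compaction (this value is made up)
model_auto_compact_token_limit = 400000

# precondition for hands-off autonomous runs — trusted directories only
approval_policy = "never"
sandbox_mode = "danger-full-access"

[features]
# feature flags live here
```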
The TUI also changes. You get visible goal state. You can pause a running goal intentionally without killing the process. Resume picks it back up with runtime continuation.
A Real Six-Hour Run
Here is what a real session actually looked like.
The project was a TypeScript monorepo I am working on. A voice interview system with several end-to-end scenarios that needed to work correctly under a set of defined conditions.
I run Codex with approval_policy = "never" and sandbox_mode = "danger-full-access" for autonomous /goal sessions. These two settings are the precondition for hands-off long runs: the model does not stop to ask permission, and it has full filesystem access to do its work. This is only sane in a trusted project directory with clean git state going in.
The /goal prompt was around 600 words. I wrote it using a structured approach: XML-style blocks organizing the goal, an explicit reading list of ten or more files the model should consult first, working rules (check git status before edits, prefer rg over grep, use apply_patch), a done_when contract spelling out four concrete success criteria, and explicit anti-pattern fences. One of those fences: “do not add string-matching patches to pass one transcript.” If you have worked on voice systems, you know why that fence needs to exist.
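Here is a skeleton of that structure. The file names and criteria are hypothetical stand-ins; the shape is what matters:

```xml
<goal>
Make all four end-to-end voice interview scenarios pass verification.
</goal>

<read_first>
SPEC.md, docs/scenarios.md, src/interview/state-machine.ts, … (ten-plus files)
</read_first>

<working_rules>
- Check git status before edits.
- Prefer rg over grep. Use apply_patch for edits.
</working_rules>

<done_when>
1. Scenario A completes without a premature close.
2. Scenario B recovers from a mid-interview disconnect.
3. Scenario C handles barge-in without a liveness spiral.
4. The verification script exits 0 for all four scenarios.
</done_when>

<anti_patterns>
- Do not add string-matching patches to pass one transcript.
</anti_patterns>
```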
Writing a prompt like that is itself a task. If you want to see how I approach prompt design for this kind of work, The Interview Method covers the workflow.
Model: gpt-5.5. Reasoning effort: high.
Session timeline:
- 9:19 PM - /goal submitted.
- 9:20 PM - First turn running. I watched it for 57 seconds, then interrupted (turn_aborted).
- Next 5.5 hours - Laptop closed. No re-prompting.
- ~2:50 AM - When I came back, /goal had already injected a developer message (“Continue working toward the active thread goal”) and was running. Autonomous.
- Context compaction fired once, at approximately 6.7M cumulative input tokens.
- Cumulative tokens: ~6.8M input, ~10K output, ~2.6K reasoning tokens. Cache hit rate: ~94%.
- Wall time: 6h 44min. Actual model compute: ~41 minutes across turns.
- Final status: TASK_COMPLETE. All four target end-to-end voice scenarios passed verification.
Manual transcript review found no prompt loops, no liveness spirals, no premature closes. The model worked through the scenarios methodically and called it done when the criteria were met.
One real-world ceiling worth noting. A TTS first-byte timing field I wanted captured could not be measured, because the upstream library does not emit the relevant runtime event. The model documented this honestly. Explicit nulls in the artifact, with a note explaining why the field was missing. It did not paper over the gap. /goal can give you an autonomous run, but it cannot bypass what the external environment actually exposes.
The ~94% cache hit rate is the number that makes the economics work. 6.8M input tokens sounds alarming until you realize that the actual incremental cost at that cache rate is a fraction of the nominal number.
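Back-of-envelope, with placeholder prices (round numbers to show the shape, not published rates): assume $1.25 per 1M uncached input tokens and a 90% discount on cached tokens.

```
nominal:   6.8M × $1.25/1M                  ≈ $8.50
uncached:  6.8M × 6%  = 0.41M × $1.25/1M    ≈ $0.51
cached:    6.8M × 94% = 6.39M × $0.125/1M   ≈ $0.80
effective: ≈ $1.31 — about 15% of the nominal bill
```

Swap in real rates and the ratio holds: the effective cost tracks the uncached slice plus the discounted remainder, not the headline token count.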
/goal vs the Ralph Wiggum Loop
I wrote about the Ralph Wiggum Loop a while back. Geoffrey Huntley coined it, and his original post is still the canonical reference: the technique is essentially while :; do cat PROMPT.md | claude-code; done with git history as memory. It solves the same core problem /goal solves: how do you keep an AI agent working on something longer than a single context window?
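Expanded just enough to run, the loop looks roughly like this. The per-iteration commit is my own embellishment, one way to make “git history as memory” concrete; the canonical version is just the one-liner:

```sh
# the Ralph Wiggum Loop: same prompt every pass, fresh context every pass,
# git history as the only memory between iterations
while :; do
  cat PROMPT.md | claude-code
  # snapshot whatever the pass changed (not part of the canonical one-liner)
  git add -A && git commit -m "ralph: $(date -u +%FT%TZ)" || true
done
```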
The approaches are different in character.
| Dimension | Ralph Wiggum Loop | /goal |
|---|---|---|
| Setup | Shell script or plugin, external orchestration | Built into Codex CLI |
| State persistence | Git history, files on disk | App-server APIs, native goal state |
| Resume behavior | Manual re-invocation | Automatic runtime continuation |
| Context management | Fresh context per iteration (by design) | Compaction within session |
| Reasoning continuity | Stateless between iterations | Continuous within session |
| Model | Claude Code | Codex with gpt-5.5 |
| Good for | Tasks that benefit from fresh eyes each pass | Long-horizon tasks with accumulating context |
The Ralph Wiggum Loop is genuinely useful. The stateless-by-design property is sometimes an advantage: each iteration approaches the problem without carrying forward incorrect intermediate conclusions. If the model gets confused, the next iteration starts clean.
/goal bets on continuity instead. The model builds up a picture of the codebase across turns and does not have to re-read everything from scratch on each pass. For tasks where reasoning accumulates (debugging a subtle interaction, navigating a complex state machine), continuity wins. For tasks that are naturally iterative and convergent (adding tests, fixing lint), Ralph’s fresh-context model often works just as well.
Neither is the right default. They are tools for different shapes of problem.
When /goal Is the Wrong Choice
A few situations where I would not reach for /goal.
Undefined success criteria. The done_when contract is not optional. If you cannot write four concrete success criteria before you start, the model has no way to know when it is done. It will either declare TASK_COMPLETE prematurely or loop indefinitely. Write the contract first.
Exploratory work. Early-stage “figure out what this codebase is doing” work benefits from human-in-the-loop. You learn things as the model surfaces them. /goal is for execution, not exploration.
Security-critical paths. I run with approval_policy = "never" and sandbox_mode = "danger-full-access". That setup is only appropriate in project directories I trust completely. Authentication systems, payment flows, anything touching sensitive data: keep approval in the loop.
Unclear external dependencies. If your task depends on an external system you are not sure about, find out first. The TTS timing field I mentioned above is the mild version. The more expensive version is a six-hour run that hits a wall at hour five because the external API does not support what you assumed it would.
Short tasks. /goal has overhead. A task you can finish in ten minutes of interactive Codex is not improved by wrapping it in a persisted goal. The complexity is not worth it below some threshold. My rough heuristic: if the task would not comfortably span two or more separate sessions in the old model, it probably does not need /goal.
The Mindset Shift
Old: Autonomous AI runs are sessions you monitor, ready to intervene when things go sideways.

New: Autonomous AI runs are contracts you write upfront, then get out of the way.
The shift is from supervisor to architect. The quality of the /goal session is determined almost entirely before the first turn runs. The prompt quality, the success criteria, the anti-pattern fences, the reading list. Once it starts, your job is mostly done. If you wrote the contract well, the model executes. If you did not, no amount of monitoring will save it.
That is a different skill than interactive prompting. It is closer to writing a spec than having a conversation.
Conclusion
/goal is the most significant thing Codex has shipped since plan mode. The persistence layer and runtime continuation are what make it different from a long --continue session in practice. Six hours and forty-four minutes of wall time with forty-one minutes of actual compute is only possible because the model kept its context, the cache held, and the goal survived a five-hour gap without me touching anything.
The economics work out because of cache hit rates. The quality works out because of upfront prompt discipline. Neither of those things is automatic.
This is the first post in a two-part series. The companion post covers the workflow side: how I prep specs and prompts before they reach /goal. From SPEC.md to /goal: My Codex + GPT-5.5 Workflow.