4 min read · 2026

How Codex Solves Compaction Differently — Encrypted Summaries and Session Handover

Claude Code's 'Compacting conversation...' problem meets Codex's encrypted summary and session handover pattern. A deep dive into context management architecture.


Use Claude Code long enough and you’ll hit the “Compacting conversation…” message. After that, answers start drifting and wait times spike. The 200K token context window fills up faster than you’d expect.

Word had been going around that OpenAI’s Codex handles this problem more cleverly, so I dug into every public analysis I could find.

Summarization Still Means Forgetting

When conversations grow long, AI forgetting earlier parts is a structural limitation. The context window caps at 200K tokens, and a single coding session blows past that easily. Even with summarization, the original conversation is gone — accuracy inevitably drops.

I’ve personally experienced this dozens of times: ask about “that function we discussed earlier” after compaction, and you get a completely wrong answer.

  • Claude Code’s default 200K token window gets consumed in one large refactoring session
  • Summary replaces original → detailed context lost → answer quality degrades
  • Tool call results getting flattened in summaries is especially devastating

Codex’s Compaction Was an “Encrypted Summary”

Kangwook Lee, CAIO at Krafton, reverse-engineered the Codex internal pipeline using two prompt injections, and the results were fascinating.

When the Codex model’s compact() API is called, a separate LLM on the server summarizes the conversation and returns the result AES-encrypted. On the next turn, this encrypted blob gets decrypted, prefixed with a handoff prompt saying “here’s a summary of the previous conversation,” and fed to the model.

  • Nearly identical content to the open-source Codex CLI’s compaction prompt for non-codex models
  • The reason for encryption remains unclear — possibly contains tool call restoration data
  • Reproducible in 35 lines of Python (script published by Kangwook Lee)
  • OpenAI’s official API supports server-side automatic compaction via the compact_threshold setting
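The round trip is easy to mimic. Below is a minimal sketch of the pattern, not the real Codex pipeline: `compact_session()` stands in for the server-side summarizer (the real system has a separate LLM do this), a toy hash-counter stream cipher stands in for AES, and every name here is hypothetical.

```python
import base64
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    """Toy hash-counter stream cipher; a stand-in for AES, NOT secure."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def _xor(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def compact_session(history: list[str], key: bytes) -> str:
    """'Server side': summarize the conversation, return an opaque encrypted blob."""
    # Placeholder summary; in the real pipeline an LLM produces this.
    summary = " | ".join(turn[:60] for turn in history)
    return base64.b64encode(_xor(summary.encode(), key)).decode()

def next_turn_prompt(blob: str, key: bytes, user_msg: str) -> str:
    """Next turn: decrypt the blob and prefix it with a handoff prompt."""
    summary = _xor(base64.b64decode(blob), key).decode()
    return ("Here is a summary of the previous conversation:\n"
            f"{summary}\n\nUser: {user_msg}")
```

The client only ever sees an opaque base64 string; the plaintext summary reappears solely inside the next turn’s prompt, which matches the behavior the injection experiments surfaced.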

The Real Difference Is How Sessions Get Handed Over

More interesting than compaction itself is cross-session context transfer. One developer’s automation was impressive — I call it the “session handover” pattern.

Right before compaction, write tools get blocked and only user messages and thinking blocks are extracted from the JSONL session log. This reduces volume by 98% compared to the original. Then three sub-agents find gaps in the summary by searching the original logs and compile everything into a resume-prompt.md file.

When VS Code’s file watcher detects this file, a new session opens automatically and inherits the previous context seamlessly.

  • Pre-compact hook blocks writes before compaction → prevents code modifications in incomplete state
  • JSONL → MD conversion preserves only user messages + system messages + thinking blocks
  • Sub-agents perform gap analysis and retrieve missing information from the original logs
  • Reported 10x improvement in build efficiency

The Real Game Is Session Log Search and KV Cache

Session data accumulates as JSONL files, so the decisive factor is how accurately you can retrieve the context you need from them. The answer isn’t better summarization — it’s retrieval-based search across past sessions.

Factor in KV cache hit rates, and you can reuse the same prompt prefix to cut both cost and response latency simultaneously. When I designed my own session folder structure, session-id-based archiving had the biggest impact on search speed. Integrating QMD — which I covered yesterday — for pre-indexing looks like a promising direction too.

  • Preserving raw JSONL enables precise search when needed
  • resume-prompt.md includes previous session summary + gap analysis results + modified file list
  • Fixing system prompt and handoff prompt prefixes maximizes KV cache hits
  • Session archiving automation maintains context across dozens of consecutive sessions

The Real Bottleneck in AI Coding Is Context Management

The true bottleneck in AI coding tools isn’t model performance — it’s context management. Designing a system that retrieves what was forgotten matters more than perfecting summarization.

Compaction inevitably loses information. What matters is building both a search pipeline that can retrieve lost information and a handover architecture that transfers context between sessions without gaps.

Based on analysis by Kangwook Lee, CAIO.
