LLM Context Windows Are Mostly Marketing: The Real Cutoff Is 100k Tokens

Performance drops sharply above 100,000 tokens regardless of the advertised size, which makes large context windows a marketing gimmick that coding agents cannot rely on.

large language models, context windows, claude code, coding agents, attention mechanism, llm performance

Every LLM vendor ships a context window number that keeps getting bigger — 200k, 1M, 2M — but the part that actually works tops out around 100k tokens, and nobody tells you that.

Garrit, writing a sharp personal note on his blog, splits an LLM's context into two zones: a smart zone where attention is sharp, and a dumb zone where the model starts forgetting what you told it five minutes ago. The cutoff sits somewhere around 100k tokens, and it doesn't matter how big the advertised window is. Studies like RULER and Chroma's report on context rot back this up: effective context is a fraction of the box number, and performance degrades as you fill the window.

Why Coding Agents Are the First to Hit the Wall

A modern agent burns through tokens fast. A few file reads, a long debug session, a sprawling test run, and you're at 100k before lunch. That's exactly when the model starts hallucinating or ignoring earlier constraints — not because the agent is buggy, but because the underlying attention mechanism doesn't actually solve long-range retrieval. Vendors paper over this with architectures that scale the window but leave the effective working set unchanged.

Tools like Claude Code now auto-compact: when the session gets long, the agent summarizes the history and starts fresh. That helps, but auto-compaction kicks in after you've already spent time in the dumb zone, and the summary is itself produced by a model that's already degraded. Better than nothing, but I'd rather avoid the situation altogether.
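
One way to act on this before auto-compaction does is to keep a rough token budget yourself. What follows is a minimal sketch, not how Claude Code or any other agent works internally: `ContextBudget`, `estimate_tokens`, the 80% warning threshold, and the four-characters-per-token heuristic are all assumptions made for illustration, and the 100k limit is only the rough cutoff described above.

```python
from dataclasses import dataclass

SMART_ZONE_TOKENS = 100_000  # rough cutoff from the article, not a measured constant
CHARS_PER_TOKEN = 4          # crude heuristic (assumption); a real tokenizer is more accurate


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    if not text:
        return 0
    return max(1, len(text) // CHARS_PER_TOKEN)


@dataclass
class ContextBudget:
    """Tracks estimated tokens added to a session (hypothetical helper)."""
    limit: int = SMART_ZONE_TOKENS
    warn_ratio: float = 0.8  # assumed threshold for an early warning
    used: int = 0

    def add(self, text: str) -> str:
        """Record text that entered the context and return the new status."""
        self.used += estimate_tokens(text)
        return self.status()

    def status(self) -> str:
        if self.used >= self.limit:
            return "handoff"  # time to start a fresh session
        if self.used >= self.limit * self.warn_ratio:
            return "warn"     # getting close to the dumb zone
        return "ok"
```

Calling `budget.add(...)` on every file read, tool result, and message gives a cheap signal for when to stop and hand off, rather than finding out from degraded answers.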

The Smart Workaround: Treat Context as a Budget

Garrit's approach is brutally simple: open a new session and pass it a spec you wrote yourself. That's a much higher-signal handoff than any automated summary, because you get to decide what matters going forward. It's the breadcrumb approach applied to agents: leave an artifact that the next session, or the next person, can pick up cleanly.
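
To make the handoff concrete, here is a small sketch of a helper that renders such a spec as plain text. The content still has to be written by you; the function only lays it out. The name `render_handoff_spec` and the four section titles are hypothetical choices for this example, not a format taken from Garrit's post.

```python
from typing import List


def render_handoff_spec(
    goal: str,
    constraints: List[str],
    decisions: List[str],
    next_steps: List[str],
) -> str:
    """Render a hand-written handoff spec as plain text for a fresh session."""
    sections = [
        ("Goal", [goal]),
        ("Constraints", constraints),
        ("Decisions so far", decisions),
        ("Next steps", next_steps),
    ]
    lines: List[str] = []
    for title, items in sections:
        lines.append(f"{title}:")
        # Keep empty sections visible so the next session knows nothing was omitted.
        lines.extend(f"- {item}" for item in (items or ["(none)"]))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_handoff_spec(
        goal="Fix the flaky login test",
        constraints=["No new dependencies"],
        decisions=[],
        next_steps=["Reproduce the failure locally", "Add a regression test"],
    ))
```

Saving the result to a file and opening the next session with only that file keeps the new context small and entirely under your control.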

Projects like obra/superpowers and mattpocock/skills take this further by structuring entire agent workflows around small, named artifacts: PRDs, plans, skills, sub-agent handoffs. Each one moves information out of the live session and into something the next session can read. The working session stays in the smart zone because the context window holds only what's immediately needed.

The number on the box gets bigger every release. The usable part doesn't keep up. Assume only the first chunk of your context window is really working for you, and everything you can move out of the live session and into a written artifact is one less thing for attention to fight over.


Source: Don't trust large context windows
Domain: garrit.xyz
