The Cheapest Way to Speed Up Your AI Coding Agent Is Already in the Repository
Somewhere in your engineering organisation, an AI coding agent is opening a repository it has never seen before. It does not know which files matter. It does not know your naming conventions, your module boundaries, or the fact that the payments service should never be called directly from the user layer. So it explores. It reads files speculatively, issues repeated model requests as it builds a mental map of the codebase, and generates output tokens narrating its own navigation. Your team pays for every one of those tokens. The whole process takes longer than it should, and nobody is tracking why.
Researchers at Singapore Management University, Heidelberg University, the University of Bamberg, and King's College London published an empirical study in early 2026 addressing exactly this failure mode. They measured what happens when an AI coding agent operates on a repository that has an AGENTS.md file, compared to the same agent, on the same task, in the same codebase, without one. Across 124 pull requests drawn from 10 repositories, the presence of an AGENTS.md file was associated with a median runtime reduction of 28.64% and a median output token reduction of 16.58%. Both results were statistically significant under a Wilcoxon signed-rank test. The efficiency gain came not from changing the model, the infrastructure, or the agent system, but from adding a markdown file that developers write in plain English.
What an Agent Does When Nobody Tells It Anything
AGENTS.md is not a new technology. It is a filename convention, now adopted by more than 60,000 repositories, that tells an AI coding agent what a project is, how it is structured, and what conventions govern it. OpenAI's Codex agent reads it during initialisation. GitLab Duo does the same. Anthropic's Claude Code has its own equivalent in CLAUDE.md. GitHub Copilot uses copilot-instructions.md. The filenames differ, but the platforms have converged on the same underlying idea: give the agent a project briefing before it starts work.
What happens without that briefing is the point. The researchers describe their hypothesis clearly: agents without AGENTS.md must infer project organisation through exploratory navigation, leading to more planning iterations, repeated model requests, and higher token consumption. This is not a failure of the model. It is a predictable consequence of sending a capable reasoner into an unfamiliar environment with no orientation. The agent does what any thorough analyst would do when handed a codebase and a task and nothing else. It reads around.
The efficiency cost of that exploration is real, measurable, and largely avoidable.
The study design isolates this effect cleanly. Each pull request was run twice: once with AGENTS.md present in the repository root, once without it, with all other variables held constant. Same repository snapshot. Same task description. Same agent. Same model (gpt-5.2-codex). The only thing that changed was whether the agent had a project briefing at the start.
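The paired design described above can be sketched in a few lines. This is a minimal illustration, not the authors' harness: `RunMetrics`, `PairedResult`, `run_paired`, and the `run_agent` callback are hypothetical names, and `fake_runner` returns made-up numbers purely so the sketch executes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RunMetrics:
    wall_clock_s: float
    output_tokens: int


@dataclass(frozen=True)
class PairedResult:
    task_id: str
    with_file: RunMetrics
    without_file: RunMetrics


# Hypothetical signature: (task_id, agents_md_present) -> RunMetrics.
AgentRunner = Callable[[str, bool], RunMetrics]


def run_paired(task_ids: Iterable[str], run_agent: AgentRunner) -> list[PairedResult]:
    """Run every task twice, changing only whether AGENTS.md is present."""
    results = []
    for task_id in task_ids:
        with_file = run_agent(task_id, True)
        without_file = run_agent(task_id, False)
        results.append(PairedResult(task_id, with_file, without_file))
    return results


def fake_runner(task_id: str, agents_md_present: bool) -> RunMetrics:
    """Stand-in for a real agent call; the numbers are invented for the demo."""
    if agents_md_present:
        return RunMetrics(wall_clock_s=70.0, output_tokens=2400)
    return RunMetrics(wall_clock_s=100.0, output_tokens=2900)


pairs = run_paired(["pr-1", "pr-2"], fake_runner)
for p in pairs:
    saved = p.without_file.wall_clock_s - p.with_file.wall_clock_s
    print(f"{p.task_id}: {saved:.1f} s faster with AGENTS.md")
```

In a real harness the runner would restore the repository snapshot and start from a clean environment before each call, so that nothing carries over between the two runs of a pair.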
Where the Savings Come From, and Where They Do Not
The results carry a counterintuitive structure worth examining carefully.
| Metric | Without AGENTS.md | With AGENTS.md | Change |
|---|---|---|---|
| Median wall-clock time | 98.57 seconds | 70.34 seconds | −28.64% |
| Median output tokens | 2,925 | 2,440 | −16.58% |
| Median input tokens | 116,609 | 120,587 | +3.41% |
Input token medians are slightly higher with AGENTS.md present. This is expected: the file itself adds text to the context window, so the agent reads a little more at the start of each task. The savings are not from feeding the model less. They come entirely from what the model does not need to generate. Fewer exploratory steps. Fewer planning narrations. Fewer repeated requests to orient itself in the codebase.
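The percentages in the table follow directly from the two medians in each row. A short check, using only the figures reported above (the helper name `percent_change` is ours):

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (without AGENTS.md, with AGENTS.md), copied from the results table.
medians = {
    "wall-clock time (s)": (98.57, 70.34),
    "output tokens": (2925, 2440),
    "input tokens": (116609, 120587),
}

for metric, (without_file, with_file) in medians.items():
    print(f"{metric}: {percent_change(without_file, with_file):+.2f}%")
# wall-clock time (s): -28.64%
# output tokens: -16.58%
# input tokens: +3.41%
```

Note that each figure is the change between two medians, not the median of per-task changes; those two quantities can differ.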
The authors note an important asymmetry in the output token result. The mean output token reduction is 20.08%, while the median reduction is 16.58%. A mean reduction larger than the median reduction suggests the effect is concentrated in a small number of high-cost runs where AGENTS.md prevents the agent from going deep into exploratory spirals. Across more typical runs, the reduction is present but smaller. The wall-clock time result shows the opposite pattern: the median reduction (28.64%) is larger than the mean reduction (20.27%), which the authors interpret as a general shift toward faster completion rather than a tail effect. Both results are statistically significant. The changes in input tokens, cached input tokens, and total tokens are not.
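To see why a mean reduction larger than the median reduction points to a tail effect, consider a toy example. The numbers below are invented for illustration and are not data from the study: four typical runs improve slightly, and one runaway run improves a great deal.

```python
from statistics import mean, median


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) * 100.0 / before


# Invented output-token counts for five tasks (NOT study data).
without_file = [100, 100, 100, 100, 1000]  # one expensive exploratory spiral
with_file = [90, 90, 90, 90, 500]          # the spiral is cut in half

median_reduction = reduction_pct(median(without_file), median(with_file))
mean_reduction = reduction_pct(mean(without_file), mean(with_file))

print(f"median reduction: {median_reduction:.1f}%")
print(f"mean reduction:   {mean_reduction:.1f}%")
```

The single expensive run dominates the mean while leaving the median untouched, which is the shape of result the authors describe for output tokens.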
The Honest Boundaries of What This Study Shows
The study was conducted on one agent system, one model, and a deliberately constrained task set. Pull requests were capped at 100 lines of code and five modified files. Documentation-only and configuration-only PRs were excluded. Repositories with multiple AGENTS.md files or subdirectory configurations were excluded. The AGENTS.md effect was treated as binary, present or absent, with no analysis of which content properties drive the gains.
Those constraints are appropriate for a first empirical study of this kind. They make the causal isolation credible. They also mean the results cannot be directly projected onto large refactoring tasks, multi-module changes, or agent systems other than Codex without further replication. The authors are explicit that this is part of their ongoing research agenda.
On output quality: the study did not run a full correctness evaluation. Fifty randomly sampled agent outputs were manually inspected to confirm they were non-empty and non-trivial. The authors describe this as a sanity check, not a quality assessment, and flag a full correctness evaluation as out of scope for this paper. Teams considering AGENTS.md as a cost-reduction tool should not read the efficiency results as evidence about output quality in either condition.
One further risk that the paper does not address but practitioners should name: an AGENTS.md file that is outdated, inaccurate, or internally inconsistent may not merely fail to help. It could actively mislead the agent about project structure, encoding stale conventions or incorrect architectural descriptions that the agent then treats as authoritative. The efficiency gains assume the file is maintained. That maintenance is itself a new engineering obligation.
How the Study Was Run
The experimental pipeline is worth understanding because it illustrates the kind of controlled infrastructure required to produce credible agent efficiency measurements.
- The researchers started from a corpus of 132 repositories studied in prior work on agent context files, filtering to 89 repositories with exactly one root-level AGENTS.md.
- Those 89 were filtered further using a local LLM (gpt-oss-120b via Ollama) to classify AGENTS.md content, retaining only repositories where the file covered conventions and best practices, architecture and project structure, or project description. Manual verification followed. This yielded 26 qualifying repositories.
- Ten were randomly sampled. For each, up to 15 merged pull requests were selected, all created and merged after AGENTS.md was introduced in that repository, all modifying code files only, all within the size constraints.
- For each pull request, the repository was restored to its pre-merge state inside a fresh Docker container. If the PR lacked a usable description, a GitHub-issue-style task description was generated using the local LLM.
- The agent ran twice per task: once with AGENTS.md in place, once with it removed. Wall-clock time and four token categories were recorded for each run.
The Docker isolation and per-task fresh cloning are not procedural decoration. They are what makes the comparison valid. Without them, cached state or incidental ordering effects could contaminate the results. The paired within-task design means each pull request serves as its own control.
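Because each pull request is its own control, the natural analysis works on per-task differences, which is what a Wilcoxon signed-rank test does. The sketch below is a textbook version using the normal approximation, with no continuity or tie correction; it is not the authors' analysis code, and the sample values are invented. For real work, a statistics library such as SciPy (`scipy.stats.wilcoxon`) is the safer choice.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Return (W, z, two-sided p) via the normal approximation."""
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Under the null hypothesis, W has this mean and standard deviation.
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, z, p


# Invented wall-clock times in seconds for ten paired tasks (NOT study data).
without_file = [98, 120, 75, 200, 64, 150, 88, 110, 95, 130]
with_file = [70, 101, 80, 140, 60, 118, 90, 85, 71, 99]

w, z, p = wilcoxon_signed_rank(without_file, with_file)
print(f"W = {w}, z = {z:.2f}, p = {p:.4f}")
```

With only ten pairs the normal approximation is rough; it becomes more reasonable at sample sizes like the study's 124 paired tasks.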
What This Looks Like at Scale in a Real Engineering Organisation
Consider a platform engineering team running an AI coding agent across a portfolio of twenty internal services. Each service generates thirty to fifty autonomous pull requests per month as the agent handles routine maintenance, dependency updates, and minor feature work. At 2026 token prices for production-grade coding models, a mean output token reduction of about 20% (the study reports 20.08%) adds up over a month of continuous agent operation.
But the more operationally significant number may be the runtime reduction. A median task that runs in 70 seconds instead of 98 seconds sounds modest in isolation. Across a CI/CD pipeline with concurrent agent tasks, faster completion means fewer blocking waits, tighter feedback loops for developers reviewing agent output, and more headroom to run additional agent passes on the same infrastructure budget. The 28.64% median runtime reduction is not a convenience improvement. It is headroom that compounds with scale.
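The scenario above can be turned into a back-of-envelope estimate. This sketch rests on a loud assumption: it treats the study's medians as the per-task cost in your environment, which they may not be, and which likely understates totals if your run costs are right-skewed. The function name `monthly_savings` is ours, and no token price is assumed.

```python
SERVICES = 20
PRS_PER_SERVICE = (30, 50)  # low and high monthly volume from the scenario

# Medians from the study's results table, used here as a rough per-task proxy.
SECONDS_WITHOUT, SECONDS_WITH = 98.57, 70.34
OUT_TOKENS_WITHOUT, OUT_TOKENS_WITH = 2925, 2440


def monthly_savings(prs_per_service: int) -> tuple[int, float, int]:
    """Return (tasks, agent-hours saved, output tokens saved) per month."""
    tasks = SERVICES * prs_per_service
    hours_saved = tasks * (SECONDS_WITHOUT - SECONDS_WITH) / 3600
    tokens_saved = tasks * (OUT_TOKENS_WITHOUT - OUT_TOKENS_WITH)
    return tasks, hours_saved, tokens_saved


for volume in PRS_PER_SERVICE:
    tasks, hours, tokens = monthly_savings(volume)
    print(f"{tasks} tasks/month: ~{hours:.1f} agent-hours and "
          f"{tokens:,} output tokens saved")
```

Substituting your own measured per-task means, and multiplying the token figure by your actual price per token, gives a number you can defend internally.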
The variability reduction matters too. Standard deviation of wall-clock time dropped from about 182 seconds to about 137 seconds (a reported 24.91% reduction). Standard deviation of output tokens dropped from 6,988 to 5,162 (a 26.13% reduction). More predictable agent runs make capacity planning more tractable and reduce the incidence of runaway tasks that consume disproportionate resources before timing out.
What Executives Running Agent Programs Should Act On
This paper is an empirical benchmark study, not a deployment blueprint. The appropriate strategic response is not a project plan. It is a calibration of assumptions.
First: if your organisation is running AI coding agents and does not have AGENTS.md files (or equivalent) in those repositories, you are absorbing a measurable and avoidable efficiency cost. The intervention required to test this is trivial. Write a markdown file. Commit it to the repository root. Measure what changes. The study provides a controlled baseline; your own repositories will tell you whether the effect holds for your agent system, your codebase size, and your task distribution.
Second: the convergence of OpenAI Codex, GitLab Duo, Claude Code, and GitHub Copilot on repository-level instruction files (AGENTS.md or an equivalent such as CLAUDE.md or copilot-instructions.md) suggests this is becoming a de facto standard rather than a vendor-specific feature. Teams that develop the internal practice of writing and maintaining these files now are building an asset that will compound as agent use expands, not just within Codex workflows but across whichever agent platforms dominate in two years.
Third: the content of the file matters in ways this study does not yet resolve. The researchers selected repositories whose AGENTS.md files covered conventions, architecture, and project description. Whether a file limited to one of those categories, or one that includes security constraints and testing requirements, produces different efficiency profiles is an open empirical question. The paper's finding is that presence versus absence matters. The harder question, which properties of the file drive which outcomes, is the next study.
The deeper implication is about how engineering organisations should think about agent configuration. Prompt engineering is ephemeral: instructions live in individual invocations, are invisible to reviewers, and vanish when the conversation ends. AGENTS.md is version-controlled, reviewable in pull requests, diffable, and collaboratively maintained. It shifts agent guidance from something that lives in an individual developer's mental model to something that lives in the repository alongside the code it describes. That shift has implications for auditability and consistency that extend well beyond the efficiency numbers in this paper.
An AI coding agent that knows your project is faster and cheaper than one that has to figure it out. The interesting question is not whether that is true. This study shows it is. The interesting question is who in your organisation is responsible for making sure the agent knows.
Published by Agents Applied. Research coverage for executives navigating the operational realities of AI deployment.