Your Agents Have Config Rot
Config rot isn't just a server problem anymore. When your AI agent configurations drift silently over weeks, the symptoms look like model hallucination — but the root cause is pure infrastructure neglect.
Config rot is the quiet failure mode that experienced engineers have learned to recognize in production systems. It happens when a configuration file that was intentional and correct at creation time gradually accumulates small, undocumented changes until it no longer reflects technical reality. The config doesn't fail loudly. It fails subtly, in edge cases, in production, on a Friday.
Most developers have experienced config rot in their servers — running a Terraform plan against a long-lived environment and seeing dozens of unexpected diffs, or discovering a security group rule that was quietly removed six months ago.
The same failure mode is now emerging in AI agent configurations. And it's harder to detect because the failure looks like something else entirely.
Why Agent Drift Looks Like Hallucination
When a server's configuration drifts, the server fails. You get a 500, a timeout, or a connection refusal. The signal is clear.
When an AI agent's configuration drifts, the model doesn't fail. It adapts. Claude is a capable model and it will try to be helpful with whatever instructions it currently has.
The core trap: Agent config rot symptoms are blamed on the model, but the cause is in the configuration file.
If the security rules were quietly removed from your code-reviewer agent, Claude will keep performing code reviews — it just won't apply the security criteria anymore. If someone shortened the system prompt by removing the "scope restrictions" paragraph, the model will still review code. It will also start making suggestions in areas it was previously constrained from touching.
A developer notices the agent's output feels inconsistent — sometimes it catches security issues, sometimes it doesn't. They open tickets about model quality. They waste days debugging prompts, not configuration files.
The Three Phases of Configuration Rot
Agent configuration rot typically progresses through three identifiable phases:
| Phase | What happens | Why it's invisible |
|---|---|---|
| 1. Accumulation | Small individual changes — a tool addition here, a description edit there. Git history fills with vague messages like "update agent config" | Each change seems harmless in isolation |
| 2. Incoherence | The accumulated changes produce a configuration nobody specifically chose. Agent has tools that conflict with its rules. Prompt references resources that no longer exist | No single change appears to be the problem |
| 3. Invisible Failure | Configuration is incoherent but still produces output. The developer calibrates expectations to the drifted behavior without realizing it | The drifted state becomes the "baseline" |
The third phase is the most dangerous. Once you've unconsciously accepted the degraded behavior as normal, there's no trigger to investigate. The rot becomes permanent.
Diagnosing Drift with xcaffold status
xcaffold treats agent configurations as compiled artifacts. Your .xcaf source files are the single source of truth, and xcaffold apply compiles them into the provider-native directories (.claude/, .cursor/, .gemini/, .github/, .agents/). The state of every compiled artifact is tracked via SHA-256 hashes in .xcaffold/project.xcaf.state.
xcaffold status compares those recorded hashes against the actual content of the files on disk right now. Any discrepancy — even a single changed character — is flagged as drift.
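The comparison described above can be approximated with standard tools. The sketch below is not xcaffold's implementation and does not read its state file: record_state, check_drift, and the manifest layout are hypothetical names chosen for illustration, and it assumes a sha256sum binary (as shipped with coreutils) is on the PATH.

```shell
# Illustrative only: NOT xcaffold's state format, just the same idea with sha256sum.
# record_state and check_drift are hypothetical helper names.

# Write "<sha256>  <path>" for every file under $1 into the manifest file $2.
# Keep the manifest outside the directory being hashed.
record_state() {
  dir=$1
  manifest=$2
  ( cd "$dir" && find . -type f -exec sha256sum {} + ) > "$manifest"
}

# Re-hash the files listed in the manifest and compare against the recorded values.
# Prints one line per changed or missing file; returns 0 when clean, 1 on drift.
# Files added after record_state ran are not detected by this sketch.
check_drift() {
  dir=$1
  manifest=$2
  if ( cd "$dir" && sha256sum -c ) < "$manifest" 2>/dev/null | grep -v ': OK$'; then
    return 1   # grep printed at least one non-OK entry
  fi
  return 0
}
```

Recording the state right after generating the output, then calling check_drift later, flags a single changed byte or a deleted file, which is the property the status command relies on.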
A clean workspace looks like this:
$ xcaffold status
sandbox · last applied 3 days ago
PROVIDER FILES STATUS
antigravity 28 ✓ synced
claude 90 ✓ synced
copilot 1 ✓ synced
cursor 54 ✓ synced
gemini 55 ✓ synced
Sources 52 .xcaf files · no changes since last apply
✓ All providers are in sync.
Every provider's output matches the compiled state. Nothing has been tampered with since the last xcaffold apply.
Now compare that to a workspace where someone manually edited or deleted files:
$ xcaffold status
sandbox · last applied 3 days ago
PROVIDER FILES STATUS
antigravity 28 ✓ synced
claude 90 ✗ 1 modified
copilot 1 ✓ synced
cursor 54 ✓ synced
gemini 55 ✗ 1 modified
Sources 52 .xcaf files · no changes since last apply
Drift detected in 2 providers:
claude
✗ missing CLAUDE.md (root)
gemini
✗ missing GEMINI.md (root)
→ Run 'xcaffold apply' to restore.
Run 'xcaffold status --target <name>' for details.
Two providers have drifted. The root-level instruction files for Claude and Gemini are missing — perhaps a git clean removed them, or someone deleted them thinking they were unnecessary. Without drift detection, this would surface as a vague degradation in agent behavior across two different tools.
You can drill into a specific provider to see the full picture:
$ xcaffold status --target claude
sandbox · claude · applied 3 days ago
89 synced · 1 modified · 52 sources unchanged
✗ missing CLAUDE.md (root)
Sources 52 .xcaf files · no changes since last apply
→ Run 'xcaffold apply --target claude' to restore.
Run 'xcaffold status --target claude --all' to see all files.
The fix is a single command: xcaffold apply --target claude. The compiled state is restored from the .xcaf sources, and the hash manifest is updated.
What a Healthy Configuration Lifecycle Looks Like
Config rot is not inevitable. It's the result of a process failure: allowing direct modification of artifacts that should be generated, not edited.
With Harness-as-Code, the lifecycle is enforced at the process level:
- All changes flow through .xcaf source files, never directly to .claude/ or .cursor/ files
- xcaffold apply compiles the updated sources into provider-native output directories
- CI runs xcaffold status after the apply step to validate the compiled state
- Pre-commit hooks can block commits that modify files inside managed output directories
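The pre-commit step in the list above could look roughly like the following. This is a hand-rolled sketch, not something xcaffold ships: managed_paths and MANAGED_DIRS are hypothetical names, and .github/ is left out of the list on the assumption that it also holds hand-written files such as CI workflows.

```shell
# Hypothetical helper for .git/hooks/pre-commit; not part of xcaffold.
# Directories treated as generated output (adjust to the providers you compile to).
MANAGED_DIRS=".claude/ .cursor/ .gemini/ .agents/"

# Reads repo-relative paths on stdin and prints those inside a managed directory.
# Returns 0 if at least one such path was found, 1 otherwise.
managed_paths() {
  found=1
  while IFS= read -r path; do
    for dir in $MANAGED_DIRS; do
      case $path in
        "$dir"*) printf '%s\n' "$path"; found=0; break ;;
      esac
    done
  done
  return $found
}

# In the hook itself, feed it the staged file list:
#   if git diff --cached --name-only | managed_paths >&2; then
#     echo "Generated files are staged; edit the .xcaf sources instead." >&2
#     exit 1
#   fi
```

A hook this strict also rejects commits that contain freshly compiled output, so teams that commit the generated directories would need to relax it, for example by allowing the commit when the status check reports a clean state.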
The key insight is that .claude/ should behave like /dist or target/ in a build system — generated output that's never manually edited. The source of truth lives in .xcaf files, and the compiled output is a deterministic artifact of that source.
This separation also makes code review meaningful. When a developer opens a PR, the reviewer can look at the .xcaf diff and understand the intent. Reviewing a raw markdown agent file gives you no baseline — you're reading prose and hoping it looks right.
Config Rot at the Multi-Provider Level
The drift problem compounds when you're compiling to multiple providers. A developer using both Claude Code and Cursor needs aligned agent definitions across .claude/ and .cursor/. Hand-maintaining both directories is error-prone even for a disciplined solo developer — and in practice it never stays clean.
With multi-target compilation, a single .xcaf source generates all provider directories simultaneously. Running xcaffold status confirms all outputs are clean in one pass — every provider is listed in the overview table, and any drift is immediately visible.
If one provider's directory has drifted and the others haven't, you know exactly which target's output was tampered with. The overview table tells you at a glance: Claude has 1 modified file, everything else is synced. You don't need to compare directories manually or write shell scripts to diff file trees.
And because xcaffold status uses exit codes — 0 for clean, 1 for drift — it integrates directly into CI pipelines:
# CI step: fail the build if any provider has drifted
xcaffold status || exit 1
No custom scripting. No regex parsing of output. A single binary check that either passes or fails.
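The one-liner is enough as a gate. If clearer CI logs are wanted, an optional wrapper can separate "drift" from "the check itself could not run". The sketch below assumes only the two exit codes stated above (0 and 1); drift_gate is a hypothetical name, and treating any other code as a tooling failure is an assumption, not documented xcaffold behavior.

```shell
# Hypothetical wrapper: pass it the status command to run, e.g.
#   drift_gate xcaffold status
drift_gate() {
  code=0
  "$@" || code=$?
  case $code in
    0) echo "agent config: in sync" ;;
    1) echo "agent config: drift detected, run 'xcaffold apply' to restore" >&2 ;;
    # Anything else (for instance 127 when the binary is not installed)
    # is reported separately so it is not mistaken for drift.
    *) echo "agent config: status check could not run (exit $code)" >&2 ;;
  esac
  return $code
}
```

The wrapper passes the original exit code through unchanged, so the build still fails on anything other than a clean result.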
The Terraform Analogy, Completed
If you've used Terraform, this workflow is immediately familiar. Running terraform plan shows you what would change, terraform apply makes the change, and the Terraform state file tracks what exists. If someone manually modifies infrastructure outside Terraform, the next plan reveals the drift.
xcaffold follows the same model for agent configurations:
| Terraform | xcaffold |
|---|---|
| .tf files | .xcaf files |
| terraform plan | xcaffold status |
| terraform apply | xcaffold apply |
| terraform.tfstate | .xcaffold/project.xcaf.state |
| Provider resources | .claude/, .cursor/, .gemini/, .github/, .agents/ |
The parallel is not superficial. The same engineering principles that make infrastructure-as-code reliable — declarative sources, compiled artifacts, state tracking, drift detection — apply directly to agent configurations. The only question is whether you adopt them before or after config rot has already degraded your agents.
Config rot is a management problem before it's a tooling problem. You can't prevent people from editing files directly unless the architecture makes that the wrong path to take. xcaffold makes the wrong path obvious by creating a clear separation between source (.xcaf files) and output (.claude/, .cursor/). The status command makes rot immediately visible, and the compilation pipeline makes the correct path — edit the source, run apply — trivially easy.
Ready to try xcaffold? Detect drift in your agent configs today. Run xcaffold status.