A live exploration of agent-first development — based on a real Go repo, a real runner, and real failure modes. Nothing is settled yet.
01 The three-repo topology: TASK, TARGET, LOG
02 What kind of agent are you actually using?
03 The adversarial problem: two agents, one PR
04 Context window: what do you put in?
05 Web search is a liability, or is it?
06 Subagents: isolated windows, ephemeral state
07 Quick-fire: test yourself
01 — topology
Three Repos. One Sandbox Account.
task repo · control
spec-socialpredict-tasks
/ (orchestration)
├── AGENTS.md
├── TASKS.json
├── codex-runner.sh
├── defaults.env
├── .codex/
│ └── agents, profiles
├── .codex-runs/
│ ├── events/*.ndjson
│ ├── messages/*.txt
│ ├── context/*.ndjson
│ └── RUNLOG.ndjson
└── scripts/
Runner polls TASKS.json. Picks next ready task by dependency order. Dispatches a Codex session. Monitors context every 15s. Checkpoints at 70%, resumes with session ID. Writes all artifacts here.
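The "next ready task by dependency order" step can be sketched roughly as follows. This is a minimal illustration, not the runner's actual logic: the field names `id`, `status`, and `depends_on`, the `done` status value, and the task IDs `SP-002` and `SP-003` are all assumptions, since the real TASKS.json schema is not shown here.

```python
from typing import Optional


def next_ready_task(tasks: list) -> Optional[dict]:
    """Return the first pending task whose dependencies are all done.

    Assumed (hypothetical) task shape:
    {"id": str, "status": "pending" | "done", "depends_on": [str, ...]}
    """
    done = {t["id"] for t in tasks if t["status"] == "done"}
    for task in tasks:
        if task["status"] != "pending":
            continue
        # A task is ready only when every dependency has completed.
        if all(dep in done for dep in task.get("depends_on", [])):
            return task
    return None


# Hypothetical contents of TASKS.json after parsing.
tasks = [
    {"id": "SP-001", "status": "done", "depends_on": []},
    {"id": "SP-002", "status": "pending", "depends_on": ["SP-001"]},
    {"id": "SP-003", "status": "pending", "depends_on": ["SP-002"]},
]

print(next_ready_task(tasks)["id"])  # SP-002
```

SP-003 is skipped because SP-002 has not finished. A real runner would also need to handle tasks that are currently in progress, which this sketch ignores.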
target repo · fork
socialpredict
/ (fork of real repo)
├── backend/
│ └── scripts/
│ └── guardrails.sh
├── .codex-reports/
│ └── tasks/
│ └── SP-001/
│ ├── meta.json
│ ├── summary.json
│ ├── conversation.ndjson
│ └── decisions.ndjson
└── AGENTS.md
Code changes happen here. Agent opens PRs against this fork. Guardrails run pre-commit. Reports written to .codex-reports/. Human reviews and either merges or closes entirely.
sandbox account
pwdel-auto
Separate GitHub identity. Not your main account.
All automated commits, PRs, pushes, and reviews come from this identity only.
Naming pattern:
username-auto
username-harness
username-bot
Forks the real repo here.
The human in the loop does exactly one thing: PR review. You see the diff, then you either approve it or close it completely. Closing means the work is thrown out: no partial merges, no "fix it later." The task goes back to pending.
02 — agent types
What Kind of Agent Are You Actually Using?
type 01
Gate
Makes or blocks a decision. Pass/fail criteria layered on top of policy. Blocker / high / follow-up severity levels. Work cannot proceed until gates clear.
require 2 reviews when backend/ changes
type 02
Policy
Follows explicit rules. Naming patterns, required artifacts, concrete triggers. Good for consistency and repo-specific discipline. Predictable but rigid.
all handlers must have OpenAPI annotations
type 03
Heuristic
Uses good judgement. Leaves decisions to the LLM. Flexible where rules are hard to specify, but can drift, hallucinate preferences, or conflict silently with policy agents.
prefer idiomatic Go patterns where applicable
In the socialpredict harness, many agents blend all three orientations. The interesting question is which type is dominant and whether that matches your intent.
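To make the gate idea concrete, the example rule "require 2 reviews when backend/ changes" could be expressed as a small check like the one below. This is only a sketch: the function names, the default of one review for other paths, and the file paths are assumptions, not part of the harness.

```python
def required_reviews(changed_files: list) -> int:
    """Rule from the gate card: changes under backend/ need two reviews.

    The default of one review for everything else is an assumption.
    """
    touches_backend = any(path.startswith("backend/") for path in changed_files)
    return 2 if touches_backend else 1


def gate_passes(changed_files: list, approvals: int) -> bool:
    """A gate either clears or blocks; there is no partial result."""
    return approvals >= required_reviews(changed_files)


# Hypothetical PRs: a backend change with one approval is blocked.
print(gate_passes(["backend/scripts/guardrails.sh"], approvals=1))  # False
print(gate_passes(["backend/scripts/guardrails.sh"], approvals=2))  # True
print(gate_passes(["AGENTS.md"], approvals=1))                      # True
```

A heuristic instruction such as "prefer idiomatic Go patterns where applicable" has no equivalent of this function, which is what makes it flexible and also hard to audit.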
03 — the adversarial problem
Two Agents. Same PR. No Coordination.
policy + gate · non-flexible
Go Best Practices Agent
cyclomatic complexity: 12 → FAIL (max 10)
go vet: PASS
function length: 87 lines → FAIL (max 60)
SOLID: interface segregation OK
heuristic · style-oriented
Go Style Guide Agent
readability: clear and idiomatic
naming conventions: OK
complexity: "acceptable for this domain"
error handling: idiomatic
Both agents reviewed the same function. The Go Best Practices Agent flags cyclomatic complexity as a hard blocker. The Go Style Guide Agent calls it acceptable for the domain context. Neither agent knows the other exists. How do you resolve this?
The human has to choose how to resolve this conflict. No option is obviously correct, and that is the point.
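One thing a harness can do without picking a winner is surface the disagreement explicitly, so the reviewer sees it rather than discovering it in two separate reports. The sketch below assumes both agents' findings can be normalized to a shared check name and a pass/fail verdict. That normalization, the `Finding` type, and the agent identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    agent: str    # hypothetical agent identifier
    check: str    # normalized check name shared across agents (assumption)
    verdict: str  # "pass" or "fail"


def find_conflicts(findings: list) -> dict:
    """Group findings by check and keep only checks with mixed verdicts."""
    by_check = {}
    for finding in findings:
        by_check.setdefault(finding.check, []).append(finding)
    return {
        check: group
        for check, group in by_check.items()
        if len({f.verdict for f in group}) > 1
    }


findings = [
    Finding("go-best-practices", "complexity", "fail"),  # 12 > max 10
    Finding("go-best-practices", "go vet", "pass"),
    Finding("go-style-guide", "complexity", "pass"),     # "acceptable for this domain"
    Finding("go-style-guide", "naming", "pass"),
]

conflicts = find_conflicts(findings)
print(sorted(conflicts))  # ['complexity']
```

This only detects the conflict. Deciding whether the hard limit or the domain judgement wins remains a human call.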
04 — context window
The Context Budget. What Do You Put In?
[Interactive sliders, initial state: total context used 0%; mode "balanced"; pre-compaction frequency from "never" to "aggressive"; context saved 0%; reasoning quality 100%]
soft-threshold: 60% → wrap up
hard-threshold: 70% → SIGTERM
→ session_id stored → resume
context poll: every 15s
Different inputs compete for the same limited window. In the real runner, once context use crosses 70% the session is interrupted and resumed from a checkpoint.
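The two thresholds above amount to a three-way decision on each poll. A minimal sketch, assuming the runner can read used and total token counts for the session (the function name, the action labels, and the use of `>=` at the boundaries are assumptions):

```python
SOFT_THRESHOLD = 0.60  # ask the agent to wrap up
HARD_THRESHOLD = 0.70  # interrupt the session and resume from a checkpoint


def context_action(used_tokens: int, window_tokens: int) -> str:
    """Classify one context poll as 'continue', 'wrap_up', or 'interrupt'."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    usage = used_tokens / window_tokens
    if usage >= HARD_THRESHOLD:
        return "interrupt"
    if usage >= SOFT_THRESHOLD:
        return "wrap_up"
    return "continue"


print(context_action(50_000, 100_000))  # continue
print(context_action(65_000, 100_000))  # wrap_up
print(context_action(72_000, 100_000))  # interrupt
```

In the flow described above, the "interrupt" branch is where the runner would send SIGTERM and store the session ID for the resume. The sketch covers only the decision, not the signalling.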
05 — information sources
Web Search Is A Liability.
Or a tool. It depends entirely on whether you control what goes in.
Open web search under an agent framework introduces non-determinism — results change run-to-run, content can be adversarial, and sources lie. In this harness: web search is off by default. Anything that goes in must be on an explicit allowlist. For Go style rules: scrape the official guide once, verify it, version it in a knowledge base, and never touch the open web again.
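An explicit allowlist can be as simple as an exact-match check on scheme and host before any fetch is permitted. The hosts below are placeholders chosen for illustration. The harness's real allowlist is not shown in this document.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; real entries would be reviewed and versioned.
ALLOWED_HOSTS = {"go.dev", "pkg.go.dev"}


def is_allowed(url: str) -> bool:
    """Permit only HTTPS URLs whose host exactly matches the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


print(is_allowed("https://go.dev/doc/effective_go"))   # True
print(is_allowed("https://go.dev.evil.example/doc"))   # False
print(is_allowed("http://go.dev/doc/effective_go"))    # False
```

Matching the full hostname rather than a prefix or substring is deliberate: a lookalike domain that merely starts with an allowed name is rejected.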
06 — subagents and parallel context windows
Every Agent Gets Its Own Window.
Each Codex or Claude agent runs in an isolated environment with its own full context budget. Spawning subagents doesn't share your main window — it opens new ones. When an agent shuts down, its window is cleared. So how do you keep state alive?
main dispatcher
dispatcher_agent
Context: task prompt + AGENTS.md + summary.json
window: 45%
specialist A
go-lint agent
Fresh window. Loads only what it needs.
window: 30%
specialist B
test-runner agent
Separate process. Own token budget.
window: 55%
These three agents run in parallel. None share state. Each closes and clears when done.
how to keep an agent alive / persist state
approach 01
Run persistently
Host as a long-running process. Supervisor restarts on crash.
approach 02 · recommended
Serialize state externally
Checkpoint to storage. Reload on restart with summarization.
approach 03
Retrieval-augmented rehydration
Embeddings in a vector DB. Fetch only what's relevant.
approach 04
Keep-alive / heartbeat
Periodic pings prevent idle shutdown on supported platforms.
The key insight: context windows are ephemeral by default. Any state you want to survive shutdown must be explicitly serialized somewhere outside the agent process.
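As a rough illustration of the "serialize state externally" approach, the sketch below writes a checkpoint to a JSON file and reloads it on restart. The file name, the state fields, and the session ID value are hypothetical, and the summarization step mentioned in the strategy is not covered.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state to a temp file, then swap it into place.

    os.replace is atomic on the same filesystem, so a crash mid-write
    does not leave a half-written checkpoint at `path`.
    """
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    os.replace(tmp, path)


def load_checkpoint(path: Path) -> Optional[dict]:
    """Return the saved state, or None if no checkpoint exists yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text())


# Usage: the state outlives the process that wrote it.
with tempfile.TemporaryDirectory() as workdir:
    checkpoint = Path(workdir) / "checkpoint.json"  # hypothetical file name
    save_checkpoint(checkpoint, {"task_id": "SP-001", "session_id": "example-session"})
    restored = load_checkpoint(checkpoint)
    print(restored["task_id"])  # SP-001
```

Whatever the storage backend, the pattern is the same: the agent process treats its own memory as disposable and reads its starting state from outside.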