Built by Colony · last 30 days

The receipts.

Colony’s own development runs on Colony. What follows is the ledger, the recent merges, three full agent transcripts, and one case that didn’t ship — all pulled from the same pipeline that wrote this page.

Static snapshot · May 13, 2026

| Metric | Value | Scope |
| --- | --- | --- |
| Pull requests merged | 1,284 | across 9 repos |
| Median intake-to-merge | 14m | non-trivial issues |
| Median cost per merged PR | $0.91 | tokens + workers, attributed |
| Lines of TypeScript shipped | 38,421 | net additions, post-review |
| Tests added & green | 3,712 | unit · integration · e2e |

Cost ledger · last 30 days

What it cost. Per agent, per dollar.

Window: Apr 13, 12:00 AM → May 13, 12:00 AM
PRs merged (30d): 1,284
Total spend: $1,142.18

| Agent | Spend (30d) | Share |
| --- | --- | --- |
| Builder | $612.40 | 53.6% |
| Inspector | $201.18 | 17.6% |
| Architect | $118.92 | 10.4% |
| Surveyor | $94.30 | 8.3% |
| Marshal | $61.22 | 5.4% |
| Mayor | $32.18 | 2.8% |
| Sentinel | $21.98 | 1.9% |

Latest day (2026-05-12): 41 PRs, $38.45

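
The share column can be re-derived from the spend figures. The sketch below is one way to check the arithmetic; the `LedgerRow` shape and function names are illustrative assumptions, not Colony's actual ledger code. Amounts are held in integer cents so the sum is exact.

```typescript
// Hypothetical row shape for illustration; amounts are in integer cents.
interface LedgerRow {
  agent: string;
  spendCents: number;
}

// Spend figures copied from the 30-day ledger.
const ledger: LedgerRow[] = [
  { agent: "Builder", spendCents: 61240 },
  { agent: "Inspector", spendCents: 20118 },
  { agent: "Architect", spendCents: 11892 },
  { agent: "Surveyor", spendCents: 9430 },
  { agent: "Marshal", spendCents: 6122 },
  { agent: "Mayor", spendCents: 3218 },
  { agent: "Sentinel", spendCents: 2198 },
];

// Sum of all agent spend, in cents.
function totalCents(rows: LedgerRow[]): number {
  return rows.reduce((sum, row) => sum + row.spendCents, 0);
}

// Share of total spend as a percentage, rounded to one decimal place.
function sharePct(row: LedgerRow, total: number): number {
  return Math.round((row.spendCents * 1000) / total) / 10;
}

const total = totalCents(ledger);
const shares = ledger.map((row) => ({
  agent: row.agent,
  pct: sharePct(row, total),
}));

// Mean cost per merged PR in cents, using the 1,284 merged PRs in the window.
const meanCentsPerPr = total / 1284;
```

The per-agent figures sum to the stated $1,142.18 total, and the mean works out to roughly $0.89 per merged PR, close to the reported $0.91 median.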
Recent merges

Last ten, in order.

Each row is a real merged PR on Colony or Colony Cloud. The state path is the actual route the issue took through the pipeline.

| PR | Repo | Agent | Cost | Minutes | Review |
| --- | --- | --- | --- | --- | --- |
| Add pagination to cursor endpoint | RunColony/colony | Builder | $1.84 | 12 | auto |
| Tighten OAuth scope on installation routes | RunColony/colony-cloud | Inspector | $0.42 | 7 | human ✓ |
| Document swarm pattern API options | RunColony/colony | Surveyor | $0.51 | 9 | auto |
| Migrate session store to typed wrapper | RunColony/colony-cloud | Architect | $0.88 | 15 | auto |
| Retry policy for flaky webhook delivery | RunColony/colony-cloud | Sentinel | $0.28 | 6 | auto |
| Surface cost-cap escalations in dashboard timeline | RunColony/colony-cloud | Builder | $1.12 | 14 | human ✓ |
| Fix race in agent-pool scaling decisions | RunColony/colony | Builder | $2.06 | 19 | auto |
| Add Postgres index for state-transition lookups | RunColony/colony | Architect | $0.71 | 10 | auto |
| Lower per-issue cost cap default for new tenants | RunColony/colony-cloud | Inspector | $0.34 | 5 | auto |
| Cache GitHub installation tokens per app | RunColony/colony-cloud | Builder | $0.96 | 13 | auto |

Three transcripts

Your engineering standards survive autonomy.

Each card is a real merged PR with its full agent timeline: states traversed, who did what, evidence excerpts, and cost per agent. Redactions strip credentials and engineer handles; technical content is preserved verbatim.

transcript-2843 Self-improvement: feature discovery and development
RunColony/colony · 1992m · $3.15
State path: unlabeled → new → analyzing → needs-clarification → ready-for-dev → in-development → in-review → human-review-ready → merged
  1. Surveyor · state: analyzing · action: analyze · $1.49
  2. Builder · state: in-development · action: develop · $0.41
  3. Builder · state: in-development · action: develop · $0.77
  4. Inspector · state: in-review · action: review · $0.48

Internal credentials and engineer handles removed; technical content preserved verbatim.

transcript-2801 fix(pipeline): defensive guard around inspector retry loop
RunColony/colony · 41m · $1.32
State path: unlabeled → ready-for-dev → in-development → in-review → human-review-ready → merged
  1. Builder · state: in-development · action: develop · $0.61
  2. Builder · state: in-review · action: develop · $0.33
  3. Inspector · state: in-review · action: review · $0.38

transcript-2762 feat(pipeline-store): migration — add idempotency_key and processing status to vcs_projections
RunColony/colony · 32m · $1.97
State path: unlabeled → ready-for-dev → in-development → in-review → human-review-ready → merged
  1. Builder · state: in-development · action: develop · $0.44
  2. Builder · state: in-review · action: develop · $1.09
  3. Inspector · state: in-review · action: review · $0.43

Failure modes

We did the failing. The catalogue is the record.

Autonomous pipelines have a long tail of failure modes. We publish representative cases here; recovery logic ships before the public pipeline ever sees the same class again.

Trust boundary

Fabricated evidence

Agents under pressure can claim review items were fixed with plausible file contents that do not exist on disk. Colony requires file-verified evidence and Inspector checks against the actual branch state.
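
A minimal sketch of what file-verified evidence could look like, assuming each claim names a file and an excerpt said to be in it. `EvidenceClaim`, `ReadBranchFile`, and `verifyEvidence` are hypothetical names for illustration, not Colony's API. The reader function is injected so the check runs against whatever the branch actually contains.

```typescript
// Hypothetical shapes; none of these names come from Colony's codebase.
interface EvidenceClaim {
  path: string;    // file the agent says it changed
  excerpt: string; // content the agent says is now in that file
}

// Reads a file from the checked-out branch; undefined means it does not exist.
type ReadBranchFile = (path: string) => string | undefined;

interface Rejection {
  claim: EvidenceClaim;
  reason: "file-missing" | "empty-excerpt" | "excerpt-not-found";
}

interface VerificationResult {
  verified: EvidenceClaim[];
  rejected: Rejection[];
}

// Accept a claim only if the file exists and really contains the excerpt.
function verifyEvidence(
  claims: EvidenceClaim[],
  readFile: ReadBranchFile,
): VerificationResult {
  const verified: EvidenceClaim[] = [];
  const rejected: Rejection[] = [];
  for (const claim of claims) {
    const actual = readFile(claim.path);
    if (actual === undefined) {
      rejected.push({ claim, reason: "file-missing" });
    } else if (claim.excerpt.length === 0) {
      // An empty excerpt would match any file, so it proves nothing.
      rejected.push({ claim, reason: "empty-excerpt" });
    } else if (!actual.includes(claim.excerpt)) {
      rejected.push({ claim, reason: "excerpt-not-found" });
    } else {
      verified.push(claim);
    }
  }
  return { verified, rejected };
}
```

In a real pipeline the reader would presumably wrap a filesystem or git read on the PR branch; an in-memory `Map` is enough to exercise the logic.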

Review loop

Unparseable LLM output

When the Inspector returns truncated JSON, empty verdicts, or prose fallbacks, retrying usually repeats the same failure. Colony retries once, then escalates instead of burning cycles.
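
The retry-once-then-escalate rule can be sketched as follows. The verdict schema (`verdict`, `items`) and all function names are assumptions made for the example, and the Inspector call is modeled as a synchronous function for brevity where a real call would be asynchronous.

```typescript
// Hypothetical verdict schema; the real Inspector output format is not shown here.
interface Verdict {
  verdict: "approve" | "request-changes";
  items: string[];
}

type Outcome =
  | { kind: "verdict"; verdict: Verdict; attempts: number }
  | { kind: "escalate"; reason: string; attempts: number };

// Returns null for truncated JSON, prose, empty output, or a missing verdict.
function parseVerdict(raw: string): Verdict | null {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return null;
    const rec = data as Record<string, unknown>;
    const v = rec["verdict"];
    const verdict =
      v === "approve" ? "approve" : v === "request-changes" ? "request-changes" : null;
    if (verdict === null) return null;
    const rawItems = rec["items"];
    const items = Array.isArray(rawItems)
      ? rawItems.filter((i): i is string => typeof i === "string")
      : [];
    return { verdict, items };
  } catch {
    return null;
  }
}

// Two attempts in total (the first call plus one retry), then hand off.
function reviewWithOneRetry(callInspector: () => string): Outcome {
  const maxAttempts = 2;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const parsed = parseVerdict(callInspector());
    if (parsed !== null) {
      return { kind: "verdict", verdict: parsed, attempts: attempt };
    }
  }
  return {
    kind: "escalate",
    reason: "unparseable inspector output",
    attempts: maxAttempts,
  };
}
```

Capping attempts at two keeps the cost of a bad output bounded: a second identical failure goes to a human instead of a third call.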

Termination

Looping failure paths

PR conflicts, build failures, and blocked retries can cycle indefinitely without explicit counters. Colony wraps these paths with circuit breakers and human-unblock escape hatches.
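
One way to picture the explicit counters is a small per-issue, per-path breaker. This is a sketch under assumptions: the path names mirror the three failure paths named above, while `FailureBreaker`, its methods, and the threshold are hypothetical.

```typescript
// Failure paths named in the text; the identifiers themselves are illustrative.
type FailurePath = "pr-conflict" | "build-failure" | "blocked-retry";
type BreakerDecision = "retry" | "escalate-to-human";

class FailureBreaker {
  private readonly counts = new Map<string, number>();
  private readonly maxFailures: number;

  constructor(maxFailures: number) {
    this.maxFailures = maxFailures;
  }

  // Count one failure; once the threshold is reached the path stays tripped.
  recordFailure(issueId: number, path: FailurePath): BreakerDecision {
    const key = `${issueId}:${path}`;
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    return count >= this.maxFailures ? "escalate-to-human" : "retry";
  }

  // Escape hatch: a human clears the counter so the pipeline may try again.
  humanUnblock(issueId: number, path: FailurePath): void {
    this.counts.delete(`${issueId}:${path}`);
  }
}
```

Keying the counter by issue and path means a build failure on one issue never consumes the retry budget of another, and a tripped breaker stays tripped until someone explicitly resets it.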

Subprocess

Silent hangs

Claude Code subprocesses can freeze with no stdout or stderr. Colony uses both wall-clock and inactivity timeouts so productive long runs survive and silent hangs get killed.
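
The two-timeout idea can be sketched as a watchdog with an injected clock. `SubprocessWatchdog` and its verdict strings are hypothetical; wiring `noteActivity` to a subprocess's stdout and stderr events, and actually killing the process, are left out.

```typescript
type WatchdogVerdict = "ok" | "kill:inactive" | "kill:wall-clock";

// Times are plain millisecond timestamps passed in by the caller, so the
// logic can be exercised without real timers or a real subprocess.
class SubprocessWatchdog {
  private readonly startedAt: number;
  private readonly wallClockMs: number;
  private readonly inactivityMs: number;
  private lastActivityAt: number;

  constructor(now: number, wallClockMs: number, inactivityMs: number) {
    this.startedAt = now;
    this.lastActivityAt = now;
    this.wallClockMs = wallClockMs;
    this.inactivityMs = inactivityMs;
  }

  // Call whenever the subprocess writes to stdout or stderr.
  noteActivity(now: number): void {
    this.lastActivityAt = now;
  }

  // Call periodically; anything other than "ok" means the run should be killed.
  check(now: number): WatchdogVerdict {
    if (now - this.startedAt >= this.wallClockMs) return "kill:wall-clock";
    if (now - this.lastActivityAt >= this.inactivityMs) return "kill:inactive";
    return "ok";
  }
}
```

With a long wall-clock limit and a short inactivity limit, a run that keeps producing output survives until the hard cap, while one that goes silent is stopped after the inactivity window.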

Workspace

Shared git state

Workers sharing a host-mounted .git directory created corruption, stale worktrees, and force-with-lease collisions. Colony moved toward per-worker clone isolation.

State

Eventually consistent labels

GitHub labels are not a reliable distributed state machine. Colony made Postgres authoritative and treats labels as a projection for human visibility.
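
A sketch of the authoritative-store-plus-projection split, under stated assumptions: the in-memory `Map` stands in for a Postgres table, the transition table includes only the transitions visible in the transcript state paths on this page (the real pipeline presumably allows more), and the `colony:` label prefix is invented for the example.

```typescript
type State =
  | "unlabeled" | "new" | "analyzing" | "needs-clarification"
  | "ready-for-dev" | "in-development" | "in-review"
  | "human-review-ready" | "merged";

// Only transitions observed in the transcript state paths; not exhaustive.
const ALLOWED: Record<State, State[]> = {
  "unlabeled": ["new", "ready-for-dev"],
  "new": ["analyzing"],
  "analyzing": ["needs-clarification"],
  "needs-clarification": ["ready-for-dev"],
  "ready-for-dev": ["in-development"],
  "in-development": ["in-review"],
  "in-review": ["human-review-ready"],
  "human-review-ready": ["merged"],
  "merged": [],
};

// In-memory stand-in for the authoritative table.
class StateStore {
  private readonly states = new Map<number, State>();

  get(issueId: number): State {
    return this.states.get(issueId) ?? "unlabeled";
  }

  // The store, not the labels, decides whether a transition is legal.
  transition(issueId: number, to: State): void {
    const from = this.get(issueId);
    if (!ALLOWED[from].includes(to)) {
      throw new Error(`illegal transition ${from} -> ${to}`);
    }
    this.states.set(issueId, to);
  }
}

const LABEL_PREFIX = "colony:"; // hypothetical label naming scheme

// One-way projection: compute the label changes that make GitHub match the store.
function projectLabels(
  state: State,
  currentLabels: string[],
): { add: string[]; remove: string[] } {
  const wanted = LABEL_PREFIX + state;
  const remove = currentLabels.filter(
    (label) => label.startsWith(LABEL_PREFIX) && label !== wanted,
  );
  const add = currentLabels.includes(wanted) ? [] : [wanted];
  return { add, remove };
}
```

Labels are only ever written from the store, never read back into it, so a stale or hand-edited label changes what a human sees but not what the pipeline does next.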

Bring a real repo. We’ll show you the same evidence trail on your work.

Pilot in two weeks. Or fork the OSS and run it yourself.