Built by Colony · last 30 days

The receipts.

Colony’s own development runs on Colony. What follows is the ledger, the recent merges, three full agent transcripts, and one case that didn’t ship — all pulled from the same pipeline that wrote this page.

Static snapshot · May 13, 2026 Verify the source ↗

Metric	Value	Scope
Pull requests merged	1,284	across 9 repos
Median intake-to-merge	14m	non-trivial issues
Median cost per merged PR	$0.91	tokens + workers, attributed
Lines of TypeScript shipped	38,421	net additions, post-review
Tests added & green	3,712	unit · integration · e2e

Cost ledger · last 30 days

What it cost. Per agent, per dollar.

Window Apr 13, 12:00 AM → May 13, 12:00 AM

PRs merged (30d) 1,284

Total spend $1,142.18

Agent	Spend (30d)	Share
Builder	$612.40	53.6%
Inspector	$201.18	17.6%
Architect	$118.92	10.4%
Surveyor	$94.30	8.3%
Marshal	$61.22	5.4%
Mayor	$32.18	2.8%
Sentinel	$21.98	1.9%
Latest day (2026-05-12) — 41 PRs		$38.45
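
The Share column is each agent's spend divided by total spend for the window. The sketch below reproduces that arithmetic from the figures in the table above. The `LedgerRow` shape and the function names are assumptions made for illustration; they are not taken from Colony's ledger code.

```typescript
// Hypothetical row shape for illustration; not Colony's actual ledger schema.
interface LedgerRow {
  agent: string;
  spendUsd: number;
}

// Figures copied from the 30-day ledger above.
const ledger: LedgerRow[] = [
  { agent: "Builder", spendUsd: 612.4 },
  { agent: "Inspector", spendUsd: 201.18 },
  { agent: "Architect", spendUsd: 118.92 },
  { agent: "Surveyor", spendUsd: 94.3 },
  { agent: "Marshal", spendUsd: 61.22 },
  { agent: "Mayor", spendUsd: 32.18 },
  { agent: "Sentinel", spendUsd: 21.98 },
];

// Sum in integer cents so the total does not pick up floating-point drift.
function totalCents(rows: LedgerRow[]): number {
  return rows.reduce((sum, r) => sum + Math.round(r.spendUsd * 100), 0);
}

// Share of total spend as a percentage, rounded to one decimal place.
function sharePercent(row: LedgerRow, rows: LedgerRow[]): number {
  const share = Math.round(row.spendUsd * 100) / totalCents(rows);
  return Math.round(share * 1000) / 10;
}
```

Summing the rows this way gives 114,218 cents, which matches the $1,142.18 total spend reported above.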

Recent merges

Last ten, in order.

Each row is a real merged PR on Colony or Colony Cloud. The state path is the actual route the issue took through the pipeline.

PR	Repo	Agent	Cost	Min	Review
Add pagination to cursor endpoint	RunColony/colony	Builder	$1.84		auto
Tighten OAuth scope on installation routes	RunColony/colony-cloud	Inspector	$0.42		human ✓
Document swarm pattern API options	RunColony/colony	Surveyor	$0.51		auto
Migrate session store to typed wrapper	RunColony/colony-cloud	Architect	$0.88		auto
Retry policy for flaky webhook delivery	RunColony/colony-cloud	Sentinel	$0.28		auto
Surface cost-cap escalations in dashboard timeline	RunColony/colony-cloud	Builder	$1.12		human ✓
Fix race in agent-pool scaling decisions	RunColony/colony	Builder	$2.06		auto
Add Postgres index for state-transition lookups	RunColony/colony	Architect	$0.71		auto
Lower per-issue cost cap default for new tenants	RunColony/colony-cloud	Inspector	$0.34		auto
Cache GitHub installation tokens per app	RunColony/colony-cloud	Builder	$0.96		auto

Three transcripts

Your engineering standards survive autonomy.

Each card is a real merged PR. The full agent timeline covers the states traversed, who did what, evidence excerpts, and cost per agent. Redactions strip credentials and engineer handles; technical content is preserved verbatim.

transcript-2843 Self-improvement: feature discovery and development

RunColony/colony · 1992m · $3.15

State path unlabeled → new → analyzing → needs-clarification → ready-for-dev → in-development → in-review → human-review-ready → merged

Agent	State	Cost	Time	Action
Surveyor	analyzing	$1.49	Apr 24, 07:45 AM	analyze
Builder	in-development	$0.41	Apr 25, 03:25 PM	develop
Builder	in-development	$0.77	Apr 25, 03:31 PM	develop
Inspector	in-review	$0.48	Apr 25, 03:36 PM	review

Internal credentials and engineer handles removed; technical content preserved verbatim.

transcript-2801 fix(pipeline): defensive guard around inspector retry loop

RunColony/colony · 41m · $1.32

State path unlabeled → ready-for-dev → in-development → in-review → human-review-ready → merged

Agent	State	Cost	Time	Action
Builder	in-development	$0.61	Apr 24, 05:31 PM	develop
Builder	in-review	$0.33	Apr 24, 05:48 PM	develop
Inspector	in-review	$0.38	Apr 24, 06:02 PM	review


transcript-2762 feat(pipeline-store): migration — add idempotency_key and processing status to vcs_projections

RunColony/colony · 32m · $1.97

State path unlabeled → ready-for-dev → in-development → in-review → human-review-ready → merged

Agent	State	Cost	Time	Action
Builder	in-development	$0.44	Apr 24, 06:33 AM	develop
Builder	in-review	$1.09	Apr 24, 06:43 AM	develop
Inspector	in-review	$0.43	Apr 24, 06:49 AM	review


Failure modes

We did the failing. The catalogue is the record.

Autonomous pipelines have a long tail of failure modes. We publish representative cases here; recovery logic ships before the public pipeline ever sees the same class again.

One that didn’t ship · review-loop-exhaustion $4.22 · 87m · review_required_human

Migrate legacy auth middleware to new policy engine

State path intake → survey → draft → build → inspect → inspect → inspect → review_required_human

The Inspector flagged subtle integration-test regressions across three sequential builds. Recovery logic now caps Inspector iterations at 5 and escalates to an operator with a diff summary, instead of looping indefinitely on the same class of regression. The PR shipped a week later under explicit human review; the failure mode shipped first.
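
The iteration cap described above can be sketched as a bounded build-and-inspect loop. This is a minimal illustration, not Colony's recovery code: the `ReviewVerdict` and `ReviewOutcome` types, the `build` and `inspect` callbacks, and the summary format are all assumptions. Only the cap of 5 and the `review_required_human` outcome come from the text.

```typescript
// Hypothetical types for illustration; names are not from Colony's codebase.
interface ReviewVerdict {
  approved: boolean;
  findings: string[];
}

type ReviewOutcome =
  | { status: "approved"; iterations: number }
  | { status: "review_required_human"; iterations: number; summary: string };

const MAX_INSPECTOR_ITERATIONS = 5; // cap stated in the text above

function runReviewLoop(
  build: (iteration: number) => string, // returns the diff for this attempt
  inspect: (diff: string) => ReviewVerdict,
  maxIterations: number = MAX_INSPECTOR_ITERATIONS,
): ReviewOutcome {
  let lastDiff = "";
  let lastFindings: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    lastDiff = build(i);
    const verdict = inspect(lastDiff);
    if (verdict.approved) return { status: "approved", iterations: i };
    lastFindings = verdict.findings;
  }
  // Cap reached: hand off to an operator with a summary instead of looping again.
  const summary =
    `Unresolved after ${maxIterations} inspections: ${lastFindings.join("; ")}\n` +
    lastDiff;
  return { status: "review_required_human", iterations: maxIterations, summary };
}
```

Because the loop bound is a plain counter, the worst-case cost of a stuck review is fixed in advance rather than discovered on the invoice.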

Trust boundary

Fabricated evidence

Agents under pressure can claim review items were fixed and back the claim with plausible file contents that do not exist on disk. Colony requires file-verified evidence, and the Inspector checks each claim against the actual branch state.
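
One way to picture file-verified evidence: every claim names a file and an excerpt, and the checker reads the file from the branch itself instead of trusting the agent's quote. The sketch below assumes a hypothetical `EvidenceClaim` shape and an injected `readBranchFile` function standing in for whatever reads the real branch state; neither is Colony's actual interface.

```typescript
// Hypothetical shapes for illustration; not Colony's evidence format.
interface EvidenceClaim {
  path: string;
  excerpt: string; // text the agent says is now in the file
}

// Returns the file contents on the branch, or undefined if the file does not exist.
type ReadBranchFile = (path: string) => string | undefined;

function verifyEvidence(claims: EvidenceClaim[], readBranchFile: ReadBranchFile): string[] {
  const problems: string[] = [];
  for (const claim of claims) {
    const actual = readBranchFile(claim.path);
    if (actual === undefined) {
      problems.push(`${claim.path}: file not found on branch`);
    } else if (!actual.includes(claim.excerpt)) {
      problems.push(`${claim.path}: claimed excerpt not present`);
    }
  }
  return problems; // an empty array means every claim checked out
}
```

The agent's own description of the file never enters the comparison; only what the reader returns counts.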

Review loop

Unparseable LLM output

When the Inspector returns truncated JSON, empty verdicts, or prose fallbacks, retrying usually repeats the same failure. Colony retries once, then escalates instead of burning cycles.
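
The retry-once policy can be sketched as strict parsing plus a two-attempt bound. The verdict shape (`approved`, `findings`) and the `callInspector` callback are assumptions for illustration; the text only specifies one retry followed by escalation.

```typescript
// Hypothetical verdict shape; the real Inspector schema is not shown on this page.
interface InspectorVerdict {
  approved: boolean;
  findings: string[];
}

type VerdictResult =
  | { kind: "verdict"; verdict: InspectorVerdict; attempts: number }
  | { kind: "escalate"; attempts: number; reason: string };

// Strict parse: anything that is not a complete verdict object is rejected.
function parseVerdict(raw: string): InspectorVerdict | undefined {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return undefined; // truncated JSON, empty string, or a prose fallback
  }
  if (typeof value !== "object" || value === null) return undefined;
  const record = value as Record<string, unknown>;
  const approved = record["approved"];
  const findings = record["findings"];
  if (typeof approved !== "boolean" || !Array.isArray(findings)) return undefined;
  if (!findings.every((f) => typeof f === "string")) return undefined;
  return { approved, findings: findings as string[] };
}

function getVerdict(callInspector: () => string): VerdictResult {
  const maxAttempts = 2; // the first call plus one retry
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const verdict = parseVerdict(callInspector());
    if (verdict) return { kind: "verdict", verdict, attempts: attempt };
  }
  return { kind: "escalate", attempts: maxAttempts, reason: "unparseable inspector output" };
}
```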

Termination

Looping failure paths

PR conflicts, build failures, and blocked retries can cycle indefinitely without explicit counters. Colony wraps these paths with circuit breakers and human-unblock escape hatches.
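
A circuit breaker of this kind amounts to an explicit counter per issue and failure path, plus a reset that only a human triggers. The sketch below is an assumption-laden illustration: the `FailureBreaker` class, the `FailurePath` names, and any threshold passed to it are hypothetical, not Colony's implementation.

```typescript
// Hypothetical failure-path names, taken loosely from the list in the text above.
type FailurePath = "pr-conflict" | "build-failure" | "blocked-retry";

class FailureBreaker {
  private readonly counts = new Map<string, number>();
  private readonly maxFailures: number;

  constructor(maxFailures: number) {
    this.maxFailures = maxFailures;
  }

  private key(issueId: string, path: FailurePath): string {
    return `${issueId}:${path}`;
  }

  // Records one failure. Returns true while another automatic retry is allowed.
  recordFailure(issueId: string, path: FailurePath): boolean {
    const k = this.key(issueId, path);
    const next = (this.counts.get(k) ?? 0) + 1;
    this.counts.set(k, next);
    return next < this.maxFailures;
  }

  // An open breaker means automation must stop and wait for a human.
  isOpen(issueId: string, path: FailurePath): boolean {
    return (this.counts.get(this.key(issueId, path)) ?? 0) >= this.maxFailures;
  }

  // Escape hatch: a human clears the counter so the pipeline may try again.
  humanUnblock(issueId: string, path: FailurePath): void {
    this.counts.delete(this.key(issueId, path));
  }
}
```

Counters are keyed per path, so a run of build failures does not consume the retry budget for PR conflicts on the same issue.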

Subprocess

Silent hangs

Claude Code subprocesses can freeze with no stdout or stderr. Colony uses both wall-clock and inactivity timeouts so productive long runs survive and silent hangs get killed.
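
The two timeouts answer different questions: the wall-clock limit bounds total runtime, while the inactivity limit only fires when output has stopped. A minimal sketch of the decision as a pure function follows. The type names and any limit values are assumptions, and the caller is assumed to update `lastOutputAtMs` whenever the subprocess writes to stdout or stderr.

```typescript
// Hypothetical shapes for illustration; limits are chosen by the caller.
interface RunClock {
  startedAtMs: number;
  lastOutputAtMs: number; // bumped by the caller on every stdout/stderr chunk
}

interface TimeoutPolicy {
  wallClockMs: number; // hard ceiling on total runtime
  inactivityMs: number; // maximum silence before the run counts as hung
}

type WatchdogDecision = "keep-running" | "kill-inactive" | "kill-wall-clock";

function checkSubprocess(clock: RunClock, policy: TimeoutPolicy, nowMs: number): WatchdogDecision {
  if (nowMs - clock.startedAtMs >= policy.wallClockMs) return "kill-wall-clock";
  if (nowMs - clock.lastOutputAtMs >= policy.inactivityMs) return "kill-inactive";
  return "keep-running";
}
```

A long run that keeps producing output passes the inactivity check every time and is only stopped by the wall-clock ceiling; a frozen process is caught as soon as its silence exceeds the inactivity limit.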

Workspace

Shared git state

Workers sharing a host-mounted .git directory caused corruption, stale worktrees, and force-with-lease collisions. Colony moved toward per-worker clone isolation.
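
Per-worker isolation means each worker gets its own directory and its own full clone, so no two workers share refs, an index, or a worktree list. The sketch below only plans the clone; the directory layout, the `planWorkerClone` name, and the worker-id rules are assumptions, not Colony's workspace code.

```typescript
// Hypothetical plan shape; the caller would pass gitArgs to its own git runner.
interface WorkerClonePlan {
  dir: string;
  gitArgs: string[];
}

function planWorkerClone(workspaceRoot: string, workerId: string, repoUrl: string): WorkerClonePlan {
  // Restrict the id to a safe character set so it cannot escape the workspace root.
  if (!/^[A-Za-z0-9_-]+$/.test(workerId)) {
    throw new Error(`invalid worker id: ${workerId}`);
  }
  const root = workspaceRoot.replace(/\/+$/, ""); // drop trailing slashes
  const dir = `${root}/${workerId}/repo`;
  // A plain `git clone` gives this worker a private .git directory.
  return { dir, gitArgs: ["clone", repoUrl, dir] };
}
```

The trade-off is disk space and clone time per worker in exchange for removing cross-worker interference on shared git state.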

State

Eventually consistent labels

GitHub labels are not a reliable distributed state machine. Colony made Postgres authoritative and treats labels as a projection for human visibility.
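
Treating labels as a projection means the write direction is one-way: state is read from the authoritative store, and labels are patched to match it, never the reverse. The sketch below shows that reconciliation step in isolation. The state names are taken from the transcripts above; the assumption that each label is named exactly like its state, and the `projectLabels` function itself, are illustrative only.

```typescript
// State names as they appear in the transcript state paths above.
type PipelineState =
  | "ready-for-dev"
  | "in-development"
  | "in-review"
  | "human-review-ready"
  | "merged";

const STATE_LABELS: readonly PipelineState[] = [
  "ready-for-dev",
  "in-development",
  "in-review",
  "human-review-ready",
  "merged",
];

interface LabelPatch {
  add: string[];
  remove: string[];
}

// Computes the label changes needed so the issue reflects the authoritative state.
// The observed labels are only ever corrected, never used to infer state.
function projectLabels(authoritative: PipelineState, observedLabels: string[]): LabelPatch {
  const stateLabels = new Set<string>(STATE_LABELS);
  const remove = observedLabels.filter((l) => stateLabels.has(l) && l !== authoritative);
  const add = observedLabels.includes(authoritative) ? [] : [authoritative];
  return { add, remove };
}
```

If a label update is lost or arrives late, the next reconciliation produces the same patch again, so the labels converge without ever influencing the stored state.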

Read the full failure catalogue →

Bring a real repo. We’ll show you the same evidence trail on your work.

Pilot in two weeks. Or fork the OSS and run it yourself.

Talk to us → Read the docs ↗