Autonomous development pipeline

The orchestration layer for AI coding agents.

Colony is the resident pipeline. Seven specialized agents handle intake through merge — with a deterministic state machine, a per-issue cost ledger, and an audit trail for every decision. Built by itself, in production.

Talk to us → Read the receipts

Source available · Self-hosted core

Cloud · Managed tenancy · invitation

Most engineering work is orchestration, not authorship.

Built by Colony, on Colony · last 30 days

The pipeline that ships this site, ledgered. Numbers refresh from the Cloud audit trail.

1,118 · Pull requests merged · across the Colony pipeline

99m · Median intake-to-merge · non-trivial issues

$5.05 · Median cost per merged PR · tokens + workers, attributed

Read the breakdown — Colony builds Colony →

The field

Agents are powerful. Production is where they fall over.

Four failure modes the demo never shows. They compound. They get worse with scale.

Demos are not production.

Sandboxed agents ship code that fails on the third pull request — flaky containers, fabricated evidence in review, subprocess hangs that burn cost in silence.

You can’t see what it cost.

Without per-issue cost attribution and an audit trail, “the AI agents shipped these PRs” turns into a CFO question with no defensible answer.

“Merged” is not “shipped.”

Required reviews, branch protection, the cultural norm of not bypassing the human gate — merge discipline does not survive autonomy by default. It has to be engineered in.

The long tail kills demos.

Review-loop exhaustion. Fabricated test runs. Subprocess hangs. Cost-cap escapes. Eight categories of failure, catalogued from running our own pipeline, each one encoded as recovery logic before it ships.

Read the catalogue →

The mechanism

A pipeline that behaves like infrastructure.

Not a copilot, not a task runner.

A resident orchestration layer with a state machine, a ledger, and an audit trail. Issues come in. Merged pull requests come out. Every decision in between is recorded.

State machine.

Thirteen states from intake through monitoring. Postgres is the source of truth; GitHub labels are a projection of it, not the other way around.
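As a rough illustration of what "deterministic state machine, with labels as a projection" can mean, here is a minimal TypeScript sketch. It is not Colony's implementation: the source does not list the thirteen states, so the state names, the `IssueRecord` shape, and the `colony:` label prefix below are all hypothetical.

```typescript
// Hypothetical state names: the page says there are thirteen states but
// does not list them, so this sketch uses a small illustrative subset.
type State = "intake" | "planning" | "building" | "review" | "merged" | "monitoring";

// Deterministic transition table: each state lists the states it may move to.
const transitions: Record<State, readonly State[]> = {
  intake: ["planning"],
  planning: ["building"],
  building: ["review"],
  review: ["building", "merged"], // review can send work back to building
  merged: ["monitoring"],
  monitoring: [],
};

// Stand-in for the row that would live in the database (assumed shape).
interface IssueRecord {
  id: number;
  state: State;
}

// Returns a new record; rejects any transition that is not in the table.
function transition(issue: IssueRecord, next: State): IssueRecord {
  if (transitions[issue.state].indexOf(next) === -1) {
    throw new Error(`illegal transition: ${issue.state} -> ${next}`);
  }
  return { ...issue, state: next };
}

// The label is computed from the stored state; state is never read back from a label.
function projectLabel(issue: IssueRecord): string {
  return `colony:${issue.state}`;
}

// Usage: move an issue forward one step and derive its label.
const started: IssueRecord = { id: 42, state: "intake" };
const planned = transition(started, "planning");
const plannedLabel = projectLabel(planned);
```

Keeping the label a pure function of the stored state is what makes it a projection: if a label is edited by hand, it can be recomputed from the record rather than trusted.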

Specialized agents.

Mayor, Surveyor, Architect, Builder, Inspector, Marshal, Sentinel. Each has one job. The pipeline is the integration.

Per-issue cost ledger.

Every prompt, every tool call, every dollar attributed to the issue, the agent, and the tenant. Defensible spend, by construction.
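One way to picture per-issue, per-agent, per-tenant attribution is a flat list of ledger entries that can be summed along any of those dimensions. The sketch below assumes a hypothetical `LedgerEntry` shape and uses made-up sample amounts; it does not reflect Colony's actual schema or real spend.

```typescript
// Hypothetical entry shape; the field names are assumptions for illustration.
interface LedgerEntry {
  tenant: string;
  issue: number;
  agent: string;
  kind: "prompt" | "tool_call";
  costCents: number; // integer cents avoid floating-point drift when summing
}

// Sums cost per group, where the group is chosen by the caller's key function.
function totalBy(
  entries: LedgerEntry[],
  key: (e: LedgerEntry) => string
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const e of entries) {
    const k = key(e);
    totals[k] = (totals[k] || 0) + e.costCents;
  }
  return totals;
}

// Made-up sample data, not real ledger rows.
const sampleEntries: LedgerEntry[] = [
  { tenant: "acme", issue: 101, agent: "Builder", kind: "prompt", costCents: 120 },
  { tenant: "acme", issue: 101, agent: "Builder", kind: "tool_call", costCents: 35 },
  { tenant: "acme", issue: 101, agent: "Inspector", kind: "prompt", costCents: 40 },
  { tenant: "acme", issue: 102, agent: "Builder", kind: "prompt", costCents: 75 },
  { tenant: "globex", issue: 7, agent: "Builder", kind: "prompt", costCents: 60 },
];

// The same entries answer three different questions.
const byIssue = totalBy(sampleEntries, (e) => `${e.tenant}/${e.issue}`);
const byAgent = totalBy(sampleEntries, (e) => e.agent);
const byTenant = totalBy(sampleEntries, (e) => e.tenant);
```

Because every entry carries all three keys, no spend can exist without an issue, an agent, and a tenant attached to it.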

Full audit trail.

Every prompt, every response, every state transition retained for the lifetime of the issue. Not a log file — a record.

What it produces

Throughput you can plan around. Spend you can defend. Discipline that survives autonomy.

throughput

Predictable.

Colony has been running its own development for months. 2,500+ pull requests merged. 250,000+ lines of TypeScript shipped. 10,000+ tests, green.

See how that works →

spend

Defensible.

Per-issue, per-agent, per-tenant attribution. When someone asks why AI spend is what it is, the answer is in the ledger, not in someone’s head.

Observability →

discipline

Engineered.

The Inspector agent enforces evidence requirements. The Marshal agent respects branch protection. The human-defined gate stays human-defined. Your engineering standards survive autonomy.

Governance →

A claim we can audit

Colony builds Colony.

We don’t ship a demo and run our own development on something else. Colony’s pipeline does Colony’s engineering — issues filed, picked up by the Mayor, surveyed, drafted, built, inspected, merged. Same agents you’d run.

What you’re looking at on this site was shipped by it. The audit trail is open to pilot teams.

Read the case study →

cost_ledger · recent merges

3 repos · 3 tenants

PR	Agent	Duration	Cost
Ledger runbook documents two instructions that don't work	Builder	57m	$2.72
Enforce the repository budget cap, nested inside the tenant cap, at both poll guard sites; update docs	Builder	2h	$16.62
docs: Replace "AGPL" with "source available"	Builder	21d	$13.80
Separate the three questions the label layer asks through one boolean: introduce LabelVocabulary	Builder	4h	$15.13
Add per-repo monthly cost aggregation and cross-process seeding (getRepoMonthlyCost + sprint-master repo_spend on /health)	Builder	46m	$7.11
Total, latest day (2026-08-01)			$55.38

Deployment

Self-host the core, or let us run it.

Three modes, one pipeline, no lock-in. Your code stays where it is.

See the full comparison →

Colony Self-Hosted

Run it yourself.

Source-available core. The pipeline is yours to operate, audit, and extend once you have access.

Full pipeline source
Postgres state machine
Community Discord
Self-managed upgrades

Colony Cloud

Let us operate.

Managed tenancy today. Hybrid tenancy enters public preview in Q3 2026 for teams with stricter data-residency requirements. Per-tenant ledger and audit, your dashboard, your branch protection.

Managed worker pools
Per-tenant audit trail
Per-tenant cost ledger
Pilot in two weeks

Talk to us about a pilot →

Colony-as-a-Service

Outcomes, white-glove.

You supply the requirements; we deliver the code. A managed engagement where Colony’s team and pipeline operate against your backlog, with weekly delivery cadence and no operational lift on your side.

Requirements intake & refinement
Weekly PR delivery cadence
Same cost ledger, your visibility
Scoped per engagement

Talk to us about CaaS →

Questions worth answering

Things you’re probably wondering.

Is this just a wrapper around GitHub Copilot / Claude Code / Cursor?

Colony is the orchestration layer. It picks issues out of a backlog, decides what work to take, runs analysis and planning before code is written, executes the implementation, reviews the result against your conventions, and merges it through your branch protection. The model that drives the developer agent is swappable — Claude, GPT, or whatever your enterprise procurement has already approved. The orchestration is the product; the model is a dependency.

Does this replace developers?

Colony expands your team; it does not replace it. The right question isn’t “how many developers can we eliminate” — it’s “what can this team accomplish with the pipeline’s capacity added to its own.”

In practice, teams that deploy Colony successfully don’t reduce headcount. They redirect developer capacity toward work that requires human judgment — architecture, technical strategy, complex problem-solving, exploratory and creative work that Colony handles poorly. Teams that try to use Colony to justify headcount reduction typically see quality degrade as the human review layer thins.

What stops the agents from doing something we don’t want?

Three configurable controls: per-issue cost caps that escalate to a human when an issue exceeds budget; an automerge threshold that defines how confidently Colony has to act before merging without human approval; and human-review-required labels that route any matching issue through mandatory human review. Start conservative and loosen as the escape rate proves out.

The Inspector agent enforces evidence requirements — tests must run, conventions must hold, the diff must justify itself. The Marshal agent respects branch protection: required reviews, status checks, code-owner approvals. Nothing reaches main without crossing the gate you configured.
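The three configurable controls described in this answer can be sketched as a single decision function. This is an illustration under stated assumptions, not Colony's configuration format: the `Controls` and `IssueStatus` shapes, the numeric confidence score in [0, 1], and the decision names are all hypothetical.

```typescript
// Hypothetical configuration shape for the three controls.
interface Controls {
  costCapCents: number;        // per-issue budget
  automergeThreshold: number;  // minimum confidence (assumed 0..1) to merge without a human
  humanReviewLabels: string[]; // labels that always route to a human
}

// Hypothetical snapshot of one issue at merge time.
interface IssueStatus {
  labels: string[];
  spentCents: number;
  confidence: number;
}

type Decision = "escalate_to_human" | "require_human_review" | "automerge";

function decide(c: Controls, s: IssueStatus): Decision {
  // 1. The cost cap is checked first: over-budget issues escalate regardless of quality.
  if (s.spentCents > c.costCapCents) return "escalate_to_human";
  // 2. Any matching label forces human review.
  const flagged = s.labels.some((l) => c.humanReviewLabels.indexOf(l) !== -1);
  if (flagged) return "require_human_review";
  // 3. Below the threshold, the merge waits for human approval.
  if (s.confidence < c.automergeThreshold) return "require_human_review";
  return "automerge";
}

// A conservative starting point: low cap, high threshold, broad label list.
const conservative: Controls = {
  costCapCents: 2000,
  automergeThreshold: 0.9,
  humanReviewLabels: ["security", "billing"],
};
const exampleDecision = decide(conservative, {
  labels: ["docs"],
  spentCents: 310,
  confidence: 0.95,
});
```

In this sketch "automerge" only means the pipeline may proceed without human approval; the merge itself would still have to pass whatever branch protection the repository enforces.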

Where does our source code actually run?

Three options.

Colony Cloud: managed worker pools in our infrastructure, single-tenant per customer.

Hybrid tenancy (recommended for enterprise): the orchestration engine runs in our cloud, but the agent workers that read and write your code run inside your own cloud (your AWS, your Azure AI Foundry instance, your existing model contracts). In this mode, source code never transits Colony’s managed infrastructure.

Self-hosted: run the whole thing yourself.

Whichever option you choose, we don’t train on your code. Security & trust has the long version.

Pilot-specific questions (week one, measurement, loose specs) live on the pilot page. Conventions are covered on governance; audit and compliance are covered on security & trust.

Most engineering work is orchestration. We should talk about yours.

Talk to us → Read the docs

Beehive Media, LLC · Source available