Predictable.
Colony has been running its own development for months. 2,500+ pull requests merged. 250,000+ lines of TypeScript shipped. 10,000+ tests, green.
See how that works →
Colony is the resident pipeline. Seven specialized agents handle intake through merge, with a deterministic state machine, a per-issue cost ledger, and an audit trail for every decision. Built by itself, in production.
Most engineering work is orchestration, not authorship.
The pipeline that ships this site, ledgered. Numbers refresh from the Cloud audit trail.
Four failure modes the demo never shows. They compound. They get worse with scale.
Sandboxed agents ship code that fails on the third pull request — flaky containers, fabricated evidence in review, subprocess hangs that burn cost in silence.
Without per-issue cost attribution and an audit trail, “the AI agents shipped these PRs” turns into a CFO question with no defensible answer.
Required reviews, branch protection, the cultural norm of not bypassing the human gate — merge discipline does not survive autonomy by default. It has to be engineered in.
Review-loop exhaustion. Fabricated test runs. Subprocess hangs. Cost-cap escapes. Eight categories of failure catalogued from running our own pipeline — each one encoded as recovery logic before it ships. Read the catalogue →
Not a copilot, not a task runner.
A resident orchestration layer with a state machine, a ledger, and an audit trail. Issues come in. Merged pull requests come out. Every decision in between is recorded.
Thirteen states from intake through monitoring. Postgres is the source of truth; GitHub labels are a projection of it, not the other way around.
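As a rough illustration of the idea, here is a minimal sketch of a deterministic transition table with labels derived from the stored record. The state names, the `colony:` label prefix, and the `IssueRecord` shape are all hypothetical: the page says there are thirteen states but does not list them, so only a handful of made-up ones appear here, and an in-memory object stands in for the Postgres row.

```typescript
// Hypothetical subset of states; the real pipeline has thirteen, not listed in the source.
type State = "intake" | "surveyed" | "planned" | "built" | "inspected" | "merged" | "monitoring";

// Allowed transitions. Anything not listed here is rejected.
const transitions: Record<State, readonly State[]> = {
  intake: ["surveyed"],
  surveyed: ["planned"],
  planned: ["built"],
  built: ["inspected"],
  inspected: ["built", "merged"], // assumption: a failed inspection returns to the build step
  merged: ["monitoring"],
  monitoring: [],
};

// Stand-in for the database row that is the source of truth.
interface IssueRecord {
  id: number;
  state: State;
}

// Pure transition function: the same input always yields the same output.
function transition(issue: IssueRecord, next: State): IssueRecord {
  if (transitions[issue.state].indexOf(next) === -1) {
    throw new Error(`Illegal transition: ${issue.state} -> ${next}`);
  }
  return { ...issue, state: next };
}

// Labels are computed from the record and never read back as state.
function projectLabels(issue: IssueRecord): string[] {
  return [`colony:${issue.state}`];
}
```

Because labels are only ever an output of `projectLabels`, an edited or deleted label cannot move an issue to a different state in this sketch; the next projection simply rewrites it.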
Mayor, Surveyor, Architect, Builder, Inspector, Marshal, Sentinel. Each has one job. The pipeline is the integration.
Every prompt, every tool call, every dollar attributed to the issue, the agent, and the tenant. Defensible spend, by construction.
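One way to picture that attribution is a ledger where each entry carries all three keys, so spend can be totalled along any of them. This is a sketch under assumptions: the `LedgerEntry` fields and the `totalBy` helper are illustrative names, not Colony's schema, and cost is held in integer cents to keep the arithmetic exact.

```typescript
// Hypothetical entry shape; field names are illustrative only.
interface LedgerEntry {
  tenant: string;
  issue: number;
  agent: string;
  costCents: number; // integer cents, to avoid floating-point drift
}

type Dimension = "tenant" | "issue" | "agent";

// Sum cost along one attribution dimension.
function totalBy(entries: LedgerEntry[], dim: Dimension): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const entry of entries) {
    const key = String(entry[dim]);
    totals[key] = (totals[key] ?? 0) + entry.costCents;
  }
  return totals;
}

// Made-up sample data for illustration.
const sampleEntries: LedgerEntry[] = [
  { tenant: "acme", issue: 101, agent: "Builder", costCents: 120 },
  { tenant: "acme", issue: 101, agent: "Inspector", costCents: 30 },
  { tenant: "acme", issue: 102, agent: "Builder", costCents: 50 },
];

const perIssue = totalBy(sampleEntries, "issue"); // { "101": 150, "102": 50 }
const perAgent = totalBy(sampleEntries, "agent"); // { Builder: 170, Inspector: 30 }
```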
Every prompt, every response, every state transition retained for the lifetime of the issue. Not a log file — a record.
Per-issue, per-agent, per-tenant attribution. When someone asks why AI spend is what it is, the answer is in the ledger, not in someone’s head.
Observability →
Inspector agent enforces evidence requirements. Marshal agent respects branch protection. The human-defined gate stays human-defined. Your engineering standards survive autonomy.
Governance →
We don’t ship a demo and run our own development on something else. Colony’s pipeline does Colony’s engineering: issues filed, picked up by the Mayor, surveyed, drafted, built, inspected, merged. Same agents you’d run.
What you’re looking at on this site was shipped by it. The audit trail is open to pilot teams.
Read the case study →
Three modes, no lock-in; the pipeline is the same. Your code stays where it is. See the full comparison →
Open-source core under AGPL-3.0. The source is public. The pipeline is yours to operate, audit, and extend.
Managed tenancy today. Hybrid tenancy enters public preview Q3 2026 for teams with stricter data residency. Per-tenant ledger and audit, your dashboard, your branch protection.
You supply the requirements; we deliver the code. A managed engagement where Colony’s team and pipeline operate against your backlog, with weekly delivery cadence and no operational lift on your side.
Colony is the orchestration layer. It picks issues out of a backlog, decides what work to take, runs analysis and planning before code is written, executes the implementation, reviews the result against your conventions, and merges it through your branch protection. The model that drives the developer agent is swappable — Claude, GPT, or whatever your enterprise procurement has already approved. The orchestration is the product; the model is a dependency.
Colony expands your team; it does not replace it. The right question isn’t “how many developers can we eliminate” — it’s “what can this team accomplish with the pipeline’s capacity added to its own.”
In practice, teams that deploy Colony successfully don’t reduce headcount. They redirect developer capacity toward work that requires human judgment — architecture, technical strategy, complex problem-solving, exploratory and creative work that Colony handles poorly. Teams that try to use Colony to justify headcount reduction typically see quality degrade as the human review layer thins.
Three configurable controls: per-issue cost caps that escalate to a human when an issue exceeds budget; an automerge threshold that sets how confident Colony must be before merging without human approval; and human-review-required labels that route any matching issue through mandatory human review. Start conservative and loosen the controls as the escape rate proves out.
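To make the three controls concrete, here is a small sketch of how they might combine into one routing decision. Everything named here is an assumption for illustration: the `Controls` fields, the confidence score in the range 0 to 1, the example labels, and the order in which the checks run are not taken from Colony's configuration.

```typescript
// Hypothetical configuration shape; names and units are assumptions.
interface Controls {
  costCapCents: number; // per-issue budget
  automergeThreshold: number; // minimum confidence (0 to 1) to merge without approval
  humanReviewLabels: string[]; // any matching label forces human review
}

// Hypothetical snapshot of an issue at merge time.
interface IssueStatus {
  labels: string[];
  spentCents: number;
  confidence: number;
}

type Decision = "escalate" | "human-review" | "automerge";

function decide(issue: IssueStatus, controls: Controls): Decision {
  // Over budget: hand the issue to a human before spending more.
  if (issue.spentCents > controls.costCapCents) return "escalate";
  // A matching label always routes through human review.
  const flagged = issue.labels.some(
    (label) => controls.humanReviewLabels.indexOf(label) !== -1,
  );
  if (flagged) return "human-review";
  // Otherwise automerge only at or above the configured confidence.
  return issue.confidence >= controls.automergeThreshold ? "automerge" : "human-review";
}

// A conservative starting point: tight cap, high threshold, broad label list.
const conservative: Controls = {
  costCapCents: 500,
  automergeThreshold: 0.95,
  humanReviewLabels: ["security", "migration"],
};
```

Loosening the controls in this sketch means raising `costCapCents`, lowering `automergeThreshold`, or trimming `humanReviewLabels`, one at a time, while watching the escape rate.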
The Inspector agent enforces evidence requirements — tests must run, conventions must hold, the diff must justify itself. The Marshal agent respects branch protection: required reviews, status checks, code-owner approvals. Nothing reaches main without crossing the gate you configured.
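The gate described above can be sketched as a single check that returns every reason a merge is blocked. This is an illustrative model only: the `Evidence`, `Protection`, and `PullRequestStatus` shapes and the `mergeBlockers` function are hypothetical names, and a real deployment would read these values from the Git host's branch protection settings rather than from plain objects.

```typescript
// Hypothetical evidence collected during inspection.
interface Evidence {
  testsRan: boolean;
  conventionsHold: boolean;
  diffJustified: boolean;
}

// Hypothetical mirror of the branch protection rules you configured.
interface Protection {
  requiredApprovals: number;
  requiredChecks: string[];
  codeOwnerApprovalRequired: boolean;
}

// Hypothetical current status of the pull request.
interface PullRequestStatus {
  approvals: number;
  passedChecks: string[];
  codeOwnerApproved: boolean;
}

// Returns every reason the merge is blocked; an empty list means the gate is clear.
function mergeBlockers(e: Evidence, p: Protection, pr: PullRequestStatus): string[] {
  const reasons: string[] = [];
  if (!e.testsRan) reasons.push("tests did not run");
  if (!e.conventionsHold) reasons.push("conventions violated");
  if (!e.diffJustified) reasons.push("diff not justified");
  if (pr.approvals < p.requiredApprovals) reasons.push("missing required reviews");
  for (const check of p.requiredChecks) {
    if (pr.passedChecks.indexOf(check) === -1) reasons.push(`status check not passed: ${check}`);
  }
  if (p.codeOwnerApprovalRequired && !pr.codeOwnerApproved) {
    reasons.push("missing code-owner approval");
  }
  return reasons;
}
```

Returning the full list rather than a single boolean keeps the decision explainable: each blocked merge carries the reasons that could be written to an audit record.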
Three options. Colony Cloud: managed worker pools in our infrastructure, single-tenant per customer. Hybrid tenancy (recommended for enterprise): the orchestration engine runs in our cloud, but the agent workers that read and write your code run inside your own cloud — your AWS, your Azure AI Foundry instance, your existing model contracts. Source code never transits Colony’s managed infrastructure. OSS: run the whole thing yourself.
Whichever option you choose, we don’t train on your code. Security & trust has the long version.
Pilot-specific questions (week one, measurement, loose specs) live on the pilot page. Conventions are covered on governance; audit and compliance on security & trust.
Most engineering work is orchestration. We should talk about yours.