Pilot · invitation

Run Colony against a slice of your work.

Proof Pilot: six weeks, $35k flat fee. One repo, one team, narrow scope. You give us the backlog; we come back with a scoped reply in two business days. Teams that need embedded deployment, custom integration, or multi-repo work from day one run Production Pilots, scoped per engagement and starting at $100k.

What you’re buying

Three things. Line-itemed.

AI staff augmentation, paid for as autonomous-pipeline output.

Twenty-four hours a day, against the repositories you give it access to. The pipeline picks up issues, writes the code, reviews it, merges PRs. You pay for output.

Fixed fee, transparent line items.

The Proof Pilot is $35k flat for six weeks. No T&M surprises; worker capacity, platform tier, and managed services are priced up front. LLM tokens pass through at cost plus a small markup — visible in the same cost ledger your team sees. Production Pilots are scoped per engagement.

No lock-in.

No automatic renewal. Time-limited by design. At the end you have three paths: convert to annual, continue month-to-month, or walk away. Code produced is yours under work-for-hire.

Pilot SLO — what we’d commit to

Four ranges, written into the contract.

These are the commitment ranges we’d put into a pilot success-criterion sheet, verified against the six-metric basket with your own pre-Colony baselines. Bracketed, not single-point — the shape of your backlog moves the number inside the range.

Throughput — 2–4×

Issues closed per week, same team size, vs. your pre-Colony baseline. Measured weekly per active repo.

Cycle time — down 40–60%

Median time from issue filed to PR merged on Colony-handled work, by end of month two.

Agent attribution — 60–80%

Share of issues handled end-to-end without human steering, by end of month two.

Escape rate — ≤ baseline

Defects from Colony work at or under your pre-Colony defect rate by month three.

Cost per issue and review cycles round out the basket. We track and report both, but we don’t commit to a numeric range for either because both are heavily shaped by your stack.
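As a rough illustration of how the four committed ranges could be checked against a baseline, here is a minimal Python sketch. The `Snapshot` fields, the `check_slo` helper, and the sample numbers are assumptions made for illustration, not Colony's reporting format; the thresholds are the conservative end of each range above.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical weekly metrics for one repo (field names are illustrative)."""
    issues_closed_per_week: float
    median_cycle_hours: float   # issue filed -> PR merged
    agent_attribution: float    # share handled end-to-end, 0.0 to 1.0
    escape_rate: float          # defects per issue shipped


def check_slo(baseline: Snapshot, pilot: Snapshot) -> dict[str, bool]:
    """Compare pilot metrics with the conservative end of each committed range."""
    throughput_multiple = pilot.issues_closed_per_week / baseline.issues_closed_per_week
    cycle_time_reduction = 1 - pilot.median_cycle_hours / baseline.median_cycle_hours
    return {
        "throughput_2x": throughput_multiple >= 2.0,
        "cycle_time_down_40pct": cycle_time_reduction >= 0.40,
        "attribution_60pct": pilot.agent_attribution >= 0.60,
        "escape_rate_at_or_under_baseline": pilot.escape_rate <= baseline.escape_rate,
    }


# Made-up numbers, purely to show the arithmetic.
baseline = Snapshot(10, 120.0, 0.0, 0.05)
pilot = Snapshot(25, 60.0, 0.70, 0.04)
results = check_slo(baseline, pilot)
```

With these sample figures the throughput multiple is 2.5 and the cycle-time reduction is 50%, so every check passes. A real success-criterion sheet would use your own pre-Colony baselines.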

Who fits cleanly

Two shapes of team.

If your team looks like neither, talk to us anyway — we can often tell quickly whether a pilot will land.

Engineering teams running their own pipeline.

You’re an engineering leader at a small-to-mid team. AI coding tools are already in your daily flow; you’ve hit the ceiling of one-at-a-time assistance. You want implementation backlog progressing in parallel.

Platform and integrator teams building on Colony.

You’re a platform engineering group or an integrator shop wrapping Colony into a product or an internal service. You want orchestration as substrate, not a copilot.

Questions teams ask before they sign

How a pilot actually runs.

Q What does week one look like?

Pick one repository with a single owning team and a well-defined backlog. Designate an Operator (typically a tech lead or architect) to own the conventions file. Write a thin initial conventions file — thin is fine; it grows. Start Colony with conservative risk envelopes: tight cost caps, narrow automerge threshold, human-review-required on compliance-sensitive paths. Hold a 30-minute team orientation. Establish metric visibility (throughput, cycle time, review cycles) from day one. Most pilots are issuing their first PR within a week.
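To make "conservative risk envelopes" concrete, here is a minimal Python sketch of what such a policy could look like. Everything in it is hypothetical: the `RiskEnvelope` fields, the dollar and line-count figures, and the path patterns are invented for illustration, and it assumes the automerge threshold is expressed as a diff size. Colony's actual configuration may look quite different.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class RiskEnvelope:
    """Hypothetical week-one settings; none of these names come from Colony."""
    max_cost_per_issue_usd: float = 5.0      # tight cost cap (made-up figure)
    automerge_max_changed_lines: int = 50    # narrow automerge threshold (made-up figure)
    human_review_paths: list[str] = field(
        default_factory=lambda: ["billing/*", "auth/*"]  # compliance-sensitive paths
    )


def can_automerge(env: RiskEnvelope, changed_files: list[str],
                  changed_lines: int, cost_usd: float) -> bool:
    """Return True only if the change stays inside every limit of the envelope."""
    if cost_usd > env.max_cost_per_issue_usd:
        return False
    if changed_lines > env.automerge_max_changed_lines:
        return False
    # Any file under a compliance-sensitive path forces human review.
    return not any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in env.human_review_paths
    )


env = RiskEnvelope()
small_docs_change = can_automerge(env, ["docs/readme.md"], 10, 1.0)
billing_change = can_automerge(env, ["billing/invoice.py"], 10, 1.0)
```

In this sketch the small docs change is eligible for automerge and the billing change is routed to a human reviewer. Loosening the envelope later means raising the caps or trimming the path list.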

Q Our tickets have always been loose specs. Is that a blocker?

Common starting position; not a blocker. Colony can’t ask a question in a hallway, but the Surveyor agent will request clarification when an issue is underspecified. The fastest fix is LLM-assisted issue drafting: PMs and BAs use Claude Code or Copilot Chat to translate plain-language intent into the five-element format the pipeline needs (context, requirements, acceptance criteria, test criteria, file references). Teams that adopt this workflow see clarification requests fall to near zero within two weeks.
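The five-element format can be pictured as a simple record with a completeness check. The sketch below is an assumption about shape only: the `IssueDraft` class and the `missing_elements` helper are hypothetical names, and the pipeline's real schema is not shown on this page.

```python
from dataclasses import dataclass, field, fields


@dataclass
class IssueDraft:
    """Hypothetical container for the five elements named above."""
    context: str = ""
    requirements: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    test_criteria: list[str] = field(default_factory=list)
    file_references: list[str] = field(default_factory=list)


def missing_elements(draft: IssueDraft) -> list[str]:
    """List the elements that are still empty, in declaration order."""
    missing = []
    for f in fields(draft):
        value = getattr(draft, f.name)
        if isinstance(value, str):
            value = value.strip()  # whitespace-only text counts as empty
        if not value:
            missing.append(f.name)
    return missing


draft = IssueDraft(
    context="Checkout fails for carts over 50 items",
    requirements=["Support carts up to 500 items"],
)
gaps = missing_elements(draft)
# gaps == ["acceptance_criteria", "test_criteria", "file_references"]
```

A check like this, run while the ticket is being drafted, flags an underspecified issue before it reaches the pipeline.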

Q How do we measure whether it’s working?

Six metrics, read together: throughput (issues closed per week), cycle time (filed to merged), cost per issue, escape rate (defects that originated in Colony work), agent attribution (% handled end-to-end without human steering), and review cycles (passes before merge). The measurement definitions are documented in the measurement basket; four of the six have committed ranges in the Pilot SLO above.

The single most common mistake: optimizing for one metric in isolation. Throughput up and escape rate up means broken code shipping faster. Read the basket.
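A minimal Python sketch of reading two metrics together, assuming each reporting period is a plain dictionary. The key names and the `read_basket` helper are illustrative, not part of Colony.

```python
def read_basket(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Flag the pattern described above: throughput and escape rate rising together."""
    warnings = []
    throughput_up = after["throughput"] > before["throughput"]
    escape_up = after["escape_rate"] > before["escape_rate"]
    if throughput_up and escape_up:
        warnings.append(
            "throughput and escape rate both rose: broken code may be shipping faster"
        )
    return warnings


# Made-up figures for two reporting periods.
month_one = {"throughput": 12, "escape_rate": 0.04}
month_two = {"throughput": 30, "escape_rate": 0.09}
flags = read_basket(month_one, month_two)
```

Here `flags` contains one warning, because the throughput gain came with a higher escape rate. The same comparison returns an empty list when throughput rises and escape rate holds or falls.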