B2B SaaS Outbound

The v1 hero score. Two agents in a chain — cold-leads finds and contacts, reply-triage classifies and advances. Both write to the same Pipeline. The Pipelines UI is the operator’s view of what the score is doing in real time.

What the score does

Sunday night you sign up, define your ICP, connect Gmail + Apollo, and pick this score. Monday at 09:00 the cold-leads agent runs. By Wednesday you have replies. By Friday you have a booked demo. The Pipeline shows you exactly what Maestro did, separate from any other system you have.

That’s the bar. The score is a single page on the wall — when it works, this is the case study; when it doesn’t, here’s what to fix.

Agents

cold-leads ──── handoff ───→ reply-triage
   │                              │
   └─────── writes to ────────────┘
                  │
                  ▼
           pipeline pl_saas

Cold leads · SaaS

Runs on cron (default 0 9 * * MON-FRI — 9am ET weekdays). Each run:

Apollo — find_leads matching the ICP (titles, seniorities, headcount, location). Default per_page 10.
Pipeline — find_contact to skip anyone already in the pipeline.
Apollo — enrich_domain for company context (funding, tech stack, headcount).
Compose — draft_personalized_opener grounded in the candidate + company signal.
Pipeline + (Gmail) — see Review mode vs send mode below.
Notify — fire one summary notification (send_attention in review mode, send_event in send mode).

Hard cap: 3 drafts per run in v1. Raise this once the agent has been validated on real prospects for a week. The cap is enforced by the agent’s instructions, not by the runtime.
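
The per-run steps and the cap above can be sketched in Python. This is a minimal sketch, not Maestro's runtime: run_cold_leads, the Lead type, and the injected callables are hypothetical stand-ins for the skill operations named above. The cap is enforced in code here purely for illustration; in v1 it lives in the agent's instructions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

MAX_DRAFTS_PER_RUN = 3  # the v1 hard cap described in the text


@dataclass
class Lead:
    # Hypothetical shape; the real Apollo payload is not shown in the source.
    email: str
    name: str
    domain: str


def run_cold_leads(
    find_leads: Callable[[], Iterable[Lead]],
    find_contact: Callable[[str], Optional[dict]],
    enrich_domain: Callable[[str], dict],
    draft_opener: Callable[[Lead, dict], dict],
    max_drafts: int = MAX_DRAFTS_PER_RUN,
) -> list:
    """Return the drafts produced by one run, never more than max_drafts."""
    drafts = []
    for lead in find_leads():
        if len(drafts) >= max_drafts:
            break  # cap reached, stop drafting for this run
        if find_contact(lead.email) is not None:
            continue  # already in the pipeline, skip
        company = enrich_domain(lead.domain)
        draft = draft_opener(lead, company)
        drafts.append({"email": lead.email, **draft})
    return drafts
```

With ten candidate leads, one of which is already tracked, a run like this stops after three drafts and leaves the remaining candidates for a later run.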

Skills: apollo, compose, gmail, pipeline, notify.

Review mode vs send mode

The cold-leads agent ships in review mode by default. The “Mode:” line at the top of the agent’s instructions controls which branch the LLM follows:

Review mode (default) — agent drafts the opener, writes the contact at stage ready, logs a note activity with payload.draft: true containing the full subject + body. After the run, fires notify.send_attention (“3 drafts ready for review”). The operator opens each contact’s detail page, reviews/edits the draft inline, clicks Send via Gmail. The send is dispatched by the API (not the agent) via the same Gmail OAuth bundle; the activity flips in place to kind: contacted, the contact’s stage advances to contacted. Operator is in the loop.
Send mode — agent calls gmail.send_email directly during the run, writes the contact at stage contacted, logs a contacted activity with { subject, preview, gmail_message_id, gmail_thread_id }. After the run, fires notify.send_event. No human in the loop.
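
The difference between the two modes comes down to what gets written for each opener. The sketch below mirrors the two branches described above. It is illustrative only: record_opener, the send_email callable and its return keys, and the 120-character preview length are assumptions, not Maestro's actual API.

```python
from typing import Callable

PREVIEW_CHARS = 120  # hypothetical preview length; the text does not specify one


def record_opener(
    mode: str,
    subject: str,
    body: str,
    send_email: Callable[[str, str], dict],
) -> dict:
    """Return the contact stage, the logged activity, and the end-of-run notification."""
    if mode == "review":
        # Nothing is sent; the full draft waits for the operator.
        return {
            "stage": "ready",
            "activity": {
                "kind": "note",
                "payload": {"draft": True, "subject": subject, "body": body},
            },
            "run_notification": "send_attention",
        }
    if mode == "send":
        # Hypothetical return shape: {"message_id": ..., "thread_id": ...}
        sent = send_email(subject, body)
        return {
            "stage": "contacted",
            "activity": {
                "kind": "contacted",
                "payload": {
                    "subject": subject,
                    "preview": body[:PREVIEW_CHARS],
                    "gmail_message_id": sent["message_id"],
                    "gmail_thread_id": sent["thread_id"],
                },
            },
            "run_notification": "send_event",
        }
    raise ValueError(f"unknown mode: {mode!r}")
```

Note that in review mode send_email is never called, which is the property that keeps the operator in the loop.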

To switch modes, edit the cold-leads agent’s instructions: replace the “REVIEW branch” block with the “SEND branch” block (the inactive-mode branch is included in the seed instructions for reference). Save. Next run uses the new mode.

The recommended onboarding path: ship review mode for the first ~5–10 successful runs (you build trust in the LLM’s drafts and your ICP filter precision), then flip to send mode once the drafts are consistently good without edits.

Reply triage

Runs on cron (default */30 * * * * — every 30 minutes). Each run:

Gmail — list_inbox for is:unread threads.
For each thread:
- Pipeline — find_contact matching the sender’s email. Skip if not a tracked contact.
- Skip if the contact’s stage is already replied, triaged, booked, disqualified, or unsubscribed (already handled).
- Gmail — read_thread to get the latest message body.
- Compose — classify_reply_intent. Returns one of interested, not_interested, out_of_office, wrong_person, unsubscribe, needs_review.
- Pipeline — log_activity with kind triaged, new_stage mapped from the intent, payload { intent, confidence, explanation }.
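
The two skip rules above decide which unread threads ever reach the classifier. A minimal sketch of that filter, assuming each thread is a dict with a sender key and each contact a dict with a stage key (both shapes are hypothetical, as is the function name):

```python
from typing import Callable, Optional

# Stages the text lists as already handled.
HANDLED_STAGES = {"replied", "triaged", "booked", "disqualified", "unsubscribed"}


def threads_to_classify(
    threads: list,
    find_contact: Callable[[str], Optional[dict]],
) -> list:
    """Return (thread, contact) pairs that still need classify_reply_intent."""
    selected = []
    for thread in threads:
        contact = find_contact(thread["sender"])
        if contact is None:
            continue  # sender is not a tracked contact
        if contact["stage"] in HANDLED_STAGES:
            continue  # already handled on an earlier run
        selected.append((thread, contact))
    return selected
```

Only the pairs this returns would go on to read_thread, classification, and log_activity.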

Critical constraint: reply-triage does not draft replies. Classification and stage advancement only. The human writes the actual response. This is a deliberate v1 design — keeping the human in the loop on every send protects your sender reputation while we build the voice-modeling and feedback loops needed to draft replies that sound like you. Reply drafting is on the roadmap for v2.

Skills: gmail, compose, pipeline.

Handoff

Cold-leads names reply-triage as its handoff target. In v1, both agents run on independent cron schedules and coordinate through the Pipeline (cold-leads writes contacts; reply-triage reads them). Automatic handoff-on-completion is on the roadmap.

Stage mapping

The score defines this stage progression for every contact:

new → enriching → ready → contacted → replied → triaged → booked
                                                disqualified
                                                unsubscribed

Event	Stage transition	Logged activity
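cold-leads sends opener (send mode)	`new` → `contacted`	`contacted`
cold-leads drafts opener (review mode)	`new` → `ready`	`note` with `payload.draft: true`
operator clicks Send via Gmail (review mode)	`ready` → `contacted`	draft `note` flips to `contacted`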
reply-triage classifies “interested” or “needs_review”	`contacted` → `replied`	`triaged`
reply-triage classifies “not_interested” / “wrong_person”	`contacted` → `disqualified`	`triaged`
reply-triage classifies “unsubscribe”	`contacted` → `unsubscribed`	`triaged`
reply-triage classifies “out_of_office”	unchanged	none (will reprocess on next run)
operator manually books a meeting	`replied` → `booked`	manual `note` (UI not yet wired)
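
The reply-triage rows of the table reduce to a lookup from intent to stage. A minimal sketch, assuming the default mapping shown above (the names INTENT_TO_STAGE and next_stage are hypothetical):

```python
from typing import Optional

# None means the stage is left unchanged (out_of_office).
INTENT_TO_STAGE = {
    "interested": "replied",
    "needs_review": "replied",
    "not_interested": "disqualified",
    "wrong_person": "disqualified",
    "unsubscribe": "unsubscribed",
    "out_of_office": None,
}


def next_stage(current_stage: str, intent: str) -> str:
    """Return the stage a contact should hold after a reply is classified."""
    if intent not in INTENT_TO_STAGE:
        raise ValueError(f"unknown intent: {intent!r}")
    target: Optional[str] = INTENT_TO_STAGE[intent]
    return current_stage if target is None else target
```

Customizing the mapping in the reply-triage instructions would correspond to changing the values in this table, not the lookup itself.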

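new and enriching are reserved for richer pre-contact pipelines (e.g. when an enrichment agent fronts the cold-leads agent). In send mode, v1 cold-leads writes contacts straight to contacted. In review mode it writes them at ready, and they advance to contacted when the operator sends the draft.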

Configuring for your install

The seed agents ship with example sender context and an example ICP. To use the score for your own outreach:

Edit the cold-leads agent instructions (Maestro → Agents → Cold leads · SaaS → Edit). Replace the sender section, the ICP filters, and the bonus signals with your own. The runtime hands these instructions to Claude every iteration; the agent acts on what’s in this text.
Edit the reply-triage agent instructions if you want to customize the intent → stage mapping.
Add the secrets the skills need:
- apollo_api_key (Apollo skill — see docs/skills/apollo.md)
- google_oauth_client_id + google_oauth_client_secret and connect Gmail (see docs/skills/gmail.md)
- ANTHROPIC_API_KEY is configured at install time
Set the agents’ status to running (or scheduled if you want them on cron only) when you’re ready to go live. They ship as idle so configuration happens before any emails go out.
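
Before flipping the agents to running, it can help to confirm that every secret listed above is present. The check below is only a sketch: the secret names come from the list above, but treating them as one flat mapping is an assumption, since how Maestro actually stores secrets is not described here.

```python
# Secret names taken from the list above.
REQUIRED_SECRETS = (
    "apollo_api_key",
    "google_oauth_client_id",
    "google_oauth_client_secret",
    "ANTHROPIC_API_KEY",
)


def missing_secrets(configured: dict) -> list:
    """Return the required secret names that are absent or empty."""
    return [name for name in REQUIRED_SECRETS if not configured.get(name)]
```

An empty result means all four names have non-empty values; it says nothing about whether the values are valid or whether Gmail has been connected.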

Cost estimate

Per cold-leads run with 3 sends:

Operation	Calls	Approx cost
`apollo.find_leads`	1	covered by Apollo plan
`apollo.enrich_domain`	~3	covered by Apollo plan
`compose.draft_personalized_opener` (Haiku)	3	$0.0015
`gmail.send_email`	3	free (your own Gmail)
`pipeline.*` writes	~7	free (Postgres)
Anthropic agent loop (~25 iterations)	1 run	~$0.05 (Sonnet 4.6)

So ~$0.05 per cold-leads run, dominated by the agent loop’s Sonnet calls. Set MAESTRO_MODEL=claude-haiku-4-5-20251001 to drop this to ~$0.005/run if Haiku reasoning is good enough for the orchestration logic.

Reply-triage runs are smaller (~$0.02 each) because there’s less per-thread work.

At the default schedule (cold-leads weekday mornings, reply-triage every 30 min), monthly Anthropic spend lands around $5–10 before scaling sends.
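
To see how the monthly figure depends on the schedule, the arithmetic can be written out. The per-run costs below are the approximate figures quoted above; the 22 weekdays, the 30-day month, and the share of reply-triage runs that cost the full ~$0.02 (triage_billed_fraction) are illustrative assumptions, and the simplification that idle runs cost nothing is not a measured value.

```python
COLD_LEADS_COST_PER_RUN = 0.05    # USD, approximate, from the cost table
REPLY_TRIAGE_COST_PER_RUN = 0.02  # USD, approximate, from the text


def monthly_anthropic_cost(
    weekdays: int = 22,
    days: int = 30,
    triage_runs_per_day: int = 48,  # */30 cron: two runs per hour
    triage_billed_fraction: float = 1.0,
) -> float:
    """Estimate monthly spend in USD for both agents at the default schedules."""
    cold_leads = weekdays * COLD_LEADS_COST_PER_RUN
    triage_runs = days * triage_runs_per_day * triage_billed_fraction
    return round(cold_leads + triage_runs * REPLY_TRIAGE_COST_PER_RUN, 2)


upper_bound = monthly_anthropic_cost()
mostly_idle = monthly_anthropic_cost(triage_billed_fraction=0.2)
```

If every one of the 1,440 monthly reply-triage runs cost the full ~$0.02, the total would come to about $29.90. The $5–10 range is reached when only a minority of runs find threads to process, for example about $6.86 at a 20% share, so actual spend tracks reply volume more than the cron frequency.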

Verifying the score works

After installing (and with the agent in review mode, the default):

Run cold-leads manually from the dashboard. Watch the run timeline — it should show ~20 LLM steps + tool calls (no Gmail sends in review mode, so a few fewer steps than send mode).
Bell icon should show an unread attention notification — “3 drafts ready for review”.
Click into the SaaS pipeline. Three new contacts should be there at stage ready.
Open one of the contacts. The draft (subject + full body) renders in a brass-bordered review card above the activity timeline. Read it.
Optional: edit the subject or body inline.
Click Send via Gmail. Within ~2 seconds the card disappears, the activity flips to contacted, the contact’s stage advances to contacted, and the email lands in your Gmail Sent folder.
Reply to that email yourself with “yes, would Tuesday at 2pm work?” Run reply-triage manually.
Bell icon pings: “Sarah Chen replied — needs response”.
Pipelines UI — the contact advanced to stage replied with a triaged activity logged.

That’s the full hero-score loop, end-to-end, on real infrastructure with the human review safety net.

After ~5–10 review-mode runs that produce sendable drafts without significant edits, flip the agent to send mode (see Review mode vs send mode above) and runs become fully autonomous.

Out of v1 scope

Auto-drafted replies. Stays human in v1.
Paced sending across hours (the gmail.queue_paced_send operation). v1 sends in-loop during the cold-leads run; pacing across a day requires a persistent send queue + scheduler. Lands in a future release.
Multi-pipeline scores. A single score writes to a single pipeline. To run “Cold leads · Healthcare” alongside SaaS, clone both agents into a second pair (for example, one built around a cold-leads-health agent).
Handoff auto-triggering. Cold-leads names reply-triage in handoffToAgentId but the runtime doesn’t auto-trigger it on completion. Both agents run on independent crons in v1.
In-app notifications when intent=‘interested’. Activity is logged; the UI doesn’t yet pop a banner. Lands in a future release.

Pipelines — the data model the score reads + writes.
Apollo / Compose / Gmail / Pipeline — the four skills the agents use.
Skills and tools — why the LLM-backed compose calls cost more than the deterministic apollo / gmail ones.