Skills overview
A skill is a Python package that exposes one or more operations an agent can call. Operations are typed (Pydantic in, Pydantic out), credentialed (secrets resolved from the vault, never passed by the LLM), and surfaced in the Maestro UI as a catalog the user can browse, configure, and test.
This page is for operators who want to understand what skills are and for authors who want to write one. For the conceptual split between deterministic tools and LLM-backed reasoning, read Skills and tools first.
What ships in v1
| Skill | Operations | Notes |
|---|---|---|
| http | get, post | Built-in escape hatch for ad-hoc HTTP calls. |
| web-research | search, extract | Tavily-backed. See Web research. |
| gmail | list_inbox, read_thread, send_email, label_thread | OAuth flow. See Gmail. |
| apollo | find_leads, enrich_person, enrich_domain | API-key auth. See Apollo. |
| compose | draft_personalized_opener, classify_reply_intent | LLM-backed. See Compose. |
| pipeline | find_contact, add_contact, log_activity | Agent reads + writes against the Pipeline. See Pipeline. |
| notify | send_event, send_attention | In-app notifications to the operator. See Notify. |
The Maestro UI’s Skills and Tools catalogs read from the same registry, grouped differently.
Where skills live in the repo
skills/
├── sdk/
│   └── src/maestro_skills/
│       ├── __init__.py     # @skill, @operation
│       ├── registry.py     # Discovery from a directory of manifests
│       ├── secrets.py      # EnvSecretStore + DbSecretStore + ChainStore
│       └── transport.py    # Shared HTTP transport (auth, retry, redaction)
└── catalog/
    ├── http/
    │   ├── skill.toml
    │   └── http.py
    ├── web-research/
    │   ├── skill.toml
    │   └── tavily.py
    └── gmail/
        ├── skill.toml
        └── gmail.py
The runtime calls Registry.from_directory("skills/catalog") at boot. Every directory containing a skill.toml becomes a registered skill.
The manifest
# skill.toml
name = "gmail"
version = "0.1.0"
description = "Read, send, and label messages in a connected Gmail inbox."
icon = "mail"
class = "gmail.Gmail"
# Concurrency limit — the runtime caps simultaneous calls per skill.
# Use a low number for rate-limited APIs.
concurrency = 4
# Secrets the skill needs. Names are how skill code looks them up.
[[secrets]]
name = "google_oauth_client_id"
kind = "string"
description = "Google Cloud OAuth client ID"
[[secrets]]
name = "google_oauth_client_secret"
kind = "api_key"
description = "Google Cloud OAuth client secret"
[[secrets]]
name = "gmail_oauth"
kind = "oauth2"
description = "Per-user OAuth token bundle (created by the Connect Gmail flow)"
Manifests are the source of truth for what a skill needs. The Skills UI reads them to show secret status (green when configured, amber when missing) and to gate the Connect / Test buttons.
Writing an operation
from maestro_skills import skill, operation, Secrets
from pydantic import BaseModel, Field


class ListInboxIn(BaseModel):
    query: str = Field("", description="Gmail search syntax, e.g. 'is:unread newer_than:7d'")
    max_results: int = Field(25, ge=1, le=100)


class ThreadSummary(BaseModel):
    id: str
    subject: str
    sender: str
    snippet: str
    received_at: int  # epoch ms


@skill(name="gmail", version="0.1.0")
class Gmail:
    secrets: Secrets

    @operation(id="list_inbox", kind="tool", description="Recent threads matching `query`.")
    async def list_inbox(self, input: ListInboxIn) -> list[ThreadSummary]:
        token = await self.secrets.require_oauth("gmail_oauth")
        # ... call Gmail API, return results
What the SDK does with this:
- Generates JSON Schema from `ListInboxIn` and `ThreadSummary`. The agent gets the input schema in the Anthropic `tools` array; the output schema is published in the catalog.
- Resolves secrets: `require_oauth` returns the current OAuth bundle and triggers refresh-on-401 transparently.
- Records the call: the run timeline gets a step with `kind="skill_op"`, `tool="gmail"`, and the input/output payloads (with secrets redacted).
- Enforces concurrency: at most `concurrency` simultaneous calls to this skill, queueing the rest.
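To make the "records the call" step concrete, here is a minimal sketch of building a timeline step with secrets redacted. It is not the SDK's code: `redact`, `timeline_step`, the `input`/`output` field names, and the exact-value matching strategy are all assumptions for illustration. Only `kind="skill_op"` and `tool` come from the description above.

```python
from typing import Any

REDACTED = "[REDACTED]"


def redact(payload: Any, secret_values: set[str]) -> Any:
    """Recursively replace any known secret value found inside a payload."""
    if isinstance(payload, dict):
        return {key: redact(value, secret_values) for key, value in payload.items()}
    if isinstance(payload, list):
        return [redact(item, secret_values) for item in payload]
    if isinstance(payload, str):
        for secret in secret_values:
            if secret:  # skip empty strings, which would match everywhere
                payload = payload.replace(secret, REDACTED)
        return payload
    return payload  # numbers, booleans, None pass through unchanged


def timeline_step(tool: str, input_payload: dict, output_payload: Any, secret_values: set[str]) -> dict:
    """Build a step record; field names other than kind/tool are assumptions."""
    return {
        "kind": "skill_op",
        "tool": tool,
        "input": redact(input_payload, secret_values),
        "output": redact(output_payload, secret_values),
    }
```

Redacting by exact value catches a token wherever it appears in a payload; a production implementation might also redact by field name.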
Operation kinds
@operation(id="...", kind="tool") # Deterministic. Default.
@operation(id="...", kind="llm") # LLM-backed. Surfaces a different pill in the run timeline.
Use kind="llm" whenever the operation calls a language model. The annotation is how the UI distinguishes cost categories — accurate labeling makes run timelines easy to read at a glance.
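One way to picture how the kind annotation travels from code to the UI is a decorator that stores metadata on the function. The stand-in below is deliberately not the real `maestro_skills.operation`: the `__operation__` attribute, the validation, and the `timeline_pill` labels are all hypothetical.

```python
from typing import Callable, Literal

OperationKind = Literal["tool", "llm"]


def operation(id: str, kind: OperationKind = "tool", description: str = "") -> Callable:
    """Stand-in decorator that tags a function with operation metadata."""
    if kind not in ("tool", "llm"):
        raise ValueError(f"unknown operation kind: {kind!r}")

    def decorate(fn: Callable) -> Callable:
        # Hypothetical attribute name; the real SDK may store this elsewhere.
        fn.__operation__ = {"id": id, "kind": kind, "description": description}
        return fn

    return decorate


@operation(id="find_contact")  # kind defaults to "tool"
def find_contact(): ...


@operation(id="classify_reply_intent", kind="llm")
def classify_reply_intent(): ...


def timeline_pill(fn: Callable) -> str:
    """Hypothetical UI mapping from operation kind to a timeline label."""
    return {"tool": "Tool", "llm": "LLM"}[fn.__operation__["kind"]]
```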
Testing a skill
In development:
# Unit test the operation directly
pytest skills/catalog/gmail/tests/
# Test against the live API from the Maestro UI
# Skills → Gmail → Test → list_inbox
The Test modal in the UI lets you fill out the operation’s input form and inspect the response. The call goes through the real runtime, real secrets, real API — useful for confirming credentials are wired up before you point an agent at it.
Concurrency, rate limits, and retries
The shared HttpTransport in the SDK handles:
- Per-skill concurrency caps (from the manifest’s `concurrency` field).
- Exponential backoff on `429` and 5xx, capped at 60s.
- Auth strategies: `api_key` (header), `bearer` (header), `basic` (header), `oauth2` (header + refresh-on-401).
- Secret redaction in request/response logs.
- Per-host rate limits declared in the skill code (e.g. Apollo: 50 req/min).
Skills that use HttpTransport get all of this for free. Skills that talk to non-HTTP services (gRPC, S3, etc.) are responsible for their own equivalent.
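For skills that have to supply their own equivalent, the two core mechanisms can be sketched with the standard library. This is a minimal illustration, not HttpTransport's code: `backoff_delay` and `run_limited` are hypothetical names, the 1-second base delay is an assumption (only the 60s cap comes from the list above), and jitter is omitted for brevity.

```python
import asyncio


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based), doubling up to `cap`."""
    return min(cap, base * (2 ** attempt))


async def run_limited(jobs, limit: int) -> int:
    """Run coroutine factories with at most `limit` in flight; return the peak observed."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with semaphore:  # callers beyond the limit wait here
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await job()
            finally:
                in_flight -= 1

    await asyncio.gather(*(guarded(job) for job in jobs))
    return peak
```

With a manifest value of `concurrency = 4`, passing `limit=4` means a fifth call waits until one of the first four finishes.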
Discovery and reload
The registry is built at runtime startup. Adding or modifying a skill requires restarting the Python runtime; there is no hot-reload in v1. (The TS API and web app do hot-reload; only the runtime does not.) For development, uv run python -m maestro_runtime re-reads the catalog on every restart, so iteration is fast.
What goes in description fields
The description on the manifest, the operation, and the input fields is read by the LLM agent at runtime. It’s not just docs — it’s how the model decides whether to use this operation and what to pass to it.
Good descriptions:
- “Recent inbox threads matching `query`. Use Gmail search syntax: `is:unread newer_than:7d` for new replies, `from:[email protected]` for bounce messages.”
- “Apply or remove a Gmail label. Use `Maestro/replied` to mark threads handled by reply-triage; the label is created automatically if absent.”
Vague descriptions:
- “Lists inbox.”
- “Apply label.”
The model will try operations whose descriptions sound relevant. Be specific so it picks the right one.
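To see why the wording matters, it helps to look at what the model actually receives. The sketch below hand-writes one entry in the shape the Anthropic `tools` array expects (`name`, `description`, `input_schema`). The tool name `gmail_list_inbox` is hypothetical, and the schema is an approximation of what the SDK might generate from `ListInboxIn`, not its real output.

```python
import json

list_inbox_tool = {
    # Hypothetical tool name; how Maestro names operations for the model is not specified here.
    "name": "gmail_list_inbox",
    "description": (
        "Recent inbox threads matching `query`. Use Gmail search syntax: "
        "`is:unread newer_than:7d` for new replies."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "default": "",
                "description": "Gmail search syntax, e.g. 'is:unread newer_than:7d'",
            },
            "max_results": {"type": "integer", "default": 25, "minimum": 1, "maximum": 100},
        },
    },
}

# The descriptions above are the only guidance the model gets about this operation.
print(json.dumps(list_inbox_tool, indent=2))
```

Everything the model knows about when and how to call the operation comes from these strings, so a description like “Lists inbox.” leaves it guessing at the query syntax.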
Related
- Gmail — full skill walkthrough including OAuth.
- Web research — simpler example (just an API key).
- Skills and tools — the conceptual split.
- Secrets — how secrets are encrypted and resolved.