Operator Handbook
This is the portable, agent-facing version of the ductile operator skill
(skills/ductile/). If you are an AI agent that has been told to operate a
Ductile deployment but cannot load the skill manifest directly, this page
gives you the same substance.
The agent reads it. The human points at it.
Operating frame: the gateway is the supervisor
Ductile is built on Armstrong's supervisor model. The gateway:
- Isolates plugins via spawn-per-command — one plugin cannot corrupt another.
- Detects errors from the outside via exit code, stdout JSON, and stderr.
- Restarts without intervention via the queue (at-least-once delivery).
- Hot-upgrades config via `system reload` without dropping in-flight work.
As operator, you do not fight the supervisor; you use it.
Reload over debug-in-place
When a runtime looks wedged, the default move is reload, not poke. The gateway is designed to be restartable; debugging a stuck process while it holds the PID lock is harder, less informative, and risks corrupting the SQLite WAL.
ductile system reload # SIGHUP, in-process hot swap
# if that does not resolve:
ductile system status # confirm the new generation is alive
# if still wedged, restart the service supervisor (launchd / systemd /
# docker compose / whatever runs ductile on this host)
(For why this is the right discipline, see
reload_rca.md — the reload deadlock RCA is the
canonical example of why hot-swap must be deterministic.)
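The escalation ladder above can be sketched as one shell function. It uses only `ductile system reload` and `ductile system status` from this page. That `status` exits non-zero when the gateway is unhealthy is an assumption, and `reload_or_escalate` is a hypothetical helper name, not part of the CLI.

```shell
# Hypothetical helper: reload first, let `status` decide, and only then
# tell the caller to escalate to the service supervisor.
# Assumption: `ductile system status` exits non-zero when unhealthy.
reload_or_escalate() {
  ductile system reload || echo "warn: reload reported an error" >&2
  if ductile system status >/dev/null; then
    echo "ok: gateway healthy after reload"
    return 0
  fi
  echo "escalate: restart the service supervisor" >&2
  return 1
}
```

The function never restarts anything itself; which supervisor runs the gateway is host-specific, so the restart stays a deliberate operator step.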
Runtime context: your deployment
Ductile is not opinionated about how you deploy it. Wherever you run it, you'll have:
| What | Default | Typically |
|---|---|---|
| Binary | (built by you) | on `$PATH` or at a project-local path |
| Config dir | `~/.config/ductile/` | overridable with `--config <dir>` or `$DUCTILE_CONFIG_DIR` |
| State DB | `<config-dir>/ductile.db` | SQLite, WAL mode |
| API port | `127.0.0.1:8081` | from `service.api_port` in `config.yaml` |
| Service supervisor | none enforced | launchd, systemd, docker compose, supervisord |
| Auth token | none by default | from `tokens.yaml`, surfaced via env var of your choice |
Action for the operator setting this up. Build your own runtime-context table for the gateways you operate — instance name, binary path, config dir, DB path, port, service supervisor, auth token env var. Keep it next to your deployment docs, not in this handbook.
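When filling in that table, it helps to know which config dir and state DB a given shell will resolve to. A minimal sketch follows. The precedence (explicit `--config` value, then `$DUCTILE_CONFIG_DIR`, then the default) is an assumption this page does not state, and both function names are hypothetical.

```shell
# Hypothetical helper: print the config dir an invocation would use.
# Assumption: an explicit --config value wins over $DUCTILE_CONFIG_DIR,
# which wins over the documented default ~/.config/ductile/.
resolve_config_dir() {
  flag_value="${1:-}"
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"
  elif [ -n "${DUCTILE_CONFIG_DIR:-}" ]; then
    printf '%s\n' "$DUCTILE_CONFIG_DIR"
  else
    printf '%s\n' "$HOME/.config/ductile"
  fi
}

# The state DB lives at <config-dir>/ductile.db per the table above.
state_db_path() {
  printf '%s/ductile.db\n' "$(resolve_config_dir "${1:-}")"
}
```

Calling `state_db_path` with no argument reports the DB for the current environment; passing a directory mirrors what `--config <dir>` would select under the assumed precedence.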
CLI command reference
Pattern: ductile <noun> <action> [flags].
System
ductile system start # Start gateway (foreground)
ductile system status [--json] # Health: PID, state DB, plugins
ductile system reload # Hot-swap config in a running gateway (SIGHUP)
ductile system watch # Real-time TUI monitor
ductile system reset <plugin> # Reset circuit breaker
ductile system skills [--config <dir>] # Export LLM skill manifest (Markdown)
ductile system selfcheck [--json] # Read-only integrity invariants
ductile system backup --to <file.tar.gz> # Atomic snapshot (VACUUM INTO)
ductile system doctor # Startup and runtime health checks
Config
ductile config check [--json] [--strict] # Validate syntax, policy, integrity
ductile config lock # Authorize state (update .checksums)
ductile config show [entity] # Show resolved config or entity
ductile config get <path> # Dot-notation read
ductile config set <path>=<value> # Modify (use --dry-run to preview)
ductile config init # Initialize config directory
ductile config backup / restore # Archive / restore configuration
ductile config token / scope # Manage API tokens and scopes
ductile config plugin / route / webhook # Manage routing artefacts
Job
ductile job inspect <job_id> [--json] # Lineage, baggage, artifacts
ductile job logs [--json] # Query stored job logs
# Filters: --plugin --command --status --submitted-by
# --from --to (RFC3339) --query --limit --include-result
Plugin
ductile plugin list [--api-url URL] [--json] # Discover loaded plugins
ductile plugin run <name> # Manual execution
API (direct gateway calls)
ductile api /jobs
ductile api /plugin/echo/poll -f message="hello"
ductile api /pipeline/youtube-wisdom -f url="…"
ductile api /system/reload -X POST
ductile api /healthz
# Flags: -X METHOD, -f key=value, -H Header:val, -b BODY, --api-url, --api-key
Top-level
ductile skills # Export capability registry as LLM Markdown
ductile version # Version + commit + build time
Universal flags
| Flag | Purpose |
|---|---|
| `--json` | Machine-readable output (all read commands) |
| `-v`, `--verbose` | Internal logic, path resolution, baggage merges |
| `--dry-run` | Preview mutations without committing |
| `--config <dir>` | Override config directory |
The config lock ritual
Every config or plugin manifest edit goes through:
ductile config check # validate
ductile config lock # authorize new state (updates .checksums)
ductile system reload # apply without restart
This is the cross-skill ritual. Plugin authoring hands off here. Incident response often discovers a forgotten-to-lock root cause. Owning this ritual is owning the seam between authoring and operating.
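The three steps can be chained so that a failure at any point stops the ritual. This is a sketch, assuming each command exits non-zero on failure; `apply_config` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: stop at the first failing step, so an invalid
# config is never locked and an unlocked config is never reloaded.
# Assumption: each ductile command exits non-zero on failure.
apply_config() {
  ductile config check  || { echo "check failed; nothing locked" >&2; return 1; }
  ductile config lock   || { echo "lock failed; not reloading" >&2; return 1; }
  ductile system reload || { echo "reload failed; check system status" >&2; return 1; }
  echo "config applied"
}
```

Keeping the order fixed in one place makes the forgotten-to-lock failure mode harder to reach by hand.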
Config integrity (tiered)
| Tier | Files | On mismatch |
|---|---|---|
| High Security | `tokens.yaml`, `webhooks.yaml`, `scopes/*.json` | Hard fail (refuses to start) |
| Operational | `config.yaml`, `plugins/*.yaml`, `pipelines/*.yaml` | Warn & continue |
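To predict how an unlocked edit will behave before reloading, the tier table can be expressed as a small classifier. This is an illustrative sketch: both function names are hypothetical, paths are taken relative to the config dir, and the "untracked" label for unlisted files is this sketch's own, not a Ductile term.

```shell
# Hypothetical classifier: map a path (relative to the config dir) onto
# the tiers in the table above.
integrity_tier() {
  case "$1" in
    tokens.yaml|webhooks.yaml|scopes/*.json)      echo "high-security" ;;
    config.yaml|plugins/*.yaml|pipelines/*.yaml)  echo "operational" ;;
    *)                                            echo "untracked" ;;
  esac
}

# Expected gateway reaction to a checksum mismatch in that tier.
on_mismatch() {
  case "$(integrity_tier "$1")" in
    high-security) echo "hard fail (refuses to start)" ;;
    operational)   echo "warn and continue" ;;
    *)             echo "unknown" ;;
  esac
}
```

For example, `on_mismatch tokens.yaml` reports a hard fail, which is why token edits in particular should always be followed by `ductile config lock`.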
Entity addressing
Use <type>:<name> syntax with config show/get/set:
ductile config show plugin:withings
ductile config show pipeline:video-wisdom
ductile config set plugin:withings.enabled=false
ductile config show plugin:* # list all plugins
Selfcheck: six read-only invariants
1. `config_discovery`: config dir resolves
2. `config_load`: config parses
3. `pid_lock`: PID file matches a running process
4. `db_integrity`: `PRAGMA integrity_check`
5. `db_schema`: required tables/columns/indexes match embedded baseline
6. `queue_terminal_freshness`: no stale terminal-state `job_queue` rows past retention
WAL safety: when the gateway holds the PID lock, checks 4-6 are
skipped with detail: "skipped: active gateway holds PID lock — quiesce
before selfcheck". The skip is correct behaviour, not a bug.
Real-green pattern: run selfcheck offline against the new binary BEFORE installing. Once installed and running, expect "skipped" on 4-6 — the proof of correctness is that the gateway started at all, because the schema validator runs at startup and refuses to open the DB on mismatch.
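The skip rule can be written down as a lookup, which is handy when reading selfcheck output from a live host. This sketch only restates the prose above; `selfcheck_expectation` is a hypothetical helper and does not call the gateway.

```shell
# Hypothetical lookup: what to expect from each invariant depending on
# whether a running gateway holds the PID lock ("yes" or "no").
selfcheck_expectation() {
  check="$1"
  gateway_running="$2"
  case "$check" in
    config_discovery|config_load|pid_lock)
      echo "runs" ;;
    db_integrity|db_schema|queue_terminal_freshness)
      # Checks 4-6 touch the DB and are skipped under an active gateway.
      if [ "$gateway_running" = "yes" ]; then echo "skipped"; else echo "runs"; fi ;;
    *)
      echo "unknown check: $check" >&2
      return 2 ;;
  esac
}
```

A "skipped" result for `db_schema` on a running host therefore matches expectations, while the same result offline would be worth investigating.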
Backup: atomic point-in-time snapshot
Scopes (nested ladder; each adds to the previous):
- `db`: DB snapshot only (SQLite `VACUUM INTO`, safe under concurrent writers)
- `config` (default): `db` + ductile config dir
- `plugins`: `config` + every dir under `plugin_roots`
- `all`: `plugins` + every file under `environment_vars.include`
Each archive embeds BACKUP_MANIFEST.txt with version, commit, hostname,
source paths, SHA256 of source DB, included/excluded items + reasons.
Refuses to overwrite an existing --to destination.
Inspect a manifest without re-extracting:
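One way to do this with standard `tar` is to stream the single file to stdout. The sketch assumes the archive is gzip-compressed (as the `.tar.gz` name suggests) and that `BACKUP_MANIFEST.txt` sits at the archive root; neither is confirmed by this page, and `show_backup_manifest` is a hypothetical helper name.

```shell
# Hypothetical helper: print the embedded manifest without unpacking
# anything to disk.
# Assumption: BACKUP_MANIFEST.txt is stored at the root of the archive.
show_backup_manifest() {
  archive="$1"
  # -x extract, -z gunzip, -O write to stdout instead of disk, -f archive
  tar -xzOf "$archive" BACKUP_MANIFEST.txt
}
```

If the manifest lives under a subdirectory in your archives, list the members first with `tar -tzf <file.tar.gz>` and pass the full member path instead.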
Migrations & schema
internal/storage/schema.sql is embedded in the binary; the schema
validator runs at startup and refuses to open a DB missing any required
table, column, or index. Schema changes ship as Python scripts at
scripts/migrate-*.py, idempotent by design, run with the service
quiesced.
Always backup before migration:
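A guard like the following makes the ordering hard to get wrong. It is a sketch under stated assumptions: the service is already quiesced, the migration script needs no required arguments (check the script itself), and `backup_then_migrate` is a hypothetical wrapper name. Only `ductile system backup --to` comes from the CLI reference above.

```shell
# Hypothetical wrapper: never run a migration unless the backup succeeded.
# Assumptions: the service is already quiesced, and the migration script
# takes no required arguments.
backup_then_migrate() {
  script="$1"    # one of scripts/migrate-*.py
  archive="$2"   # must not exist yet; backup refuses to overwrite
  ductile system backup --to "$archive" || {
    echo "backup failed; migration not run" >&2
    return 1
  }
  python3 "$script"
}
```

Because the backup refuses to overwrite an existing destination, reusing an old archive name fails safely before the migration starts.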
LLM capability discovery (system skills)
Ductile is designed for LLM operation. Get the current live manifest:
ductile system skills --config <your-config-dir>
# or set DUCTILE_CONFIG_DIR and run: ductile system skills
Outputs Markdown listing all plugin commands with endpoints, schemas, and
semantic anchors (mutates_state, idempotent, retry_safe) plus all
configured pipelines. See
DUCTILE_SKILLS_SCHEMA_V1.md for the
contract that output obeys.
Common workflows
Trigger a pipeline via API
curl -X POST http://<host>:<port>/pipeline/<name> \
-H "Authorization: Bearer $DUCTILE_TOKEN" \
-H "Content-Type: application/json" \
-d '{"payload": {"key": "value"}}'
Trigger a plugin directly (bypasses routing)
curl -X POST http://<host>:<port>/plugin/<name>/poll \
-H "Authorization: Bearer $DUCTILE_TOKEN" \
-d '{"payload": {}}'
Inspect a failed job (routine; no incident analysis needed)
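A minimal sketch of this workflow, built only from the `job logs` filters and `job inspect` listed in the CLI reference. That `--status` accepts the value `failed` is inferred from the status names on this page, the limit of 5 is arbitrary, and both function names are hypothetical.

```shell
# Hypothetical helpers for routine inspection.
# Assumption: --status accepts the status names used on this page.
recent_failures() {
  # List up to N failed jobs (default 5) as JSON.
  ductile job logs --status failed --limit "${1:-5}" --json
}

inspect_job() {
  # Show lineage, baggage, and artifacts for one job ID.
  ductile job inspect "$1" --json
}
```

Typical use is `recent_failures` to find a job ID, then `inspect_job <job_id>` to read its lineage.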
Check gateway health
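Two commands from the CLI reference cover this: `ductile system status` for the local view (PID, state DB, plugins) and `ductile api /healthz` for the API view. The sketch below combines them, assuming both exit non-zero on failure; `gateway_health` is a hypothetical helper name.

```shell
# Hypothetical helper: check the local process view first, then the API.
# Assumption: both commands exit non-zero when the check fails.
gateway_health() {
  ductile system status --json || {
    echo "local status check failed" >&2
    return 1
  }
  ductile api /healthz
}
```

Running the local check first separates "the process is gone" from "the process is up but the API is not answering".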
Architecture summary (operator view)
- Governance hybrid: control plane is SQLite `event_context` baggage; filesystem state is plugin-managed. The core does not provision per-job workspaces.
- Spawn-per-command: each plugin invocation is a fresh process (polyglot: bash, python, go, any executable).
- At-least-once: jobs survive crashes and are recovered on restart.
- Immutable audit: `origin_*` baggage keys can never be overwritten by plugins.
Job statuses
queued → running → succeeded / failed / timed_out / dead
If you see dead or persistent failed, treat as an incident: hand off
to root-cause analysis (the ductile-rca skill) rather than continuing
routine operation.
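The hand-off rule can be sketched as a triage function. The status names come from the line above; everything else is an assumption made for illustration: "persistent" is taken to mean three or more consecutive failures, a single `failed` or `timed_out` is treated as routine inspection, and `triage_status` is a hypothetical name.

```shell
# Hypothetical triage: map a job status to the next operator action.
# Assumption: "persistent failed" means >= 3 consecutive failures.
triage_status() {
  status="$1"
  consecutive_failures="${2:-1}"
  case "$status" in
    queued|running) echo "in-flight" ;;
    succeeded)      echo "ok" ;;
    dead)           echo "incident" ;;
    failed)
      if [ "$consecutive_failures" -ge 3 ]; then
        echo "incident"
      else
        echo "inspect"
      fi ;;
    timed_out)      echo "inspect" ;;
    *)
      echo "unknown status: $status" >&2
      return 2 ;;
  esac
}
```

An "incident" result is the cue to stop routine operation and load the `ductile-rca` skill; adjust the threshold to whatever your deployment considers persistent.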
When to load other skills
| Companion skill | When to load it |
|---|---|
| `ductile-plugin-developer` | The work requires touching a plugin's code, manifest, or pipeline composition, not just its config. |
| `ductile-rca` | Symptoms are not understood. Stuck, hanging, tripped, missing, wrong. Routine job inspect for a known-good system does not need RCA. |
| `surface-contract` | Docs and code have drifted; you need to audit and re-align them. |
A full Ductile incident lifecycle can span three of these: ductile-rca to find the root cause, this handbook to operate the gateway, and ductile-plugin-developer when the fix touches plugin code.
Reference docs
In the same docs site:
- Architecture: the technical deep dive
- Deployment: host-local deployment + backup patterns
- Operator Guide: day-to-day commands with examples
- Health Check: invariants checked by `selfcheck`
- SQL Tightening Log: schema-change audit trail
- Reload RCA: canonical worked example of the reload deadlock RCA pattern
In the repo:
- `AGENTS.md`: the contributor contract; the design grounding behind these commands
- `CONSTITUTION.md`: the five pillars; this handbook is Pillar 1 (Run)