# Ductile

> An automation runtime designed to be operated by AI agents.

Ductile is a self-hosted automation runtime designed to be operated by AI agents. You describe a goal in natural language; your AI agent authors a YAML pipeline, runs it, watches the logs, and iterates until your goal is met. A single Go binary orchestrates polyglot plugins via a JSON-over-stdin/stdout subprocess protocol; every surface (NOUN ACTION CLI, REST API with OpenAPI discovery, queryable execution ledger) is shaped so an LLM can drive it. Sized for personal-scale automation (50–500 jobs/day).

# Introduction

# Getting Started with Ductile

Welcome to **Ductile**, an automation runtime AI agents can run, debug, and build for — and humans can audit. This guide will help you get up and running in minutes.

See [`CONSTITUTION.md`](https://ductile.run/CONSTITUTION.md) for why the system is shaped this way.

______________________________________________________________________

## 1. Installation

Ductile is written in Go and requires Go **1.25.4** or newer.

1. **Clone the repository:**

   ```bash
   git clone https://github.com/mattjoyce/ductile.git
   cd ductile
   ```

1. **Build the gateway:**

   ```bash
   go build -o ductile ./cmd/ductile
   ```

   This creates a single executable named `ductile` in your project root.

______________________________________________________________________

## 2. Basic Usage (The Echo Showcase)

After building the binary, you can run the included `echo` plugin to verify the system.

### Step 1: Verify Plugin Discovery

Ductile discovers plugins from `plugin_roots`. For this repo, the local `plugins/` directory includes `echo`:

```bash
ls -F plugins/echo/manifest.yaml
```

### Step 2: Configure the Plugin

Ductile uses a directory-based config layout (typically `~/.config/ductile/`). This repo ships example files in `config/` — copy that folder to your config dir and edit.
```bash
cp -R ./config ~/.config/ductile
```

```yaml
# ~/.config/ductile/config.yaml excerpt
plugin_roots:
  - "~/.config/ductile/plugins"
  - "./plugins"
include:
  - api.yaml
  - plugins.yaml
  - pipelines.yaml
  - webhooks.yaml
```

```yaml
# ~/.config/ductile/plugins.yaml excerpt
plugins:
  echo:
    enabled: true
    schedules:
      - id: default
        every: 5m
        jitter: 30s
    config:
      message: "Hello from Ductile!"
```

### Step 2b: Add an External Plugin Root (Optional)

You can mount additional plugin volumes and add them to `plugin_roots` in priority order:

```yaml
plugin_roots:
  - "~/.config/ductile/plugins"
  - "./plugins"
  - "/opt/ductile/plugins-private"
```

Container example:

```bash
docker run --rm \
  -v "$PWD/config:/config" \
  -v "$PWD/plugins:/app/plugins" \
  -v "/srv/ductile-private-plugins:/opt/ductile/plugins-private:ro" \
  ductile:latest ./ductile system start --config-dir /config
```

### Step 3: Start the Gateway

Run the service in the foreground (defaults to `~/.config/ductile`):

```bash
./ductile system start
```

Or explicitly point to a config directory:

```bash
./ductile system start --config-dir ~/.config/ductile
```

You will see logs indicating the scheduler has started. After 5 minutes (or however you configured it), you'll see the echo job execute and complete.

### Step 4: Graceful Shutdown

Press `Ctrl+C` to stop the gateway. It will wait for any in-flight jobs to finish before releasing the process lock.

______________________________________________________________________

## 3. CLI Principles

Ductile is designed to be operated by both humans and LLMs. All commands follow a strict **NOUN ACTION** hierarchy:

- **Hierarchy:** `ductile job inspect`, `ductile config lock`, `ductile system status`.
- **Verbosity:** Use `-v` or `--verbose` for detailed logic traces.
- **Safety:** Use `--dry-run` for any mutation to preview changes.
- **Machine-Readability:** Use `--json` to get structured data for scripts or agents.
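
The principles above can be sketched as a small wrapper an agent might use to call the CLI. This is an illustration only: the helper names `ductile_argv` and `run_ductile` are hypothetical, placing the flags after the NOUN ACTION pair is an assumption, and nothing is assumed about the JSON that comes back beyond it being valid JSON. The `./ductile` path, the `system status` command, and the `--json` and `--dry-run` flags are taken from this guide.

```python
import json
import subprocess


def ductile_argv(noun, action, *args, dry_run=False, binary="./ductile"):
    """Build a NOUN ACTION invocation that asks for structured output."""
    argv = [binary, noun, action, *args, "--json"]
    if dry_run:
        argv.append("--dry-run")  # preview a mutation instead of applying it
    return argv


def run_ductile(noun, action, *args, dry_run=False):
    """Run the command and parse stdout as JSON (its shape is not assumed)."""
    done = subprocess.run(
        ductile_argv(noun, action, *args, dry_run=dry_run),
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit code raises CalledProcessError
    )
    return json.loads(done.stdout)


if __name__ == "__main__":
    # Only prints the command line; it does not need a built binary.
    print(ductile_argv("system", "status"))
```

Keeping argument construction separate from execution lets an agent log or review the exact command before it runs.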
______________________________________________________________________

## Next Steps

- **Operators:** Read the [Operator Guide](https://ductile.run/OPERATOR_GUIDE/index.md) to learn about monitoring and system maintenance.
- **Developers:** Visit the [Plugin Development Guide](https://ductile.run/PLUGIN_DEVELOPMENT/index.md) to start building your own skills.
- **Architects:** Deep dive into the [Architecture](https://ductile.run/ARCHITECTURE/index.md) and [Pipelines](https://ductile.run/PIPELINES/index.md) model.

# The 8 Idioms of Ductile

These are the design rules. Not preferences — the discipline that keeps Ductile small enough to reason about and predictable enough for an agent to drive. When a change feels off, it usually violates one of these; when proposing a new plugin, pipeline DSL feature, or core surface, name the idioms it serves.

The 8 are arranged: **foundation → flow → discipline → audience.** Read them in order; later idioms presuppose earlier ones.

For why Ductile exists at all, see [`../CONSTITUTION.md`](https://ductile.run/CONSTITUTION.md). For the contributor mechanics, see [`../AGENTS.md`](https://ductile.run/AGENTS.md). This document is the authoring contract those two reference.

______________________________________________________________________

## Foundation — what the system is

### 1. Every unit of work is a queued job.

There is no other path. Webhooks, schedules, API calls, plugin output routes, even `ductile run ` invocations from the CLI — all enqueue into the single SQLite-backed FIFO and dispatch from there.

**Why.** A single source of truth makes execution deterministic, retryable, and observable by construction. Side channels (a "quick" direct call that bypasses the queue) destroy all three properties at once: you lose the retry budget, the trace, and the ledger entry that RCA depends on.

**How it shows up.** `internal/queue/queue.go` is the only writer to `job_queue`.
Every producer — scheduler, webhook receiver, router, CLI — funnels through `Enqueue`. No code path serves a plugin's output without queueing first.

### 2. Spawn-per-command. No daemon plugins.

For every command: fork the plugin's entrypoint, write JSON to its stdin, read JSON from its stdout, kill the process. Lifetime is one invocation.

**Why.** This single decision buys language agnosticism, crash isolation, and the absence of shared mutable plugin state in one move. There are no zombie processes to manage, no plugin memory leaks to chase, no protocol versions to negotiate over a long-lived connection. A plugin that breaks breaks one job; the supervisor stays up.

**How it shows up.** `internal/dispatch/dispatcher.go` calls `spawnPlugin` with a fresh subprocess per invocation. Timeouts are SIGTERM → 5s grace → SIGKILL. The wire protocol envelope is the entire contract between core and plugin.

### 3. Core owns orchestration; plugins own side-effects.

Routing, fan-out (`split:`), conditional branching (`if:`), payload remapping (`with:`) — all live in YAML pipelines, evaluated by the core. Plugins do domain work and emit facts. Plugins do not decide what runs next.

**Why.** Orchestration must be inspectable: a human reads the pipeline file in `git diff`, an agent reads it via the API, and both see the same flow. If orchestration lives inside plugins, neither can audit it without reading source in three languages. Side-effects are necessarily messy and plugin-local; orchestration is necessarily structural and shared.

**How it shows up.** Authored `if:` predicates compile into an internal `core.switch` hop in the dispatcher. Plugin code does not branch on payload to decide downstream routing; that decision is in YAML. The legacy `plugins/switch/` reference plugin remains for compatibility but its own manifest names it "Legacy payload classifier. Prefer pipeline `if:` conditions for new authoring."
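
To make idiom 2 concrete, here is a minimal sketch of the spawn-per-command lifecycle, written in Python for brevity. It is not the dispatcher's code (that is Go, in `internal/dispatch/dispatcher.go`): the function name `run_once`, the request and response fields, and the error handling are all assumptions made for illustration. Only the shape comes from the idiom: one process per invocation, JSON in on stdin, JSON out on stdout, and SIGTERM followed by a 5s grace period and then SIGKILL on timeout.

```python
import json
import subprocess
import sys

GRACE_SECONDS = 5  # grace period between SIGTERM and SIGKILL, per idiom 2


def run_once(argv, request, timeout):
    """Spawn argv, send one JSON request on stdin, return one JSON response."""
    proc = subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    try:
        out, _ = proc.communicate(json.dumps(request), timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX
        try:
            proc.communicate(timeout=GRACE_SECONDS)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX
            proc.communicate()  # reap the child so no zombie is left behind
        raise TimeoutError(f"plugin exceeded {timeout}s") from None
    if proc.returncode != 0:
        raise RuntimeError(f"plugin exited with status {proc.returncode}")
    return json.loads(out)


if __name__ == "__main__":
    # Stand-in plugin: echoes the payload back. Field names are hypothetical.
    echo = [
        sys.executable,
        "-c",
        "import json, sys; r = json.load(sys.stdin); "
        "json.dump({'status': 'ok', 'echo': r['payload']}, sys.stdout)",
    ]
    print(run_once(echo, {"command": "poll", "payload": {"message": "hi"}}, 10))
```

Because the child lives for exactly one call, a crash or hang costs one job and leaves nothing behind for the caller to clean up beyond reaping the process.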
______________________________________________________________________

## Flow — how data moves

### 4. Events are the contract; payloads are the currency.

Event types are stable, named, and routed. Payloads carry the data, shaped by the producing plugin's declared `fact_outputs`. Renaming an event is a breaking change. Adding a new event is not.

**Why.** Stable contracts let plugins evolve independently. A consumer that subscribes to `github.webhook.pull_request` should not care that the underlying GitHub plugin is now in version 0.7.x. The event type is the join key; the payload shape is documented in the manifest.

**How it shows up.** The router (`internal/router/`) matches on event type only — exact match, no wildcards. Payload validation against `fact_outputs` is the producer's responsibility at emit time.

### 5. Value, state, and identity are kept separate.

`config` is a value (static, env-interpolated, immutable for the run). `plugin_facts` is an append-only series of values (the durable record of what a plugin observed). `plugin_state` is a derived view (a compatibility/cache projection of the latest fact). `pipeline` is an identity (a stable name for a series of executions; its runs are values). `job` is an identity (a queued unit; its status is state).

**Why.** Conflating these is the code path that breaks under retry and crash. The classic mistake is "let's just update the row" — but the question is whether the row is a value (you can't update; you append a new one) or state (you can, with care). The plugin_facts vs plugin_state split is this idiom made concrete: facts are values you append; state is the latest-value view rebuilt from them.

**How it shows up.** `internal/state/` separates the two storage shapes. The manifest's `compatibility_view` declaration tells the core how to project facts into a state view for backward compatibility. Pipelines have stable names (identity); pipeline runs are immutable records (values).
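
The facts-versus-state split in idiom 5 can be illustrated with a small SQLite sketch. This is not Ductile's schema: the column layout and the helper names `append_fact` and `current_state` are assumptions, and only the names `plugin_facts`, `fact_type`, and `seq` come from the documentation. The point it shows is that facts are only ever inserted, and the state view is whatever the latest fact says.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE plugin_facts ("
    " seq INTEGER PRIMARY KEY AUTOINCREMENT,"  # core-owned, monotonic
    " plugin TEXT NOT NULL,"
    " fact_type TEXT NOT NULL,"
    " body TEXT NOT NULL)"
)


def append_fact(plugin, fact_type, body):
    """Facts are values: insert a new row, never update an old one."""
    cur = db.execute(
        "INSERT INTO plugin_facts (plugin, fact_type, body) VALUES (?, ?, ?)",
        (plugin, fact_type, json.dumps(body)),
    )
    db.commit()
    return cur.lastrowid


def current_state(plugin):
    """State is derived: the latest fact, rebuilt on demand."""
    row = db.execute(
        "SELECT body FROM plugin_facts WHERE plugin = ? ORDER BY seq DESC LIMIT 1",
        (plugin,),
    ).fetchone()
    return json.loads(row[0]) if row else None


append_fact("echo", "echo.snapshot", {"last_run": "2026-02-08T10:00:00Z"})
append_fact("echo", "echo.snapshot", {"last_run": "2026-02-08T10:05:00Z"})
print(current_state("echo"))
```

A retried job that appends the same observation twice adds a row but cannot corrupt an earlier one, which is why the append-only shape holds up under at-least-once delivery.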
______________________________________________________________________

## Discipline — how to extend it

### 6. Idempotent by design.

At-least-once delivery is the contract. Every command must be safe to repeat. Plugins that need uniqueness use a `dedupe_key` on the event, not an "exactly-once" myth.

**Why.** Retries are guaranteed by the architecture: the queue replays crashed jobs, the scheduler can fire twice if the clock jitters, the webhook receiver may see the same delivery twice if the sender retries. A plugin that corrupts state on the second call is a plugin that will corrupt state in production. There is no path to "exactly once" — only "idempotent + retried."

**How it shows up.** The router carries `dedupe_key` through the event envelope; the queue deduplicates pending jobs by `(plugin, dedupe_key)` within the active window. Plugins that mutate external state are expected to either accept duplicate calls cleanly or use the dedupe key themselves.

### 7. Composable over configurable.

Many small plugins chained with `with:` remap > one plugin with twenty flags. Many short pipelines > one long pipeline with eighteen conditionals. Configuration surfaces grow forever; composition surfaces grow only where there's a real new shape.

**Why.** "Simple is the goal, not easy." A plugin with a giant option matrix looks easy ("you can do anything!") but is hard to reason about, test, and operate. A small plugin with one job composes with others to do the same work, but each piece is independently verifiable.

**How it shows up.** The pipeline `with:` step was added specifically to avoid the proliferation of one-off plugin aliases that differ only in how they relabel their input. Step-level remapping does the relabeling; the underlying plugin stays focused.

______________________________________________________________________

## Audience — who operates it

### 8. Every surface is agent-drivable.
NOUN ACTION CLI, OpenAPI on every endpoint, structured JSON I/O, queryable execution ledger, exit codes that mean something. No path through Ductile requires reading source or clicking a canvas. The agent is the primary operator; the human is the auditor.

**Why.** This is the alignment paragraph in the Constitution made practical. Observability is not a feature here — it's the substrate this idiom rests on. A surface that an agent cannot drive blind (CLI without machine-readable output, an endpoint without OpenAPI, a config knob without schema) violates the alignment.

**How it shows up.**

- CLI: `ductile `; `--output json` on every read command.
- API: every endpoint listed in `/openapi.json`; `/skills` registry enumerates pipelines as discoverable tools.
- Diagnostics: `GET /system/doctor`, `GET /system/selfcheck`, `GET /stopwatch/{plugin}` (p50/p95/p99), `GET /topology` (plugin-signal-plugin graph).
- Ledger: every job, every step, every plugin invocation persisted in SQLite; queryable directly or via `ductile inspect`.

If you add a feature whose only interface is a curl command an agent has to construct from documentation, you've violated this idiom. Add the OpenAPI entry, the CLI verb, the structured exit code.

______________________________________________________________________

## What's not in the 8 (deliberately)

These appear in older docs or look idiom-shaped but are not rules:

- **"Ductile is upstream and downstream."** A capability claim, not a design rule. Lives in the capabilities list, not here.
- **"Switch decides; plugins implement."** Superseded by idiom 3 (core owns orchestration). The Switch plugin is legacy; `if:` predicates are the authoring concept.
- **"Observability is a feature."** Subsumed by idiom 8 — observability isn't an *added* feature, it's how the agent-drivable surfaces work at all.
- **"Workflow logic belongs in the plugin."** The pre-Sprint-6 version of idiom 3. Inverted by current architecture; left here as a warning.
______________________________________________________________________

## What this list is for

Two readers:

**A pipeline or plugin author** (human or agent) uses these as a checklist. Does this connector hold state across invocations? (Violates 2.) Does this YAML stash branching logic inside the plugin's config? (Violates 3.) Does this command have a `--quiet` flag but no JSON output mode? (Violates 8.)

**A reviewer of changes to the core** uses these as the bar. A PR that adds a long-lived plugin connection (violates 2), or a route-by-string-matching feature (violates 4), or a new endpoint without an OpenAPI schema (violates 8), is a PR that needs to defend the deviation, not land quietly.

When in doubt, check against the [Constitution](https://ductile.run/CONSTITUTION.md) pillar the change is supposed to serve. An idiom that doesn't fit any pillar is probably wrong; a pillar a change doesn't serve is probably the wrong pillar.

# Ductile Glossary

Key terms used throughout Ductile's documentation and configuration.

______________________________________________________________________

## Gateway

The `ductile` binary — the central runtime that manages plugins, schedules work, routes events, and maintains the execution ledger.

## Plugin / Connector

A polyglot adapter that connects Ductile to an external system (an API, a database, a shell command). Written in any language; communicates via JSON over stdin/stdout.

- **Plugin:** The code and manifest (the implementation).
- **Connector:** The logical integration point (the "skill").

## Alias (Plugin Instance)

A uniquely named and configured instance of a base plugin. Defined in `plugins.yaml` using the `uses:` field. Allows running multiple copies of the same logic (e.g., `discord_alerts` vs `discord_logs`) with different settings.

## Command

A discrete operation provided by a plugin. Common commands include:

- **`poll`** — proactive; Ductile calls the plugin on a schedule to pull data.
- **`handle`** — reactive; the plugin processes an incoming event.
- **`health`** — diagnostic; verifies the plugin's prerequisites are met.
- **`init`** — one-time setup; runs when a plugin is first registered.

## Pipeline

A high-level workflow orchestration defined in YAML. Pipelines react to a single trigger event and execute a sequence of plugin steps, automatically passing data between them.

## Event Bus

The internal routing layer that decouples producers (schedules, webhooks, API) from consumers (pipelines, plugins). It ensures events are distributed to all matching routes.

## Event

A typed packet of data (e.g., `youtube.playlist_item`) that signals an occurrence and triggers routing within the gateway.

## Payload

The JSON object attached to an event. Payload fields are passed to downstream plugins when the event is routed.

## Context (Baggage)

Immutable metadata (e.g., `origin_user_id`, `trace_id`) that persists across every hop of a multi-step pipeline once a step claims it with `baggage`. Carried in the `event_context` ledger and merged into downstream requests.

## Worker Pool (Max Workers)

The global set of execution slots that process jobs in parallel. Controlled by `service.max_workers` (defaults to `max(1, CPU-1)`). Operators can force whole-system serial dispatch by setting it to `1`.

## Parallelism

The maximum number of concurrent jobs allowed for a specific plugin or alias. Prevents a single resource-heavy plugin from saturating the worker pool.

## Concurrency Safe

A boolean hint in a plugin's `manifest.yaml`. Omitted means `true`. If set to `false`, the plugin author is declaring that same-plugin concurrent execution is unsafe unless an operator deliberately constrains or overrides plugin parallelism.

## Smart Dequeue

The logic that skips jobs in the queue if their target plugin has already reached its parallelism limit, allowing other plugins to proceed.

## Result

The human-readable summary or data returned by a plugin in its protocol response.
Often used as the input for the next step in a pipeline.

## Plugin Facts

The append-only record of durable plugin observations. Each row carries a stable snapshot a plugin emitted as `state_updates`, plus a manifest-declared `fact_type` and a Ductile-owned monotonic `seq`. This is the durable record of what a plugin remembers across runs. See [PLUGIN_FACTS.md](https://ductile.run/PLUGIN_FACTS/index.md).

## Plugin State (Compatibility View)

A single JSON row per plugin maintained as the compatibility/cache view of the latest fact. Existing readers (and legacy plugins that have not yet declared `fact_outputs`) see the same shape they always have. The view is rebuilt automatically by core when a new fact lands. New plugins should declare `fact_outputs` rather than treating this row as the place where durable truth lives.

## Job

The atomic unit of work in Ductile. Every command invocation creates an immutable Job record capturing input, output, logs, and status.

## Queue

The persistent, SQLite-backed job queue. All triggers (scheduler, router, API, webhooks) submit jobs here for the worker pool to pick up.

## Schedule

A configuration entry that tells the scheduler when and how to run a plugin command (e.g., `every: 5m`, `cron: "0 * * * *"`).

## Jitter

A random offset applied to schedules to prevent multiple jobs from triggering at the exact same millisecond (the "thundering herd" problem).

## Dedupe Key

A unique string used to suppress duplicate enqueues. If a job with the same key is already queued or recently succeeded, the new enqueue is ignored.

## Circuit Breaker

An automated safety switch that "opens" after repeated plugin failures, temporarily blocking scheduled runs to allow the system or external API to recover.

## Webhook

An HMAC-verified HTTP endpoint that accepts external events and injects them into the Ductile event bus.
## Skill

A machine-readable description of a capability (either an atomic plugin command or an orchestrated pipeline), exported via `/skills` or `/openapi.json`.

## Workspace (historical)

Formerly: a per-job, hard-link-cloned directory the core provisioned for each plugin invocation. Removed; the core no longer touches the filesystem on a job's behalf. Plugins that need a scratch path manage it themselves.

## Execution Ledger

The persistent history of all jobs, pipeline steps, and event transitions. Used for the TUI "Overwatch" and audit logging.

# Audiences

Three orthogonal axes define who reads Ductile's documentation and uses its surfaces. Eight cells. Each cell is a real reader, and every documentation or software affordance can be evaluated against the cells it serves.

This document is **taxonomy-neutral**: it describes who the readers are, not how `docs/` is organised today. The doc layout should serve these audiences; when it stops doing so, the layout changes — these definitions do not.

______________________________________________________________________

## Axes

| Axis                 | Distinction                                      | What it controls                                                                    |
| -------------------- | ------------------------------------------------ | ----------------------------------------------------------------------------------- |
| **Agent ↔ Human**    | Who is reading?                                  | *Form*: machine-actionable schemas/skills vs narrative prose.                       |
| **Coder ↔ Operator** | Are they changing Ductile, or running it?        | *Domain*: `internal/`-facing code surfaces vs `~/.config/`-facing runtime surfaces. |
| **Learner ↔ Expert** | Forming a mental model, or looking something up? | *Density*: tutorial + one example vs reference + invariants.                        |

The axes are independent. A persona is *not* a stereotype; it is the intersection of three deliberate choices.
______________________________________________________________________

## The eight cells

| # | Persona | Axes | Landing surface today | Coverage |
| --- | --- | --- | --- | --- |
| 1 | New contributor / first-time plugin author | H · C · L | `README.md`, `docs/GETTING_STARTED.md`, `docs/PLUGIN_DEVELOPMENT.md` | **partial** |
| 2 | Maintainer / experienced plugin author | H · C · E | `AGENTS.md`, `docs/ARCHITECTURE.md`, `docs/PIPELINES.md` | **served** |
| 3 | Evaluator / first-time installer | H · O · L | `README.md`, `docs/GETTING_STARTED.md`, `docs/MACOS_INSTALLATION.md` | **partial** |
| 4 | Veteran operator running Ductile in production | H · O · E | `docs/OPERATOR_GUIDE.md`, `docs/DEPLOYMENT.md`, `docs/DATABASE.md` | **partial** |
| 5 | Cold-start agent generating a plugin | A · C · L | `schemas/`, `skills/ductile/`, `plugins/echo/` | **partial** |
| 6 | In-repo coding agent | A · C · E | `AGENTS.md`, `internal/docs/lint_test.go` | **served** |
| 7 | Agent doing first-time setup or config generation | A · O · L | none canonical | **gap** |
| 8 | Agent operating a live Ductile instance | A · O · E | `/skills` registry, OpenAPI | **partial** |

**Status legend:** *served* — surface exists and works for this cell. *partial* — surface exists but is incomplete, scattered, or known-stale. *gap* — net-new content or feature is needed. *deferred* — work is parked with explicit scope (see Coverage below).

______________________________________________________________________

## One-paragraph stories

**1. H · C · L — first-time contributor.** Cloned the repo to add a plugin for service X. Wants a 30-minute path from clone to working plugin on their machine, learning idioms by doing rather than by reading 800-line specs.
Needs: a "start here" page ≤ 5 minutes of reading, one runnable teaching plugin, vocabulary table cross-linked so terms acquire meaning during the walkthrough.

**2. H · C · E — maintainer.** Changing the queue, router, or protocol. Wants design decisions, invariants, and trade-offs in one click so they don't break a constraint they didn't know existed. Needs: authoritative architecture doc (today), per-package design notes for load-bearing packages (today: gap), enumerated non-negotiable constraints (today: `AGENTS.md §3d`).

**3. H · O · L — evaluator.** Just heard about Ductile. Wants a five-minute path from install to seeing a real event flow, *before* reading documentation, so they can decide whether Ductile fits. Needs: a working minimal example, generated config (not copy-paste), one screenshot or asciinema of a live event flow (the standalone `ductile-watch` TUI is under redesign; see `ductile-hickey-tui-rip-and-rewrite` working note).

**4. H · O · E — veteran operator.** Running Ductile on Unraid or a homelab. Wants runbooks for failure modes — orphaned jobs, integrity check failure, plugin crash loop, full disk — so they can recover without reading source. Needs: operator runbook organised by *symptom*, not by feature.

**5. A · C · L — cold-start agent generating a plugin.** Invoked with no prior Ductile context. Wants to discover the plugin contract from machine-readable schemas and one canonical example, so it can produce a valid plugin without hallucinating field names. Needs: schemas at a stable path, a reference plugin explicitly tagged as the teaching example.

**6. A · C · E — in-repo coding agent.** Working inside the repo on a real change. Wants the contract — style, safety, idioms, vocabulary — in one file at a predictable path. Needs: `AGENTS.md` (now done), per-package design notes (overlaps with persona 2), a `bd` workflow it can drive (already present).

**7. A · O · L — agent doing first-time setup.** Helping a user adopt Ductile.
Wants a deterministic way to generate a valid initial config and verify it, so it never produces config that fails `ductile config check`. Needs: `ductile init`, advertised use of `schemas/config.schema.json`, a curated `examples/` library.

**8. A · O · E — agent operating live Ductile.** Building automations against a running instance. Wants every operation discoverable through `/skills` and OpenAPI, so it does not need to read prose docs to act safely. Needs: live OpenAPI, `/skills` as the *primary* surface for this cell, an agent-readable runbook for recovery flows.

______________________________________________________________________

## Cross-cutting design implications

- **Audiences share information; they need different forms.** The same fact ("baggage propagates downstream") needs narrative form for Learner cells, reference form for Expert cells, and machine-readable form for Agent cells. A doc plan that ignores this rebuilds the same content three times by accident; a deliberate plan maintains it once and projects it three ways.
- **Coder ↔ Operator is the cleanest domain split** and maps naturally to the existing `internal/` vs `~/.config/` boundary.
- **Learner ↔ Expert is information density.** A document trying to be both serves neither. Tutorial content and reference content should not share files.
- **Agent ↔ Human is a form question.** Agent surfaces (`schemas/`, `skills/`, OpenAPI, `AGENTS.md`) are not separate documentation; they are the same content in machine-actionable form. They deserve first-class billing alongside `docs/`, not inside it.
- **Two cells unblock from the same work.** Personas 3 (H·O·L) and 7 (A·O·L) both fail today for the same reason: there is no canonical, validated minimal config. A `ductile init` plus a curated `examples/` library serves both.
- **One cell is a content gap, not a layout gap.** Persona 4 (H·O·E) needs symptom-organised runbooks that do not exist anywhere today. No reorganisation produces them.
______________________________________________________________________

## Coverage today

| Cell | Status | Notes |
| --- | --- | --- |
| 1 H·C·L | partial | No signposted "first 30 minutes" path; idioms now live in `AGENTS.md` but the entry experience does not yet guide a learner there. |
| 2 H·C·E | served | `AGENTS.md` (vocabulary, design grounding, constraints) plus `docs/ARCHITECTURE.md` cover the steady-state need. Per-package design notes would strengthen it. |
| 3 H·O·L | partial / **deferred (Phase 3)** | Root `config.yaml` is the welcome-mat and is known-stale. Rethinking the exemplar is parked: `ductile init` + curated `examples/`. Watch TUI ripped pending `ductile-watch` rewrite (see `ductile-hickey-tui-rip-and-rewrite`). |
| 4 H·O·E | partial / **gap** | Operator guide and deployment exist; symptom-driven runbooks do not. Net-new content. Interim observability is the API and structured logs; `ductile-watch` redesign tracked in `ductile-hickey-tui-rip-and-rewrite`. |
| 5 A·C·L | partial | Schemas exist but are not advertised; no plugin is explicitly labelled as the canonical teaching example. |
| 6 A·C·E | served | Unified `AGENTS.md` is the contract; doc-smoke lint covers it. |
| 7 A·O·L | **gap / deferred (Phase 3)** | Same unblock as persona 3. |
| 8 A·O·E | partial | `/skills` and OpenAPI exist; agent-readable recovery runbooks do not. |

______________________________________________________________________

## How to use this document

- **As a reader:** find your cell. Follow its landing surface. If your cell is marked *partial* or *gap*, the documentation cannot fully serve you yet and the gap is acknowledged here.
- **As a contributor proposing a change to docs or affordances:** name the cells it serves and the cells it does not. A change that serves a *gap* cell is high-leverage; a change that re-paints a *served* cell needs stronger justification.
- **As a reviewer:** cite cells in review comments. *"This is for cell 4, currently a gap"* is a precise statement; *"this is too detailed for beginners"* is not.
- **As a maintainer:** when the doc taxonomy changes, re-validate this file. Every cell must still resolve to a landing surface (or be honestly marked *gap* / *deferred*).

______________________________________________________________________

## Maintenance

This file is referenced from `AGENTS.md` and `CONTRIBUTING.md` and is expected to evolve alongside the doc taxonomy. Coverage status should be updated whenever a cell's landing surface materially changes. If a cell goes unserved for a release without being marked *deferred*, that is a planning signal, not a documentation signal.

# Concepts

# Ductile — Specification

**Version:** 1.0

**Date:** 2026-02-08

**Author:** Matt Joyce

**Sources:** RFC-001, RFC-002, RFC-002-Decisions

This is the unified, buildable specification for Ductile. It supersedes all prior RFCs and review documents.

______________________________________________________________________

## 1. Overview

### 1.1 Problem

Ductile currently exists as a FastAPI monolith handling health data ETL, LLM processing, and various integrations. Adding new connectors means modifying the core application. Existing integration servers (n8n, Huginn, Node-RED) are too heavy for a personal service.

### 1.2 Solution

An automation runtime built for AI agents to operate, diagnose, and extend. Where platforms impose workflow, Ductile provides primitives: a NOUN ACTION CLI, a manifest-contracted plugin protocol, and a queryable execution ledger. A compiled Go core orchestrates polyglot plugins via a subprocess protocol.
Simple enough for a human to understand in an afternoon; structured enough for an agent to drive the full lifecycle without supervision. See [`../CONSTITUTION.md`](https://ductile.run/CONSTITUTION.md).

### 1.3 Scope

This is a **personal integration server** processing roughly 50 jobs per day. Design decisions are calibrated to that scale. The system runs unattended and must behave predictably under crash, retry, and timeout conditions.

______________________________________________________________________

## 2. Architecture

```text
┌─────────────────────────────────────────────┐
│                   ductile                   │
│           (Go binary, ~1 process)           │
│                                             │
│ ┌────────────┐  ┌──────────┐  ┌───────────┐ │
│ │ Scheduler  │  │ Webhook  │  │    CLI    │ │
│ │ (heartbeat)│  │ Receiver │  │ Commands  │ │
│ └─────┬──────┘  └────┬─────┘  └─────┬─────┘ │
│       │              │              │       │
│       ▼              ▼              ▼       │
│ ┌─────────────────────────────────────────┐ │
│ │               WORK QUEUE                │ │
│ │      (in-memory, SQLite-backed for      │ │
│ │       persistence/crash recovery)       │ │
│ └───────────────────┬─────────────────────┘ │
│                     │                       │
│                     ▼                       │
│ ┌─────────────────────────────────────────┐ │
│ │         DISPATCH LOOP (serial)          │ │
│ │    pull job → spawn plugin → collect    │ │
│ │    result → route events → update       │ │
│ │    state → repeat                       │ │
│ └───────────────────┬─────────────────────┘ │
│                     │                       │
│ ┌───────────┐  ┌────┴─────┐  ┌────────────┐ │
│ │ Config    │  │ State    │  │ Plugin     │ │
│ │ Loader    │  │ Store    │  │ Registry   │ │
│ │ (YAML)    │  │ (SQLite) │  │            │ │
│ └───────────┘  └──────────┘  └────────────┘ │
└─────────────────────┬───────────────────────┘
                      │ stdin/stdout JSON protocol
        ┌─────────────┼─────────────┐
        ▼             ▼             ▼
  ┌──────────┐  ┌──────────┐   ┌─────────┐
  │withings/ │  │ google/  │   │ notify/ │
  │ run.py   │  │ run.py   │   │ run.sh  │
  └──────────┘  └──────────┘   └─────────┘
```

### 2.1 Key Decisions

| Decision           | Choice                                | Rationale                                                                    |
| ------------------ | ------------------------------------- | ---------------------------------------------------------------------------- |
| Core language      | Go                                    | Single binary, easy deployment, natural subprocess spawning                  |
| Plugin coupling    | Subprocess (JSON over stdin/stdout)   | Language-agnostic, fault-isolated, drop-in plugins                           |
| Scheduling         | Heartbeat with fuzzy intervals        | Human-friendly, avoids thundering herd                                       |
| Execution          | Bounded Worker Pool                   | High-throughput, resource-safe, per-plugin concurrency caps                  |
| Routing            | Config-declared, fan-out, exact match | Plugins stay dumb, core controls flow                                        |
| Pipeline Execution | Async by default; Sync opt-in         | Preserves event-driven core while enabling interactive results               |
| State              | SQLite                                | Proven, zero-ops; append-only `plugin_facts` with derived compatibility view |
| Delivery           | At-least-once                         | Plugins own idempotency; core never drops work                               |
| Plugin lifecycle   | Spawn-per-command                     | Eliminates daemon management, memory leaks, zombie processes                 |

### 2.2 Governance Hybrid (The "Control Plane")

Ductile employs a "Governance Hybrid" model to manage state across multi-hop plugin chains. Filesystem state is the plugin's concern; the core is dispatch, routing, and durable state.

- **Control Plane (Baggage):** Metadata about the execution (e.g., `origin_user_id`, `trace_id`). This data is stored in the `event_context` SQLite ledger. Values become durable only when a pipeline step claims them with `baggage`, and inherited baggage paths are immutable.
- **No core-managed data plane.** The core does not provision per-job filesystem workspaces. Plugins that need a scratch path (`mktemp -d`) or a persistent cache (`~/.cache//`) manage it themselves; pipelines that need step-to-step file passing wire absolute paths via `with:` baggage. See `docs/PLUGIN_DEVELOPMENT.md` §9 for guidance.

______________________________________________________________________

## 3. Work Queue

The central abstraction. All producers submit to a single queue.
### 3.1 Producers

| Producer         | Trigger                              |
| ---------------- | ------------------------------------ |
| Scheduler        | Heartbeat tick finds a plugin is due |
| Webhook receiver | Inbound HTTP event                   |
| Router           | Plugin output matches a routing rule |
| CLI              | Manual `ductile run `                |

### 3.2 Job Model

```text
{
  id:              UUID
  plugin:          string
  command:         string (poll | handle)
  payload:         JSON
  status:          queued | running | succeeded | failed | timed_out | dead
  attempt:         int (starts at 1)
  max_attempts:    int (default 4)
  submitted_by:    string (scheduler | webhook | route | cli)
  dedupe_key:      string (optional)
  created_at:      timestamp
  started_at:      timestamp (null until running)
  completed_at:    timestamp (null until terminal)
  next_retry_at:   timestamp (null unless awaiting retry)
  last_error:      text (null unless failed)
  parent_job_id:   UUID (null unless created by routing)
  source_event_id: UUID (null unless created by routing)
}
```

No `priority` field. Jobs are strictly FIFO.

### 3.3 Job State Machine

```text
queued → running → succeeded
                 → failed    → queued (retry)
                             → dead (max retries exceeded)
                 → timed_out → queued (retry)
                             → dead (max retries exceeded)
```

### 3.4 Delivery Guarantee

**At-least-once.** A job may run more than once (after crash, timeout, or retry). It will never be silently dropped.

- Plugins MUST be idempotent, or use `state` to track what they've already processed.
- The core provides an opt-in `dedupe_key` field. If a job is enqueued with a `dedupe_key` matching a job that succeeded within the effective dedupe window, it is not enqueued. The drop is logged at `INFO` with the `dedupe_key` and existing job ID.
- `dedupe_ttl` is configurable (default 24h) and acts as the default dedupe window. Callers may set a per-enqueue dedupe TTL override when a narrower window is needed (for example, scheduler cadence). When this override is set, enqueue also guards against in-flight duplicates (`queued`/`running`) for that `dedupe_key`.
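
The state machine in 3.3, combined with the attempt fields from the job model, can be sketched in Go as follows. This is a minimal illustration, not the core's actual code: the `Job` struct is trimmed to three fields, `afterRun` is a hypothetical helper name, and the sketch assumes a failed or timed-out job moves straight back to `queued` without modelling `next_retry_at`.

```go
package main

import "fmt"

// Status mirrors the job statuses listed in the job model (3.2).
type Status string

const (
	Queued    Status = "queued"
	Running   Status = "running"
	Succeeded Status = "succeeded"
	Failed    Status = "failed"
	TimedOut  Status = "timed_out"
	Dead      Status = "dead"
)

// Job keeps only the fields this sketch needs.
type Job struct {
	Status      Status
	Attempt     int // starts at 1
	MaxAttempts int // default 4
}

// afterRun is a hypothetical helper: given the outcome of one run,
// it returns the job in its next state. Failures and timeouts are
// requeued until the attempt budget is spent, then marked dead.
func afterRun(j Job, outcome Status) Job {
	switch outcome {
	case Succeeded:
		j.Status = Succeeded
	case Failed, TimedOut:
		if j.Attempt < j.MaxAttempts {
			j.Attempt++
			j.Status = Queued
		} else {
			j.Status = Dead
		}
	}
	return j
}

func main() {
	// Simulate a job whose every run fails.
	j := Job{Status: Running, Attempt: 1, MaxAttempts: 4}
	for j.Status != Dead {
		j = afterRun(j, Failed)
		fmt.Printf("attempt=%d status=%s\n", j.Attempt, j.Status)
		if j.Status == Queued {
			j.Status = Running // the dispatcher picks it up again
		}
	}
}
```

Because a job can reach `running` more than once on this path, a plugin that is not idempotent sees the same payload repeatedly, which is why the delivery guarantee puts idempotency on the plugin.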
### 3.5 Dispatch

**Bounded Worker Pool.** Ductile uses a global worker pool to process jobs in parallel. This ensures high throughput while preventing resource exhaustion.

- **Global Limit:** Controlled by `service.max_workers` (defaults to `max(1, CPU-1)`). Operators can force whole-system serial dispatch by setting `service.max_workers: 1`.
- **Plugin Parallelism:** Each plugin can define a `parallelism` limit in its configuration. The plugin manifest's `concurrency_safe` hint is the plugin author's declaration about whether same-plugin concurrent execution is safe; omitted means `true`.
- **Smart Dequeue:** The scheduler and dispatcher skip jobs for plugins that have reached their active parallelism cap, ensuring the worker pool remains available for other tasks.

Running counts and same-`dedupe_key` execution exclusion are derived from `job_queue`; dispatcher in-memory counters are local worker lifecycle coordination only.

Revisit condition: sustained queue wait times exceed 60 seconds with all workers saturated.

### 3.6 Deduplication

When a producer enqueues a job with a `dedupe_key`:

1. Determine effective dedupe TTL: per-enqueue override (if provided), otherwise service `dedupe_ttl`.
1. If a per-enqueue override is set, query for an existing `queued` or `running` job with the same `dedupe_key`.
1. Query for a `succeeded` job with the same `dedupe_key` completed within the effective TTL.
1. If either check finds a match: do not enqueue. Log at `INFO`: dedupe_key, existing job ID.
1. If no match is found: enqueue normally.

During dispatch, a queued job with a `dedupe_key` is skipped while another job with the same `dedupe_key` **and the same target (`plugin` + `command`)** is `running`. The guard is per-target by design: a single source event that fans out to multiple distinct targets inherits one `dedupe_key`, and those distinct-target siblings must still run concurrently rather than serialise (and starve) behind each other.
That execution serialisation is query-backed by `job_queue`, not a separate durable state table. ______________________________________________________________________ ## 4. Scheduler A single internal tick loop manages scheduled `poll` jobs. Each enabled plugin can define one or more schedule entries under `schedules:`. Plugins without schedules are ignored by the scheduler and can still be triggered via webhook, router, CLI, or API. For a full field-by-field reference and behavior details, see [SCHEDULER.md](https://ductile.run/SCHEDULER/index.md). ### 4.1 Schedule Entries Each schedule entry is independent and has its own ID (default: `default`), command, and payload: ```yaml plugins: withings: schedules: - id: hourly every: 1h command: poll payload: source: heartbeat ``` Supported schedule types: - `every`: Interval schedule (`5m`, `15m`, `30m`, `hourly`, `2h`, `daily`, `weekly`, `monthly`). - `cron`: Standard 5-field cron (`min hour dom month dow`). - `at`: One-shot RFC3339 timestamp. - `after`: One-shot delay from service start. ### 4.2 Time Controls Schedule execution can be constrained with time settings: - `jitter`: Random offset per scheduled run. - `only_between`: Time window string (e.g. `"08:00-22:00"`). - `timezone`: IANA timezone for cron/window evaluation. - `not_on`: List of weekdays to skip (`[saturday, sunday]` or `[0-6]`). `preferred_window` exists in config but is not enforced yet. Jitter is computed per scheduled run (not per tick): ```text next_run = last_successful_run + interval + random(-jitter/2, +jitter/2) ``` ### 4.3 Catch-up and Overlap Two per-schedule policies control missed ticks and concurrency: - `catch_up`: `skip` (default), `run_once`, `run_all`. - `if_running`: `skip` (default), `queue`, `cancel`. ### 4.4 Poll Guard The scheduler **must not enqueue** a new `poll` job if there is already a `queued` or `running` `poll` job for that plugin. 
Configurable per-plugin (default 1): ```yaml plugins: withings: max_outstanding_polls: 1 ``` ______________________________________________________________________ ## 5. Plugin System ### 5.1 Lifecycle: Spawn-Per-Command One process per job. No long-lived plugin processes. 1. Fork the plugin entrypoint. 1. Write JSON request to stdin. 1. Close stdin. 1. Read stdout until EOF or timeout. 1. Capture stderr. 1. Collect exit code. 1. Kill the process if it hasn't exited. Process spawn overhead is ~5ms on Linux — irrelevant when the shortest interval is 5 minutes. **Persistent connections (WebSockets, long-polling) are out of scope.** If needed, run as a separate service that pushes events into Ductile via the webhook endpoint. No streaming plugin mode — not now, not ever for this core. ### 5.2 Commands | Command | Purpose | When | | -------- | ------------------------------- | ------------------------------------- | | `poll` | Fetch data from external source | Scheduled by heartbeat | | `handle` | Process an inbound event | Routed from another plugin or webhook | | `health` | Diagnostic check | On-demand via `ductile status` | | `init` | One-time setup | On first discovery or config change | - `init` is not retried on failure — plugin is marked unhealthy. - `health` is not called on a schedule — it's a diagnostic tool for the operator. ### 5.3 Plugin Directory Structure ```text plugins/ ├── withings/ │ ├── manifest.yaml │ └── run.py ├── google-calendar/ │ ├── manifest.yaml │ └── run.py ├── notify/ │ ├── manifest.yaml │ └── run.sh └── lib/ # shared helpers (e.g. 
OAuth utilities) ``` ### 5.4 Manifest **Object format:** ```yaml manifest_spec: ductile.plugin manifest_version: 1 name: withings version: 1.0.0 protocol: 2 entrypoint: run.py description: "Fetch health data from Withings API" commands: poll: type: read description: "Fetch latest measurements from Withings API" sync: type: write description: "Push weight data to Withings API" oauth_callback: type: write description: "Handle OAuth2 callback and store tokens" health: type: read description: "Health check" config_keys: required: [client_id, client_secret] optional: [access_token] ``` **Command type semantics:** - `type: read` - No external side effects, idempotent (safe for automated retries) - Examples: poll, fetch, get, list, health - May emit a durable snapshot via `state_updates` (declared as a `fact_outputs` rule for append-only persistence; the compatibility view is updated automatically). - Cannot POST/PUT/DELETE to external APIs - `type: write` - Modifies external state, may not be idempotent - Examples: sync, send, notify, oauth_callback, delete - Default if type not specified (paranoid default) **Purpose:** Enables manifest-driven token scopes (`plugin:ro` vs `plugin:rw`) without hardcoding command knowledge in auth middleware. **Validation:** - `manifest_spec` — must be `ductile.plugin`. - `manifest_version` — must be `1`. - `protocol` — must match a version the core supports. Mismatch → plugin not loaded. - `entrypoint` — mandatory. Core constructs execution path relative to the discovered plugin directory. - `config_keys.required` — validated at load time. Missing keys → plugin not loaded, error logged. - `commands.*.type` — must be `read` or `write` if specified. Invalid type → plugin not loaded. See card #36 (Manifest Command Type Metadata). ### 5.5 Trust & Execution - Plugins MUST live under one of the configured plugin roots. Symlinks resolved, must resolve within an approved root. - `..` in `entrypoint` is rejected (path traversal prevention). 
- Entrypoint MUST be executable (`chmod +x`). Shebang line handles interpreter selection. - World-writable plugin directories are refused at load time. - Plugins run as the same OS user as the core. Use systemd `User=ductile` to limit blast radius. ### 5.6 Timeouts **Defaults:** | Command | Timeout | | -------- | ------- | | `poll` | 60s | | `handle` | 120s | | `health` | 10s | | `init` | 30s | **Enforcement:** 1. Core starts a deadline timer when spawning the process. 1. On timeout: `SIGTERM` to the process group. 1. 5-second grace period. 1. `SIGKILL` if still alive. 1. Job status → `timed_out`, follows retry policy. **Configurable per-plugin:** ```yaml plugins: slow-plugin: timeouts: poll: 300s handle: 300s ``` **Resource caps:** - Max stdout: 10 MiB captured. Exceeding this cap is a protocol/output failure; the captured prefix is kept for diagnostics. - Max stderr: 64 KiB captured for diagnostics. Excess stderr is truncated with a logged warning. ### 5.7 Retry & Backoff - Default: 4 attempts total (1 original + 3 retries). - Backoff: `base * 2^(attempt-1) + random(0, base)` where `base = 30s`. - Retry delays: ~30s, ~1m, ~2m (then dead). **Non-retryable conditions:** - Plugin exits with code `78` (EX_CONFIG from sysexits.h) — configuration error. - Plugin response may include `"retry": false`; core treats this as a compatibility signal, not plugin-owned policy. - All other failures are retried. **Configurable per-plugin:** ```yaml plugins: withings: retry: max_attempts: 5 backoff_base: 60s ``` ### 5.8 Circuit Breaker Configurable consecutive failure threshold per `(plugin, command)` pair. Applies to **scheduler-originated poll jobs only** — webhook-triggered `handle` jobs are not blocked by poll failures. - Default threshold: 3 consecutive failures. - Default reset: 30 minutes. - Manual reset: `ductile system reset `. - Inspect state and transition history: `ductile system breaker [--json]`. - States: `closed` -> `open` -> `half_open`. 
- When cooldown expires, scheduler allows a single half-open probe poll: - Success closes the circuit and resets failure count. - Failure reopens the circuit. ```yaml plugins: withings: circuit_breaker: threshold: 3 reset_after: 30m ``` ### 5.9 State Model **Config is static. Facts are durable. `plugin_state` is a compatibility view.** - `config` — from `config.yaml`, interpolated with env vars, read-only. Contains credentials, endpoints — things the operator sets. - Config paths (config dir, includes, backups) are local operator-controlled inputs; Ductile does not accept untrusted remote file paths. - `service.allow_symlinks` controls whether symlinks are permitted in config/plugin paths (warnings are always emitted when symlinks are detected). - `plugin_facts` — append-only record of durable plugin observations. Each row carries a stable snapshot the plugin emitted as `state_updates`, plus a `fact_type` declared in the plugin manifest's `fact_outputs` and a Ductile-owned monotonic `seq`. This is the durable record. See [PLUGIN_FACTS.md](https://ductile.run/PLUGIN_FACTS/index.md). - `plugin_state` — single JSON row per plugin maintained as a compatibility/cache view of the latest fact. Existing readers see the same shape they saw before facts existed. The view is rebuilt automatically by core when a fact lands, governed by the manifest's `compatibility_view` declaration (currently `mirror_object`). Plugins that have not declared `fact_outputs` still get write-through behaviour during the compatibility window; new plugins should declare `fact_outputs` rather than treating this row as their durable home. ```sql -- Append-only durable record (primary). plugin_facts ( id INTEGER PRIMARY KEY AUTOINCREMENT, seq INTEGER NOT NULL, -- Ductile-owned monotonic plugin_name TEXT NOT NULL, fact_type TEXT NOT NULL, job_id TEXT, command TEXT, fact_json JSON NOT NULL, created_at TEXT NOT NULL ); -- Compatibility/cache view of the latest fact (derived). 
plugin_state ( plugin_name TEXT PRIMARY KEY, state JSON NOT NULL DEFAULT '{}', updated_at TIMESTAMP ); ``` **Size limit:** 1 MB per `plugin_state` row. Exceeding this rejects the update and fails the job. The same limit constrains the snapshot a plugin emits, since the compatibility view mirrors it. ### 5.10 OAuth Plugins manage their own OAuth token lifecycle. The core does not understand OAuth. - `client_id`, `client_secret` → `config` (static, set by operator). - `access_token`, `refresh_token`, `token_expiry` → managed by the plugin and emitted as part of its `state_updates` snapshot. The plugin should declare a `fact_outputs` rule so each token-refresh observation is recorded append-only and the compatibility view stays current for downstream readers. - Plugin checks expiry on each invocation, refreshes if needed, returns new tokens via `state_updates`. - Shared OAuth helpers can live in `plugins/lib/`. ______________________________________________________________________ ## 6. Protocol (v2) ### 6.1 Request Envelope (core → plugin) Single JSON object written to plugin's stdin: ```json { "protocol": 2, "job_id": "uuid", "command": "poll | handle | health | init", "config": {}, "state": {}, "context": {}, "event": {}, "deadline_at": "ISO8601" } ``` - `event` — present only for `handle`. - `state` — the plugin's current compatibility-view row (the latest fact's snapshot, or write-through state for plugins not yet declaring `fact_outputs`). - `context` — shared metadata (Baggage) carried across the pipeline chain. - `deadline_at` — informational. Plugins MAY use it to abandon long-running work early. The core enforces the real deadline externally. 
### 6.2 Response Envelope (plugin → core)

Single JSON object written to plugin's stdout:

```json
{
  "status": "ok | error",
  "result": "short human-readable summary",
  "error": "human-readable message (when status=error)",
  "retry": true,
  "events": [],
  "state_updates": {},
  "logs": []
}
```

- `result` — required when `status=ok`. Summarizes what the plugin did.
- `retry` — response-envelope compatibility signal. Defaults to `true` if omitted. Set `false` for permanent failures; core still owns the retry decision with exit status, attempts, and config as inputs.
- `events` — array of event envelopes (see 6.3).
- `state_updates` — the plugin's emitted snapshot. When the manifest declares a matching `fact_outputs` rule, core records this snapshot as an append-only `plugin_facts` row and rebuilds the compatibility view from it. Plugins without a declared `fact_outputs` rule get write-through into `plugin_state` directly during the compatibility window.
- `logs` — array of `{"level": "info|warn|error", "message": "..."}`. Optional. Stored with the job record.

### 6.3 Event Envelope

Every event emitted by a plugin in the `events` array:

```json
{
  "type": "new_health_data",
  "payload": {},
  "dedupe_key": "withings:weight:2026-02-08"
}
```

- `type` — matches `event_type` in routing config. Exact string match.
- `payload` — arbitrary JSON, passed to downstream plugin's `handle` command.
- `dedupe_key` — optional. Downstream job inherits this as its `dedupe_key`.

The core injects when creating downstream jobs:

- `source` — plugin name.
- `timestamp` — ISO8601.
- `event_id` — UUID assigned by the core.

### 6.4 Framing

Single JSON object on stdout. Not JSON Lines, not length-prefixed. One request, one response, process exits.

### 6.5 Protocol Mismatch

If the request `protocol` field doesn't match what the plugin expects, the plugin SHOULD exit with code `78` (EX_CONFIG) and a clear error on stderr.
The core refuses to load plugins whose manifest declares a `protocol` version it doesn't support.

______________________________________________________________________

## 7. Routing

Plugin chaining is declared in config, not by plugins. Plugins produce typed events; the config says where they go.

### 7.1 Config

```yaml
routes:
  - from: withings
    event_type: new_health_data
    to: health-analyzer
  - from: health-analyzer
    event_type: alert
    to: notify
```

### 7.2 Semantics

- **Fan-out:** A single event can match multiple routes. All matching routes produce a job.
- **No match:** Logged at DEBUG, dropped. Not an error.
- **Matching:** Exact string match on `event_type` only. No wildcards, no regexes, no glob patterns.
- **No conditional filters.** No `payload.severity == "high"`. If you need conditional logic, put it in the receiving plugin — it can inspect the payload and no-op.

### 7.3 Traceability

When the router creates a downstream job from an event:

- `parent_job_id` is set to the producing job's ID.
- `source_event_id` is set to the core-assigned `event_id`.

______________________________________________________________________

## 8. Pipelines (DSL)

Pipelines provide a higher-level orchestration layer over raw routes, using a GitHub Actions-inspired notation.

### 8.1 Schema

```yaml
pipelines:
  - name: youtube-summary
    on: discord.command.youtube    # Trigger event type
    execution_mode: synchronous    # Optional: async | synchronous
    timeout: 3m                    # Optional: duration (default 30s)
    steps:
      - id: download               # Optional
        uses: youtube.download     # plugin.command
      - id: summarize
        uses: fabric.summarize
      - id: notify
        uses: discord.respond
```

### 8.2 Execution Modes

- **async (default):** Fire-and-forget. The API returns `202 Accepted` with a `job_id` immediately. Dispatcher handles jobs as they come.
- **synchronous (opt-in):** The API caller "stays on the line". The gateway waits for the entire execution tree (all steps) to reach a terminal state before responding with aggregated results.
### 8.3 Guarded Bridge

The engine remains event-driven and asynchronous internally. Synchronous behavior is implemented as a "Guarded Bridge" at the API layer:

1. Dispatcher provides completion channels for job trees.
1. API handler blocks on these channels.
1. If `timeout` is exceeded, the bridge "breaks" and returns `202 Accepted` with the root `job_id`, allowing the client to poll for completion.

______________________________________________________________________

## 9. API Endpoints

The HTTP API allows external systems (LLMs, scripts, other services) to programmatically trigger plugin execution and retrieve job results.

### 9.1 Configuration

```yaml
api:
  enabled: true
  listen: "localhost:8080"
  auth:
    tokens:
      - token: ${ADMIN_API_TOKEN}
        scopes: ["*"]
```

### 9.2 Primary Trigger Endpoints

The API exposes two first-class trigger paths:

- `POST /plugin/{plugin}/{command}`: direct plugin execution (no pipeline routing), returns `202 Accepted`.
- `POST /pipeline/{pipeline}`: explicit pipeline orchestration, returns `202 Accepted` by default and `200 OK` for synchronous pipelines.

See `docs/API_REFERENCE.md` for full examples and response schemas.

### 9.3 GET /job/{job_id}

Retrieves the status and results of a previously triggered job.
**Request:** - URL param: `{job_id}` - UUID returned from one of the POST trigger endpoints - Header: `Authorization: Bearer ` **Response (200 OK - queued):** ```json { "job_id": "uuid-v4", "status": "queued", "plugin": "plugin_name", "command": "command_name", "created_at": "2026-02-09T10:00:00Z" } ``` **Response (200 OK - running):** ```json { "job_id": "uuid-v4", "status": "running", "plugin": "plugin_name", "command": "command_name", "started_at": "2026-02-09T10:00:05Z" } ``` **Response (200 OK - completed):** ```json { "job_id": "uuid-v4", "status": "completed", "plugin": "plugin_name", "command": "command_name", "result": { "status": "ok", "result": "Plugin executed successfully", "state_updates": {"last_run": "2026-02-09T10:00:10Z"}, "logs": [{"level": "info", "message": "Plugin executed successfully"}] }, "started_at": "2026-02-09T10:00:05Z", "completed_at": "2026-02-09T10:00:10Z" } ``` **Error Responses:** - `401 Unauthorized` - Missing or invalid token - `404 Not Found` - Job ID not found ### 9.4 Authentication & Authorization **Bearer token authentication** with scoped permissions. **Token registry** (`tokens.yaml`): - Multiple tokens with individual scope definitions - Each token references a scope file (JSON) - BLAKE3 hash ensures scope file integrity - Environment variable references for keys (never plaintext) **Scope types (current):** - `plugin:ro`, `plugin:rw` - Plugin and pipeline trigger permissions - `jobs:ro`, `jobs:rw` - Job read/write permissions - `events:ro`, `events:rw` - Event stream permissions - `*` - Full admin access **Example tokens.yaml:** ```yaml tokens: - name: admin-cli key: ${ADMIN_API_KEY} scopes_file: scopes/admin-cli.json scopes_hash: blake3:a3f8c2d9... - name: github-integration key: ${GITHUB_API_KEY} scopes_file: scopes/github-integration.json scopes_hash: blake3:b4e9d3c0... 
``` **Example scope file (scopes/github-integration.json):** ```json { "scopes": [ "read:jobs", "read:events", "github-handler:rw", "withings:ro" ] } ``` **Authorization middleware:** 1. Extract bearer token from `Authorization` header 1. Lookup token in registry 1. Load and verify scope file (BLAKE3 hash check) 1. Normalize implied read-from-write scopes 1. Check if requested action matches any granted scope 1. Return 403 if denied, proceed if allowed Tokens should be stored in environment variables and interpolated (for example `${ADMIN_API_TOKEN}`). - All API requests must include `Authorization: Bearer ` header - Invalid or missing token returns `401 Unauthorized` - No key rotation mechanism in MVP (manual config update + reload) ### 9.5 Resource Guarding (Synchronous Pipelines) To prevent HTTP worker exhaustion, synchronous pipelines are governed by a semaphore: - **api.max_concurrent_sync:** Max number of simultaneous blocking API calls (default 10). - **api.max_sync_timeout:** Hard limit on pipeline timeout to prevent zombie connections. ### 9.6 Use Cases - **LLM Tool Calling:** LLM agents can call `/plugin` for atomic actions and `/pipeline` for orchestrated workflows - **External Automation:** Scripts, cron jobs, or other services can trigger plugins programmatically - **Result Polling:** External systems can poll /job/{id} to wait for async plugin execution completion - **Manual Testing:** Developers can trigger plugins via curl without waiting for scheduler ______________________________________________________________________ ## 10. Webhooks For operator setup and example requests, see [WEBHOOKS.md](https://ductile.run/WEBHOOKS/index.md). ### 10.1 Listener ```yaml webhooks: listen: 127.0.0.1:8081 endpoints: - path: /hook/github plugin: github-handler secret_ref: github_webhook_secret signature_header: X-Hub-Signature-256 max_body_size: 1MB ``` ### 10.2 Security HMAC-SHA256 signature verification is **mandatory** for all webhook endpoints. 1. 
Read raw request body (up to `max_body_size`, default 1 MB). 1. Resolve `secret_ref` from tokens.yaml and compute `HMAC-SHA256(secret, raw_body)`. 1. Compare against the signature header (configurable name per endpoint). 1. Reject with `403` if invalid. No error details in response. 1. Reject with `413` if body exceeds `max_body_size`. No replay protection in V1. No rate limiting in V1 (proxy responsibility if fronted by reverse proxy). ### 10.3 Health Endpoint `/healthz` on the webhook listener port: ```json { "status": "ok", "uptime_seconds": 3600, "queue_depth": 2, "plugins_loaded": 5, "plugins_circuit_open": 0 } ``` No authentication. Localhost only. Useful for systemd watchdog and operator checks. ______________________________________________________________________ ## 11. Operations ### 11.1 Single-Instance Lock PID file with `flock(LOCK_EX | LOCK_NB)`: 1. Create/open `/ductile.lock`. 1. Acquire `flock`. Fail → log error, exit 1. 1. Write current PID. 1. Lock held for process lifetime. Kernel releases on crash/exit. ### 11.2 Crash Recovery On startup: 1. Open the SQLite database. 1. Acquire the exclusive lock. 1. Find all jobs with `status = running` — orphans from a prior crash. 1. For each orphan: increment `attempt`, set `status = queued` if under `max_attempts`, else `status = dead`. 1. Log each recovered job at WARN level. 1. Resume normal dispatch. ### 11.3 Config Reload Send `SIGHUP` to the running process (found via PID file) to reload config. On SIGHUP: 1. Parse new config. If invalid → log error, keep old config. 1. In-flight jobs continue with existing config snapshot. 1. Scheduler updates intervals/jitter for all plugins. 1. Router updates routing rules. 1. Plugin config changes take effect on next dispatch. 1. Newly added plugins discovered → `init` runs. 1. Removed/disabled plugins → queued jobs cancelled (status → `dead`), no new jobs enqueued. ### 11.4 Logging **Core logs:** JSON to stdout. 
Fields: `timestamp`, `level`, `component`, `plugin` (when relevant), `job_id` (when relevant), `message`. **Plugin stderr:** Captured. Always. Stored in `job_log` (capped at 64 KB). Logged at WARN to core log stream. **Plugin stdout:** Reserved exclusively for protocol response. Stored verbatim on completion in `job_log.result` (JSON). Non-JSON on stdout is a protocol error — job fails, stderr + stdout captured for debugging. **Redaction:** Not in V1. Don't log secrets. Fix the plugin, don't bandage the core. ### 11.5 Job Log Retention Pruned on every scheduler tick: ```sql DELETE FROM job_log WHERE completed_at < datetime('now', '-30 days') ``` Default 30 days. Configurable via `service.job_log_retention`. ### 11.6 CLI ```text ductile system start # run the service (foreground) ductile run # manually run a plugin once ductile status # show plugin compatibility views, queue depth, last runs # send SIGHUP to reload config without restart ductile system reset # reset circuit breaker for a plugin ductile plugins # list discovered plugins ductile logs [plugin] # tail structured logs ductile queue # show pending/active jobs ``` ### 11.7 CLI Principles To ensure predictability and safety for both human and LLM operators, all CLI commands MUST adhere to the standards defined in `docs/CLI_DESIGN_PRINCIPLES.md`. Core requirements: - **Hierarchy:** Strict **NOUN ACTION** pattern. - **Verbosity:** mandatory `-v` / `--verbose` flags. - **Safety:** mandatory `--dry-run` for mutations. - **Machine-Readability:** mandatory `--json` for status and inspection. ______________________________________________________________________ ## 12. 
Database Schema ### 12.1 Tables ```sql -- Job queue (active and historical) job_queue ( id TEXT PRIMARY KEY, -- UUID plugin TEXT NOT NULL, command TEXT NOT NULL, -- poll | handle payload JSON, status TEXT NOT NULL, -- queued | running | succeeded | failed | timed_out | dead attempt INTEGER NOT NULL DEFAULT 1, max_attempts INTEGER NOT NULL DEFAULT 4, submitted_by TEXT NOT NULL, -- scheduler | webhook | route | cli dedupe_key TEXT, created_at TEXT NOT NULL, -- ISO8601 started_at TEXT, completed_at TEXT, next_retry_at TEXT, last_error TEXT, parent_job_id TEXT, -- FK to job_queue.id source_event_id TEXT -- UUID assigned by core ); -- Append-only durable plugin record (primary). plugin_facts ( id INTEGER PRIMARY KEY AUTOINCREMENT, seq INTEGER NOT NULL, -- Ductile-owned monotonic plugin_name TEXT NOT NULL, fact_type TEXT NOT NULL, -- e.g. ".snapshot" job_id TEXT, command TEXT, fact_json JSON NOT NULL, created_at TEXT NOT NULL ); -- Compatibility/cache view of the latest fact (derived). -- One row per plugin. Existing readers see the same shape as before facts existed. 
plugin_state ( plugin_name TEXT PRIMARY KEY, state JSON NOT NULL DEFAULT '{}', updated_at TEXT ); -- Job log (completed jobs for audit/debugging) job_log ( id TEXT PRIMARY KEY, plugin TEXT NOT NULL, command TEXT NOT NULL, status TEXT NOT NULL, result TEXT, -- protocol response JSON attempt INTEGER NOT NULL, submitted_by TEXT NOT NULL, created_at TEXT NOT NULL, completed_at TEXT NOT NULL, last_error TEXT, stderr TEXT, -- capped at 64 KB parent_job_id TEXT, source_event_id TEXT ); -- Circuit breaker state for scheduler poll guard circuit_breakers ( plugin TEXT NOT NULL, command TEXT NOT NULL, -- poll state TEXT NOT NULL, -- closed | open | half_open failure_count INTEGER NOT NULL DEFAULT 0, opened_at TEXT, -- ISO8601 last_failure_at TEXT, -- ISO8601 last_job_id TEXT, -- latest processed scheduler poll job id updated_at TEXT NOT NULL, -- ISO8601 PRIMARY KEY(plugin, command) ); -- Append-only circuit breaker transition facts. -- circuit_breakers remains the current-state compatibility/cache row. circuit_breaker_transitions ( id TEXT PRIMARY KEY, plugin TEXT NOT NULL, command TEXT NOT NULL, from_state TEXT, -- closed | open | half_open | NULL to_state TEXT NOT NULL, -- closed | open | half_open failure_count INTEGER NOT NULL DEFAULT 0, reason TEXT NOT NULL, -- failure_threshold | success | cooldown_elapsed | manual_reset job_id TEXT, created_at TEXT NOT NULL -- ISO8601 ); ``` ______________________________________________________________________ ## 13. Configuration Reference Ductile uses a **Monolithic Runtime** compiled from a modular, **Tiered Directory** structure. ### 13.1 Overview For the complete configuration specification, including file formats, merge logic, and integrity verification rules, see:\ 👉 **[docs/CONFIG_REFERENCE.md](https://ductile.run/CONFIG_REFERENCE/index.md)** ### 13.2 Key Principles - **Include-Based Modularity:** Configuration is loaded from `config.yaml` plus any files or directories listed in `include:`. 
- **Multi-Root Plugin Discovery:** `plugin_roots` is the source of truth; roots are scanned in order and the first match wins on duplicate plugin names. - **Pipeline Discovery Flow:** Pipelines are loaded from included YAML files (or include directories) that define `pipelines:` entries. - **Tiered Integrity:** High-security files (auth/webhooks) require a valid BLAKE3 hash in `.checksums`; without one the runtime will not start. Operational files (settings/routes) log warnings if hashes are missing or mismatched. - **Monolithic Grafting:** At runtime, all included files are merged into a single internal configuration object following strict precedence rules (later entries override earlier ones). - **Environment Interpolation:** Secrets are injected via `${VAR}` placeholders, which are interpolated after hash verification but before parsing. - **Default Permissions:** Config directories are created with `0700`. Config files and lock files default to `0600`; operators may relax permissions explicitly for shared environments. - **Secret Redaction:** CLI config inspection outputs redact token keys and webhook secrets; secrets are only shown at creation time. ## 14. Deployment ### 14.1 Systemd Unit ```ini [Unit] Description=Ductile After=network.target [Service] Type=simple ExecStart=/usr/local/bin/ductile system start --config-dir /etc/ductile ExecReload=/bin/kill -HUP $MAINPID Restart=on-failure User=ductile Group=ductile [Install] WantedBy=multi-user.target ``` ### 14.2 Development Run `ductile system start` directly. No systemd required. ______________________________________________________________________ ## 15.
Project Layout ```text ductile/ ├── cmd/ │ └── ductile/ │ └── main.go ├── internal/ │ ├── config/ │ ├── queue/ │ ├── scheduler/ │ ├── dispatch/ │ ├── plugin/ │ ├── state/ │ ├── api/ │ ├── webhook/ │ └── router/ ├── plugins/ │ └── example/ │ ├── manifest.yaml │ └── run.py ├── config.yaml ├── go.mod ├── go.sum └── Makefile ``` ______________________________________________________________________ ## 16. Implementation Phases | Phase | Sprint | Scope | Status | | ------------------------- | ------ | ------------------------------------------------------------------------------------------------------------- | ---------------------------------------------- | | 1. Skeleton | 0 | Go scaffold, CLI, config loader, SQLite state, plugin discovery | ✅ Complete | | 2. Core Loop | 1 | Work queue, heartbeat scheduler with fuzzy intervals, dispatch loop, plugin protocol, crash recovery | ✅ Complete | | 3. API Triggers | 2 | HTTP server with chi router, POST /plugin and POST /pipeline, GET /job, Bearer token auth, job result storage | ✅ Complete | | 4. Routing | 3 | Config-declared event routing, downstream enqueuing, event_id traceability | ✅ Complete | | 5. Webhooks | 3 | HTTP listener, HMAC verification, /healthz, route inbound webhooks to plugins | ✅ Complete | | 6. Reliability Controls | 4 | Circuit breaker, retry with exponential backoff, deduplication enforcement | ✅ Complete | | 7. Pipeline Orchestration | 4 | Sync/Async execution modes, Guarded Bridge, YAML DSL, completion channels | ✅ Complete | | 8. CLI & Ops | 5 | Status/run/reload/reset/plugins/queue/logs commands, systemd unit | 🔄 In Progress (Status command ✅ implemented) | | 9. First Plugins | 6 | Port Withings & Garmin from existing Ductile, notify plugin | Planned | **Note:** Phase 3 (API Triggers) was prioritized before Routing and Webhooks to enable LLM-driven automation via curl-based triggers.
This allows external systems to programmatically enqueue jobs and retrieve results immediately, accelerating the path to production use cases. ______________________________________________________________________ ## 17. Deferred Decisions | Topic | Rationale | | ---------------------------------------------------- | ----------------------------------------------------------------------- | | Two-tier stderr/stdout caps (capture vs persistence) | Current spec is workable. Clarify post-V1 if storage becomes a concern. | | `protocol` field in response envelope | Accretive addition; back-compatible with plugins that omit it. | | Replay protection for webhooks | Provider-specific. Add per-plugin if a provider requires it. | | Rate limiting on webhook listener | Proxy responsibility. Core doesn't duplicate concerns it can't own. | | Secret redaction in logs | Operator responsibility. Fix the plugin, don't bandage the core. | | Streaming / long-lived plugin mode | Out of scope permanently. If it needs to stream, it's not a plugin. | | Priority queues / multi-lane dispatch | Revisit only if daily jobs exceed 500 or median wait exceeds 30s. | | Router query language / payload filters | Put conditional logic in the receiving plugin. | # Ductile: Pipelines & Orchestration (DSL Reference) Ductile uses a YAML-based Domain Specific Language (DSL) to define event-driven workflows. Pipelines transform atomic **Connectors** into complex, multi-hop **Orchestrations**. ______________________________________________________________________ ## 1. Top-Level Structure A pipeline file (e.g., `pipelines.yaml`) contains an array of pipeline definitions. 
```yaml pipelines: - name: my-workflow # Required: Unique identifier on: my.event.type # Required: Trigger event type execution_mode: async # Optional: async (default) | synchronous timeout: 30s # Optional: For synchronous execution steps: # Required: Sequential steps - uses: my-plugin ``` ______________________________________________________________________ ## 2. Pipeline Properties | Field | Type | Description | | ---------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `name` | String | A unique name for the pipeline. Used for logging and API triggers. | | `on` | String | The event type that triggers this pipeline. Must match exactly. | | `on-hook` | String | Lifecycle signal that triggers this pipeline (`job.completed` / `job.failed` / `job.timed_out`). Mutually exclusive with `on`. | | `from_plugin` | String | **Optional source-plugin selector.** When set, the trigger or hook signal only matches when the upstream source plugin is exactly this plugin. Empty (default) preserves today's behaviour — match regardless of source plugin. See §2.2. | | `if` | Condition | **Optional pipeline-level trigger predicate.** Evaluated against the event's payload (and the upstream job's accumulated durable context, when available) after the trigger/hook name match; a false result skips dispatch entirely. Same shape as step-level `if:` (see §3.6). | | `max_depth` | Integer | **Optional author-set route depth cap.** Overrides the auto-computed cap. `0` means *unlimited*. Negative values are rejected at config load. | | `execution_mode` | Enum | `async` (fire-and-forget) or `synchronous` (API blocks for result). | | `timeout` | Duration | Max time to wait for a `synchronous` pipeline (e.g., `5s`, `2m`). 
| | `steps` | Array | The list of steps to execute in order. | ### 2.1 Pipeline-level `if:` vs. step-level `if:` Both `if:` blocks share the same predicate engine — atomic `path/op/value` plus `all/any/not`. They differ in *where* they evaluate: | Surface | Evaluated when | Scope | Effect on false | | ----------------------- | -------------------------------------------------- | ------------------------------------- | -------------------------------------------------------------------- | | Pipeline-level `if:` | Trigger/hook name has matched, before any dispatch | `payload`, `context` (when available) | No dispatch at all — no `core.switch`, no plugin spawn | | Step-level `if:` (§3.6) | At each step, after upstream steps run | `payload`, `context`, `config` | Step bypassed via internal `core.switch`; downstream steps still run | Use pipeline-level `if:` to **suppress dispatch** when an event isn't relevant to a pipeline at all. Use step-level `if:` to **gate a step** within a pipeline that is otherwise running. #### Context availability at trigger time `context.*` paths in a pipeline-level `if:` resolve against the upstream job's *accumulated durable context* — the same baggage view that downstream pipeline steps see. Context is available when the routed event was emitted by a plugin running inside an existing pipeline; it is empty for events from the scheduler tick, webhook ingress, and direct API triggers. A predicate that tests `context.*` against an absent context simply evaluates to false (no special-case error), the same way a payload predicate against an absent key returns false. For hook pipelines (`on-hook:`), context is currently empty at hook fire time because hooks fire only for root jobs that have no upstream context of their own. The predicate engine and runtime plumbing accept `context.*` paths in hook predicates so authors can prepare for future architectures that surface upstream context at hook time. 
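The shared predicate shape (atomic `path/op/value` plus `all/any/not`) and the absent-context behaviour can be sketched in Python. This is an illustrative sketch, not Ductile's Go implementation: `resolve`, `evaluate`, and the `MISSING` sentinel are hypothetical names, only the `exists`, `eq`, and `in` operators are modelled, and strict typing is approximated by requiring identical Python types.

```python
from typing import Any

MISSING = object()  # hypothetical sentinel meaning "path not present"


def resolve(scope: dict, path: str) -> Any:
    """Walk a dotted path such as 'payload.new_commits' through nested dicts."""
    node: Any = scope
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return MISSING
        node = node[part]
    return node


def evaluate(cond: dict, scope: dict) -> bool:
    """Evaluate an atomic path/op/value predicate or an all/any/not composite."""
    if "all" in cond:
        return all(evaluate(c, scope) for c in cond["all"])
    if "any" in cond:
        return any(evaluate(c, scope) for c in cond["any"])
    if "not" in cond:
        return not evaluate(cond["not"], scope)

    actual = resolve(scope, cond["path"])
    op = cond["op"]
    if op == "exists":
        return actual is not MISSING
    if actual is MISSING:
        actual = None  # assumption: absent paths compare as null
    expected = cond.get("value")
    if op == "eq":
        # No implicit coercion: the types must match exactly.
        return type(actual) is type(expected) and actual == expected
    if op == "in":
        return any(type(actual) is type(v) and actual == v for v in expected)
    raise ValueError(f"operator not modelled in this sketch: {op}")


# An event from a scheduler tick carries no upstream context.
scope = {"payload": {"new_commits": True, "plugin": "git_repo_sync"}, "context": {}}

gate = {"path": "payload.new_commits", "op": "eq", "value": True}
ctx_gate = {"path": "context.trace_id", "op": "exists"}
noisy = {"not": {"path": "payload.plugin", "op": "in",
                 "value": ["check_youtube", "jina-reader"]}}

print(evaluate(gate, scope))      # True: the payload flag is set
print(evaluate(ctx_gate, scope))  # False: context is empty at trigger time
print(evaluate(noisy, scope))     # True: the plugin is not in the noisy list
```

A predicate against an absent `context.*` path returns false here without raising, which mirrors the behaviour described for scheduler, webhook, and direct API triggers.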
A pipeline may use **both** a pipeline-level and a step-level `if:` in the same definition. ```yaml - name: repo-changelog on: git_repo_sync.completed if: # pipeline-level: skip dispatch when no work path: payload.new_commits op: eq value: true steps: - id: changelog uses: changelog_microblog - id: commit uses: git_commit_push if: # step-level: only commit if the step before path: payload.changed # actually produced changes op: eq value: true ``` `max_depth` is a separate concern: it caps how many internal `core.switch` hops a pipeline may chain before the runtime considers the route exhausted. Authors rarely need to set it; the auto-computed value is correct in almost all cases. Set `max_depth: 0` only when you have a deliberate need for unbounded recursion through `call:` and have read §6.4 of this doc. #### Hook-trigger predicate (`on-hook:` + `if:`) Lifecycle hook pipelines (`on-hook: job.completed | job.failed | job.timed_out`) fire for **every** matching lifecycle event across the whole runtime. Without a predicate, such a pipeline is inherently noisy. A pipeline-level `if:` is the correct surface for scoping a hook pipeline: ```yaml - name: notify-on-real-failure on-hook: job.failed if: not: path: payload.plugin op: in value: [check_youtube, jina-reader] # known-noisy plugins steps: - uses: discord_notify ``` Hook predicates evaluate against the lifecycle event's payload, which includes the plugin name, status, attempt count, and other lifecycle fields documented in §9. ### 2.2 `from_plugin:` source-plugin selector Pipelines that fire from plugin-emitted events (`on:`) or lifecycle hooks (`on-hook:`) can be scoped to a single upstream plugin with the optional `from_plugin:` field. When set, the route only matches when the event's source plugin is exactly the named plugin. ```yaml - name: page-on-claude-failure on-hook: job.failed from_plugin: claude_harvest steps: - uses: pagerduty_notify ``` `from_plugin:` is a positive assertion: an empty source plugin (e.g.
webhook ingress, scheduler tick) never matches a route that declares one. Use this to keep a hook pipeline silent for unrelated plugins without smuggling the filter through `if:` against `payload.plugin`. `from_plugin:` and `if:` compose. The selector is checked first; the predicate runs only when the source plugin matches. ```yaml - name: page-on-high-severity-claude-failure on-hook: job.failed from_plugin: claude_harvest if: path: payload.severity op: eq value: high steps: - uses: pagerduty_notify ``` For multi-plugin scoping, prefer either multiple narrowly-scoped pipelines or an `if:` predicate against `payload.plugin`. The single-plugin selector is intentional; a generic multi-value source matcher is deferred. #### Inspection The richer compiled-route shape (including `source_plugin` and `if`) is exposed via the `GET /config/view` API endpoint under the `compiled_routes` key, keyed by pipeline name. Operators can use this to answer: - what signal does this route match? - does it require a source plugin? - what predicate is evaluated? - what depth guard exists? ______________________________________________________________________ ## 3. Step Types Each step in a pipeline must perform exactly **one** of the following actions: ### 3.1 `uses` (Invoke Plugin) Calls a specific plugin or alias. This is the most common step. ```yaml steps: - id: download-step # Optional: Unique ID within the pipeline uses: youtube-dl ``` ### 3.2 `call` (Invoke Pipeline) Calls another pipeline by name, inheriting the current baggage. This promotes logic reuse. ```yaml steps: - call: standard-summarizer ``` ### 3.3 `steps` (Nested Sequence) Groups multiple steps together. Useful for organization or within a `split`. ```yaml steps: - steps: - uses: step-1 - uses: step-2 ``` ### 3.4 `split` (Parallel Fan-out) Executes multiple steps or sub-pipelines in parallel. Branches share the same **Baggage** but otherwise execute independently. 
Plugins that need per-branch filesystem isolation manage it themselves (e.g. via `mktemp -d`). ```yaml steps: - uses: processor - split: - uses: discord-notifier - uses: s3-archiver ``` ### 3.5 `relay` (Remote Event Relay) `relay` delivers a projected event to a named remote Ductile instance. The step is declarative: it refers to `relay-instances.yaml` by stable instance name and does not expose URLs or secrets in pipeline logic. ```yaml steps: - id: relay-to-lab relay: to: lab event: backup.ready dedupe_key: payload.archive_id with: archive_id: payload.archive_id archive_path: payload.archive_path checksum: payload.checksum baggage: trace_id: context.trace_id ``` Rules: - `to` is the outbound relay instance name. - `event` is the remote event type. - `dedupe_key` is optional and resolves from `payload.*` or `context.*`. - `with` is the remote event payload projection. If omitted, the current event payload is relayed. - `baggage` is an optional explicit projection into the relay envelope baggage. ### 3.6 `if` (Conditional Step Execution) A step may include an optional structured `if` object. Authored conditions compile into an internal `core.switch` hop. The switch evaluates the condition against the current scope and then either dispatches the gated step or bypasses it without spawning the gated plugin. 
`if` must be exactly one of: - atomic predicate: `path`, `op`, optional `value` - `all: [...]` - `any: [...]` - `not: ` Atomic example: ```yaml steps: - uses: discord-notify if: path: payload.status op: eq value: error ``` Composite example: ```yaml steps: - uses: long-video-handler if: all: - path: payload.kind op: eq value: video - path: payload.duration_sec op: gte value: 30 ``` Supported operators in v1: - `exists` - `eq` - `neq` - `in` - `gt` - `gte` - `lt` - `lte` - `contains` (case-insensitive string contains) - `startswith` (case-insensitive string prefix) - `endswith` (case-insensitive string suffix) - `regex` (Go regexp full-string match; use inline flags like `(?i)` for case-insensitive patterns) Path roots allowed in v1: - `payload.*` - `context.*` - `config.*` Semantics: - typing is strict - numeric operators require numeric operands - string operators require string path values and string comparison values - no implicit string-to-number coercion - missing paths resolve to absent for `exists`, otherwise compare as `null` - invalid conditions fail at pipeline load time - branch decisions are observable as internal `ductile.switch.true` / `ductile.switch.false` events - a `false` result bypasses the step and continues from the nearest downstream route ### 3.7 `with` (Payload Remap for `uses` Steps) `with` lets a `uses` step add or override top-level payload keys immediately before the plugin is spawned. ```yaml steps: - id: notify uses: discord_notify with: message: "{payload.stdout}" channel: "{context.origin_channel_id}" summary: "Build finished: {payload.status}" ``` Rules: - `with` is only valid on `uses` steps. - Each value is evaluated against a snapshot of the merged `payload.*` and `context.*` scope. - `context.*` values only exist if an upstream step claimed them with `baggage`. - A pure reference such as `{payload.count}` preserves the original type. - A mixed template such as `Build: {payload.status}` produces a string. 
- `with` entries do not see each other's output. They all read from the same pre-remap snapshot. - Invalid paths or malformed templates fail the job. Ductile does not silently substitute `null` or `""`. ### 3.8 `baggage` (Explicit Durable Context for `uses` Steps) `baggage` names the facts that should survive beyond the immediate plugin request. It is only valid on `uses` steps. Payload is per-hop. A plugin may emit useful fields, but those fields are not durable unless the pipeline author claims them with `baggage`. Plugin manifests help authors choose these mappings. Names-only `values.consume` says what request payload names a command consumes, and `values.emit` says what event payload names a command emits. The author still chooses durable names: ```yaml # plugin manifest commands: - name: handle type: write values: consume: - payload.url emit: - event: content_ready values: - payload.url - payload.content_hash - payload.truncated ``` ```yaml # pipeline steps: - id: summarize uses: fabric baggage: web.url: payload.url web.content_hash: payload.content_hash web.truncated: payload.truncated ``` ```yaml steps: - id: process uses: content_processor baggage: content.text: payload.content content.input_status: payload.status - id: notify uses: discord_notify baggage: processor.result: payload.result processor.exit_code: payload.exit_code with: message: "{payload.result}" ``` Rules: - `baggage` is only valid on `uses` steps. - Mapping keys are durable dotted paths such as `content.text` or `processor.result`. - Mapping values are source expressions resolved from `payload.*` or `context.*`. - Missing source paths fail the job or trigger. Ductile does not silently skip missing durable claims. - Durable context is deep-accreted. A downstream step may add new paths, but may not change an inherited path to a different value. - Repeating the same inherited value is allowed. 
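The `baggage` rules above (explicit claims, failure on missing source paths, and immutable inherited paths) can be sketched in Python. This is a minimal illustration under stated assumptions, not Ductile's Go implementation: `lookup`, `accrete`, `claim`, and `BaggageConflict` are hypothetical names, and only leaf values are handled (object values are not merged key by key).

```python
import copy
from typing import Any


class BaggageConflict(Exception):
    """Raised when a claim would rewrite an inherited durable path."""


def lookup(scope: dict, path: str) -> Any:
    """Resolve a dotted source path; a missing path is an error, not a skip."""
    node: Any = scope
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"missing source path: {path}")
        node = node[part]
    return node


def accrete(context: dict, dotted_key: str, value: Any) -> None:
    """Add value at dotted_key in place, refusing to change an existing value."""
    *parents, leaf = dotted_key.split(".")
    node = context
    for part in parents:
        node = node.setdefault(part, {})
        if not isinstance(node, dict):
            raise BaggageConflict(f"{dotted_key}: parent {part!r} is not an object")
    if leaf in node and node[leaf] != value:
        raise BaggageConflict(f"{dotted_key}: inherited value may not change")
    node[leaf] = value  # new path, or a repeat of the same value


def claim(inherited: dict, baggage: dict, payload: dict) -> dict:
    """Return a new durable context: inherited facts plus this step's claims."""
    result = copy.deepcopy(inherited)
    scope = {"payload": payload, "context": inherited}
    for durable_key, source in baggage.items():
        accrete(result, durable_key, lookup(scope, source))
    return result


inherited = {"web": {"url": "https://example.com"}}
payload = {"url": "https://example.com", "content_hash": "abc123"}

ctx = claim(
    inherited,
    {"web.url": "payload.url", "web.content_hash": "payload.content_hash"},
    payload,
)
print(ctx)  # {'web': {'url': 'https://example.com', 'content_hash': 'abc123'}}
```

Repeating `web.url` with the same value succeeds, while claiming it with a different value raises `BaggageConflict`. The inherited dictionary is copied rather than mutated, which parallels each hop getting its own context rather than editing its parent's.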
Bulk import is available when an object should be promoted under a named namespace: ```yaml steps: - id: transcribe uses: whisper baggage: from: payload.metadata namespace: whisper ``` This imports `payload.metadata` as `context.whisper.*`. The namespace is required until plugin manifest default namespaces exist. Without a namespace, Ductile rejects the claim rather than placing generic keys at the durable root. Use `baggage` for durable facts and `with` for the next plugin request. These are separate concerns: ```yaml steps: - id: notify uses: discord_notify baggage: status.current: payload.status with: message: "Status changed to {payload.status}" ``` In this example, `status.current` is durable. `message` is just the request sent to `discord_notify`. ______________________________________________________________________ ## 4. How Data Flows ### 4.1 Filesystem (Plugin-managed) - Ductile core does not provision a workspace directory for jobs. - If Step A needs to hand a file to Step B, the producing plugin writes to a path it chooses (e.g. under `~/.cache//` or a `mktemp -d`) and the path is propagated as baggage via `with:`. - See `docs/PLUGIN_DEVELOPMENT.md` §9 for plugin-side guidance. ### 4.2 The Control Plane (Baggage) - Metadata (JSON) is stored in the `event_context` database table. - Every step receives durable context claimed by upstream steps. - New durable facts are claimed explicitly with `baggage`. - Existing durable paths are immutable: descendants may add new paths or repeat the same value, but may not rewrite inherited facts. - If a step does not declare `baggage`, it contributes no new durable facts. Its event payload is still the immediate input to downstream routing and plugin execution, but it is not written into `event_context` implicitly. ### 4.3 Results & Payloads - The event `payload` from Step A is passed to Step B as the immediate payload. - `with` can reshape that immediate payload before the plugin is spawned. 
- `baggage` can promote selected immediate payload fields into durable context. - In `synchronous` mode, the final API response aggregates the results from every step. - **Synthetic events:** If a pipeline step completes successfully but emits no events, Ductile routes a synthetic `ductile.step.succeeded` event to ensure downstream sequential steps are still triggered. ______________________________________________________________________ ## 5. Decision Making Ductile supports two kinds of decision making: ### 5.1 Native step gating with `if` Use `if` when you want to decide whether a step should run based on the current payload, accumulated context, or plugin config. Internally Ductile inserts a `core.switch` decision hop so the branch is explicit and observable. ### 5.2 Event-driven branching Ductile also supports **Event-Driven Branching**. A plugin decides the next path by choosing which event type to emit. 1. **Step 1:** Plugin `classifier` inspects data. 1. **Output:** Plugin emits `type: "image.detected"` or `type: "text.detected"`. 1. **Routing:** You define two pipelines—one `on: image.detected` and one `on: text.detected`. Use this when the plugin is making a domain decision about what happened. Use `if` when the pipeline is making a structural decision about whether a step should run. ______________________________________________________________________ ## 6. Dispatcher Preflight Before spawning a plugin process, the dispatcher runs a **preflight phase** for every job. Preflight separates orchestration decisions from plugin execution, ensuring consistent data-plane semantics regardless of whether a step is user-defined or an internal orchestration primitive such as `core.switch`. ### 6.1 Preflight Steps Preflight executes two operations in order: 1. **Load request context** — Fetches accumulated baggage from the `event_context` table (all upstream metadata for this job's execution tree). 1. 
**Prepare for execution** — User-defined `uses` steps may apply `with` remaps after the governance payload/context merge. Internal `core.switch` jobs evaluate the compiled condition and emit `ductile.switch.true` or `ductile.switch.false`. ### 6.2 Preflight Outcomes | Outcome | When | Effect | | -------- | ----------------------------------------------------------- | ---------------------------------------------------- | | **run** | Context loaded successfully | Plugin process or internal builtin executes normally | | **skip** | Reserved for explicit orchestration skip paths | Rare for authored `if:` pipelines | | **fail** | Context load, remap, or builtin evaluation returns an error | Job marked `failed`; no downstream routing | ### 6.3 Conditional Branch Routing When a compiled `if:` step is reached, the dispatcher runs the internal `core.switch` job. That job: 1. Evaluates the compiled condition against `payload.*`, `context.*`, and `config.*`. 1. Emits either `ductile.switch.true` or `ductile.switch.false`. 1. Lets the router dispatch either the gated step or the bypass path. Successor routing still happens before the deciding job is marked terminal, preventing synchronous callers from seeing the tree as complete before all children are enqueued. ### 6.4 Preflight Events The dispatcher emits a `job.preflight` event after preflight completes (or fails), with the following payload: ```json { "job_id": "uuid", "plugin": "plugin-name", "command": "command-name", "decision": "run | skip | fail", "reason": "" } ``` The `reason` field is empty for `run` decisions, contains the condition failure reason for `skip`, and contains the error message for `fail`. These events enable async consumers (TUI, event streams, monitoring) to distinguish orchestration decisions from plugin execution outcomes. ______________________________________________________________________ ## 8. 
Lifecycle Hooks (`on-hook`) Lifecycle hooks allow pipelines to trigger based on **system events** (e.g., job completion) rather than plugin-emitted events. Hook pipelines run as independent root jobs and do not inherit context from the job that triggered them. ### 8.1 DSL Syntax Use the `on-hook:` keyword instead of `on:`. These keywords are mutually exclusive. ```yaml pipelines: - name: notify-on-failure on-hook: job.completed steps: - uses: discord-notify if: path: payload.status op: neq value: succeeded ``` ### 8.2 Supported Signals | Signal | Triggered When | | --------------- | ------------------------------------------------------------------------------------ | | `job.completed` | A root job reaches a terminal state (`succeeded`, `failed`, `timed_out`, or `dead`). | ### 8.3 Opt-in Configuration To prevent accidental infinite loops and reduce noise, plugins must explicitly opt-in to lifecycle hooks in their configuration. ```yaml plugins: my-important-plugin: notify_on_complete: true # Required for on-hook: job.completed to fire ``` ______________________________________________________________________ ## 9. Failure States & Event Payloads When a job fails, times out, or becomes "dead" (exceeds retries), Ductile emits specialized events. These events include enhanced payloads to simplify downstream notifications. ### 9.1 Enhanced Payload Fields In addition to standard fields like `job_id` and `duration_ms`, failure events (`job.failed`, `job.timed_out`, `job.dead`) include: | Field | Description | Example | | --------- | -------------------------------------------------------------- | ----------------------------------------- | | `plugin` | The name of the plugin that failed. | `git-sync` | | `message` | A human-readable summary of the failure. | `Job failed [git-sync]: connection reset` | | `text` | An alias for `message` (convenience for notification plugins). | `Job failed [git-sync]: connection reset` | | `error` | The raw error message (if available). 
| `connection reset` | ### 9.2 Usage in Pipelines These fields enable simple notification steps without complex `if` logic or payload mapping: ```yaml pipelines: - name: failure-announcer on-hook: job.completed steps: - uses: discord-notify if: path: payload.status op: neq value: succeeded # discord-notify automatically uses payload.message if present ``` ______________________________________________________________________ ## 10. Validation Ductile performs several checks when loading pipelines: - **Cycle Detection:** Refuses to start if a pipeline calls itself (directly or indirectly). - **Shadowing:** Ensures two pipelines don't use the same name. - **Dangling Calls:** Ensures every `call` references a valid pipeline name. - **Condition Validation:** Verifies `if` trees have valid shape, supported operators, allowed roots, and safe depth/count limits. - **Schema Validation:** Verifies the YAML structure against the official [pipelines.json](https://ductile.run/PIPELINES/schemas/pipelines.schema.json). # Ductile — Routing & Orchestration Specification **Version:** 1.0\ **Date:** 2026-02-11\ **Model:** Governance Hybrid (DB-only) > **Note:** the original spec described a Data Plane consisting of core-managed workspace directories. The core no longer provisions per-job workspaces; filesystem state is the plugin's concern. Sections below referring to `workspace_dir` are retained for historical context but no longer describe runtime behaviour. ______________________________________________________________________ ## 1. Overview Ductile uses a **Graph-based Pipeline** model to orchestrate event flow. It separates **Governance** (metadata/context) from **Execution** (plugin-spawned subprocesses). ### 1.1 Core Components - **Control Plane (DB):** A SQLite ledger (`event_context`) that accumulates metadata ("Baggage") across hops. 
- **Filesystem (Plugin-managed):** Plugins that need a scratch path or persistent cache create and manage it themselves; the core does not provision a per-job directory. - **Orchestrator (DSL):** A YAML-based Pipeline DSL that supports nesting, branching, and single-root triggers. ______________________________________________________________________ ## 2. Pipeline DSL Pipelines are defined in YAML files referenced via `include:` in `config.yaml` (files or directories). ### 2.1 Syntax ```yaml pipelines: - name: wisdom-chain on: discord.video_link_received # The "Single Root" trigger steps: - id: downloader uses: yt-dlp-plugin - id: processing call: standard-audio-wisdom # Nested Pipeline call - id: delivery split: # Branching logic - uses: discord-notifier - steps: # Sequential branch - uses: s3-archiver - uses: db-indexer ``` ### 2.2 Functional Blocks - **uses:** Execute a specific plugin command. - **call:** Execute another named pipeline (reusable middleware). - **split:** Branch execution into multiple parallel paths. - **on:** The event that triggers the root of the pipeline. - **on-hook:** The lifecycle signal that triggers the root of the pipeline (e.g., `job.completed`). Mutually exclusive with `on`. ______________________________________________________________________ ### 2.3 Lifecycle Hooks Lifecycle hooks allow for out-of-band orchestration triggered by the **Dispatcher** rather than a plugin event. 1. **Opt-in:** A plugin must have `notify_on_complete: true` in its operator configuration. 1. **Signal:** When the job reaches a terminal state, the Dispatcher resolves any pipelines matching the signal (e.g., `job.completed`). 1. **Isolation:** Hook pipelines run as fresh root jobs with no context inheritance from the triggering job. ______________________________________________________________________ ## 3. The Control Plane (Baggage & Ledger) Every job in a pipeline is associated with an `event_context`.
### 3.1 `event_context` Schema ```sql CREATE TABLE event_context ( id TEXT PRIMARY KEY, -- UUID parent_id TEXT, -- FK for lineage pipeline_name TEXT, step_id TEXT, accumulated_json JSON NOT NULL, -- The "Baggage" created_at TEXT NOT NULL ); ``` ### 3.2 Explicit Context Accumulation Baggage is explicit: plugins emit event payloads; pipeline authors decide which values become durable. When Step A transitions to Step B: 1. Core reads `accumulated_json` from Step A's context. 1. If Step B declares `baggage`, Core evaluates those claims against the immediate event `payload.*` and inherited `context.*`. 1. Core deep-accretes the claimed values into a new `event_context` row for Step B. 1. Existing durable paths are immutable. A step may add a new path or repeat the same value, but may not rewrite an inherited path. Example: ```yaml steps: - id: fetch uses: web_fetch baggage: web.url: payload.url - id: summarize uses: summarizer baggage: web.content: payload.content web.status_code: payload.status_code ``` Bulk import is allowed only under an explicit namespace: ```yaml baggage: from: payload.metadata namespace: whisper ``` This imports `payload.metadata` as `context.whisper.*`. Omitting `namespace` is rejected until plugin manifest default namespaces exist. If a step declares no `baggage`, Core creates no new durable context for that hop beyond inherited baggage and control-plane fields. Immediate event payload still flows to downstream steps, but it is not promoted into `event_context` implicitly. ______________________________________________________________________ ## 4. Filesystem (Plugin-managed) The core does not provision per-job workspace directories. The previous "Data Plane" section described a hard-linked, janitor-pruned `/ws/` tree; that machinery has been removed. Plugins that need filesystem state are responsible for it: - **Ephemeral scratch:** `mktemp -d` (or language equivalent), cleaned up on exit. 
- **Persistent cache:** `~/.cache/ductile-/` or a path declared in plugin config and validated at startup. - **Step-to-step file passing:** the producing plugin writes to a path it chooses; the path is propagated as baggage via the pipeline's `with:` remap so the consuming plugin can read it. See `docs/PLUGIN_DEVELOPMENT.md` §9 for details. ______________________________________________________________________ ## 5. The Plugin Protocol (v2) Plugins receive the following via `stdin`: ```json { "protocol": 2, "job_id": "uuid-456", "context": { "origin_plugin": "discord", "channel_id": "123", "permission_tier": "WRITE" }, "event": { "type": "video_downloaded", "payload": { "filename": "lecture.mp4", "size_bytes": 10485760 } } } ``` ### 5.1 Plugin Responsibilities - **Metadata:** Read durable facts and routing info from `context`. - **Artifacts:** Read/write files at plugin-managed paths (see §4). - **Communication:** Emit event payloads for downstream steps. Payload is per-hop; values become durable only when a pipeline author claims them with `baggage`. ______________________________________________________________________ ## 6. Failure & Recovery ### 6.1 State Persistence Because `event_context` is stored in SQLite, a crash is non-destructive for the control plane. - The **LLM Operator** can inspect the `event_context` to see exactly where a pipeline stalled. - The Core can "Replay" a step by creating a new job using the existing `event_context_id`. Recovering plugin-managed filesystem state is the plugin's responsibility. ### 6.2 Cycle Detection The Core maintains a `hop_count` in the `event_context`. If a pipeline exceeds 20 hops (or recurses into itself too deeply), the Core kills the chain to prevent infinite loops. ______________________________________________________________________ ## 7.
CLI & Operations All orchestration-related CLI commands MUST support the following flags to ensure safety and observability: - **-v, --verbose:** Expose internal DAG resolution, baggage merging logic, and path calculations. - **--dry-run:** Preview the next steps of a pipeline without enqueuing jobs. ### 7.1 LLM Operator Affordances (RFC-004) The Routing system exposes specific "Admin Utilities" for the LLM: * `job inspect `: Returns the full Graph of what happened. * `pipeline visualize `: Returns a Mermaid.js diagram of the DSL. * `pipeline dry-run `: Executes the plugin in a sandbox; any filesystem isolation is the plugin's responsibility. ## 8. Branching & Decisions Ductile supports two models for decision making: **Step-Gating (DSL)** and **Multi-Event Branching (Plugin)**. ### 8.1 Step-Gating via `if` Pipelines can use the `if` keyword on any step to decide whether it should run based on the current payload, accumulated context, or plugin configuration. ```yaml - id: notifier uses: discord-notifier if: path: payload.status op: eq value: error ``` Authored `if:` conditions compile into an internal `core.switch` hop. That hop emits `ductile.switch.true` or `ductile.switch.false`, so the gated step only runs on the true branch while the false branch bypasses directly to the downstream route. ### 8.2 Multi-Event Branching For complex domain-level decisions, plugins are responsible for emitting specific **Event Types** to signal different outcomes. **Example Pipeline:** ```yaml - id: validator uses: schema-checker # The router matches the emitted event type to the next pipeline or step. ``` This pattern keeps the DSL declarative while offloading complex logic to the plugins. # Scheduler Detailed reference for Ductile's scheduler behavior and schedule configuration. ## Overview The scheduler runs a single heartbeat loop. On each tick it evaluates all enabled plugin schedules and enqueues due jobs. 
Each schedule entry is tracked independently in the `schedule_entries` table so catch-up and next-run behavior are stable across restarts. Jobs are always enqueued as the schedule's `command` (default: `poll`) and include the schedule's `payload`. ## Schedule Entry Fields ```yaml plugins: example: schedules: - id: hourly command: poll every: 1h jitter: 30s catch_up: run_once if_running: skip only_between: "08:00-18:00" timezone: "Australia/Sydney" not_on: [saturday, sunday] payload: source: scheduler ``` ### Common Fields - `id`: Unique schedule ID within the plugin (default: `default`). - `command`: Command to run (default: `poll`). - `payload`: JSON object merged into the command payload. ### Schedule Types Exactly one of the following should be set: - `every`: Interval schedule (supports `5m`, `15m`, `30m`, `hourly`, `2h`, `daily`, `weekly`, `monthly`). - `cron`: Standard 5-field cron (`min hour dom month dow`). - `at`: One-shot RFC3339 timestamp (UTC or offset). - `after`: One-shot delay from service start (duration). ## Time Constraints These constraints are applied before enqueueing a due job. - `jitter`: Random offset applied to interval schedules per run. - `only_between`: Time window in local schedule time (e.g. `"08:00-22:00"`). - Supports overnight windows such as `"22:00-06:00"`. - `timezone`: IANA timezone used for cron and time window evaluation. - `not_on`: Weekdays to skip (string names like `saturday` or integers `0-6`, `7` for Sunday). ## Catch-up Policy On startup, the scheduler can run missed ticks based on `catch_up`: - `skip` (default): Ignore missed intervals. - `run_once`: Enqueue a single catch-up job if any ticks were missed. - `run_all`: Enqueue one job per missed interval (bounded to 100 runs). Catch-up applies only to `every` schedules. Catch-up jobs use a `catchup`-scoped dedupe key to avoid duplication. 
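The three `catch_up` policies can be sketched as a small function. This is an illustrative sketch, not Ductile's scheduler code: the function name `catch_up_job_count` and the rule that a tick counts as missed once a whole interval has elapsed since the last fire are assumptions made for the example. Only the policy names and the 100-run bound come from the text above.

```python
from datetime import datetime, timedelta, timezone

MAX_CATCH_UP_RUNS = 100  # bound on `run_all` stated in the policy list
POLICIES = ("skip", "run_once", "run_all")


def catch_up_job_count(last_fired_at, now, every, policy="skip"):
    """Return how many catch-up jobs to enqueue at startup (hypothetical helper)."""
    if policy not in POLICIES:
        raise ValueError(f"unknown catch_up policy: {policy!r}")
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    # Assumption: one missed tick per whole interval elapsed since the last fire.
    missed = max(0, (now - last_fired_at) // every)
    if policy == "skip" or missed == 0:
        return 0
    if policy == "run_once":
        return 1
    # run_all: one job per missed interval, capped at the documented bound.
    return min(missed, MAX_CATCH_UP_RUNS)


last = datetime(2026, 1, 1, tzinfo=timezone.utc)
now = last + timedelta(hours=3, minutes=30)
print(catch_up_job_count(last, now, timedelta(hours=1), "run_all"))
```

With an hourly schedule that last fired three and a half hours ago, `skip` enqueues nothing, `run_once` enqueues one job, and `run_all` enqueues one job per missed interval under this sketch's counting rule.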
## Overlap Policy `if_running` controls what happens when a prior job is still in-flight: - `skip` (default): Do not enqueue a new job. - `queue`: Enqueue regardless of in-flight jobs. - `cancel`: Cancel outstanding jobs for the same plugin/command, then enqueue. ## Poll Guard A global per-plugin guard prevents multiple concurrent scheduled polls: ```yaml plugins: example: max_outstanding_polls: 1 ``` If a matching `queued` or `running` job exists, the scheduler skips enqueueing. ## Circuit Breaker Scheduler-originated polls respect the circuit breaker: - Opens after `threshold` consecutive failures. - Remains open for `reset_after`. - Half-open probe allows one poll; success closes the circuit. - Current state is stored in `circuit_breakers`; append-only history is stored in `circuit_breaker_transitions`. - Operators can inspect history with `ductile system breaker [--json]`. ```yaml plugins: example: circuit_breaker: threshold: 3 reset_after: 30m ``` ## State Tracking Schedule state is stored in `schedule_entries`: - `last_fired_at`: Last time the scheduler attempted to enqueue. - `last_success_at` / `last_success_job_id`: Latest successful run. - `next_run_at`: Next due timestamp. - `status`: `active`, `paused_invalid`, `paused_manual`, `exhausted`. One-shot schedules (`at`, `after`) transition to `exhausted` after firing. ## Examples ### Cron with timezone ```yaml plugins: reports: schedules: - id: weekdays-9am cron: "0 9 * * 1-5" timezone: "Australia/Sydney" ``` ### One-shot at ```yaml plugins: reminder: schedules: - id: send-once at: "2026-03-15T14:00:00Z" ``` ### Only between + not_on ```yaml plugins: poller: schedules: - every: 5m only_between: "08:00-18:00" not_on: [saturday, sunday] ``` # Plugin Facts This document is the canonical reference for durable plugin memory in Ductile. **The model:** durable plugin truth is the append-only `plugin_facts` stream. 
`plugin_state` is a compatibility/cache view of the latest fact, kept current automatically by core so that legacy readers see the same shape they always have. New plugins declare `fact_outputs` in their manifest and let the view come for free. ## 1. What A Plugin Does A plugin that needs to remember anything across invocations follows this pattern: 1. The plugin emits a successful, stable snapshot in `state_updates`. 1. The plugin manifest declares that snapshot as a fact output. 1. Core records the snapshot as an append-only row in `plugin_facts`, with a Ductile-owned monotonic `seq` and the declared `fact_type`. 1. Core rebuilds the compatibility `plugin_state` row from the newest fact according to the declared `compatibility_view` (currently `mirror_object`). This means: - `plugin_facts` is the durable record. - `plugin_state` is the compatibility/cache view. ```text Previously, plugins wrote durable truth directly into `plugin_state` via shallow merge of `state_updates`. That model is legacy; new plugins should always declare `fact_outputs`. Plugins still on direct write-through are running in a compatibility window. ``` ## 2. Migrated Plugins In-tree (codex repo): - `file_watch` `poll` → `file_watch.snapshot` - `folder_watch` `poll` → `folder_watch.snapshot` - `py-greet` `poll` → `py-greet.snapshot` - `ts-bun-greet` `poll` → `ts-bun-greet.snapshot` - `stress` `state` → `stress.state_snapshot` External plugins: - `gmail_poller` `poll` → `gmail_poller.snapshot` - `youtube_playlist` `poll` → `youtube_playlist.snapshot` - `jina-reader` `poll` → `jina-reader.snapshot` - `birdnet_firstday` `poll` → `birdnet_firstday.snapshot` - `sqlite_change` `poll` → `sqlite_change.snapshot` - `withings` `poll` and `token_refresh` → `withings.snapshot` `health` commands are intentionally **not** part of the durable fact flow. Health is diagnostic and should not mutate durable state. ## 3. 
Compliance Rules If you want a plugin to be compatible with this pattern, the plugin and core need a clear, defensible contract. ### Plugin-side rules - Emit facts only from commands that produce meaningful durable truth. - Prefer successful `poll` or equivalent snapshot-producing commands. - Do not use `health` or `init` as durable state — they should emit no `state_updates`. - Keep the emitted snapshot shape stable and explicit. - Return a full snapshot, not a partial patch. The compatibility view is rebuilt wholesale from the latest fact, so partial patches lose information. - Keep the snapshot JSON object-shaped and deterministic enough for operators to inspect; avoid non-deterministic ordering inside lists or maps. ### Core-side rules - Declare an explicit fact type in `manifest.yaml`. - Record each fact append-only in `plugin_facts`. - Declare how compatibility `plugin_state` is derived from that fact. - Add an operator-visible read path. - Add tests that prove both fact persistence and derived compatibility state. The smallest useful manifest shape is: ```yaml fact_outputs: - when: command: poll from: state_updates fact_type: file_watch.snapshot compatibility_view: mirror_object ``` ## 4. Recommended Fact Shape Use a fact when the plugin can answer: > "What is the current durable observed state of this plugin right now?" Good candidates: - watcher snapshots - cursors/checkpoints - discovered remote resource inventories - reducer-friendly state snapshots Poor candidates: - transient health checks - ephemeral timing/latency noise - values that are meaningful only to a single in-flight job For a first migration, prefer a **full snapshot** over incremental diffs. ## 5. 
Snapshot Examples ### `file_watch` `file_watch poll` returns a snapshot shaped like: ```json { "watches": { "single-file": { "exists": true, "fingerprint": "abc123", "size": 42, "mtime_ns": 1713740000000000000, "path": "/tmp/file.txt", "strategy": "sha256", "updated_at": "2026-04-22T01:02:03Z" } }, "last_poll_at": "2026-04-22T01:02:03Z" } ``` Core then: - stores that JSON in `plugin_facts.fact_json` - assigns a Ductile-owned `plugin_facts.seq` for new facts - tags it `file_watch.snapshot` - updates `plugin_state` for `file_watch` to the same snapshot shape This keeps legacy state readers working while giving operators an append-only history. ### `folder_watch` `folder_watch poll` returns the same top-level compatibility shape: ```json { "watches": { "docs": { "root": "/srv/content", "files": { "summary.md": "abc123" }, "snapshot_hash": "def456", "file_count": 1, "updated_at": "2026-04-22T01:02:03Z" } }, "last_poll_at": "2026-04-22T01:02:03Z" } ``` ### `py-greet` and `ts-bun-greet` The example greeting plugins emit a tiny full snapshot: ```json { "last_run": "2026-04-22T01:02:03Z", "last_greeting": "Hello, Ductile!" } ``` ### `stress` The `stress state` command emits the full compatibility snapshot for its only durable datum: ```json { "count": 42 } ``` ## 6. Migration Checklist For Another Plugin When migrating another plugin to `plugin_facts`, do all of the following: 1. Choose one command that produces durable truth. 1. Define one explicit fact type. 1. Make the plugin emit a stable object snapshot. 1. Ensure the snapshot is a full compatibility-state view, not just a partial patch. 1. Add `fact_outputs` to the plugin manifest. 1. Declare the compatibility view policy for `plugin_state`. 1. Add operator inspection support. 1. Add unit tests for persistence and derived state. 1. Add a Docker or similarly realistic fixture when runtime behavior matters. 1. Document the fact type, snapshot shape, and non-goals. ## 7. 
Questions To Resolve Before Adding A New Plugin Or Migrating One Before declaring `fact_outputs` for a plugin, answer: - What exact command owns durable truth? - Is the emitted JSON a full snapshot or only a delta? It must be a full snapshot — partial patches break the compatibility view. - Should the compatibility view mirror the newest fact exactly (`compatibility_view: mirror_object`), or does the plugin need a different reduction policy? Today only `mirror_object` is supported; a reducer-based policy would be a future extension. - What data should remain diagnostic only and stay out of durable storage? - How will an operator inspect recent facts? - What realistic test proves the fact path end to end? If those answers are vague, the plugin should remain on direct write-through (action-bookkeeping non-candidates) rather than declaring a half-thought fact contract. ## 8. Deployment Note For existing databases, apply required schema migrations before a normal deploy, then restart or deploy the updated binary. For non-empty existing databases, startup should validate and fail if required schema is missing. It should not silently add `plugin_facts`, `seq`, or related indexes during normal open. Startup errors should name the migration script needed for the current database shape. Existing rows without `seq` keep `seq` as `NULL`. Ductile does not backfill guessed order for legacy facts; new rows use `seq` for ordering, and legacy rows fall back to their previous timestamp order. # Building # Ductile Cookbook: Integration Patterns Practical recipes for wiring Ductile **Connectors** and **Orchestrations** to solve real-world problems. ______________________________________________________________________ ## Pattern: Automated Astro Staging Rebuild (Watch -> Trigger) **Use case:** Rebuild your Astro staging site automatically whenever new AI-generated summaries are added to a specific folder. 
### 1) Configure the `folder_watch` Connector Set up a **Proactive Operation** (`poll`) to scan your content directory. ```yaml # ~/.config/ductile/plugins.yaml plugins: folder_watch: enabled: true schedules: - id: default every: 1m config: watches: - id: astro_summaries root: ${HOME}/site/src/content/summaries event_type: astro.summaries.changed recursive: true include_globs: ["**/*.md"] emit_mode: aggregate ``` ### 2) Define the Rebuild Orchestration (Pipeline) Create a **Pipeline** to respond to the `astro.summaries.changed` event. ```yaml # ~/.config/ductile/pipelines.yaml pipelines: - name: astro-rebuild-on-change on: astro.summaries.changed steps: - id: rebuild_staging uses: astro_rebuild_staging # A sys_exec connector clone ``` ### 3) Configure the Rebuild Connector ```yaml # ~/.config/ductile/plugins.yaml plugins: astro_rebuild_staging: enabled: true timeout: 15m config: command: "docker compose -f ${HOME}/admin/docker-compose.yml up -d --build" working_dir: "${HOME}/admin" ``` ______________________________________________________________________ ## Pattern: YouTube Playlist-to-Summary Pipeline **Use case:** Automatically fetch, transcribe, and AI-summarise new videos from a YouTube playlist, then write the result to disk. This is a multi-hop pipeline where each step passes its output (`result`) as the next step's `content` via baggage. 
### 1) Configure the Playlist Watcher (Proactive) ```yaml # ~/.config/ductile/plugins.yaml plugins: youtube_playlist: enabled: true schedules: - every: 30m jitter: 2m timeout: 60s max_attempts: 2 config: playlist_url: "https://www.youtube.com/playlist?list=PL5Rty1LvKaJ5GI4nqEzvEPTdgobgODlkk" output_dir: "/home/matt/tmp/ductile-output" filename_template: "{video_id}.md" max_entries: 50 max_emit: 1 # Only process one new video per run emit_existing_on_first_run: false transcript_language: en ``` ### 2) Define the Processing Pipeline ```yaml # ~/.config/ductile/pipelines.yaml pipelines: - name: playlist-wisdom on: youtube.playlist_item steps: - id: transcript uses: youtube_transcript # fetches transcript, result = raw text - id: summarize uses: fabric # summarises transcript, result = markdown summary - id: write uses: file_handler # writes summary to disk ``` ### 3) Configure Supporting Plugins ```yaml # ~/.config/ductile/plugins.yaml plugins: youtube_transcript: enabled: true timeout: 60s max_attempts: 2 config: {} fabric: enabled: true timeout: 120s max_attempts: 2 config: FABRIC_DEFAULT_PATTERN: "summarize" file_handler: enabled: true timeout: 30s max_attempts: 1 config: allowed_write_paths: "/home/matt/tmp/ductile-output" default_output_dir: "/home/matt/tmp/ductile-output" ``` ### How it works 1. `youtube_playlist` polls the playlist every 30 min (with up to 2 min jitter). 1. For each new video, it emits a `youtube.playlist_item` event with `video_id` in the payload. 1. `youtube_transcript` fetches the transcript; its output (`result`) flows into the next step. 1. `fabric` summarises the transcript using the `summarize` pattern; its `result` becomes the markdown summary. 1. `file_handler` writes the summary to `{video_id}.md` in the output directory. ### Notes - Set `max_emit: 1` to process one new video per poll cycle — avoids burst load on initial run. - `emit_existing_on_first_run: false` means already-seen videos are skipped on restart. 
- `youtube_playlist` uses `yt-dlp --flat-playlist` internally; ensure yt-dlp is installed and on PATH. - For systemd services, add `~/.local/bin` to the service `Environment="PATH=..."` line. ______________________________________________________________________ ## Pattern: Discord Notifications via Incoming Webhook **Use case:** Send messages to a Discord channel from any pipeline step or scheduled trigger. The `discord_notify` plugin wraps Discord's incoming webhook API. It exposes two schedulable-friendly commands: - `handle` — called from pipeline steps (event-driven) - `poll` — identical behaviour, but allowed in `schedules:` blocks (ductile forbids `handle` in schedules) ### 1) Configure the Plugin ```yaml # ~/.config/ductile/plugins.yaml plugins: discord_notify: enabled: true timeout: 15s max_attempts: 2 config: webhook_url: "${DISCORD_WEBHOOK_URL}" # or hard-code in tokens.yaml default_username: "Ductile" ``` ### 2) Use in a Pipeline Step The plugin reads `message`, `content`, `result`, or `title` from the payload (in that order). ```yaml pipelines: - name: notify-on-build on: build.complete steps: - id: notify uses: discord_notify # payload.result from the previous step becomes the Discord message ``` Or pass a static message: ```yaml pipelines: - name: notify-on-error on: job.failed steps: - id: alert uses: discord_notify payload: message: "A job failed — check the dashboard." ``` ### 3) Use as a Scheduled Heartbeat `poll` is the schedulable alias for `handle`. Use it with `schedules:` blocks to send timed notifications. ```yaml plugins: discord_notify: schedules: # Daily 09:00 status ping - id: morning-ping cron: "0 9 * * *" command: poll payload: message: "Good morning — Ductile is running." not_on: [saturday, sunday] # Startup one-shot - id: boot-notify after: 30s command: poll payload: message: "Ductile started." ``` ### Notes - Discord hard-limits messages to 2000 characters; the plugin truncates automatically. 
- 4xx errors (bad webhook, forbidden) are not retried. 5xx and network errors retry per `max_attempts`. - `poll` and `handle` are identical in behaviour; the distinction is purely for ductile's scheduler validation. - Store the webhook URL in `tokens.yaml` and reference it via `${VAR}` interpolation to keep it out of operational config files. ______________________________________________________________________ ## Pattern: End-to-End: Playlist → Summary → Discord Notification **Use case:** Combine the two patterns above — automatically process new playlist videos and notify a Discord channel when a summary is written. ```yaml # ~/.config/ductile/pipelines.yaml pipelines: - name: playlist-wisdom on: youtube.playlist_item steps: - id: transcript uses: youtube_transcript - id: summarize uses: fabric - id: write uses: file_handler - id: notify uses: discord_notify # After file_handler, result contains the output path or confirmation. # Or set a static title: payload: title: "New summary ready" # message will fall through to context.result from the write step ``` This gives you an automated, end-to-end pipeline with Discord confirmation for every new video processed. ______________________________________________________________________ ## Pattern: Route YouTube vs Web URLs **Use case:** Use the `if` classifier to emit different event types based on URL content. 
### 1) Configure the classifier instance ```yaml # ~/.config/ductile/plugins.yaml plugins: check_youtube: enabled: true timeout: 30s max_attempts: 1 config: field: text checks: - contains: "youtu.be" emit: youtube.url.detected - contains: "youtube.com" emit: youtube.url.detected - startswith: "http" emit: web.url.detected - default: text.received ``` ### 2) Use it in a pipeline ```yaml # ~/.config/ductile/pipelines.yaml pipelines: - name: ai-dispatch on: discord.ai.command steps: - id: classify uses: check_youtube - name: youtube-wisdom on: youtube.url.detected steps: - uses: youtube_transcript - uses: fabric - uses: file_handler - name: web-summarize on: web.url.detected steps: - uses: fabric ``` ### Notes - `default` is a final fallback; omit it if you want no-match to error. - The plugin passes the payload through unchanged. ______________________________________________________________________ ## Adding your own patterns Each recipe follows the same structure: configure a plugin, define a pipeline, wire the events. If you have a working integration worth sharing, add it here. # Plugin Development Guide Ductile is built on a **spawn-per-command** model. A plugin is any executable that reads one JSON request from `stdin`, writes one JSON response to `stdout`, and exits. There is no daemon, no shared memory, no in-process state. **Durable plugin memory is the append-only `plugin_facts` stream.** A plugin that needs to remember anything across invocations declares a `fact_outputs` rule in its manifest, returns a stable snapshot from its durable command, and lets core record that snapshot append-only and rebuild the compatibility view automatically. This guide treats the manifest as the contract that drives plugin quality — every directive is explained below and exists to push you toward the correct shape. If you find yourself wanting to do something the manifest doesn't sanction, that is usually a signal to step back rather than add a workaround. 
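As a rough illustration of the spawn-per-command model, the sketch below reads one JSON request and produces one JSON response. It is a minimal sketch, not a reference plugin: the helper names `respond` and `run`, the `greeting` snapshot key, and the `message` config key are hypothetical. The envelope fields (`protocol`, `command`, `status`, `result`, `error`, `retry`, `events`, `state_updates`) and exit code `78` follow the protocol described later in this guide.

```python
#!/usr/bin/env python3
"""Minimal protocol-2 plugin sketch: one JSON request in, one JSON response out."""
import json
import sys

SUPPORTED_PROTOCOL = 2
EX_CONFIG = 78  # permanent configuration failure, treated as non-retryable


def respond(request: dict) -> dict:
    """Build a response envelope for one request envelope (hypothetical helper)."""
    command = request.get("command")
    if command == "health":
        # Diagnostic only: health emits no state_updates.
        return {"status": "ok", "result": "healthy"}
    if command == "poll":
        # Hypothetical observation; a real plugin would inspect an external source.
        snapshot = {"greeting": request.get("config", {}).get("message", "hello")}
        return {
            "status": "ok",
            "result": "observed 1 greeting",
            "events": [],
            "state_updates": snapshot,  # full snapshot, not a partial patch
        }
    # Retrying an unsupported command cannot succeed, so mark it non-retryable.
    return {"status": "error", "error": f"unsupported command: {command!r}", "retry": False}


def run(raw_request: str) -> tuple[int, str, str]:
    """Return (exit_code, stdout_text, stderr_text) for one raw request."""
    request = json.loads(raw_request)
    if request.get("protocol") != SUPPORTED_PROTOCOL:
        return EX_CONFIG, "", f"unsupported protocol: {request.get('protocol')!r}\n"
    return 0, json.dumps(respond(request)), ""


# As an entrypoint script, the wiring would look like this:
#   code, out, err = run(sys.stdin.read())
#   sys.stdout.write(out); sys.stderr.write(err); sys.exit(code)
```

Keeping `run` free of direct stream access is a design choice for the sketch only: it lets the request handling be exercised without spawning a process.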
See [Plugin Facts](https://ductile.run/PLUGIN_FACTS/index.md) for the canonical reference and worked examples of the durability contract. ______________________________________________________________________ ## 1. The Lifecycle When a job is triggered (via scheduler, API, or webhook): 1. Ductile forks the plugin entrypoint as a fresh process. 1. The core writes a **request envelope** (JSON) to the plugin's `stdin`. 1. The plugin processes the command and writes a **response envelope** (JSON) to `stdout`, then exits. 1. Ductile captures `stderr` for logging and kills the process if it exceeds the timeout. Because every invocation is a fresh process, the plugin has no in-memory state across calls. Anything the plugin needs to remember must come back through the request envelope's `state` field on the next invocation. ______________________________________________________________________ ## 2. The plugin protocol ### 2.1 Request Envelope (Core → Plugin) ```json { "protocol": 2, "job_id": "uuid", "command": "poll | handle | health | init", "config": {}, "state": {}, "context": {}, "event": {}, "deadline_at": "ISO8601" } ``` | Field | What it is | | ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `protocol` | The wire-protocol version. Plugins declare which version they expect in `manifest.protocol`; mismatch refuses load. | | `job_id` | Ductile-assigned unique id for this invocation. Useful in logs and downstream events. | | `command` | The command the plugin is being asked to run. Always one of `poll`, `handle`, `health`, `init` (plus any plugin-declared command name). | | `config` | The static plugin config from the operator's YAML, with `${ENV}` interpolated. Read-only. | | `state` | The plugin's current compatibility-view row — i.e. 
the latest fact's snapshot for plugins that declare `fact_outputs`, or the direct-write `plugin_state` row for plugins that have not yet migrated. Treat it as *"what I knew last time."* | | `context` | Shared baggage carried across the pipeline chain. Operator-declared, immutable in the receiving plugin. | | `event` | Present only for `handle`. The triggering event envelope from upstream. | | `deadline_at` | Informational ISO8601 timestamp. Plugins may abandon long work early; core enforces the real deadline externally. | ### 2.2 Response Envelope (Plugin → Core) ```json { "status": "ok | error", "result": "short human-readable summary", "error": "human-readable message (when status=error)", "retry": true, "events": [], "state_updates": {}, "logs": [] } ``` | Field | What it is | | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | `status` | `ok` for success, `error` for any failure. | | `result` | **Required when `status=ok`.** Short human-readable summary of what happened. Surfaces in `ductile job inspect`, the watch UI, and as the result for synchronous pipelines. | | `error` | **Required when `status=error`.** Human-readable diagnostic. | | `retry` | Response-envelope compatibility signal. Defaults `true` if omitted. Set `false` only when retrying the same request cannot succeed (configuration error, permanent input invalid). Core owns the final retry decision; this is a *fact about the failure*, not a policy instruction. | | `events` | Array of `{type, payload, dedupe_key?}` envelopes that drive downstream pipeline routing. | | `state_updates` | The plugin's emitted **snapshot** for this invocation. 
When the manifest declares a matching `fact_outputs` rule, core records this snapshot append-only as a `plugin_facts` row and rebuilds the compatibility view (`plugin_state`) from it. See §3.3. | | `logs` | Array of `{level, message}`. Stored with the job record. | ### 2.3 What `state_updates` Is, And What It Is Not `state_updates` is the snapshot of the plugin's **observed durable state at the end of this invocation**. It is not a partial patch and it is not a running diary of actions taken. A correct snapshot: - Is a full object representing the plugin's durable observed state. - Contains the same keys on every invocation of that command (presence-stable). - Is deterministic: the same observed inputs produce the same bytes out. - Has a clear cache-view story: a downstream reader of the latest snapshot understands what the plugin knows. An incorrect snapshot (anti-patterns; see §6): - A counter that increments each invocation (`executions_count`). - A timestamp that updates whether or not anything was observed (`last_run`). - A diff or patch (`{"new_id": "abc"}`). - A list built from a `set()` union (non-deterministic order). If a plugin emits action bookkeeping rather than observed state, it should emit no `state_updates` at all. Action bookkeeping belongs in `job_log`, which is captured automatically. ### 2.4 Framing And Errors - One JSON request on stdin → one JSON response on stdout. Not JSON Lines, not length-prefixed. - Exit code `78` (EX_CONFIG) marks a permanent configuration failure and is treated as non-retryable regardless of the `retry` field. - If the request `protocol` field doesn't match what the plugin expects, the plugin should exit `78` with a clear error on stderr. ______________________________________________________________________ ## 3. The Manifest (`manifest.yaml`) The manifest is the single source of truth for what the plugin is, what it does, and how its memory works.
Treat reading this section top-to-bottom as a quality checklist for any new plugin. ### 3.1 Top-Level Fields ```yaml manifest_spec: ductile.plugin # required manifest_version: 1 # required name: my_plugin # required version: 0.1.0 # required protocol: 2 # required entrypoint: run.py # required description: "What this plugin does, in one sentence." # optional but recommended concurrency_safe: true # optional; default true commands: [...] # required, at least one fact_outputs: [...] # required for any plugin with durable memory config_keys: # optional; declares config contract required: [...] optional: [...] ``` #### `manifest_spec` (required) Must be the literal string `ductile.plugin`. Identifies this YAML as a ductile plugin manifest. Future manifest families (e.g. an event spec) would use a different identifier. #### `manifest_version` (required) Must be the integer `1`. Ductile uses this to evolve manifest semantics accretively without breaking existing plugins. #### `name` (required) The plugin's identity. Must be unique across all plugin roots — first plugin discovered with a given name wins; later duplicates are ignored. Use underscores or hyphens, no spaces. Pipelines, schedules, and routes refer to the plugin by this name. #### `version` (required) The plugin's release identity over time. Free-form string; prefer semver-compatible `MAJOR.MINOR.PATCH`. Bump when behaviour changes so operators can correlate facts and job logs to plugin version. #### `protocol` (required) Must be `2`. Declares the wire protocol version this plugin understands. Mismatch refuses load; do not lie about protocol support. #### `entrypoint` (required) Path to the executable, relative to the plugin directory. Must be marked executable (`chmod +x`). The shebang line picks the interpreter. No `..` allowed (path traversal prevention). Examples: `run.py`, `run.sh`, `./bin/dispatcher`. #### `description` (optional, recommended) Short human-readable summary of what the plugin does. 
Surfaces in operator inspection and LLM-driven tools. Treat it as the answer to *"what does this plugin do?"* in one sentence. #### `concurrency_safe` (optional, default `true`) Concurrency hint. `false` tells the runtime that it is **not** safe to run two instances of the plugin in parallel, typically because the plugin owns a single-writer durable resource (a SQLite DB it writes to, an OAuth token table) and parallel execution would race the writer. When `false`, the runtime defaults to serial execution unless the operator explicitly overrides with `plugins..parallelism > 1`. If you have any doubt, set `false`. Concurrency safety is a property the plugin author asserts and the runtime trusts. #### `commands` (required) Array of command declarations. Every command the plugin can be invoked with must be listed, with at least `name` and `type`. See §3.2. #### `fact_outputs` (recommended for any plugin with durable memory) Declares which commands emit durable facts and how the compatibility view is rebuilt. **If your plugin needs to remember anything across invocations, declare this.** See §3.3. #### `config_keys` (optional) Declares the static config contract: ```yaml config_keys: required: [client_id, client_secret, db_path] optional: [request_timeout, lookback_days] ``` If any `required` key is missing at load time, the plugin is refused. `optional` keys are documented for operators but not enforced. Keep this list honest: it is the contract the operator's YAML satisfies. ### 3.2 The `commands` Array Each command is a pure function on `(config, state, context, event) → response`. The manifest declares the command's identity, its side-effect class, its input/output shape, and its retry properties. ```yaml commands: - name: poll type: read description: "Fetch latest detections; emit one event per first-of-day species."
idempotent: true retry_safe: true input_schema: {} output_schema: status: string events: array state_updates: object values: consume: [] emit: - event: birdnet.firstday_species values: - payload.scientific_name - payload.common_name - payload.first_id - payload.detected_at ``` #### `name` (required) The command's identity inside this plugin. Standard names that the runtime recognises: `poll`, `handle`, `health`, `init`. Plugins may declare additional names (e.g. `token_refresh` in `withings`); those are invocable via API and schedules but do not get the standard-command convenience routing. | Standard name | Purpose | Typical `type` | | ------------- | ---------------------------------------------------------------------------------------------------- | ----------------- | | `poll` | Scheduled durable observation. Emits events on observed change; emits a snapshot in `state_updates`. | `read` | | `handle` | Event-driven response. Receives an upstream event, processes it, optionally emits downstream events. | `write` (usually) | | `health` | Diagnostic check. **Emits no `state_updates`.** | `read` | | `init` | Capability discovery / affordance bundle for LLM tools. **Emits no `state_updates`.** | `read` | #### `type` (required) `read` or `write`. This is about **external observable side effects**, not about whether the command emits durable facts: - `type: read` — no external POST/PUT/DELETE. Idempotent under retry. Examples: `poll`, `fetch`, `get`, `list`, `health`. A `read` command can still emit `state_updates` (the durable observation snapshot) and can still write to a local SQLite DB the plugin owns; the constraint is on external state. - `type: write` — modifies external state via the network. Examples: `sync`, `send`, `notify`, `oauth_callback`, `delete`. Default if `type` is omitted (paranoid default). `type` determines the token scope required to invoke the command (`plugin:ro` vs `plugin:rw`). 
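The relationship between `type` and token scope can be sketched as a lookup. This is a hedged illustration: the function name `required_scope` is hypothetical, and the mapping of `read` to `plugin:ro` and `write` to `plugin:rw` is this sketch's reading of the sentence above rather than a documented table.

```python
def required_scope(command: dict) -> str:
    """Return the token scope needed to invoke a manifest command (hypothetical helper)."""
    # An omitted `type` falls back to `write`, the paranoid default.
    command_type = command.get("type", "write")
    if command_type == "read":
        return "plugin:ro"
    if command_type == "write":
        return "plugin:rw"
    raise ValueError(f"unknown command type: {command_type!r}")


for declared in ({"name": "poll", "type": "read"}, {"name": "sync"}):
    print(declared["name"], required_scope(declared))
```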
#### `description` (optional)

Short human-readable summary of what this command does. Critical for the TUI, the watch UI, and LLM operators reading capability discovery.

#### `idempotent` (optional, boolean)

Hint that calling this command N times produces the same observable result as calling it once, given identical inputs. Used by the runtime to make safer retry decisions. Be honest: a `sync` that posts measurements to a remote API is not idempotent unless the remote API deduplicates.

#### `retry_safe` (optional, boolean)

Hint that this command is safe to retry on transient failure. Stronger than `idempotent` in practice because it accounts for partial-side-effect risk during retry. Default to `false` if you are unsure.

#### `input_schema` / `output_schema` (optional, legacy)

Either a full JSON Schema object or a compact `field: type` map. Documents the request payload and response shape for API consumers and operators. The compact form expands automatically:

```yaml
input_schema:
  url: string
  depth: integer
```

These remain useful as a typed surface but are not the durability contract — that is `values` plus `fact_outputs`.

#### `values` (optional but recommended)

Names-only payload contract — the *Hickey-faithful* successor to typed schemas for pipeline authoring. `values.consume` declares which payload names this command reads from the request. `values.emit` declares, per emitted event type, which payload names the event carries.

```yaml
values:
  consume:
    - payload.url
    - payload.depth
  emit:
    - event: jina_reader.scraped
      values:
        - payload.url
        - payload.text
        - payload.content_hash
```

Rules:

- Entries are payload **names**, not types. Format: `payload.` or `payload..` for nested keys; `payload.*` matches all.
- Pipeline authors use `with:` to remap durable context into the request payload a downstream plugin expects, and `baggage:` to claim which event payload names become durable context. The plugin's `values` declaration is a sanity-aid for that authoring.
- `values` does not decide durability. Durability is decided by pipeline `baggage:` (for event payloads becoming context) and by `fact_outputs` (for `state_updates` becoming `plugin_facts`).

### 3.3 `fact_outputs` — The Durability Declaration

This is the directive that decides whether your plugin participates in the append-only fact model.

```yaml
fact_outputs:
  - when:
      command: poll
    from: state_updates
    fact_type: my_plugin.snapshot
    compatibility_view: mirror_object
```

A `fact_outputs` rule says: *"when command `poll` succeeds, take its emitted `state_updates`, record it append-only as a `my_plugin.snapshot` fact, and rebuild the `plugin_state` row by mirroring the snapshot."*

#### `when.command` (required)

The command name whose successful response produces this fact. One rule per command-that-emits-durable-state. A plugin may declare multiple rules (e.g. `withings` declares one for `poll` and one for `token_refresh`, because both observe durable state).

#### `from` (required)

Currently must be the literal string `state_updates`. The fact is sourced from the plugin's emitted snapshot. Future protocol versions may add other sources (e.g. a structured `facts` field); they will be accretive additions, not breaking changes.

#### `fact_type` (required)

The fact's identity. Convention: `.`, where the noun describes the kind of observation. Most migrated plugins use `.snapshot`. Use a different noun only if the plugin emits distinct kinds of observation that downstream readers should differentiate.

#### `compatibility_view` (optional, default `mirror_object`)

How `plugin_state` is rebuilt from the latest fact. Currently the only supported value is `mirror_object`: replace `plugin_state.state` wholesale with the latest fact's `fact_json`. This is exactly what legacy readers expect, so the migration is transparent. Future view policies (e.g. a reducer that folds multiple facts) would be added as new enum values; today, `mirror_object` is the right answer.
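To make the `mirror_object` semantics concrete, here is a minimal in-memory sketch. It is not how core is implemented: `apply_fact_output` is a hypothetical name, a Python list and dict stand in for the `plugin_facts` and `plugin_state` tables, and the row shape is reduced to the fields named in this section.

```python
import json


def apply_fact_output(plugin_facts: list, plugin_state: dict, plugin_name: str,
                      fact_type: str, state_updates: dict) -> None:
    """Append one fact, then rebuild the compatibility view by mirroring it.

    Hypothetical stand-in for the core behaviour described above.
    """
    fact_json = json.dumps(state_updates, sort_keys=True)
    # Append-only: existing facts are never modified or removed.
    plugin_facts.append({
        "seq": len(plugin_facts) + 1,
        "fact_type": fact_type,
        "fact_json": fact_json,
    })
    # mirror_object: the view is replaced wholesale, never patched.
    plugin_state[plugin_name] = json.loads(fact_json)


facts, state = [], {}
apply_fact_output(facts, state, "my_plugin", "my_plugin.snapshot",
                  {"last_result": "41", "last_checked_at": "t1"})
apply_fact_output(facts, state, "my_plugin", "my_plugin.snapshot",
                  {"last_result": "42"})
print(len(facts))          # 2
print(state["my_plugin"])  # {'last_result': '42'}
```

Note what happened to `last_checked_at`: the second snapshot omitted it, so the mirrored view no longer has it. Because the view is replaced wholesale, a plugin has to emit its full snapshot on every invocation.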
### 3.4 The Plugin Memory Model In One Diagram

```text
                   plugin emits state_updates snapshot
                                  │
                                  ▼
          ┌───────────────────────────────────────────────┐
          │       core (manifest fact_outputs rule)       │
          └───────────────────────────────────────────────┘
                                  │
           ┌──────────────────────┴──────────────────────┐
           ▼                                             ▼
plugin_facts (append-only,               plugin_state (compatibility view,
the durable record):                     rebuilt automatically):
one row per invocation,                  one row per plugin,
{seq, fact_type, fact_json, ...}         {plugin_name, state, updated_at}
```

The plugin author writes only the snapshot. Core does the rest. The compatibility view exists so legacy readers (the request envelope's `state` field, operator inspection, schedules that read prior state) keep working without change.

______________________________________________________________________

## 4. Worked Examples

### 4.1 Minimal Plugin (no durable memory)

A plugin that emits a single event each time it is polled. No durable state, no `fact_outputs` needed.

**`plugins/notify_echo/manifest.yaml`:**

```yaml
manifest_spec: ductile.plugin
manifest_version: 1
name: notify_echo
version: 0.1.0
protocol: 2
entrypoint: run.sh
description: "Emits an echo event when polled. Stateless."
concurrency_safe: true
commands:
  - name: poll
    type: read
    description: "Emits one notify_echo.tick event."
    idempotent: true
    retry_safe: true
    values:
      consume: []
      emit:
        - event: notify_echo.tick
          values:
            - payload.message
            - payload.emitted_at
  - name: health
    type: read
    description: "Reports plugin reachability."
    idempotent: true
    retry_safe: true
config_keys:
  optional: [message]
```

**`plugins/notify_echo/run.sh`:**

```bash
#!/usr/bin/env bash
set -euo pipefail

REQUEST=$(cat)
COMMAND=$(echo "$REQUEST" | jq -r '.command')
MESSAGE=$(echo "$REQUEST" | jq -r '.config.message // "tick"')

case "$COMMAND" in
  poll)
    cat <
```

```python
import json
import sqlite3
import sys
from datetime import datetime, timezone


def now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


def snapshot_state(*, last_result, last_checked_at, last_triggered_at):
    """Pure constructor for the full compatibility snapshot.

    Every field is required at every call site — the helper never inherits
    silently. Callers that don't observe a field this invocation pass the
    prior state value explicitly.
    """
    return {
        "last_result": last_result,
        "last_checked_at": last_checked_at,
        "last_triggered_at": last_triggered_at,
    }


def poll_command(request):
    config = request.get("config", {})
    state = request.get("state", {})

    # Observe durable state.
    with sqlite3.connect(config["db_path"]) as conn:
        result = conn.execute(config["query"]).fetchone()
    scalar = str(result[0]) if result else None

    timestamp = now_iso()
    triggered = scalar != state.get("last_result")

    events = []
    if triggered:
        events.append({
            "type": config["event_type"],
            "payload": {
                "result": scalar,
                "previous_result": state.get("last_result"),
                "detected_at": timestamp,
            },
        })

    # Build the snapshot. The full compatibility-view shape is emitted every
    # time, even for fields this invocation did not change — the helper
    # guarantees that.
    return {
        "status": "ok",
        "result": f"observed result={scalar} triggered={triggered}",
        "events": events,
        "state_updates": snapshot_state(
            last_result=scalar,
            last_checked_at=timestamp,
            last_triggered_at=timestamp if triggered else state.get("last_triggered_at"),
        ),
        "logs": [{"level": "info", "message": f"polled: {scalar}"}],
    }


def main():
    request = json.load(sys.stdin)
    command = request.get("command")
    if command == "poll":
        response = poll_command(request)
    elif command == "health":
        response = {"status": "ok", "result": "healthy"}
    else:
        response = {"status": "error", "error": f"Unknown command: {command}", "retry": False}
    json.dump(response, sys.stdout)


if __name__ == "__main__":
    main()
```

Notice:

- `fact_outputs` declares `sqlite_change.snapshot` mirrored from `poll`'s `state_updates`. That single declaration is what makes core record an append-only fact every poll and rebuild `plugin_state` automatically.
- `concurrency_safe: false` because the plugin owns a single-writer observation cycle.
- `snapshot_state` is a pure constructor. Every field is explicit at every call site — no sentinel-`None` overlay, no implicit inheritance from prior state. The caller carries `last_triggered_at` forward by reading it from `state` and passing it explicitly.
- The snapshot has the **same three keys every invocation**. A no-change poll emits a snapshot with the same keys in the same order as the prior one (only `last_checked_at` advances), which keeps `plugin_facts` free of reordering noise.
- `health` does not return `state_updates` and does not mutate durable state.

______________________________________________________________________

## 5. The `health` And `init` Pattern

`health` and `init` are diagnostic-only. Neither should emit `state_updates`. The reasons are concrete:

- `health` runs from the watch UI, the operator's CLI, and circuit-breaker half-open probes. If `health` mutated durable state, every diagnostic click would create a new fact with no observed change.
- `init` returns an LLM/tool affordance bundle for capability discovery. Its output is a function of static metadata, not observed state.

If a plugin's `health` or `init` is currently emitting `state_updates`, that is a bug — remove the emission. The first post-deploy `poll` or `token_refresh` will replace `plugin_state` wholesale via `mirror_object`, sweeping any historical residue.

______________________________________________________________________

## 6. What Does Not Belong In `state_updates`

The patterns below are explicit non-candidates: none of them should live in `state_updates` or in a `fact_outputs` rule. They belong in `job_log` (which captures all of them automatically) or nowhere at all.

| Pattern                                        | Why it's wrong                                                                                    |
| ---------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| `last_run`, `last_invoked_at`                  | Action trace. Updates whether or not anything was observed. Use `job_log` for run history.        |
| `executions_count`, `total_calls`              | Monotonic counter of actions taken. Not observed durable state.                                   |
| `last_pattern`, `last_prompt`, `last_video_id` | Single-field "the most recent thing I did" markers. Diagnostic, not durable observation.          |
| `last_summary`, `last_error_message`           | Action diagnostics. Belongs in logs.                                                              |
| `last_health_check`, `last_init_at`            | Diagnostic timestamps from non-mutating commands.                                                 |
| Diff or partial-patch shapes                   | The compatibility view is rebuilt wholesale; partial patches lose information on the next mirror. |
| Lists derived from `set()` union               | Non-deterministic order produces a different snapshot on every poll even when nothing changed.    |

If your plugin has a candidate field and you're not sure whether it's observed state or action bookkeeping, ask: *"if a downstream reader reads this field, are they learning about an external observation, or about my plugin's own behaviour?"* External observation belongs in the snapshot. Plugin behaviour does not.

______________________________________________________________________

## 7. Event Payload Convention

Plugins should follow standard payload field conventions for interoperability. These are *event* payload conventions, not state conventions — they live alongside the durability model, not in conflict with it.

### 7.1 Standard Fields

| Field         | Type   | Purpose                             | Used By                                             |
| ------------- | ------ | ----------------------------------- | --------------------------------------------------- |
| `text`        | string | Primary text content for processing | **Required** if producing text for downstream steps |
| `result`      | string | Final human-readable output         | Terminal plugins (fabric, summarizers)              |
| `source_url`  | string | Originating URL                     | Web scrapers, YouTube fetchers                      |
| `source_type` | string | Content origin hint                 | All plugins                                         |

### 7.2 Source Types

- `web` — web page content (jina-reader)
- `youtube` — YouTube video transcript
- `file` — local file content
- `llm` — LLM-generated content (fabric, claude, etc.)

### 7.3 Event Type Naming

`.`. Examples:

- `jina_reader.scraped`
- `youtube_transcript.fetched`
- `fabric.completed`
- `file_handler.read`
- `file_handler.written`

### 7.4 Pipeline Integration

The core dispatcher automatically propagates these payload names from input events to output events:

- `pattern`, `prompt`, `model`
- `output_dir`, `output_path`, `filename`

Plugins **do not** need to manually copy these fields — the dispatcher handles propagation. Just emit your event with the standard fields and the pipeline DSL takes care of the rest.
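The propagation described above can be pictured with a small sketch. This is not the dispatcher's code: `propagate` is a hypothetical function, and it makes one assumption the text does not settle, namely that a value the plugin emits itself takes precedence over the propagated one.

```python
# Payload names the text lists as automatically propagated.
PROPAGATED_NAMES = ("pattern", "prompt", "model",
                    "output_dir", "output_path", "filename")


def propagate(input_payload: dict, output_payload: dict) -> dict:
    """Copy the propagated names from an input event onto an output event.

    Hypothetical illustration. ASSUMPTION: names already present in the
    plugin's own output are left untouched.
    """
    merged = dict(output_payload)
    for name in PROPAGATED_NAMES:
        if name in input_payload and name not in merged:
            merged[name] = input_payload[name]
    return merged


incoming = {"text": "raw page", "pattern": "summarize", "model": "small"}
emitted = {"text": "summary", "source_type": "llm"}
print(sorted(propagate(incoming, emitted)))
# ['model', 'pattern', 'source_type', 'text']
```

The plugin in this sketch never mentions `pattern` or `model`, yet both arrive on the downstream event, which is the point of the convention.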
### 7.5 Baggage (Context) Fallback

Only payload names claimed by a pipeline's `baggage:` declaration become durable context entries in the `event_context` ledger. Downstream plugins receive that accumulated baggage in `request.context`. If a field may be produced by an upstream step, prefer:

1. Read from `event.payload` (step-specific input).
1. Fall back to `request.context` for accumulated values.

This makes pipelines resilient when intermediate plugins emit narrower payloads.

### 7.6 Example Event Payload

```python
return {
    "status": "ok",
    "result": "Scraped https://example.com",
    "events": [{
        "type": "jina_reader.scraped",
        "payload": {
            "url": "https://example.com",
            "source_url": "https://example.com",
            "source_type": "web",
            "text": "Scraped content here...",
            "content_hash": "abc123"
        }
    }]
}
```

______________________________________________________________________

## 8. Built-in Plugin: `if` Classifier

The `if` plugin is a general-purpose field classifier. It evaluates an ordered list of checks against a payload field and emits the **first** matching event type, with the payload unchanged.

### 8.1 Config (per instance)

```yaml
plugins:
  check_youtube:
    enabled: true
    config:
      field: text
      checks:
        - contains: "youtu.be"
          emit: youtube.url.detected
        - contains: "youtube.com"
          emit: youtube.url.detected
        - startswith: "http"
          emit: web.url.detected
        - default: text.received
```

### 8.2 Supported Checks

- `contains`, `startswith`, `endswith`, `equals` (case-insensitive)
- `regex` (Python `re.fullmatch` against the field value)
- `default` (always matches if reached)

### 8.3 Semantics

- Checks are evaluated in order; first match wins.
- Missing fields are treated as empty strings.
- No match + no default → `status: error` with `retry: false`. Core treats that as a v2 compatibility signal for a permanent failure.

### 8.4 Instance Naming

Ductile uses manifest names as plugin identities.
To create multiple instances of `if` (or any plugin), use plugin aliasing in `plugins.yaml`:

```yaml
plugins:
  check_youtube:
    uses: if # inherit the if plugin's implementation
    enabled: true
    config:
      field: text
      checks: [...]
```

The aliased instance has its own config, its own facts, and its own compatibility-view row.

______________________________________________________________________

## 9. Filesystem & Diagnostic Bundles

Ductile does not provision a workspace directory for plugins. The core is dispatch, state, and routing; filesystem is the plugin's concern.

- **If your plugin needs a scratch path,** create it yourself. For ephemeral work prefer `mktemp -d` (or the language equivalent) and clean up on exit. For persistent caches use `~/.cache/ductile-/` or a path declared in your plugin config and validated at startup.
- **If your plugin needs an archive of its own stdout** (for offline debugging or external log shipping), tee it from inside the run script before writing the response envelope, e.g. `tee -a "$HOME/.cache/myplugin/stdout.log"`. Core does not write subprocess stdout to disk on your behalf; the operationally meaningful fragments are already captured in the database (`job_log`, `plugin_facts`, `event_context`).
- **Cwd.** Plugin subprocesses inherit the dispatcher's working directory. If your plugin cares where it runs, the run script should `cd` to a path of its own choosing.

## 10. Security & Isolation

- **Allowed paths.** Plugins should only read/write paths they themselves create (per the previous section) or paths explicitly named in their config.
- **Execution.** Plugins run as the same OS user as the gateway. Use filesystem permissions to limit blast radius.
- **Trust.** Ductile refuses to load plugins with world-writable directories or `..` in their `entrypoint`. The entrypoint must be `chmod +x`.
- **No persistent state outside what is declared.** Plugins must not write to their own plugin directory at runtime.
  Anything durable goes through `state_updates` (subject to the manifest's `fact_outputs` rule); anything ephemeral goes to a plugin-managed scratch path.

______________________________________________________________________

## 11. Quick Quality Checklist

When you finish a new plugin, walk this list before merging:

- `manifest_spec`, `manifest_version`, `name`, `version`, `protocol: 2`, `entrypoint` set.
- `description` is a real one-sentence summary.
- `concurrency_safe` is honestly set (`false` if the plugin owns a single-writer durable resource).
- Every command has `name`, `type`, `description`, and honest `idempotent` / `retry_safe` flags.
- Standard commands (`poll`, `handle`, `health`, `init`) follow the conventions in §3.2.
- `health` and `init` emit no `state_updates`.
- Each command declares `values.consume` / `values.emit` so pipeline authors can see the contract.
- If the plugin remembers anything across invocations, it declares `fact_outputs` for the durable command(s).
- The emitted snapshot is a full object, deterministic, and has the same keys every invocation of that command (presence-stable).
- Nothing in `state_updates` matches the §6 anti-patterns.
- `config_keys.required` is honest — required keys must actually be required.
- Entrypoint is `chmod +x`.
- Tests cover the snapshot constructor and the response shape.

If every box ticks, the plugin is aligned with the durability model and should not need a future migration sprint to fix.

______________________________________________________________________

## Stopwatch — timing is captured for you

The dispatcher times every plugin invocation automatically. You do not need to wrap your handler in `time.now()` calls; the supervisor records a `stopwatch.Record` (plugin id, step, monotonic duration, status, etc.) to the `job_stopwatch` table in the ductile DB. Telemetry is system data; it lives in the supervisor's ledger, not in your request context or response.
See [PLUGIN_DIAGNOSTICS.md](https://ductile.run/PLUGIN_DIAGNOSTICS/index.md) for the data shape and the `gateway_time` formula.

### Optional sub-spans for plugin-internal phases

If you want to break down what your handler did internally (`db_query`, `http_call`, parsing, etc.), emit a list under `ductile_stopwatch_subs` at the top level of your response. Shape:

```json
{
  "status": "ok",
  "result": "...",
  "ductile_stopwatch_subs": [
    {"name": "fetch.http_get", "dur_ns": 31000000, "status": "ok"},
    {"name": "fetch.body_read", "dur_ns": 11000000, "status": "ok", "bytes": 482103},
    {"name": "fetch.decode", "dur_ns": 2400000, "status": "ok"}
  ]
}
```

#### Field convention

| Field    | Required | Notes                                                                                    |
| -------- | -------- | ---------------------------------------------------------------------------------------- |
| `name`   | yes      | Dotted name `.` (e.g. `fetch.http_get`). Prefix enables filtering.                       |
| `dur_ns` | yes      | Monotonic duration in nanoseconds. Use `time.perf_counter_ns()` deltas.                  |
| `status` | optional | `ok` / `err` / `skip`. Explains zero-duration or partial spans.                          |
| `bytes`  | optional | For I/O spans (body reads, file hashes). Quartile bytes vs. dur_ns to find slow servers. |
| `count`  | optional | For batch spans (files scanned, watches polled, retries attempted).                      |

The dispatcher stores additional fields verbatim, but the convention above is what downstream tools query. New fields should be added by RFC, not by ad-hoc plugin choice — the convention is only as strong as its exemplars.

See `plugins/_lib/_stopwatch.py` for a vendored Python helper that emits this shape from a context-manager API. The four exemplar plugins (`fetch`, `file_watch`, `folder_watch`, `sys_exec`) use it.

#### Rules

- **Sub-spans are advisory.** The dispatcher's own Record is always emitted regardless of whether you include sub-spans. A buggy or lying plugin poisons its own breakdowns only; the supervisor's timing is independent.
- **The dispatcher caps the list at 32 entries per invocation by default.** If you emit more, the excess is dropped (head-keep — first 32 survive) and one warning is logged for the call. **Order matters:** put high-signal spans first. The default is appropriate for almost all plugins; see "Raising the cap" below before considering an override.
- **Malformed entries are dropped silently.** Non-object items and non-list values do not raise. Defensive parsing is part of the contract.
- **The dispatcher does not interpret sub-span fields beyond storing them.** Field semantics are the plugin's responsibility. Follow the convention above so quartile dashboards work across plugins.

#### Raising the cap (rare)

A plugin that legitimately produces more than 32 spans — typically a multi-stage pipeline coordinator with structurally distinct sub-phases — may declare a higher cap in its `manifest.yaml`:

```yaml
stopwatch:
  max_subs: 64
```

Range: `[1, 256]`. The hard upper bound of 256 is a system-wide invariant — every consumer of `subs_json` (DB row, API response, log line, future dashboards) needs a stable budget. A manifest declaring `max_subs > 256` is rejected at plugin load with a warn-level log; the plugin is not registered. Manifests omitting the field, or setting it to 0, use the default 32 — no behaviour change for existing plugins.

Before adding a `stopwatch` block, ask: can the spans be aggregated? The default cap is a design forcing function. The aggregation patterns below will handle 95% of cases that initially look like "I need more spans."

#### Aggregation patterns

The 32 cap pushes you toward aggregation over per-event tracing.
Patterns that work well within the cap:

| Anti-pattern                             | Pattern                                                                 |
| ---------------------------------------- | ----------------------------------------------------------------------- |
| One span per file in a 5000-file scan    | One `.fingerprint_total` with `count=5000`, `bytes=...`                 |
| One span per HTTP retry                  | One `.http_get` with `count=` annotation                                |
| One span per loop iteration              | Histogram buckets: `.loop.bucket_0_10ms`, `…bucket_10_100ms`, …         |
| One span per polled watch (many watches) | Aggregate totals, plus **outlier-only** per-watch spans (`> 50ms`)      |

For real per-event tracing, use an external tracing backend (OTel) — the stopwatch ledger is for quartile-grade aggregation, not timeline reconstruction.

# Webhooks

This document summarizes how webhooks work in Ductile and how to configure them safely.

## Overview

- Webhooks run on a dedicated HTTP listener and enqueue plugin jobs.
- Every webhook requires **HMAC-SHA256** signature verification.
- Webhooks are configured in **`webhooks.yaml`** (a high-security file).
- The webhook listener is configured via `webhooks.listen` in `config.yaml`.

## Configuration Model

Only `config.yaml` is loaded automatically. Any webhook configuration must be referenced via the `include:` list:

```yaml
include:
  - api.yaml
  - plugins.yaml
  - webhooks.yaml
```

If `webhooks.yaml` is not included, the listener will start **without endpoints**, and requests will return 404.

## File Layout

Typical config root:

```text
~/.config/ductile/
├── config.yaml
├── api.yaml
├── plugins.yaml
├── webhooks.yaml
└── .checksums
```

## webhooks.yaml Format

```yaml
webhooks:
  endpoints:
    - name: astro_rebuild_staging
      path: /webhook/astro-rebuild-staging
      plugin: astro_rebuild_staging
      secret_ref: astro_webhook_secret
      signature_header: X-Ductile-Signature-256
      max_body_size: 1MB # optional
```

Notes:

- `secret_ref` is required and must reference `tokens.yaml`.
- `signature_header` is mandatory.
- `max_body_size` defaults to 1MB.
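To make the `signature_header` requirement concrete, the sketch below computes and checks an HMAC-SHA256 signature with Python's standard library. It is an illustration under stated assumptions, not Ductile's verification code: `sign_body` and `verify_signature` are hypothetical names, and the wire format (hex digest with a GitHub-style `sha256=` prefix, computed over the raw request body) follows the curl example in this document.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Return the header value a sender would attach to a request body."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check a received header value against the raw body.

    ASSUMPTION: the listener recomputes the digest over the exact bytes
    it received. compare_digest avoids leaking timing information.
    """
    return hmac.compare_digest(sign_body(secret, body), header_value)


secret = b"example-secret"  # placeholder value, not a real token
body = b'{"payload":{"reason":"manual rebuild"}}'
header = sign_body(secret, body)
print(verify_signature(secret, body, header))         # True
print(verify_signature(secret, body + b" ", header))  # False
```

The second check fails because the signature covers the exact bytes of the body. A sender that re-serializes the JSON after signing (different spacing or key order) will produce a mismatch.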
## Listener Port

Set in `config.yaml`:

```yaml
webhooks:
  listen: "127.0.0.1:8091"
```

## Triggering a Webhook

Example HMAC signature (GitHub-style `sha256=` prefix is supported):

```bash
payload='{"payload":{"reason":"manual rebuild"}}'
sig=$(printf '%s' "$payload" | \
  openssl dgst -sha256 -hmac '' -hex | awk '{print $2}')

curl -sS -X POST http://127.0.0.1:8091/webhook/astro-rebuild-staging \
  -H "Content-Type: application/json" \
  -H "X-Ductile-Signature-256: sha256=$sig" \
  -d "$payload"
```

Response returns a `job_id` when accepted.

## Security & Integrity

- `webhooks.yaml` is **high security** and must be sealed with `ductile config lock`.
- In `strict_mode: true`, tampering will prevent startup.
- Use `secret_ref` + `tokens.yaml` for secret management in production.

## Operational Checks

- Webhook listener health: `GET /healthz` on the webhook listener port.
- Verify endpoints are loaded:

```bash
ductile config show --config-dir ~/.config/ductile | rg -n "webhooks" -C 3
```

# YAML Tips for Ductile Configuration

Practical techniques for keeping your `plugins.yaml` clean and maintainable. No prior YAML expertise assumed.

______________________________________________________________________

## Anchors and Aliases: Stop Repeating Yourself

### The problem

When you create multiple plugin instances that share the same base settings, you end up copy-pasting the same values over and over:

```yaml
plugins:
  discord_test_cron:
    uses: discord_notify
    enabled: true
    timeout: 15s
    max_attempts: 2
    schedules:
      - cron: "*/5 * * * *"
        command: poll
    config:
      webhook_url: "https://discord.com/api/webhooks/123/abc..."
      default_username: "Ductile"
      poll_message: "[T2] cron */5min"

  discord_test_window:
    uses: discord_notify
    enabled: true # repeated
    timeout: 15s # repeated
    max_attempts: 2 # repeated
    schedules:
      - every: 3m
        only_between: "07:00-22:00"
        command: poll
    config:
      webhook_url: "https://discord.com/api/webhooks/123/abc..."
      # repeated
      default_username: "Ductile" # repeated
      poll_message: "[T3] only_between 07-22"
```

If you want to change the timeout or webhook URL, you have to edit every block. That's fragile and error-prone.

### The solution: YAML anchors

YAML has a built-in mechanism for this called **anchors** (`&`) and **aliases** (`*`). You define a block once with an anchor, then reference it by name anywhere else. The YAML parser expands the reference before any application ever reads the file — it's a pure YAML feature, not a ductile feature.

#### Step 1 — Define an anchor

An anchor is a name you attach to any YAML value using `&name`. It can go on a mapping (dict), a sequence (list), or a scalar (string/number).

```yaml
# This defines an anchor called "discord-test" on this mapping block.
# The key name ("x-discord-test") is arbitrary — pick something descriptive.
x-discord-test: &discord-test
  uses: discord_notify
  enabled: true
  timeout: 15s
  max_attempts: 2
```

#### Step 2 — Reference it with an alias

Wherever you want to reuse that block, write `*anchor-name`. The YAML parser replaces it with the full content of the anchored block.

```yaml
discord_test_cron: *discord-test # expands to: uses, enabled, timeout, max_attempts
  schedules:
    - cron: "*/5 * * * *"
      command: poll
```

But wait — you also need to *add* fields on top of the expanded block (like `schedules`). For that, use the **merge key** `<<:`.

#### Step 3 — Merge with `<<:`

`<<: *anchor-name` merges the referenced block's keys into the current mapping. Keys you define explicitly take priority over merged ones.

```yaml
discord_test_cron:
  <<: *discord-test # merges: uses, enabled, timeout, max_attempts
  schedules: # adds this new key on top
    - cron: "*/5 * * * *"
      command: poll
```

This is equivalent to writing all four merged keys out explicitly, plus the `schedules` key.
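If it helps to think in terms of ordinary dictionaries, the merge in Step 3 behaves roughly like the sketch below. This is an analogy in plain Python, not a YAML parser and not what ductile runs: `merge_key` is a hypothetical helper that models a single `<<: *anchor` entry.

```python
# The anchored block from Step 1, written as a plain dict.
discord_test = {
    "uses": "discord_notify",
    "enabled": True,
    "timeout": "15s",
    "max_attempts": 2,
}


def merge_key(anchored: dict, explicit: dict) -> dict:
    """Model `<<: *anchor` for one anchored mapping: explicit keys win."""
    return {**anchored, **explicit}


instance = merge_key(discord_test, {
    "schedules": [{"cron": "*/5 * * * *", "command": "poll"}],
})
print(sorted(instance))
# ['enabled', 'max_attempts', 'schedules', 'timeout', 'uses']

# An explicitly written key overrides the merged one.
override = merge_key(discord_test, {"timeout": "30s"})
print(override["timeout"])  # 30s
```

The anchored dict itself is never modified, which mirrors how one anchor can be merged into many plugin instances.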
#### Step 4 — Anchors work on nested mappings too

You can anchor the `config:` sub-block separately:

```yaml
x-discord-config: &discord-config
  webhook_url: "https://discord.com/api/webhooks/123/abc..."
  default_username: "Ductile"
```

Then merge it inside the `config:` block of each plugin, adding only the field that differs:

```yaml
discord_test_cron:
  <<: *discord-test
  schedules:
    - cron: "*/5 * * * *"
      command: poll
  config:
    <<: *discord-config # merges: webhook_url, default_username
    poll_message: "[T2] cron" # adds the unique field
```

### Why this works with ductile

There are two things worth understanding:

**1. The YAML parser resolves anchors before ductile sees anything.**

When ductile loads `plugins.yaml`, the YAML library reads the file and fully expands all anchors and merge keys first. By the time ductile's config loader processes the result, it just sees a normal, fully-populated config map. Ductile has no idea anchors were used.

**2. Unknown top-level keys are silently ignored.**

The anchor definitions live at the *top level* of the YAML file, outside the `plugins:` block. Ductile's config loader only reads keys it knows about (`plugins:`, `service:`, `pipelines:`, etc.). Anything else — including `x-discord-test:` and `x-discord-config:` — is silently ignored.

This is why the anchor names are prefixed with `x-` by convention: it signals "this is application-level metadata, not ductile config". Any name works, but a consistent prefix avoids confusion.

### Full before/after example

**Before** (80 lines, webhook URL repeated 5 times):

```yaml
plugins:
  discord_test_cron:
    uses: discord_notify
    enabled: true
    timeout: 15s
    max_attempts: 2
    schedules:
      - cron: "*/5 * * * *"
        command: poll
    config:
      webhook_url: "https://discord.com/api/webhooks/123/abc..."
      default_username: "Ductile Scheduler"
      poll_message: "[T2] cron */5min"

  discord_test_window:
    uses: discord_notify
    enabled: true
    timeout: 15s
    max_attempts: 2
    schedules:
      - every: 3m
        only_between: "07:00-22:00"
        command: poll
    config:
      webhook_url: "https://discord.com/api/webhooks/123/abc..."
      default_username: "Ductile Scheduler"
      poll_message: "[T3] only_between 07-22"

  # ... and so on for each test instance
```

**After** (webhook URL in one place, instances are concise):

```yaml
# Shared base — expanded by YAML parser, ignored as a key by ductile
x-discord-test: &discord-test
  uses: discord_notify
  enabled: true
  timeout: 15s
  max_attempts: 2

x-discord-config: &discord-config
  webhook_url: "https://discord.com/api/webhooks/123/abc..."
  default_username: "Ductile Scheduler"

plugins:
  discord_test_cron:
    <<: *discord-test
    schedules:
      - cron: "*/5 * * * *"
        command: poll
    config:
      <<: *discord-config
      poll_message: "[T2] cron */5min"

  discord_test_window:
    <<: *discord-test
    schedules:
      - every: 3m
        only_between: "07:00-22:00"
        command: poll
    config:
      <<: *discord-config
      poll_message: "[T3] only_between 07-22"
```

### Caveats

**Merge is shallow.** `<<:` merges the *top-level keys* of the anchored block. It does not deep-merge nested structures. If your anchor contains a `config:` block, and you also define a `config:` block in the instance, the instance's `config:` block wins entirely — the anchor's `config:` is not merged into it. This is why the example anchors `config` separately (as `*discord-config`) and merges it *inside* the `config:` block.

**Explicit keys win over merged keys.** If a key appears in both the `<<:` block and the current mapping, the explicitly written key takes precedence. Use this to override specific values from the shared base.

**Anchors are file-scoped.** An anchor defined in `plugins.yaml` cannot be referenced from `pipelines.yaml`. If you use modular config files, you need to repeat shared values across files or consolidate into a single file.
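The shallow-merge caveat is the one that most often surprises people, so here is a small model of it using plain dictionaries. It is only an analogy for what a YAML parser produces, and the anchor contents and URL are made-up placeholder values.

```python
# A hypothetical anchor that (unwisely) carries its own nested config block.
anchor = {
    "enabled": True,
    "config": {
        "webhook_url": "https://example.invalid/hook",
        "default_username": "Ductile",
    },
}

# Shallow merge: the instance's `config` replaces the anchor's `config`.
shallow = {**anchor, "config": {"poll_message": "[T2] cron"}}
print(shallow["config"])
# {'poll_message': '[T2] cron'}

# Workaround: merge again one level down, inside `config`.
nested = {**anchor, "config": {**anchor["config"], "poll_message": "[T2] cron"}}
print(sorted(nested["config"]))
# ['default_username', 'poll_message', 'webhook_url']
```

In the shallow case the instance silently loses `webhook_url` and `default_username`, which is why the earlier example anchors the `config` block on its own and merges it inside each instance's `config:`.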
**`config check` still validates the expanded result.** Anchors are transparent to ductile's validator. If a merged value is invalid, the error will point to the plugin instance, not the anchor definition.

______________________________________________________________________

## Further reading

- [YAML specification — anchors and aliases](https://yaml.org/spec/1.2.2/#anchors-and-aliases)
- [YAML specification — merge keys](https://yaml.org/type/merge.html)
- Ductile config reference: `CONFIG_REFERENCE.md`
- Plugin instance aliasing (`uses:`): `COOKBOOK.md`

# Operating

# Ductile Deployment Guide

This document describes how to deploy a host-local Ductile instance as a systemd user service. It reflects the first reference deployment on `matt-ThinkPad-T14s-Gen-1` (2026-02-22) and is the canonical procedure for repeating this on other hosts.

See also: RFC-006 (local execution plane topology).

______________________________________________________________________

## 1. Build the Binary

Build from source and install to the user's local bin:

```bash
cd /path/to/ductile
go build -o ~/.local/bin/ductile ./cmd/ductile
```

Verify:

```bash
ductile --version
```

The binary is self-contained — no additional runtime dependencies.

______________________________________________________________________

## 2. Directory Layout

Create a deployment root with separate `config/` and `data/` directories:

```text
ductile-local/
├── config/
│   ├── config.yaml    # main config (includes others via `include:`)
│   ├── api.yaml       # API listen address + auth tokens
│   └── plugins.yaml   # plugin enable/config
└── data/
    ├── ductile.db     # SQLite state DB (created on first start)
    └── outputs/       # write target for file_handler plugin
```

Create it:

```bash
mkdir -p ~/admin/ductile-local/config ~/admin/ductile-local/data/outputs
```

______________________________________________________________________

## 3. Split Config Pattern

Ductile supports modular ("grafted") configs via the `include:` key.
The main config file sets global options and includes the others by relative path. ### config/config.yaml ```yaml log_level: info state: path: ./data/ductile.db plugin_roots: - /path/to/ductile/plugins include: - api.yaml - plugins.yaml ``` `plugin_roots` is a list of directories to scan for plugin executables at startup. Any plugin binary found here is *discovered*; only plugins listed in `plugins.yaml` are *configured* (and those not listed emit a warning but still load). ### config/api.yaml ```yaml api: enabled: true listen: "localhost:8081" auth: tokens: - token: scopes: ["*"] ``` Generate a token: ```bash openssl rand -hex 32 ``` Store the token in your shell environment: ```bash # ~/.zshrc export DUCTILE_LOCAL_TOKEN= ``` ### config/plugins.yaml ```yaml plugins: fabric: enabled: true timeout: 120s max_attempts: 2 config: FABRIC_DEFAULT_PATTERN: "summarize" file_handler: enabled: true timeout: 30s max_attempts: 1 config: allowed_read_paths: "${HOME}" allowed_write_paths: "${HOME}/ductile-local/data/outputs" jina-reader: enabled: true timeout: 30s max_attempts: 3 circuit_breaker: threshold: 3 reset_after: 5m config: {} ``` ______________________________________________________________________ ## 4. Validate Config Before starting the service, validate the configuration: ```bash cd ~/admin/ductile-local ductile config check --config config/config.yaml ``` Expected output: ```text Configuration valid (N warning(s)) WARN [unused] plugin "echo" discovered but not referenced in config ... ``` Warnings about undeclared plugins are expected if `plugin_roots` contains plugins you haven't explicitly configured. They are loaded but not usable without config entries. ______________________________________________________________________ ## 5. 
systemd User Service Create `~/.config/systemd/user/ductile-local.service`: ```ini [Unit] Description=Ductile Gateway (local prod) After=network.target [Service] Type=simple WorkingDirectory=${HOME}/ductile-local ExecStart=${HOME}/.local/bin/ductile system start --config config/config.yaml Restart=on-failure RestartSec=5s StandardOutput=journal StandardError=journal [Install] WantedBy=default.target ``` Enable and start: ```bash systemctl --user daemon-reload systemctl --user enable --now ductile-local ``` Check status: ```bash systemctl --user status ductile-local journalctl --user -u ductile-local -f ``` ______________________________________________________________________ ## 6. Verification Checklist After starting the service, verify the following: ```bash # Health — no auth required curl http://localhost:8081/healthz # Expected: # {"status":"ok","uptime_seconds":N,"queue_depth":0,"plugins_loaded":5,"plugins_circuit_open":0} # Plugin list — requires auth curl -H "Authorization: Bearer $DUCTILE_LOCAL_TOKEN" http://localhost:8081/plugins # OpenAPI schema — no auth required curl http://localhost:8081/openapi.json | head -20 ``` Confirm: - [ ] `status: ok` in healthz - [ ] `plugins_loaded` > 0 - [ ] fabric, file_handler, jina-reader appear in `/plugins` - [ ] `/openapi.json` returns valid JSON ______________________________________________________________________ ## 7. 
RFC-006 Topology Notes RFC-006 defines two Ductile instance roles: | Role | Purpose | | ------------------- | ----------------------------------------------------------------- | | **Boundary node** | Public-facing gateway, handles external API calls, auth, routing | | **Host-local node** | Per-host execution plane, runs plugins with local resource access | This deployment is a **host-local node**: - Listens on `localhost` only (not exposed to LAN) - Token scoped to `["*"]` for local use - Plugins have access to local filesystem (`file_handler`) and local tools (`fabric`) - Receives work dispatched from a boundary node or local AgenticLoop agent The prod Unraid instance (`192.168.20.4:8888`) is the boundary node for this network. ______________________________________________________________________ ## 8. Updating the Binary When a new version is built: ```bash # Stop the service first (optional but clean) systemctl --user stop ductile-local # Rebuild cd /path/to/ductile go build -o ~/.local/bin/ductile ./cmd/ductile # Restart systemctl --user start ductile-local systemctl --user status ductile-local ``` Or just rebuild and restart in one shot — the service will pick up the new binary on next start: ```bash cd /path/to/ductile && go build -o ~/.local/bin/ductile ./cmd/ductile && systemctl --user restart ductile-local ``` ## 9. Schema Migrations Before Deploy If a release adds additive SQLite schema, apply the matching migration script to the existing state DB before the normal deploy/restart. This is especially relevant for instances that already have a populated database. The binary still carries the mono-schema for fresh databases, but the preferred operational path for existing databases is to run explicit migrations first so schema changes are intentional and visible in deployment steps. For non-empty existing databases, Ductile validates schema on startup instead of silently adding missing upgrade-era tables or indexes. 
If the DB is behind, startup should fail with a migration hint rather than mutating the schema implicitly. ## 10. Backups `ductile system backup` writes an atomic, point-in-time snapshot of the SQLite state DB plus selected runtime artefacts into a single `tar.gz` archive. The DB snapshot is taken via `VACUUM INTO`, which is safe under concurrent writers — no service stop required. ```bash ductile system backup --to [--scope SCOPE] [--config PATH] ``` The four scopes are a nested ladder; each level adds to the previous: | Scope | Contents | | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------ | | `db` | `VACUUM INTO` snapshot of the state DB only | | `config` (default) | `db` + ductile config dir (`config.yaml`, `api.yaml`, `plugins.yaml`, `pipelines.yaml`, `webhooks.yaml`, `.checksums`) | | `plugins` | `config` + every directory under `plugin_roots` (excludes `.git`, `node_modules`, `.venv`, `venv`, `__pycache__`, `.DS_Store`, `*.pyc`, `*.pyo`) | | `all` | `plugins` + every file referenced under `environment_vars.include` | Each invocation prints its INCLUDED / EXCLUDED list to stdout before doing the work and embeds a `BACKUP_MANIFEST.txt` inside the archive recording the same information plus ductile version, commit, hostname, source paths, source DB sha256, and any boundary warnings (e.g. `api.yaml` at `config` scope, env files at `all` scope). Refuses to overwrite an existing destination — operator owns naming and retention via shell glue. 
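Because the command refuses to overwrite and leaves naming and retention to the operator, a small wrapper usually picks the archive name and prunes old files. The sketch below shows one way to do that in Python. `archive_name` and `prune_old` are hypothetical helpers, and the `ductile-<UTC stamp>.tar.gz` pattern and the age threshold are assumptions chosen for illustration, not behaviour of `ductile system backup` itself.

```python
import os
import time
from datetime import datetime, timezone
from pathlib import Path


def archive_name(backup_dir: Path, now: datetime) -> Path:
    """Build a unique, sortable destination path from a UTC timestamp."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return backup_dir / f"ductile-{stamp}.tar.gz"


def prune_old(backup_dir: Path, max_age_days: float, now_ts: float) -> list:
    """Delete ductile-*.tar.gz files older than max_age_days; return removed paths."""
    cutoff = now_ts - max_age_days * 86400
    removed = []
    for path in sorted(backup_dir.glob("ductile-*.tar.gz")):
        if path.stat().st_mtime < cutoff:  # only files matching the pattern are touched
            path.unlink()
            removed.append(path)
    return removed
```

A wrapper would pass `archive_name(...)` to `--to`, run the backup, and call `prune_old(...)` only after the backup command exits successfully, so a failed run never deletes the last good archive.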
### Scheduled backups

systemd timer (Thinkpad pattern), `~/.config/systemd/user/ductile-backup.service`:

```ini
[Unit]
Description=Ductile backup snapshot

[Service]
Type=oneshot
Environment=BACKUP_DIR=%h/admin/ductile-backups/thinkpad/auto
ExecStart=/bin/sh -c 'mkdir -p "$BACKUP_DIR" && \
  STAMP=$(date -u +%%Y%%m%%dT%%H%%M%%SZ) && \
  %h/.local/bin/ductile system backup \
    --to "$BACKUP_DIR/ductile-$STAMP.tar.gz" --scope config && \
  find "$BACKUP_DIR" -name "ductile-*.tar.gz" -mtime +7 -delete'
```

Paired timer `~/.config/systemd/user/ductile-backup.timer`:

```ini
[Unit]
Description=Nightly ductile backup at 03:00 local

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enable:

```bash
systemctl --user daemon-reload
systemctl --user enable --now ductile-backup.timer
```

launchd (Mac pattern): an equivalent `LaunchAgent` plist with `StartCalendarInterval` runs the same command sequence; see the existing `com.mattjoyce.ductile-local.plist` as a template for the `ProgramArguments` shape.

Pre-migration backups before any breaking schema change are a separate manual invocation under `~/admin/ductile-backups//pre--/`; they sit outside the auto-rotation directory.

# Operator Guide

This guide is intended for system administrators and LLM operators managing a Ductile instance. It covers day-to-day operations, monitoring, and administrative safety.

______________________________________________________________________

## 1. System Operations

### Starting the Service

The primary way to run Ductile is in the foreground:

```bash
./ductile system start
```

For production environments, we recommend using a **systemd** unit. See [Architecture](https://ductile.run/ARCHITECTURE/#14-deployment) for an example configuration.

### Reloading Configuration

You can reload the configuration without restarting the service by sending a `SIGHUP` signal or using the CLI:

```bash
./ductile system reload
```

### Backups

`ductile system backup` writes a point-in-time snapshot to a single `tar.gz` archive. The DB snapshot uses SQLite `VACUUM INTO`, so the gateway can stay running.

```bash
ductile system backup --to /backups/ductile-$(date -u +%Y%m%dT%H%M%SZ).tar.gz \
  --scope config
```

Scope is a nested ladder; each level adds to the previous:

- `db`: DB snapshot only
- `config` (default): `db` + ductile config dir
- `plugins`: `config` + every directory under `plugin_roots`
- `all`: `plugins` + every file under `environment_vars.include`

Each archive embeds a `BACKUP_MANIFEST.txt` recording ductile version, commit, hostname, source paths, source DB sha256, included items, excluded items with reasons, plugin-root mappings, and any boundary warnings (e.g. `api.yaml` appearing at scope `config`, env files appearing at scope `all`). Inspect it with `tar -xzOf BACKUP_MANIFEST.txt` without extracting the rest.

The command refuses to overwrite an existing destination; the operator owns the naming pattern and retention. For a scheduled-backup setup (systemd timer or launchd), see `docs/DEPLOYMENT.md` §10.

### Self-check

`ductile system selfcheck` runs four read-only invariant checks against the local state DB:

- PID lock check (refuses to run while the gateway holds the lock, for WAL safety)
- `PRAGMA integrity_check` on the SQLite file
- Schema validation (`ValidateSQLiteSchema`) against the embedded baseline
- `queue_terminal_freshness`: terminal-state `job_queue` rows older than the retention window (24h default) should not exist

```bash
ductile system selfcheck --json
```

Exit code 0 = healthy, 1 = at least one check failed. Use it as a deploy gate between binary swap and re-enabling the service.

______________________________________________________________________

## 2. Monitoring & Observability

### Real-Time Dashboard (TUI)

Ductile includes a built-in terminal UI for real-time visibility:

```bash
./ductile system watch --api-key "your-admin-token"
```

The watch view shows:

- Service health, uptime, queue depth, and plugin count.
- Metadata header (config path, binary path, version).
- Pipelines with live status and last activity.
- An event stream of recent activity.

### Logging

Ductile emits structured JSON logs to `stdout`. These are ideal for consumption by Logstash, Fluentd, or simple `jq` queries.

```bash
./ductile system start | jq 'select(.level == "ERROR")'
```

### SSE Event Stream

For custom monitoring tools, subscribe to the live event stream:

```bash
curl -N -H "Authorization: Bearer " http://localhost:8080/events
```

______________________________________________________________________

## 3. Configuration Management

Ductile loads `config.yaml` from the config directory (typically `~/.config/ductile/`) and merges any files listed under `include:`.

### Administrative Commands

Use the `config` noun for surgical administration:

- **Show resolved config:** `ductile config show` (includes all defaults and merges).
- **Get a specific value:** `ductile config get plugins.echo.enabled`.
- **Set a value safely:** `ductile config set plugins.echo.enabled=false --apply`.

### Operational Integrity (Lock & Check)

To prevent unauthorized modifications to sensitive files (like `tokens.yaml` or `webhooks.yaml`), Ductile uses **BLAKE3** hash verification. For webhook setup and signing examples, see [WEBHOOKS.md](https://ductile.run/WEBHOOKS/index.md).

1. **Authorize changes:** After editing config files, update the hashes:

   ```bash
   ductile config lock
   ```

1. **Validate state:** Ductile runs an automatic check at startup. You can run it manually with:

   ```bash
   ductile config check
   ```

### Strict Mode

For hardened environments, enable `service.strict_mode: true` in your `config.yaml`. In strict mode:

- The system **will not start** if any file fails integrity verification (no warnings).
- The system **will not start** if any configuration check fails (e.g., missing dependencies).
- The system **requires** at least one API token to be defined if the API is enabled.

### Managing Scoped Tokens

Create scoped API tokens by passing scopes directly or by providing a scopes JSON file:

```bash
./ductile config token create --name "my-service" --scopes "jobs:ro,plugin:rw"
```

______________________________________________________________________

## 4. API Reference

Ductile provides a REST API for programmatic control. By default, it listens on `localhost:8080`.

### Manual Triggering

You can manually enqueue any plugin command via the API:

```bash
curl -X POST http://localhost:8080/plugin/echo/poll \
  -H "Authorization: Bearer " \
  -H "Content-Type: application/json" \
  -d '{"payload": {"message": "Hello from API"}}'
```

### Job Inspection

Retrieve the status and results of any job:

```bash
curl http://localhost:8080/job/ \
  -H "Authorization: Bearer "
```

For a full list of endpoints and schemas, see the [API Reference](https://ductile.run/API_REFERENCE/index.md).

______________________________________________________________________

## 5. Troubleshooting

- **Failed to acquire PID lock:** Another instance is running. Check `ps aux | grep ductile`.
- **Plugin not running:** Ensure it is `enabled: true` in `plugins.yaml` (or whichever included file configures it) and has a valid `schedules:` entry.
- **Database is locked:** SQLite concurrency limit. Ductile uses WAL mode to mitigate this, but very high API volume may still trigger it.
- **Tampering detected:** A configuration file was modified without running `config lock`. Run `ductile config lock` if the change was intentional.
- **Plugin directory ignored:** If a subdirectory in your `plugin_roots` contains an entrypoint (like `run.py`) but no `manifest.yaml`, Ductile will log a warning and ignore it. Add a manifest to enable discovery.
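The manual-trigger call from the API Reference section can also be built from a script. The following is a minimal sketch using only the Python standard library. `build_trigger_request` is a hypothetical helper; the `/plugin/<name>/<command>` path and the `{"payload": ...}` wrapper are copied from the `curl` example, and the token value is a placeholder you would supply yourself.

```python
import json
import urllib.request


def build_trigger_request(base_url, plugin, command, token, payload):
    """Build (but do not send) a POST that enqueues a plugin command."""
    body = json.dumps({"payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/plugin/{plugin}/{command}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_trigger_request(
    "http://localhost:8080", "echo", "poll",
    token="REPLACE_ME",  # placeholder, not a real token
    payload={"message": "Hello from API"},
)
# Sending is left to the caller, for example:
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       print(resp.status, resp.read().decode())
```

Keeping request construction separate from sending makes it easy for a script or agent to log or dry-run the call before it touches the gateway.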
# Ductile Health Check Procedure

Operational procedure for reviewing the day-to-day health of a running Ductile instance. Useful as a daily status review or as the first thing to run when investigating flaky behaviour.

Assumes a systemd-managed deployment. Adjust service name, binary path, and config path to match your instance.

## Step 1: Service & binary sanity

```bash
systemctl --user is-active
systemctl --user status --no-pager | sed -n '1,8p'
ductile version
ductile system status
```

Expected: `active`. Record the `version`, `commit`, and uptime; these let you correlate errors against deploys later.

**Known quirk:** when the daemon is running, `ductile system status` reports `DEGRADED` with `pid_lock: FAIL (pid )`. The PID reported IS the running daemon. This is expected when calling `system status` against a live instance and is not a real failure. Verify the PID matches the systemd `Main PID` and move on.

## Step 2: Recent errors

Scan a 24h window for error-level log entries, and separately scan everything since the current binary was started, so that pre-deploy and post-deploy issues are distinguishable.

```bash
# 24h error scan
journalctl --user -u --since "24 hours ago" --no-pager \
  | grep -i -E '"level":"ERROR"|panic|FATAL'

# Since service restart (use the "Active: since" timestamp from `systemctl status`)
journalctl --user -u --since "" --no-pager \
  | grep -i -E '"level":"ERROR"|panic|FATAL'
```

Common patterns and their meaning:

| Pattern                                             | Meaning                                                                                                                    |
| --------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `baggage path "ductile.route_depth" is immutable`   | Routing/baggage propagation issue. The plugin itself may have succeeded; failure is in routed-context creation downstream. |
| `plugin fingerprint check failed (strict mode)`     | A plugin entrypoint was edited without `ductile config lock`. Review the change, then re-lock.                             |
| `failed to create event context for pipeline entry` | Usually a symptom of a baggage/routing bug; investigate the underlying cause rather than the symptom.                      |

## Step 3: 24h job stats per scheduled plugin

Enumerate scheduled plugins from the live config:

```bash
ductile config show | grep -B1 -A2 'schedules:'
```

Then collect per-plugin 24h counts:

```bash
FROM=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)
for plugin in ; do
  ductile job logs --from "$FROM" --plugin "$plugin" --limit 200 --json 2>/dev/null \
    | python3 -c "
import sys,json
d=json.load(sys.stdin)
logs=d.get('logs') or []
total=d.get('total',0)
succ=sum(1 for j in logs if j.get('Status')=='succeeded')
fail=sum(1 for j in logs if j.get('Status')=='failed')
print(f'{\"$plugin\":<22} total={total:<4} in_window={len(logs)} succ={succ:<4} fail={fail}')
"
done
```

Also query any event-driven plugins you care about; they may show 0 in the window, which is fine if no upstream event triggered them.

**CLI gotchas:**

- JSON field names are **capitalized**: `Status`, `Plugin`, `CreatedAt`, `LastError`, `Stderr`, `Result`.
- `--limit` maxes at 200. When `total > in_window`, you are seeing only the most recent 200 entries, but `total` is the truthful 24h count.
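Both gotchas are easy to trip over when scripting against `ductile job logs --json`. The sketch below assumes the response shape used in the loop above (a `logs` list that may be null, a `total` count, and capitalized per-job fields). `summarize` is a hypothetical helper, not part of the CLI, and the sample document is made up.

```python
import json


def summarize(doc: dict) -> dict:
    """Count statuses from a `job logs --json` document and flag truncation."""
    logs = doc.get("logs") or []         # `logs` may be null when nothing matched
    total = doc.get("total", 0)
    counts = {}
    for job in logs:
        status = job.get("Status")       # capitalized field name, not "status"
        counts[status] = counts.get(status, 0) + 1
    return {
        "total": total,                  # count reported for the whole window
        "in_window": len(logs),          # entries actually returned (bounded by --limit)
        "truncated": total > len(logs),  # if True, `counts` covers only `in_window`
        "counts": counts,
    }


# Made-up sample: three jobs in the window, only two returned.
sample = json.loads('{"total": 3, "logs": [{"Status": "succeeded"}, {"Status": "failed"}]}')
print(summarize(sample))
```

When `truncated` is true, per-status counts describe only the returned entries, so narrow the `--from` window rather than trusting them as totals.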
## Step 4: Investigate failures

For any plugin showing fails, pull full details including `Result`, `LastError`, and `Stderr` via `--include-result`:

```bash
FROM=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)
ductile job logs --from "$FROM" --plugin --limit 200 --include-result --json 2>/dev/null \
  | python3 -c "
import sys,json
d=json.load(sys.stdin)
# 'logs' can be null, so fall back to an empty list
for j in (d.get('logs') or []):
    if j.get('Status')=='failed':
        print(f\" {j.get('CreatedAt')} cmd={j.get('Command')} attempt={j.get('Attempt')}\")
        for k in ('LastError','Stderr'):
            v=j.get(k) or ''
            if v: print(f' {k}:', v[:300])
        res=j.get('Result')
        if res: print(' Result:', json.dumps(res)[:300])
"
```

Watch for cases where `Result.status == "ok"` but `LastError` is set: the plugin itself succeeded, and the failure is downstream in Ductile's routing/context layer. Those are core bugs, not plugin bugs, and usually need to be matched against recent upstream commits.

For job lineage (baggage, attempts, routing context) across a pipeline:

```bash
ductile job inspect
```

## Step 5: Deploy correlation

If errors cluster before a timestamp and stop after it, confirm a deploy/restart explains it rather than transient recovery:

```bash
# find service restarts
journalctl --user -u --since "24 hours ago" --no-pager \
  | grep -E 'Started|ductile running'

# binary age
ls -la $(command -v ductile)
```

Match against `git log` in the ductile source tree between the old and new `commit:` values (from `ductile version`) to identify which commits fixed which errors.

## Step 6: Verdict

Summarise as:

1. **Service state:** active/degraded (ignoring the `pid_lock` quirk), binary version, uptime.
1. **24h job totals:** overall success rate, per-plugin failure counts.
1. **Failures:** root cause(s), whether already patched in the running binary, whether operator action is needed.
1. **Post-restart window:** clean or not (most important signal for "is it healthy *now*?").

Target: the post-restart window has zero errors. Pre-restart errors with a matching fix already in the running binary are history, not present-day problems.

## Related docs

- [Deployment](https://ductile.run/DEPLOYMENT/index.md)
- [Operator Guide](https://ductile.run/OPERATOR_GUIDE/index.md)
- [Architecture](https://ductile.run/ARCHITECTURE/index.md)

# Plugin Diagnostics

A structured process for diagnosing plugin health in a running Ductile instance. Covers triage, job history analysis, failure inspection, manual testing, and remediation.

______________________________________________________________________

## Quick Triage (3 commands)

Run these first. They answer "is anything broken right now?"

```bash
# 1. Gateway and overall health
ductile system status

# 2. Recent failures across all plugins (last 24h)
ductile job logs --from $(date -u -d '24 hours ago' --rfc-3339=seconds | tr ' ' 'T') \
  --limit 200 --json | \
  python3 -c "
import json,sys
d=json.load(sys.stdin)
logs=d['logs'] or []
fails=[l for l in logs if l['Status']=='failed']
print(f'Total jobs: {d[\"total\"]} Failures: {len(fails)}')
for f in fails:
    print(f' {f[\"Plugin\"]:25} {f[\"CreatedAt\"][:16]} {f[\"LastError\"]}')
"

# 3. Run a specific plugin's health check
ductile plugin run health
```

If step 2 shows failures, move to **Per-Plugin Investigation** below. If step 3 fails, move to **Configuration Issues**.

______________________________________________________________________

## 1. Per-Plugin Job History

Get a summary of a plugin's recent activity:

```bash
FROM=$(date -u -d '24 hours ago' --rfc-3339=seconds | tr ' ' 'T')
ductile job logs --from $FROM --plugin --limit 200 --json | python3 -c "
import json,sys
from collections import Counter
d=json.load(sys.stdin)
logs=d['logs'] or []
statuses=Counter(l['Status'] for l in logs)
print(f'Total: {d[\"total\"]} Statuses: {dict(statuses)}')
if logs:
    print(f'Oldest: {logs[-1][\"CreatedAt\"][:16]}')
    print(f'Newest: {logs[0][\"CreatedAt\"][:16]}')
for l in logs:
    if l['Status'] == 'failed':
        print(f' FAIL {l[\"CreatedAt\"][:16]} {l[\"LastError\"]}')
"
```

**Status meanings:**

| Status      | Meaning                                                                                                                             |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| `succeeded` | Plugin ran and returned `status: ok`                                                                                                |
| `failed`    | Plugin returned `status: error` or timed out                                                                                        |
| `skipped`   | A job was explicitly skipped by orchestration logic; uncommon for `if:` pipelines because they branch through `core.switch` instead |
| `retrying`  | Core retry policy queued another attempt after a retryable failure                                                                  |

A high `succeeded` count for `core.switch` is normal for conditional pipeline steps. Only `failed` warrants investigation.

______________________________________________________________________

## 2. Inspect a Failed Job

Get the full result payload and pipeline lineage for a specific job:

```bash
# Get job IDs for failed runs
ductile job logs --from $FROM --plugin --limit 50 --json | python3 -c "
import json,sys
d=json.load(sys.stdin)
for l in (d['logs'] or []):
    if l['Status'] == 'failed':
        print(l['JobID'], l['CreatedAt'][:16], l.get('LastError',''))
"

# Inspect the full result (including plugin stdout, error detail)
ductile job logs --from $FROM --plugin --limit 50 --json --include-result | python3 -c "
import json,sys
d=json.load(sys.stdin)
for l in (d['logs'] or []):
    if l['Status'] == 'failed':
        print('=== FAILED JOB', l['JobID'][:8], l['CreatedAt'][:16], '===')
        print(json.dumps(l.get('Result'), indent=2))
        if l.get('Stderr'):
            print('STDERR:', l['Stderr'])
"

# Follow the pipeline lineage (what triggered this job, what did it trigger)
ductile job inspect
```

**What to look for in `job inspect`:**

- **Hops:** which pipeline step triggered this job and what baggage it carried
- **Baggage:** the payload passed down the chain; missing keys here often explain `missing field` errors

______________________________________________________________________

## 3. Manual Plugin Invocation

Test a plugin end-to-end without waiting for a trigger:

```bash
# Run with default/no payload
ductile plugin run handle

# Run with a payload (useful for handle commands that need input)
ductile api /plugin//handle -X POST \
  -b '{"payload": {"message": "test message"}}'

# Run the health command to verify config
ductile plugin run health
```

The `health` command validates the plugin's configuration (e.g. required API keys, webhook URLs) without performing any side effects. Use it after changing config.

______________________________________________________________________

## 4. Configuration Issues

### Check plugin is registered

```bash
ductile config show | grep -A 10 'plugins:'
ductile config get plugins..enabled
```

### Validate full config integrity

```bash
ductile config check
```

This catches: missing fields, integrity hash mismatches, unreachable entrypoints.

### Verify the manifest

Each plugin directory must contain a valid `manifest.yaml`. If a plugin is silently absent from scheduling, check:

```bash
ls /manifest.yaml
cat /manifest.yaml
```

The manifest declares supported `commands`, required `config_keys`, and the `entrypoint`. A missing or malformed manifest causes the plugin to be skipped at startup without raising an error.

### After any config change

```bash
ductile config lock     # update integrity hashes
ductile config check    # verify
ductile system reload   # apply without restart
```

______________________________________________________________________

## 5. Scheduled Plugin Not Firing

If a plugin is scheduled but no jobs appear in the logs:

1. **Confirm the schedule is configured:**

   ```bash
   ductile config get plugins..schedules
   ```

1. **Check cron expression and timezone:** Ductile cron runs in the system timezone unless overridden. A schedule of `0 7 * * * Australia/Sydney` fires at 07:00 Sydney local time, which is 21:00 UTC the previous day under AEST or 20:00 UTC the previous day under AEDT (daylight saving).

1. **Check the plugin is enabled:**

   ```bash
   ductile config get plugins..enabled
   ```

1. **Look for startup errors in the journal:**

   ```bash
   journalctl --user -u ductile-local --no-pager -n 100 | grep -i 'error\|plugin'
   ```

______________________________________________________________________

## 6. Pipeline-Triggered Plugin Not Firing

If a plugin is supposed to run when an upstream job completes but doesn't:

1. **Confirm the upstream job actually ran and succeeded:**

   ```bash
   ductile job logs --from $FROM --plugin --limit 10 --json | \
     python3 -c "import json,sys; d=json.load(sys.stdin); [print(l['Status'], l['CreatedAt'][:16]) for l in (d['logs'] or [])]"
   ```

1. **Check the pipeline `if:` condition:** `if:` predicates compile into an internal `core.switch` hop. If the condition evaluates false, Ductile bypasses the gated step and routes the false branch onward. Inspect the upstream payload and the `core.switch` result to confirm what matched.

1. **Check event routing:**

   ```bash
   ductile config show | grep -B2 -A15 'on: '
   ```

1. **Inspect the upstream job for baggage:** the downstream plugin receives the upstream job's baggage as its payload. A `missing field` error downstream usually means the upstream didn't emit that field.

   ```bash
   ductile job inspect
   ```

______________________________________________________________________

## 7. Circuit Breaker

Ductile tracks consecutive plugin failures and can open a circuit breaker to stop retrying a broken plugin.

Signs:

- Plugin stopped firing entirely after a run of failures
- `system status` shows plugin in `open` circuit state

```bash
# Check circuit state
ductile system breaker

# Machine-readable breaker state and recent transition facts
ductile system breaker --json

# Reset after fixing the underlying issue
ductile system reset
```

Do not reset without first understanding why the circuit opened.

______________________________________________________________________

## 8. Reconciliation Check

To verify that a plugin's fired jobs match expected outputs (e.g. confirming notifications landed):

```bash
FROM=$(date -u -d '12 hours ago' --rfc-3339=seconds | tr ' ' 'T')
ductile job logs --from $FROM --plugin --limit 200 --json | python3 -c "
import json,sys
from collections import Counter
d=json.load(sys.stdin)
logs=d['logs'] or []
statuses=Counter(l['Status'] for l in logs)
print(f'Window: last 12h Total: {d[\"total\"]}')
print('Breakdown:', dict(statuses))
"
```

Cross-reference the `total` count against expected frequency:

- A `poll` plugin on a 15-minute schedule should produce ~48 jobs per 12h
- An event-driven plugin should have jobs proportional to the events that triggered it
- Gaps (fewer jobs than expected) can indicate scheduler drift, missed events, or a silent failure in an upstream trigger

______________________________________________________________________

## Common Failure Patterns

| Error                        | Likely Cause                                         | Fix                                                                                |
| ---------------------------- | ---------------------------------------------------- | ---------------------------------------------------------------------------------- |
| `missing repo_path/path`     | Upstream step didn't emit the required baggage field | Check upstream plugin result and pipeline config mapping                           |
| `missing webhook_url`        | Plugin config lacks required key                     | Add key to plugin config, `config lock`, `system reload`                           |
| `timeout`                    | Plugin exceeded deadline                             | Increase `timeout:` in plugin config or fix slow external call                     |
| `invalid JSON input`         | Plugin received malformed stdin                      | Check upstream payload construction; look at `Stderr` in job log                   |
| `HTTP 4xx` from external API | Auth or request format issue                         | Check plugin config (tokens, endpoint URLs); run `health` command                  |
| `HTTP 5xx` from external API | Upstream service down                                | Transient; check plugin error facts and core retry events; check external service  |
| `exit code 1` (sys_exec)     | Shell command failed                                 | Check `Stderr` in job log for command output                                       |

______________________________________________________________________

## Reference: Key Commands

```bash
# Gateway health
ductile system status
ductile system watch    # live TUI

# Plugin testing
ductile plugin run health
ductile plugin run handle
ductile api /plugin//handle -X POST -b '{"payload": {...}}'

# Job history
ductile job logs --plugin --from --limit 200 --json
ductile job logs --plugin --from --limit 200 --json --include-result
ductile job inspect

# Config
ductile config check
ductile config show
ductile config get plugins..
ductile config lock && ductile system reload

# Circuit breaker
ductile system breaker
ductile system reset

# Logs (systemd)
journalctl --user -u ductile-local --no-pager -n 50 | grep ERROR
```

______________________________________________________________________

## Stopwatch: answering "is ductile slow, or is my plugin slow?"

The dispatcher captures per-invocation timing automatically. Plugins do not instrument themselves; the supervisor measures them.

Each plugin invocation writes one immutable `stopwatch.Record` to the `job_stopwatch` table, the supervisor's ledger. Telemetry is system data, distinct from plugin domain payload (Hickey decomplecting), so it lives in the database and never rides along in baggage.

**Query directly when you need it:**

```sh
sqlite3 /path/to/ductile.db \
  "SELECT job_id, plugin, attempt, dur_ns, status FROM job_stopwatch ORDER BY id DESC LIMIT 20;"
```

Soon: surfaced via `ductile inspect ` (claude-9mf).

A Record carries everything needed to attribute time:

| Field             | Meaning                                                     |
| ----------------- | ----------------------------------------------------------- |
| `plugin_id`       | Plugin name                                                 |
| `step_name`       | Pipeline step ID, when known                                |
| `attempt`         | 1-based retry counter                                       |
| `enter_wall_ns`   | Wall-clock entry timestamp (correlation only)               |
| `exit_wall_ns`    | Wall-clock exit timestamp (correlation only)                |
| `dur_ns`          | Monotonic spawn duration, the number to compare             |
| `runtime_pre_ns`  | Dispatcher work between request build and spawn             |
| `runtime_post_ns` | Dispatcher work between spawn return and record write       |
| `status`          | `ok`, `err`, `timeout`, or `capture_error`                  |
| `subs`            | Optional plugin-emitted sub-spans (capped at 32 per Record) |

### Attributing the bottleneck

For one job, durations are local. For a pipeline of `N` steps:

```text
plugin_time  = Σ dur_ns   (across all step records)
wall_time    = max(exit_wall) − min(enter_wall)
gateway_time = wall_time − plugin_time
```

- If `gateway_time` is large compared to `plugin_time`, the bottleneck is **inside ductile**: dispatch, routing, or the queue.
- If a single `plugin_id` dominates `plugin_time`, that plugin is the bottleneck.
- If `runtime_pre_ns` or `runtime_post_ns` grows without `dur_ns` growing, the cost is in the dispatcher's pre/post work, not the plugin spawn.

### Optional sub-spans

Plugins may emit internal phases (`db_query`, `http_call`) in their response under `ductile_stopwatch_subs` (see PLUGIN_DEVELOPMENT.md). The dispatcher caps at 32 entries per Record and drops the rest with a single warn-log; malformed shapes are dropped silently. Sub-spans are advisory; the Record itself is always present regardless.

### Status semantics

`status` is a closed set. `capture_error` indicates a defect in the supervisor itself and should never appear in production. It exists so that timing data is still emitted in the worst case rather than silently disappearing.
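The attribution arithmetic from the Stopwatch section can be sketched in a few lines. This assumes each Record is available as a dictionary carrying the `plugin_id`, `dur_ns`, `enter_wall_ns`, and `exit_wall_ns` fields from the table. `attribute` is a hypothetical helper and the sample numbers are made up.

```python
from collections import defaultdict


def attribute(records):
    """Split a pipeline's wall time into plugin time and gateway time (all in ns)."""
    plugin_time = sum(r["dur_ns"] for r in records)
    wall_time = (max(r["exit_wall_ns"] for r in records)
                 - min(r["enter_wall_ns"] for r in records))
    per_plugin = defaultdict(int)
    for r in records:
        per_plugin[r["plugin_id"]] += r["dur_ns"]  # which plugin dominates?
    return {
        "plugin_time": plugin_time,
        "wall_time": wall_time,
        "gateway_time": wall_time - plugin_time,
        "per_plugin": dict(per_plugin),
    }


# Made-up records for a two-step pipeline.
records = [
    {"plugin_id": "fetch", "dur_ns": 400, "enter_wall_ns": 1_000, "exit_wall_ns": 1_400},
    {"plugin_id": "notify", "dur_ns": 100, "enter_wall_ns": 1_900, "exit_wall_ns": 2_000},
]
print(attribute(records))
```

The subtraction assumes steps run one after another. If steps overlap in time, `plugin_time` can exceed `wall_time`, so a negative `gateway_time` points to parallelism rather than a measurement error.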
# Remote Event Relay

Remote Event Relay lets one Ductile instance deliver an event to another Ductile instance over authenticated HTTP.

Phase 1 is intentionally narrow:

- point-to-point relay between named instances
- HMAC-authenticated HTTP ingress
- receiver-side local enqueue and local exact-match routing
- at-least-once delivery

It is not:

- clustering
- shared queueing
- shared state
- remote route discovery
- pub/sub or broker semantics

______________________________________________________________________

## What Happens

1. Instance `home-primary` sends an event to named instance `lab`.
1. `lab` validates the trusted peer, timestamp, key id, signature, and envelope.
1. `lab` accepts the event as a fresh local root ingress event.
1. `lab` enqueues local work and applies its own local routing.

The important boundary is step 3. After acceptance, the receiver owns all further processing.

______________________________________________________________________

## Config Layout

Recommended files:

```text
~/.config/ductile/
├── config.yaml
├── tokens.yaml
├── relay-instances.yaml
├── relay-ingress.yaml
└── pipelines.yaml
```

`tokens.yaml` carries the shared HMAC secrets referenced by `secret_ref`.
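To make the timestamp and signature checks from step 2 of "What Happens" concrete, here is a minimal sketch of HMAC signing and verification with a shared secret. It is an illustration under stated assumptions, not Ductile's wire format: the End-to-End Example section says the signature covers the HTTP method, request path, timestamp, and raw body, but the newline-joined canonical form, the SHA-256 digest, the hex encoding, and the function names `sign` and `verify` are all assumptions made here. The 300-second window mirrors the `allowed_clock_skew: 5m` setting used in the receiver example.

```python
import hashlib
import hmac


def sign(secret: bytes, method: str, path: str, timestamp: int, body: bytes) -> str:
    # Assumed canonical form: the four covered fields joined by newlines.
    message = b"\n".join(
        [method.upper().encode(), path.encode(), str(timestamp).encode(), body]
    )
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, method, path, timestamp, body, signature, now, allowed_skew_s=300):
    # Reject requests outside the clock-skew window first (replay hardening).
    if abs(now - timestamp) > allowed_skew_s:
        return False
    expected = sign(secret, method, path, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


secret = b"example-shared-secret"  # stands in for the value behind secret_ref
path = "/ingest/peer/home-primary"
body = b'{"event":{"type":"backup.ready"}}'
sent_at = 1_700_000_000
sig = sign(secret, "POST", path, sent_at, body)

fresh = verify(secret, "POST", path, sent_at, body, sig, now=sent_at + 30)
stale = verify(secret, "POST", path, sent_at, body, sig, now=sent_at + 600)
tampered = verify(secret, "POST", path, sent_at, body + b" ", sig, now=sent_at + 30)
print(fresh, stale, tampered)
```

Because the body is signed as raw bytes, any re-serialization of the JSON between sender and receiver would invalidate the signature in a scheme like this one.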
______________________________________________________________________ ## Sender Example `config.yaml` ```yaml include: - tokens.yaml - relay-instances.yaml - pipelines.yaml service: name: home-primary tick_interval: 60s log_level: info plugin_roots: - /opt/ductile/plugins api: enabled: true listen: 127.0.0.1:8080 state: path: ./data/state.db ``` `tokens.yaml` ```yaml tokens: - name: relay-lab-v1 key: ${RELAY_LAB_V1_SECRET} scopes_file: scopes/relay-admin.json scopes_hash: blake3:1111111111111111111111111111111111111111111111111111111111111111 ``` `relay-instances.yaml` ```yaml instances: - name: lab enabled: true base_url: https://lab.example ingress_path: /ingest/peer/home-primary secret_ref: relay-lab-v1 key_id: v1 timeout: 10s allow: - backup.ready ``` ______________________________________________________________________ ## Receiver Example `config.yaml` ```yaml include: - tokens.yaml - relay-ingress.yaml - pipelines.yaml service: name: lab tick_interval: 60s log_level: info plugin_roots: - /opt/ductile/plugins api: enabled: true listen: 127.0.0.1:8080 state: path: ./data/state.db ``` `tokens.yaml` ```yaml tokens: - name: relay-lab-v1 key: ${RELAY_LAB_V1_SECRET} scopes_file: scopes/relay-admin.json scopes_hash: blake3:1111111111111111111111111111111111111111111111111111111111111111 ``` `relay-ingress.yaml` ```yaml remote_ingress: listen_path: /ingest/peer max_body_size: 1MB allowed_clock_skew: 5m require_key_id: true peers: - name: home-primary enabled: true secret_ref: relay-lab-v1 key_id: v1 accept: - backup.ready baggage: allow: - trace_id ``` `pipelines.yaml` ```yaml pipelines: - name: process-offsite-backup on: backup.ready steps: - id: verify-backup uses: backup-verifier - id: store-backup uses: cold-storage-sync ``` ______________________________________________________________________ ## End-to-End Example Expected flow: 1. `home-primary` emits or prepares `backup.ready`. 1. `home-primary` signs and `POST`s the relay envelope to `lab`. 1. 
`lab` accepts `backup.ready` from peer `home-primary`. 1. `lab` enqueues local jobs for `process-offsite-backup`. 1. `lab` runs `backup-verifier` and `cold-storage-sync` according to its own local config. CLI example: ```bash ductile relay send lab \ --event backup.ready \ --payload '{"archive_path":"/srv/backups/latest.tar.zst","archive_id":"nightly-2026-05-03"}' \ --dedupe-key backup.ready:nightly-2026-05-03 \ --origin-plugin backup-runner \ --origin-job-id job-123 \ --origin-event-id evt-456 \ --baggage '{"trace_id":"tr-789"}' ``` Wire shape: ```json { "event": { "type": "backup.ready", "payload": { "archive_path": "/srv/backups/latest.tar.zst", "archive_id": "nightly-2026-05-03" }, "dedupe_key": "backup.ready:nightly-2026-05-03" }, "origin": { "instance": "home-primary", "plugin": "backup-runner", "job_id": "job-123", "event_id": "evt-456" }, "baggage": { "trace_id": "tr-789" } } ``` Headers: - `X-Ductile-Peer` - `X-Ductile-Key-Id` - `X-Ductile-Timestamp` - `X-Ductile-Signature` The signature covers: - HTTP method - request path - timestamp - raw request body ______________________________________________________________________ ## Operational Notes - Operator-facing instance and peer names should be lower-case hyphenated, for example `home-primary` or `vps-backup`. - Event types remain lower-case dotted, for example `backup.ready`. - `remote_ingress.listen_path` is mounted on the main HTTP server and therefore uses `api.listen`. - `secret_ref` must resolve to a `tokens.yaml` entry on both sides. - `peers[].accept` and `instances[].allow` are optional policy filters, not distributed routing rules. - Remote baggage is not trusted wholesale. Only keys listed in `peers[].baggage.allow` may seed new local root context. ______________________________________________________________________ ## Failure Semantics - If delivery fails before acceptance, the sender owns the failure. 
- If the receiver accepts the event and downstream work later fails, the receiver owns that failure.
- Delivery remains at-least-once. Duplicate-safe behavior still matters.

______________________________________________________________________

## What To Check When It Fails

- `service.name` matches the sender identity used on the wire.
- `secret_ref` resolves to the same shared secret on both sides.
- `key_id` matches if `require_key_id: true`.
- `allowed_clock_skew` is large enough for the two clocks.
- `accept` includes the event type being relayed.
- `api.listen` is reachable at the receiver.

# Reference

# Ductile: Configuration Specification

**Version:** 1.1 (Tiered Directory Model)\
**Date:** 2026-02-25\
**Status:** Approved

This document defines the configuration structure, integrity verification, and runtime compilation behavior for Ductile.

______________________________________________________________________

## 1. Directory Structure

Ductile uses a configuration directory, typically located at `~/.config/ductile/`. Only `config.yaml` is implicitly loaded; all other files must be referenced via `include:`.

```text
~/.config/ductile/
├── config.yaml           # [Operational] Service-level settings
├── webhooks.yaml         # [High Security] Webhook endpoints & secrets (include explicitly)
├── tokens.yaml           # [High Security] API token registry (include explicitly)
├── relay-instances.yaml  # [Operational] Outbound named relay targets (include explicitly)
├── relay-ingress.yaml    # [Operational] Inbound trusted relay peers (include explicitly)
├── routes.yaml           # [Operational] Global routing rules (include explicitly)
└── scopes/               # [High Security] Token scope definitions
    ├── admin-cli.json
    └── github-integration.json
```

______________________________________________________________________

## 2. Tiered Integrity Preflight

Before starting, the system verifies all files against a monolithic `.checksums` manifest located in the configuration root.
Integrity is enforced in two tiers: | Tier | Files | Missing/Mismatch Behavior | | ----------------- | -------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | | **High Security** | `tokens.yaml`, `webhooks.yaml`, `scopes/*.json` | **Hard Fail**: System refuses to start (EX_CONFIG). | | **Operational** | `config.yaml`, `routes.yaml`, `relay-instances.yaml`, `relay-ingress.yaml` | **Warn & Continue**: Logs a warning but loads the file (Unless `strict_mode: true` is set, in which case it is a **Hard Fail**). | ### 2.1 The Seal (`.checksums`) The `.checksums` file is a YAML manifest containing BLAKE3 hashes indexed by the **absolute path** of every authorized file. - **System Lock-in**: Moving the configuration directory breaks the seal. - **Authorization**: The `ductile config lock` command is the only way to update the manifest. ______________________________________________________________________ ## 3. Monolithic Compilation (Grafting) At runtime, the gateway compiles all discovered files into a single, monolithic configuration object. ### 3.1 Merge Logic - **Root First**: `config.yaml` is loaded first as the base. - **Explicit Includes**: Additional files are loaded from the `include:` list (and any directories listed there) in order. - **Precedence**: Later entries override earlier ones (n-1 branching). - **Matching Branches**: - **Maps (e.g., `plugins:`)**: Keys are merged. Duplicate keys are overridden by the later file. - **Arrays (e.g., `pipelines:`, `routes:`)**: Items are **appended** to the list. - **Scalars**: Later values replace earlier ones. 
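The merge rules above can be sketched with plain dictionaries standing in for parsed YAML. This is an illustrative approximation, not the gateway's implementation: the function names `graft` and `compile_config` are hypothetical, recursing into nested maps is an assumption (the rules only say that map keys are merged), and letting the later value win when the two sides have different types is also an assumption. The `nightly-report` pipeline is invented for the example.

```python
def graft(base, later):
    """Merge `later` onto `base` and return a new value."""
    if isinstance(base, dict) and isinstance(later, dict):
        merged = dict(base)
        for key, value in later.items():
            # Matching branches are merged; new keys are simply added.
            merged[key] = graft(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(later, list):
        # Arrays are appended, never replaced.
        return base + later
    # Scalars (and, by assumption, mismatched types): the later value wins.
    return later


def compile_config(documents):
    """Fold documents in load order: root first, then each include."""
    result = {}
    for doc in documents:
        result = graft(result, doc)
    return result


root = {
    "service": {"name": "my-gateway", "log_level": "info"},
    "pipelines": [{"name": "video-wisdom", "on": "discord.link"}],
}
extra = {
    "service": {"log_level": "debug"},
    "pipelines": [{"name": "nightly-report", "on": "report.requested"}],
}
merged = compile_config([root, extra])
print(merged)
```

In this sketch `service.name` survives from the root file, `service.log_level` takes the later value, and both pipelines end up in the list.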
### 3.2 Modular Example **config.yaml (Root)** ```yaml include: - pipelines.yaml service: name: my-gateway ``` **pipelines.yaml** ```yaml pipelines: - name: video-wisdom on: discord.link ``` **Resulting Monolith:** ```yaml service: name: my-gateway pipelines: - name: video-wisdom ``` ### 3.3 Directory includes `include:` entries may point at directories. Ductile loads `*.yaml` files from that directory (non-recursive) in alphabetical order and merges them as if they were listed explicitly. ### 3.4 Naming convention for operator-facing instance identifiers When config introduces an operator-facing identifier for a Ductile instance, peer, or similarly named runtime endpoint, use lower-case hyphenated names: - `home-primary` - `lab` - `vps-backup` Do not use: - underscores: `home_primary` - spaces: `home primary` - mixed case: `HomePrimary` Recommended pattern: ```text ^[a-z0-9]+(?:-[a-z0-9]+)*$ ``` Rationale: - reads cleanly in YAML and logs - maps directly to URL path segments - avoids competing conventions for operator-facing identities - keeps names distinct from Go identifiers and internal field names `service.name` is an operator-facing identity field and should follow this convention when it names a concrete Ductile instance rather than a generic service label. ______________________________________________________________________ ## 4. File Formats ### 4.1 config.yaml (Service settings) ```yaml service: name: ductile tick_interval: 60s log_level: info log_format: json dedupe_ttl: 24h job_log_retention: 30d job_queue_retention: 24h # Omit to use the default: max(1, CPU-1). Set to 1 to force global serial dispatch. max_workers: 4 strict_mode: true # Enforce integrity & configuration checks on startup plugin_roots: - /opt/ductile/plugins - /opt/ductile/plugins-private api: enabled: true listen: 127.0.0.1:8080 state: path: ./data/state.db # macOS-only. Each path is stat()-ed once on cold start (after PID lock, # before "ductile running" log). 
Triggers any pending TCC popup for the # Files-and-Folders service that gates the path. Runs synchronously while # the operator is at the keyboard for the deploy. No-op on non-darwin and # when the list is empty. Skipped on SIGHUP reload (binary cdhash # unchanged → existing grants still valid). # # Configure local-volume paths only. An unreachable network mount blocks # os.Stat for the filesystem-level timeout (seconds to minutes) and # delays gateway readiness during the cold-start prewarm. tcc_paths: - /Users/me/Documents/Obsidian # triggers Documents grant - /Volumes/Projects # triggers NetworkVolumes grant ``` Relative paths (like `./data/state.db`) are resolved against the directory containing `config.yaml`. `dedupe_ttl` uses recent terminal rows in `job_queue`, so `job_queue_retention` must be at least as long as `dedupe_ttl`. The defaults are both 24h. > **Note:** the core does not provision per-job filesystem workspaces; the `workspace:` config section has been removed. Plugins that need a scratch path manage it themselves — see `docs/PLUGIN_DEVELOPMENT.md` §9. `plugin_roots` is the multi-root setting. Discovery behavior: - Duplicate roots are ignored after first occurrence. - Roots are scanned in order; if duplicate plugin names exist across roots, the first discovered plugin is kept and later duplicates are ignored. ### 4.2 Plugin definitions (included file) ```yaml plugins: echo: enabled: true parallelism: 1 notify_on_complete: true # Opt-in to job.completed lifecycle signals schedules: # Optional; omit for event-driven plugins - id: default every: 5m config: message: "Hello" ``` ### 4.2.1 Concurrency controls - `service.max_workers`: Global worker cap across all plugins. If omitted, Ductile uses `max(1, CPU-1)`. Set this to `1` to force whole-system serial dispatch. - `plugins..parallelism`: Per-plugin concurrency cap. - Constraint: `1 <= parallelism <= max_workers`. 
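The default and the constraint above are easy to check by hand. The sketch below restates them in code; `default_max_workers` and `check_parallelism` are hypothetical helper names used only for illustration, and the real validation lives in the gateway.

```python
import os


def default_max_workers(cpu_count=None):
    """Documented default: max(1, CPU-1)."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cpus - 1)


def check_parallelism(parallelism, max_workers):
    """Enforce the documented constraint 1 <= parallelism <= max_workers."""
    if not 1 <= parallelism <= max_workers:
        raise ValueError(
            f"parallelism={parallelism} must be between 1 and max_workers={max_workers}"
        )
    return parallelism


single_core = default_max_workers(cpu_count=1)
eight_core = default_max_workers(cpu_count=8)
accepted = check_parallelism(2, max_workers=4)
print(single_core, eight_core, accepted)
```

On a single-core host the default never drops below one worker, which is the same value that forces whole-system serial dispatch.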
Manifest interaction: - Plugins may declare `concurrency_safe: false` in `manifest.yaml`; omitted means `true`. - The manifest hint is the plugin author's safety declaration. Operators use `plugins..parallelism` to choose how much same-plugin concurrency to allow within the global `service.max_workers` cap. ### 4.3 webhooks.yaml (High Security - Experimental) > [!IMPORTANT]\ > Webhook support is currently in early development and may not be fully functional in the current MVP. ```yaml webhooks: - name: github path: /webhook/github plugin: github-handler secret_ref: github_webhook_secret signature_header: X-Hub-Signature-256 ``` See [WEBHOOKS.md](https://ductile.run/WEBHOOKS/index.md) for full configuration details, include-mode caveats, and signing examples. ______________________________________________________________________ ## 4.4 tokens.yaml (High Security) ```yaml tokens: - name: admin-cli key: ${ADMIN_API_KEY} scopes_file: scopes/admin-cli.json scopes_hash: blake3:a3f8c2d9... ``` ______________________________________________________________________ ## 4.5 routes.yaml (Operational - Experimental) > [!IMPORTANT]\ > Global routing rules via `routes.yaml` are experimental. Most users should prefer the `pipelines` DSL for orchestration. ```yaml routes: - from: source-plugin event_type: event.name to: target-plugin ``` ______________________________________________________________________ ## 4.6 relay-instances.yaml (Operational - Experimental) `relay-instances.yaml` defines named outbound Remote Event Relay targets. ```yaml instances: - name: lab enabled: true base_url: https://lab.example ingress_path: /ingest/peer/home-primary secret_ref: relay-lab-v1 key_id: v1 timeout: 10s allow: - backup.ready - report.generated ``` Notes: - `name` is the stable operator-facing alias used by sender-side config. - `base_url` must be an absolute `http` or `https` URL. - `ingress_path` is the receiver path that accepts the trusted relay request. 
- `secret_ref` points at a `tokens.yaml` entry used as the shared HMAC secret. - `allow` is an optional sender-side event-type allowlist. ______________________________________________________________________ ## 4.7 relay-ingress.yaml (Operational - Experimental) `relay-ingress.yaml` defines inbound trusted peers and the local acceptance policy for Remote Event Relay. ```yaml remote_ingress: listen_path: /ingest/peer max_body_size: 1MB allowed_clock_skew: 5m require_key_id: true peers: - name: home-primary enabled: true secret_ref: relay-lab-v1 key_id: v1 accept: - backup.ready baggage: allow: - trace_id - requested_by ``` Notes: - `listen_path` is the trusted relay ingress root mounted on Ductile's HTTP server. - Relay ingress listens on `api.listen`; it does not introduce a separate listener address in Phase 1. - `allowed_clock_skew` controls timestamp validation for replay-window hardening. - `require_key_id` requires `X-Ductile-Key-Id` on inbound requests. - `peers[].accept` is an optional receiver-side event-type allowlist. - `peers[].baggage.allow` is a local policy for which remote baggage keys may seed new local root context. Accepted relay requests are treated as fresh local root ingress events: - the receiver performs normal local enqueue - the receiver performs normal local exact-match routing - no cross-instance `event_context` lineage is created See [REMOTE_EVENT_RELAY.md](https://ductile.run/REMOTE_EVENT_RELAY/index.md) for a user-level guide and an end-to-end example. ______________________________________________________________________ ## 5. Authentication Configuration Ductile authentication is configured within the `api` section of the configuration (typically in `config.yaml` or a dedicated `auth.yaml`). ### 5.1 Scoped Tokens For multi-user or production environments. 
```yaml api: auth: tokens: - token: admin_token scopes: ["*"] - token: readonly_token scopes: ["plugin:ro", "jobs:ro", "events:ro"] - token: operator_token scopes: ["plugin:rw", "jobs:rw", "events:ro"] ``` ### 5.2 Token Scopes Scopes are explicit: - `*`: Full admin access. - `plugin:ro`, `plugin:rw`: Plugin and pipeline trigger access. - `jobs:ro`, `jobs:rw`: Job read/write access. - `events:ro`, `events:rw`: Event stream access. ______________________________________________________________________ ## 6. Environment Interpolation Interpolation of `${VAR}` syntax happens **after** integrity verification but **before** YAML parsing. - Secrets must never be stored in YAML files; use environment variables. - Interpolation is **forbidden** in file paths (e.g., `include:` or directory walking) to ensure a static, verifiable tree. ### 6.1 Environment file includes You can preload env vars from `.env` files before interpolation: ```yaml environment_vars: include: - .env ``` Notes: - Paths are resolved relative to the file declaring the include. - Existing process environment variables are not overridden. # Ductile: API Reference This document provides a comprehensive reference for the Ductile REST API. ## Base URL Default: `http://localhost:8080` ## Authentication All API requests (except `/healthz`, `/plugins`, `/skills`, `/openapi.json`, `/.well-known/ai-plugin.json`, and `/plugin/{name}/openapi.json`) require a Bearer token in the `Authorization` header. ```http Authorization: Bearer ``` Ductile uses scoped tokens configured in `api.auth.tokens`, with explicit scopes (e.g., `plugin:rw`, `jobs:ro`, `events:ro`). ______________________________________________________________________ ## Endpoints ### 0. Root / Discovery Unauthenticated discovery index for humans and agents. 
**Endpoint**: `GET /` **Response (200 OK)**: ```json { "name": "Ductile Gateway", "description": "Lightweight, open-source integration engine for the agentic era.", "uptime_seconds": 3600, "discovery": { "health": "/healthz", "skills": "/skills", "plugins": "/plugins", "openapi": "/openapi.json", "ai_plugin": "/.well-known/ai-plugin.json" } } ``` ______________________________________________________________________ ### 1. Direct Plugin Execution Execute one plugin command directly. This **bypasses pipeline routing** and enqueues exactly one job. **Endpoint**: `POST /plugin/{plugin}/{command}` **Required scopes**: - `plugin:ro` for manifest `read` commands - `plugin:rw` (or `*`) for manifest `write` commands **Request Body**: ```json { "payload": { "key1": "value1", "key2": "value2" } } ``` **Fields**: - `payload` (Object, optional): JSON object passed to the command. - For `handle`, the server wraps payload into an `api.trigger` event envelope before enqueue. **Response (202 Accepted)**: ```json { "job_id": "uuid-v4", "status": "queued", "plugin": "plugin_name", "command": "command_name" } ``` **Example (curl)**: ```bash curl -X POST http://localhost:8080/plugin/echo/poll \ -H "Authorization: Bearer test_token" \ -H "Content-Type: application/json" \ -d '{"payload":{"message":"Hello API"}}' ``` ______________________________________________________________________ ### 2. Explicit Pipeline Execution Trigger a named pipeline directly. **Endpoint**: `POST /pipeline/{pipeline}` **Required scopes**: - `plugin:rw` (or `*`) **Request Body**: ```json { "payload": { "url": "https://example.com/article" } } ``` **Query Parameters**: - `async` (Boolean, optional): If `true`, force asynchronous response. **Behavior**: - Pipeline entry dispatches are resolved first. - `execution_mode: synchronous` waits for completion unless `?async=true`. - Synchronous mode with multiple entry dispatches returns `400`; use `?async=true` for fan-out entry pipelines. 
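For agents or scripts that call this endpoint without curl, the request can be assembled with the standard library. This is a minimal sketch that only builds the request and does not send it; the helper name `build_pipeline_request` is hypothetical, and the base URL, pipeline name, and token are the placeholder values used in this document's curl examples.

```python
import json
import urllib.parse
import urllib.request


def build_pipeline_request(base_url, pipeline, payload, token, force_async=False):
    """Build a POST /pipeline/{pipeline} request, optionally with ?async=true."""
    url = f"{base_url}/pipeline/{urllib.parse.quote(pipeline, safe='')}"
    if force_async:
        # Forces the asynchronous response even for synchronous pipelines.
        url += "?async=true"
    return urllib.request.Request(
        url,
        data=json.dumps({"payload": payload}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_pipeline_request(
    "http://localhost:8080",
    "url-to-fabric",
    {"url": "https://example.com"},
    "test_token",
    force_async=True,
)
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it, which requires a running gateway and a token with the `plugin:rw` scope.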
**Response (Async default - 202 Accepted)**: ```json { "job_id": "uuid-v4", "status": "queued", "plugin": "pipeline", "command": "pipeline_name" } ``` **Response (Synchronous success - 200 OK)**: ```json { "job_id": "uuid-v4", "status": "succeeded", "duration_ms": 1250, "result": { "status": "ok" }, "tree": [ { "job_id": "uuid-v4", "plugin": "plugin_name", "command": "command_name", "status": "succeeded", "result": { "status": "ok" } } ] } ``` **Response (Timeout - 202 Accepted)**: ```json { "job_id": "uuid-v4", "status": "running", "timeout_exceeded": true, "message": "Pipeline still running after timeout." } ``` **Example (curl)**: ```bash curl -X POST http://localhost:8080/pipeline/url-to-fabric \ -H "Authorization: Bearer test_token" \ -H "Content-Type: application/json" \ -d '{"payload":{"url":"https://example.com"}}' ``` ______________________________________________________________________ ### 2.5 Job Tree Retrieve the execution tree for a pipeline job, including all child jobs. **Endpoint**: `GET /job/{job_id}/tree` **Response (200 OK)**: ```json [ { "job_id": "uuid-v4", "parent_job_id": null, "plugin": "pipeline", "command": "pipeline_name", "status": "succeeded", "result": { "status": "ok" }, "started_at": "2026-02-13T10:00:01Z", "completed_at": "2026-02-13T10:00:05Z" }, { "job_id": "uuid-v4-child", "parent_job_id": "uuid-v4", "plugin": "plugin_name", "command": "command_name", "status": "succeeded", "result": { "status": "ok" }, "started_at": "2026-02-13T10:00:02Z", "completed_at": "2026-02-13T10:00:04Z" } ] ``` ______________________________________________________________________ ### 3. Job Status and Results Retrieve the current status and execution results of a job. 
**Endpoint**: `GET /job/{job_id}`

**Response (200 OK)**:

```json
{
  "job_id": "uuid-v4",
  "status": "succeeded",
  "plugin": "echo",
  "command": "poll",
  "submitted_by": "api",
  "created_at": "2026-02-13T10:00:00Z",
  "started_at": "2026-02-13T10:00:01Z",
  "completed_at": "2026-02-13T10:00:02Z",
  "result": {
    "status": "ok",
    "result": "Echoed: Hello API",
    "events": [],
    "state_updates": {},
    "logs": [
      {"level": "info", "message": "Echoed: Hello API"}
    ]
  }
}
```

**Job Statuses**:

- `queued`: Awaiting dispatch.
- `running`: Currently executing.
- `succeeded`: Finished successfully.
- `failed`: Finished with an error.
- `timed_out`: Exceeded execution deadline.
- `dead`: Failed and exhausted all retries.

______________________________________________________________________

### 5. Jobs List

List jobs with optional filtering. Requires `jobs:ro`, `jobs:rw`, or `*` scope.

**Endpoint**: `GET /jobs`

**Query Parameters**:

- `plugin` (String, optional): Exact plugin name filter.
- `command` (String, optional): Exact command name filter.
- `status` (String, optional): Job status filter. Accepted values:
  - Native: `queued`, `running`, `succeeded`, `failed`, `timed_out`, `dead`
  - Aliases: `pending` -> `queued`, `ok` -> `succeeded`, `error` -> `failed`
- `limit` (Integer, optional): Max rows returned. Default: `50`.

**Response (200 OK)**:

```json
{
  "jobs": [
    {
      "job_id": "uuid-v4",
      "plugin": "withings",
      "command": "poll",
      "status": "succeeded",
      "created_at": "2026-02-21T10:00:00Z",
      "started_at": "2026-02-21T10:00:01Z",
      "completed_at": "2026-02-21T10:00:02Z",
      "attempt": 1
    }
  ],
  "total": 42
}
```

Results are sorted by `created_at` descending (most recent first).

______________________________________________________________________

### 6. Job Logs

Query stored job log records for audit and troubleshooting. Requires `jobs:ro`, `jobs:rw`, or `*` scope.

**Endpoint**: `GET /job-logs`

**Query Parameters**:

- `job_id` (String, optional): Filter by job id.
- `plugin` (String, optional): Exact plugin name filter. - `command` (String, optional): Exact command name filter. - `status` (String, optional): Job status filter (same values as `/jobs`). - `submitted_by` (String, optional): Exact submitter filter. - `from` (RFC3339, optional): Completed-at lower bound. - `to` (RFC3339, optional): Completed-at upper bound. - `query` (String, optional): Full-text search over `last_error`, `stderr`, and `result`. - `limit` (Integer, optional): Max rows returned (default 50, max 200). - `include_result` (Boolean, optional): Include full `result` payloads. **Response (200 OK)**: ```json { "logs": [ { "job_id": "uuid-v4", "log_id": "uuid-v4-1", "plugin": "withings", "command": "poll", "status": "failed", "attempt": 1, "submitted_by": "api", "created_at": "2026-02-21T10:00:00Z", "completed_at": "2026-02-21T10:00:02Z", "last_error": "token expired", "stderr": "stack trace..." } ], "total": 42 } ``` Results are sorted by `completed_at` descending (most recent first). ______________________________________________________________________ ### 7. System Health Unauthenticated endpoint for health checks. Typically used by monitoring tools or load balancers. **Endpoint**: `GET /healthz` **Response (200 OK)**: ```json { "status": "ok", "uptime_seconds": 3600, "queue_depth": 0, "plugins_loaded": 5, "config_path": "/etc/ductile", "binary_path": "/usr/local/bin/ductile", "version": "1.0.0-rc.1" } ``` ______________________________________________________________________ ### 7. OpenAPI Discovery Unauthenticated endpoints for agent-driven capability discovery. 
Two-tier design: - **`/plugins`** — lightweight catalog for initial discovery (semantic signaling, minimal tokens) - **`/skills`** — unified skill index (atomic plugin skills + orchestrated pipeline skills) - **`/openapi.json`** — global OpenAPI 3.1 spec for all plugins - **`/plugin/{name}/openapi.json`** — scoped OpenAPI 3.1 spec for one chosen plugin - **`/.well-known/ai-plugin.json`** — OpenAI-style discovery manifest that points at `/openapi.json` #### Well-Known AI Plugin Manifest **Endpoint**: `GET /.well-known/ai-plugin.json` Returns service metadata for LLM agents and links to the global OpenAPI document. **Response (200 OK)**: ```json { "schema_version": "v1", "name_for_human": "Ductile Gateway", "name_for_model": "ductile", "description_for_human": "Integration gateway for triggering plugins and pipelines.", "description_for_model": "Discover and invoke plugins. Fetch /openapi.json for the full spec, or /plugin/{name}/openapi.json for a single plugin. Invoke commands via POST /plugin/{name}/{command}.", "auth": { "type": "bearer" }, "api": { "type": "openapi", "url": "/openapi.json" } } ``` #### Global OpenAPI **Endpoint**: `GET /openapi.json` Returns an OpenAPI 3.1 document for every discovered plugin command. #### Single Plugin (OpenAPI) **Endpoint**: `GET /plugin/{name}/openapi.json` Returns an OpenAPI 3.1 document scoped to one plugin. Use after selecting a plugin from the `/plugins` list. 
**Response (200 OK)**: ```json { "openapi": "3.1.0", "info": { "title": "Ductile Gateway", "version": "1.0" }, "paths": { "/plugin/echo/poll": { "post": { "operationId": "echo__poll", "summary": "Poll for data", "tags": ["echo"], "requestBody": { "required": false, "content": { "application/json": { "schema": { "type": "object", "properties": { "message": { "type": "string" } } } } } }, "responses": { "202": { "description": "Job queued" }, "400": { "description": "Bad request" }, "403": { "description": "Insufficient scope" } }, "security": [{ "BearerAuth": [] }] } } }, "components": { "securitySchemes": { "BearerAuth": { "type": "http", "scheme": "bearer" } } } } ``` **Graceful degradation:** - No `input_schema` in manifest → `requestBody` omitted - No `description` on command → summary defaults to `"{plugin}: {command}"` Returns `404` if the plugin is not found. ______________________________________________________________________ ### 8. Plugin Discovery List available plugins and retrieve their metadata/schemas. The list endpoints are unauthenticated to support lightweight agent discovery. #### List Plugins **Endpoint**: `GET /plugins` — **No auth required** **Response (200 OK)**: ```json { "plugins": [ { "name": "echo", "version": "0.1.0", "description": "A demonstration plugin", "commands": ["poll", "health"] } ] } ``` #### Get Plugin Details **Endpoint**: `GET /plugin/{name}` Requires `plugin:ro`, `plugin:rw`, or `*` scope. **Response (200 OK)**: ```json { "name": "echo", "version": "0.1.0", "description": "A demonstration plugin", "protocol": 2, "commands": [ { "name": "poll", "type": "write", "description": "Emits echo.poll events", "input_schema": { "type": "object", "properties": { "message": { "type": "string" } } } } ] } ``` ______________________________________________________________________ ### 9. Skills Index Unified, operator-facing capability index across both atomic plugin commands and named pipelines. 
#### List Skills **Endpoint**: `GET /skills` — **No auth required** **Response (200 OK)**: ```json { "skills": [ { "name": "plugin.echo.poll", "kind": "plugin", "description": "Emits echo.poll events", "endpoint": "/plugin/echo/poll", "tier": "WRITE", "plugin": "echo", "command": "poll" }, { "name": "pipeline.discord-fabric", "kind": "pipeline", "endpoint": "/pipeline/discord-fabric", "pipeline": "discord-fabric", "trigger": "discord.message", "execution_mode": "synchronous", "timeout_secs": 30 } ] } ``` Pipeline entries default to `execution_mode: "asynchronous"` when unset in config. ______________________________________________________________________ ### 10. System Reload Reload the configuration files without restarting the service. Requires `*` scope. **Endpoint**: `POST /system/reload` **Response (200 OK)**: ```json { "status": "ok", "reloaded_at": "2026-02-13T10:00:00Z", "message": "Configuration reloaded successfully." } ``` ______________________________________________________________________ ### 11. Analytics Retrieve operational metrics and summaries. Requires `*` scope. #### Queue Metrics **Endpoint**: `GET /analytics/queue` Returns current queue depth and worker utilization. #### Analytics Summary **Endpoint**: `GET /analytics/summary` Returns a summary of job statuses over the last 24 hours. **Response (200 OK)**: ```json { "window": "24h", "stats": { "succeeded": 450, "failed": 12, "dead": 2 } } ``` ______________________________________________________________________ ### 12. System Configuration Retrieve the reconciled system configuration with sensitive values redacted. Requires `*` scope. 
**Endpoint**: `GET /config/view` **Response (200 OK)**: ```json { "service": { "name": "ductile", "log_level": "info" }, "api": { "enabled": true, "listen": "127.0.0.1:8080", "tokens": [ { "scopes": ["*"] } ] }, "plugins": { "echo": { "enabled": true, "config": { "api_key": "[REDACTED]", "message": "Hello" } } }, "pipelines": [ { "name": "my-workflow", "on": "my.event" } ] } ``` Redaction rules: - Plugin config keys containing `secret`, `key`, `token`, `password`, etc., are replaced with `[REDACTED]`. - API token values are omitted (only scopes are shown). - High-security token keys are omitted. ______________________________________________________________________ ## Error Codes - `401 Unauthorized`: Missing or invalid Bearer token. - `403 Forbidden`: Token is valid but lacks the necessary scope for the requested action. - `404 Not Found`: The requested plugin, command, or job ID does not exist. - `400 Bad Request`: Invalid JSON body or missing required fields. - `500 Internal Server Error`: An unexpected error occurred on the server. # Ductile: Database Reference Ductile uses **SQLite 3** for all persistent state, job queuing, and execution history. This document provides the schema definitions and a collection of useful queries for operators. ______________________________________________________________________ ## Database Location The database is typically named `ductile.db` and resides in your configured `state.path` (default: `~/.config/ductile/ductile.db`). ______________________________________________________________________ ## Schema Overview ### Fact Rows vs Current Rows Ductile keeps current/cache rows for fast operational reads and append-only fact rows for durable explanation. 
Fact/history rows (durable record): - `plugin_facts` - `job_transitions` - `job_attempts` - `config_snapshots` - `event_context` - `job_log` - `circuit_breaker_transitions` Current/cache rows (derived views and operational state): - `job_queue` - `plugin_state` (compatibility view of the latest `plugin_facts` row per plugin) - `circuit_breakers` - `schedule_entries` `storage_sequences` is an internal allocator for Ductile-owned fact ordering. New `plugin_facts` rows receive a monotonic `seq`. Legacy `plugin_facts` rows may have `seq IS NULL`; those rows keep their old timestamp-only ordering and should not be treated as perfectly ordered facts. ### 1. `job_queue` The active work queue. Contains pending, running, and recently completed jobs. | Column | Type | Description | | ------------------ | ---- | ---------------------------------------------------------------- | | `id` | UUID | Unique job identifier. | | `status` | TEXT | `queued`, `running`, `succeeded`, `failed`, `timed_out`, `dead`. | | `plugin` | TEXT | Name of the plugin or alias. | | `command` | TEXT | The plugin command (e.g., `poll`, `handle`). | | `payload` | JSON | Input data for the plugin. | | `dedupe_key` | TEXT | Used to prevent duplicate enqueues. | | `event_context_id` | UUID | Reference to the baggage/context for this job. | ### 2. `job_log` The historical record of completed jobs. Used for auditing and the TUI "Overwatch." | Column | Type | Description | | ------------ | ---- | ----------------------------------------------- | | `result` | JSON | The full protocol response from the plugin. | | `stderr` | TEXT | Captured stderr (capped at 64 KB). | | `last_error` | TEXT | Human-readable error message if the job failed. | ### 3. `event_context` The "Control Plane" ledger. Stores metadata (Baggage) that propagates through pipelines. ### 4. `plugin_facts` Append-only durable record of plugin observations. 
Each row carries a stable snapshot a plugin emitted as `state_updates`, plus a manifest-declared `fact_type` and a Ductile-owned monotonic `seq`. **This is the durable plugin record.** New plugins should declare `fact_outputs` and emit a snapshot from their durable command. ### 5. `plugin_state` Compatibility/cache view of the latest fact, one row per plugin. Existing readers and legacy plugins still on direct write-through see the same shape they always have. The view is rebuilt automatically by core when a new fact lands, governed by the manifest's `compatibility_view` (currently `mirror_object`). For existing databases, apply required schema migrations before deploy. Startup validation reports the migration script to run when the database is behind the runtime schema. ### 6. `schedule_entries` The persistent state of the scheduler. Tracks when each schedule last fired and when it is due next. ### 7. `circuit_breakers` Current-state compatibility/cache row for scheduled poll circuit breakers. ### 8. `circuit_breaker_transitions` Append-only transition facts for circuit breakers. Use this table to explain why a breaker opened, moved half-open, closed, or was manually reset. 
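To make the fact-row versus current-row split concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. It is an illustration under stated assumptions, not Ductile's implementation: the table is a simplified stand-in that borrows only a few column names (`seq`, `plugin_name`, `fact_type`, `fact_json`) from the operator queries in this reference, the sample values are made up, and the helper name `latest_fact_per_plugin` is hypothetical. The real `plugin_state` view is maintained by Ductile core.

```python
import sqlite3


def latest_fact_per_plugin(conn: sqlite3.Connection) -> dict[str, str]:
    """Map each plugin to the fact_json of its highest-seq fact.

    Rows with seq IS NULL (legacy rows) are never selected here, because
    MAX() skips NULLs and `seq = NULL` is never true in SQL.
    """
    rows = conn.execute(
        """
        SELECT f.plugin_name, f.fact_json
        FROM plugin_facts AS f
        WHERE f.seq = (
            SELECT MAX(g.seq) FROM plugin_facts AS g
            WHERE g.plugin_name = f.plugin_name
        )
        ORDER BY f.plugin_name
        """
    ).fetchall()
    return dict(rows)


# Simplified stand-in table (assumption: NOT the full Ductile schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plugin_facts ("
    "seq INTEGER, plugin_name TEXT, fact_type TEXT, fact_json TEXT)"
)
conn.executemany(
    "INSERT INTO plugin_facts VALUES (?, ?, ?, ?)",
    [
        (None, "echo", "snapshot", '{"count": 0}'),  # legacy row, no seq
        (1, "echo", "snapshot", '{"count": 1}'),
        (2, "file_watch", "snapshot", '{"files": 3}'),
        (3, "echo", "snapshot", '{"count": 2}'),
    ],
)

current = latest_fact_per_plugin(conn)
print(current)
```

The append-only table keeps every observation, while the derived mapping answers "what is the state now?" in one lookup. A plugin that has only legacy rows with `seq IS NULL` would be absent from this mapping, which is one reason such rows should not be treated as perfectly ordered facts.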
______________________________________________________________________ ## Useful Operator Queries ### System Health ```sql -- Count jobs by status SELECT status, COUNT(*) FROM job_queue GROUP BY status; -- Identify plugins with active circuit breakers SELECT plugin, command, state, failure_count, opened_at FROM circuit_breakers WHERE state != 'closed'; -- Show recent breaker transitions SELECT created_at, plugin, command, from_state, to_state, reason, failure_count, job_id FROM circuit_breaker_transitions WHERE plugin = 'my-plugin' AND command = 'poll' ORDER BY created_at DESC LIMIT 20; -- Check for stuck "running" jobs (orphans) SELECT id, plugin, command, started_at FROM job_queue WHERE status = 'running' AND started_at < datetime('now', '-1 hour'); ``` ### Performance & Troubleshooting ```sql -- Find the slowest successful jobs in the last 24 hours SELECT plugin, command, (strftime('%s', completed_at) - strftime('%s', started_at)) as duration_sec FROM job_log WHERE status = 'succeeded' AND completed_at > datetime('now', '-1 day') ORDER BY duration_sec DESC LIMIT 10; -- Get the latest error for a specific plugin SELECT completed_at, last_error, stderr FROM job_log WHERE plugin = 'my-plugin' AND status = 'failed' ORDER BY completed_at DESC LIMIT 1; -- Inspect recent append-only plugin facts (the durable record) SELECT seq, created_at, fact_type, job_id, command, fact_json FROM plugin_facts WHERE plugin_name = 'file_watch' ORDER BY CASE WHEN seq IS NULL THEN 1 ELSE 0 END ASC, seq DESC, created_at DESC LIMIT 20; -- Inspect a plugin's compatibility view (latest fact, mirrored) SELECT state FROM plugin_state WHERE plugin_name = 'my-plugin'; ``` ### Scheduler Inspection ```sql -- See upcoming scheduled runs SELECT plugin, schedule_id, next_run_at, last_success_at FROM schedule_entries WHERE status = 'active' ORDER BY next_run_at ASC; ``` ______________________________________________________________________ ## Maintenance ### Manual Cleanup Ductile automatically prunes 
`job_log` after 30 days, but you can manually vacuum or prune if needed: ```bash # Prune logs older than 7 days sqlite3 ductile.db "DELETE FROM job_log WHERE completed_at < datetime('now', '-7 days');" # Reclaim disk space sqlite3 ductile.db "VACUUM;" ``` ### Performance Tuning Ductile enables **WAL mode** and **Synchronous=NORMAL** by default for optimal performance on SSDs. You can verify this via: ```sql PRAGMA journal_mode; PRAGMA synchronous; ``` # Testing Guide This document defines the target-state testing strategy for Ductile. The goal is to preserve developer velocity during normal branch work while adding stronger runtime/system confidence gates before merge and after changes land on `main`. ______________________________________________________________________ ## 1. Testing Model Ductile uses a **staged testing strategy**: 1. **Fast tests for branch development** 1. Used constantly during normal implementation and remediation work. 1. Optimized for feedback speed and iteration velocity. 1. **Docker-backed tests for complex/system-level assurance** 1. Used during complex development when realism matters. 1. Required before merge to `main`. 1. **Full validation on `main` after merge** 1. Ensures trunk health on the actual merged state. This guide is intentionally about **assurance workflow**, not product/runtime commands. Testing orchestration belongs to repository tooling rather than the `ductile` binary. ______________________________________________________________________ ## 2. Branch Development Policy For day-to-day branch development, the default loop is the fast existing test suite. ### Default fast inner loop ```bash go test ./... 
``` This is the baseline command for: - feature branches - remediation branches - refactors - fast local iteration ### Why the fast loop stays fast The default branch-development loop should optimize for: - quick feedback - frequent execution - low friction during implementation To preserve velocity, the fast inner loop should **not require Docker** and should **not be overloaded with every gate**. ______________________________________________________________________ ## 3. Docker-Backed Testing Policy Docker-backed tests are used for **complex or environment-sensitive behavior** where standard tests provide less confidence. ### Docker tests are for - webhook ingress - scheduler persistence and restart recovery - authenticated API end-to-end flows - plugin runtime/process behavior - append-only fact persistence and compatibility-state derivation - realistic service startup/configuration behavior ### Docker tests are not for - duplicating every Go test in containers - replacing fast local tests - being mandatory on every small branch iteration ### Development usage Docker-backed tests are **selective during development**: - use them for system-level work - use them when reproducing runtime-sensitive issues - use them before merge as part of the required gate ______________________________________________________________________ ## 4. Repository Test Commands Testing orchestration should live in repository tooling under `scripts/`, not in the `ductile` CLI. ### Canonical script surface - `scripts/test-fast` - `scripts/test-docker` - `scripts/test-premerge` - `scripts/test-main` ### Optional convenience wrappers A `Makefile` may provide wrappers such as: - `make test` - `make test-docker` - `make test-premerge` - `make test-main` If Make targets exist, they should wrap the canonical scripts rather than duplicating logic. ______________________________________________________________________ ## 5. 
Intended Meaning of Each Script ### `scripts/test-fast` Purpose: - normal branch-development assurance - fastest frequent local validation Expected scope: - standard fast repo tests Recommended initial behavior: ```bash go test ./... ``` ### `scripts/test-docker` Purpose: - Docker-backed runtime/system validation - selective during development - required in pre-merge and main validation flows Expected scope: - fixture-driven Docker scenarios - black-box/high-value runtime validation - teardown and artifact capture on failure ### `scripts/test-premerge` Purpose: - merge-grade assurance before landing to `main` Expected scope: - fast tests - lint/static checks - Docker-backed validation ### `scripts/test-main` Purpose: - full post-merge validation of trunk health Expected scope initially: - same coverage as pre-merge This may later expand to include broader smoke/regression suites. ______________________________________________________________________ ## 6. Lint Placement Policy To preserve branch-development velocity: - **lint is not required in the default fast inner loop** - **lint is required in pre-merge and main validation** This gives the desired balance: - velocity during development - stronger gates before merge and on trunk ______________________________________________________________________ ## 7. Pre-Merge Policy Before merging to `main`, the branch must pass: - the fast standard suite - lint/static checks - the Docker-backed validation suite Conceptually, pre-merge validation is: ```text scripts/test-fast + golangci-lint run ./... + scripts/test-docker ``` This is the required assurance level for merge readiness. ______________________________________________________________________ ## 8. Main Branch Policy `main` must receive a **full validation pass after merge**. 
### Initial definition of full validation on `main` - `scripts/test-fast` - `golangci-lint run ./...` - `scripts/test-docker` This means `main` validation is initially **at least as strong as pre-merge validation**. ### Why `main` validation is distinct Pre-merge validation protects the merge boundary. Post-merge validation protects the actual merged state of trunk. This matters because: - nearby merges may interact unexpectedly - rebases and branch timing can hide integration issues - trunk health affects all contributors ### Main failures Failures on `main` should be treated as **trunk-health issues** and triaged promptly. Docker-backed failures on `main` should retain and surface their artifacts. ### Initial definition of `scripts/test-main` Initially, `scripts/test-main` should mean: - `scripts/test-fast` - `golangci-lint run ./...` - `scripts/test-docker` This keeps `main` validation at least as strong as pre-merge validation while leaving room to grow later. ### Future expansion Heavier smoke/regression or stress-oriented suites may be added later as `main`-only or scheduled validation, but they are not required for the initial phase of this testing strategy. ______________________________________________________________________ ## 9. Docker Harness Design Direction The Docker-backed harness should follow these principles: - **fixture-driven** - **Docker Compose based** - **explicit readiness checks** (not just sleeps) - **black-box/high-value scenarios** - **automatic artifact capture on failure** ### First-wave Docker scenarios The first high-value Docker scenarios are: - `webhook-ingress` - `scheduler-recovery` - `api-e2e` These are intentionally narrow and complement the fast test suite rather than replacing it. 
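The "explicit readiness checks (not just sleeps)" principle listed above can be sketched as a bounded polling loop. This is a minimal Python sketch under stated assumptions: the function name `wait_until_ready`, its parameters, and the fake probe are all hypothetical, and the real harness may be written in another language entirely.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe() until it returns True; return the number of attempts.

    Raises TimeoutError if the deadline passes first. Exceptions raised by
    the probe (e.g. connection refused) count as "not ready yet".
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            if probe():
                return attempts
        except Exception:
            pass  # service is not accepting connections yet
        if clock() >= deadline:
            raise TimeoutError(f"not ready after {attempts} attempts")
        sleep(interval_s)


# Demo with a fake probe that becomes ready on the third call.
calls = {"n": 0}


def fake_probe() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


attempts_needed = wait_until_ready(fake_probe, timeout_s=5.0, interval_s=0.01)
print(attempts_needed)
```

In a real fixture the probe would typically issue an HTTP request against a health endpoint of the service under test and return whether it answered successfully. The key property is that the loop fails loudly with a timeout instead of silently proceeding after a fixed sleep.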
#### `webhook-ingress` Goal: - validate end-to-end inbound webhook handling in a real service/container environment Should verify: - service boots with webhook configuration - valid signed request is accepted - invalid signature is rejected - oversized request is rejected when configured - queued work exists after successful ingress #### `scheduler-recovery` Goal: - validate restart/crash recovery behavior using persisted runtime state Should verify: - service starts with scheduler enabled - a running/orphaned job exists before restart - service restart occurs - orphan recovery transitions work correctly after restart #### `api-e2e` Goal: - validate authenticated API and pipeline behavior with real config and runtime setup Should verify: - service boots with real config - authenticated API requests succeed - unauthorized requests fail appropriately - pipeline/API triggers create expected queued work - status/readback behavior works from the running service #### `hook-route-compilation` Goal: - validate hook runtime behavior against a real running service Should verify: - a root job with `notify_on_complete: true` fires a hook pipeline - hook-entry `call:` expands into the called pipeline entry - exactly one hook job is enqueued for the scenario - hook dispatch remains root-level rather than creating pipeline step context - the hook job payload is the expected lifecycle event envelope #### `sync-terminal-route` Goal: - validate synchronous API result selection against compiled terminal routes Should verify: - a synchronous pipeline returns `200 OK` - the compiled `if:` root appears as a `core.switch` job rather than a skipped user step - a skipped earlier step does not become the returned result - the returned result comes from the actual terminal routed step - the runtime still records the expected job completion story in the database #### `conditional-with-route` Goal: - validate compiled `if:` routing and `with:` remapping against a real running service Should 
verify: - a compiled `if:` step becomes a real `core.switch` hop at runtime - the false branch bypasses the gated plugin and still reaches the downstream step - the true branch runs the gated plugin and preserves the expected parent/child route shape - the gated plugin receives the `with:`-remapped payload values on the true branch - route depth and max-depth control-plane state persist in `event_context` #### `from-plugin-scoping` Goal: - validate the `from_plugin:` selector against a real running service Should verify: - a hook pipeline with `from_plugin:` matches only when the upstream plugin equals the selector - the same hook signal from a different upstream plugin does not fire the scoped pipeline - a co-resident hook pipeline without `from_plugin:` continues to fire for every matching lifecycle signal (regression for today's behaviour) - the compiled-route inspection (`GET /config/view`) surfaces `source_plugin` on the scoped route #### `context-aware-trigger-if` Goal: - validate pipeline-level `if:` evaluating against the upstream job's accumulated durable context Should verify: - a baggage value claimed by an upstream pipeline step is visible to a downstream pipeline's `if:` predicate as `context.*` - predicate true → dispatch fires - predicate false → dispatch is suppressed (no `core.switch`) - a route fired without upstream context (e.g. from webhook ingress) with a `context.*` predicate is suppressed (absent context evaluates to false, no error) ### Deferred wave-2 concerns These are valuable, but should not be in the first Docker wave: - reload/restart nuance beyond initial recovery scenarios - plugin runtime matrix testing - multi-hop expansion suites - load/stress validation - broad config matrix coverage Wave 1 should stay small, high-value, and stable. ______________________________________________________________________ ## 10. 
Docker Harness Architecture The Docker-backed test harness should be **fixture-driven** and orchestrated through repository tooling rather than product commands. ### Design principles - use Docker only where runtime/system realism adds meaningful confidence - keep scenarios black-box and high-value - avoid duplicating the entire Go test suite in containers - make local and CI execution use the same harness entry points ### Recommended structure Use named fixtures representing focused system scenarios. For example: ```text test/fixtures/docker/webhook-ingress/ test/fixtures/docker/scheduler-recovery/ test/fixtures/docker/api-e2e/ ``` Each fixture should define the inputs needed for that scenario, such as: - config files - fixture data - environment variables - compose overrides if required - scenario-specific assertion inputs ### Orchestration mechanism Use **Docker Compose** for harness orchestration. Recommended behavior: - shared Compose base where possible - fixture-specific overrides where needed - explicit startup and teardown lifecycle - one harness runner orchestrating one or more fixtures ### Readiness policy Readiness must be explicit. The harness should: - wait for service health or known-ready endpoints - use retries/timeouts - fail clearly if readiness is not achieved Avoid relying on arbitrary sleeps as the primary readiness mechanism. ### Fixture execution model The harness should follow a consistent lifecycle: 1. select fixture(s) 1. start services 1. wait for readiness 1. run black-box assertions 1. collect artifacts on failure 1. tear down by default ### Local targeting The design should support: - running all Docker fixtures - running a single named fixture for focused development work This keeps Docker usable during complex development without forcing full-suite runtime on every change. ## 11. 
Failure Artifact Policy for Docker Tests On Docker-backed test failure, the harness should automatically capture artifacts under a predictable path such as: ```text test-artifacts/docker/// ``` Recommended minimum artifact set: - container/service logs - fixture/config inputs - scenario/assertion log - failed HTTP responses where applicable - DB snapshot where applicable ### Recommended artifact structure ```text test-artifacts/docker/// run.log scenario.log compose.log service-*.log responses/ config/ db/ ``` ### Behavior - keep artifacts on local failure - upload artifacts on CI failure - tear down containers after artifact capture by default - optional debug/preserve mode may be added later ### Design intent Docker failures should be diagnosable without immediately rerunning in manual debug mode. Artifact capture should therefore be automatic rather than opt-in. ______________________________________________________________________ ## 12. CI Policy CI should mirror the local staged testing model rather than inventing separate workflows. ### Branch / PR CI Always run fast validation: - `go test ./...` - `golangci-lint run ./...` This keeps standard branch feedback fast and useful. ### Pre-merge gate Before merge, require merge-grade validation: - `scripts/test-premerge` Conceptually this means: - fast tests - lint/static checks - Docker-backed validation ### `main` CI Run full validation on merged trunk: - `scripts/test-main` Initially, `scripts/test-main` should provide at least the same coverage as pre-merge validation. ### CI job visibility CI should expose at least separate visible checks for: - fast validation - Docker validation This makes failures easier to diagnose and rerun. 
### CI design principles - CI should mirror the local staged-testing model - CI should invoke the same canonical scripts developers use locally - Docker validation should be a required pre-merge gate - `main` should always receive a full post-merge validation pass ______________________________________________________________________ ## 13. Scope Boundaries ### Fast tests should own - pure logic - deterministic unit behavior - parser/validator correctness - config integrity and plugin fingerprint policy, using temporary config/plugin fixtures - router/state/queue integration using real SQLite where helpful - low-friction day-to-day confidence ### Docker tests should own - runtime system behavior - service boot with real config - restart and recovery flows - network ingress behavior - realistic end-to-end operator-facing scenarios ### Config integrity / plugin fingerprint tests Plugin fingerprinting belongs in the fast suite by default. Use temporary config directories and temporary plugin directories in Go tests rather than committed fixture trees unless the scenario needs multi-process runtime realism. Fast tests should cover: - `ductile config lock` always writes fingerprints for configured plugins - lock records both configured absolute paths and symlink-resolved paths - manifest and entrypoint bytes are hashed - configured enabled plugin mismatches fail verification - configured disabled plugin mismatches warn only - configured but undiscovered enabled plugins fail verification - stale lock entries for plugins removed from config warn only - missing `plugin_fingerprints` fails when plugins are configured Move a scenario into Docker only when the behavior depends on real service startup/reload, container filesystem mounts, or operator-facing black-box assertions that cannot be represented by temporary local fixtures. ______________________________________________________________________ ## 14. 
Implementation Order Once the design is accepted, implementation should proceed in sensible phases: 1. document the testing policy 1. scaffold canonical repo scripts 1. build the Docker harness base 1. implement the first Docker fixtures 1. wire CI stages to the canonical scripts 1. polish artifact capture and `main` validation behavior ______________________________________________________________________ ## 15. Script and Target Design The canonical testing interface should live in `scripts/`, with optional convenience wrappers in a `Makefile`. ### Canonical scripts - `scripts/test-fast` - `scripts/test-docker` - `scripts/test-premerge` - `scripts/test-main` These scripts are the source of truth for local and CI usage. ### Intended behavior #### `scripts/test-fast` Purpose: - fast branch-development assurance - frequent local execution during implementation Recommended initial behavior: ```bash go test ./... ``` Lint is intentionally excluded from the default fast loop to preserve iteration speed. #### `scripts/test-docker` Purpose: - Docker-backed system/runtime validation - selective during development - required in pre-merge and main validation flows Expected behavior: - boot the Docker-backed harness - run the selected fixture scenarios - perform readiness checks - capture artifacts on failure - tear down by default #### `scripts/test-premerge` Purpose: - required merge-grade validation before landing on `main` Expected behavior: - run `scripts/test-fast` - run `golangci-lint run ./...` - run `scripts/test-docker` #### `scripts/test-main` Purpose: - full post-merge validation on `main` Expected initial behavior: - same coverage as `scripts/test-premerge` This may expand later to include broader smoke/regression suites. 
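The expected behavior of `scripts/test-premerge` (run the fast suite, then lint, then the Docker suite) can be sketched as a small ordered gate runner. This is an illustrative Python sketch, not the repository's actual script, which may well be a shell script. The names `PREMERGE_STEPS`, `run_gate`, and `fake_runner` are hypothetical, and stopping at the first failure is an assumption rather than something this guide specifies.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Step commands taken from the pre-merge description in this guide.
PREMERGE_STEPS: list[tuple[str, list[str]]] = [
    ("fast", ["scripts/test-fast"]),
    ("lint", ["golangci-lint", "run", "./..."]),
    ("docker", ["scripts/test-docker"]),
]


def run_gate(
    steps: Sequence[tuple[str, list[str]]],
    runner: Callable[[list[str]], int] = subprocess.call,
) -> Optional[str]:
    """Run steps in order; return the name of the first failing step.

    Returns None when every step exits with status 0. Later steps are
    skipped after a failure so the cheap checks fail fast.
    """
    for name, argv in steps:
        if runner(argv) != 0:
            return name
    return None


# Dry demo with a fake runner (assumption: lint fails, nothing is executed).
def fake_runner(argv: list[str]) -> int:
    return 1 if argv[0] == "golangci-lint" else 0


failed = run_gate(PREMERGE_STEPS, runner=fake_runner)
print(failed)
```

Because the gate only lists lower-level entry points and delegates to them, a `main` gate can reuse the same step list and extend it later without duplicating any test logic.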
### Composition rules - `scripts/test-premerge` should compose lower-level scripts instead of duplicating logic - `scripts/test-main` should compose lower-level scripts instead of duplicating logic - CI should invoke the same canonical scripts developers use locally ### Optional Make wrappers Optional Make targets may wrap the scripts for ergonomics: - `make test` - `make test-docker` - `make test-premerge` - `make test-main` If present, these should delegate to `scripts/` rather than reimplementing behavior. ## 16. Summary The target state is: - **velocity during branch development** - **strong gates before merge** - **explicit trunk protection after merge** - **clear separation between product commands and assurance tooling** In short: - fast tests are for iteration - Docker tests are for system confidence - pre-merge is a gate - `main` is protected by full validation # For Agents Ductile is designed to be **operated**, not just used. If you are an AI agent — or a human telling one to operate this — this is the entry point. ## The loop 1. Load the operator skill (see [Skills](#skills), below) into your client. 1. State a goal in natural language: *"Every morning at 7am, fetch the headlines from these RSS feeds, summarize them with Fabric, and post the summary to Discord."* 1. The agent uses the `/skills` registry and the auto-generated OpenAPI surface to discover what's available; authors a pipeline; runs it; reads the logs; iterates until the goal is met. The human role is to state goals and audit results. The agent does the wiring. If you cannot load the skill manifests directly (different client, restricted environment, just exploring), the [**Operator Handbook**](https://ductile.run/for-agents/operator-handbook/index.md) on this site mirrors the substance of `skills/ductile/` as a single agent-fetchable page.
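As an illustration of the discovery step in the loop, the sketch below parses a payload shaped like the `GET /skills` response documented in the API reference and groups the entries by `kind`. It is a minimal Python sketch under stated assumptions: the sample is trimmed from that documented response, no network call is made, and the helper names `index_skills` and `is_synchronous` are hypothetical.

```python
import json

# Sample payload trimmed from the documented `GET /skills` response.
SAMPLE = """
{
  "skills": [
    {"name": "plugin.echo.poll", "kind": "plugin",
     "endpoint": "/plugin/echo/poll", "tier": "WRITE",
     "plugin": "echo", "command": "poll"},
    {"name": "pipeline.discord-fabric", "kind": "pipeline",
     "endpoint": "/pipeline/discord-fabric",
     "pipeline": "discord-fabric", "trigger": "discord.message",
     "execution_mode": "synchronous", "timeout_secs": 30}
  ]
}
"""


def index_skills(payload: str) -> dict[str, list[dict]]:
    """Group skill entries by their `kind` field, preserving order."""
    grouped: dict[str, list[dict]] = {}
    for skill in json.loads(payload)["skills"]:
        grouped.setdefault(skill["kind"], []).append(skill)
    return grouped


def is_synchronous(skill: dict) -> bool:
    """Pipeline entries default to asynchronous when execution_mode is unset."""
    return skill.get("execution_mode", "asynchronous") == "synchronous"


skills = index_skills(SAMPLE)
endpoints = {kind: [s["endpoint"] for s in items] for kind, items in skills.items()}
print(endpoints)
```

An agent could use an index like this to decide which endpoint to call and whether to expect a result in the response (synchronous) or to follow up through the job ledger (asynchronous).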
## Why this works The five lifecycle pillars — **Run, Debug, RCA, Test, Author** — are each backed by structured affordances that an LLM can drive without reading source code: | Pillar | Primary surfaces | | ------ | -------------------------------------------------------------------- | | Run | CLI verbs, `/skills` registry, OpenAPI | | Debug | `/system/doctor`, `/system/selfcheck`, structured logs | | RCA | execution ledger (SQLite), `/stopwatch/{plugin}` latency aggregation | | Test | manifest contract, fixture conventions | | Author | plugin protocol (stdin/stdout JSON), manifest schema | See the [Constitution](https://github.com/mattjoyce/ductile/blob/main/CONSTITUTION.md) for the alignment paragraph and how each pillar is expected to evolve. ## Skills Ductile ships skill manifests that give agents structured ways to operate it. Drop these into your agent's skills directory (`cp -r skills// ~/.claude/skills//`): ### Pillar skills - [`skills/ductile/`](https://github.com/mattjoyce/ductile/tree/main/skills/ductile) — **Pillar 1: Run.** Operate, configure, deploy. - [`skills/ductile-rca/`](https://github.com/mattjoyce/ductile/tree/main/skills/ductile-rca) — **Pillar 3: RCA.** Root cause analysis from the execution ledger. - [`skills/ductile-plugin-developer/`](https://github.com/mattjoyce/ductile/tree/main/skills/ductile-plugin-developer) — **Pillar 5: Author.** Build plugins to the manifest contract. Planned: `ductile-doctor` (Pillar 2: Debug), `ductile-plugin-tester` (Pillar 4: Test). ### Discipline skills Skills that don't run Ductile — they keep it honest. - [`skills/surface-contract/`](https://github.com/mattjoyce/ductile/tree/main/skills/surface-contract) — **Doc/code seam audit.** Ousterhout × Liskov surface/contract discipline applied to the boundary between what the docs claim and what the code does. Run this when docs drift from reality, after a refactor that changes a public-facing API, or before a release. 
**The code is the reality; the docs are the contract** — this skill is how you keep them aligned. ## Live agentic affordances - **[`/llms.txt`](https://ductile.run/llms.txt)** — curated agent-friendly index of this site, per the [llmstxt.org](https://llmstxt.org) convention. Fetch this first; it lists the pages worth reading and links to their raw markdown. - **[`/llms-full.txt`](https://ductile.run/llms-full.txt)** — one-shot concatenation of every page named in `/llms.txt`. ~300 KB, designed for agents that want the full corpus in a single fetch. ## Planned agentic affordances These don't exist yet but are on the roadmap: - **`/api.json`** — the OpenAPI surface published as a static endpoint. - **`/schema/`** — JSON schemas for config, plugin manifest, event shape. - **`/examples/`** — known-good pipelines, runnable and human-readable. - **MCP server endpoint** at `ductile.run/mcp` — add ductile's docs and schemas to Claude Desktop / Cursor as an MCP context source. - **Operator-eval scoreboard** at `/eval` — hermetic benchmark of how well Claude / Gemini / Codex / local models operate ductile against a fixed problem set. If you are reading this as an agent and any of these are now live, prefer them over the GitHub URLs above. # Operator Handbook This is the portable, agent-facing version of the `ductile` operator skill (`skills/ductile/`). If you are an AI agent that has been told to operate a Ductile deployment but cannot load the skill manifest directly, this page gives you the same substance. The agent reads it. The human points at it. ______________________________________________________________________ ## Operating frame: the gateway is the supervisor Ductile is built on Armstrong's supervisor model. The gateway: - **Isolates** plugins via spawn-per-command — one plugin cannot corrupt another. - **Detects** errors from the outside via exit code, stdout JSON, and stderr. - **Restarts** without intervention via the queue (at-least-once delivery). 
- **Hot-upgrades** config via `system reload` without dropping in-flight work. As operator, you do not fight the supervisor; you **use** it. ### Reload over debug-in-place When a runtime looks wedged, the default move is **reload**, not poke. The gateway is designed to be restartable; debugging a stuck process while it holds the PID lock is harder, less informative, and risks corrupting the SQLite WAL. ```bash ductile system reload # SIGHUP, in-process hot swap # if that does not resolve: ductile system status # confirm the new generation is alive # if still wedged, restart the service supervisor (launchd / systemd / # docker compose / whatever runs ductile on this host) ``` (For *why* this is the right discipline, see [`reload_rca.md`](https://ductile.run/reload_rca/index.md) — the reload deadlock RCA is the canonical example of why hot-swap must be deterministic.) ______________________________________________________________________ ## Runtime context — your deployment Ductile is not opinionated about how you deploy it. Wherever you run it, you'll have: | What | Default | Typically | | ------------------ | ------------------------- | ---------------------------------------------------------- | | Binary | (built by you) | on `$PATH` or at a project-local path | | Config dir | `~/.config/ductile/` | overridable with `--config ` or `$DUCTILE_CONFIG_DIR` | | State DB | `/ductile.db` | SQLite, WAL mode | | API port | `127.0.0.1:8081` | from `service.api_port` in `config.yaml` | | Service supervisor | none enforced | launchd, systemd, docker compose, supervisord | | Auth token | none by default | from `tokens.yaml`, surfaced via env var of your choice | **Action for the operator setting this up.** Build your own runtime-context table for the gateways you operate — instance name, binary path, config dir, DB path, port, service supervisor, auth token env var. Keep it next to your deployment docs, not in this handbook. 
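One way to keep the runtime-context table described above is as a small structured record that renders to Markdown. This is only a sketch: the `GatewayContext` class, its field names, and every example value are hypothetical placeholders, not something Ductile ships or reads.

```python
from dataclasses import dataclass, fields


@dataclass
class GatewayContext:
    """One row of an operator's runtime-context table (hypothetical shape)."""

    instance: str
    binary_path: str
    config_dir: str
    db_path: str
    api_listen: str
    supervisor: str
    token_env_var: str


def to_markdown(rows: list[GatewayContext]) -> str:
    """Render the rows as a pipe-delimited Markdown table."""
    names = [f.name for f in fields(GatewayContext)]
    lines = [
        "| " + " | ".join(names) + " |",
        "| " + " | ".join("---" for _ in names) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(getattr(row, n) for n in names) + " |")
    return "\n".join(lines)


# Example values are placeholders, not defaults shipped by Ductile.
home = GatewayContext(
    instance="home-lab",
    binary_path="/usr/local/bin/ductile",
    config_dir="~/.config/ductile/",
    db_path="~/.config/ductile/ductile.db",
    api_listen="127.0.0.1:8081",
    supervisor="systemd",
    token_env_var="DUCTILE_TOKEN",
)
table = to_markdown([home])
print(table)
```

Keeping the table in a structured form makes it easy to hand the same facts to an agent as context and to a human as a readable document.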
______________________________________________________________________

## CLI command reference

Pattern: `ductile [flags]`.

### System

```bash
ductile system start                 # Start gateway (foreground)
ductile system status [--json]       # Health: PID, state DB, plugins
ductile system reload                # Hot-swap config in a running gateway (SIGHUP)
ductile system watch                 # Real-time TUI monitor
ductile system reset                 # Reset circuit breaker
ductile system skills [--config ]    # Export LLM skill manifest (Markdown)
ductile system selfcheck [--json]    # Read-only integrity invariants
ductile system backup --to           # Atomic snapshot (VACUUM INTO)
ductile system doctor                # Startup and runtime health checks
```

### Config

```bash
ductile config check [--json] [--strict]   # Validate syntax, policy, integrity
ductile config lock                        # Authorize state (update .checksums)
ductile config show [entity]               # Show resolved config or entity
ductile config get                         # Dot-notation read
ductile config set =                       # Modify (use --dry-run to preview)
ductile config init                        # Initialize config directory
ductile config backup / restore            # Archive / restore configuration
ductile config token / scope               # Manage API tokens and scopes
ductile config plugin / route / webhook    # Manage routing artefacts
```

### Job

```bash
ductile job inspect [--json]   # Lineage, baggage, artifacts
ductile job logs [--json]      # Query stored job logs
# Filters: --plugin --command --status --submitted-by
#          --from --to (RFC3339) --query --limit --include-result
```

### Plugin

```bash
ductile plugin list [--api-url URL] [--json]   # Discover loaded plugins
ductile plugin run                             # Manual execution
```

### API (direct gateway calls)

```bash
ductile api /jobs
ductile api /plugin/echo/poll -f message="hello"
ductile api /pipeline/youtube-wisdom -f url="…"
ductile api /system/reload -X POST
ductile api /healthz
# Flags: -X METHOD, -f key=value, -H Header:val, -b BODY, --api-url, --api-key
```

### Top-level

```bash
ductile skills    # Export capability registry as LLM Markdown
ductile version   # Version + commit + build time
```

______________________________________________________________________

## Universal flags

| Flag            | Purpose                                         |
| --------------- | ----------------------------------------------- |
| `--json`        | Machine-readable output (all read commands)     |
| `-v, --verbose` | Internal logic, path resolution, baggage merges |
| `--dry-run`     | Preview mutations without committing            |
| `--config `     | Override config directory                       |

______________________________________________________________________

## The `config lock` ritual

Every config or plugin manifest edit goes through:

```bash
ductile config check    # validate
ductile config lock     # authorize new state (updates .checksums)
ductile system reload   # apply without restart
```

**This is the cross-skill ritual.** Plugin authoring hands off here. Incident response often discovers a forgotten-to-lock root cause. Owning this ritual is owning the seam between authoring and operating.

### Config integrity (tiered)

| Tier          | Files                                               | On mismatch                  |
| ------------- | --------------------------------------------------- | ---------------------------- |
| High Security | `tokens.yaml`, `webhooks.yaml`, `scopes/*.json`     | Hard fail (refuses to start) |
| Operational   | `config.yaml`, `plugins/*.yaml`, `pipelines/*.yaml` | Warn & continue              |

______________________________________________________________________

## Entity addressing

Use `:` syntax with `config show/get/set`:

```bash
ductile config show plugin:withings
ductile config show pipeline:video-wisdom
ductile config set plugin:withings.enabled=false
ductile config show plugin:*   # list all plugins
```

______________________________________________________________________

## Selfcheck — six read-only invariants

1. `config_discovery` — config dir resolves
1. `config_load` — config parses
1. `pid_lock` — PID file matches a running process
1. `db_integrity` — `PRAGMA integrity_check`
1. `db_schema` — required tables/columns/indexes match embedded baseline
1. `queue_terminal_freshness` — no stale terminal-state `job_queue` rows past retention

**WAL safety**: when the gateway holds the PID lock, checks 4-6 are *skipped* with `detail: "skipped: active gateway holds PID lock — quiesce before selfcheck"`. The skip is correct behaviour, not a bug.

Real-green pattern: run selfcheck **offline** against the new binary BEFORE installing. Once installed and running, expect "skipped" on 4-6 — the proof of correctness is that the gateway started at all, because the schema validator runs at startup and refuses to open the DB on mismatch.

______________________________________________________________________

## Backup — atomic point-in-time snapshot

```bash
ductile system backup --to [--scope SCOPE] [--config PATH]
```

Scopes (nested ladder; each adds to the previous):

- `db` — DB snapshot only (SQLite `VACUUM INTO`, safe under concurrent writers)
- `config` (default) — `db` + ductile config dir
- `plugins` — `config` + every dir under `plugin_roots`
- `all` — `plugins` + every file under `environment_vars.include`

Each archive embeds `BACKUP_MANIFEST.txt` with version, commit, hostname, source paths, SHA256 of source DB, included/excluded items + reasons. Refuses to overwrite an existing `--to` destination.

Inspect a manifest without re-extracting:

```bash
tar -xzOf .tar.gz BACKUP_MANIFEST.txt
```

______________________________________________________________________

## Migrations & schema

`internal/storage/schema.sql` is embedded in the binary; the schema validator runs at startup and refuses to open a DB missing any required table, column, or index.

Schema changes ship as Python scripts at `scripts/migrate-*.py`, idempotent by design, run with the service quiesced. Always backup before migration:

```bash
sqlite3 "PRAGMA wal_checkpoint(TRUNCATE);" && cp
```

______________________________________________________________________

## LLM capability discovery (`system skills`)

Ductile is designed for LLM operation. Get the current live manifest:

```bash
ductile system skills --config
# or set DUCTILE_CONFIG_DIR and run:
ductile system skills
```

Outputs Markdown listing all plugin commands with endpoints, schemas, and semantic anchors (`mutates_state`, `idempotent`, `retry_safe`) plus all configured pipelines. See [`DUCTILE_SKILLS_SCHEMA_V1.md`](https://ductile.run/DUCTILE_SKILLS_SCHEMA_V1/index.md) for the contract that output obeys.

______________________________________________________________________

## Common workflows

### Trigger a pipeline via API

```bash
curl -X POST http://:/pipeline/ \
  -H "Authorization: Bearer $DUCTILE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"payload": {"key": "value"}}'
```

### Trigger a plugin directly (bypasses routing)

```bash
curl -X POST http://:/plugin//poll \
  -H "Authorization: Bearer $DUCTILE_TOKEN" \
  -d '{"payload": {}}'
```

### Inspect a failed job (routine — no incident analysis needed)

```bash
ductile job inspect -v --json
```

### Check gateway health

```bash
curl http://:/healthz
```

______________________________________________________________________

## Architecture summary (operator view)

- **Governance hybrid**: control plane is SQLite `event_context` baggage; filesystem state is plugin-managed. The core does not provision per-job workspaces.
- **Spawn-per-command**: each plugin invocation is a fresh process (polyglot: bash, python, go, any executable).
- **At-least-once**: jobs survive crashes and are recovered on restart.
- **Immutable audit**: `origin_*` baggage keys can never be overwritten by plugins.

______________________________________________________________________

## Job statuses

`queued` → `running` → `succeeded` / `failed` / `timed_out` / `dead`

If you see `dead` or persistent `failed`, treat as an incident: hand off to root-cause analysis (the `ductile-rca` skill) rather than continuing routine operation.
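The hand-off rule above can be sketched as a small triage function. This is an illustration under stated assumptions, not part of Ductile: the function name `triage`, the three return labels, and the threshold of three consecutive `failed` jobs standing in for "persistent" are all hypothetical. The input is assumed to be a list of status strings already extracted from job records, oldest first.

```python
# Statuses named in this handbook.
KNOWN_STATUSES = {"queued", "running", "succeeded", "failed", "timed_out", "dead"}


def triage(recent_statuses: list[str], persistent_after: int = 3) -> str:
    """Classify recent job statuses (oldest first) for one plugin or pipeline.

    Returns "incident" (hand off to RCA), "inspect" (routine `job inspect`),
    or "ok". `persistent_after` is an assumed threshold, not a Ductile setting.
    """
    unknown = set(recent_statuses) - KNOWN_STATUSES
    if unknown:
        raise ValueError(f"unknown job status: {sorted(unknown)}")

    # Any dead job in the window is treated as an incident.
    if "dead" in recent_statuses:
        return "incident"

    # Count the run of consecutive failures at the most recent end.
    streak = 0
    for status in reversed(recent_statuses):
        if status != "failed":
            break
        streak += 1
    if streak >= persistent_after:
        return "incident"

    # A recent isolated failure or timeout warrants a routine inspection.
    if streak > 0 or (recent_statuses and recent_statuses[-1] == "timed_out"):
        return "inspect"
    return "ok"
```

For example, `triage(["succeeded", "failed"])` yields `"inspect"`, while three trailing failures or any `dead` entry yields `"incident"`. Whether `timed_out` deserves stricter handling is a deployment choice the handbook does not specify.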
______________________________________________________________________

## When to load other skills

| Companion skill            | When to load it                                                                                                                              |
| -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `ductile-plugin-developer` | The work requires touching a plugin's code, manifest, or pipeline composition — not just its config.                                         |
| `ductile-rca`              | Symptoms are not understood. *Stuck, hanging, tripped, missing, wrong.* Routine `job inspect` for a known-good system does **not** need RCA. |
| `surface-contract`         | Docs and code have drifted; you need to audit and re-align them.                                                                             |

Full ductile incident lifecycle (`ductile-rca` + this handbook + `ductile-plugin-developer`) is real and worth keeping in mind.

______________________________________________________________________

## Reference docs

In the same docs site:

- [Architecture](https://ductile.run/ARCHITECTURE/index.md) — the technical deep dive
- [Deployment](https://ductile.run/DEPLOYMENT/index.md) — host-local deployment + backup patterns
- [Operator Guide](https://ductile.run/OPERATOR_GUIDE/index.md) — day-to-day commands with examples
- [Health Check](https://ductile.run/HEALTH_CHECK/index.md) — invariants checked by `selfcheck`
- [SQL Tightening Log](https://ductile.run/SQL_TIGHTENING_LOG/index.md) — schema-change audit trail
- [Reload RCA](https://ductile.run/reload_rca/index.md) — canonical worked example of the reload deadlock RCA pattern

In the repo:

- [`AGENTS.md`](https://github.com/mattjoyce/ductile/blob/main/AGENTS.md) — the contributor contract; the design grounding behind these commands
- [`CONSTITUTION.md`](https://github.com/mattjoyce/ductile/blob/main/CONSTITUTION.md) — the five pillars; this handbook is Pillar 1 (Run)