Testing Guide¶
This document defines the target-state testing strategy for Ductile. The goal is to preserve developer velocity during normal branch work while adding stronger runtime/system confidence gates before merge and after changes land on main.
1. Testing Model¶
Ductile uses a staged testing strategy:
- Fast tests for branch development
- Used constantly during normal implementation and remediation work.
- Optimized for feedback speed and iteration velocity.
- Docker-backed tests for complex/system-level assurance
- Used during complex development when realism matters.
- Required before merge to
main. - Full validation on
mainafter merge - Ensures trunk health on the actual merged state.
This guide is intentionally about assurance workflow, not product/runtime commands. Testing orchestration belongs to repository tooling rather than the ductile binary.
2. Branch Development Policy¶
For day-to-day branch development, the default loop is the fast existing test suite.
Default fast inner loop¶
This is the baseline command for: - feature branches - remediation branches - refactors - fast local iteration
Why the fast loop stays fast¶
The default branch-development loop should optimize for: - quick feedback - frequent execution - low friction during implementation
To preserve velocity, the fast inner loop should not require Docker and should not be overloaded with every gate.
3. Docker-Backed Testing Policy¶
Docker-backed tests are used for complex or environment-sensitive behavior where standard tests provide less confidence.
Docker tests are for¶
- webhook ingress
- scheduler persistence and restart recovery
- authenticated API end-to-end flows
- plugin runtime/process behavior
- append-only fact persistence and compatibility-state derivation
- realistic service startup/configuration behavior
Docker tests are not for¶
- duplicating every Go test in containers
- replacing fast local tests
- being mandatory on every small branch iteration
Development usage¶
Docker-backed tests are selective during development: - use them for system-level work - use them when reproducing runtime-sensitive issues - use them before merge as part of the required gate
4. Repository Test Commands¶
Testing orchestration should live in repository tooling under scripts/, not in the ductile CLI.
Canonical script surface¶
scripts/test-fastscripts/test-dockerscripts/test-premergescripts/test-main
Optional convenience wrappers¶
A Makefile may provide wrappers such as:
- make test
- make test-docker
- make test-premerge
- make test-main
If Make targets exist, they should wrap the canonical scripts rather than duplicating logic.
5. Intended Meaning of Each Script¶
scripts/test-fast¶
Purpose: - normal branch-development assurance - fastest frequent local validation
Expected scope: - standard fast repo tests
Recommended initial behavior:
scripts/test-docker¶
Purpose: - Docker-backed runtime/system validation - selective during development - required in pre-merge and main validation flows
Expected scope: - fixture-driven Docker scenarios - black-box/high-value runtime validation - teardown and artifact capture on failure
scripts/test-premerge¶
Purpose:
- merge-grade assurance before landing to main
Expected scope: - fast tests - lint/static checks - Docker-backed validation
scripts/test-main¶
Purpose: - full post-merge validation of trunk health
Expected scope initially: - same coverage as pre-merge
This may later expand to include broader smoke/regression suites.
6. Lint Placement Policy¶
To preserve branch-development velocity: - lint is not required in the default fast inner loop - lint is required in pre-merge and main validation
This gives the desired balance: - velocity during development - stronger gates before merge and on trunk
7. Pre-Merge Policy¶
Before merging to main, the branch must pass:
- the fast standard suite
- lint/static checks
- the Docker-backed validation suite
Conceptually, pre-merge validation is:
This is the required assurance level for merge readiness.
8. Main Branch Policy¶
main must receive a full validation pass after merge.
Initial definition of full validation on main¶
scripts/test-fastgolangci-lint run ./...scripts/test-docker
This means main validation is initially at least as strong as pre-merge validation.
Why main validation is distinct¶
Pre-merge validation protects the merge boundary. Post-merge validation protects the actual merged state of trunk.
This matters because: - nearby merges may interact unexpectedly - rebases and branch timing can hide integration issues - trunk health affects all contributors
Main failures¶
Failures on main should be treated as trunk-health issues and triaged promptly.
Docker-backed failures on main should retain and surface their artifacts.
Initial definition of scripts/test-main¶
Initially, scripts/test-main should mean:
- scripts/test-fast
- golangci-lint run ./...
- scripts/test-docker
This keeps main validation at least as strong as pre-merge validation while leaving room to grow later.
Future expansion¶
Heavier smoke/regression or stress-oriented suites may be added later as main-only or scheduled validation, but they are not required for the initial phase of this testing strategy.
9. Docker Harness Design Direction¶
The Docker-backed harness should follow these principles: - fixture-driven - Docker Compose based - explicit readiness checks (not just sleeps) - black-box/high-value scenarios - automatic artifact capture on failure
First-wave Docker scenarios¶
The first high-value Docker scenarios are:
- webhook-ingress
- scheduler-recovery
- api-e2e
These are intentionally narrow and complement the fast test suite rather than replacing it.
webhook-ingress¶
Goal: - validate end-to-end inbound webhook handling in a real service/container environment
Should verify: - service boots with webhook configuration - valid signed request is accepted - invalid signature is rejected - oversized request is rejected when configured - queued work exists after successful ingress
scheduler-recovery¶
Goal: - validate restart/crash recovery behavior using persisted runtime state
Should verify: - service starts with scheduler enabled - a running/orphaned job exists before restart - service restart occurs - orphan recovery transitions work correctly after restart
api-e2e¶
Goal: - validate authenticated API and pipeline behavior with real config and runtime setup
Should verify: - service boots with real config - authenticated API requests succeed - unauthorized requests fail appropriately - pipeline/API triggers create expected queued work - status/readback behavior works from the running service
hook-route-compilation¶
Goal: - validate hook runtime behavior against a real running service
Should verify:
- a root job with notify_on_complete: true fires a hook pipeline
- hook-entry call: expands into the called pipeline entry
- exactly one hook job is enqueued for the scenario
- hook dispatch remains root-level rather than creating pipeline step context
- the hook job payload is the expected lifecycle event envelope
sync-terminal-route¶
Goal: - validate synchronous API result selection against compiled terminal routes
Should verify:
- a synchronous pipeline returns 200 OK
- the compiled if: root appears as a core.switch job rather than a skipped user step
- a skipped earlier step does not become the returned result
- the returned result comes from the actual terminal routed step
- the runtime still records the expected job completion story in the database
conditional-with-route¶
Goal:
- validate compiled if: routing and with: remapping against a real running service
Should verify:
- a compiled if: step becomes a real core.switch hop at runtime
- the false branch bypasses the gated plugin and still reaches the downstream step
- the true branch runs the gated plugin and preserves the expected parent/child route shape
- the gated plugin receives the with:-remapped payload values on the true branch
- route depth and max-depth control-plane state persist in event_context
from-plugin-scoping¶
Goal:
- validate the from_plugin: selector against a real running service
Should verify:
- a hook pipeline with from_plugin: matches only when the upstream
plugin equals the selector
- the same hook signal from a different upstream plugin does not fire
the scoped pipeline
- a co-resident hook pipeline without from_plugin: continues to fire
for every matching lifecycle signal (regression for today's behaviour)
- the compiled-route inspection (GET /config/view) surfaces
source_plugin on the scoped route
context-aware-trigger-if¶
Goal:
- validate pipeline-level if: evaluating against the upstream
job's accumulated durable context
Should verify:
- a baggage value claimed by an upstream pipeline step is visible to a
downstream pipeline's if: predicate as context.*
- predicate true → dispatch fires
- predicate false → dispatch is suppressed (no core.switch)
- a route fired without upstream context (e.g. from webhook ingress)
with a context.* predicate is suppressed (absent context evaluates
to false, no error)
Deferred wave-2 concerns¶
These are valuable, but should not be in the first Docker wave: - reload/restart nuance beyond initial recovery scenarios - plugin runtime matrix testing - multi-hop expansion suites - load/stress validation - broad config matrix coverage
Wave 1 should stay small, high-value, and stable.
10. Docker Harness Architecture¶
The Docker-backed test harness should be fixture-driven and orchestrated through repository tooling rather than product commands.
Design principles¶
- use Docker only where runtime/system realism adds meaningful confidence
- keep scenarios black-box and high-value
- avoid duplicating the entire Go test suite in containers
- make local and CI execution use the same harness entry points
Recommended structure¶
Use named fixtures representing focused system scenarios. For example:
test/fixtures/docker/webhook-ingress/
test/fixtures/docker/scheduler-recovery/
test/fixtures/docker/api-e2e/
Each fixture should define the inputs needed for that scenario, such as: - config files - fixture data - environment variables - compose overrides if required - scenario-specific assertion inputs
Orchestration mechanism¶
Use Docker Compose for harness orchestration.
Recommended behavior: - shared Compose base where possible - fixture-specific overrides where needed - explicit startup and teardown lifecycle - one harness runner orchestrating one or more fixtures
Readiness policy¶
Readiness must be explicit.
The harness should: - wait for service health or known-ready endpoints - use retries/timeouts - fail clearly if readiness is not achieved
Avoid relying on arbitrary sleeps as the primary readiness mechanism.
Fixture execution model¶
The harness should follow a consistent lifecycle: 1. select fixture(s) 2. start services 3. wait for readiness 4. run black-box assertions 5. collect artifacts on failure 6. tear down by default
Local targeting¶
The design should support: - running all Docker fixtures - running a single named fixture for focused development work
This keeps Docker usable during complex development without forcing full-suite runtime on every change.
11. Failure Artifact Policy for Docker Tests¶
On Docker-backed test failure, the harness should automatically capture artifacts under a predictable path such as:
Recommended minimum artifact set: - container/service logs - fixture/config inputs - scenario/assertion log - failed HTTP responses where applicable - DB snapshot where applicable
Recommended artifact structure¶
test-artifacts/docker/<timestamp>/<fixture>/
run.log
scenario.log
compose.log
service-*.log
responses/
config/
db/
Behavior¶
- keep artifacts on local failure
- upload artifacts on CI failure
- tear down containers after artifact capture by default
- optional debug/preserve mode may be added later
Design intent¶
Docker failures should be diagnosable without immediately rerunning in manual debug mode. Artifact capture should therefore be automatic rather than opt-in.
12. CI Policy¶
CI should mirror the local staged testing model rather than inventing separate workflows.
Branch / PR CI¶
Always run fast validation:
- go test ./...
- golangci-lint run ./...
This keeps standard branch feedback fast and useful.
Pre-merge gate¶
Before merge, require merge-grade validation:
- scripts/test-premerge
Conceptually this means: - fast tests - lint/static checks - Docker-backed validation
main CI¶
Run full validation on merged trunk:
- scripts/test-main
Initially, scripts/test-main should provide at least the same coverage as pre-merge validation.
CI job visibility¶
CI should expose at least separate visible checks for: - fast validation - Docker validation
This makes failures easier to diagnose and rerun.
CI design principles¶
- CI should mirror the local staged-testing model
- CI should invoke the same canonical scripts developers use locally
- Docker validation should be a required pre-merge gate
mainshould always receive a full post-merge validation pass
13. Scope Boundaries¶
Fast tests should own¶
- pure logic
- deterministic unit behavior
- parser/validator correctness
- config integrity and plugin fingerprint policy, using temporary config/plugin fixtures
- router/state/queue integration using real SQLite where helpful
- low-friction day-to-day confidence
Docker tests should own¶
- runtime system behavior
- service boot with real config
- restart and recovery flows
- network ingress behavior
- realistic end-to-end operator-facing scenarios
Config integrity / plugin fingerprint tests¶
Plugin fingerprinting belongs in the fast suite by default. Use temporary config directories and temporary plugin directories in Go tests rather than committed fixture trees unless the scenario needs multi-process runtime realism.
Fast tests should cover:
- ductile config lock always writes fingerprints for configured plugins
- lock records both configured absolute paths and symlink-resolved paths
- manifest and entrypoint bytes are hashed
- configured enabled plugin mismatches fail verification
- configured disabled plugin mismatches warn only
- configured but undiscovered enabled plugins fail verification
- stale lock entries for plugins removed from config warn only
- missing plugin_fingerprints fails when plugins are configured
Move a scenario into Docker only when the behavior depends on real service startup/reload, container filesystem mounts, or operator-facing black-box assertions that cannot be represented by temporary local fixtures.
14. Implementation Order¶
Once the design is accepted, implementation should proceed in sensible phases:
- document the testing policy
- scaffold canonical repo scripts
- build the Docker harness base
- implement the first Docker fixtures
- wire CI stages to the canonical scripts
- polish artifact capture and
mainvalidation behavior
15. Script and Target Design¶
The canonical testing interface should live in scripts/, with optional convenience wrappers in a Makefile.
Canonical scripts¶
scripts/test-fastscripts/test-dockerscripts/test-premergescripts/test-main
These scripts are the source of truth for local and CI usage.
Intended behavior¶
scripts/test-fast¶
Purpose: - fast branch-development assurance - frequent local execution during implementation
Recommended initial behavior:
Lint is intentionally excluded from the default fast loop to preserve iteration speed.
scripts/test-docker¶
Purpose: - Docker-backed system/runtime validation - selective during development - required in pre-merge and main validation flows
Expected behavior: - boot the Docker-backed harness - run the selected fixture scenarios - perform readiness checks - capture artifacts on failure - tear down by default
scripts/test-premerge¶
Purpose:
- required merge-grade validation before landing on main
Expected behavior:
- run scripts/test-fast
- run golangci-lint run ./...
- run scripts/test-docker
scripts/test-main¶
Purpose:
- full post-merge validation on main
Expected initial behavior:
- same coverage as scripts/test-premerge
This may expand later to include broader smoke/regression suites.
Composition rules¶
scripts/test-premergeshould compose lower-level scripts instead of duplicating logicscripts/test-mainshould compose lower-level scripts instead of duplicating logic- CI should invoke the same canonical scripts developers use locally
Optional Make wrappers¶
Optional Make targets may wrap the scripts for ergonomics:
- make test
- make test-docker
- make test-premerge
- make test-main
If present, these should delegate to scripts/ rather than reimplementing behavior.
16. Summary¶
The target state is: - velocity during branch development - strong gates before merge - explicit trunk protection after merge - clear separation between product commands and assurance tooling
In short:
- fast tests are for iteration
- Docker tests are for system confidence
- pre-merge is a gate
- main is protected by full validation