Dynobox Docs
Dynobox is a local test runner for agent and skill workflows. You describe a task, choose one or more local agent harnesses, and assert on observable behavior such as tool calls, shell commands, files in the sandbox, transcripts, HTTP requests, and final messages.
Dynobox is useful when you want repeatable checks for agent behavior before shipping a prompt, skill, or workflow change.
Start Here
- Getting Started: install the CLI, scaffold a dyno, and run your first scenario.
- Config Authoring: write JavaScript, TypeScript, or YAML dynos with the @dynobox/sdk helpers.
- CLI Reference: commands, flags, output modes, JSON reports, and exit behavior.
- CI Integration: run Dynobox in GitHub Actions and publish JSON reports as build artifacts.
What Dynobox Tests
Dynobox runs each scenario in an isolated temporary work directory. Setup commands create the fixture, the selected harness performs the task, and assertions evaluate what happened.
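The per-scenario lifecycle can be sketched in plain Node.js and TypeScript. This is a minimal illustration, not Dynobox's runner. ScenarioSketch, runScenario, and the callback-based setup and harness steps are hypothetical stand-ins: real setup steps are commands, and the real harness is an agent CLI. The sketch shows the isolation property. Every run gets a fresh temporary directory, and file assertions only look inside it.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical scenario shape for illustration; not the @dynobox/sdk API.
interface ScenarioSketch {
  setup: (workDir: string) => void;      // builds the fixture
  runHarness: (workDir: string) => void; // stands in for the agent harness
  expectFiles: string[];                 // paths relative to workDir
}

interface ScenarioResult {
  passed: boolean;
  missing: string[];
}

function runScenario(s: ScenarioSketch): ScenarioResult {
  // Each run gets its own fresh temporary work directory.
  const workDir = mkdtempSync(join(tmpdir(), "dyno-"));
  try {
    s.setup(workDir);
    s.runHarness(workDir);
    // Assertions only inspect what ended up inside workDir.
    const missing = s.expectFiles.filter((f) => !existsSync(join(workDir, f)));
    return { passed: missing.length === 0, missing };
  } finally {
    // Remove the sandbox so the next scenario starts clean.
    rmSync(workDir, { recursive: true, force: true });
  }
}

const result = runScenario({
  setup: (dir) => writeFileSync(join(dir, "input.txt"), "hello"),
  runHarness: (dir) => writeFileSync(join(dir, "output.txt"), "done"),
  expectFiles: ["input.txt", "output.txt", "report.md"],
});
console.log(result); // { passed: false, missing: [ 'report.md' ] }
```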
You can assert:
- Tool calls, including expected and prohibited shell commands.
- Skill instruction loading with skill.invoked(...).
- Ordered tool-call sequences.
- Files present inside the scenario work directory.
- Harness transcript and final-message text.
- HTTP requests made by local child-process tools that honor proxy environment variables.
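Ordered-sequence and prohibited-command checks reduce to simple operations over the list of observed tool calls. The sketch below assumes a hypothetical ToolCall record and hypothetical helper names (calledInOrder, prohibitedCommands). Dynobox's own assertion helpers and their exact matching rules may differ. Here "ordered" is treated as a subsequence match, so unrelated calls may appear between the expected ones.

```typescript
// Hypothetical record of one observed tool call.
interface ToolCall {
  tool: string;     // tool name, e.g. "Read" or "Bash" (illustrative)
  command?: string; // shell command text, when the tool ran one
}

// True when the `expected` tool names occur in `calls` in order,
// allowing unrelated calls in between (subsequence match).
function calledInOrder(calls: ToolCall[], expected: string[]): boolean {
  let next = 0;
  for (const call of calls) {
    if (next < expected.length && call.tool === expected[next]) next++;
  }
  return next === expected.length;
}

// Returns every shell command that matches at least one prohibited pattern.
function prohibitedCommands(calls: ToolCall[], banned: RegExp[]): string[] {
  return calls
    .map((c) => c.command)
    .filter((cmd): cmd is string => cmd !== undefined)
    .filter((cmd) => banned.some((re) => re.test(cmd)));
}

const transcript: ToolCall[] = [
  { tool: "Read" },
  { tool: "Bash", command: "npm test" },
  { tool: "Edit" },
  { tool: "Bash", command: "git push --force" },
];

console.log(calledInOrder(transcript, ["Read", "Edit", "Bash"])); // true
console.log(calledInOrder(transcript, ["Edit", "Read"]));         // false
console.log(prohibitedCommands(transcript, [/git push --force/, /rm -rf/]));
// [ 'git push --force' ]
```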
Supported Harnesses
Dynobox currently runs local scenarios through:
- Claude Code via the claude executable.
- OpenAI Codex via the codex executable.
Each harness must already be installed, authenticated, and available on PATH.
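Before running a scenario, you can confirm that a harness executable resolves on PATH. The helper below is a sketch, and findOnPath is a hypothetical name, not part of Dynobox. It only checks that an executable file named claude or codex exists in some PATH directory. It cannot tell whether the harness is authenticated, and it ignores Windows executable extensions such as .exe and .cmd.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// Hypothetical helper: returns the first PATH entry holding an executable
// file named `name`, or undefined if none is found.
function findOnPath(name: string, pathVar: string = process.env.PATH ?? ""): string | undefined {
  for (const dir of pathVar.split(delimiter)) {
    if (dir === "") continue;
    const candidate = join(dir, name);
    try {
      // Must be a regular file and executable by the current user.
      if (statSync(candidate).isFile()) {
        accessSync(candidate, constants.X_OK);
        return candidate;
      }
    } catch {
      // Missing or not executable in this directory; keep looking.
    }
  }
  return undefined;
}

for (const harness of ["claude", "codex"]) {
  const found = findOnPath(harness);
  console.log(`${harness}: ${found ?? "not found on PATH"}`);
}
```

A missing entry here points to an installation or PATH problem; authentication still has to be checked with the harness itself.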
Supported Config Formats
Dynobox discovers *.dyno.{mjs,js,ts,mts,yaml,yml} files recursively when you run a directory. Explicit file paths can use non-*.dyno.* names, such as dynobox.config.ts, as long as they are loadable Dynobox configs.
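The discovery rule can be expressed as a predicate on file names. This sketch encodes the documented pattern and assumes the extension list above is exhaustive. It is not Dynobox's discovery code, and edge cases such as dotfiles or ignored directories may be handled differently.

```typescript
// Extensions matched by the *.dyno.{mjs,js,ts,mts,yaml,yml} pattern.
const DYNO_EXTENSIONS = ["mjs", "js", "ts", "mts", "yaml", "yml"];

// True when a file name (or path) ends in .dyno.<ext> with a listed extension.
function isDiscoverableDyno(fileName: string): boolean {
  const match = /\.dyno\.([^.]+)$/.exec(fileName);
  return match !== null && DYNO_EXTENSIONS.includes(match[1]);
}

console.log(isDiscoverableDyno("checkout.dyno.ts"));      // true
console.log(isDiscoverableDyno("evals/login.dyno.yaml")); // true
console.log(isDiscoverableDyno("dynobox.config.ts"));     // false
console.log(isDiscoverableDyno("legacy.dyno.cjs"));       // false
```

Files such as dynobox.config.ts fail this predicate, which is why they are picked up only when passed as explicit paths.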
Supported authoring formats:
- TypeScript or JavaScript with defineDyno(...) from @dynobox/sdk.
- YAML with kind-discriminated assertion objects.
CommonJS config files (.cjs and .cts) are not supported because the SDK is ESM-only.
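A kind-discriminated assertion object uses a single kind field to decide which other fields the object carries. The TypeScript sketch below models that idea as a discriminated union. The kind names (file_exists, final_message_contains, command_not_run) and their fields are hypothetical and are not Dynobox's YAML schema. The entries in assertions stand in for objects already parsed from YAML.

```typescript
// Hypothetical assertion kinds, for illustration only.
type Assertion =
  | { kind: "file_exists"; path: string }
  | { kind: "final_message_contains"; text: string }
  | { kind: "command_not_run"; pattern: string };

// What this sketch assumes a finished run exposes to assertions.
interface Observed {
  files: Set<string>;
  finalMessage: string;
  commands: string[];
}

function check(a: Assertion, obs: Observed): boolean {
  // Switching on `kind` narrows `a`, so each branch sees only its own fields.
  switch (a.kind) {
    case "file_exists":
      return obs.files.has(a.path);
    case "final_message_contains":
      return obs.finalMessage.includes(a.text);
    case "command_not_run":
      return !obs.commands.some((cmd) => cmd.includes(a.pattern));
    default: {
      // Compile-time exhaustiveness check: a new kind without a case fails here.
      const unhandled: never = a;
      throw new Error(`unknown assertion: ${JSON.stringify(unhandled)}`);
    }
  }
}

const assertions: Assertion[] = [
  { kind: "file_exists", path: "README.md" },
  { kind: "final_message_contains", text: "tests pass" },
  { kind: "command_not_run", pattern: "rm -rf" },
];

const observed: Observed = {
  files: new Set(["README.md", "src/index.ts"]),
  finalMessage: "Updated the docs; all tests pass.",
  commands: ["npm test"],
};

console.log(assertions.map((a) => check(a, observed))); // [ true, true, true ]
```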
Current Limits
Dynobox is under active development and is currently focused on local execution. These areas are not complete yet:
- HTTP capture for harness-native web tools and binaries that ignore proxy/CA environment variables.
- Hosted or remote runner execution.
- Rich multi-iteration controls from authored configs.
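The HTTP capture limit follows from relying on the environment. A request is observable only if the tool reads proxy settings and routes through that proxy. The sketch below shows the common HTTP_PROXY / HTTPS_PROXY / NO_PROXY convention that many tools follow. It is an assumption for illustration, not Dynobox's mechanism, and real tools disagree on details such as letter case and wildcard handling. A binary that never performs this lookup connects directly and bypasses capture.

```typescript
type Env = Record<string, string | undefined>;

// Reads a variable, preferring the lowercase spelling and falling back to
// uppercase (tools differ on this).
function readVar(env: Env, name: string): string | undefined {
  return env[name.toLowerCase()] ?? env[name.toUpperCase()];
}

// Returns the proxy URL a convention-following tool would use for `url`,
// or undefined when it would connect directly.
function proxyFor(url: string, env: Env): string | undefined {
  const { protocol, hostname } = new URL(url);
  const noProxy = (readVar(env, "no_proxy") ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);

  // "*" disables proxying; other entries match a host or its subdomains.
  const bypass = noProxy.some((entry) => {
    if (entry === "*") return true;
    const domain = entry.replace(/^\./, "");
    return hostname === domain || hostname.endsWith("." + domain);
  });
  if (bypass) return undefined;

  if (protocol === "https:") return readVar(env, "https_proxy");
  if (protocol === "http:") return readVar(env, "http_proxy");
  return undefined;
}

const env: Env = { HTTPS_PROXY: "http://127.0.0.1:8080", NO_PROXY: "localhost,.internal" };
console.log(proxyFor("https://api.example.com/v1", env)); // http://127.0.0.1:8080
console.log(proxyFor("https://svc.internal/x", env));     // undefined
console.log(proxyFor("http://localhost:3000/", env));     // undefined
```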