Testing and development
Genswarms ships two layers of tests: fast ExUnit unit tests for individual modules
and an end-to-end harness (mix genswarms.test) that validates and runs the example
swarm configurations. A mock backend lets you exercise topologies and routing
without any LLM API calls. This page covers both, plus formatting and the
development server.
Unit tests
Run the ExUnit suite with mix test:
mix test # run all tests
mix test --cover # run with coverage report
mix test test/genswarms/routing/router_test.exs # a single file
mix test test/genswarms/routing/router_test.exs:42 # a single test by line
mix test --only tag_name # only tests with a given tag
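To illustrate what the --only flag selects, here is a minimal sketch of a tagged ExUnit test. The module name TagExampleTest and the :routing tag are hypothetical and not taken from the Genswarms test suite; the assertions are placeholders.

```elixir
# Sketch only: module and tag names are hypothetical.
ExUnit.start()

defmodule TagExampleTest do
  use ExUnit.Case, async: true

  # Selected by `mix test --only routing`
  @tag :routing
  test "tagged test runs when its tag is selected" do
    assert String.starts_with?("@coder: hello", "@coder")
  end

  # Untagged, so it is excluded when `--only routing` is given
  test "untagged test is excluded under --only routing" do
    assert 1 + 1 == 2
  end
end
```

With a file like this in the suite, mix test --only routing would run the first test and exclude the second.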
Tests run with the synchronous EventStore.Sqlite backend (set in config/test.exs
via config :genswarms, :event_store, Genswarms.Observability.EventStore.Sqlite) so
that persist→query is deterministic, rather than the buffered default. The same
config disables the Phoenix endpoint (server: false) and lowers the log level to
:warning. Key test files include:
| File | Covers |
|---|---|
| test/genswarms/agents/agent_protocol_test.exs | @agent: message parsing |
| test/genswarms/routing/router_test.exs | message routing (incl. system objects) |
| test/genswarms/config/loader_test.exs | config loading (.exs/.json/.yaml/.yml) |
| test/genswarms/config/swarm_config_test.exs | config validation |
| test/genswarms/agents/inbox_test.exs | the message queue |
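The test configuration described before the table might look roughly like the sketch below. The :event_store line is taken from the text above; the endpoint module name GenswarmsWeb.Endpoint is an assumption made for illustration and may differ in the actual project.

```elixir
# config/test.exs (sketch; the endpoint module name is an assumption)
import Config

# Synchronous SQLite-backed event store so persist→query is deterministic
config :genswarms, :event_store, Genswarms.Observability.EventStore.Sqlite

# Do not start the HTTP listener during tests (hypothetical endpoint module)
config :genswarms, GenswarmsWeb.Endpoint, server: false

# Keep test output quiet
config :logger, level: :warning
```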
Formatting
Run mix format before committing; CI and reviewers expect formatted code.
End-to-end harness
mix genswarms.test discovers, validates, and runs every example. It:
- Discovers all swarm configs (.exs) and sim files (.sim) recursively under examples/.
- Validates each one.
- Runs each (starts the swarm, waits the full timeout, then stops it).
- Captures a per-example log.
- Reports pass/fail/skip for each, with a combined summary.
mix genswarms.test # validate + run all examples
mix genswarms.test --validate-only # only validate configs, don't run
mix genswarms.test --example tic-tac-toe # test a specific example
mix genswarms.test --mock script.json # run real agents with the mock script (no LLM)
mix genswarms.test --timeout 60000 # custom timeout per swarm (ms)
mix genswarms.test --steps 3 # steps for .sim examples
mix genswarms.test --logs-dir /tmp/logs # custom logs directory
mix genswarms.test --quiet # suppress per-example info lines
Flags
All flags use the --flag value form (double dash, hyphenated; boolean flags take no value) and are parsed in
strict mode; unknown flags are dropped silently.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| --validate-only | boolean | off | Validate configs only; skip running. No logs directory or summary.log is written. |
| --example <name> | string | all | Keep only files whose path contains the literal substring /<name>/ |
| --timeout <ms> | integer | 60000 | Per-swarm/per-sim run timeout in milliseconds |
| --steps <n> | integer | 3 | Number of steps for .sim examples (ignored by .exs configs) |
| --mock <path> | string | none | Expand <path> and export it as SUBZEROCLAW_MOCK_SCRIPT for LLM-free runs |
| --logs-dir <path> | string | .test-logs | Directory for captured run logs (expanded with Path.expand/1) |
| --quiet | boolean | off | Suppress per-example info output (failures are still printed) |
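The strict-mode parsing described above can be sketched with Elixir's standard OptionParser. This is an illustration, not the task's actual source: the FlagSketch module is hypothetical, while the switch names, types, and defaults mirror the flags table.

```elixir
defmodule FlagSketch do
  # Switch names and types mirror the flags table (hypothetical module).
  @switches [
    validate_only: :boolean,
    example: :string,
    timeout: :integer,
    steps: :integer,
    mock: :string,
    logs_dir: :string,
    quiet: :boolean
  ]

  # Defaults as listed in the flags table.
  @defaults [timeout: 60_000, steps: 3, logs_dir: ".test-logs"]

  # Returns {options_with_defaults, invalid_flags}.
  def parse(argv) do
    {opts, _args, invalid} = OptionParser.parse(argv, strict: @switches)
    {Keyword.merge(@defaults, opts), invalid}
  end
end

{opts, invalid} = FlagSketch.parse(["--validate-only", "--timeout", "5000", "--bogus"])
IO.inspect(opts[:timeout])
IO.inspect(invalid)
```

In strict mode, OptionParser returns unrecognized switches in the third element of its result rather than raising. A task that ignores that element effectively drops unknown flags, which would be consistent with the behavior described above.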
The --example filter matches a path segment, so it works against the
directory name. For instance, --example tic-tac-toe selects
examples/tic-tac-toe/tic_tac_toe_swarm.exs because the path contains
/tic-tac-toe/. The bundled example directories are: bridge, bwrap-skills,
dynamic-swarm, massive-swarm, party, and tic-tac-toe.
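The substring match behind --example can be sketched as follows. The ExampleFilter module is hypothetical, and the second path in the sample list is an assumed filename used only for illustration.

```elixir
defmodule ExampleFilter do
  # Keep only paths that contain "/<name>/" as a literal substring.
  def filter(paths, name) do
    Enum.filter(paths, &String.contains?(&1, "/" <> name <> "/"))
  end
end

paths = [
  "examples/tic-tac-toe/tic_tac_toe_swarm.exs",
  # Assumed filename, for illustration only
  "examples/party/party_swarm.exs"
]

IO.inspect(ExampleFilter.filter(paths, "tic-tac-toe"))
IO.inspect(ExampleFilter.filter(paths, "tic"))
```

Because the surrounding slashes are part of the match, a partial directory name such as tic selects nothing, and a bare filename does not match either.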
Output
Unless --validate-only is set, each example writes a <name>.log to the logs
directory (default .test-logs/), plus a combined summary.log. The log
filename is derived from the swarm name (the config :name, not the file
path), sanitized by replacing every character outside [a-zA-Z0-9_-] with _
(so hyphens and underscores are preserved). For example, the bundled configs
named tic-tac-toe and party-test produce tic-tac-toe.log and party-test.log,
alongside summary.log.
A per-example .log records the relative path, swarm name, agent/object counts,
topology, timeout, and the final status (or TIMEOUT / ERROR). The
summary.log contains the N passed, N failed, N skipped header followed by one
✓/✗/⊘ line per example.
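The filename sanitization rule described above can be sketched in a few lines. The LogName module is hypothetical; only the character class comes from the text.

```elixir
defmodule LogName do
  # Replace every character outside [a-zA-Z0-9_-] with "_" and append ".log".
  def for_swarm(name) do
    String.replace(name, ~r/[^a-zA-Z0-9_-]/, "_") <> ".log"
  end
end

IO.puts(LogName.for_swarm("tic-tac-toe"))
IO.puts(LogName.for_swarm("my swarm v1.0"))
```

Hyphens and underscores pass through unchanged, while spaces and dots become underscores.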
Exit codes:
- 0: all examples passed (or were skipped).
- 1: at least one example failed, or no .exs/.sim files were found under examples/.
With --validate-only, no logs directory is created and no summary.log is
written — results are printed to the console only.
A run is a skip (⊘) when an .exs file evaluates to something that is not
a map (the (not a swarm config) case), or when a .sim file is found but
SubzeroSim is not available in the project. Note that an .exs file evaluating
to a map without a :name key is not a skip — it counts as a pass
(reported as (valid map config)); only a swarm config (a map with a :name
key) is actually started and run.
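The distinction between run, pass, and skip for an evaluated .exs file can be summarized in a small sketch. The OutcomeSketch module and its return atoms are hypothetical; it does not cover the .sim case, which depends on whether SubzeroSim is available.

```elixir
defmodule OutcomeSketch do
  # :run  - a swarm config (map with a :name key) is started and run
  # :pass - a map without :name is reported as "(valid map config)"
  # :skip - anything that is not a map is reported as "(not a swarm config)"
  def classify(%{name: _}), do: :run
  def classify(value) when is_map(value), do: :pass
  def classify(_other), do: :skip
end

IO.inspect(OutcomeSketch.classify(%{name: "test-swarm", agents: []}))
IO.inspect(OutcomeSketch.classify(%{agents: []}))
IO.inspect(OutcomeSketch.classify([:not, :a, :map]))
```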
Testing without an LLM
There are two distinct ways to avoid real LLM calls; they serve different goals.
The :mock backend — test orchestration
backend: :mock is a stub that spawns no external process and produces no agent
output. It is for testing swarm orchestration — topology, routing, and
dynamic add/remove/scale — deterministically and instantly:
%{
name: "test-swarm",
agents: [
%{name: :researcher, backend: :mock},
%{name: :coder, backend: :mock}
],
topology: [{:researcher, :coder}]
}
MockBackend.send_input/2 and deploy_skills/2 are no-ops, and handle_output/2
always returns empty output, so the backend does not exercise agent reasoning —
only the machinery around agents. An optional %{script: [...]} (via
{:mock, %{script: [...]}}) is stored on the backend struct for introspection
but is never used to generate responses. See backends.md.
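When writing orchestration tests, it can help to check that a config uses only the mock backend before starting it. The sketch below is one way to do that. The MockCheck module is hypothetical, and placing the {:mock, opts} tuple under the backend: key is an assumption based on the description above.

```elixir
defmodule MockCheck do
  # True when every agent uses :mock or the {:mock, opts} tuple form.
  # Assumption: the tuple form is given under the agent's :backend key.
  def all_mock?(%{agents: agents}) do
    Enum.all?(agents, fn
      %{backend: :mock} -> true
      %{backend: {:mock, _opts}} -> true
      _ -> false
    end)
  end
end

config = %{
  name: "test-swarm",
  agents: [
    %{name: :researcher, backend: :mock},
    # The script is stored but, per the text above, never used for responses
    %{name: :coder, backend: {:mock, %{script: []}}}
  ],
  topology: [{:researcher, :coder}]
}

IO.inspect(MockCheck.all_mock?(config))
```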
--mock / SUBZEROCLAW_MOCK_SCRIPT — run real agents with canned responses
To run real agents (local/docker/bwrap) end to end without calling an LLM, pass a mock script to the harness, for example with mix genswarms.test --mock script.json.
The task expands the path with Path.expand/1 and exports it as
SUBZEROCLAW_MOCK_SCRIPT, which is passed through to the agents (including bwrap
sandboxes). The subzeroclaw runtime — not Genswarms — reads the script and
returns canned responses instead of calling the API. The script format is
defined by subzeroclaw.
You can also set SUBZEROCLAW_MOCK_SCRIPT directly in the environment to get the
same behavior outside the test harness.
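The expand-and-export step can be reproduced from Elixir code, for example in a test setup, using only standard library calls. This is a sketch of the idea rather than the task's actual implementation, and script.json is a placeholder path.

```elixir
# Expand the (placeholder) script path to an absolute path and export it.
script = Path.expand("script.json")
System.put_env("SUBZEROCLAW_MOCK_SCRIPT", script)

# Child processes started after this point inherit the variable.
IO.puts(System.get_env("SUBZEROCLAW_MOCK_SCRIPT"))
```

System.put_env/2 changes the environment of the running VM process, so it affects only agents launched afterwards from that process.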
Development server
Start the Phoenix API server (REST + WebSocket, no HTML) for local development:
The server defaults to port 4000 (override with the PORT environment variable).
A typical loop is to start the server, then drive it from the CLI or HTTP client:
genswarms start examples/tic-tac-toe/tic_tac_toe_swarm.exs # start a swarm as a daemon
genswarms events --follow # watch the event stream live
See also
- backends.md: backend types including the mock backend
- cli.md: genswarms command reference
- configuration.md: swarm config DSL and validation