Testing and development¶

GenSwarms ships two layers of tests: fast ExUnit unit tests for individual modules and an end-to-end harness (mix genswarms.test) that validates and runs the example swarm configurations. A mock backend lets you exercise topologies and routing without any LLM API calls. This page covers both, plus formatting and the development server.

Unit tests¶

Run the ExUnit suite with mix test:

mix test                                            # run all tests
mix test --cover                                    # run with coverage report
mix test test/genswarms/routing/router_test.exs     # a single file
mix test test/genswarms/routing/router_test.exs:42  # a single test by line
mix test --only tag_name                            # only tests with a given tag

Tests run with the synchronous EventStore.Sqlite backend (set in config/test.exs via config :genswarms, :event_store, Genswarms.Observability.EventStore.Sqlite) so that persist→query is deterministic, rather than the buffered default. The same config disables the Phoenix endpoint (server: false) and lowers the log level to :warning. Key test files include:

File	Covers
`test/genswarms/agents/agent_protocol_test.exs`	`@agent:` message parsing
`test/genswarms/routing/router_test.exs`	message routing (incl. system objects)
`test/genswarms/config/loader_test.exs`	config loading (`.exs`/`.json`/`.yaml`/`.yml`)
`test/genswarms/config/swarm_config_test.exs`	config validation
`test/genswarms/agents/inbox_test.exs`	the message queue

Formatting¶

mix format    # format all source per .formatter.exs

Run mix format before committing; CI and reviewers expect formatted code.

End-to-end harness¶

mix genswarms.test discovers, validates, and runs every example. It:

Discovers all swarm configs (.exs) and sim files (.sim) recursively under examples/.
Validates each one.
Runs each (starts the swarm, waits the full timeout, then stops it).
Captures a per-example log.
Reports pass/fail/skip for each, with a combined summary.

mix genswarms.test                           # validate + run all examples
mix genswarms.test --validate-only           # only validate configs, don't run
mix genswarms.test --example tic-tac-toe     # test a specific example
mix genswarms.test --mock script.json        # run real agents with the mock script (no LLM)
mix genswarms.test --timeout 60000           # custom timeout per swarm (ms)
mix genswarms.test --steps 3                 # steps for .sim examples
mix genswarms.test --logs-dir /tmp/logs      # custom logs directory
mix genswarms.test --quiet                   # suppress per-example info lines

Flags¶

All flags use --flag value (single dash, hyphenated) form and are parsed in strict mode — unknown flags are dropped silently.

Flag	Type	Default	Purpose
`--validate-only`	boolean	off	Validate configs only; skip running. No logs directory or `summary.log` is written.
`--example <name>`	string	all	Keep only files whose path contains the literal substring `/<name>/`
`--timeout <ms>`	integer	`60000`	Per-swarm/per-sim run timeout in milliseconds
`--steps <n>`	integer	`3`	Number of steps for `.sim` examples (ignored by `.exs` configs)
`--mock <path>`	string	none	Expand `<path>` and export it as `SUBZEROCLAW_MOCK_SCRIPT` for LLM-free runs
`--logs-dir <path>`	string	`.test-logs`	Directory for captured run logs (expanded with `Path.expand/1`)
`--quiet`	boolean	off	Suppress per-example info output (failures are still printed)

The --example filter matches a path segment, so it works against the directory name. For instance, --example tic-tac-toe selects examples/tic-tac-toe/tic_tac_toe_swarm.exs because the path contains /tic-tac-toe/. The bundled example directories are: bridge, bwrap-skills, dynamic-swarm, massive-swarm, party, and tic-tac-toe.

Output¶

Unless --validate-only is set, each example writes a <name>.log to the logs directory (default .test-logs/), plus a combined summary.log. The log filename is derived from the swarm name (the config :name, not the file path), sanitized by replacing every character outside [a-zA-Z0-9_-] with _ (so hyphens and underscores are preserved). For the bundled examples, whose config names are tic-tac-toe and party-test, the resulting files are:

.test-logs/
├── tic-tac-toe.log
├── party-test.log
└── summary.log

A per-example .log records the relative path, swarm name, agent/object counts, topology, timeout, and the final status (or TIMEOUT / ERROR). The summary.log contains the N passed, N failed, N skipped header followed by one ✓/✗/⊘ line per example.

Exit codes:

0 — all examples passed (or were skipped).
1 — at least one example failed, or no .exs/.sim files were found under examples/.

With --validate-only, no logs directory is created and no summary.log is written — results are printed to the console only.

A run is a skip (⊘) when an .exs file evaluates to something that is not a map (the (not a swarm config) case), or when a .sim file is found but SubzeroSim is not available in the project. Note that an .exs file evaluating to a map without a :name key is not a skip — it counts as a pass (reported as (valid map config)); only a swarm config (a map with a :name key) is actually started and run.

Testing without an LLM¶

There are two distinct ways to avoid real LLM calls; they serve different goals.

The `:mock` backend — test orchestration¶

backend: :mock is a stub that spawns no external process and produces no agent output. It is for testing swarm orchestration — topology, routing, and dynamic add/remove/scale — deterministically and instantly:

%{
  name: "test-swarm",
  agents: [
    %{name: :researcher, backend: :mock},
    %{name: :coder, backend: :mock}
  ],
  topology: [{:researcher, :coder}]
}

MockBackend.send_input/2 and deploy_skills/2 are no-ops, and handle_output/2 always returns empty output, so the backend does not exercise agent reasoning — only the machinery around agents. An optional %{script: [...]} (via {:mock, %{script: [...]}}) is stored on the backend struct for introspection but is never used to generate responses. See backends.md.

`--mock` / `SUBZEROCLAW_MOCK_SCRIPT` — run real agents with canned responses¶

To run real agents (local/docker/apple_container/bwrap) end to end without calling an LLM, give subzeroclaw a mock script:

mix genswarms.test --mock path/to/script.json

The task expands the path with Path.expand/1 and exports it as SUBZEROCLAW_MOCK_SCRIPT, which is passed through to the agents (including bwrap sandboxes). The subzeroclaw runtime — not GenSwarms — reads the script and returns canned responses instead of calling the API. The script format is defined by subzeroclaw.

You can also set SUBZEROCLAW_MOCK_SCRIPT directly in the environment to get the same behavior outside the test harness.

Development server¶

Start the Phoenix API server (REST + WebSocket, no HTML) for local development:

mix phx.server

The server defaults to port 4000 (override with the PORT environment variable). A typical loop is to start the server, then drive it from the CLI or HTTP client:

genswarms start examples/tic-tac-toe/tic_tac_toe_swarm.exs   # start a swarm as a daemon
genswarms events --follow                                     # watch the event stream live