Extending roksbnkctl
This chapter is the hacking guide for contributors. It covers the four most common extension shapes — adding a new execution backend, a new test suite, a new tool to an existing backend, a new chapter to the book — plus the PRD process the project uses to coordinate larger changes and the four-agent sprint-dispatch pattern Sprints 0-6 ran on.
For building the binary, see Chapter 31 — Building from source. For using the binary, see the rest of the book.
Adding a new execution backend
A backend is anything implementing the Backend interface in internal/exec/backend.go. The four backends shipped at v1.0 — local, docker, k8s, ssh:<target> — are each a single file under that package.
The end-to-end shape:
- Implement the interface. Create internal/exec/<your-backend>.go. The contract is Run(ctx context.Context, argv []string, opts RunOpts) (int, error). Honour opts.Stdin/Stdout/Stderr, opts.WorkDir, opts.Env, opts.Credentials, opts.HostMounts, and opts.RunAsUser. Return the subprocess exit code as the first return; second is for backend-side errors (couldn’t start, ctx cancelled, etc.).
- Register it. Call exec.Register(name string, b Backend) from the package’s init() block. The ResolveBackend(spec string) function in internal/exec/backend.go dispatches --backend <name> to the registered backend.
- Handle credentials safely. Read PRD 04 before touching opts.Credentials. The cardinal rule: never pass credential values via argv, because they end up in ps output, container metadata, and process accounting. Pass by reference (env var by name, projected Secret, SSH SetEnv) and let the runtime do the value plumbing.
- Wire the redactor. Wrap opts.Stdout and opts.Stderr with internal/exec.NewRedactor before handing them to the subprocess. The redactor masks any credential value that leaks into the tool’s stdout/stderr. The local and docker backends do this in their wrappers; copy the pattern.
- Add a doctor check. Doctor’s per-backend availability check needs to recognise your backend. Add an entry under internal/cli/doctor_backend.go reporting whether your backend’s prerequisites are satisfied (e.g., “is the daemon running”, “is the SDK reachable”). Green-by-default on a stock dev box is the goal: yellow-skip rather than red-fail when prerequisites are missing.
- Add per-backend cred-audit assertions. The cred-leak audit at internal/exec/audit_test.go (and Phase M of the e2e plan) needs to know what surfaces your backend produces: container inspection, process listing, log files. Add a TestCredAudit_<YourBackend> subtest asserting the API key value never appears in any of them.
- E2E phase. Add a new phase to scripts/e2e-test-backends.sh with concrete pass/fail criteria. Cross-link from PRD 05 so the test plan stays the source of truth.
- Documentation. Add a deep-dive subsection to Chapter 17 (Execution backends) and a decision-tree entry to Chapter 18 (Choosing a backend per tool). Without docs the backend doesn’t exist for users.
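The first two steps (implement, then register) can be pictured with a small stand-alone sketch. It is not the project's code: the `Run` signature and the `Register` / `ResolveBackend` names come from the text above, but the trimmed `RunOpts`, the return type of `ResolveBackend`, the rule that a spec like `ssh:<target>` dispatches on the part before the colon, and the `echoBackend` type are all assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"strings"
	"sync"
)

// RunOpts is a trimmed stand-in for the real options struct (assumption).
type RunOpts struct {
	Env map[string]string
}

// Backend mirrors the Run contract quoted in the recipe.
type Backend interface {
	Run(ctx context.Context, argv []string, opts RunOpts) (int, error)
}

var (
	mu       sync.RWMutex
	backends = map[string]Backend{}
)

// Register records a backend under name; meant to be called from init().
func Register(name string, b Backend) {
	mu.Lock()
	defer mu.Unlock()
	backends[name] = b
}

// ResolveBackend maps a --backend spec to a registered backend.
// Assumption: "name:target" specs dispatch on the part before ':'.
func ResolveBackend(spec string) (Backend, error) {
	name, _, _ := strings.Cut(spec, ":")
	mu.RLock()
	defer mu.RUnlock()
	b, ok := backends[name]
	if !ok {
		known := make([]string, 0, len(backends))
		for k := range backends {
			known = append(known, k)
		}
		sort.Strings(known)
		return nil, fmt.Errorf("unknown backend %q (registered: %s)", spec, strings.Join(known, ", "))
	}
	return b, nil
}

// echoBackend is a hypothetical backend that only prints its argv.
type echoBackend struct{}

func (echoBackend) Run(ctx context.Context, argv []string, opts RunOpts) (int, error) {
	if err := ctx.Err(); err != nil {
		return -1, err // backend-side error: context already cancelled
	}
	fmt.Println(strings.Join(argv, " "))
	return 0, nil // exit code of the pretend subprocess
}

func init() { Register("echo", echoBackend{}) }

func main() {
	b, err := ResolveBackend("echo")
	if err != nil {
		panic(err)
	}
	code, err := b.Run(context.Background(), []string{"hello", "backend"}, RunOpts{})
	fmt.Println(code, err)
}
```

Keeping the exit code and the backend-side error as separate return values lets callers tell "the tool ran and failed" apart from "the backend could not run the tool at all".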
A backend PR that lands all eight steps is a complete contribution; one that lands the code but skips the audit and docs will get a “please come back with…” review comment.
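To make the redactor step concrete, here is a minimal masking writer. It is a sketch of the general idea only: the real internal/exec.NewRedactor may have a different signature and behaviour, and `newMaskingWriter` and `redact` are hypothetical names. One known limitation of this naive version is that a secret split across two `Write` calls is not caught.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// maskingWriter replaces known secret values before forwarding output.
type maskingWriter struct {
	dst     io.Writer
	secrets [][]byte
}

// newMaskingWriter is a hypothetical constructor, not the project's API.
func newMaskingWriter(dst io.Writer, secrets ...string) io.Writer {
	w := &maskingWriter{dst: dst}
	for _, s := range secrets {
		if s != "" { // an empty secret would match everywhere
			w.secrets = append(w.secrets, []byte(s))
		}
	}
	return w
}

func (w *maskingWriter) Write(p []byte) (int, error) {
	out := p
	for _, s := range w.secrets {
		out = bytes.ReplaceAll(out, s, []byte("***"))
	}
	if _, err := w.dst.Write(out); err != nil {
		return 0, err
	}
	// Report the caller's byte count, not the masked length, so callers
	// do not see a short write.
	return len(p), nil
}

// redact is a small helper that runs a string through the writer.
func redact(s string, secrets ...string) string {
	var buf bytes.Buffer
	fmt.Fprint(newMaskingWriter(&buf, secrets...), s)
	return buf.String()
}

func main() {
	w := newMaskingWriter(os.Stdout, "s3cr3t-api-key")
	fmt.Fprintln(w, "login with apikey=s3cr3t-api-key ok")
}
```

In a backend, a writer like this would wrap opts.Stdout and opts.Stderr before they are attached to the subprocess.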
Adding a new test suite
The test subtree (roksbnkctl test <suite>) holds three suites at v1.0: connectivity, dns, throughput. Adding a fourth (e.g., tls-handshake, latency, tcp-flowstate) follows a six-step recipe:
- Implement the runner. Create internal/test/<suite>.go. The suite produces results in the roksbnkctl.<suite>.v1 JSON schema; pick a top-level shape consistent with the existing suites (ProbeResult for single-probe, ProbeSuiteResult for an aggregate with results[]).
- Wire a subcommand. Add internal/cli/test_<suite>.go with a cobra command under test. The flag surface should mirror the existing suites’ patterns: --target, --iterations, -o json, --backend (when the suite is backend-aware).
- Pick the backends. Most test suites are backend-aware (the suite runs from a network vantage that the backend selects). DNS and throughput accept local/k8s/ssh:<target> and reject docker; connectivity is currently local-only. Decide which backends make sense for your suite; the deciding question is “does the vantage change the answer?”.
- Wire the JSON schema constant. Add roksbnkctl.<suite>.v1 to your suite’s output. CI assertions diff against this, so bumping the version is a breaking change; document it in CHANGELOG.
- Add an E2E phase. New phase under PRD 05 and corresponding script section in scripts/e2e-test-backends.sh.
- Documentation. New chapter or major section in Part VI of the book (currently chapters 20-23). Cross-link from Chapter 23 (The E2E test plan) and Chapter 18 (Choosing a backend per tool).
The DNS probe is the canonical worked example: read internal/test/dns.go and internal/cli/test.go to see all six steps in their landed form, plus the Sprint 5 architect prompt for the design framing.
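As a rough picture of the runner and schema-constant steps, the sketch below shows one way an aggregate result could carry its schema identifier. Only the names `ProbeResult`, `ProbeSuiteResult`, `results[]`, and the `roksbnkctl.<suite>.v1` pattern come from the text; every field name, the JSON keys, and the `tls-handshake` suite are assumptions, so check the existing suites for the real shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaTLSHandshake follows the roksbnkctl.<suite>.v1 naming pattern for a
// hypothetical tls-handshake suite.
const schemaTLSHandshake = "roksbnkctl.tls-handshake.v1"

// ProbeResult is one probe's outcome. Field names are illustrative.
type ProbeResult struct {
	Target    string  `json:"target"`
	OK        bool    `json:"ok"`
	LatencyMS float64 `json:"latency_ms"`
	Error     string  `json:"error,omitempty"`
}

// ProbeSuiteResult aggregates probes under a versioned schema identifier.
type ProbeSuiteResult struct {
	Schema  string        `json:"schema"`
	Backend string        `json:"backend"`
	Results []ProbeResult `json:"results"`
}

// newSuiteResult stamps the schema constant in one place so every output
// path agrees on the version.
func newSuiteResult(backend string, results []ProbeResult) ProbeSuiteResult {
	if results == nil {
		results = []ProbeResult{} // encode as [] rather than null
	}
	return ProbeSuiteResult{Schema: schemaTLSHandshake, Backend: backend, Results: results}
}

// encode renders the compact JSON form that an assertion could diff.
func encode(r ProbeSuiteResult) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	r := newSuiteResult("local", []ProbeResult{
		{Target: "example.com:443", OK: true, LatencyMS: 12.5},
	})
	out, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Emitting an empty `results` array instead of `null` keeps consumers that iterate over the field from needing a nil check.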
Adding a new tool to an existing backend
The docker, k8s, and ssh backends each maintain a map of tool-name → image / package. Adding a new tool (e.g., mtr, tcpdump, helm) means an entry in each backend’s map.
Docker backend
internal/exec/docker.go::toolImages maps tool names to image specs:

    var toolImages = map[string]string{
        "ibmcloud":  "ghcr.io/jgruberf5/roksbnkctl-tools-ibmcloud",
        "iperf3":    "ghcr.io/jgruberf5/roksbnkctl-tools-iperf3",
        "terraform": "hashicorp/terraform:1.5.7",
        "<your>":    "<your-image-ref>",
    }

Tag resolution is handled by SetToolImageTag (set in internal/cli/root.go::init) — a :dev tag for a from-source binary, :<release-tag> for a tagged release. If your image needs its ENTRYPOINT bypassed (e.g., for image-specific argv mangling), add a jobToolCmdOverride entry.
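One plausible way to combine the map with a build-dependent tag is sketched below. How SetToolImageTag actually works is not shown in this chapter, so `imageRef` and `hasTagOrDigest` are hypothetical helpers, and the rule "leave a reference alone if it already carries a tag or digest" is an assumption inferred from the pinned terraform entry.

```go
package main

import (
	"fmt"
	"strings"
)

// toolImages repeats the entries quoted above (minus the placeholder row).
var toolImages = map[string]string{
	"ibmcloud":  "ghcr.io/jgruberf5/roksbnkctl-tools-ibmcloud",
	"iperf3":    "ghcr.io/jgruberf5/roksbnkctl-tools-iperf3",
	"terraform": "hashicorp/terraform:1.5.7",
}

// hasTagOrDigest reports whether ref already pins a tag or digest. Only the
// last path segment is checked for ':' so a registry port such as
// localhost:5000/img is not mistaken for a tag.
func hasTagOrDigest(ref string) bool {
	if strings.Contains(ref, "@") {
		return true
	}
	last := ref[strings.LastIndex(ref, "/")+1:]
	return strings.Contains(last, ":")
}

// imageRef resolves a tool name to a full image reference, appending tag
// only when the map entry does not pin one itself (assumed behaviour).
func imageRef(tool, tag string) (string, error) {
	ref, ok := toolImages[tool]
	if !ok {
		return "", fmt.Errorf("no image registered for tool %q", tool)
	}
	if hasTagOrDigest(ref) {
		return ref, nil
	}
	return ref + ":" + tag, nil
}

func main() {
	for _, tool := range []string{"iperf3", "terraform"} {
		ref, err := imageRef(tool, "dev")
		fmt.Println(ref, err)
	}
}
```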
K8s backend
internal/exec/k8s.go holds two patterns — long-lived ops pod (for tools that share state, like ibmcloud) and one-shot Job (for tools that produce a single output, like iperf3 or DNS probes). New tools pick one pattern:
- Ops pod: add the tool’s image to the ops pod’s container spec at install time, or kubectl exec into the existing ops pod and run the host-installed binary.
- One-shot Job: build a Pod template using the same image conventions as iperf3, run, stream logs, capture exit code, delete. The Job pattern is the right call for tools where the result is the only thing that matters.
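The one-shot lifecycle (create, stream logs, capture exit code, delete) is mostly an ordering and cleanup problem, which the following sketch isolates. It deliberately avoids the Kubernetes client: `jobRunner` is a hypothetical interface standing in for whatever the k8s backend uses, and `fakeRunner` exists only so the sketch runs anywhere.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"strings"
	"time"
)

// jobRunner is a hypothetical abstraction over the cluster calls.
type jobRunner interface {
	Create(ctx context.Context, image string, argv []string) (string, error)
	Logs(ctx context.Context, id string, w io.Writer) error
	ExitCode(ctx context.Context, id string) (int, error)
	Delete(ctx context.Context, id string) error
}

// runOneShot runs the lifecycle and always deletes what it created.
func runOneShot(ctx context.Context, r jobRunner, image string, argv []string, stdout io.Writer) (code int, err error) {
	id, err := r.Create(ctx, image, argv)
	if err != nil {
		return -1, fmt.Errorf("create job: %w", err)
	}
	defer func() {
		// Fresh context: cleanup should still run if ctx was cancelled.
		cctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		if derr := r.Delete(cctx, id); derr != nil && err == nil {
			err = fmt.Errorf("delete job: %w", derr)
		}
	}()
	if err := r.Logs(ctx, id, stdout); err != nil {
		return -1, fmt.Errorf("stream logs: %w", err)
	}
	return r.ExitCode(ctx, id)
}

// fakeRunner records the call order instead of talking to a cluster.
type fakeRunner struct {
	calls    []string
	exit     int
	failLogs bool
}

func (f *fakeRunner) Create(context.Context, string, []string) (string, error) {
	f.calls = append(f.calls, "create")
	return "job-1", nil
}

func (f *fakeRunner) Logs(_ context.Context, _ string, w io.Writer) error {
	f.calls = append(f.calls, "logs")
	if f.failLogs {
		return errors.New("log stream broke")
	}
	_, err := io.WriteString(w, "probe ok\n")
	return err
}

func (f *fakeRunner) ExitCode(context.Context, string) (int, error) {
	f.calls = append(f.calls, "exit")
	return f.exit, nil
}

func (f *fakeRunner) Delete(context.Context, string) error {
	f.calls = append(f.calls, "delete")
	return nil
}

// demo runs the lifecycle against the fake and reports the call order.
func demo(failLogs bool) (int, string, error) {
	f := &fakeRunner{exit: 3, failLogs: failLogs}
	code, err := runOneShot(context.Background(), f, "example/image:dev", []string{"probe"}, io.Discard)
	return code, strings.Join(f.calls, ","), err
}

func main() {
	code, calls, err := demo(false)
	fmt.Println(code, calls, err)
}
```

The deferred delete is the design point: a failed log stream or a cancelled context should not leave Jobs behind in the cluster.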
SSH backend
internal/exec/ssh.go maintains a map of tool names to apt-package names for the --bootstrap auto-install:

    // toolPackage carries apt-repo metadata + package name; see the
    // production form in internal/exec/ssh.go for the full struct shape
    // (IBM repo URL + GPG key + apt-source line for ibmcloud-cli, etc.).
    var toolPackages = map[string]toolPackage{
        "ibmcloud": { /* IBM apt repo + key + "ibmcloud-cli" */ },
        "iperf3":   { /* plain ubuntu-main "iperf3" */ },
        "<your>":   { /* repo + key + "<deb-package>" */ },
    }

The bootstrap step runs apt-get install -y <packages> on the SSH target when the tool isn’t already on PATH. Non-Debian targets are out of scope for v1.0; the bootstrap fails clearly with a message pointing at the manual-install path.
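A simplified view of how the bootstrap command line might be assembled is shown below. It flattens the real toolPackage struct to a plain package name and skips the apt-repo and GPG-key setup entirely, so treat `bootstrapCommand` as a hypothetical helper rather than the production logic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// toolPackages is a flattened stand-in: tool name to apt package name only.
var toolPackages = map[string]string{
	"ibmcloud": "ibmcloud-cli",
	"iperf3":   "iperf3",
}

// bootstrapCommand builds the install command for tools missing on the
// target. It returns "" when nothing is missing, and an error that points
// at manual installation when a tool has no known package.
func bootstrapCommand(missing []string) (string, error) {
	pkgs := make([]string, 0, len(missing))
	for _, tool := range missing {
		pkg, ok := toolPackages[tool]
		if !ok {
			return "", fmt.Errorf("no apt package known for tool %q; install it manually", tool)
		}
		pkgs = append(pkgs, pkg)
	}
	if len(pkgs) == 0 {
		return "", nil
	}
	sort.Strings(pkgs) // stable order keeps logs and tests reproducible
	return "apt-get install -y " + strings.Join(pkgs, " "), nil
}

func main() {
	cmd, err := bootstrapCommand([]string{"iperf3", "ibmcloud"})
	fmt.Println(cmd, err)
}
```

The package names come from a fixed map rather than user input, which is what makes it reasonable to join them into a single command string.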
For each backend, the implementation work is small (one map entry). The doctor checks, e2e coverage, and docs are the bulk — same shape as adding a new backend, scaled to the smaller surface.
Adding a new chapter to the book
The book is mdBook with markdown source under book/src/. Adding a chapter:
- Create the file book/src/<NN>-<slug>.md. The numbered prefix sets the sort order.
- Add the chapter to book/src/SUMMARY.md, the TOC. Use the existing parts (Concepts, Getting Started, Cluster Lifecycle, …) or add a new part if it doesn’t fit.
- Run make book-serve to live-preview at http://localhost:3000 with auto-reload.
- Cross-link from related chapters at the bottom (the “Cross-references” section every chapter ends with).
- Push. .github/workflows/book.yml re-deploys to gh-pages on every merge to main.
The book follows a consistent style:
- Lower-case prose, sentence-case section headers.
- Code blocks for any command, inline code for filenames and identifiers.
- Short paragraphs, one idea each.
- Examples should be runnable as written.
- PRD references use the full GitHub URL (https://github.com/jgruberf5/roksbnkctl/blob/main/docs/prd/03-EXECUTION-BACKENDS.md) to avoid the published-book 404 issue surfaced in Sprint 1.
The PRD process
The project uses numbered Product Requirements Documents under docs/prd/ for larger feature work — anything that touches multiple files, spans more than one sprint, or has open design questions that need to be settled before code lands.
When a feature warrants a PRD vs. a direct PR:
| Use a PR | Use a PRD |
|---|---|
| Single-file change | Multi-file change across internal/{exec,cli,config,…} |
| Bug fix | New subsystem (a new backend, a new test suite) |
| Doc fix | New surface that needs a stable contract (a JSON schema, a workspace-config field) |
| Refactor with no behaviour change | A change that needs threat-model thinking (creds, network, multi-tenancy) |
| Drive-by polish | Anything cross-cutting >50 LOC |
The PRD lifecycle:
- Draft: open as a markdown file under docs/prd/NN-<TITLE>.md. The structure should follow the existing PRDs (00-OVERVIEW, 01-SSH, 02-KUBECTL, 03-BACKENDS, 04-CREDS, 05-E2E): goal, approach, file-by-file plan, test plan, acceptance criteria, open questions.
- Review: open a PR adding the PRD. Discuss in the PR. Open questions get resolved by edit or by punting to a follow-up issue.
- Implement: the PRD becomes the implementation plan. Per-sprint tasks land in docs/PLAN.md referencing the PRD by number.
- Land: code PRs reference the PRD; the PRD itself is the spec, code is the implementation. When the implementation diverges from the PRD, the PRD gets updated to match, never the other way around (the binary’s behaviour is the source of truth).
The PLAN.md per-sprint planning rhythm interleaves code + tests + docs per sprint. Each sprint’s prompts (under prompts/sprint<N>/) translate the PLAN into concrete agent tasks.
The four-agent sprint dispatch
Larger sprints (Sprints 3-6) are dispatched as four parallel agents:
- Architect — designs the surface, drafts the book chapters that explain it, files architect-side issues.
- Staff engineer — writes the production Go and shell code, modifies the bundled HCL when needed.
- Tech-writer — reviews the architect’s chapters for accuracy, fluency, and cross-link integrity. Files tech-writer-side issues.
- Validator — writes / extends the e2e test scripts and CI workflows, files validator-side issues.
The dispatch lives at prompts/sprint<N>/{architect,staff,tech-writer,validator}.md — one prompt per agent. Each agent runs independently against the same repo snapshot. An integrator at the end folds the four agents’ outputs together, resolves the issues each filed against the others, and commits the aggregate.
When to dispatch four agents vs. just open a PR:
| Direct PR | Four-agent sprint |
|---|---|
| Single feature, single sprint, <10 files | Multi-feature sprint with code + docs + tests scope |
| Bug fix | New PRD landing |
| Drive-by improvement | Sprint-gate milestone work |
| You’re the only contributor | Coordinating with reviewers who’d otherwise serialise |
prompts/README.md documents the agent-coordination pattern. The sprint dispatch is the project’s way of running review-and-implementation in parallel rather than serial — it works when the surfaces are well-separated (code vs docs vs tests don’t conflict on file ownership) and the integrator has enough context to merge the four lanes.
Worked example: adding a new execution backend
End-to-end Part IX scenario: you want to add a podman backend (a rootless, daemonless container runtime offered as an alternative to docker) so that users on Fedora/RHEL hosts, which ship podman by default, don’t have to install Docker just to use the --backend docker workflow. Same surface, different runtime. The walkthrough below follows the eight-step recipe above with concrete file paths and a diff-shaped sketch of each change.

    # 1. Implement the interface: new backend file
    cat > internal/exec/podman.go <<'GO'
    package exec

    import (
        "context"
        "os/exec"
    )

    type podmanBackend struct{}

    // Run shells out to `podman run`, reusing the docker backend's argument
    // builder and redactor wrapper.
    func (p *podmanBackend) Run(ctx context.Context, argv []string, opts RunOpts) (int, error) {
        args := append([]string{"run", "--rm"}, dockerStyleArgs(opts)...)
        // opts.Image is assumed to hold the resolved tool image (see step 2).
        args = append(args, opts.Image)
        args = append(args, argv...)
        cmd := exec.CommandContext(ctx, "podman", args...)
        return runWithRedactor(cmd, opts)
    }

    func init() {
        Register("podman", &podmanBackend{})
    }
    GO

    # 2. Add the tool image mapping (podman uses the same OCI images as docker)
    # Edit internal/exec/podman.go: add a toolImages map analogous to docker.go,
    # or reuse the docker.go map directly (both files live in package exec, so
    # no export is needed). The images are compatible; you'd typically share.

    # 3. Doctor check: internal/cli/doctor_backend.go
    # Add a `checkPodmanBackend()` function that runs `podman info` once with a
    # 2s timeout. Green if exit 0, yellow if podman not found, red if podman is
    # present but `podman info` fails (podman has no daemon to reach).

    # 4. Wire credentials: re-use the docker backend's cred-propagation logic
    # (the `-e VAR` pattern works identically for podman). Pass opts.Credentials
    # by env-var reference, never by argv. See internal/exec/docker.go::
    # dockerStyleArgs for the pattern to copy.

    # 5. Add cred-audit test
    cat > internal/exec/podman_audit_test.go <<'GO'
    package exec_test

    import "testing"

    func TestCredAudit_Podman(t *testing.T) {
        // Run a no-op command via the podman backend with a known API key,
        // then inspect `podman inspect`'s output for the key value. Assert
        // the value never appears in the container's labels, env, or args.
        t.Skip("sketch: replace with the assertions described above")
    }
    GO

    # 6. E2E phase: extend scripts/e2e-test-backends.sh
    # Add Phase P (or extend Phase K) with a parallel sequence to K2-K6 but
    # using --backend podman. Cross-link to PRD 05.

    # 7. Documentation: chapters 17 + 18
    # - Chapter 17: add a "Podman backend" section parallel to "Docker backend",
    #   noting it's rootless-by-default and a drop-in alternative.
    # - Chapter 18: add a row to the per-tool matrix; add a decision-tree entry
    #   ("I'm on a podman-only host"); update the at-a-glance table.

    # 8. Run the full test suite
    go build ./...
    go vet ./...
    go test ./...
    DRY_RUN=1 ./scripts/e2e-test-backends.sh

The PR should land all eight steps in one commit-set. A reviewer will look for: registered init(), doctor check, cred-audit test, e2e phase, and the two chapter additions. Without the audit + docs, the PR isn’t complete, as the note that closes the Adding a new execution backend section says.
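The credential and audit steps of the walkthrough fit together, and a small sketch may help show why. Both docker and podman accept `-e NAME` without a value and copy that variable from the client process's environment, so the secret can travel in the child's environment while argv carries only the name. The helper names `credArgs` and `leaks`, the image reference, and the use of `IBMCLOUD_API_KEY` as the example variable are illustrative assumptions; the command is built but never executed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sort"
	"strings"
)

// credArgs turns credentials into (a) "-e NAME" flags that name each
// variable without its value and (b) NAME=value entries for the child
// process environment, which is where the runtime reads the value.
func credArgs(creds map[string]string) (args []string, env []string) {
	names := make([]string, 0, len(creds))
	for name := range creds {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		args = append(args, "-e", name)
		env = append(env, name+"="+creds[name])
	}
	return args, env
}

// leaks reports whether secret appears in any element of surface; a
// cred-audit test could apply it to argv or to inspect output lines.
func leaks(surface []string, secret string) bool {
	for _, s := range surface {
		if strings.Contains(s, secret) {
			return true
		}
	}
	return false
}

func main() {
	creds := map[string]string{"IBMCLOUD_API_KEY": "s3cr3t-value"}
	flags, env := credArgs(creds)

	args := append([]string{"run", "--rm"}, flags...)
	// Placeholder image reference, not one of the project's images.
	args = append(args, "registry.example/tools:dev", "ibmcloud", "version")

	cmd := exec.Command("podman", args...) // constructed only, never run
	cmd.Env = append(os.Environ(), env...)

	fmt.Println(cmd.Args)
	fmt.Println("leaked via argv:", leaks(cmd.Args, creds["IBMCLOUD_API_KEY"]))
}
```

Note that the value still ends up in the container's environment, which is why the audit test in step 5 also inspects the container rather than argv alone.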
The same pattern applies to a new test suite, a new tool on an existing backend, or a new chapter — the eight-step recipe is the long version; the worked example is the copy-paste short version. Pick the shape that matches your contribution.
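For the doctor check described in step 3 of the walkthrough, a minimal version of the green/yellow/red decision could look like the following. The `status` type, the `checkBackend` name, and the returned detail strings are hypothetical; only the `podman info` probe, the 2s timeout, and the three outcomes come from the walkthrough.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// status is a hypothetical three-level doctor result.
type status string

const (
	green  status = "green"
	yellow status = "yellow"
	red    status = "red"
)

// checkBackend probes a container runtime binary: yellow when the binary
// is absent (skip, not fail), red when "<bin> info" fails or times out,
// green otherwise.
func checkBackend(bin string) (status, string) {
	path, err := exec.LookPath(bin)
	if err != nil {
		return yellow, bin + " not found on PATH; skipping"
	}
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	if err := exec.CommandContext(ctx, path, "info").Run(); err != nil {
		return red, fmt.Sprintf("%s info failed: %v", bin, err)
	}
	return green, path
}

func main() {
	s, detail := checkBackend("podman")
	fmt.Printf("%s: %s\n", s, detail)
}
```

Returning yellow for a missing binary keeps the check green-by-default on machines that never intended to use the backend.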
Cross-references
- Chapter 17 (Execution backends): the four-backend matrix you’re extending.
- Chapter 19 (The in-cluster ops pod): the k8s-backend pattern your new tool might join.
- Chapters 20-22: the three existing test suites your new suite would join.
- Chapter 23 (The E2E test plan): where your new phase belongs.
- Chapter 31 (Building from source): the build-side counterpart to the hacking side.
- PRD 00 (Overview): the PRD index.
- docs/PLAN.md: the per-sprint planning rhythm.
- prompts/README.md: the four-agent dispatch pattern.