Execution backends: local, docker, k8s, ssh

roksbnkctl runs a handful of external tools as part of its job: ibmcloud, terraform, iperf3, and eventually dig-equivalents and others. The simplest option is to run each tool as a child process on your laptop. That's fine for some tools and wrong for others: iperf3 from your laptop measures your laptop's internet uplink, not the cluster's bandwidth. Other tools benefit from isolation rather than a different vantage: terraform via docker (the --backend docker mode covered below) lets you pin a frozen tool version for CI reproducibility without installing it on the host.

The execution-backend system lets you pick where each tool runs without changing the surface command. The same roksbnkctl ibmcloud ks cluster ls invocation can run as a local process, inside a vendored container, inside the cluster, or on a remote SSH host — selected by a flag or a per-tool default in your workspace config.

This chapter is the user-facing reference for all four backends. After the introduction, each backend gets its own deep-dive section covering the mechanics, the credential-propagation rules, the failure modes, and a short “when to use it” callout. Chapter 18 is the decision-tree companion that picks one for a given (tool, scenario) pair.

Architecture at a glance

The four backends sit between roksbnkctl (the binary on your laptop) and the external tools each backend runs. Every backend presents the same command to the user (the same roksbnkctl ibmcloud ks cluster ls invocation) but routes the actual tool execution to a different network vantage.

graph LR
    User[laptop<br/>roksbnkctl binary]
    subgraph local[local backend]
        L_tf[terraform]
        L_ibm[ibmcloud]
        L_iperf[iperf3]
        L_dns[dns probe]
    end
    subgraph docker[docker backend]
        D_ibm[ibmcloud<br/>frozen image]
        D_tf[terraform<br/>frozen image]
    end
    subgraph k8s[k8s backend]
        K_ops[ops pod<br/>ibmcloud]
        K_job[one-shot Job<br/>iperf3 / dns]
    end
    subgraph ssh[ssh:target backend]
        S_ibm[ibmcloud<br/>on jumphost]
        S_iperf[iperf3<br/>on jumphost]
    end
    Cluster[ROKS cluster<br/>cert-manager + flo + BNK]
    Jump[SSH jumphost<br/>auto-discovered from terraform]
    IBMAPI[IBM Cloud API<br/>+ IAM]

    User --> local
    User --> docker
    User --> k8s
    User --> ssh
    local --> IBMAPI
    docker --> IBMAPI
    k8s --> Cluster
    Cluster --> IBMAPI
    ssh --> Jump
    Jump --> IBMAPI
    Jump -.->|optional<br/>private path| Cluster

    classDef bk fill:#f4f4f4,stroke:#666,color:#000;
    class local,docker,k8s,ssh bk;

Chapter 18 is the decision-tree companion that picks a backend for a given scenario; the rest of this chapter is the per-backend mechanics.

The four backends at a glance

Backend | What it does
--- | ---
local | os/exec: spawns the tool as a child process, inheriting your env and PATH
docker | docker run against a vendored image (ghcr.io/jgruberf5/roksbnkctl-tools-<tool>:<v>, or the upstream hashicorp/terraform image for terraform); frozen toolchain version
k8s | Runs inside the cluster, either in a long-lived ops pod or as a one-shot Job; auth via the pod's ServiceAccount token
ssh | Runs on a registered SSH target via the built-in SSH client; opt-in apt-bootstrap of missing tools on Ubuntu

Each backend solves a different problem:

  • local: fastest startup, simplest mental model, requires the host tool to exist on PATH.
  • docker: reproducible across dev machines, no host install needed, frozen at a known-good tool version.
  • k8s: network-correct (private IPs reachable, cluster-internal services accessible), zero host install, in-cluster identity via ServiceAccount.
  • ssh: pre-cluster ops from a known-IP bastion, customer-firewall workflows, and locked-down networks where the laptop can't reach IBM Cloud APIs but the jumphost can.

All four implementations conform to the same Go interface (internal/exec.Backend) so callers don’t branch on backend type — they just call backend.Run(ctx, argv, opts) and let the implementation handle the mechanics. That uniformity is what lets the same roksbnkctl ibmcloud ks cluster ls work across all four with no surface-level change.

The --backend CLI flag

Override the per-tool default for a single invocation:

# Local (the implicit default for ibmcloud + terraform)
roksbnkctl ibmcloud ks cluster ls

# Same command, in a vendored docker image
roksbnkctl ibmcloud --backend docker ks cluster ls

# Same command, in the cluster (requires `roksbnkctl ops install` first)
roksbnkctl ibmcloud --backend k8s ks cluster ls

# Same command, on a remote SSH host
roksbnkctl ibmcloud --backend ssh:jumphost ks cluster ls

Format:

--backend local|docker|k8s|ssh:<target>

The ssh:<target> form pins the SSH backend to a specific named target from roksbnkctl targets list (registered via roksbnkctl targets add; see Chapter 16).

The flag is persistent at the root — it works for any command that runs an external tool. Commands that don’t run external tools (like roksbnkctl ws list) ignore it.

The flag wins over the workspace-config default. If config.yaml says iperf3: { backend: k8s } and you pass --backend local, the local backend runs.
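The flag-over-config precedence, plus the ssh:<target> syntax, can be sketched as follows. ParseBackend, Resolve and Spec are hypothetical names, and the sketch assumes a bare ssh without a target is rejected, since the documented form is ssh:<target>.

```go
package main

import (
	"fmt"
	"strings"
)

// Spec is a parsed --backend value. Target is set only for ssh:<target>.
type Spec struct {
	Kind   string // "local", "docker", "k8s" or "ssh"
	Target string
}

// ParseBackend accepts local|docker|k8s|ssh:<target>.
func ParseBackend(s string) (Spec, error) {
	switch s {
	case "local", "docker", "k8s":
		return Spec{Kind: s}, nil
	}
	if target, ok := strings.CutPrefix(s, "ssh:"); ok && target != "" {
		return Spec{Kind: "ssh", Target: target}, nil
	}
	return Spec{}, fmt.Errorf("invalid --backend %q: want local|docker|k8s|ssh:<target>", s)
}

// Resolve applies the precedence: --backend flag, then the workspace exec:
// block, then the tool's built-in default. Empty strings mean "not set".
func Resolve(flag, configured, builtin string) (Spec, error) {
	for _, s := range []string{flag, configured} {
		if s != "" {
			return ParseBackend(s)
		}
	}
	return ParseBackend(builtin)
}
```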

Backend-failure semantics

Each backend has a different failure surface. The convention is:

  • Backend startup failure (Docker daemon unreachable, k8s API unreachable, SSH connect refused, binary not on PATH for local) ⇒ exit code 127, with a message naming the cause. No silent fallback to local. Silent fallback hides intent and produces confusing test results.
  • Backend mid-run failure (the container started but couldn’t pull a sub-resource; the pod was OOMKilled before the wrapped tool ran; the SSH session died after apt-get install but before the tool exec) ⇒ exit code 126, distinct from 127 so CI can tell “we never got going” from “we got going then broke”.
  • Tool exit code (the actual ibmcloud / terraform / iperf3 exit code, anything in 0-125 or 128-255) ⇒ propagated 1:1, including non-zero codes.
  • Context cancellation / timeout ⇒ exit code 137 (128 + 9, the conventional exit status of a process killed by SIGKILL).

This way, your CI script can tell “the tool said X failed” (typical exit codes) from “we never reached the tool” (127) from “we reached the tool, then the backend died mid-flight” (126) from “we ran out of time” (137).
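A CI wrapper might classify the exit status like this. Outcome is a hypothetical helper. Because tool exit codes pass through unchanged, a tool that itself exits 126, 127 or 137 is indistinguishable from the matching backend code, so treat those values as a cue to read the backend's message on stderr rather than as proof.

```go
package main

// Outcome names the failure classes described above for a roksbnkctl exit code.
func Outcome(code int) string {
	switch code {
	case 0:
		return "success"
	case 127:
		return "backend-startup-failure" // never reached the tool
	case 126:
		return "backend-mid-run-failure" // reached the tool, then the backend broke
	case 137:
		return "timeout-or-cancelled" // 128 + SIGKILL
	default:
		return "tool-failure" // the wrapped tool's own non-zero exit code
	}
}
```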

Per-tool defaults from exec:

Workspace config carries the per-tool default backend in the exec: block:

# ~/.roksbnkctl/<workspace>/config.yaml
exec:
  ibmcloud:  { backend: local }
  iperf3:    { backend: k8s }
  terraform: { backend: local }

The defaults shipped today:

Tool | Default backend | Supported backends | Why
--- | --- | --- | ---
terraform | local | local, docker | The terraform-exec local path is the established workflow. State handling is simplest here. The docker backend runs frozen hashicorp/terraform:1.5.7 with a bind-mounted state dir; see § terraform via docker. k8s and ssh are deferred to v1.x.
ibmcloud | local | local, docker, k8s, ssh:<target> | Most users have it on PATH or are happy installing it. Compliance/firewall scenarios opt in via --backend ssh:jumphost or docker.
iperf3 | k8s | local, k8s, ssh:<target> | Throughput from a laptop's uplink isn't the cluster's bandwidth. The k8s default runs the iperf3 client adjacent to (or inside) the cluster so the number reflects cluster fabric, not your office Wi-Fi.
dns | local | local, k8s, ssh:<target> | Single-vantage by default; --gslb-compare fans out across configured vantages for GSLB validation. See Chapter 21.

Chapter 12 — Workspace config covers the exec: block schema in detail; this chapter just notes its place in the backend system.

Chapter 18 — Choosing a backend per tool is the decision tree for “which backend should I pick for this tool in this scenario”.

Per-backend deep dives

local backend

The default for ibmcloud, terraform, and dns. The backend calls os/exec's exec.CommandContext(ctx, argv[0], argv[1:]...), and the child inherits the parent process's environment, PATH, and working directory. Mechanically the simplest of the four: no container, no cluster, no network handshake.

os/exec shape

internal/exec/local.go resolves argv[0] via exec.LookPath, then builds a *exec.Cmd:

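bin, err := exec.LookPath(argv[0])
if err != nil {
    if !filepath.IsAbs(argv[0]) {
        return 127, err // not on PATH and not absolute: never reached the tool
    }
    bin = argv[0] // absolute path that LookPath rejected: let cmd.Start report why
}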
cmd := exec.CommandContext(ctx, bin, argv[1:]...)
cmd.Env    = effectiveEnv     // os.Environ() + opts.Env + Credentials.EnvVars()
cmd.Dir    = opts.WorkDir     // empty → inherit caller's CWD
cmd.Stdin  = opts.Stdin
cmd.Stdout = redactor(opts.Stdout, creds)
cmd.Stderr = redactor(opts.Stderr, creds)

The redactor wrap is defense-in-depth (see Chapter 14 §“The redactor”). If a wrapped tool ever prints the IBMCLOUD_API_KEY value to stdout (a debug trace, an error message), the redactor replaces it with [REDACTED] before the bytes leave the binary.

Env propagation

Three sources, in order:

  1. The host process’s environment (os.Environ()) — your shell’s PATH, HOME, KUBECONFIG, etc.
  2. RunOpts.Env — caller-supplied KEY=VALUE strings (e.g., IBMCLOUD_REGION=ca-tor from the workspace config).
  3. Credentials.EnvVars(): IBMCLOUD_API_KEY=… plus the legacy IC_API_KEY=… alias older ibmcloud versions accept.

os/exec documents that for duplicate keys the last entry wins. So caller-supplied vars override host env, and credential vars override caller-supplied — meaning a workspace’s API key always wins over a stale IBMCLOUD_API_KEY in your shell.

The local backend does not scrub the host env. If you have an unrelated AWS_ACCESS_KEY_ID in your shell, the wrapped tool sees it. That’s by design — local is the “trust the user’s shell” path; if you want a hermetic env, switch to docker.

Working directory

RunOpts.WorkDir becomes cmd.Dir. Empty → inherit the caller’s CWD (Cobra’s RootCmd.Run runs from wherever the user invoked roksbnkctl).

When RunOpts.Files is non-empty and WorkDir is empty, the local backend creates a tempdir under os.TempDir(), writes each Files entry as a 0600 file inside, and uses the tempdir as WorkDir. The tempdir is removed via defer after Run returns. This is mostly there for symmetry with the docker / k8s / ssh backends; today’s ibmcloud passthrough never uses it.

Signal handling

exec.CommandContext wires ctx cancellation to the child: when the ctx ticks past its deadline (or the user hits Ctrl-C and the root cobra command cancels), Go sends SIGKILL (the default Cmd.Cancel) to the child. The child has no opportunity to clean up; this is intentional — we’d rather kill a stuck terraform than wait on an indefinite hang.

The kill is process-only, not process-group. If terraform has spawned grandchildren (the IBM provider’s helpers, an SSH key generator, etc.) those grandchildren may outlive the ctx-cancel by a few seconds. We haven’t seen this matter in practice; if it does, a pgid kill is a small follow-up.

Exit-code mapping

Outcome | Exit code | Source
--- | --- | ---
Child exits 0 | 0 | child
Child exits non-zero (e.g., terraform plan -detailed-exitcode saw drift) | child's exit code, 1-125 or 128-255 | child
argv[0] not on PATH and not an absolute path | 127 | local backend (POSIX shell convention)
Child binary couldn't be exec'd despite being present (e.g., not executable) | 126 | local backend (mid-run failure: we found the binary but couldn't spawn it)
Ctx cancelled mid-run, child SIGKILL'd | 137 | 128 + SIGKILL

Note the 126 vs 127 split: 127 means “we never reached the tool” (binary missing, daemon unreachable, SSH refused); 126 means “we reached the tool but the backend itself broke after that point” (couldn’t fork, container created but crashed, pod scheduled but evicted before exec). Sprint 3 collapsed both to 127 in the local + docker implementations; this sprint splits them per PRD 03 §“Backend interface”. CI scripts that distinguish “test infra broken” from “real test failure” can now key on the difference.

When to use it

  • You have the tool installed and on PATH already.
  • You want the fastest startup — no container daemon, no SSH handshake, no cluster API call.
  • You’re running terraform against the workspace’s local state (the established workflow).
  • You’re debugging and want the simplest mental model for “where did that output come from”.

Chapter 18 §“Decision tree” expands these into a per-(tool, scenario) walkthrough.

docker backend

Runs the tool inside a vendored container image, talking to the local docker daemon over its socket. docker on PATH is not required — roksbnkctl uses the official Docker Go SDK (github.com/moby/moby/client) and dials the socket directly.

roksbnkctl ibmcloud --backend docker ks cluster ls

Container shape

Mechanically (the ibmcloud passthrough; iperf3 client is similar with a different image and ports):

# -v only when Credentials.KubeconfigBytes is set; -e NAME with no =value copies the
# value from the caller's environment (IC_API_KEY is the legacy alias)
docker run --rm \
  -v <tempdir>/kubeconfig:/root/.kube/config:ro \
  -e IBMCLOUD_API_KEY \
  -e IC_API_KEY \
  ghcr.io/jgruberf5/roksbnkctl-tools-ibmcloud:<v> \
  ks cluster ls

internal/exec/docker.go doesn't shell out to docker run; it builds a container.Config + container.HostConfig and calls cli.ContainerCreate → cli.ContainerStart → cli.ContainerLogs (stream=true). The bash-style block above is the conceptual equivalent.

There’s no workspace-wide bind-mount. Per-invocation mounts come from three sources only:

  1. Credentials.KubeconfigBytes — written to <tempdir>/kubeconfig (mode 0600) on the host, bind-mounted as a single file at /root/.kube/config read-only. Single-file mount per PRD 04 §“Anti-patterns” — bind-mounting ~/.kube/ exposes other clusters’ configs.
  2. RunOpts.Files — each name → bytes entry written to <tempdir>/<basename> and bind-mounted at /work/<basename>. The container’s WorkingDir is set to /work so callers can reference files by relative path. (ibmcloud passthrough doesn’t use this; it lands when the iperf3 client backend wants to ship iperf3.json to the pod, or when a future tool wants a config file.)
  3. RunOpts.WorkDir — overrides WorkingDir if explicitly set.

The tempdir is removed via defer after Run returns, regardless of exit code or panic.

Credential propagation specifics

Three things matter, all enforced by internal/exec/creds.go::Credentials.DockerArgs(...):

  1. --env IBMCLOUD_API_KEY (bare name, no =value). A bare name tells docker to take the value from the calling process's environment instead of from the argument itself, so the literal API key never appears on a command line, in ps output, or in CI logs. PRD 04 §“Anti-patterns” calls out the --env IBMCLOUD_API_KEY=$KEY form as a leak vector; we don't use it. The value does still reach the daemon as part of the container's configuration, so anyone with access to the daemon can read it with docker inspect for as long as the (auto-removed) container exists. See Chapter 14 (Credentials.DockerArgs()) for the full call shape.
  2. Single-file kubeconfig mount, read-only. Not the parent dir. The container can read exactly the kubeconfig you handed it — nothing else under ~/.kube/.
  3. Stdout/stderr through the redactor. Same defense-in-depth as the local backend: if the wrapped tool prints the API key value (rare but possible), the redactor masks it before the bytes leave roksbnkctl’s process.
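Expressed as docker-CLI flags, the first two rules reduce to something like the following. dockerCredArgs is hypothetical and targets the CLI form shown earlier; the Go SDK path would fill container.Config and container.HostConfig fields instead.

```go
package main

// dockerCredArgs builds docker-CLI flags for the credential rules above.
// Env vars are passed by bare name only, so no secret value lands in argv.
func dockerCredArgs(envNames []string, kubeconfigPath string) []string {
	var args []string
	for _, name := range envNames {
		args = append(args, "--env", name) // value comes from the caller's env
	}
	if kubeconfigPath != "" {
		// One file, read-only; never the whole ~/.kube directory.
		args = append(args, "--volume", kubeconfigPath+":/root/.kube/config:ro")
	}
	return args
}
```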

:dev tag resolution

The vendored images live at:

Tool | Image
--- | ---
ibmcloud | ghcr.io/jgruberf5/roksbnkctl-tools-ibmcloud:<tag> (vendored from icr.io/ibm-cloud/ibmcloud-cli upstream)
iperf3 | ghcr.io/jgruberf5/roksbnkctl-tools-iperf3:<tag> (Alpine + iperf3)
terraform | hashicorp/terraform:<v> (official upstream)

The <tag> for the vendored per-tool images (ibmcloud, iperf3) is resolved at runtime by internal/exec/docker.go::toolImageTag(). It reads the binary’s internal/version.Version (set via ldflags at build time): a release-built binary like v0.10.0 pulls :v0.10.0; a dev build (Version == "dev") pulls :dev. Sprint 4 landed this version-pinning in place of Sprint 3’s hard-coded :dev so a go install of a tagged release pulls a matching tagged image rather than a :dev that may not exist for the published binary. The terraform row is the exception — it points at the upstream hashicorp/terraform image and stays pinned to a specific version (currently 1.5.7) regardless of roksbnkctl’s own version.

The :dev tag is still the local-development idiom: cd tools/docker && make build-all builds and tags every tools image as :dev locally; a dev-build roksbnkctl finds them via the local docker cache without a ghcr.io round-trip.

If you're cutting a custom tools image and want roksbnkctl to pick it up, the simplest path is to run docker tag your-image ghcr.io/jgruberf5/roksbnkctl-tools-ibmcloud:dev locally; a dev-build roksbnkctl uses the locally cached image before trying to pull.
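The tag rule reduces to a small lookup, sketched here. toolImage and terraformVersion are hypothetical names; version stands for the ldflags-injected internal/version.Version string ("dev" for development builds).

```go
package main

import "fmt"

// terraformVersion is the literal upstream pin; it ignores roksbnkctl's version.
const terraformVersion = "1.5.7"

// toolImage returns the image reference for a tool given the roksbnkctl
// build version, e.g. "v0.10.0" for a release or "dev" for a dev build.
func toolImage(tool, version string) (string, error) {
	switch tool {
	case "terraform":
		return "hashicorp/terraform:" + terraformVersion, nil
	case "ibmcloud", "iperf3":
		return fmt.Sprintf("ghcr.io/jgruberf5/roksbnkctl-tools-%s:%s", tool, version), nil
	default:
		return "", fmt.Errorf("no image for tool %q", tool)
	}
}
```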

Auto-remove and ctx-cancel-kill

Two cleanup mechanisms work together:

  • AutoRemove: true in HostConfig. The docker daemon removes the container as soon as it exits, regardless of exit code. No docker ps -a clutter, no manual docker rm ever required.
  • Ctx-cancel triggers ContainerKill. When ctx.Done() fires, the docker backend issues cli.ContainerKill(…, id, "SIGKILL") on a fresh, short-lived context (the original ctx is already cancelled, so a request made with it would fail before reaching the daemon) and waits a few seconds for the daemon to confirm. AutoRemove then takes care of removal. Net effect: hitting Ctrl-C during a stuck ibmcloud login doesn't leave a zombie container behind.

Combined with the daemon’s own watchdog on the container, the worst case is a few seconds of “container is dying” between Ctrl-C and the container disappearing. We haven’t seen leaked containers in dev or CI.

Image build pipeline

Image versions are tagged in lock-step with roksbnkctl releases; the GitHub Actions workflow that builds + pushes them runs on every release tag. See Chapter 31 — Building from source for the build pipeline details.

terraform via docker

terraform is the second tool routed through the docker backend (alongside ibmcloud). The shape is similar to ibmcloud (docker run against a vendored image, single-file mounts for sensitive data, no creds in argv) but with two terraform-specific concerns: state persistence across runs, and host-user UID alignment so state files written inside the container stay readable on the host.

State persistence via bind-mount

Terraform’s local state file lives at terraform.tfstate in the working directory. For the docker backend the working directory has to be a host-side path bind-mounted into the container, not a container-internal path that disappears on --rm. The docker backend bind-mounts the workspace’s state directory into the container:

docker run --rm \
  -v ~/.roksbnkctl/<workspace>/state:/state \
  --workdir /state/tf-source/embedded-terraform \
  --user $(id -u):$(id -g) \
  hashicorp/terraform:1.5.7 \
  apply -auto-approve

Concretely:

  • Host source: ~/.roksbnkctl/<workspace>/state/ — the same directory the local terraform backend writes state to today, so switching between --backend local and --backend docker against the same workspace doesn’t fork state.
  • Container target: /state — the bind-mount root inside the container.
  • Container working directory: /state/tf-source/<source>/ (e.g., /state/tf-source/embedded-terraform/ for the default embedded source) — the same path the local backend resolves to, so terraform sees the same main.tf either way.
  • The HCL is bind-mounted alongside state. The embedded HCL is materialised at run time into ~/.roksbnkctl/<workspace>/state/tf-source/<source>/ (Chapter 31 covers the embedded-source layout); since state/ is the bind-mount root, both terraform.tfstate and the HCL tree land inside the container together. There's no separate HCL projection.

The bind-mount is read-write — terraform needs to write terraform.tfstate, rotate terraform.tfstate.backup, and populate the .terraform/ cache. Combined with --rm, the file lifecycle is: container creates state, container exits, --rm removes the container, state files persist on the host. Subsequent runs (re-mounted at the same host path) pick up where the prior run left off.

Image: hashicorp/terraform:1.5.7

The image is the official upstream hashicorp/terraform published by HashiCorp on Docker Hub, pinned to a literal version in internal/exec/docker.go’s toolImages map (currently 1.5.7). The pin is intentional — the embedded HCL has been validated against this terraform version, and the docker backend’s whole point is reproducibility. Bumping the pin is a deliberate change to the binary and lands as a release.

The vendored per-tool images (ibmcloud, iperf3) get their tag from the roksbnkctl binary’s own version (see § :dev tag resolution above). Terraform is the exception — the binary’s version doesn’t follow upstream terraform’s release cadence, so the pin stays literal.

The UID/GID alignment gotcha

Linux Docker containers run as root by default. With a root-owned container writing into a bind-mount, the resulting host files end up owned by root — and any subsequent local-backend terraform apply (or even a cat ~/.roksbnkctl/<ws>/state/terraform.tfstate) hits permission errors. The docker backend works around this by passing --user $(id -u):$(id -g) explicitly:

# --user is the host caller's uid:gid (1000:1000 in this example)
docker run --rm \
  --user 1000:1000 \
  -v ~/.roksbnkctl/dev-tor/state:/state \
  --workdir /state/tf-source/embedded-terraform \
  hashicorp/terraform:1.5.7 \
  apply -auto-approve

The container process runs as the host user, so files written into the bind-mount are owned by the host user — same as a local-backend terraform apply would have produced. Switching backends mid-debug doesn’t strand state files behind a permission wall.

The UID/GID values are read from the host process at run time (Go’s os.Getuid() / os.Getgid()). On macOS this is mostly cosmetic — Docker Desktop’s VM normalises ownership on the host bind-mount automatically — but it’s required for clean Linux behaviour, so the backend always passes the flag.

Supported commands

The terraform docker backend honours --backend docker for the four lifecycle commands:

roksbnkctl up    --backend docker  [--var-file <path>] [--auto]
roksbnkctl plan  --backend docker  [--var-file <path>]
roksbnkctl apply --backend docker  [--var-file <path>] [--auto]
roksbnkctl down  --backend docker  [--var-file <path>] [--auto]

Flags that the local terraform backend honours (--var-file, --auto, plus the -w/--workspace selector) plumb through to the docker backend identically — the backend’s job is to spawn hashicorp/terraform:1.5.7 with the right argv; it doesn’t filter or rewrite the lifecycle commands’ flags. (--auto is roksbnkctl’s shorthand for terraform’s -auto-approve; the wrapper renames it for terseness and consistency across up/apply/down.)

roksbnkctl up --backend docker is the apply-with-auto-approve shorthand the existing local lifecycle uses; --backend docker switches the spawn target without changing the command shape.

Deferred: k8s and ssh terraform backends

--backend k8s and --backend ssh:<target> for terraform are not in v1.0. The blocker is state-handling: the local backend keeps state on the host filesystem, the docker backend bind-mounts the same path, but k8s (run terraform in a one-shot Job) and ssh:<target> (run terraform on a remote host) need a story for shipping state between the run vantage and the canonical workspace state dir. Designs under consideration include a versioned ConfigMap/Secret pair for k8s and an scp-pre-and-post atomic move for ssh; both are deferred to v1.x once the trade-offs have settled (see docs/PLAN.md §“What’s deliberately deferred to post-v1.0”).

PRD 03 §“State concerns” is the design spec; trying --backend k8s against terraform errors at parse time:

$ roksbnkctl up --backend k8s
error: terraform doesn't support backend `k8s` at v1.0 (state-handling design
       open; tracked in PRD 03 § State concerns); supported: local, docker

When to use it

  • You’re on a clean dev machine without ibmcloud installed and don’t want to install it.
  • You need a frozen tool version for CI reproducibility.
  • You’re debugging a “works on my machine” issue and want to factor out the host install variable.

When docker is the wrong call:

  • The tool needs network access that your laptop has but the container doesn't (rare; default bridge networking usually preserves the laptop's egress).
  • You're running iperf3 and want network locality: a container on your laptop measures the same uplink that local does. Use k8s instead.
  • You're running a DNS probe and want a different network vantage: a container shares the host's network identity, so it adds nothing over local. The DNS subcommand rejects --backend docker by design.
  • You’re on Windows. Linux/macOS docker daemons are in scope; Windows Docker Desktop coverage is deferred to a future round.

k8s backend

Runs the wrapped tool inside the cluster. Two distinct execution patterns share the same Backend.Run interface:

Pattern | Used for | Lives in | Lifetime
--- | --- | --- | ---
Long-lived ops pod | ad-hoc ibmcloud commands, future interactive shells | roksbnkctl-ops namespace | manually managed via roksbnkctl ops install/uninstall
One-shot Job | iperf3 client runs, future terraform runs, future DNS probes | roksbnkctl-test namespace | per-invocation; auto-deleted after ttlSecondsAfterFinished: 60

The split mirrors the two latency budgets. Long-lived pods amortise the pod-startup cost across many invocations — perfect for ibmcloud iam oauth-tokens which you might run twenty times in a debugging session. One-shot Jobs are clean (no leftover state, no concurrency questions) — perfect for iperf3 -c <server> which runs once, emits its JSON, and exits.

Long-lived ops pod pattern

The pod is named roksbnkctl-ops in the roksbnkctl-ops namespace. roksbnkctl ops install deploys it (see Chapter 19 for the full lifecycle). The image bundles ibmcloud CLI plus kubectl as backup; future iterations may add oc, terraform, etc. The container inside the pod is named tools.

Backend.Run(ctx, argv, opts) for the ops-pod path is essentially:

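// corev1 = k8s.io/api/core/v1, scheme = k8s.io/client-go/kubernetes/scheme,
// remotecommand = k8s.io/client-go/tools/remotecommand
req := clientset.CoreV1().RESTClient().Post().
    Resource("pods").Namespace("roksbnkctl-ops").Name("roksbnkctl-ops").
    SubResource("exec").
    VersionedParams(&corev1.PodExecOptions{
        Container: "tools",
        Command:   argv,
        Stdin:     opts.Stdin != nil,
        Stdout:    true,
        Stderr:    true,
        TTY:       opts.TTY,
    }, scheme.ParameterCodec)
// Named executor, not exec, so it doesn't shadow the os/exec package.
executor, err := remotecommand.NewSPDYExecutor(restConfig, "POST", req.URL())
if err != nil {
    return 127, err // couldn't set up the exec stream: backend startup failure
}
err = executor.StreamWithContext(ctx, remotecommand.StreamOptions{
    Stdin:  opts.Stdin,
    Stdout: redactor(opts.Stdout, creds),
    Stderr: redactor(opts.Stderr, creds),
    Tty:    opts.TTY,
})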

The exit code comes back via the SPDY channel's metav1.Status; the executor surfaces a non-zero code as an exec.CodeExitError (package k8s.io/client-go/util/exec). We propagate that as the backend's exit code, same as local propagates exec.ExitError.ExitCode().

opts.WorkDir is ignored for the ops pod path. The pod's WorkingDir is fixed at container-spec time (/work); per-exec working-dir changes would require recreating the pod. Callers that need a specific cwd should wrap the command in a shell, e.g. sh -c 'cd <dir> && <tool> …', and pass that as argv.
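The shell wrap is safest with every argument quoted. withWorkDir and shellQuote below are hypothetical helpers that assume the ops pod image ships a POSIX /bin/sh; exec replaces the shell so the tool's own exit code comes back unchanged.

```go
package main

import "strings"

// shellQuote single-quotes s for POSIX sh; an embedded ' becomes '"'"'.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'"'"'`) + "'"
}

// withWorkDir rewrites argv into: sh -c 'cd <dir> && exec <argv...>'.
func withWorkDir(dir string, argv []string) []string {
	quoted := make([]string, len(argv))
	for i, a := range argv {
		quoted[i] = shellQuote(a)
	}
	script := "cd " + shellQuote(dir) + " && exec " + strings.Join(quoted, " ")
	return []string{"sh", "-c", script}
}
```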

One-shot Job pattern

For each invocation, the backend builds a batchv1.Job spec, applies it, streams logs from the Job’s pod, waits for completion, reads the exit code from the pod’s container status, and lets ttlSecondsAfterFinished clean up.

Skeleton:

apiVersion: batch/v1
kind: Job
metadata:
  generateName: roksbnkctl-iperf3-client-     # randomized; multiple runs don't collide
  namespace: roksbnkctl-test
spec:
  ttlSecondsAfterFinished: 60                  # auto-delete the Job + its Pod 60s after completion
  backoffLimit: 0                              # no retries; the test reports failure once and stops
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: iperf3-client
        image: ghcr.io/jgruberf5/roksbnkctl-tools-iperf3:<v>
        command: ["iperf3", "-c", "<server-svc>", "-J"]
        envFrom:
        - secretRef:
            name: roksbnkctl-job-creds-<random>   # projected per invocation
        volumeMounts:
        - name: files
          mountPath: /work
      volumes:
      - name: files
        projected:
          sources:
          - secret:
              name: roksbnkctl-job-files-<random>  # one Secret per invocation, holds RunOpts.Files

Three details to call out:

  1. Projected Secret for cred propagation. Credentials.IBMCloudAPIKey (when set) is written to a one-shot Secret and exposed to the container via envFrom: secretRef. Per PRD 04 §“In-cluster pod” this beats argv (which would show in kubectl describe pod) and beats inline env: blocks (which surface in kubectl get pod -o yaml). The Secret shares the Job's lifecycle: it carries an ownerReference to the Job, so when ttlSecondsAfterFinished deletes the Job, the Kubernetes garbage collector deletes the Secret too.
  2. Log streaming via client-go. Once the Job’s pod is in Running state, clientset.CoreV1().Pods(ns).GetLogs(name, &corev1.PodLogOptions{Follow: true}).Stream(ctx) returns an io.ReadCloser that we copy through the redactor into opts.Stdout. The stream stays open until the pod terminates or ctx cancels.
  3. Exit-code extraction. When the pod transitions to Succeeded or Failed, we read pod.Status.ContainerStatuses[0].State.Terminated.ExitCode and return that as the backend's exit code. A Failed pod with ExitCode: 0 (rare; the pod failed for a reason other than the tool's own exit, such as an eviction) maps to backend exit code 126: a backend mid-run failure rather than a tool failure.
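The exit-code rules in point 3 fit in one function. podStatus and terminated are hypothetical, trimmed-down mirrors of the client-go corev1 fields (real code would read a *corev1.Pod); treating a finished pod with no Terminated state as 126 is an assumption beyond the text.

```go
package main

// terminated mirrors ContainerStatuses[0].State.Terminated (hypothetical).
type terminated struct{ ExitCode int32 }

// podStatus mirrors the two pod fields this mapping needs (hypothetical).
type podStatus struct {
	Phase      string      // "Pending", "Running", "Succeeded", "Failed", ...
	Terminated *terminated // nil if the container never reported termination
}

// jobExitCode maps a finished Job pod onto the backend exit-code contract.
// ok is false while the pod hasn't finished yet.
func jobExitCode(s podStatus) (code int, ok bool) {
	switch s.Phase {
	case "Succeeded", "Failed":
	default:
		return 0, false
	}
	if s.Terminated == nil {
		return 126, true // finished without a container exit status
	}
	if s.Phase == "Failed" && s.Terminated.ExitCode == 0 {
		return 126, true // pod failed but the tool didn't: backend mid-run failure
	}
	return int(s.Terminated.ExitCode), true
}
```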

The roksbnkctl-test namespace is a fresh namespace dedicated to one-shot test workloads. It’s separate from roksbnkctl-ops (the long-lived pod’s home) so RBAC can be scoped tighter — see Chapter 19 §“RBAC”.

iperf3 server side

Worth calling out because it’s the asymmetric piece. The iperf3 test deploys a server bare Pod + Service into roksbnkctl-test, then runs the client as the one-shot Job described above:

Side | Resource | Lifetime
--- | --- | ---
Server | roksbnkctl-iperf3 bare Pod + Service (LoadBalancer for --mode north-south; ClusterIP for --mode east-west) | torn down after the client Job completes
Client | one-shot Job | ttlSecondsAfterFinished: 60

The bare-Pod (rather than Deployment) shape is intentional — the iperf3 server is single-shot, scoped to one test, torn down on completion; the controller-managed replica machinery a Deployment provides is unused and would only confuse the cleanup story. Service type is driven by --mode: north-south measures laptop-to-cluster bandwidth and needs a publicly reachable endpoint (LoadBalancer); east-west measures node-to-pod and stays in-cluster (ClusterIP). See Chapter 22 — Throughput testing for the user-facing flag surface.

The client Job’s argv is iperf3 -c <server-cluster-ip-or-lb> -J. The -J JSON flows back via log streaming, parsed in internal/test/throughput.go, surfaced as roksbnkctl test throughput JSON output.

The server pod’s securityContext is set to satisfy OpenShift’s restricted-v2 SCC: runAsNonRoot: true, allowPrivilegeEscalation: false, seccompProfile.type: RuntimeDefault, capabilities.drop: [ALL]. iperf3 listens on port 5201 (unprivileged) so no root is needed. The Sprint 3 cluster baseline tripped the SCC by missing one or more of these fields; the manifest the k8s backend emits this sprint sets all four.

When to use it

  • You’re running iperf3 and want a number that reflects cluster fabric, not your office Wi-Fi.
  • You’re running ibmcloud from a network that can reach the cluster but not *.cloud.ibm.com directly. The ops pod has both lines of sight; your laptop has only one.
  • You want a cluster-side ad-hoc shell for debugging — roksbnkctl exec --backend k8s -- bash (when implemented) drops into the ops pod.

When k8s is the wrong call:

  • The cluster doesn’t exist yet (roksbnkctl ops install requires a working kubeconfig). Use local or ssh for pre-cluster ops.
  • You haven’t run roksbnkctl ops install. Run it first; it’s a one-time setup per cluster.
  • You’re running terraform--backend k8s for terraform is deferred to a future release pending a state-handling design (see PRD 03 §“State concerns”).

Chapter 19 is the full reference for the cluster-side mechanics: namespace, ServiceAccount, ClusterRole, Secret, lifecycle.

ssh backend

Runs the wrapped tool on a registered SSH target. Builds on Sprint 1’s internal/remote.Client (the same SSH client backing the --on flag); this section assumes you’ve read Chapter 16 for the target-config and host-key TOFU framing.

roksbnkctl ibmcloud --backend ssh:jumphost ks cluster ls
roksbnkctl ibmcloud --backend ssh:bastion --bootstrap iam oauth-tokens

Per-tool apt-bootstrap and the --bootstrap flag

Before exec’ing the wrapped tool, the SSH backend probes whether it’s installed:

ssh <target> 'command -v <tool>'

Exit 0 → tool present, proceed. Non-zero → tool missing. What happens next depends on --bootstrap:

  • Without --bootstrap (the default). The backend errors with exit 127 and a clear message:

    error: tool `iperf3` not found on ssh target jumphost; re-run with --bootstrap to install via apt-get,
           or pre-install on the target manually
    

    No sudo apt-get ever runs. The backend won’t surprise the user with package-manager invocations or sudo password prompts on a remote they didn’t expect to be modified.

  • With --bootstrap. The backend runs the per-tool bootstrap recipe. For Ubuntu (the only OS supported this round), the recipe is roughly:

    # ibmcloud needs IBM's apt repo + GPG key first
    curl -fsSL https://download.clis.cloud.ibm.com/Linux/Ubuntu/repo.gpg | sudo apt-key add -
    echo 'deb https://download.clis.cloud.ibm.com/Linux/Ubuntu jammy main' \
      | sudo tee /etc/apt/sources.list.d/ibmcloud.list
    sudo -n apt-get update -y
    sudo -n apt-get install -y ibmcloud-cli
    

    iperf3 is simpler — no repo addition, just sudo -n apt-get install -y iperf3.
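The probe-and-decide step can be sketched as a small pure function. probeAction, decide, and missingToolError are hypothetical names. The 255 case is an addition of this sketch, not something the section specifies: OpenSSH's ssh exits 255 when ssh itself fails, and such a failure shouldn't be reported as a missing tool.

```go
package main

import "fmt"

// probeAction is a hypothetical name for what the SSH backend does after
// running `command -v <tool>` on the target.
type probeAction int

const (
	runTool      probeAction = iota // probe exited 0: the tool is on PATH
	runBootstrap                    // tool missing and --bootstrap was passed
	failMissing                     // tool missing, no --bootstrap: exit 127
	sshFailed                       // ssh itself failed (OpenSSH exits 255)
)

// exitToolMissing is the exit code the section documents for a missing tool.
const exitToolMissing = 127

// decide maps the probe's exit status and the --bootstrap flag to an action.
func decide(probeExit int, bootstrap bool) probeAction {
	switch {
	case probeExit == 0:
		return runTool
	case probeExit == 255:
		return sshFailed
	case bootstrap:
		return runBootstrap
	default:
		return failMissing
	}
}

func missingToolError(tool, target string) error {
	return fmt.Errorf("tool `%s` not found on ssh target %s; re-run with --bootstrap "+
		"to install via apt-get, or pre-install on the target manually", tool, target)
}

func main() {
	// `command -v iperf3` exits non-zero when iperf3 is absent.
	if decide(1, false) == failMissing {
		fmt.Println("error:", missingToolError("iperf3", "jumphost"))
		fmt.Println("exit status:", exitToolMissing)
	}
}
```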

The opt-in default reflects PRD 03 open question §“--bootstrap opt-in for SSH”: silent sudo apt-get on a remote host is the kind of surprise that erodes operator trust, especially when the remote is shared between teams. The user has to say “yes, install for me” explicitly.

Bootstrap failure modes split between the two backend-failure exit codes per §“Backend-failure semantics”: 127 when we never got going (couldn’t reach the repo, no apt mapping, tool missing without --bootstrap); 126 when we got partway in and then something broke (sudo / OS-detect / install).

Each failure, its exit code, and what you see:

  • --bootstrap not set and tool missing → exit 127: “tool <name> not found on ssh target <target>; re-run with --bootstrap to install via apt-get, or pre-install on the target manually”
  • sudo requires a password (NOPASSWD not configured) → exit 126: sudo: a password is required, then “the SSH user needs passwordless sudo for apt-get install. Configure <user> ALL=(ALL) NOPASSWD: /usr/bin/apt-get in /etc/sudoers, or pre-install <pkg> manually.”
  • Non-Ubuntu OS (lsb_release -is doesn’t return Ubuntu) → exit 126: “auto-install only supports Ubuntu. Pre-install <pkg> on the target (RHEL: yum install <pkg>).”
  • Network unreachable from target (apt-get can’t reach the repo) → exit 127: “target can’t reach the package repo. Check the target’s egress policy or pre-install <pkg> manually.”
  • No apt mapping for the requested tool → exit 127: “no bootstrap recipe known for tool <name>; the SSH backend only auto-installs ibmcloud + iperf3 today”

File materialisation

RunOpts.Files entries are written to a per-invocation tempdir on the remote. The tempdir is /tmp/roksbnkctl.<random>/ where <random> is a fresh 16-byte hex string per Run:

# pseudo-flow
ssh <target> 'mkdir -m 0700 /tmp/roksbnkctl.<rand>'
scp <local-temp>/<basename> <target>:/tmp/roksbnkctl.<rand>/<basename>
ssh <target> '
  trap "rm -rf /tmp/roksbnkctl.<rand>" EXIT
  cd /tmp/roksbnkctl.<rand>
  <argv...>
'

The trap … EXIT is a shell builtin; it fires on normal exit, on set -e failure, on SIGINT (Ctrl-C), and on SIGTERM. So even if the user kills their roksbnkctl invocation mid-run, the remote tempdir is cleaned up by the wrapper script’s own trap before the SSH session terminates.

The 0700 mode on the tempdir ensures only the SSH user (and root) can read its contents during the brief on-disk window. On shared bastions (multi-user jumphosts) this matters: /tmp itself is world-writable and shared, so files go into a fresh per-invocation directory that the SSH user owns, never directly into /tmp, /var/tmp, or some other shared scratch path.
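The tempdir naming and the first remote step of the pseudo-flow can be sketched as follows. The sketch assumes “16-byte hex string” means 16 bytes from crypto/rand, hex-encoded to 32 characters; remoteTempDir and mkdirCommand are hypothetical helper names.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// remoteTempDir returns /tmp/roksbnkctl.<random>, assuming "<random>" is
// 16 bytes from a cryptographic RNG, hex-encoded to 32 characters.
func remoteTempDir() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", fmt.Errorf("generate tempdir suffix: %w", err)
	}
	return "/tmp/roksbnkctl." + hex.EncodeToString(b), nil
}

// mkdirCommand is the first remote step. Without -p, mkdir fails if the path
// already exists, so a directory planted in advance by another user on a
// shared bastion is never silently reused.
func mkdirCommand(dir string) string {
	return "mkdir -m 0700 " + dir
}

func main() {
	dir, err := remoteTempDir()
	if err != nil {
		panic(err)
	}
	fmt.Println(mkdirCommand(dir))
}
```

Leaving out -p is the deliberate choice here: with a random name, a pre-existing directory at that path means something unexpected happened, and failing is the safer response.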

Kubeconfig follows the same pattern: Credentials.KubeconfigBytes becomes <tempdir>/kubeconfig, the wrapper exports KUBECONFIG=<tempdir>/kubeconfig, the trap removes the file on exit. PRD 04 §“Kubeconfig options for SSH backend” calls this “Option A” — scp-and-cleanup. We picked it over the in-memory <() process-substitution alternative because it’s robust across remote shells and sshd configs.

Env propagation: SetEnv vs wrapper script

OpenSSH supports two ways to pass an env var to a remote command:

  1. ssh -o SetEnv=KEY=VALUE target …: the client tells the server “please add this to the env”. Works only if the server’s sshd_config has a matching AcceptEnv pattern for KEY. Most stock sshd configs don’t enable AcceptEnv for arbitrary keys.
  2. Wrapper script with export KEY=VALUE: the script exports the variable into its own environment before running "$@". Works regardless of sshd config, but the value lives briefly in a 0700 file on the remote.

The SSH backend tries SetEnv first. On the first connect to a new target, it sends a sentinel env var (ROKSBNKCTL_SETENV_TEST=ok) and runs echo "$ROKSBNKCTL_SETENV_TEST". If the output is ok, SetEnv works on this target — the result is cached in workspace state, and subsequent runs use SetEnv directly.

If the sentinel doesn’t surface, sshd silently dropped it (it logs refused setenv request on the server side, but clients don’t see that). The backend falls back to a wrapper script:

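#!/bin/bash
# /tmp/roksbnkctl.<rand>/wrap.sh, mode 0700, owner-readable only
# bash rather than /bin/sh: dash (Ubuntu's /bin/sh) rejects set +o history
trap 'rm -f "$0"' EXIT
set +o history
export IBMCLOUD_API_KEY='<value>'
# run the tool as a child, not via exec: exec would replace this shell
# and the EXIT trap above would never fire
"$@"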

Then: ssh <target> /tmp/roksbnkctl.<rand>/wrap.sh ibmcloud iam oauth-tokens.
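One detail the wrapper has to get right is quoting. The value is pasted between single quotes, so a ' inside a value would end the quoted string early. The sketch below renders a wrapper for arbitrary variables using POSIX single-quote escaping. renderWrapper and shellQuote are hypothetical names, and keys are assumed to be valid shell identifiers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// shellQuote wraps s in single quotes for a POSIX shell. Inside single quotes
// every character is literal except ' itself, so each ' becomes '\''
// (close the quote, emit an escaped quote, reopen).
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// renderWrapper is a hypothetical helper. Values may contain quotes, $,
// backticks and spaces; none of them are interpreted by the remote shell.
func renderWrapper(env map[string]string) string {
	var b strings.Builder
	// bash, because dash (Ubuntu's /bin/sh) rejects `set +o history`.
	b.WriteString("#!/bin/bash\n")
	b.WriteString(`trap 'rm -f "$0"' EXIT` + "\n")
	b.WriteString("set +o history\n")
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output
	for _, k := range keys {
		fmt.Fprintf(&b, "export %s=%s\n", k, shellQuote(env[k]))
	}
	// Run the tool as a child rather than exec'ing it, so the shell survives
	// and the EXIT trap fires afterwards.
	b.WriteString(`"$@"` + "\n")
	return b.String()
}

func main() {
	fmt.Print(renderWrapper(map[string]string{"IBMCLOUD_API_KEY": "k'ey"}))
}
```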

The wrapper-script path is the Sprint 1 validator Issue 4 carry-over — the same shape --on uses for env passing today. Risks (file content includes the secret) are mitigated by:

  • Mode 0700 so only the SSH user can read.
  • set +o history so the value doesn’t leak into shell history.
  • trap 'rm -f "$0"' EXIT deletes the wrapper as soon as it exits — including on Ctrl-C, since the trap covers SIGINT/SIGTERM by virtue of being in the script’s main process.
  • The key is never in argv, so ps -ef on the remote doesn’t show it.

roksbnkctl targets show <name> reports which mechanism the target uses (e.g., env propagation: SetEnv (AcceptEnv ok) or env propagation: wrapper script (sshd refused SetEnv)) so users can choose to enable AcceptEnv server-side if they prefer.

Bootstrap failure modes (consolidated)

Symptom, cause, and remediation for each:

  • “sudo: a password is required”. Cause: NOPASSWD sudo not configured. Remediation: add <ssh-user> ALL=(ALL) NOPASSWD: /usr/bin/apt-get to /etc/sudoers.d/roksbnkctl on the target.
  • “auto-install only supports Ubuntu”. Cause: /etc/os-release ID is not ubuntu. Remediation: pre-install the tool manually; RHEL: sudo yum install <pkg>; Alpine: sudo apk add <pkg>.
  • “target can't reach the package repo”. Cause: the target’s egress policy blocks download.clis.cloud.ibm.com (or upstream Ubuntu mirrors). Remediation: pre-install or open egress; doctor’s --backend ssh:<target> flags this.
  • “tool not found on ssh target …; re-run with --bootstrap”. Cause: --bootstrap not passed and tool missing. Remediation: re-run with --bootstrap, or pre-install on the target.

When to use it

  • You’re running ibmcloud from a customer-firewalled office where the corporate jumphost can reach IBM Cloud APIs but your laptop can’t.
  • You’re working in an air-gapped environment where roksbnkctl runs on your laptop but the IBM Cloud API conversations have to happen from a specific bastion’s IP.
  • You want a low-overhead remote-exec path that doesn’t require a cluster (the k8s backend’s prereq).

When ssh is the wrong call:

  • The target lacks the tool and you don’t want to mutate it. Skip --bootstrap; the backend errors clearly without installing anything.
  • The target isn’t Ubuntu and you don’t want to pre-install. Bootstrap won’t work; pre-install or use local/docker/k8s.
  • You’re running iperf3 to measure cluster bandwidth. SSH puts the client somewhere on the network path to the cluster but not necessarily adjacent to it — k8s is the right answer for that case.

Chapter 16 covers the lighter-weight --on jumphost predecessor that uses the same targets: config block. The SSH backend is the heavier-duty form: file materialisation, env propagation hardening, opt-in bootstrap. Chapter 18 is the decision tree.

The Backend interface

For the curious, the Go interface every backend conforms to:

package exec

import (
    "context"
    "io"
)

// Backend is the contract every execution backend implements.
type Backend interface {
    // Run executes argv and returns the tool's exit code.
    Run(ctx context.Context, argv []string, opts RunOpts) (int, error)
    Name() string
}

type RunOpts struct {
    Stdin          io.Reader
    Stdout, Stderr io.Writer
    Env            []string          // KEY=VALUE pairs
    WorkDir        string            // best-effort; some backends ignore (k8s)
    TTY            bool              // request PTY where supported
    Files          map[string][]byte // files materialized at exec time
    Credentials    *Credentials      // routed via PRD 04's per-backend mechanism
}

type Credentials struct {
    KubeconfigBytes []byte
    IBMCloudAPIKey  string
}

All four implementations satisfy this interface. Call sites in cli/cluster.go, cli/test.go, etc., get a Backend from the registry and call Run(...) — no branching on backend type. The uniformity is what makes the system extensible without rewriting callers each time a backend lands.

The Credentials struct is the bridge between the resolver chain (env → keychain → config-b64 → prompt) covered in Chapter 14 and the per-backend propagation rules in PRD 04. Each backend translates the struct into the mechanism appropriate to where it runs: env vars for local, --env KEY (no =value) for docker, secretKeyRef for k8s, SetEnv or wrapper script for ssh.

Cross-references