The in-cluster ops pod
The k8s execution backend has two execution patterns: a long-lived ops pod for ad-hoc commands, and one-shot Jobs for throughput tests, DNS probes, and other per-invocation workloads. Chapter 17 §“K8s backend” covered the interface mechanics — how Backend.Run dispatches into either pattern.
This chapter is the reference for the pod itself: what roksbnkctl ops install deploys, what RBAC it grants, where credentials live, how to rotate them, and how to debug when something goes wrong.
If you’ve never run roksbnkctl ops install, you can read this chapter front-to-back; otherwise the § Operability section near the end is the troubleshooting jump-off point.
What the ops pod is
A long-lived pod in the roksbnkctl-ops namespace, running an image bundled with the tools roksbnkctl may want to invoke cluster-side: ibmcloud CLI plus kubectl as a fallback, with oc and terraform reserved for future iterations.
The pod sits idle waiting for kubectl exec calls. Each roksbnkctl ibmcloud --backend k8s … invocation routes through client-go’s SPDY executor, runs the wrapped tool inside the existing pod, streams stdout/stderr back, and returns the exit code. No pod create/start latency between invocations — a session of twenty ibmcloud commands pays the startup cost once.
Compared to the one-shot Job pattern (used for iperf3 and the DNS probe), the ops pod trades a bit of resource-usage idle-state for substantially lower per-call latency. It’s the right shape when you want to debug interactively or run many small commands.
roksbnkctl ops install
Idempotent setup. Run once per cluster; re-run any time you want to refresh the image, rotate the API key Secret, or recover from a partial uninstall.
roksbnkctl ops install
What it does, step by step:
1. Create the namespace
apiVersion: v1
kind: Namespace
metadata:
name: roksbnkctl-ops
labels:
app.kubernetes.io/name: roksbnkctl
app.kubernetes.io/component: ops-pod
The roksbnkctl-ops namespace is dedicated to the long-lived pod. Separate from roksbnkctl-test (where one-shot Jobs run) so RBAC can be scoped per namespace — see § RBAC below.
2. Create the ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
name: roksbnkctl-ops
namespace: roksbnkctl-ops
The pod runs as this SA. Its projected token is auto-mounted at /var/run/secrets/kubernetes.io/serviceaccount/, which is what the bundled kubectl uses for in-cluster authentication. The IBM Cloud API key (a separate credential) reaches the pod through a Kubernetes Secret — see § Credential propagation below.
3. Create the ClusterRole + ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: roksbnkctl-ops
rules:
- apiGroups: [""]
resources: ["pods", "pods/exec", "pods/log"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: ["batch"]
resources: ["jobs"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get", "list"]
resourceNames: ["roksbnkctl-ibm-creds"]
- apiGroups: [""]
resources: ["services"]
verbs: ["get", "list", "create", "delete"]
- apiGroups: ["apps"]
resources: ["deployments"]
verbs: ["get", "list", "create", "delete"]
- apiGroups: [""]
resources: ["namespaces"]
verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: roksbnkctl-ops
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: roksbnkctl-ops
subjects:
- kind: ServiceAccount
name: roksbnkctl-ops
namespace: roksbnkctl-ops
The full manifest lives at internal/exec/k8s_install.yaml (embedded into the binary). § RBAC walks through what each rule is for.
4. Create or update the credential Secret
v1.2+ note. This step describes the static-key Secret applied under
--trusted-profile=offand under theauto-fallback when IAM perms don’t allow the trusted-profile path. Under--trusted-profile=autosuccess the Secret is still applied with empty data fields (placeholder soenvFrom: secretRefalways resolves); the cred propagation happens via the trusted-profile annotation on the SA + theIAM_PROFILE_IDenv var instead. See §“Trusted-profile flow (v1.2+)” below.
apiVersion: v1
kind: Secret
metadata:
name: roksbnkctl-ibm-creds
namespace: roksbnkctl-ops
annotations:
helm.sh/resource-policy: keep # don't sweep on accidental destroy
type: Opaque
stringData:
IBMCLOUD_API_KEY: <resolved-key-value>
The key value comes from the workspace’s resolver chain (env → keychain → config-b64 → prompt) — see Chapter 14 for the resolution order. The Secret carries two keys (IBMCLOUD_API_KEY and the legacy alias IC_API_KEY) both populated from the same resolved value, so older ibmcloud CLI versions that look for the IC_ name find it.
If the Secret already exists (re-running ops install after a key rotation), roksbnkctl does a client-side Get + Update: the Secret’s data is overwritten with the freshly resolved value, the roksbnkctl.io/rotated-at annotation is stamped with the current timestamp, and the rest of the Secret’s metadata is left untouched. roksbnkctl ops show surfaces last cred rotation: <timestamp> by reading that annotation.
5. Create the Pod
apiVersion: v1
kind: Pod
metadata:
name: roksbnkctl-ops
namespace: roksbnkctl-ops
labels:
app: roksbnkctl-ops
roksbnkctl.io/managed: "true"
spec:
serviceAccountName: roksbnkctl-ops
restartPolicy: Always
securityContext:
runAsNonRoot: true
seccompProfile:
type: RuntimeDefault
containers:
- name: tools
image: ${OPS_IMAGE} # resolved from roksbnkctl's version at install time
imagePullPolicy: IfNotPresent
command: ["sleep", "infinity"]
envFrom:
- secretRef:
name: roksbnkctl-ibm-creds
securityContext:
allowPrivilegeEscalation: false
runAsNonRoot: true
capabilities:
drop: ["ALL"]
resources:
requests: { cpu: 50m, memory: 64Mi }
limits: { cpu: 500m, memory: 256Mi }
Three details to call out:
command: ["sleep", "infinity"]— the pod’s own command. EachBackend.Runinvocation issues akubectl execagainst this idle process, which means the pod’s main process never exits as long as the pod is healthy.securityContextis set explicitly for OpenShift’srestricted-v2SCC. Pod-levelrunAsNonRoot+seccompProfile.type: RuntimeDefault; container-levelallowPrivilegeEscalation: false+capabilities.drop: [ALL]+runAsNonRoot— the same fields the iperf3 server pod sets, for the same reason.envFrom: secretRef— the API key reaches the pod’s env without ever touching the pod manifest’s argv orenv:block.kubectl describe pod roksbnkctl-opsshows the secret reference name but not the value, per PRD 04 §“In-cluster pod”.
6. Wait for readiness
roksbnkctl ops install waits for Pod.Status.Phase == Running and the container’s Ready condition before returning. Default timeout is 60 seconds; longer for clusters with slow image pulls (the ghcr.io image is ~80 MiB). Failures surface a kubectl describe pod roksbnkctl-ops excerpt for context.
Trusted-profile flow (v1.2+)
The static-key Secret described above is the v1.0.x / v1.1.x path. In v1.2.0 it becomes the fallback: the default ops install invocation auto-provisions an IBM Cloud IAM trusted profile linked to the ops pod’s ServiceAccount, and the static API key no longer needs to land in any Kubernetes Secret. PRD 04 §“Resolved in Sprint 9” → “Trusted-profile auto-provisioning (k8s backend)” is the design reference; this section is the operational walkthrough.
v1.3.0 closes both the provisioning and the runtime sides of this flow — ops install --trusted-profile=auto provisions the profile, and the in-pod ibmcloud login wrap detects the SA’s trusted-profile annotation and authenticates via the projected SA token at runtime. The v1.2.x partial-closure history (provisioning shipped, runtime deferred) is preserved in CHANGELOG v1.3.0 → ### Changed for readers who specifically want the chronology.
roksbnkctl ops install --trusted-profile=auto
--trusted-profile=auto is the default as of v1.2 — running roksbnkctl ops install with no flag picks the auto path. Naming the flag explicitly is useful in scripts that pin behaviour or in docs that want to read unambiguously:
$ roksbnkctl ops install --trusted-profile=auto
✓ Provisioned IAM trusted profile roksbnkctl-ops-canada-roks (iam-Profile-9f2…)
✓ created namespace roksbnkctl-ops
✓ created sa roksbnkctl-ops/roksbnkctl-ops
✓ created secret roksbnkctl-ops/roksbnkctl-ibm-creds
✓ created clusterrole roksbnkctl-ops
✓ created crb roksbnkctl-ops
✓ created pod roksbnkctl-ops/roksbnkctl-ops
→ Waiting for ops pod to be Ready (60s timeout)
✓ Ops pod is Ready (trusted profile roksbnkctl-ops-canada-roks)
Re-runs against an existing install emit updated <kind> … / <kind> … exists instead of created for each resource that already matches the desired state. The trusted-profile provisioning line above is the single line internal/cli/ops.go emits for the whole IBM IAM-side flow (perm probe + profile create + compute-resource link + SA annotation) — the work happens silently inside resolveTrustedProfileForInstall; the one line you see is the receipt.
What just happened, in order (the binary doesn’t narrate these steps but they’re what’s actually going on):
- IAM perm probe.
ops installcalls IBM IAM Identity to confirm the resolved API key hasiam-identityperms. On403, the flag value drives the next step:autofalls back to the static-key Secret with a warning (see §“--trusted-profile=autofalling back” below);onerrors out with a non-zero exit. - Profile creation. Names the profile
roksbnkctl-ops-<workspace>so multiple workspaces against the same IBM Cloud account don’t race for a single shared name. The compute-resource link binds the profile to your cluster’s OIDC issuer URL + theroksbnkctl-ops/roksbnkctl-opsServiceAccount specifically — other SAs on the same cluster can’t assume the profile. - Policy attachment. v1.2 ships with no default policies attached — the profile inherits whatever IAM policies your account has set up for trusted profiles in general (typically nothing, until you grant). A future cycle will surface
ibmcloud.trusted_profile.policiesas a workspace-config block; tracked under v1.x deferred. If you need the profile to actually authorise specific actions (Container Registry pulls, Cloud Object Storage reads), grant the policies via IBM Cloud Console oribmcloud iam trusted-profile-policy-createafterops installreturns. - SA annotation. The ServiceAccount gets
iam.cloud.ibm.com/trusted-profile: roksbnkctl-ops-<workspace>plus theroksbnkctl.io/trusted-profile-managed: "true"marker that signalsops uninstallto delete the profile during cleanup. - Pod creation. The pod’s container always has
envFrom: secretRef: roksbnkctl-ibm-creds; what changes between modes is the Secret’s contents. Under--trusted-profile=autosuccess the Secret is created with empty data —IBMCLOUD_API_KEYis the empty string — plus an extraIAM_PROFILE_IDenv var pointing at the provisioned profile’s ID, and a projected ServiceAccount-token volume mounted at/var/run/secrets/tokens/token(audienceiam) so the pod has a cluster-issued JWT the IBM IAM endpoint will accept. The in-podibmcloud loginwrap detects the SA’s trusted-profile annotation and runsibmcloud login -a https://cloud.ibm.com --cr-token @/var/run/secrets/tokens/token --profile "$IAM_PROFILE_ID" -r "${IBMCLOUD_REGION:-us-south}" --quiet— the--cr-token @<path>form reads the projected SA token from disk; IBM IAM validates that JWT against the trusted profile’sROKS_SAcompute-resource link (the linkinternal/ibm/trusted_profile.go::ensureLinkprovisions). The static API key never transits the pod env. Under--trusted-profile=off(or theauto-fallback) the Secret carries the resolved API key, no projected token volume is mounted, and the wrap runsibmcloud login --apikey "$IBMCLOUD_API_KEY"— the v1.0.x path.
Verifying the profile is in use
The ServiceAccount carries the truth-of-record annotation:
$ oc get serviceaccount roksbnkctl-ops -n roksbnkctl-ops -o yaml
# or, kubectl-equivalent via the bundled passthrough:
$ roksbnkctl k get sa roksbnkctl-ops -n roksbnkctl-ops -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
iam.cloud.ibm.com/trusted-profile: Profile-ccba11f2-3b1f-4b1a-b8a4-aeed2b7b3320 # ← the IBM IAM Profile ID
roksbnkctl.io/trusted-profile-managed: "true" # ← ops uninstall will delete it
roksbnkctl.io/provisioned-at: "2026-05-13T14:08:33Z"
name: roksbnkctl-ops
namespace: roksbnkctl-ops
End-to-end smoke test of the runtime cred flow:
$ roksbnkctl --backend k8s ibmcloud iam oauth-tokens
IAM token: Bearer eyJ…
The token is fresh-each-call: the in-pod wrap detects the SA’s iam.cloud.ibm.com/trusted-profile annotation, trades the pod’s projected SA token for an IAM token against the trusted profile, and returns it to the caller. No static API key transits the pod env.
The first invocation may take 30–60 seconds after ops install returns because IBM IAM needs to pick up the cluster’s OIDC issuer URL before it will accept the projected SA token as proof for the profile. The wrap absorbs this with a 3-attempt × 20s-backoff retry — up to ~40s of waiting inside the wrap before it surfaces the failure. On triple-fail the wrap prints trusted-profile login failed after 3 attempts: <captured-stderr> to your terminal (the captured stderr will include the underlying ibmcloud login diagnostic — typically the “Unable to authenticate” / FAILED banner shape). If your first smoke test produces that line, give IAM a few more seconds and re-run. After the first successful call the wrap’s auth state is cached for the pod’s lifetime, so subsequent roksbnkctl --backend k8s ibmcloud <subcommand> invocations don’t re-pay the propagation window.
--trusted-profile=auto falling back
auto falls back to the v1.0.x static-key Secret when any of three pre-conditions for trusted-profile provisioning aren’t met. The warning prints first (the fallback decision is made before any cluster-side resource is applied), then the rest of the install proceeds with the v1.0.x static-key shape:
$ roksbnkctl ops install
warning: IAM perm 'iam-identity' missing; using static-key Secret. Pass `--trusted-profile=off` to silence.
✓ created namespace roksbnkctl-ops
✓ created sa roksbnkctl-ops/roksbnkctl-ops
✓ created secret roksbnkctl-ops/roksbnkctl-ibm-creds
✓ created clusterrole roksbnkctl-ops
✓ created crb roksbnkctl-ops
✓ created pod roksbnkctl-ops/roksbnkctl-ops
→ Waiting for ops pod to be Ready (60s timeout)
✓ Ops pod is Ready (static-key Secret)
The three warning shapes (in source order — internal/cli/ops.go resolveTrustedProfileForInstall):
| Trigger | Warning |
|---|---|
Workspace has no registered cluster yet (cluster-outputs.json missing — run cluster up or cluster register first) | warning: trusted-profile mode 'auto' needs a registered cluster (<err>); falling back to static-key Secret. Pass –trusted-profile=off to silence. |
| Registered cluster lookup against the IBM Cloud API failed (network, key auth, cluster deleted out-of-band) | warning: trusted-profile mode 'auto' couldn't look up cluster (<err>); falling back to static-key Secret. Pass –trusted-profile=off to silence. |
API key lacks IAM iam-identity permission (the most common fallback) | warning: IAM perm 'iam-identity' missing; using static-key Secret. Pass –trusted-profile=off to silence. |
All three are non-fatal; the install completes and the pod works exactly as it did in v1.0.x. The warnings are terse on purpose — the actionable detail belongs in this chapter, not in every stderr line. Three ways to clear them permanently:
- Run
cluster uporcluster registerfirst if the warning names the missing cluster registration.ops installre-run after the registration completes will detect the cluster and switch to the trusted-profile path. - Ask your IAM admin to grant
iam-identityOperator role on the API key (or use a different key that already has it) if the warning names the missing IAM perm. Re-runops install— the install detects the changed perm posture on re-run and replaces the static-key Secret with a trusted-profile binding. - Opt out via
--trusted-profile=off(next subsection) if you don’t want the warning every install.
--trusted-profile=off
Explicit opt-out. Skips the IAM perm check entirely and provisions the v1.0.x static-key Secret:
$ roksbnkctl ops install --trusted-profile=off
✓ created namespace roksbnkctl-ops
✓ created sa roksbnkctl-ops/roksbnkctl-ops
✓ created secret roksbnkctl-ops/roksbnkctl-ibm-creds
✓ created clusterrole roksbnkctl-ops
✓ created crb roksbnkctl-ops
✓ created pod roksbnkctl-ops/roksbnkctl-ops
→ Waiting for ops pod to be Ready (60s timeout)
✓ Ops pod is Ready (static-key Secret)
Use cases:
- Reproducing v1.0.x behaviour exactly — for byte-for-byte parity tests against an older deployment, or for scripts whose assertions match the v1.0.x
ops showoutput verbatim. - Air-gapped clusters that can’t reach the IBM IAM API at runtime — without that connectivity, the pod can’t trade its projected SA token for an IAM token, so the trusted-profile path is non-functional regardless of perms.
- Cred rotation runbooks that already automate the static-key path and aren’t yet ready to switch to the projected-token model.
The third value, --trusted-profile=on, is the inverse — it forces the trusted-profile path and refuses to fall back on perm-missing, returning a non-zero exit with the same warning text. Use it in CI to surface IAM-perm regressions explicitly.
Cleanup on ops uninstall
roksbnkctl ops uninstall --confirm honors the roksbnkctl.io/trusted-profile-managed: "true" annotation on the SA and deletes the IBM Cloud trusted profile alongside the cluster-side objects:
$ roksbnkctl ops uninstall --confirm
✓ deleted trusted profile roksbnkctl-ops-canada-roks
✓ deleted pod roksbnkctl-ops
✓ deleted secret roksbnkctl-ibm-creds
✓ deleted serviceaccount roksbnkctl-ops
✓ deleted clusterrolebinding roksbnkctl-ops
✓ deleted clusterrole roksbnkctl-ops
✓ deleted namespace roksbnkctl-ops
✓ deleted namespace roksbnkctl-test
The trusted profile is deleted first, before the cluster-side objects, so even if the cluster API becomes unreachable mid-uninstall the IBM Cloud-side state isn’t left orphaned. The Secret is always deleted regardless of mode (it’s always rendered by ops install, just with empty data under --trusted-profile=auto success).
The trusted-profile delete is best-effort — if the calling user’s API key has lost iam-identity perms in the meantime (or the key itself has rotated and the new key doesn’t have those perms), the cluster-side objects still delete and a warning line is printed instructing the user to delete the profile manually via the IBM Cloud console. The annotation remains correct documentation of what was provisioned; roksbnkctl ops install on a fresh cluster will pick a fresh profile name unconditionally so an orphaned profile from a prior install doesn’t collide.
--trusted-profile=off installs leave no trusted profile to clean up — ops uninstall --confirm just deletes the cluster-side objects + the static-key Secret as it did in v1.0.x.
roksbnkctl ops show
Reports current state without making any changes:
$ roksbnkctl ops show
namespace: roksbnkctl-ops
pod: roksbnkctl-ops
phase: Running
ready: true
image: ghcr.io/jgruberf5/roksbnkctl-tools-ibmcloud:v0.9.0
rbac subject: system:serviceaccount:roksbnkctl-ops:roksbnkctl-ops
trusted-profile: Profile-ccba11f2-3b1f-4b1a-b8a4-aeed2b7b3320
secret: roksbnkctl-ibm-creds (rotated 2026-05-10T11:03:17Z)
What each line surfaces:
- Pod phase + readiness —
Running+trueis green; anything else means the pod is unhealthy andBackend.Runcalls will fail. The container count is exactly one (tools);ready: trueis a single bool, not a2/2-style ratio. - Image — the
:v…tag matches theroksbnkctlrelease the image was published with (resolved at install time from the binary’s version; see Chapter 17 §:devtag resolution). Mismatched against yourroksbnkctl --versionmeans re-runningops installwill pull the matching image. - RBAC subject — the SA the pod runs as.
kubectl describe clusterrole roksbnkctl-opsprints the full ruleset (the ClusterRoleBinding is named the same as the role). - Trusted-profile line — reads the SA’s
iam.cloud.ibm.com/trusted-profileannotation. The value is the IBM IAM Profile ID (Profile-<uuid>, the canonical IAM identifier —runOpsInstallannotates withtp.IDrather than the friendlyroksbnkctl-ops-<workspace>name so the value is grep-friendly against IBM Cloud IAM audit logs; the parenthetical form on the✓ Provisioned IAM trusted profile …install line cross-references both). Present + non-empty meansops installran with--trusted-profile=autoor=onand the provisioning succeeded; the runtime cred flow is going via the projected SA token. Under--trusted-profile=off(orauto-fallback when IAM perms were missing) the line readstrusted-profile: (none — static-key Secret path). - Secret line — the cred Secret’s name + the
roksbnkctl.io/rotated-atannotation thatops installstamps each time the Secret is applied. Always emitted (the Secret manifest is always rendered — empty data under trusted-profile success, populated under the static-key paths). If the Secret resource is missing entirely on the cluster, the line readssecret: (missing: …).
The current output is a fixed seven-line key/value block; a structured --output json mode is on the v1.x roadmap once ops show grows additional fields (image-id hash, env-hash reconciliation against the live pod, etc.). See docs/PLAN.md §“What’s deliberately deferred to post-v1.0”.
roksbnkctl ops uninstall
Full removal. Run when decommissioning the cluster, or when you want a clean re-install. The command is a destructive-action gate: by default it prints a preview of what would be deleted and exits successfully; the actual deletion only runs with --confirm.
$ roksbnkctl ops uninstall
Would delete (re-run with --confirm to proceed):
- Pod roksbnkctl-ops/roksbnkctl-ops
- Secret roksbnkctl-ops/roksbnkctl-ibm-creds
- ServiceAccount roksbnkctl-ops/roksbnkctl-ops
- ClusterRole/ClusterRoleBinding roksbnkctl-ops
- Namespace roksbnkctl-ops
- Namespace roksbnkctl-test
$ roksbnkctl ops uninstall --confirm
✓ deleted pod roksbnkctl-ops
✓ deleted secret roksbnkctl-ibm-creds
✓ deleted serviceaccount roksbnkctl-ops
✓ deleted clusterrolebinding roksbnkctl-ops
✓ deleted clusterrole roksbnkctl-ops
✓ deleted namespace roksbnkctl-ops
✓ deleted namespace roksbnkctl-test
Note that the cluster-scoped objects (ClusterRole, ClusterRoleBinding) get cleaned too — they’re not garbage-collected by namespace deletion since they live above the namespace. roksbnkctl ops uninstall --confirm makes this explicit so a stale roksbnkctl-ops ClusterRole can’t outlive a namespace removed via kubectl delete ns.
Both managed namespaces (roksbnkctl-ops and roksbnkctl-test) are deleted. The roksbnkctl-test namespace is where one-shot Job pods (iperf3 client, future probes) land — by the time you’re running uninstall you’ve already concluded those test workloads are finished, so removing the namespace alongside the ops-pod surface keeps the cluster clean.
When to run uninstall:
- Cluster decommission — the cluster is going away, clean up cluster-scoped objects before destroying it.
- Cred rotation when paranoid — the rotation story (next section) doesn’t require uninstall, but if you’re worried about old secrets persisting in etcd snapshots, an uninstall + re-install regenerates the Secret cleanly.
- Image upgrade with a major manifest change — if the embedded
k8s_install.yamlevolves (new RBAC rule, security-context tweak),uninstall+installis the cleanest way to apply.
RBAC: the ClusterRole rules
The full ClusterRole rule set (transcribed from internal/exec/k8s_install.yaml):
| API group | Resources | Verbs | Why |
|---|---|---|---|
batch | jobs | get, list, watch, create, delete | One-shot Job lifecycle (iperf3 client, future probes). The backend creates the Job, watches it, reads logs, deletes it. |
"" (core) | pods | get, list, watch | The backend lists/watches pods to find Job-spawned pods + to wait for the ops pod’s Ready state. No create/delete — pods are owned by their Jobs (or by ops install’s user-side privilege), not by the pod’s SA. |
"" (core) | pods/log | get, list | Log streaming from one-shot Job pods (the bytes the wrapped tool wrote to stdout/stderr). |
"" (core) | pods/exec | create, get | kubectl exec is a create against the pods/exec subresource — the SPDY-channel verb the long-lived ops-pod path uses. |
"" (core) | secrets (named roksbnkctl-ibm-creds) | get | The pod reads the cred Secret directly only if a future workflow opts to (kubelet’s projection of envFrom: secretRef runs as kubelet, not as this SA). The resourceNames filter keeps the SA from reading any other Secret in the namespace — least-privilege per PRD 04 §“In-cluster pod”. |
Notably not granted:
podscreate / delete — the SA can list and watch pods but can’t create or delete them. Pod lifecycle is mediated by Jobs (which the SA does manage) and by the ops pod itself (whichops installcreates with the user’s privilege, not the SA’s).secretscreate / update / delete / list — the pod never writes Secrets, and can’t even list to discover which Secrets exist in the namespace. The install-time Secret creation is done by the user invokingops install(whose kubeconfig has cluster-admin or comparable), not by the pod’s SA. Combined with theresourceNamesfilter onget, this is the tightest practical surface that still lets the pod consume its own cred.services,deployments,namespaces— the SA can’t touch these at all. The iperf3 server fixture (when the throughput test runs) is provisioned byroksbnkctl test throughputrunning on the caller’s side using the user’s kubeconfig, not by anything inside the ops pod.clusterroles,clusterrolebindings— the pod never modifies its own RBAC.*cluster-admin — explicitly avoided. The pod has exactly the verbs it needs and nothing else.
This matches PRD 04 §“Least privilege per backend” and PRD 03 §“K8s”: the ops pod is a powerful tool but its blast radius is bounded.
To audit the rules on a running cluster:
kubectl describe clusterrole roksbnkctl-ops
kubectl auth can-i --as=system:serviceaccount:roksbnkctl-ops:roksbnkctl-ops \
'*' '*' --all-namespaces # should print mostly "no"
Credential propagation
v1.2+ note. What follows is the static-key propagation path. As of v1.2 it’s the fallback rather than the default —
--trusted-profile=autoinstalls assume an IBM Cloud trusted profile via the pod’s projected SA token and the static API key never lands in a Kubernetes Secret. See §“Trusted-profile flow (v1.2+)” above for that path. The hop-by-hop description below still describes what happens under--trusted-profile=off(and under theauto-fallback when IAM perms don’t allow the trusted-profile path).
The IBMCLOUD_API_KEY reaches the wrapped tool in three hops:
resolver chain (env → keychain → config-b64 → prompt)
↓ on the laptop, at `roksbnkctl ops install` time
Kubernetes Secret roksbnkctl-ibm-creds in roksbnkctl-ops namespace
↓ applied by `ops install` via kubectl-equivalent
Pod env (IBMCLOUD_API_KEY=…) via `envFrom: secretRef`
↓ kubelet reads Secret, sets env on container start
Wrapped tool (`ibmcloud iam oauth-tokens`) reads from os.Getenv
Three properties this gives you:
- The key never appears in argv.
kubectl describe pod roksbnkctl-opsshowsenvFrom: secretRef: name: roksbnkctl-ibm-creds, not the value.kubectl get pod roksbnkctl-ops -o yamlshows the same. - The key never appears in the pod’s own logs. The wrapped tool uses the env var; the env var name (not value) is what the pod’s startup logs print.
- The redactor is the defense-in-depth backstop. If the wrapped tool ever prints the value (e.g.,
ibmcloud --debug), the SPDY stream from the pod is wrapped throughinternal/exec/redact.gobefore reaching the caller’s stdout — same as the local + docker backends.
The Secret carries two keys today — IBMCLOUD_API_KEY and the legacy IC_API_KEY alias older ibmcloud versions accept — both populated from the same resolved value. Names are stable; embedded in internal/exec/k8s_install.yaml. Future cluster-side credentials (an AWS access key, a GCP service-account JSON) will add new keys to the same Secret rather than spinning up new Secrets, simplifying RBAC.
Rotation: rotating the API key
v1.2+ note. Under
--trusted-profile=auto/=on(default), there’s nothing to rotate — the ops pod’s IAM tokens are short-lived and the IBM IAM endpoint refreshes them transparently each time the SDK trades the projected SA token. Key rotation only matters when the install ran with--trusted-profile=offor fell back to the static-key Secret because the resolved key lackediam-identityperms. The procedure below covers that static-key case.
When the IBM Cloud API key changes (key rotation, account takeover, key compromise), you need to update the cluster-side Secret. The flow:
# 1. Update the local resolver chain — pick whichever source you populated
# initially (the chain order is: env > keychain > config-b64; see chapter 14):
export IBMCLOUD_API_KEY=<new-key> # env (one-shot)
# or update the keychain entry directly: `keyring` / `secret-tool` / Keychain.app
# or edit ~/.roksbnkctl/<workspace>/config.yaml's api_key_b64 field
# 2. Re-run ops install — this re-resolves the key, updates the cluster
# Secret, and rolls the pod
roksbnkctl ops install
What ops install does on re-run:
- The Secret
roksbnkctl-ibm-credsis updated with the new value via a client-side Get + Update (the existing Secret’sdatais overwritten, theroksbnkctl.io/rotated-atannotation is refreshed, the rest of the metadata is left alone). - The pod’s env, however, is set at container-start time — kubelet reads the Secret value when the pod is created, not on every Secret update. So an updated Secret doesn’t propagate to the running pod’s env until the pod is recreated.
ops installtherefore deletes and recreates the bare ops pod after the Secret update. New pod → kubelet reads the updated Secret → env contains the new value. (Re-creation takes a few seconds for the image cache hit; up to ~30 seconds on a cold cluster.)
The ops pod is a bare Pod, not a Deployment or DaemonSet, so kubectl rollout restart won’t work on it (rollout restart only operates on controller resources). The canonical way to force a fresh pod is roksbnkctl ops install (idempotent — it’ll handle the delete-and-recreate). If you really want to do it by hand:
kubectl delete pod roksbnkctl-ops -n roksbnkctl-ops
# then re-run `roksbnkctl ops install` to create the replacement;
# the bare pod has no controller, so nothing else will recreate it.
roksbnkctl ops show will report phase: Pending briefly during recreation, then Running + ready: true once kubelet finishes projecting the updated Secret.
Operability
Things to know when something’s wrong.
Where pod logs go
roksbnkctl k logs -n roksbnkctl-ops roksbnkctl-ops
# or
kubectl logs -n roksbnkctl-ops roksbnkctl-ops
The pod’s main process is sleep infinity, so the log is mostly empty. Each kubectl exec invocation runs in its own ephemeral process — those processes’ stdout/stderr go back through the SPDY channel to the caller, not into the pod’s log. So kubectl logs is helpful for debugging pod startup (image pull failures, SCC denials, OOMKills) but not for “what did ibmcloud iam oauth-tokens actually print” — that’s just the caller’s stdout.
For a paper trail of recent invocations, capture roksbnkctl ibmcloud --backend k8s … 2>&1 | tee /tmp/ibmcloud.log on the calling side.
Debugging a stuck ops install
ops install waits up to 60 seconds for the pod to become Ready. If it times out:
roksbnkctl k describe -n roksbnkctl-ops pod/roksbnkctl-ops
roksbnkctl k get -n roksbnkctl-ops events --sort-by=.lastTimestamp | tail -20
Common causes:
| Symptom | Cause | Fix |
|---|---|---|
ImagePullBackOff | ghcr.io rate limit, or image tag doesn’t exist | check roksbnkctl --version, ensure ghcr.io is reachable from the cluster |
CreateContainerConfigError referencing the Secret | Secret was deleted between Secret apply and Pod create (race) | re-run roksbnkctl ops install (idempotent) |
RunContainerError with SCC denial | the cluster’s PodSecurity admission rejected the manifest | kubectl get events will name the missing field; usually means an OpenShift cluster expects the restricted-v2 profile and a manifest field is wrong — file an issue with the event message |
Pod stuck in Pending with no Events | cluster is at capacity / out of CPU | scale the cluster or trim resources; the pod requests 50m CPU + 128Mi mem, very small |
Cluster API outage during ops install
If the kube-apiserver becomes unreachable mid-install (transient cloud-provider issue, kubeconfig expired, network partition), ops install fails fast at whichever step hit the apiserver:
✓ applied namespace roksbnkctl-ops
applying secret roksbnkctl-ops/roksbnkctl-ibm-creds ... ERROR: Get "https://...": dial tcp: i/o timeout
The install is partial at that point — earlier steps succeeded, later steps didn’t. ops install is idempotent, so just re-run once the apiserver is back; the steps that already completed are no-ops the second time, the steps that didn’t will run.
If the apiserver is permanently gone (cluster destroyed): ops uninstall will fail the same way, since it also needs the apiserver. In that case the cluster-scoped objects (ClusterRole, ClusterRoleBinding) become orphans you can clean up manually if you ever rebuild the cluster, or ignore if you’re done with this cluster’s identity entirely.
Verifying the install end-to-end
A one-liner sanity check:
roksbnkctl ibmcloud --backend k8s iam oauth-tokens
If the SA/Secret/RBAC/Pod chain is healthy, this prints a fresh OAuth token. If it errors, the error message names which link in the chain broke (pod not found, Secret missing, exec denied, ibmcloud CLI exit non-zero).
Chapter 26 — Troubleshooting covers the broader “ops pod is unhappy” failure modes alongside other end-user troubleshooting.
Cross-references
- PRD 03 — pluggable execution backends, §“K8s” — the ops-pod design rationale.
- PRD 04 — credential propagation, §“In-cluster pod” — Secret-based propagation rules.
- Chapter 14 — Credentials and the resolver chain — where the
IBMCLOUD_API_KEYvalue comes from before it lands in the Secret. - Chapter 17 §“K8s backend” — the interface mechanics this chapter complements.
- Chapter 18 — Choosing a backend per tool — when
--backend k8sis the right call. internal/exec/k8s_install.yaml— the embedded RBAC manifests: https://github.com/jgruberf5/roksbnkctl/blob/main/internal/exec/k8s_install.yamlinternal/cli/ops.go— theroksbnkctl ops install/show/uninstallcommand implementation: https://github.com/jgruberf5/roksbnkctl/blob/main/internal/cli/ops.go