The cluster phase (cluster up/down)

A roksbnkctl workspace consists of two phases stacked on top of each other: a durable cluster phase (the ROKS cluster plus cluster-shared services, which take 30+ minutes to provision) and a short-lived trial phase (the BNK trial, which iterates on top in 5-10 minutes). The cluster phase is exposed as its own command pair, roksbnkctl cluster up / roksbnkctl cluster down, so the cluster survives across many BNK trial cycles.

As of v1.1.0, this two-phase shape is the default for every new workspace. A fresh roksbnkctl up provisions the cluster phase first, then the trial phase, against separate state directories. Tearing down only the trial — the common iteration case — uses roksbnkctl bnk down and leaves the cluster intact. The unscoped up / down verbs are now shape-aware composites that delegate to the right phase commands underneath.

Workspaces created against v1.0.x that have cluster modules and trial modules in the same terraform.tfstate (the legacy single-state shape) keep working: roksbnkctl up and down continue to operate against them in place, byte-for-byte the way they did in v1.0. See § Legacy single-state workspaces at the bottom of the chapter to identify which shape a workspace has.

This chapter covers what each phase deploys, why the two state directories are separate, the deploy_bnk=false override that makes “cluster only” work, the cluster-outputs.json artefact written on success, a worked example, and the legacy single-state shape. The companion BNK-trial chapter, Chapter 10, covers roksbnkctl bnk up / bnk down for the trial layer.

What’s deployed where

The bundled HCL has roughly two halves. The cluster phase owns the durable, cluster-scoped resources:

  • The ROKS cluster itself (VPC + subnets + worker pool)
  • A transit gateway (so the test jumphost can reach cluster internals)
  • The registry COS (Cloud Object Storage) instance — used by the BNK trial as its FAR image / license / schematic store
  • cert-manager (Helm release into the cluster)
  • The TGW jumphost VM (an Ubuntu VSI in the same VPC, used by --on jumphost)

The trial phase owns the BNK-specific resources:

  • F5 Lifecycle Operator (flo) Helm release
  • cne_instance Kubernetes manifest
  • BNK license + admin certs
  • Various cluster-side bits: ServiceAccounts, RoleBindings, Secrets

Two-phase split: roksbnkctl cluster up provisions the first list; roksbnkctl bnk up (the trial) provisions the second.

┌─────────────────────────────────────────────────────────┐
│  cluster phase (durable, reused across many trials)     │
│    ROKS cluster + VPC + transit gateway                 │
│    registry COS instance                                │
│    cert-manager (Helm)                                  │
│    TGW jumphost                                         │
├─────────────────────────────────────────────────────────┤
│  trial phase (one trial; destroyed by `bnk down`)       │
│    flo (F5 Lifecycle Operator)                          │
│    cne_instance                                         │
│    license / admin cert / SCC bindings                  │
└─────────────────────────────────────────────────────────┘

The split exists because ROKS clusters take 30-50 minutes to provision and cost roughly $0.30/hour to run. Re-creating the cluster every time you want to re-test a BNK trial is wasteful; reusing one cluster for many trials cuts iteration time from “an hour” to “a few minutes”.
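As a rough worked example of that saving, the sketch below compares total provisioning time for a run of trial iterations with and without cluster reuse. The 40-minute and 8-minute figures are assumed midpoints of the ranges quoted in this chapter, not measurements, and teardown time is ignored.

```python
# Assumed midpoints of the ranges quoted in this chapter; not measurements.
CLUSTER_MIN = 40  # ROKS cluster provisioning (30-50 minutes)
TRIAL_MIN = 8     # BNK trial deployment (5-10 minutes)


def total_minutes(trials: int, reuse_cluster: bool) -> int:
    """Total provisioning time for `trials` BNK trial iterations."""
    if reuse_cluster:
        # One cluster phase, then every trial lands on the same cluster.
        return CLUSTER_MIN + trials * TRIAL_MIN
    # A fresh cluster is provisioned for every single trial.
    return trials * (CLUSTER_MIN + TRIAL_MIN)


for n in (1, 5, 10):
    print(n, total_minutes(n, reuse_cluster=True), total_minutes(n, reuse_cluster=False))
```

With these assumed figures, ten iterations take 120 minutes on a reused cluster against 480 minutes when a fresh cluster is built each time.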

The two state directories

To keep cluster state and trial state from tangling, roksbnkctl uses separate Terraform state directories:

~/.roksbnkctl/<workspace>/
  state/                   # BNK trial state — written by `roksbnkctl up/down`
    terraform.tfstate
    terraform.tfvars
  state-cluster/           # cluster phase state — written by `roksbnkctl cluster up/down`
    terraform.tfstate
    cluster-phase-override.tfvars

Each phase’s commands read and write only their own state directory. Both phases use the same Terraform source (the bundled HCL) but with different effective tfvars — the trick is the deploy_bnk flag.

The deploy_bnk=false override

The bundled HCL has a top-level deploy_bnk boolean. When true, the BNK trial modules (flo, cne_instance, license) run; when false, they’re skipped and Terraform only provisions the cluster-phase resources.

roksbnkctl cluster up and roksbnkctl cluster down force deploy_bnk = false by writing a small auto-generated tfvars override into the cluster state directory:

# ~/.roksbnkctl/<workspace>/state-cluster/cluster-phase-override.tfvars
# Generated by roksbnkctl. Do not edit by hand.
# Cluster-phase override: BNK trial modules (flo / cne_instance /
# license) are skipped. cert-manager and the testing jumphost still run
# — they're cluster-shared singletons that belong with the cluster.
deploy_bnk = false

This file is layered onto the var-file chain after user-supplied --var-file flags so the override always wins. The user’s terraform.tfvars and --var-file <path> arguments still apply for everything else (region, RG, cluster name, worker count, …) — only deploy_bnk is forced.

roksbnkctl up doesn’t write this override file; its tfvars chain leaves deploy_bnk at the upstream default (true), so the trial modules run.
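The "override always wins" behaviour relies on Terraform's rule that, when several -var-file arguments set the same variable, the last one on the command line takes precedence. The sketch below models that ordering. Both helpers (cluster_phase_var_file_args, effective_vars) and the prod.tfvars file name are hypothetical and exist only for illustration; the source does not show how roksbnkctl actually assembles its Terraform command line.

```python
from pathlib import Path


def cluster_phase_var_file_args(user_var_files, state_cluster_dir):
    """Build -var-file arguments with the generated override placed last."""
    override = Path(state_cluster_dir) / "cluster-phase-override.tfvars"
    args = [f"-var-file={path}" for path in user_var_files]
    args.append(f"-var-file={override}")  # last on the command line, so it wins
    return args


def effective_vars(*layers):
    """Model last-one-wins merging of tfvars layers (each layer is a dict)."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


user = {"region": "us-south", "deploy_bnk": True}  # hypothetical user tfvars
override = {"deploy_bnk": False}                    # the generated override
print(cluster_phase_var_file_args(["prod.tfvars"], "state-cluster"))
print(effective_vars(user, override))
```

In this model the user's region survives untouched while deploy_bnk ends up false, which matches the "only deploy_bnk is forced" behaviour described above.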

cluster-outputs.json — the cluster identity record

When the apply step of roksbnkctl cluster up succeeds, the command reads the relevant Terraform outputs (cluster name, ID, region, RG, VPC, registry COS) and writes them to a workspace-scoped JSON file:

~/.roksbnkctl/<workspace>/cluster-outputs.json

Sample contents:

{
  "cluster_name": "bnk-quickstart",
  "cluster_id": "cre6h4l20jjsg4kvt3a0",
  "region": "us-south",
  "resource_group_id": "abc123...",
  "vpc_id": "r006-...",
  "registry_cos_crn": "crn:v1:bluemix:public:cloud-object-storage:global:a/...",
  "registry_cos_name": "bnk-quickstart-cos-instance",
  "master_url": "https://c106.us-south.containers.cloud.ibm.com:31415",
  "openshift_version": "4.14_openshift",
  "source": "cluster-up",
  "recorded_at": "2026-05-08T14:22:08Z"
}

The source field discriminates between cluster-up (we created it) and cluster-register (we discovered an existing cluster — see Chapter 9). Subsequent commands read this file to learn the workspace’s cluster identity without hitting IBM APIs.
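A script that wants the same identity record could read the file directly. The following is a minimal sketch, assuming the field names shown in the sample above; the helper names are hypothetical, and rejecting an unknown source value is a choice made for this example rather than documented roksbnkctl behaviour.

```python
import json
from pathlib import Path

# The two source values described in this chapter.
KNOWN_SOURCES = {"cluster-up", "cluster-register"}


def parse_cluster_identity(text):
    """Parse cluster-outputs.json content and sanity-check its source field."""
    record = json.loads(text)
    source = record.get("source")
    if source not in KNOWN_SOURCES:
        # Strictness is an assumption of this sketch.
        raise ValueError(f"unexpected source: {source!r}")
    return record


def load_cluster_identity(workspace, root=None):
    """Read <root>/<workspace>/cluster-outputs.json (root defaults to ~/.roksbnkctl)."""
    base = Path(root) if root is not None else Path.home() / ".roksbnkctl"
    path = base / workspace / "cluster-outputs.json"
    return parse_cluster_identity(path.read_text(encoding="utf-8"))


def created_by_cluster_up(record):
    """True when the workspace created the cluster rather than registering one."""
    return record["source"] == "cluster-up"
```

For example, created_by_cluster_up(load_cluster_identity("default")) would tell a wrapper script whether the workspace owns the cluster's lifecycle, without any call to IBM Cloud.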

roksbnkctl cluster down deletes the file as part of its post-destroy cleanup. roksbnkctl cluster show pretty-prints it for human readers:

roksbnkctl cluster show
workspace:        default
source:           cluster-up
recorded_at:      2026-05-08T14:22:08Z

cluster_name:     bnk-quickstart
cluster_id:       cre6h4l20jjsg4kvt3a0
region:           us-south
resource_group:   abc123...
openshift:        4.14_openshift
master_url:       https://c106.us-south.containers.cloud.ibm.com:31415

vpc_id:           r006-...
registry_cos:     bnk-quickstart-cos-instance
registry_cos_crn: crn:v1:bluemix:public:cloud-object-storage:global:a/...

Worked example: cluster up → kubectl get nodes → cluster down

The cluster-only flow, end to end:

Step 1 — roksbnkctl init

If you don’t have a workspace yet, initialise one. This is the same init flow as the trial path; the cluster commands reuse the workspace’s config.

roksbnkctl init

Step 2 — roksbnkctl cluster up --auto

Provisions the cluster phase only:

roksbnkctl cluster up --auto

Sample output (heavily abridged):

→ terraform plan (cluster phase: deploy_bnk=false forced)
→ Layering user tfvars from ~/.roksbnkctl/default/state-cluster/cluster-phase-override.tfvars (overrides config.yaml-derived values)
→ terraform init
→ terraform apply
  module.roks_cluster.ibm_container_vpc_cluster.cluster: Creating...
  module.roks_cluster.ibm_container_vpc_cluster.cluster: Still creating... [10m elapsed]
  ...
  module.roks_cluster.ibm_container_vpc_cluster.cluster: Creation complete after 38m12s
  module.cert_manager.helm_release.cert_manager: Creation complete after 2m11s
  module.testing.tls_private_key.jumphost_shared_key: Creation complete after 0s
  module.testing.ibm_is_instance.tgw_jumphost: Creation complete after 1m48s

  Apply complete! Resources: 36 added, 0 changed, 0 destroyed.

✓ Wrote ~/.roksbnkctl/default/cluster-outputs.json
✓ Wrote /home/you/.kube/config (chmod 0600)
✓ Auto-registered target jumphost (169.45.91.177); use `roksbnkctl --on jumphost ...`

Roughly 36 resources land — the cluster phase is about half the size of a full BNK trial. Time-to-ready is dominated by the ROKS cluster itself; everything else after the cluster comes up is fast.

Step 3 — verify the cluster works

The post-apply admin kubeconfig is fetched automatically (unless --no-kubeconfig is passed). kubectl get nodes confirms reachability:

kubectl get nodes
# NAME           STATUS   ROLES           AGE   VERSION
# 10.243.0.4     Ready    master,worker   3m    v1.28.6+5e1b9a1
# 10.243.64.4    Ready    master,worker   3m    v1.28.6+5e1b9a1

Or, post-Sprint 2, the same thing through the internalised verb:

roksbnkctl k get nodes

roksbnkctl status reports cluster identity + reachability:

roksbnkctl status
Workspace:    default
Region:       us-south
Cluster:      bnk-quickstart  (attach existing)
TF source:    embedded@v1.0.0
Last apply:   2026-05-08 14:22:08 UTC  (3m ago)
Kubeconfig:   /home/you/.kube/config
Cluster:      2/2 nodes ready

Step 4 — (optional) deploy a BNK trial on top

Now that the cluster is up, roksbnkctl up deploys a BNK trial onto it. It reads cluster-outputs.json and reuses the cluster:

roksbnkctl up --auto

See Chapter 10, Deploying BNK trials, for the trial-phase walkthrough. You can run bnk up / bnk down many times against the same cluster; each cycle takes ~5 minutes rather than the ~50 minutes of a fresh-cluster run.

Step 5 — roksbnkctl cluster down --auto

Tear down the cluster phase. In v1.1.0 cluster down is strictly scoped: it refuses with a hard error (rather than the v1.0.x warning-but-prompt) on any workspace whose trial state is non-empty, so an out-of-order destroy can’t accidentally orphan BNK resources. Destroy the trial first with roksbnkctl bnk down (or roksbnkctl down for both at once); see Chapter 11 for the full refusal catalogue.

roksbnkctl cluster down --auto

Sample output:

→ terraform destroy (cluster phase)
  module.testing.ibm_is_instance.tgw_jumphost: Destroying...
  module.cert_manager.helm_release.cert_manager: Destroying...
  module.roks_cluster.ibm_container_vpc_cluster.cluster: Destroying...
  module.roks_cluster.ibm_container_vpc_cluster.cluster: Still destroying... [5m elapsed]
  module.roks_cluster.ibm_container_vpc_cluster.cluster: Destruction complete after 8m16s

  Destroy complete! Resources: 36 destroyed.

Post-destroy, cluster-outputs.json is deleted. The workspace directory and its config.yaml survive — re-running cluster up against the same workspace re-creates the cluster with the same name and region.
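The ordering guard from Step 5 (refuse cluster down while trial state is non-empty) can be sketched as below. This is an illustration, not roksbnkctl's implementation: it assumes "non-empty" means the trial state's top-level resources list has entries, and the function names, the error wording, and the fake module.flo resource are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def trial_resource_count(workspace_dir):
    """Count entries in state/terraform.tfstate; 0 if the file is absent or blank."""
    tfstate = Path(workspace_dir) / "state" / "terraform.tfstate"
    if not tfstate.is_file():
        return 0
    text = tfstate.read_text(encoding="utf-8")
    if not text.strip():
        return 0
    # Terraform's JSON state keeps tracked objects in a top-level "resources" list.
    return len(json.loads(text).get("resources", []))


def check_cluster_down_allowed(workspace_dir):
    """Raise if destroying the cluster would orphan trial resources."""
    count = trial_resource_count(workspace_dir)
    if count:
        # Hypothetical wording; the real refusal text is not shown in this chapter.
        raise RuntimeError(
            f"trial state still holds {count} resource(s); "
            "run `roksbnkctl bnk down` first"
        )


with tempfile.TemporaryDirectory() as ws:
    check_cluster_down_allowed(ws)  # no trial state yet: allowed
    state_dir = Path(ws) / "state"
    state_dir.mkdir()
    fake = {"version": 4, "resources": [{"module": "module.flo", "type": "helm_release", "name": "flo"}]}
    (state_dir / "terraform.tfstate").write_text(json.dumps(fake), encoding="utf-8")
    try:
        check_cluster_down_allowed(ws)
    except RuntimeError as err:
        print("refused:", err)
```

Checking the state before invoking Terraform keeps the refusal cheap: nothing is planned or destroyed until the trial phase is confirmed empty.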

Why split cluster from trial?

Two-phase is the default because the cost of conflating them is concrete. ROKS clusters take 30-50 minutes to provision and bill at roughly $0.30/hour; a BNK trial on top takes 5-10 minutes. Iterating on the trial — different flo versions, different cne_instance shapes, license bundle revisions — happens far more often than iterating on the cluster underneath. Splitting state means a bnk down / bnk up cycle is a five-minute round-trip instead of an hour.

Three scenarios this shape unlocks:

  1. Many BNK trial iterations on one cluster. Run cluster up once, then loop bnk up / bnk down against the same cluster until you’ve covered all the trial permutations. Then cluster down once when you’re finished. This is the headline win of the v1.1.0 surface — see Chapter 10 §“Worked example — iterating on a BNK trial”.

  2. Pre-provisioning for a workshop or demo. You want the cluster ready and warm before the demo starts; you’ll deploy the BNK trial live in front of the audience. cluster up the night before; bnk up during the demo.

  3. Decoupling cluster lifecycle from trial lifecycle. A long-lived cluster used by multiple team members, where one person owns the cluster phase and others own the BNK trials. Cluster-phase outputs live in cluster-outputs.json; trials read it. Each trial can bnk up / bnk down without affecting the cluster.

For workspaces that just want “create a cluster, deploy BNK on it, test, tear it all down”, the unscoped roksbnkctl up / roksbnkctl down are still the right verbs — in v1.1.0 they’re shape-aware composites that drive the cluster + trial steps in the right order without you having to think about it.

Legacy single-state workspaces

Workspaces created against v1.0.x predate the split. Their terraform.tfstate under ~/.roksbnkctl/<workspace>/state/ contains both the cluster modules (module.roks_cluster, module.cert_manager, module.testing) and the trial modules (module.flo, module.cne_instance, module.license) in one file; state-cluster/ either doesn’t exist or is empty.

roksbnkctl calls this shape LegacySingle and identifies it by walking the trial state’s resource list for cluster-module addresses. To check a workspace’s shape from the outside, look at the state directories:

$ ls ~/.roksbnkctl/<workspace>/
config.yaml  state/  state-cluster/    # split (v1.1.0+) or cluster-only

$ ls ~/.roksbnkctl/<workspace>/
config.yaml  state/                    # legacy single-state, or empty

A state/terraform.tfstate that contains module.roks_cluster and friends is legacy single-state; a state-cluster/terraform.tfstate with content is the split shape.
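The same check can be scripted against parsed state files. The sketch below assumes Terraform's JSON state layout, in which each entry of the top-level resources list carries an optional module address. LegacySingle is the name this chapter gives; the "Split" and "Empty" labels, the function names, and the sample resources are hypothetical.

```python
# Cluster-phase module addresses named in this chapter.
CLUSTER_MODULES = ("module.roks_cluster", "module.cert_manager", "module.testing")


def has_cluster_modules(state):
    """True if any resource in a parsed tfstate belongs to a cluster module."""
    for resource in state.get("resources", []):
        module = resource.get("module", "")  # absent for root-module resources
        for prefix in CLUSTER_MODULES:
            # Match the module itself, a nested child, or an indexed instance.
            if module == prefix or module.startswith((prefix + ".", prefix + "[")):
                return True
    return False


def classify_workspace(trial_state, cluster_state):
    """Return "LegacySingle", "Split", or "Empty" for two parsed states."""
    if has_cluster_modules(trial_state):
        return "LegacySingle"
    if cluster_state.get("resources"):
        return "Split"
    return "Empty"


legacy = {"resources": [
    {"module": "module.roks_cluster", "type": "ibm_container_vpc_cluster", "name": "cluster"},
    {"module": "module.flo", "type": "helm_release", "name": "flo"},
]}
print(classify_workspace(legacy, {}))
```

Pass an empty dict for a state file that does not exist; a workspace with neither state populated falls through to "Empty".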

The v1.1.0 binary handles both shapes:

  • Legacy single-state workspaces: roksbnkctl up and roksbnkctl down operate monolithically the way they did in v1.0 — same plan output, same resource count, same byte-for-byte behaviour. The phase-scoped commands (cluster up/down, bnk up/down) refuse with a message pointing you back at the unscoped lifecycle verbs.
  • Split workspaces (the new default): up / down are shape-aware composites that delegate to the phase commands underneath; cluster up/down and bnk up/down work directly.

The refusal messages on a legacy workspace look like:

$ roksbnkctl -w canada-roks cluster down
this workspace is legacy single-state; cluster and BNK trial share one state. Use `roksbnkctl down` to tear down both, or migrate the state first

$ roksbnkctl -w canada-roks bnk down
this workspace is legacy single-state; `bnk down` can't isolate the trial phase. Use `roksbnkctl down` to tear down both, or migrate the state first

Each refusal prints as a single line; any wrapping you see comes from your terminal width. Grepping for a distinctive fragment of the message (e.g. `bnk down` can't isolate) gives a clean match.

There is no automatic state-migration command in v1.1.0. The refusal text references migration (“or migrate the state first”) because a future roksbnkctl migrate is planned, but until it ships, legacy workspaces stay on the unscoped up / down flow that’s worked for them since v1.0. See Chapter 11 §“The phase-aware decision tree” for the full destruction-time decision matrix.

Cross-references