High Availability
By default, Banyan runs one engine that manages everything. That’s the right choice for most teams — no extra setup, no moving parts.
When you need the cluster to survive an engine failure, add a second engine. Both engines are active: they handle CLI commands, process agent heartbeats, and schedule deployments. If one goes down, the other keeps working. There is no leader election and no manual failover; agents and the CLI reconnect to a remaining engine within seconds.
What changes
| | Single engine (default) | Multiple engines |
|---|---|---|
| etcd | Managed by Banyan | You provide an etcd cluster |
| Registry | Managed by Banyan | You provide an OCI registry |
| Scheduling | Direct | Coordinated via per-deployment locks in etcd |
| Agent config | One engine address | List of engine addresses |
| Failover | None (single point) | Instant — other engines continue |
| Setup effort | Zero | You run etcd and a registry separately |
The trade-off is real: HA means running your own etcd and registry. For many teams, a single engine with a good backup strategy is enough. HA is for when downtime of the control plane is not acceptable.
Prerequisites
Before setting up multiple engines, you need:
- An etcd cluster — at least 3 nodes for etcd’s own HA. A single etcd node works for testing but defeats the purpose.
- An OCI registry — any Docker-compatible registry (Harbor, Docker Registry, GitLab Container Registry, etc.). All engines push/pull from the same registry.
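Before running the wizard, it can help to confirm that both prerequisites are reachable from each engine host. The following is a minimal sketch, not part of Banyan: the helper names `split_host_port` and `is_reachable` are hypothetical, and a successful TCP connection only shows that something is listening on the port, not that etcd or the registry is healthy.

```python
import socket
from urllib.parse import urlsplit


def split_host_port(endpoint: str, default_port: int) -> tuple[str, int]:
    """Accept 'http://host:port' or bare 'host:port' and return (host, port)."""
    # urlsplit only recognises the host when a scheme separator is present,
    # so prefix bare values such as 'registry.internal:5000' with '//'.
    parts = urlsplit(endpoint if "://" in endpoint else f"//{endpoint}")
    return parts.hostname or "", parts.port or default_port


def is_reachable(endpoint: str, default_port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = split_host_port(endpoint, default_port)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `is_reachable("http://etcd.internal:2379", 2379)` and `is_reachable("registry.internal:5000", 443)` would check the example addresses used later on this page.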
1. Initialize the engines
On Engine 1 (192.168.1.10):
    sudo banyan-engine init

The wizard asks for the deployment mode:

    Deployment mode:
    > Single engine — zero config, everything managed for you (recommended)
      Multi-engine HA — 2+ engines for high availability (requires your own etcd and registry)

Choose Multi-engine HA. The wizard then asks for:
- External etcd endpoint: your etcd cluster address (e.g., http://etcd.internal:2379)
- External registry URL: your OCI registry (e.g., registry.internal:5000)
- Etcd connection security: None, password, TLS, or mTLS

Start the engine:

    sudo systemctl enable --now banyan-engine

Repeat on Engine 2 (192.168.1.20) with the same etcd and registry addresses.
Both engines generate their own WireGuard keypairs during init. Copy each engine’s public key — agents need it.
The wizard also asks about the secrets encryption key. On the first engine, choose “Generate new key”. On additional engines, choose “Provide existing key file” and point to the key copied from the first engine:
    # Copy secrets key from engine-1 to engine-2 before running init on engine-2
    scp engine-1:/etc/banyan/keys/secrets.key engine-2:/etc/banyan/keys/secrets.key

All engines must use the same secrets.key to encrypt and decrypt secrets. See Secrets — High Availability for details.
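One way to confirm that the copy succeeded is to compare a fingerprint of the key file on each engine. This is a minimal sketch under that assumption; `key_fingerprint` and `keys_match` are hypothetical helpers, not Banyan commands, and the digest is safe to compare out of band because it does not reveal the key itself.

```python
import hashlib
from pathlib import Path


def key_fingerprint(path: str) -> str:
    """Return the SHA-256 hex digest of a key file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def keys_match(fingerprints: list[str]) -> bool:
    """True when at least one fingerprint is given and all are identical."""
    return len(fingerprints) > 0 and len(set(fingerprints)) == 1
```

Run `key_fingerprint("/etc/banyan/keys/secrets.key")` on every engine and pass the collected digests to `keys_match`; a `False` result means at least one engine has a different key.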
2. Configure agents with multiple engines
On each agent, run banyan-agent init. The wizard asks for the engine host and port as usual. After that:
    Add additional engine endpoints for high availability? For single-engine setups, choose No
    > Yes / No

Choose Yes. The wizard prompts for each engine’s address and WireGuard public key:

    Engine #2 — address: host:port (or leave empty to finish)
    > 192.168.1.20:50051
    Engine #2 — WireGuard public key: Displayed during 'banyan-engine init' on that server
    > ZwlhPtN5dWw4TRSOkrUhjwC4w1jtABOnFUgVEdHImi8=

The primary engine (from the host/port and WG key you entered earlier) is automatically included as engine #1. Each engine has its own WireGuard key, and the agent sets up encrypted tunnels to all of them. Add as many engines as you need; leave the address empty to finish.
The agent connects to the first available engine. If that engine goes down, the agent reconnects to the next one within seconds.
    sudo systemctl enable --now banyan-agent

3. Configure the CLI
On your deploy machine, run banyan-cli init. Same pattern — after the engine host/port, it asks:
    Add additional engine endpoints for high availability?
    > Yes / No

If yes, add each engine’s address and WireGuard public key one at a time, using the same flow as the agent wizard. The CLI sets up tunnels to all engines and connects to the first one that responds.
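The "first one that responds" behaviour can be illustrated with a short sketch. It is not Banyan's actual client code: `first_responding` and the `probe` callback are hypothetical names, and the real agent and CLI connect over WireGuard tunnels rather than through a plain callback.

```python
from typing import Callable, Optional, Sequence


def first_responding(
    endpoints: Sequence[str],
    probe: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint whose probe succeeds, trying them in order."""
    for endpoint in endpoints:
        try:
            if probe(endpoint):
                return endpoint
        except OSError:
            # Treat connection errors as "engine down" and try the next one.
            continue
    # No engine answered.
    return None
```

Because the list is walked in order, the primary engine is preferred whenever it is healthy, and later entries are only used when earlier ones fail.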
4. Verify
Check that both engines are registered:
    banyan-cli engine

    Engine
    ==================================================
    Status: running
    Uptime: 5m
    ...
    Cluster Summary
    --------------------------------------------------
    Agents: 2/2 connected
    ...

Both engines see the same cluster state because they share etcd. You can run banyan-cli commands against either engine; the result is the same.
How it works
All engines are identical. There’s no leader, no primary, no standby. Every engine:
- Handles CLI commands (deploy, status, down)
- Accepts agent registrations and heartbeats
- Runs the scheduling loop to assign tasks to agents
- Monitors deployment progress
Per-deployment distributed locks in etcd prevent two engines from scheduling the same deployment twice. The engine that receives a Deploy command schedules it immediately. Other engines skip it when they see it’s already been handled.
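The effect of a per-deployment lock can be sketched with an in-memory stand-in for etcd. Everything here is illustrative: `LockStore`, `schedule_once`, and the `locks/deployments/` key prefix are hypothetical, and a real etcd lock would use a transaction tied to a lease rather than a process-local mutex. The sketch also never releases the lock, which is a simplification that models "already handled".

```python
import threading


class LockStore:
    """In-memory stand-in for etcd: an atomic create-if-absent on a key."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}
        self._mutex = threading.Lock()

    def try_acquire(self, key: str, owner: str) -> bool:
        # The check and the write happen under one mutex, mimicking a single
        # compare-and-set: only the first caller for a key succeeds.
        with self._mutex:
            if key in self._owners:
                return False
            self._owners[key] = owner
            return True


def schedule_once(store: LockStore, engine: str, deployment: str, scheduled: list) -> bool:
    """Schedule the deployment if this engine wins the lock, otherwise skip it."""
    if not store.try_acquire(f"locks/deployments/{deployment}", engine):
        return False
    scheduled.append((engine, deployment))
    return True
```

Locking per deployment rather than globally means two engines can still schedule different deployments at the same time.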
    CLI runs "banyan-cli up"
      → connects to Engine 2 (first healthy endpoint)
      → Engine 2 writes deployment to etcd, schedules tasks immediately
      → Engine 1's loop sees the deployment is already scheduled and skips it
      → agents poll either engine for tasks (same data from etcd)

When an engine goes down
If Engine 1 crashes:
- Agents connected to Engine 1 detect heartbeat failures and reconnect to Engine 2 within seconds
- Engine 2 continues scheduling — it was already active
- CLI commands automatically try the next endpoint
- Engine 1’s registration in etcd expires after 15 seconds (TTL-based)
When Engine 1 comes back, it re-registers in etcd and starts processing work again. No manual intervention needed.
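TTL-based registration can be pictured as a table of expiry deadlines that each engine keeps pushing forward. The sketch below is an in-memory illustration only: `TTLRegistry` is a hypothetical class, the 15-second default comes from the figure above, and in a real deployment etcd leases would do this work.

```python
import time
from typing import Callable


class TTLRegistry:
    """In-memory stand-in for TTL-based engine registrations."""

    def __init__(
        self,
        ttl_seconds: float = 15.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: dict[str, float] = {}

    def heartbeat(self, engine: str) -> None:
        # Registering and refreshing are the same operation: push the expiry out.
        self._expires_at[engine] = self._clock() + self._ttl

    def live_engines(self) -> list[str]:
        # An engine is live only while its deadline is still in the future.
        now = self._clock()
        return sorted(e for e, deadline in self._expires_at.items() if deadline > now)
```

A crashed engine simply stops calling `heartbeat`, so it drops out of `live_engines` once its deadline passes, and it reappears on its first heartbeat after a restart.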
Config reference
Engine config (/etc/banyan/banyan.yaml):
    engine:
      multi_engine: true
      managed_etcd: false
      store_address: "http://etcd.internal:2379"
      managed_registry: false
      external_registry_url: "registry.internal:5000"

Agent config:

    agent:
      engines:
        - address: "192.168.1.10:50051"
          wg_public_key: "engine-1-public-key-here"
        - address: "192.168.1.20:50051"
          wg_public_key: "engine-2-public-key-here"

CLI config:

    cli:
      engines:
        - address: "192.168.1.10:50051"
          wg_public_key: "engine-1-public-key-here"
        - address: "192.168.1.20:50051"
          wg_public_key: "engine-2-public-key-here"

Going back to single-engine
If you decide HA isn’t worth the operational overhead:
- Remove multi_engine: true from the engine config and switch back to managed_etcd: true and managed_registry: true
- Remove the engines: list from agent and CLI configs
- Restart all components
You’re back to zero-config single-engine mode.