Troubleshooting

Engine

Etcd connection issues

Managed etcd: If etcd fails to start, check that port 2379 is not already in use and that the data directory (/var/lib/banyan/etcd/ by default) is writable.
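
Both checks can be scripted. The following is a minimal sketch, assuming `ss` (from iproute2) is installed; `listening_on_port` and `dir_writable` are illustrative helper names, not part of Banyan. The port and directory in the example are the defaults mentioned above.

```shell
# Illustrative helpers (not part of Banyan) for the managed-etcd checks.

# Reads `ss -ltn` output on stdin; succeeds if any socket listens on port $1.
listening_on_port() {
  awk -v port="$1" '
    NR > 1 && $4 ~ (":" port "$") { found = 1 }   # column 4 is Local Address:Port
    END { exit !found }
  '
}

# Succeeds if directory $1 exists and is writable by the current user.
dir_writable() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Example (run as the user the engine runs as, normally root):
#   ss -ltn | listening_on_port 2379 && echo "port 2379 is already in use"
#   dir_writable /var/lib/banyan/etcd || echo "data directory is missing or not writable"
```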

External etcd: If you see “failed to connect to etcd”, make sure your etcd server is running and reachable at the configured address. If etcd is not installed or not started on that host:

sudo apt-get install etcd-server    # Debian/Ubuntu
sudo systemctl start etcd

If you configured TLS or mTLS, verify that the certificate paths in /etc/banyan/banyan.yaml are correct and the files are readable.
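
To check several certificate files at once, you can loop over the paths from your config. This is a minimal sketch; `check_readable` is a hypothetical helper name, and the paths in the example are placeholders for whatever your /etc/banyan/banyan.yaml actually references.

```shell
# Hypothetical helper: reports every path that is missing or unreadable
# on stderr and returns non-zero if any were found.
check_readable() {
  status=0
  for path in "$@"; do
    if [ ! -r "$path" ]; then
      echo "not readable: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (placeholder paths; run as the user the engine runs as):
#   check_readable /path/to/ca.crt /path/to/client.crt /path/to/client.key
```

For a certificate that is readable but still rejected, `openssl x509 -noout -enddate -in <cert-file>` prints its expiry date.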

To reconfigure the etcd connection, re-run banyan-engine init. See Etcd for setup details.

Engine starts but agents cannot connect

Agents connect to the Engine’s gRPC port (default: 50051). Check:

The Engine is running and the gRPC server started successfully (look for “Engine gRPC server listening on :50051” in the output).
The agent’s config has the correct engine host and port. Check /etc/banyan/banyan.yaml on the agent:
```
agent:
  engine_host: <engine-ip>
  engine_port: "50051"
  wg_public_key: "<base64-key>"
```
Port 50051 is open in your firewall between agents and the engine.
The agent’s public key is whitelisted on the engine. Check that a .pub file containing the agent’s public key exists in /etc/banyan/whitelisted-keys/ on the engine machine.
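
The firewall check can be run from the agent with a plain TCP probe. This is a minimal sketch, assuming bash and coreutils `timeout` are available; `tcp_reachable` is a hypothetical helper name. A successful connection only shows that the port is open, not that the gRPC server behind it is healthy.

```shell
# Hypothetical helper: succeeds if a TCP connection to host $1, port $2
# opens within 3 seconds. Uses bash's /dev/tcp pseudo-device.
tcp_reachable() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example, run on the agent (replace <engine-ip>; 50051 is the default port):
#   tcp_reachable <engine-ip> 50051 && echo "port reachable" || echo "port blocked or engine down"
```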

“Unauthenticated” errors

If agents or CLI clients receive “Unauthenticated” errors:

Verify the component’s public key is whitelisted on the engine. Check /etc/banyan/whitelisted-keys/ for a .pub file containing the key.
If the engine was re-initialized, the whitelisted keys directory is recreated empty. Re-copy all agent and CLI public keys.
If no config exists yet, run sudo banyan-agent init (or sudo banyan-cli init) to generate a keypair, then whitelist the public key on the engine.
To find a component’s public key: grep wg_public_key /etc/banyan/banyan.yaml
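
The lookup and the whitelist check can be combined. This is a minimal sketch, assuming the config stores the key as `wg_public_key: "<base64-key>"` (as in the agent example earlier) and that whitelisted keys live in `.pub` files; the helper names are hypothetical.

```shell
# Hypothetical helpers for comparing a component's key with the engine's whitelist.

# Prints the value of the first wg_public_key entry in config file $1.
extract_wg_public_key() {
  sed -n 's/^[[:space:]]*wg_public_key:[[:space:]]*"\{0,1\}\([^"[:space:]]*\)"\{0,1\}.*$/\1/p' "$1" \
    | head -n 1
}

# Succeeds if some *.pub file in directory $2 contains key $1 (fixed-string match).
key_is_whitelisted() {
  [ -n "$1" ] && grep -qF -- "$1" "$2"/*.pub 2>/dev/null
}

# On the agent or CLI machine:
#   key=$(extract_wg_public_key /etc/banyan/banyan.yaml)
# On the engine machine, with the key copied over:
#   key_is_whitelisted "$key" /etc/banyan/whitelisted-keys || echo "key is not whitelisted"
```

Both files may only be readable by root, so you may need to run these from a root shell.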

See Authentication for details on key management.

WireGuard overlay issues

If containers on different agents cannot communicate:

Check that wireguard-tools is installed on all agents: wg --version
Verify WireGuard kernel support (requires root): sudo ip link add wg-test type wireguard && sudo ip link delete wg-test. If this fails, the kernel module is missing (requires Linux 5.6+ or wireguard-dkms).
Ensure port 51820/UDP is open between agents.
If WireGuard is unavailable, Banyan falls back to VXLAN automatically. You can also force VXLAN by setting overlay_type: "vxlan" in the engine config.
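
When checking many agents, a kernel version comparison can serve as a quick first pass before the ip link test. This is a minimal sketch; `kernel_at_least_5_6` is a hypothetical helper, and the result is only a hint, since some distributions backport WireGuard to older kernels and wireguard-dkms adds it separately.

```shell
# Hypothetical helper: succeeds if kernel release $1 (for example
# "5.15.0-91-generic") is version 5.6 or newer.
kernel_at_least_5_6() {
  major=${1%%.*}            # text before the first dot
  rest=${1#*.}              # text after the first dot
  minor=${rest%%[!0-9]*}    # leading digits of the remainder
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 6 ]; }
}

# Example:
#   kernel_at_least_5_6 "$(uname -r)" || echo "kernel older than 5.6: wireguard-dkms may be needed"
```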

Control tunnel issues

If agents or CLI cannot connect through the WireGuard control tunnel:

Check that the wg-control interface exists: ip link show wg-control
Verify the tunnel peer: wg show wg-control
Ensure port 51821/UDP is open from agents/CLI to the engine.
Test connectivity: ping 10.200.0.1 from the agent/CLI.
If the control tunnel fails, Banyan falls back to direct TCP with public key metadata authentication. Check the agent/engine logs for “Control tunnel setup failed” messages.
The CLI creates its tunnel during banyan-cli init (requires root). The tunnel is a kernel interface and doesn’t survive reboots. After a restart, run sudo banyan-cli login to re-establish it without re-running init. Subsequent CLI commands don’t need root.
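
To tell whether the tunnel peer has ever completed a handshake, you can read the machine-friendly output of `wg show`. This is a minimal sketch; `peer_has_handshake` is a hypothetical helper that parses `wg show <interface> latest-handshakes`, which prints one line per peer with the peer's public key and a Unix timestamp (0 when no handshake has happened).

```shell
# Hypothetical helper: reads `wg show <interface> latest-handshakes` output
# on stdin and succeeds if at least one peer has a non-zero handshake time.
peer_has_handshake() {
  awk '$2 > 0 { ok = 1 } END { exit !ok }'
}

# Example, run on the agent or CLI machine:
#   sudo wg show wg-control latest-handshakes | peer_has_handshake \
#     || echo "no handshake yet: check that 51821/UDP is open to the engine"
```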

Agent

“nerdctl not found”

Install nerdctl on the agent node:

curl -fL https://github.com/containerd/nerdctl/releases/download/v2.0.3/nerdctl-2.0.3-linux-amd64.tar.gz \
  | sudo tar -xz -C /usr/local/bin nerdctl
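
The command above downloads the amd64 build. On other machines the archive name differs; the sketch below maps `uname -m` output to the suffix used in nerdctl release file names. `nerdctl_arch` is a hypothetical helper and only covers the two most common architectures.

```shell
# Hypothetical helper: maps `uname -m` output to the architecture suffix
# used in nerdctl release archive names.
nerdctl_arch() {
  case "$1" in
    x86_64)          echo amd64 ;;
    aarch64 | arm64) echo arm64 ;;
    *)               echo "architecture not handled by this sketch: $1" >&2; return 1 ;;
  esac
}

# Example:
#   arch=$(nerdctl_arch "$(uname -m)") &&
#   curl -fL "https://github.com/containerd/nerdctl/releases/download/v2.0.3/nerdctl-2.0.3-linux-${arch}.tar.gz" \
#     | sudo tar -xz -C /usr/local/bin nerdctl
```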

“containerd not running”

Start containerd:

sudo systemctl start containerd

If containerd is not installed:

sudo apt-get install containerd

Agent shows “ready” but tasks fail

Check if the Agent can pull images. SSH into the agent and test:

sudo nerdctl pull nginx:alpine

If this fails, the agent may not have internet access or the image registry may be unreachable.

Deployment

Deployment stays in “deploying” status

The Engine is waiting for Agents to complete their tasks. Check:

Are agents connected? Run banyan-cli agent.
Check the agent logs for errors, either in the terminal where agent start is running or, if the agent runs as a systemd service, with sudo journalctl -u banyan-agent -f.
Verify agents can pull the images specified in your manifest.

Deployment fails immediately

Check the error message in banyan-cli deployment. Common causes:

Image not found: The image name in banyan.yaml is wrong or the registry is unreachable from agents.
Port conflict: Another container is already using the same host port.

“deployment timed out”

The up command waits up to 2 minutes by default. If your images are large, they may take longer to pull. Use --no-wait and check status manually:

banyan-cli up -f banyan.yaml --no-wait
# Check later:
banyan-cli deployment
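
Instead of checking by hand, you can poll the status command. This is a minimal sketch under one loud assumption: that the output of banyan-cli deployment contains the word deploying while the deployment is in progress. The exact output format is not documented here, so adjust the pattern to what you see. `wait_while_matches` is a hypothetical helper.

```shell
# Hypothetical helper: re-runs a command until its output stops matching a pattern.
# Usage: wait_while_matches PATTERN TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 when the pattern no longer appears, 1 on timeout, 2 if the command fails.
wait_while_matches() {
  pattern=$1 timeout=$2 interval=$3
  shift 3
  elapsed=0
  while :; do
    out=$("$@" 2>&1) || return 2                                # status command failed
    printf '%s\n' "$out" | grep -q -- "$pattern" || return 0    # no longer in progress
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Example: poll every 10 seconds for up to 10 minutes (pattern is an assumption).
#   wait_while_matches deploying 600 10 banyan-cli deployment && banyan-cli deployment
```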

Redeployment doesn’t replace old containers

When you run banyan-cli up again, Banyan should automatically replace old containers using a blue-green strategy. If old containers aren’t being replaced:

Check that the application name in banyan.yaml matches the running deployment. The name must be identical for Banyan to recognize it as a redeployment.
If the old deployment is in stopping or deploying state, the Engine waits for it to finish before scheduling the new one. Check banyan-cli deployment and wait a few seconds.
If a previous redeployment failed, the old containers stay running. Fix the issue and run banyan-cli up again — it will retry the replacement.

Old containers still running after redeployment

During blue-green redeployment, old containers run alongside new ones until the new deployment is confirmed healthy. This overlap is expected and usually lasts a few seconds. If old containers persist:

The new deployment may have failed. Check banyan-cli deployment for the deployment status and error message.
If the new deployment failed, old containers are intentionally kept running to avoid downtime. Fix the issue and redeploy.

See Redeployment for details on how blue-green and per-service deploys work.

Per-service deploy fails with dependency error

When deploying specific services (e.g., banyan-cli up -f banyan.yaml web), Banyan validates that all depends_on dependencies are satisfied. If you see an error like:

Error: service "web" depends on "api" which is not running and not being deployed

Either deploy the dependency too (banyan-cli up -f banyan.yaml web api) or make sure the dependency is already running in the existing deployment.

Containers are running but the application doesn’t work

Banyan deploys containers but does not manage application-level networking between services across nodes. Containers on the same agent can communicate via localhost. Containers on different agents need external networking or a load balancer.

General

Permission errors

The engine and agent require sudo for all commands — they manage network interfaces, iptables rules, and containers:

sudo banyan-engine init
sudo systemctl enable --now banyan-engine

sudo banyan-agent init
sudo systemctl enable --now banyan-agent

The CLI needs sudo for init and login (both create WireGuard kernel interfaces). All other CLI commands (up, down, engine, agent, deployment, container, events, logs, dashboard) run as your normal user. After a machine restart, run sudo banyan-cli login to re-establish the tunnel.

Checking logs

When running as a systemd service, use journalctl:

sudo journalctl -u banyan-engine -f   # engine logs
sudo journalctl -u banyan-agent -f    # agent logs

When running in the foreground (sudo banyan-engine start), logs print to stdout.

Etcd logs:

Managed etcd: Logs are printed to stdout alongside the engine output.
External etcd: Check the logs of your externally managed etcd service.
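
To narrow long logs down to likely problems, you can filter for failure keywords. This is a minimal sketch; `filter_failures` is a hypothetical helper, and its keyword list is a guess based on the messages quoted in this guide, not a list of everything Banyan logs.

```shell
# Hypothetical helper: reads log lines on stdin and prints only those that
# mention common failure words (case-insensitive). No match is not an error.
filter_failures() {
  grep -iE 'error|failed|unauthenticated|timed out' || true
}

# Example: scan the last 15 minutes of agent logs.
#   sudo journalctl -u banyan-agent --since "15 minutes ago" --no-pager | filter_failures
```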

Stopping containers manually

If you need to remove containers directly on an agent:

sudo nerdctl rm -f <container-name>

To list the running containers of an application (use nerdctl ps -a to include stopped ones):

sudo nerdctl ps | grep <app-name>