Roadmap
Milestone 1 — Core Orchestration (MVP)
Status: Done
Run containers across multiple servers using a familiar YAML manifest.
- Parse banyan.yaml manifest (Docker Compose-compatible syntax)
- Engine control plane with etcd-based state
- Agent nodes with containerd/nerdctl container management
- Round-robin scheduling across agents
- CLI for engine, agent, and deploy workflows
- VPC networking layer (IPAM, DNS, CNI)
- E2E test infrastructure
Milestone 2 — Service Observability
Status: Done
See what’s running, check container health, stream logs, and stop deployments — all from the CLI.
- Agent monitors container health after deployment (running, exited, restarting)
- Agent reports per-container status back to Engine via gRPC
- `banyan-cli deployment` and `banyan-cli container` show per-service and per-container status
- `banyan-cli logs` streams container logs from agents (via engine gRPC proxy)
- Detect and surface failed containers (e.g., exited immediately after start)
- `banyan-cli down` command to stop and remove all containers for a deployment
Milestone 3 — Security
Status: Done
Secure all inter-component communication with WireGuard-based authentication and encryption.
- All inter-component communication uses gRPC with public key authentication
- Each component generates an X25519 keypair during `init`
- Agent/CLI → Engine: public key in gRPC metadata, validated against whitelist
- Engine → Agent: session token authentication for log streaming
- Config file at `/etc/banyan/banyan.yaml` with sections: `engine`, `agent`, `cli`
- `init` commands for engine, agent, and CLI prompt for credentials and connection info
- Three separate binaries: `banyan-engine`, `banyan-agent`, `banyan-cli`
See Authentication for details.
Milestone 3.5 — Agent Tags for Environment Isolation
Status: Done
Optional tags on agents and deployments for environment isolation (e.g. staging vs production on shared infrastructure).
- Agent tags configured in `/etc/banyan/banyan.yaml` and sent via Register/Heartbeat RPCs
- `--tags` flag on `banyan-cli up` and `banyan-cli down` for deployment tag matching
- Tag matching rules: both untagged = match, one side tagged = no match, intersection = match
- Same app name with different tags can coexist as independent deployments
- Engine scheduling filters agents by tag match before assigning tasks
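The matching rules reduce to a three-case predicate. A minimal Go sketch (illustrative only, not Banyan's actual code; function and argument names are hypothetical):

```go
package main

import "fmt"

// tagsMatch applies the milestone's rules: both sides untagged match,
// exactly one side tagged never matches, otherwise match on intersection.
func tagsMatch(agentTags, deployTags []string) bool {
	if len(agentTags) == 0 && len(deployTags) == 0 {
		return true // both untagged
	}
	if len(agentTags) == 0 || len(deployTags) == 0 {
		return false // one side tagged, the other not
	}
	seen := make(map[string]bool, len(agentTags))
	for _, t := range agentTags {
		seen[t] = true
	}
	for _, t := range deployTags {
		if seen[t] {
			return true // non-empty intersection
		}
	}
	return false
}

func main() {
	fmt.Println(tagsMatch(nil, nil))                                 // true
	fmt.Println(tagsMatch([]string{"staging"}, nil))                 // false
	fmt.Println(tagsMatch([]string{"staging"}, []string{"staging"})) // true
}
```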
Milestone 3.6 — Networking
Status: Done
Built-in overlay networking and cross-host load balancing without external dependencies.
- WireGuard overlay managed by Engine via abstract `OverlayDriver` interface
- Per-agent /24 subnet allocation from VPC CIDR via `SubnetAllocator`
- Peer discovery via heartbeat RPC (15s convergence)
- iptables DNAT proxy on each agent for port forwarding to container backends
- Cross-host load balancing: every agent aware of all service backends cluster-wide, probability-based DNAT rules distribute traffic across all replicas regardless of which agent they run on
- Service DNS: agent-local DNS server on bridge gateway IP resolves `<service>.<app-name>.internal` to container IPs (e.g., `db.my-app.internal`). Short names (e.g., `db`) also work when there’s no conflict across deployments.
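Because iptables evaluates rules sequentially, a uniform split across N backends needs rule i (0-indexed) to fire with conditional probability 1/(N-i) — the standard `statistic --mode random` technique also used by kube-proxy. A sketch of that computation (illustrative, not Banyan's actual code):

```go
package main

import "fmt"

// dnatProbabilities returns the --probability value for each of n sequential
// iptables DNAT rules so traffic splits evenly: rule i fires with probability
// 1/(n-i), conditional on all earlier rules having missed.
func dnatProbabilities(n int) []float64 {
	probs := make([]float64, n)
	for i := range probs {
		probs[i] = 1.0 / float64(n-i)
	}
	return probs
}

func main() {
	// Three backends: 1/3, then 1/2 of the remaining 2/3, then the rest.
	for i, p := range dnatProbabilities(3) {
		fmt.Printf("-m statistic --mode random --probability %.4f  # rule %d\n", p, i)
	}
}
```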
Milestone 4 — Blue-Green Redeployment
Status: Done
Update running applications with zero downtime.
- Blue-green strategy: New containers start alongside old ones; old are torn down only after new deployment is healthy
- Automatic rollback on failure: If the new deployment fails, old containers keep running — no downtime
- Per-service deployment: Redeploy only specific services with `banyan-cli up -f banyan.yaml web api`
- Dependency validation: Per-service deploys validate `depends_on` — dependencies must be running or included
- `deploy` → `up` rename: The deploy command is now `banyan-cli up` (with `deploy` kept as an alias)
- Per-service `down`: Stop specific services with `banyan-cli down --name my-app web db`
See Redeployment for details.
Milestone 4.6 — Live Terminal Dashboard
Status: Done
Monitor the entire cluster from your terminal — no browser, no Grafana, no setup.
- `banyan-cli dashboard`: Live terminal UI built with Bubbletea showing real-time cluster state
- Overview screen: Engine health (CPU, memory, disk), cluster summary, agent table, deployment table, and recent events — all on one screen
- Agent and deployment drill-down: Select any agent or deployment to see detailed metrics, container status, service breakdown, and resource usage
- Container list: Flat view of every container across the cluster with status, image, agent, and replica info
- Command palette: Press `p` to fuzzy-search and jump between views
- Keyboard navigation: htop-style scrolling, vim keys (`j`/`k`), number keys to switch views, `Enter` to drill in, `Esc` to go back
- Floating overlays: Help and command palette float over the dashboard without hiding the underlying view
- Auto-refresh with configurable interval (`--refresh` flag, default 5s)
See CLI Reference — dashboard for details.
Milestone 5 — Production Readiness
Status: Done
Deploy with confidence: environment files for configuration, systemd services for reliability.
- `env_file` support: Reference `.env` files in the manifest (`env_file: .env` or `env_file: [.env, .env.local]`), matching Docker Compose syntax
- Variable loading: Parse key-value pairs from `.env` files and inject as container environment variables at deploy time
- File distribution: CLI bundles referenced `.env` files with the manifest so agents can resolve them on any node
- Systemd service files: Install script creates `banyan-engine.service` and `banyan-agent.service` for `systemctl enable --now` lifecycle management — auto-start on boot, restart on crash
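The two `env_file` forms follow Docker Compose; a minimal manifest sketch (service and image names are illustrative):

```yaml
services:
  web:
    image: nginx:alpine
    env_file: .env            # single file
  api:
    image: example/api:latest
    env_file:                 # list form; files are merged in order
      - .env
      - .env.local
```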
Milestone 6 — Resource-Aware Scheduling
Status: Done
Smarter task distribution based on node resources instead of simple round-robin.
- Agent resource reporting: Agents report CPU, memory, and disk usage to the engine via heartbeat (stored on NodeRecord in etcd)
- Resource-aware scheduling: Engine selects the agent with the most available memory when assigning tasks, tracking batch allocations to prevent piling tasks on one node
- Resource requests in manifest: Services can declare CPU and memory requirements via `deploy.resources` (e.g., `memory: 512m`, `cpus: "0.5"`)
- Default resource requests: Services without explicit requirements default to 512MB RAM and 1 CPU core for scheduling purposes
- Cluster capacity validation: Engine rejects deployments whose total resource requests exceed total cluster capacity
- Graceful fallback: When agents haven’t reported metrics yet (e.g., first heartbeat pending), scheduling falls back to round-robin
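A manifest sketch of the `deploy.resources` keys named above (service and image names are illustrative; the nesting follows the milestone's `deploy.resources` path):

```yaml
services:
  api:
    image: example/api:latest
    deploy:
      resources:
        memory: 512m
        cpus: "0.5"
  worker:
    image: example/worker:latest
    # no resources block: scheduled with the 512MB / 1 CPU default
```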
Milestone 7 — Multi-Engine High Availability
Status: Done
Run multiple engines for high availability. All engines are active — no leader, no standby. If one goes down, the others continue instantly.
- Active-active scheduling: All engines handle RPCs and run the scheduling loop. Per-deployment distributed locks in etcd prevent duplicate work.
- Instant scheduling: Deploy commands trigger scheduling immediately on the receiving engine, instead of waiting for a polling loop.
- Managed registry: Persistent OCI image storage via Distribution (Docker Registry v2) subprocess. Images survive engine restarts.
- Agent multi-endpoint failover: Agents configured with multiple engine addresses reconnect to the next available engine within seconds.
- CLI multi-endpoint failover: CLI tries each configured engine endpoint with a health check, connects to the first one that responds.
- External etcd + registry required: HA mode requires user-provided etcd cluster and OCI registry (managed services are single-process and can’t be shared).
- Zero-config single-engine preserved: Default single-engine mode is unchanged — no new configuration needed for existing users.
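How a multi-endpoint agent configuration might look (hypothetical keys, ports, and addresses; the actual schema is documented in the High Availability guide):

```yaml
# /etc/banyan/banyan.yaml on an agent node (hypothetical sketch)
agent:
  engines:               # tried in order; reconnects to the next on failure
    - 10.0.0.10:50051
    - 10.0.0.11:50051
```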
See High Availability for setup guide.
Milestone 8 — Volumes
Status: Done
Persistent storage for containers — named volumes, bind mounts, tmpfs, and NFS shared volumes. Same syntax as Docker Compose.
- Named volumes: Persistent local storage managed by the container engine. Data survives container restarts.
- Bind mounts: Mount host directories or files into containers. Absolute paths or relative to `/var/lib/banyan/data/` on each agent.
- tmpfs: In-memory temporary storage with optional size limits.
- NFS shared volumes: Declare NFS in the manifest, Banyan mounts it on each agent automatically. Multiple replicas on different agents share the same data.
- Read-only mounts: Append `:ro` or set `read_only: true` to prevent container writes.
- Placement + volumes: Pin stateful services to specific agents with `deploy.placement.node` to ensure data locality.
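A volumes sketch covering the mount types above (Compose-style syntax; service names, node names, and the NFS `driver_opts` keys are assumptions — see the Manifest Reference for the exact schema):

```yaml
services:
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data   # named volume
      - ./conf:/etc/postgresql:ro         # bind mount, read-only
    tmpfs:
      - /tmp:size=64m                     # in-memory scratch space
    deploy:
      placement:
        node: agent-1                     # pin for data locality (illustrative name)
volumes:
  pgdata:
  shared:
    driver_opts:                          # NFS shared volume (keys assumed)
      type: nfs
      o: addr=10.0.0.5,rw
      device: ":/exports/shared"
```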
See Manifest Reference — Volumes for syntax and examples.
Milestone 9 — Auto-Scaling & Workload Rebalancing
Status: Done
Automatic horizontal scaling based on CPU metrics, manual scaling via CLI, and workload rebalancing across agents.
- `banyan-cli scale`: Adjust replica counts on a running deployment without redeploying. Containers are added or removed individually — no blue-green, no new deployment ID.
- Auto-scaling rules in manifest: Define `deploy.autoscale` with `min`, `max`, `target_cpu`, and `cooldown`. Engine evaluates CPU metrics every 30 seconds and adjusts replicas automatically.
- Per-container metrics: Agents collect CPU and memory usage per container via `nerdctl stats` and report to the engine in health checks.
- Graceful scale-down: Removing containers follows a drain sequence — remove from proxy, remove DNS, wait grace period, then stop. No dropped requests.
- Workload rebalancing: Engine detects overloaded agents (CPU or memory > 95%) and migrates stateless containers to underloaded agents. Five safeguards prevent infinite migration: per-container cooldown (10 min), high threshold (95%), target validation, minimum imbalance (30%), and max one migration per agent per cycle.
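A `deploy.autoscale` sketch using the keys named above (values are illustrative and the cooldown unit is an assumption; check the Manifest Reference for the exact format):

```yaml
services:
  web:
    image: example/web:latest
    deploy:
      autoscale:
        min: 2            # never fewer than 2 replicas
        max: 10           # never more than 10
        target_cpu: 70    # scale to hold average CPU near 70%
        cooldown: 2m      # wait between scaling actions (unit assumed)
```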
See Auto-Scaling for the guide and Manifest Reference — Autoscale for syntax.
Milestone 10 — Web Monitoring Dashboard
Status: Done
Browser-based dashboard for teams that prefer a web UI over the terminal. Runs locally via the CLI — no separate server to deploy.
- `banyan-cli dashboard --web`: Starts a local web server and opens the dashboard in your browser. The web UI is embedded in the CLI binary — no npm, no Node.js, no separate process
- Per-page APIs: Each page fetches only the data it needs (ListAgents, ListContainers, ListDeployments, etc.) instead of one monolithic call. Lighter payloads, independent refresh rates, easier debugging
- Cluster overview: Stat cards for engines, agents, deployments, containers, and tasks. Recent events table
- Agent, deployment, and container detail pages: Click any row to drill into details. Cross-linked — click an agent name on a container row to jump to that agent
- Container log viewer: Fetch recent logs (configurable tail: 100/500/1000 lines), auto-refresh every 3 seconds, log level coloring, scroll-to-latest indicator
- CPU and memory metrics: CPU percentage with sparkline history, memory usage with progress bars, per-container and per-agent
- Command palette: `Ctrl+K` to search across pages, agents, deployments, and containers. Keyboard navigation
- Dark and light themes: Dark mode by default (matches terminal aesthetic), toggle with one click
- Design system: Geist typography, Lucide icons, color tokens matching the TUI palette. Terminal-native aesthetic, not generic SaaS
- Systemd-ready: Run as a service behind nginx/caddy for team-wide access
The terminal dashboard (`banyan-cli dashboard`) remains available for users who prefer the terminal — see Milestone 4.6.
Milestone 11 — Secrets Management
Status: Done
Manage sensitive configuration — database passwords, API keys, tokens — without plaintext in manifests or source control.
- `banyan-cli secret` commands: `create`, `list`, `get` (with `--reveal`), `delete` for managing encrypted secrets
- AES-256-GCM encryption: Secrets encrypted at rest in etcd with a 256-bit key stored on the engine (`/etc/banyan/keys/secrets.key`)
- Manifest `secrets:` field: Reference secrets by name — injected as environment variables into containers at runtime
- Just-in-time resolution: Secret values never stored in task records. Decrypted only during PollTasks (in-memory), transmitted over WireGuard
- Deploy-time validation: Deploying with a missing secret fails immediately with an actionable error
- Delete protection: Secrets referenced by running deployments cannot be deleted
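A secrets sketch combining the CLI and manifest sides (the `create` argument order and the manifest nesting are assumptions based on the bullets above):

```yaml
# First: banyan-cli secret create DB_PASSWORD   (argument syntax assumed)
services:
  api:
    image: example/api:latest
    secrets:
      - DB_PASSWORD       # injected as an environment variable at runtime
```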
See Secrets for the guide and Manifest Reference — Secrets for syntax.
Milestone 12 — Self-Healing Deployments
Status: Done
Automatic failure recovery through a desired-state reconciliation engine. Banyan checks every 10 seconds that reality matches your manifest and repairs any drift.
- Reconciliation loop: Three controllers (Agent, Container, Deployment) run in sequence every 10 seconds. Container crashes are restarted, dead agents get their work rescheduled, deployment health is computed automatically.
- Restart policy enforcement: Respects the Docker Compose `restart:` field — `always` (default), `on-failure`, `on-failure:N` (with retry limit), `unless-stopped`, `no`. Exponential backoff prevents restart storms.
- Agent failure rescheduling: When an agent dies, its containers are rescheduled to healthy agents after a grace period (2 min standard, 5 min for long-running agents). Safeguards: anti-flapping cooldown, capacity checks, stateful pinning for services with local volumes.
- Deployment health status: Each deployment is `healthy`, `recovering`, `degraded`, or `stopped` — visible in CLI, TUI dashboard, and web dashboard.
- Engine restart recovery: The reconciliation loop starts within 10 seconds of the engine coming back up. Agents reconnect, report their state, and the reconciler repairs any drift.
- Agent lifecycle cleanup: Graceful shutdown cleans up WireGuard, iptables, DNS, and CNI. Stale interface recovery on startup ensures clean networking regardless of prior state.
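A restart-policy sketch using the Compose-compatible values listed above (service and image names are illustrative):

```yaml
services:
  web:
    image: example/web:latest
    restart: always          # the default: always restart on exit
  batch:
    image: example/batch:latest
    restart: on-failure:3    # retry at most 3 times, with exponential backoff
  cache:
    image: redis:7
    restart: unless-stopped  # restart unless explicitly stopped
```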
See How Banyan Works for the mental model behind reconciliation.
Milestone 13 — Advanced Security
Authorization and certificate lifecycle management.
- Attribute-based access control (ABAC) for CLI commands and API actions — define roles and permissions in a config file, enforce in engine gRPC handlers
- Certificate rotation support
Milestone 14 — Advanced Networking
Service discovery, traffic policies, and encrypted communication across the cluster.
- Health-check-based routing: Only route to healthy containers — health status is already tracked via `healthcheck:` in the manifest; next step is filtering backends by health status in HeartbeatResponse
- Session affinity: Optional sticky sessions per service using the iptables `recent` module or connection tracking (`session_affinity: true` in banyan.yaml)
- Network policies: Control which services can communicate — iptables rules on each agent to filter traffic between service subnets (service-level allow/deny in banyan.yaml)
- VPC peering: Allow explicit cross-deployment communication — deployments are isolated by default (per-deployment iptables chains); VPC peering lets users define exceptions so specific services in one deployment can reach services in another (e.g., a shared database deployment)
- Ingress / L7 routing: HTTP path/host-based routing via a lightweight reverse proxy (Caddy or Envoy) auto-configured from service definitions
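Since Milestone 14 is still planned, manifest syntax here is speculative; the session-affinity bullet suggests something like:

```yaml
# Hypothetical — not yet implemented
services:
  web:
    image: example/web:latest
    session_affinity: true   # sticky sessions via the iptables `recent` module or conntrack
```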
Milestone 15 — Rootless CLI
Remove the sudo requirement from banyan-cli. Engine and agent need root (they manage containers, networking, and system services) but the CLI is a user tool — it should work without elevated privileges.
- User-space config: Move CLI config from `/etc/banyan/banyan.yaml` (root-owned) to `~/.config/banyan/config.yaml` (user-owned). `banyan-cli init` writes to the user dir. Engine/agent config stays in `/etc/banyan/`.
- Userspace WireGuard: Replace kernel WireGuard (`wg-ctl-cli` interface, requires root) with a userspace implementation (e.g., `wireguard-go` or embedded Go WireGuard via `golang.zx2c4.com/wireguard`). No kernel interface, no root needed. The tunnel runs in-process for the duration of the CLI command.
- No sudo for any CLI command: `banyan-cli up`, `dashboard`, `logs`, `scale`, `down`, `secret` — all work as a normal user.
- Migration path: `banyan-cli init` detects existing `/etc/banyan/` config and offers to migrate the CLI section to `~/.config/banyan/`. Existing root-based setups keep working.
- Key storage: CLI private key moves to `~/.config/banyan/keys/cli.key` with `0600` permissions (user-owned, not root-owned).
- `banyan-cli login`: No longer needs sudo — sets up the userspace WireGuard tunnel in the background or per-command.
Milestone 16 — Dashboard: Manifest Editor & Container Exec
Extend the web dashboard from a monitoring tool into a deployment interface.
- Compose manifest editor: Edit docker-compose.yaml directly in the web dashboard with syntax highlighting, validation, diff preview, and one-click deploy. Turns the dashboard from an operations tool into a deployment interface — the Vercel/Netlify moment for container orchestration
- Terminal-in-browser: WebSocket terminal into any running container directly from the web dashboard. Click a container, click “Shell”, get an interactive terminal. Requires a new `ExecContainer` RPC, agent exec capability (`nerdctl exec`), a WebSocket proxy (xterm.js), and a security model (RBAC needed before allowing exec permissions)
- TUI/Web feature parity policy: Define whether the TUI dashboard is kept in feature-sync with the web, allowed to diverge, or eventually deprecated. Depends on real user feedback after both dashboards ship