High Availability¶
DRP v4.6+ supports fully integrated High Availability (HA) using Raft consensus with automated failover. An HA cluster consists of an odd number of DRP endpoints (minimum three) that replicate all writable data between members. The cluster presents a single virtual IP address to clients; if the active node fails, a passive node takes over automatically without manual intervention. HA is a licensed enterprise feature — your license must include the HA entitlement and must list every endpoint ID and the cluster's HA ID.
What HA Provides¶
- Automated failover via Raft consensus with no manual intervention required.
- Full data replication: files, ISOs, job logs, content bundles, plugins, and all writable model data are replicated to all members.
- Single virtual IP for all client traffic, managed by DRP's internal VIP assignment (via gratuitous ARP) or an external load balancer.
- Online node add/remove using drpcli system ha enroll without stopping the cluster.
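The failover behavior above can be sketched as a small simulation. All names here are illustrative; the real election is Raft inside dr-provision, not this toy logic:

```python
# Toy simulation of HA failover: one active node serves clients (and
# would hold the virtual IP); when it fails, a surviving passive node
# is promoted, provided the remaining members still form a majority
# (quorum) of the original cluster size.

class Cluster:
    def __init__(self, node_ids):
        self.nodes = {n: "passive" for n in node_ids}
        self.total = len(node_ids)          # membership size is fixed
        self.active = node_ids[0]
        self.nodes[self.active] = "active"

    def quorum(self):
        return self.total // 2 + 1          # strict majority

    def fail_active(self):
        """Active node dies; promote a passive if quorum survives."""
        del self.nodes[self.active]
        if len(self.nodes) >= self.quorum():
            self.active = next(iter(self.nodes))
            self.nodes[self.active] = "active"
        else:
            self.active = None  # no quorum: cluster stops serving writes

cluster = Cluster(["drp-a", "drp-b", "drp-c"])
cluster.fail_active()
print(cluster.active)  # a surviving passive node has taken over
```

A three-node cluster survives the first failure because two of three members remain; a second failure leaves only one member, below quorum, so no node can become active.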
Cluster Model¶
Every cluster has one active node (the Raft leader, which handles all API and client requests) and one or more passive nodes that receive replicated data and stand ready to become active. All nodes run identical dr-provision binaries with identical configuration flags; the only per-node differences are the ConsensusAddr, VirtInterface, and ApiUrl settings stored in ha-state.json.
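As an illustration, the per-node portion of ha-state.json might look roughly like the fragment below. Only the field names ConsensusAddr, VirtInterface, and ApiUrl come from the text above; the addresses, ports, and interface name are hypothetical, and the real file contains additional fields:

```json
{
  "ConsensusAddr": "192.168.100.11:8093",
  "VirtInterface": "eth0",
  "ApiUrl": "https://192.168.100.11:8092"
}
```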
Consensus requires an odd number of members so that a quorum can always be determined after any single-node failure. A single-node cluster is valid as a bootstrapping step before additional nodes are enrolled.
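The odd-member requirement follows from simple majority arithmetic, sketched here:

```python
# Quorum arithmetic behind the odd-member rule: a cluster of n members
# needs a strict majority (n // 2 + 1) to elect a leader, so it can
# tolerate n - quorum failures. Going from an odd count to the next
# even count raises the quorum without raising fault tolerance.

def quorum(n: int) -> int:
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    return n - quorum(n)

for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
# 3 nodes tolerate 1 failure; 4 nodes still tolerate only 1 failure,
# which is why clusters are built with odd member counts.
```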
What Is Replicated¶
Raft replication covers all data written through the DRP API: machine records, profiles, params, stages, workflows, bootenvs, plugins, content bundles, files, ISOs, job logs, and preferences. The replication protocol uses synchronous log replay over mutually authenticated TLS connections between cluster members, so disk write speed on each member sits directly in the API write path. Back each node with fast SSD or NVMe storage to avoid write-speed bottlenecks.
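The effect of synchronous replay is that a write is not acknowledged to the client until enough members have persisted it. A toy sketch of that commit rule (hypothetical names and structure, not DRP's actual implementation):

```python
# Toy model of majority-ack commit: the leader persists an entry, sends
# it to all followers, and commits only once a majority of the cluster
# (leader included) has durably stored it. Slow follower disks therefore
# gate client write latency, which is why SSD/NVMe backing is advised.

def replicate(entry, followers, cluster_size):
    """Return True if the entry commits. `followers` is a list of
    callables that persist the entry and return True on success."""
    acks = 1  # the leader has already persisted the entry locally
    for persist in followers:
        if persist(entry):
            acks += 1
    return acks >= cluster_size // 2 + 1

ok = lambda entry: True       # healthy follower that persists the entry
down = lambda entry: False    # failed or unreachable follower

print(replicate("update machine/1", [ok, down], 3))    # 2 of 3 acked
print(replicate("update machine/1", [down, down], 3))  # only the leader
```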
Sub-pages¶
- Configure — How to bootstrap a consensus cluster, enroll additional nodes, and configure the virtual IP, including step-by-step enrollment commands.
- Status — How to inspect cluster health, interpret HA state fields, and diagnose failover issues.
Limitations¶
- HA is incompatible with shared storage (NFS, iSCSI, DRBD) between cluster members.
- Self-runner functionality must be disabled on cluster nodes.
- Nodes must be on the same layer 2 network when using DRP's internal VIP management without an external load balancer.