High Availability

Digital Rebar v4.6+ supports fully integrated High Availability (HA) in dr-provision via automated failover using consensus (Raft) and liveness checking. This capability is built into the standard Digital Rebar service, so an in-place upgrade from Stand Alone mode is possible in v4.6+. An HA-enabled license is required.

Manual failover via synchronous replication has been supported since v4.3 and will remain available for the foreseeable future to support live backups, multi-site manager and other distributed operations. This can also be used for HA configuration but is not recommended.

Training Video: https://www.youtube-nocookie.com/embed/w4PBZsYr3zE

Prerequisites¶

There are a few conditions that need to be met in order to set up an HA cluster of dr-provision nodes:

A fast network between the nodes you want to run dr-provision on. Data replication between HA nodes uses synchronous log replay and file transfer, so a fast network (at least gigabit Ethernet) is required to avoid introducing too much lag into the system as a whole.
Enough storage space on each node to store a complete copy of all replicated data. dr-provision will wind up replicating all file, ISO, job log, content bundle, plugin, and writable data to all nodes participating in the HA cluster. The underlying storage should be fast enough that write speeds do not become a bottleneck -- we recommend backing the data with a fast SSD or NVME device.
A high-availability entitlement in your license. High-availability is a licensed enterprise feature. If you are not sure if your license includes high-availability support, contact support@rackn.com.
All endpoints and the cluster ID must be registered in your license.
A virtual IP address that client and external traffic can be directed to. If using dr-provision's internal VIP address management (i.e. not using the --load-balanced command line option), dr-provision will handle adding and removing the virtual IP from a specified interface and sending out gratuitous ARP packets on failover events, and the nodes forming the HA cluster must be on the same layer 2 network. If using an external load balancer, then the virtual IP must point to the load balancer, and that address will be used by everything outside the cluster to communicate with whichever cluster node is the active one.
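
The gigabit requirement above can be checked on each candidate node before enrolling it. The following is a minimal sketch, assuming a Linux host where the kernel exposes the negotiated link speed (in Mb/s) in /sys/class/net/<interface>/speed. The function name link_speed_ok and the 1000 Mb/s default are illustrative choices and are not part of dr-provision.

```shell
#!/usr/bin/env bash
# link_speed_ok IFACE [MIN_MBPS] [SYSFS_NET_DIR]
# Prints "ok", "slow", or "unknown" for the interface's negotiated speed.
# The third argument exists only so the check can be pointed at a test tree.
link_speed_ok() {
    local iface=$1 min=${2:-1000} root=${3:-/sys/class/net}
    local speed
    speed=$(cat "$root/$iface/speed" 2>/dev/null) || { echo "unknown"; return 2; }
    case $speed in
        # Virtual NICs often report -1 or nothing at all.
        ''|*[!0-9]*) echo "unknown"; return 2;;
    esac
    if [ "$speed" -ge "$min" ]; then
        echo "ok ${speed}Mb/s"
    else
        echo "slow ${speed}Mb/s"
        return 1
    fi
}
```

For example, link_speed_ok ens34 checks the interface used in the enrollment examples later in this document. An "unknown" result is common on virtual machines and means the speed has to be verified some other way.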

Consensus HA¶

Consensus is the feature used to implement high availability with automated failover. This mode requires that you have at least 3 servers in a cluster. More servers in a cluster are also permitted, but there must be an odd number to prevent the cluster from deadlocking in the case of communication failures that isolate half of the cluster from the other half. Consensus also requires a stable IP address:port that can be used for the replication protocol.
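
The odd-number rule follows from how Raft forms a quorum: a strict majority of members must agree. The sketch below works through that arithmetic for a few cluster sizes. The function name quorum_info is illustrative only.

```shell
#!/usr/bin/env bash
# quorum_info N
# Prints the majority size and the number of member failures an N-node
# Raft cluster can survive while still holding a quorum.
quorum_info() {
    local n=$1
    local quorum=$(( n / 2 + 1 ))      # strict majority
    local tolerated=$(( n - quorum ))  # members that may fail
    echo "nodes=$n quorum=$quorum tolerated_failures=$tolerated"
}

for n in 3 4 5; do
    quorum_info "$n"
done
# nodes=3 quorum=2 tolerated_failures=1
# nodes=4 quorum=3 tolerated_failures=1
# nodes=5 quorum=3 tolerated_failures=2
```

A fourth node raises the quorum size without raising the number of tolerated failures, and an even split leaves neither half with a majority, which is why even cluster sizes are not useful.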

Synchronous Replication (obsolete and unsupported)¶

Synchronous Replication is the feature used to implement streaming realtime backups. In Digital Rebar v4.3.0 through v4.6.0, it was also used to implement HA with manual failover. It will continue to be present going forward as the realtime backup protocol, but consensus with automated failover is the supported path for high availability.

Contraindications¶

The high-availability support code in dr-provision assumes a model where all of the following hold:

There is a single IP available for the HA cluster. This requires one of the following two items:
- The machines are in the same layer 2 broadcast domain to allow for moving the HA IP address via gratuitous ARP.
- An external load balancer is responsible for holding the virtual IP address and directing all traffic to the current active node.
The writable storage that dr-provision uses is not shared (via NFS, iSCSI, DRBD, or any similar mechanism) between servers running dr-provision.
There is an odd number of servers in the cluster, with 3 or more required for automated failover. Single-node clusters are allowed, and are a necessary step in bootstrapping to a full HA cluster.

If any of the above is not true, then you cannot use dr-provision in high-availability mode.

If you are running on separate broadcast domains, you will need to either ensure that there is an alternate mechanism for ensuring that packets destined for the HA IP address get routed correctly, or accept that provisioning operations will fail from the old network until the clients are informed of the new IP address.
If you are using a shared storage mechanism (NFS, DRBD, iSCSI, or some form of distributed filesystem), then you should not use our integrated HA support, as it will lead to data corruption. You should also make sure you never run more than one instance of dr-provision on the same backing data at the same time, or the data will be corrupted.
Self-runner functionality and everything that depends on it must be disabled. Having a persistent agent running on a cluster node that can vanish or switch from active to passive at any time will interact badly if failover happens while the self-runner is performing tasks. This shortcoming will be addressed in future versions of dr-provision.
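
As a rough pre-flight check for the shared-storage contraindication, the sketch below reports whether a data directory sits on a network or cluster filesystem. It assumes a Linux host with findmnt (util-linux). The default path /var/lib/dr-provision, the function names, and the list of filesystem types are assumptions for illustration. It cannot detect block devices shared underneath a local filesystem (iSCSI or DRBD carrying ext4 or xfs, for example), so a clean result is not proof that storage is unshared.

```shell
#!/usr/bin/env bash
# Return 0 if the given filesystem type is a network or cluster filesystem.
# The list is illustrative, not exhaustive.
is_shared_fstype() {
    case $1 in
        nfs|nfs4|cifs|smb3|ceph|glusterfs|fuse.glusterfs|fuse.sshfs|lustre|gfs2|ocfs2)
            return 0;;
        *)  return 1;;
    esac
}

# check_data_dir [DATA_DIR]
# Looks up the filesystem type backing DATA_DIR and warns if it is shared.
# The default path is an assumption; pass the real data root explicitly.
check_data_dir() {
    local dir=${1:-/var/lib/dr-provision} fstype
    fstype=$(findmnt -n -o FSTYPE --target "$dir") || return 2
    if is_shared_fstype "$fstype"; then
        echo "WARNING: $dir is on $fstype; do not use integrated HA here"
        return 1
    fi
    echo "$dir is on $fstype"
}
```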

It is possible to use shared storage that replicates extended attributes for the tftproot space. This will reduce transfer times for replication, but only some distributed filesystems or shared devices support extended attribute sharing.

Configuration¶

Aside from DRPID and the settings listed later in this section, configuration flags and startup options for the dr-provision services participating in an HA cluster should be identical. The servers participating in the HA cluster are not required to run identical versions, but they must run on the same OS and system architecture. If you try to add a server with an incompatible version to a cluster, the attempt will fail with an error telling you how to resolve the issue.

Required Licensing¶

Before installing, ensure that the High Availability flag is set on your RackN license. You must install the license on your first endpoint.

While installed licenses are not required to join other endpoints, each endpoint and the HA ID must be listed in the installed license. Consult Licensing for help with adding endpoints to a RackN license.

Minimum Cluster Size¶

Consensus clusters require an odd number of endpoints so that an unambiguous quorum can always be established. High availability requires at least 3 endpoints. Running a single-node consensus cluster is permitted, and it is a necessary step in creating a highly available cluster.

Initial State¶

When building a high availability system, you will start from a regularly installed Digital Rebar v4.6+ endpoint. A regular installation leaves the endpoint running in "stand alone" mode with High Availability disabled. No specialized configuration flags are required of the endpoints for the cluster.

For the first endpoint enrolled into the cluster, all data and configuration will be preserved. This process will set the HA ID and virtual IP or Load Balancer configuration.

For the subsequent endpoints enrolled into the cluster, all data and configuration will be overwritten during the enrollment process. For this reason, only a minimal configuration is recommended for the added endpoints.

Reminder: each endpoint in a cluster and the cluster itself must have a unique ID.

Bootstrapping Consensus¶

You can bootstrap, add nodes to, and remove nodes from a consensus cluster using drpcli without needing to stop nodes for manual reconfiguration or mess with systemd config files. This is the preferred method of high availability.

Self-enroll the initial active node¶

To start the initial active node, run drpcli system ha enroll to have it enroll itself into a single node cluster. The form of the command to run is as follows:

drpcli system ha enroll $RS_ENDPOINT username password \
    ConsensusAddr address:port \
    VirtInterface interface \
    VirtInterfaceScript /path/to/script \
    HaID ha-identifier \
    LoadBalanced true/false \
    VirtAddr virtualaddr

The last 3 of those settings can only be specified during self-enroll. You also can only specify VirtInterface and VirtInterfaceScript if LoadBalanced is false.

If any errors are returned during that call, they should be addressed and the command retried. Once the command finishes without error, the chosen system will be in a single node cluster that is ready to have other nodes added to it.

Adding additional nodes¶

To add additional nodes to an existing cluster, you also use drpcli system ha enroll against the current active node in that cluster:

drpcli system ha enroll https://ApiURL_of_target target_username target_password \
    ConsensusAddr address:port \
    VirtInterface interface \
    VirtInterfaceScript /path/to/script

This will get the global HA settings from the active node in the cluster, merge those settings with the per-node settings from the target node and the rest of the settings passed in on the command line, and direct the target node to join the cluster using the merged configuration.

NOTE The current data on the target node will be backed up, and once the target node has joined the cluster it will mirror all data from the existing cluster. All backed up data will be inaccessible from that point.

A basic example follows, using sample IP addresses and the default username and password.

Initial Leader -- 192.168.1.10
Member 1 -- 192.168.1.20
Member 2 -- 192.168.1.30
Shared VIP -- 192.168.1.100

On leader node:

drpcli system ha enroll https://192.168.1.10:8092 rocketskates r0cketsk8ts \
    ConsensusAddr 192.168.1.10:8093 \
    VirtInterface ens34 \
    VirtInterfaceScript /usr/local/bin/drp_virt_interface.sh \
    HaID drp_ha_test \
    LoadBalanced false \
    VirtAddr 192.168.1.100/24

This will join the Initial Leader node into a single node cluster.

On member node 1:

export RS_ENDPOINT=https://192.168.1.100:8092
drpcli system ha enroll https://192.168.1.20:8092 rocketskates r0cketsk8ts \
    ConsensusAddr 192.168.1.20:8093 \
    VirtInterface ens34 \
    VirtInterfaceScript /usr/local/bin/drp_virt_interface.sh

This will set our endpoint that drpcli will communicate with to the VIP, then join member node 1 into the cluster.

On member node 2:

export RS_ENDPOINT=https://192.168.1.100:8092
drpcli system ha enroll https://192.168.1.30:8092 rocketskates r0cketsk8ts \
    ConsensusAddr 192.168.1.30:8093 \
    VirtInterface ens34 \
    VirtInterfaceScript /usr/local/bin/drp_virt_interface.sh

This also sets the endpoint drpcli will communicate with to our VIP, then it will join member 2 into the cluster making it a 3 node cluster.

You can verify this worked with:

export RS_ENDPOINT=https://192.168.1.100:8092
drpcli system ha dump|jq .Nodes

The 3 nodes should be listed in the output, with all of the settings.
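
Listing the nodes shows membership, but not whether a passive node is ready to take over yet. The sketch below polls drpcli system ha failOverSafe until it reports true. It assumes that the command prints a bare true or false on standard output and that RS_ENDPOINT and credentials are already set in the environment. The function name wait_failover_safe and its retry defaults are illustrative.

```shell
#!/usr/bin/env bash
# wait_failover_safe [ATTEMPTS] [DELAY_SECONDS]
# Polls the cluster until it reports that another node can take over.
# Assumption: "drpcli system ha failOverSafe" prints "true" or "false".
wait_failover_safe() {
    local attempts=${1:-30} delay=${2:-2} i out
    for (( i = 1; i <= attempts; i++ )); do
        out=$(drpcli system ha failOverSafe 2>/dev/null)
        if [ "$out" = "true" ]; then
            echo "failover safe after $i check(s)"
            return 0
        fi
        sleep "$delay"
    done
    echo "not failover safe after $attempts check(s)" >&2
    return 1
}
```

Running wait_failover_safe after enrolling the last member gives a simple gate before relying on the cluster, since newly joined nodes first have to mirror all data from the existing cluster.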

Other consensus commands¶

drpcli system ha has several other commands that you can use to examine the state of consensus on a node.

drpcli system ha active will get the Consensus ID of the node that is currently responsible for all client communication in a consensus cluster. It is possible for this value to be unset if the active node has failed and the cluster is deciding on a new active node.
drpcli system ha dump will dump the complete state of the cluster.
drpcli system ha failOverSafe will return true if there is at least one node in the cluster that can take over if the active node fails, and it will return false otherwise. You can pass a time to wait (up to 5 seconds) for the cluster to be failover safe as an optional argument.
drpcli system ha id returns the Consensus ID of the node you are talking to.
drpcli system ha leader returns the Consensus ID of the current leader of the Raft cluster. This can be different from the active ID if the cluster is in the middle of determining which cluster member is best suited to handling external cluster traffic.
drpcli system ha peers returns a list of all known cluster members.
drpcli system ha state returns the current HA state of an individual node.
drpcli system ha remove will remove a node from the cluster using its Consensus ID (not its DRP ID!).
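
The active and leader commands can be combined into a quick settling check. This is a minimal sketch: it compares the raw output of the two commands as strings, so it does not depend on how the IDs are formatted, but it does assume that an unset active node shows up as empty output or an empty JSON string. The function name ha_summary is illustrative.

```shell
#!/usr/bin/env bash
# ha_summary
# Prints the active node and Raft leader IDs and says whether they agree.
# Returns non-zero while the cluster is still settling.
ha_summary() {
    local active leader
    active=$(drpcli system ha active)
    leader=$(drpcli system ha leader)
    echo "active=$active leader=$leader"
    # Assumption: an unset active node is reported as nothing or as "".
    if [ -z "$active" ] || [ "$active" = '""' ]; then
        echo "no active node yet (cluster may be choosing one)"
        return 1
    elif [ "$active" != "$leader" ]; then
        echo "active node and Raft leader differ (cluster may be settling)"
        return 1
    fi
    echo "active node is also the Raft leader"
}
```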

Troubleshooting¶

Network Interface Locked¶

It is possible for the HA interface to become locked if you have to stop and restart the service during configuration testing. To clear the interface, use ip addr del [ha ip] dev [ha interface].

This happens because Digital Rebar is attaching to (and detaching from) the cluster IP. If this process is interrupted, then the association may not be correctly removed.
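
The cleanup can be wrapped so that the address is only deleted when it is actually still bound. The following is a sketch under stated assumptions: it handles IPv4 addresses only, expects the address in the same address/prefix form used for VirtAddr, must be run as root, and should only be run on a node where dr-provision is stopped. The function name clear_stale_vip is illustrative.

```shell
#!/usr/bin/env bash
# clear_stale_vip IFACE ADDR/PREFIX
# Removes a leftover virtual IP from IFACE if it is still bound there.
clear_stale_vip() {
    local iface=$1 addr=$2
    # "ip -o" prints one line per address, e.g. "... inet 192.168.1.100/24 brd ..."
    if ip -o -4 addr show dev "$iface" | grep -qF "inet $addr "; then
        ip addr del "$addr" dev "$iface"
        echo "removed $addr from $iface"
    else
        echo "$addr not present on $iface"
    fi
}
```

With the values from the enrollment example, the call would be clear_stale_vip ens34 192.168.1.100/24.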

Tracking HA State: ha-state.json¶

Note

This is a system managed file, do not edit it manually!

As of version 4.6.0, the ha-state.json file will be the proxy Source of Truth for all high availability settings. Settings in ha-state.json take precedence over any from the commandline or environment, and they will be automatically updated as conditions change as a result of HA-related API requests and general cluster status changes.

This section describes the meaning of the components of this state file.

A sample ha-state.json looks like this:

{
  "ActiveUri": "",
  "ApiUrl": "",
  "ConsensusAddr": "",
  "ConsensusEnabled": false,
  "ConsensusID": "ab0f7bec-5c48-45c3-8970-b3543ec2e9d4",
  "ConsensusJoin": "",
  "Enabled": false,
  "HaID": "",
  "LoadBalanced": false,
  "Observer": false,
  "Passive": false,
  "Roots": [],
  "Token": "",
  "Valid": true,
  "VirtAddr": "",
  "VirtInterface": "",
  "VirtInterfaceScript": ""
}
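
Because the file must not be edited by hand, any inspection should be read-only. The sketch below uses jq to summarize the boolean fields shown in the sample above. The location of ha-state.json is not stated here, so the path is passed in as an argument. The function name ha_state_summary is illustrative.

```shell
#!/usr/bin/env bash
# ha_state_summary PATH_TO_HA_STATE_JSON
# Read-only summary of the main boolean fields; never writes to the file.
ha_state_summary() {
    jq -r '"enabled=\(.Enabled) consensus=\(.ConsensusEnabled) passive=\(.Passive) valid=\(.Valid)"' "$1"
}
```

For the sample file shown above, this prints enabled=false consensus=false passive=false valid=true, which is what a freshly installed stand-alone endpoint would be expected to report.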

ActiveUri¶

ActiveUri is the URL that external services and clients should use to talk to the dr-provision cluster. It is automatically populated when a cluster is created via the self-enroll API, and cannot be changed afterwards. It will be the same on every cluster member.

ApiUrl¶

ApiUrl is the URL used to contact the current node. It is automatically populated on every start of the current node. It is specific to an individual node.

ConsensusAddr¶

ConsensusAddr is the address:port that all consensus traffic will go over on this node. It is initially populated by the self-enroll API, and cannot be changed afterwards.

ConsensusEnabled¶

ConsensusEnabled indicates whether this node is a member of a consensus cluster. It is automatically set to true when ConsensusAddr is not empty.

ConsensusID¶

ConsensusID is set when loading an invalid ha-state.json for the first time, and cannot be changed. It is used by the consensus protocol to uniquely identify a node.

ConsensusJoin¶

ConsensusJoin is the URL for the current consensus cluster leader, if any. It is automatically updated by the consensus replication protocol, and must not be manually edited.

Enabled¶

Enabled is set when either form of high availability is enabled on this node. It corresponds to the --ha-enabled command line option.

HaID¶

HaID is the shared high-availability ID of the cluster. This setting must be the same across all members participating in a cluster, and in a consensus cluster that is enforced by the consensus protocol. It corresponds to the --ha-id commandline option.

LoadBalanced¶

LoadBalanced indicates that the HA address is managed by an external load balancer instead of by dr-provision. This setting must be the same across all members participating in a cluster, and in a consensus cluster that is enforced by the consensus protocol. It corresponds to the --ha-load-balanced command line option.

Observer¶

Observer is set aside for future use to indicate that this node cannot become the active node in the cluster. It will be functional in a future release of dr-provision.

Passive¶

Passive indicates that this node is not the active node in the cluster. This field is kept updated by consensus, and is read-only.

Roots¶

Roots is the list of current trust roots for the consensus protocol. All consensus traffic is secured via TLS 1.3 mutual authentication, and the self-signed certificates in this list are used as the trust roots for that mutual auth process. Individual trust roots are valid for 3 months, and are rotated every month.

Token¶

Token is the authentication token that can be used for nodes participating in the same cluster to talk to each other's APIs. Token is automatically managed and will be rotated on a regular basis.

Valid¶

Valid indicates that the state stored in ha-state.json is valid. If the state is not valid, it is populated with matching parameters from the command line options; otherwise, it takes precedence over command line options.

VirtAddr¶

VirtAddr is the address that all external traffic should use to communicate with the cluster. It is initially populated on self-enroll, and cannot change afterwards.

VirtInterface¶

VirtInterface is the name of the network interface that VirtAddr will be added or removed from when LoadBalanced is false. It is unique to each node, and corresponds to the --ha-interface commandline option.

VirtInterfaceScript¶

This is the name of the script that will be run whenever VirtAddr needs to be added to or removed from VirtInterface. It is specific to each node, and corresponds to the --ha-interface-script commandline option. If empty, dr-provision will use an internal default when LoadBalanced is false. VirtInterfaceScript must be populated if LoadBalanced is true, and it must perform whatever actions are needed to bind the address and ensure that traffic for the cluster gets to the appropriate node.

If you want to customize this setting, you can use this example as a starting point:

#!/usr/bin/env bash
# $1 is the action to perform.  "add" and "remove" are the only ones supported for now.
# $2 is the network interface to operate on.  
# $3 is the address to add or remove.
case $1 in
    add)    sudo ip addr add "$3" dev "$2";;
    remove) sudo ip addr del "$3" dev "$2";;
    *) echo "Unknown action $1"; exit 1;;
esac