Proxmox Cluster Operations Fail with WorkOrder failed
When attempting to create new, scale up, or scale down a Digital Rebar Platform (DRP) managed Cluster that is utilizing the Proxmox cloud-wrappers capability, the operation may fail if the Resource Broker is not configured with the correct Proxmox Hypervisor node name. This results in Terraform Apply failures in the Resource Broker, which are also reflected in the Cluster operations failing.
Ultimately, this may be the result of the Param proxmox/node in the Resource Broker configuration not matching the Proxmox Hypervisor's defined node name.
Symptom¶
When attempting to create, expand, or contract (reduce/destroy) a Cluster that utilizes a Proxmox Resource Broker, the operation fails with a Cluster Job Log (Activity) error similar to:
This results in the Cluster operation being unable to complete successfully.
Problem¶
While there are several possible reasons for the failure, one is that the Resource Broker configuration contains a proxmox/node value that prevents the underlying Terraform Provider from completing its work. To confirm this, use the Cluster Job Log (Activity) to find the matching Resource Broker Job Log, then review the failure messages in the Resource Broker's Job Log.
The Cluster Job Log will contain a link to the Work_Order that was initiated on the Resource Broker, similar to:
# note this is a reference only, the Work_Order ID will be different
ux://work_orders/1ef5bfcd-bf90-623b-9670-bb6428abb326/activity
Clicking on this link shows the Job Log for the Work_Order that was executed on the Resource Broker. Alternatively, go directly to the Resource Broker and review its Activity Logs; the most recent failed entry is the relevant one, provided no other operations have been attempted via the Cluster or Resource Broker since the original failure.
The following errors may be observed in this case:
Error: error creating VM: 596 error:0A000086:SSL routines::certificate verify failed, error status:
...snip...
Error: Plugin did not respond
...snip...
The plugin encountered an error, and failed to respond to the
plugin.(*GRPCProvider).ApplyResourceChange call. The plugin logs may contain
more details.
...snip...
ERROR: Terraform did not succeed - fail
This error can occur regardless of the value configured for the Param proxmox/tls-insecure.
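If the Resource Broker Job Log has been saved to a file, the error signatures shown above can be searched for mechanically. The following is a minimal sketch: the function name job_log_symptoms and the file name in the usage comment are hypothetical, and the match strings are copied from the example errors in this article.

```shell
# Hypothetical helper: print each known error signature found in a saved
# Job Log file. Returns 0 if at least one signature matched, 1 otherwise.
job_log_symptoms() {
    logfile="$1"
    found=1
    for sig in 'certificate verify failed' \
               'Plugin did not respond' \
               'Terraform did not succeed'; do
        # -F treats the signature as a fixed string, not a regular expression
        if grep -qF -- "$sig" "$logfile"; then
            echo "found: $sig"
            found=0
        fi
    done
    return "$found"
}

# Example usage; broker-job.log is an assumed file name.
# job_log_symptoms broker-job.log
```

A match only indicates that the log resembles the failure described here; the proxmox/node value still needs to be checked against the Hypervisor's node name.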
Ultimately, the backing Terraform Plugin Provider and the Proxmox API operations must use the Proxmox Hypervisor's configured node name. The DRP Endpoint must also be able to resolve that node name in DNS for the Resource Broker to be able to make the API calls successfully.
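The name resolution requirement can be checked from a shell on the DRP Endpoint. This is a minimal sketch, assuming the getent utility is available on that host; the function name node_resolves is hypothetical, and sm-mach-03 is the example node name used in the Solution section.

```shell
# Hypothetical helper: succeeds if the given name resolves on this host.
# getent follows the system resolver order (typically /etc/hosts, then DNS).
node_resolves() {
    getent hosts "$1" >/dev/null
}

# Example usage, run on the DRP Endpoint:
# node_resolves sm-mach-03 || echo "sm-mach-03 does not resolve" >&2
```

A successful lookup does not prove that the resolved address is reachable, so a connectivity check to the Hypervisor may still be needed.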
Proxmox defines the Hypervisor node name independently from the operating system hostname, DNS records, or IP Addresses. It is usually beneficial to ensure that the operating system hostname and the Proxmox node name are the same. For this reason, DRP provisions the Hypervisor node name and the operating system hostname to match the DRP Machine Name to help reduce confusion.
Solution¶
Modify the backing Resource Broker configuration to use the Proxmox Hypervisor's hostname, and ensure the DRP Endpoint can resolve it to a reachable IP Address. The node name value can be verified by logging in to the shell of the Proxmox Hypervisor and reviewing the directory name(s) found in the /etc/pve/nodes/ directory. A directory will exist with the node name that must be used.
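The directory check described above can be sketched as a small shell function. The function name pve_node_names is hypothetical; the optional argument exists only so the function can be pointed at another path, and it defaults to the /etc/pve/nodes directory named in this article.

```shell
# Hypothetical helper: print one node name per line, taken from the
# directory names under the given path (default: /etc/pve/nodes).
pve_node_names() {
    nodes_dir="${1:-/etc/pve/nodes}"
    for entry in "$nodes_dir"/*/; do
        # Skip the unexpanded pattern when no directories exist
        [ -d "$entry" ] || continue
        basename "$entry"
    done
}

# Example usage, run in the shell of the Proxmox Hypervisor:
# pve_node_names
```

On a Hypervisor that is part of a multi-node Proxmox cluster, more than one name may be printed; the name that belongs to the Hypervisor targeted by the Resource Broker is the one to use.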
The configuration is corrected by modifying the Param proxmox/node, which is in the Profile that has the same name as the Resource Broker. For example, if the Resource Broker name is proxmox-sm-mach-03, then the Profile with the same name contains the proxmox/node Param definition that needs to be modified. This Profile can be found on the Machines detail tab named Profiles, or by finding it in the Portal's Profiles menu entry (on the left).
Assuming the Resource Broker name is proxmox-sm-mach-03 and the Proxmox Hypervisor hostname is set to sm-mach-03, the following CLI commands will make the appropriate change.
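# Name of the Resource Broker (and of its matching Profile)
BROKER="proxmox-sm-mach-03"
# Node name as defined on the Proxmox Hypervisor
HYPERVISOR="sm-mach-03"
drpcli profiles set "$BROKER" param proxmox/node to "$HYPERVISOR"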
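After making the change, the stored value can be read back and compared with the expected node name. This is a minimal sketch: the function name node_param_matches is hypothetical, and it assumes that drpcli profiles get prints the Param as a JSON string such as "sm-mach-03".

```shell
# Hypothetical helper: compare the expected node name with the value read
# back from the Profile. Surrounding double quotes are stripped first,
# because the value is assumed to be printed as a JSON string.
node_param_matches() {
    expected="$1"
    actual="${2#\"}"
    actual="${actual%\"}"
    [ "$actual" = "$expected" ]
}

# Example usage, with BROKER and HYPERVISOR set as in the commands above:
# current=$(drpcli profiles get "$BROKER" param proxmox/node)
# node_param_matches "$HYPERVISOR" "$current" || echo "proxmox/node is $current" >&2
```

If the values match, retry the Cluster operation that originally failed.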
Additional Information¶
Additional resources and information related to this Knowledge Base article.
See Also¶
- GitHub reference for this error: telmate-provider-proxmox issue 175
- Terraform provider Telmate/proxmox target_node definition
Versions¶
All versions of Cloud Wrappers, Proxmox, Clusters, Resource Brokers, Work Orders, and DRP
Keywords¶
proxmox, clusters, resource_brokers, work_orders, cloud-wrappers