Validation

The Validation System allows an operator to flexibly define a set of tasks to do validation operations. Validation allows an operator to execute tests to verify that certain conditions have been met either in the environment or as completed tasks by other workflow steps. This allows for a composable set of rules to be run to either verify the system has been provisioned successfully or an error may have occured that needs addressed.

The validation system is designed to be flexibly extended in the field. The current set of Staes, Library functions, and Tasks are curated capabilities by RackN. Custom Stages and Tasks can be flexibly added to allow for unique use cases.

If you have developed a set of Tasks and Library functions that may be useful to other operators, please consider contributing back to the content pack to enhance it for other users.

Prebuilt Stages¶

The validation system provides three prebuilt base sets of validation stages to run the validation engine in:

validation-post-discover
validation-post-hardware
validation-post-install

The intent is to have a set of tasks that validate after the initial discover (gohai-inventory, inventory, or other "discover" related tasks), post hardware configuration (typically after BIOS settings, Firmware Flash, and RAID volume creation), and in the final installed operating system.

The validation system is run in Workflow as a standard Stage allowing the operator to place the stage anywhere in the Workflow that makes sense. The above three pre-built stages define RackN defined useful points during the provisioning lifecycle of a Machine.

The validation-start and validation-stop tasks are the only tasks that should be listed in the Stage, do not add Tasks to the Stage. Tasks will dynamically be added based on the control Paramters.

Defining Validation Tasks¶

Control over which specific validation tasks are executed at any given validation stage is defined by the Param validation/list-parameter. This parameter is a reference to the Param name that the validation-start task uses to dynamically update the task list with a list of validation tasks.

The Tasks that are defined in the parameter (specified by the validation/list-parameter) will be composed and added to the task list. These will be dynamically inserted between the validation-start and validation-stop tasks.

Composed means the value of that parameter is the aggregation of all occurances of that parameter at all levels of the system, (e.g. From the parameters on the machine, then profiles on machines, then parameters on the stage, ...). This allows multiple resources to provide the validation tests that will eventually be run on the system. Typically, Parameter orders of precedence would override earlier occurances of the Paramter values. Validation does not follow that standard path.

The Param that is referenced by validation/list-parameter should be a simple array type Param. It lists each Task that should be executed in the listed sequence by the validation process at that Stage in the Workflow.

Task Exit Conditions¶

Validation actions can exit one of several ways:

validation-success = validation succeeded, exit 0
validation-fail-immediately = fail and exit 1, stopping workflow immediately at task
validation-fail-at-stage-end = exit 0, collect logs in validation/errors param, stop at end of Stage
validation-fail-and-ignore = log validation error to validation/errors-ignored, but exit 0 - no workflow stop will occur at all

futures return conditions (not yet implemented): validation-fail-and-remediate = exit 1, and if the task specifies an additional remediation task, run this new task, plus ourself again - hoping the remediation task solved the issue - may need number tracking and break the cycle after N failures

Remediating and Continuing Failed Tasks¶

The failure exit conditions allow the developer of the validation task to determine if the failure is a "soft" failure which can be remediated, and then the Workflow can be resumed, or if the failure implies a "hard" fail that can not be corrected during workflow operation. Additionally, some failures may not impact final workflow completion, but the operator would like to have them recorded for future review or remediation.

If a failure that is external to machine and can be remediated (for example a DNS record has the wrong information, and DNS can be updated quickly) the test should be validation-fail-immediately. This allows the failure to be remediate and Workflow can be continued without failing validation.

If the failure is not able to be remediated immediately, a log message should be added to the validation/errors logging Param, and success returned so that additional issues can be found and aggregated.

Example Usage¶

Here is a brief outline of example usage of the validation system utilizing the RackN defined stages. In this example, we will define tasks to run in the validation-post-discovery and validation-post-hardware stages. Most of the Tasks referenced are valid tasks that exist across several content and plugin components.

Workflow¶

In this example we will create a "my-complete-discovery" workflow that utilizes several components of the Digital Rebar content and plugins capabity.

Note

Note the addition of the 'validation-post-discovery' and 'validation-post-hardware' Stages, these define the point in the workflow where the validation process will run.

Meta:
  color: grey
  icon: money
Name: my-complete-discovery
Description: My complete discovery workflow
Stages:
  - discover
  - setup-repos
  - ipmi-inventory
  - raid-inventory
  - network-lldp
  - inventory
  - rack-discover
  - classify
  - validation-post-discover
  - ipmi-configure
  - flash
  - raid-enable-encryption
  - raid-configure
  - bios-configure
  - ilo-config
  - burnin
  - burnin-reboot
  - validation-post-hardware
  - notify-hardware-complete
  - sledgehammer-wait

Stages¶

In this example, we are not creating any new stages, simply using the RackN defined stages in the above workflow example. Please review the "Adding New Stages" section in the below "Developing New Validation Capabilities" section if you'd like to create custom stages.

Tasks¶

There are several tasks that are defeind to be used by the Validation system in this example. You can find them in the following profile section. Note that several of these tasks do exist in various content and plugins. Each of these tasks would need to exist and perform the required functions.

Define the Tasks for Validation¶

In this example, we will utilize a profile which contains the defined Params for the validation-post-discover and validation-post-hardware stages. The operator can apply the Profile to the appropriate machines, you could potentially use the classify stage to execute a classification task to determine whether or not to programmatically apply the Profile to the given machine as it executes the classify stage.

The profile defines the tasks to run. The profile YAML configuration would look like the following:

---
Meta:
  color: grey
  icon: money
Name: validations
Params:
  validation/post-discover:
    - "validation-machine-name-dns-to-ip"
    - "in-subnet-check-render"
    - "in-subnet-check-validate"
    - "ipmi-network-validation"
    - "validate-nic-counts"
    - "validate-jumbo-frames"
  validation/post-hardware:
    - "validate-ipmi-hostname-ip"

The type definition for the Params validation/post-discover are defined in the Validation content pack, and the appropriate stage contains the control mapping for each.

For the validation-post-discover Stage, the stage carries the mapping within the stage, as follows:

Params:
  "validation/list-parameter": "validation/post-discover"

Similarly, the validation-post-hardware stage defines the mapping to point to the Param validation/post-discover.

These are the values we use in our profile above to add the list of Tasks to execute at the appropriate Stage run.

Operating The Validation¶

Once you have completed the above tasks, you can choose to make the my-complete-discovery Workflow your default preference setting for all new machines, or manually choose to place machines in to the Workflow.

To make this the default workflow, change your preferences (from Info & Preferencess in the Portal), or the following CLI command:

drpcli prefs set defaultWorkflow my-complete-discovery

Reviewing Any Failures¶

As with all jobs executed in Workflows, there will be a job log available to review the status output of any tasks that have run. If you receive a Validation failure, refer to the appropriate job log.

If the failure includes the use of the add_validation_error function, you can review the Param value that was added to the Machine, with the name validation/errors. The Param will be added to the machine that has failed the validation process.

The next run of any validation stage will empty the contents of this Param, prior to starting executing the tasks. Please insure that you review the Para error values after a failure, but before you re-run any more validation stages.

It is required to include the {{ template "setup.tmpl" .}} in each of the validation tasks. With the inclusion of this template, you can set the rs-debug-param on a Machine, and the validation tasks will contain a lot more debug output in the job log. If you need to verify/debug the actions in the validation task, this is a good way to review more detailed output logging information.

Skipping Validation Without Changing Your Workflow¶

The validation system was designed to allow you to add it to the Workflow, and if no validation tasks are defined (by use of the reference validation/list-parameter Param), then the system will skip any validation attempts for the given stage.

This allows you to leave the Stage defined control parameter blank and not require changing the Workflow. This effectively becomes a "noop" of the validation system.

Developing New Validation Capabilities¶

This section provides information on how to build custom validation capabilities utilizing the existing Validation system.

Namespace Definitions¶

Validation system stages, control params, and other content parts will utilize the prefix namespace of validation/. Individual Tasks, Params, etc. that implement functional use of the Validation system components will utilize the prefix namespace of validate/.

For example, the Start Stage for the Validation system is validation-start, while the NIC Count and UP link state checking uses a Task namespaced as validate-nic-counts.

Standard Stage/Task and Param semantics still apply (eg validate/nic-required-up is the Param namespace name).

Note that traditionally, RackN uses the "slash" naming standard for Params, while Stages, Taks, and Templates generally utilize a "dash" to separate the parent namespace. Don't ask us why.

Library of Functions¶

The Validation System provides a template which contains a library of currated tools for the system; the template is called validation-lib.tmpl. Only Bash functions which implement broadly useful capabilities are placed in this template, which is managed by RackN.

If your validation tools require a common library, they can be built as a standard template, and included in any appropriate custom tasks.

Current functions that can be incorporated in to your Validation custom tasks are as follows:

validation_add_error() = Adds a validation error to the validation/errors Param
validation_add_error_ignore() = Adds a validation error to the validation/errors-ignore Param
validation_clear_errors_ignore() = Removes the validation/errors-ignore Param from the Machine
validation_msg_prefix() = builds a prefix of Stage/Task/CurrentJob for output messages
validation_success() = marks a validation task successful and exits 0
validation_fail_at_stage_end() = marks a validation task failed, but exits 0, and ultimately exits the Stage with exit code 1
validation_fail_immediately() = marks validation task failed and exits 1 immediately
validation_fail_and_ignore() = marks validation as failed, but ignores the failure without exiting the workflow
#validation_fail_and_remediate() = Not implemented yet
validate_machine_name_dns_by_ip() = verifies the machine has a DNS record based on the Machines IP address
validate_ipv4_ip_syntax() = simple helper to verify string is in IPv4 dotted quad notation
validate_check_same_subnet() = verifies that given 2 ip address and a subnet mask, both addresses are in the same subnet network
validate_ping_dest_from_src() = verifies that Machine can ping an IP address given a source interface to use
validate_nic_counts() = verifies the number of NICs and number of NICs that can be brought to an "UP" state

Adding New Stages¶

Stages define the primary grouping of Tasks that an operator runs at a given point in the Workflow sequence. These stages can be placed anywhere that makes operational sense to a given Workflow.

The Stage should only contain the validation-start and validation-stop tasks, no other tasks should be added to your stage, as the validation-start stage will dynamically inject the desired stages in to the Workflow for the operator.

Example Stage in YAML:

---
Name: "my-validation"
Description: "Perform tasks defined by the 'my-validation' Param."
Documentation: |
  The Param 'my-validation' is an array that will contain
  the list of validation tasks to run.
Params:
  "validation/list-parameter": "my-validation"
Tasks:
  - "validation-start"
  - "validation-stop"
Meta:
  color: "orange"
  icon: "search"
  title: "RackN Content"

Note

Note that as the Documentation field says; the validation/list-parameter for this Stage is defined as my-validation. This Parameter must be defined as an array; which is a list of the Tasks to execute during this Stage. The Param can be defined on the Machine in all of the normal ways (directly as a Param, as part of a Profile, or as a Param in the Global Profile.

Adding New Tasks¶

Tasks are the heart of the validation system, and perform the actual validation implementation for the system to execute. Validation tasks are standard RackN Workflow Tasks, and can carry embedded templates, or refer to external templates to perform the actual Task(s) defined.

A validation task should include both the setup.tmpl template and the validation-lib.tmpl template. Both templates are required for a successful validation task.

Validation tasks are only limited by the requirements (and perhaps creativity) of the author. They should be individual and discreet items that the system should check for and return results on. Please review the "Task Exit Conditions" above to insure you handle exit codes correctly for the system.

Here is an example Task that implements a hypothetical validation function.

---
Name: "validation-test-fail"
Description: "Example failing validation task"
Meta:
  color: "blue"
  icon: "bug"
  title: "RackN Content"
  feature-flags: "sane-exit-codes"
Templates:
- Name: "validation_fail.sh"
  Contents: |
    #!/usr/bin/env bash
    {{ template "setup.tmpl" . }}
    {{ template "validation-lib.tmpl" . }}

    {{ if eq (.Param "margarita-time" ) true }}
      echo "This is a failure.  Calling add_validation_error"
      add_validation_error "Margarita time! We'll finish provisioning later."
      exit 0
    {{ else -}}
      echo "Sadly, it's not margarita time.  Provision on!"
      exit 0
    {{ end -}}

In this example, some other component or process would be responsible for setting the check Param margarita-time to either true or false.

Utilizing the Error Logging Param¶

The library also contains a helper routine to add errors to the validation/errors logging Param. The Bash function name is add_validation_error which should be called with a single string which contains the text of the error message. This function can safely called multiple times, and each subsequent error message will be appended to the validation/errors array.

Separating Validate Tasks from the Validation System¶

Validation tasks or extensions can be built and used and added to other Content Packs or Plugins. The validation system itself is a framework for executing and providing the method to run and manage the validation process.

Examples of validation in use can be found throughout some of the other RackN managed content and plugin systems. For example, review the VMware Plugin content:

https://gitlab.com/rackn/provision-plugins/blob/v4/cmds/vmware/content/tasks/validation-machine-name-dns-to-ip.yaml

For a validation task that is defined and used outside of the Validation content pack.