20.56. kb-00055: Troubleshooting Runner does not connect¶
20.56.1. Knowledge Base Article: kb-00055¶
I have booted a machine (or used the join-up script), but the machine is not showing as runnable (aka Ready) or is not being created during discovery.
I’ve confirmed that the machine is booted/provisioned, but Digital Rebar is not connecting.
That generally means that the agent/runner is not starting.
There are a number of reasons why the DRP runner may not be able to start on a system. This article will help identify where to look for logs. Generally, the details in the logs are sufficient to identify why the runner failed to start or connect.
Depending on your machine registration approach, there are differences between discovery and join-up processes. While some suggestions here may not apply to your specific case, we’ve included several troubleshooting scenarios for completeness.
Note: We will be troubleshooting from the machine back towards the Digital Rebar server, so these instructions assume that you can log into the machine(s) in question.
Did Discovery Start control.sh?¶
For Sledgehammer and netboot installation, Digital Rebar injects values into the bootstrap process.
Run cat /proc/cmdline to ensure that the provisioner.web values are being set.
Note: join-up systems will not have specialized settings.
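The check above can be scripted. The following is a minimal sketch, assuming a POSIX shell; the helper name cmdline_value is hypothetical and not part of Digital Rebar. It prints the value of a key=value entry from the kernel command line, reading /proc/cmdline unless a command-line string is passed as the second argument.

```shell
#!/bin/sh
# cmdline_value KEY [CMDLINE]   (hypothetical helper, not shipped with DRP)
# Print the value of the first KEY=VALUE word on a kernel command line.
# CMDLINE defaults to the contents of /proc/cmdline.
# Exits with status 1 if KEY is not present.
# The body runs in a subshell so that "set -f" and the variables stay local.
cmdline_value() (
    key="$1"
    if [ "$#" -ge 2 ]; then
        line="$2"
    else
        line="$(cat /proc/cmdline)"
    fi
    set -f    # disable globbing; the unquoted expansion below only splits words
    for word in $line; do
        case "$word" in
            "$key"=*)
                printf '%s\n' "${word#*=}"
                exit 0
                ;;
        esac
    done
    exit 1
)
```

For example, cmdline_value provisioner.web || echo "provisioner.web is not set" reports when the value was not injected at boot.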
Can DRPCLI connect?¶
One of the first things that the runner will check is access to the system. Test access to the DRP server by using
drpcli -E [https://endpoint IP] -P [password] machines whoami.
Success with whoami confirms connectivity to the DRP server and also shows how the server identifies the machine in question.
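A small wrapper makes the result of this test easier to read. This is a sketch only: the function name drp_whoami is hypothetical, and the -E and -P flags and the machines whoami subcommand are used exactly as written in this article. The endpoint and password are placeholders that you supply.

```shell
#!/bin/sh
# drp_whoami ENDPOINT PASSWORD   (hypothetical helper, not shipped with DRP)
# Run the whoami check from this article and summarize the outcome.
# Returns 127 if drpcli is not installed, otherwise drpcli's own exit code.
drp_whoami() {
    if ! command -v drpcli >/dev/null 2>&1; then
        echo "drpcli not found in PATH" >&2
        return 127
    fi
    if drpcli -E "$1" -P "$2" machines whoami; then
        echo "whoami succeeded: the DRP server is reachable and identifies this machine"
    else
        dw_rc=$?    # exit code of the drpcli call above
        echo "whoami failed (exit code $dw_rc): check endpoint, credentials, and network" >&2
        return "$dw_rc"
    fi
}
```

Called as drp_whoami "[https://endpoint IP]" "[password]", it prints the whoami output followed by a one-line summary.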
Did discovery-common-bootstrap.sh get created?¶
One of the first things created during runner bootstrap is the
/tmp/discovery-common-bootstrap.sh file. Check to see if that file exists.
If it is missing, then the basic discovery process failed. You will need to review how your BootEnv was created. Additional details may be in the Digital Rebar server logs.
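The file check can be sketched as a small function. The name report_file is hypothetical; it distinguishes a missing file from one that exists but is empty, which may help tell a bootstrap that never ran from one that was interrupted.

```shell
#!/bin/sh
# report_file PATH   (hypothetical helper)
# Print whether PATH is a non-empty regular file, empty/irregular, or missing.
# Returns 0 only when the file is present and non-empty.
report_file() {
    if [ -f "$1" ] && [ -s "$1" ]; then
        echo "present: $1"
    elif [ -e "$1" ]; then
        echo "empty or not a regular file: $1"
        return 1
    else
        echo "missing: $1"
        return 1
    fi
}
```

On the machine being diagnosed, run report_file /tmp/discovery-common-bootstrap.sh.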
Is control.sh available?¶
During discovery or join-up of _known_ machines, the machines download the
control.sh bootstrap script from the DRP Server at http://[drp ip]:8091/machines/[machine uuid]/control.sh. If this file is not available at this URL, the process will fail.
This will happen if the machine is using
ignore bootenvs. It can also happen if the DRP Unknown Workflow/Stage/BootEnv values are not set.
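To test the URL directly from the machine, a sketch along the following lines may help. It assumes curl is installed; the function names control_url and check_control_sh are hypothetical. The URL layout and port 8091 are taken from this article, and treating only HTTP 200 as success is an assumption.

```shell
#!/bin/sh
# control_url DRP_IP MACHINE_UUID   (hypothetical helper)
# Build the control.sh URL in the form given in this article.
control_url() {
    printf 'http://%s:8091/machines/%s/control.sh\n' "$1" "$2"
}

# check_control_sh DRP_IP MACHINE_UUID   (hypothetical helper)
# Request the script, discard the body, and report the HTTP status.
check_control_sh() {
    cc_url="$(control_url "$1" "$2")"
    # -s: quiet, -o /dev/null: drop the body, -w: print only the status code.
    # curl prints 000 when no HTTP response was received at all.
    cc_code="$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$cc_url")"
    if [ "$cc_code" = "200" ]; then
        echo "available: $cc_url"
    else
        echo "not available (HTTP ${cc_code:-none}): $cc_url" >&2
        return 1
    fi
}
```

A status of 404 suggests the script is not being rendered for this machine, while 000 points to a network or firewall problem between the machine and the DRP Server.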
Did control.sh or join-up.sh start?¶
When attaching discovered or netbooted machines, the system uses
control.sh to connect.
When attaching to existing machines, such as cloud joins, the system uses
join-up.sh to connect.
In many cases, the machine’s UUID is already present on the system as
/etc/rs-uuid. Check to see if this file exists and contains the correct UUID.
You can test the join-up.sh script by simply running it on the system and watching the results.
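The UUID file check can be sketched as follows. The function name check_uuid_file is hypothetical, and the sketch assumes the file holds a single UUID in the standard 8-4-4-4-12 hexadecimal form on its first line. An optional second argument is the UUID you expect, for example the one the DRP server shows for the machine.

```shell
#!/bin/sh
# check_uuid_file FILE [EXPECTED_UUID]   (hypothetical helper)
# Verify that FILE exists, holds a well-formed UUID, and optionally
# that it matches EXPECTED_UUID exactly.
check_uuid_file() {
    if [ ! -r "$1" ]; then
        echo "missing or unreadable: $1"
        return 1
    fi
    # Take the first line and strip any whitespace around the value.
    cu_value="$(head -n 1 "$1" | tr -d '[:space:]')"
    if ! printf '%s\n' "$cu_value" |
        grep -Eqi '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
        echo "not a UUID: $cu_value"
        return 1
    fi
    if [ "$#" -ge 2 ] && [ "$cu_value" != "$2" ]; then
        echo "mismatch: file has $cu_value, expected $2"
        return 1
    fi
    echo "ok: $cu_value"
}
```

For example, check_uuid_file /etc/rs-uuid "[machine uuid]" confirms both the format and the value.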
Is the runner starting then failing?¶
If the runner is starting and then failing, then the runner logs will provide helpful information.
Either review configuration at
/var/lib/drp-agent or try
journalctl -u sledgehammer
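When the journal is long, it can help to look at only the most recent entries. The sketch below assumes a systemd-based system; the function name recent_unit_log is hypothetical, and the default of 50 lines is an arbitrary choice.

```shell
#!/bin/sh
# recent_unit_log UNIT [LINES]   (hypothetical helper)
# Show the last LINES (default 50) journal entries for a systemd unit.
# Returns 127 if journalctl is not available on this system.
recent_unit_log() {
    if ! command -v journalctl >/dev/null 2>&1; then
        echo "journalctl not found; this system may not use systemd" >&2
        return 127
    fi
    # -u selects the unit, -n limits the entry count, --no-pager prints directly.
    journalctl -u "$1" -n "${2:-50}" --no-pager
}
```

For the runner, call recent_unit_log sledgehammer 100, and list the configuration directory with ls -la /var/lib/drp-agent. The same helper can be pointed at any other unit name.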
Did Jobs start running?¶
In some cases, the runner starts but then fails. This may indicate a configuration or resource problem on the machine.
If the runner was able to start, then job logs will be created under
/tmp/runner-* on the system. You should also be able to find matching logs in the Digital Rebar job log entries.
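A quick way to see whether any job logs exist, and which is the most recent, is sketched below. The function name list_runner_logs is hypothetical; the /tmp/runner-* location comes from this article.

```shell
#!/bin/sh
# list_runner_logs [DIR]   (hypothetical helper)
# List entries matching runner-* under DIR (default /tmp), newest first.
# Returns 1 if nothing matches, which suggests no job has started.
list_runner_logs() {
    lr_dir="${1:-/tmp}"
    # An unmatched glob is left as a literal string, so test the first result.
    set -- "$lr_dir"/runner-*
    if [ ! -e "$1" ]; then
        echo "no runner-* entries under $lr_dir"
        return 1
    fi
    # -d lists the entries themselves, -t sorts by modification time.
    ls -dt "$@"
}
```

The first line of output is the most recently modified entry, which is usually the best place to start reading.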
What about DRP Server logs?¶
While runner connect issues are generally machine related, DRP Server logs (or the lack of them) can be a helpful diagnostic tool.
Use journalctl -u dr-provision to observe the logs.
20.56.4. Additional Information¶
Additional resources and information related to this Knowledge Base article.
See Also¶
For more information on join-up.sh, see the
runner, agent, does not start, not runnable
Revision Information¶
KB Article: kb-00055
initial release: Wed 16 Dec 2020 11:15:57 AM CST
updated release: Wed 16 Dec 2020 11:15:57 AM CST