PXE Boot and TFTP Troubleshooting¶
This guide covers low-level debugging techniques for TFTP and PXE boot issues in Digital Rebar Provision (DRP).
Understanding the PXE Boot Process¶
Before troubleshooting, it helps to understand what happens during a typical successful PXE boot:
- Machine sends DHCP discover broadcast
- DRP DHCP server responds with IP address and boot filename
- Machine requests bootloader via TFTP (port 69/udp)
- Bootloader (iPXE/pxelinux) loads and requests configuration
- Bootloader fetches kernel and initrd via TFTP or HTTP
- Machine boots into the target boot environment (e.g., Sledgehammer for discovery)
DRP Server Diagnostic Commands¶
Before diving into step-by-step debugging, verify DRP is healthy:
# Verify DRP is running and listening on expected ports
drpcli info status
# Check if the API is responding
drpcli info check
Enable debug logging for more detail during troubleshooting:
# All service logging
drpcli prefs set logLevel debug
# Individual service logging (e.g. DHCP)
drpcli prefs set debugDhcp debug
# Remember to set back to 'info' after debugging
drpcli prefs set logLevel info
drpcli prefs set debugDhcp info
Stream logs in real time while attempting a boot:
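A minimal sketch for following the logs, assuming DRP was installed as a systemd service. The unit name `dr-provision` and the helper name `stream_drp_logs` are assumptions, not taken from this guide; pass a different unit name if your install differs.

```shell
# Follow the DRP service journal live; press Ctrl-C to stop.
# Assumption: systemd install with a unit named "dr-provision".
stream_drp_logs() {
  local unit="${1:-dr-provision}"
  sudo journalctl -u "$unit" -f
}
```

Run `stream_drp_logs` in one terminal, then power on the machine and watch for DHCP and TFTP entries as they arrive.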
Step 1: DHCP Discover¶
What happens: Machine sends DHCP discover broadcast.
Test DHCP Server is Responding¶
# Verify DHCP port is listening
ss -uln | grep ':67 '
# Test DHCP from another Linux machine on the same network (requires root)
sudo dhclient -v -1 <interface>
Capture DHCP Traffic¶
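One way to capture the exchange is with `tcpdump` on the DRP server or on a host in the same broadcast domain. This is a sketch; the helper name `capture_dhcp` is hypothetical and the interface is whatever faces the provisioning network.

```shell
# Capture DHCP traffic (server port 67, client port 68) on one interface.
# -n skips name resolution; -vv decodes DHCP options such as the message type.
capture_dhcp() {
  local iface="${1:?usage: capture_dhcp <interface>}"
  sudo tcpdump -i "$iface" -n -vv 'udp and (port 67 or port 68)'
}
```

For example, `capture_dhcp eth0` prints each packet as the machine boots; stop it with Ctrl-C.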
You should see DISCOVER, OFFER, REQUEST, and ACK packets.
Common Issues¶
- Firewall blocking port 67/udp
- Another DHCP server on the network - Check for unexpected OFFER packets in tcpdump output
Step 2: DHCP Response with Boot Filename¶
What happens: DRP DHCP server responds with IP address and boot filename.
Verify Subnet Configuration¶
# Check NextServer
drpcli subnets show <subnet-name> | jq '.NextServer'
The value is empty when DRP itself acts as the next server.
# Verify desired boot filename options
drpcli subnets show <subnet-name> | jq '.Options'
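To narrow the output to the boot filename, the options can be filtered for DHCP option 67 (bootfile name in RFC 2132). This sketch assumes each entry in `.Options` has a numeric `Code` field and a `Value` field; the helper name `subnet_bootfile` is hypothetical.

```shell
# Show only DHCP option 67 (bootfile name) for a subnet.
# Assumption: .Options is an array of objects with Code and Value fields.
subnet_bootfile() {
  local subnet="${1:?usage: subnet_bootfile <subnet-name>}"
  drpcli subnets show "$subnet" | jq '.Options[] | select(.Code == 67)'
}
```

If nothing is printed, no explicit option 67 is set on that subnet.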
Verify Boot Environment is Available¶
# List available boot environments
drpcli bootenvs list | jq '.[].Name'
# Check a specific boot environment exists and is valid
drpcli bootenvs show sledgehammer
Step 3: Bootloader Request via TFTP¶
What happens: Machine requests bootloader via TFTP (port 69/udp).
Verify TFTP Port is Listening¶
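A sketch of the check, mirroring the DHCP port test earlier in this guide. The helper name `tftp_listening` is hypothetical; run it on the DRP server.

```shell
# List UDP listeners and keep only sockets bound to port 69 (TFTP).
# Matching ":69" followed by whitespace avoids false hits such as port 6901.
tftp_listening() {
  ss -uln | grep -E ':69[[:space:]]'
}
```

The function exits non-zero when nothing is bound to port 69, so it can also be used in a script condition.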
Test TFTP Manually¶
# Install tftp client if needed
# RHEL/CentOS: yum install tftp
# Ubuntu/Debian: apt-get install tftp-hpa
# Test fetching the bootloader
tftp <drp-server-ip>
tftp> get lpxelinux.0
tftp> quit
# Verify file was retrieved
ls -la lpxelinux.0
Capture TFTP Traffic¶
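A sketch using `tcpdump`. Only the initial TFTP read request is sent to port 69; the data and acknowledgement packets that follow use ephemeral UDP ports, so this example filters on the client address instead of the port. The helper name `capture_tftp` is hypothetical.

```shell
# Capture all UDP traffic to and from one client, including the TFTP
# transfer that continues on ephemeral ports after the request to port 69.
capture_tftp() {
  local iface="${1:?usage: capture_tftp <interface> <client-ip>}"
  local client="${2:?usage: capture_tftp <interface> <client-ip>}"
  sudo tcpdump -i "$iface" -n -v "udp and host $client"
}
```

A read request with no data packets in reply usually points at a firewall or routing problem between the client and the server.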
Common Issues¶
- Firewall blocking port 69/udp
- Network isolation - Client and DRP server on different VLANs without proper routing
Step 4: Bootloader Fetches Configuration¶
What happens: Bootloader (iPXE/pxelinux) loads and requests its configuration script.
Test Configuration Fetch via TFTP¶
tftp <drp-server-ip>
tftp> get default.ipxe
tftp> quit
# View the retrieved configuration
cat default.ipxe
A valid response starts with the #!ipxe shebang line and contains iPXE commands such as kernel and initrd.
Check for Boot Environment Errors¶
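One way to inspect a boot environment for problems is to pull its validation fields from the JSON. This sketch assumes the bootenv object exposes `Available`, `Validated`, and `Errors` fields; the helper name `bootenv_errors` is hypothetical.

```shell
# Print availability, validation state, and recorded errors for a bootenv.
# Assumption: the JSON has Available, Validated, and Errors fields.
bootenv_errors() {
  local name="${1:-sledgehammer}"
  drpcli bootenvs show "$name" | jq '{Available, Validated, Errors}'
}
```

A bootenv that is not available, or that lists errors, is a likely reason for a failed configuration fetch.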
Common Issues¶
- Missing or invalid templates - Check bootenv template syntax
- Machine not assigned a bootenv - Verify with
drpcli machines show <uuid> | jq '.BootEnv'
Step 5: Kernel and Initrd Fetch¶
What happens: Bootloader fetches kernel and initrd via TFTP or HTTP.
Test Boot File Fetch via TFTP¶
tftp <drp-server-ip>
tftp> get machines/[machine-uuid]/boot/[sledgehammer-uuid]/vmlinuz0
tftp> get machines/[machine-uuid]/boot/[sledgehammer-uuid]/stage1.img
tftp> quit
# Verify files were retrieved
ls -la vmlinuz0 stage1.img
Large files like kernel and initrd may take time to transfer over TFTP.
Check ISO/File Server Content¶
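A sketch for comparing what has been uploaded with what the boot environment expects. It assumes `drpcli isos list` returns the uploaded ISO names and that the bootenv JSON carries its ISO details under an `OS` field; the helper name `check_bootenv_iso` is hypothetical.

```shell
# List the ISOs DRP currently holds, then show the OS section of the
# bootenv, which is assumed to name the ISO file it needs.
check_bootenv_iso() {
  local name="${1:-sledgehammer}"
  drpcli isos list
  drpcli bootenvs show "$name" | jq '.OS'
}
```

If the ISO named in the second output does not appear in the first, upload it before retrying the boot.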
Common Issues¶
- Missing ISO - Sledgehammer or OS ISO not uploaded
Step 6: Boot into Target Environment¶
What happens: Machine boots into the target boot environment (e.g., Sledgehammer for discovery).