PXE Boot and TFTP Troubleshooting

This guide covers low-level debugging techniques for TFTP and PXE boot issues in DRP.

Understanding the PXE Boot Process

Before troubleshooting, it helps to understand what happens during a typical successful PXE boot:

  1. Machine sends DHCP discover broadcast
  2. DRP DHCP server responds with IP address and boot filename
  3. Machine requests bootloader via TFTP (port 69/udp)
  4. Bootloader (iPXE/pxelinux) loads and requests configuration
  5. Bootloader fetches kernel and initrd via TFTP or HTTP
  6. Machine boots into the target boot environment (e.g., Sledgehammer for discovery)

DRP Server Diagnostic Commands

Before diving into step-by-step debugging, verify DRP is healthy:

Bash
# Verify DRP is running and listening on expected ports
drpcli info status

# Check if the API is responding
drpcli info check

Enable debug logging for more detail during troubleshooting:

Bash
# Debug logging for all services
drpcli prefs set logLevel debug

# Debug logging for an individual service (e.g. DHCP)
drpcli prefs set debugDhcp debug

# Remember to set back to 'info' after debugging
drpcli prefs set logLevel info
drpcli prefs set debugDhcp info
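
Because the restore step is easy to forget, the set/restore pair can be wrapped in a small shell function. This is a minimal sketch, not part of drpcli: the name with_dhcp_debug is hypothetical, it assumes the normal level is info, and it does not restore the level if the shell itself is interrupted.

```bash
# Hypothetical helper: run a command with DHCP debug logging enabled,
# then set the level back to 'info' (assumed to be the normal level).
with_dhcp_debug() {
    drpcli prefs set debugDhcp debug || return 1
    "$@"
    status=$?
    # Restore the level even when the wrapped command failed
    drpcli prefs set debugDhcp info
    return "$status"
}
```

For example, `with_dhcp_debug sleep 120` would keep DHCP debug logging on for two minutes while a machine attempts to PXE boot.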

Stream logs in real-time while attempting a boot:

Bash
drpcli logs watch

Step 1: DHCP Discover

What happens: Machine sends DHCP discover broadcast.

Test DHCP Server is Responding

Bash
# Verify DHCP port is listening (match the port column exactly, not any "67")
ss -uln | grep -E ':67[[:space:]]'

# Test DHCP from another Linux machine on the same network (requires root)
sudo dhclient -v -1 <interface>
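
A plain substring match on a port number can also hit unrelated ports such as 6767. The sketch below shows one way to match the port exactly by parsing `ss -uln` output from stdin. The function name udp_port_listening is hypothetical, and it assumes the local address is printed as address:port.

```bash
# Hypothetical helper: succeed if `ss -uln` output on stdin shows a socket
# bound to exactly the given UDP port (e.g. 0.0.0.0:67, [::]:67 or *:67).
udp_port_listening() {
    awk -v port="$1" '
        # Look for any column that ends in ":<port>"
        { for (i = 1; i <= NF; i++) if ($i ~ (":" port "$")) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage: `ss -uln | udp_port_listening 67 && echo "DHCP port is listening"`. The same check works for TFTP with port 69.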

Capture DHCP Traffic

Bash
tcpdump -i <interface> -n port 67 -vv

You should see DISCOVER, OFFER, REQUEST, and ACK packets.
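
To check a capture for the four messages without reading it line by line, the tcpdump text can be piped through a small filter. This is a sketch under assumptions: the function name missing_dhcp_messages is hypothetical, and it relies on tcpdump printing the message type on a line such as "DHCP-Message Option 53, length 1: Discover", wording that can vary between tcpdump versions.

```bash
# Hypothetical helper: read `tcpdump -vv` text on stdin and print the
# DHCP handshake messages that never appear in it (empty line if none).
missing_dhcp_messages() {
    capture=$(cat)
    missing=""
    for msg in Discover Offer Request ACK; do
        # Anchor on ": <type>" at end of line so that NACK does not count as ACK
        if ! printf '%s\n' "$capture" | grep -Eq "DHCP-Message.*: ${msg}[[:space:]]*\$"; then
            missing="${missing}${missing:+ }${msg}"
        fi
    done
    printf '%s\n' "$missing"
}
```

Usage: `tcpdump -i <interface> -n port 67 -vv -c 20 | missing_dhcp_messages`. If only Discover is present, the server never answered; if Offer is present but Request is not, the client rejected or never received the offer.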

Common Issues

  • Firewall blocking port 67/udp

    Bash
    iptables -L -n | grep 67
    firewall-cmd --list-ports | grep 67
    

  • Another DHCP server on the network - Check for unexpected OFFER packets in tcpdump output
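
One way to spot a second DHCP server is to list the distinct server identifiers seen in a capture. The sketch below assumes tcpdump prints DHCP option 54 on a line containing "Server-ID" with the address as the last field; the function name dhcp_server_ids is hypothetical.

```bash
# Hypothetical helper: read `tcpdump -vv` text on stdin and print each
# distinct DHCP server identifier (option 54) once.
dhcp_server_ids() {
    awk '/Server-ID/ { print $NF }' | sort -u
}
```

Usage: `tcpdump -i <interface> -n port 67 -vv -c 20 | dhcp_server_ids`. More than one address in the output suggests that a server other than DRP is answering on the network.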

Step 2: DHCP Response with Boot Filename

What happens: DRP DHCP server responds with IP address and boot filename.

Verify Subnet Configuration

Bash
# Check NextServer (this will be empty if DRP is the NextServer)
drpcli subnets show <subnet-name> | jq '.NextServer'

# Verify desired boot filename options
drpcli subnets show <subnet-name> | jq '.Options'

Verify Boot Environment is Available

Bash
# List available boot environments
drpcli bootenvs list | jq '.[].Name'

# Check a specific boot environment exists and is valid
drpcli bootenvs show sledgehammer

Step 3: Bootloader Request via TFTP

What happens: Machine requests bootloader via TFTP (port 69/udp).

Verify TFTP Port is Listening

Bash
ss -uln | grep -E ':69[[:space:]]'

Test TFTP Manually

Bash
# Install tftp client if needed
# RHEL/CentOS: yum install tftp
# Ubuntu/Debian: apt-get install tftp-hpa

# Test fetching the bootloader
tftp <drp-server-ip>
tftp> get lpxelinux.0
tftp> quit

# Verify file was retrieved
ls -la lpxelinux.0

Capture TFTP Traffic

Bash
# Shows the initial request only; data and ACK packets use ephemeral UDP ports
tcpdump -i <interface> -n port 69 -vv

Common Issues

  • Firewall blocking port 69/udp

    Bash
    iptables -L -n | grep 69
    
    # Open TFTP port if needed
    iptables -A INPUT -p udp --dport 69 -j ACCEPT
    
    # For firewalld
    firewall-cmd --permanent --add-port=69/udp
    firewall-cmd --reload
    

  • Network isolation - Client and DRP server on different VLANs without proper routing

Step 4: Bootloader Fetches Configuration

What happens: Bootloader (iPXE/pxelinux) loads and requests its configuration script.

Test Configuration Fetch via TFTP

Bash
tftp <drp-server-ip>
tftp> get default.ipxe
tftp> quit

# View the retrieved configuration
cat default.ipxe

A valid response starts with the #!ipxe line and contains iPXE commands such as kernel and initrd.
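
The same check can be scripted against the retrieved file. This is a rough sketch, not a full validator: the function name check_ipxe_script is hypothetical, and it only confirms that the first line is #!ipxe and reports whether kernel and initrd commands are present.

```bash
# Hypothetical helper: sanity-check a file fetched from the TFTP server.
check_ipxe_script() {
    file=$1
    # iPXE scripts must begin with the #!ipxe line (strip a possible CR)
    first_line=$(head -n 1 "$file" | tr -d '\r')
    if [ "$first_line" != "#!ipxe" ]; then
        echo "not an iPXE script: first line is '${first_line}'"
        return 1
    fi
    # Report which of the expected commands appear at the start of a line
    for cmd in kernel initrd; do
        if grep -Eq "^[[:space:]]*${cmd}[[:space:]]" "$file"; then
            echo "found: ${cmd}"
        fi
    done
    return 0
}
```

Usage: `check_ipxe_script default.ipxe`. A failure here usually means the TFTP server returned an error page or an empty file rather than a rendered template.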

Check for Boot Environment Errors

Bash
# Inspect the templates defined for the boot environment
drpcli bootenvs show sledgehammer | jq '.Templates'

Common Issues

  • Missing or invalid templates - Check bootenv template syntax
  • Machine not assigned a bootenv - Verify with drpcli machines show <uuid> | jq '.BootEnv'

Step 5: Kernel and Initrd Fetch

What happens: Bootloader fetches kernel and initrd via TFTP or HTTP.

Test Boot File Fetch via TFTP

Bash
tftp <drp-server-ip>
tftp> get machines/<machine-uuid>/boot/<sledgehammer-uuid>/vmlinuz0
tftp> get machines/<machine-uuid>/boot/<sledgehammer-uuid>/stage1.img
tftp> quit

# Verify files were retrieved
ls -la vmlinuz0 stage1.img

Large files such as the kernel and initrd can take noticeably longer to transfer over TFTP than over HTTP.
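
A rough lower bound explains why. Basic TFTP sends one data block (512 bytes by default) and waits for its acknowledgement before sending the next, so a transfer costs at least one round trip per block. The sketch below works through that arithmetic; the function name, the 100 MiB file size, the 1 ms round-trip time, and the 1428-byte block size are illustrative assumptions, and real transfers also depend on retransmissions and on any options the client negotiates.

```bash
# Lower-bound estimate of a TFTP transfer time in milliseconds.
#   $1 = file size in bytes, $2 = round-trip time in ms, $3 = block size (default 512)
tftp_min_transfer_ms() {
    size=$1; rtt_ms=$2; blksize=${3:-512}
    # A transfer always ends with a short (possibly empty) block, hence the +1
    blocks=$(( size / blksize + 1 ))
    echo $(( blocks * rtt_ms ))
}

# Hypothetical 100 MiB initrd on a link with a 1 ms round trip
tftp_min_transfer_ms 104857600 1        # prints 204801 (about 205 seconds)
tftp_min_transfer_ms 104857600 1 1428   # prints 73430 (about 73 seconds)
```

Under these assumptions, the block size and the round-trip time dominate the transfer time, not the link bandwidth.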

Check ISO/File Server Content

Bash
# List uploaded ISOs and confirm the one the boot environment needs is present
drpcli isos list

Common Issues

  • Missing ISO - Sledgehammer or OS ISO not uploaded

Step 6: Boot into Target Environment

What happens: Machine boots into the target boot environment (e.g., Sledgehammer for discovery).

Verify Machine Discovered Successfully

Bash
# List all machines
drpcli machines list | jq '.[] | {Name, Uuid, Stage, BootEnv}'

Check Machine Reached DRP

Bash
# Show the machine's current job
drpcli machines show <uuid> | jq '.CurrentJob'