Troubleshooting Guide
Common Issues
Installation Failures
Check API server health:
# Test API server health
curl -k https://api.demo.k8s.local:6443/healthz
# Verify API server version
curl -k https://api.demo.k8s.local:6443/version
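During installation the API server may take a while to come up, so a single probe can be misleading. The following is a minimal sketch of a retry loop around the health check above. The function name wait_for_healthz, the attempt count, and the delay are illustrative assumptions, not part of any OpenShift tooling; it relies on /healthz answering with the body "ok" when healthy.

```shell
# Poll the API server /healthz endpoint until it answers "ok" or attempts run out.
# Hypothetical helper: the default URL, attempt count, and delay are illustrative.
# Usage: wait_for_healthz [url] [attempts] [delay-seconds]
wait_for_healthz() {
    url="${1:-https://api.demo.k8s.local:6443/healthz}"
    attempts="${2:-30}"
    delay="${3:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -k skips TLS verification (as in the commands above), -s hides progress,
        # --max-time bounds each probe; "|| true" keeps "set -e" scripts alive.
        body="$(curl -ks --max-time 5 "$url" 2>/dev/null || true)"
        if [ "$body" = "ok" ]; then
            echo "healthy after $i attempt(s)"
            return 0
        fi
        # Pause between probes, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "not healthy after $attempts attempt(s)" >&2
    return 1
}
```

Calling wait_for_healthz with no arguments probes the demo URL up to 30 times, 10 seconds apart; a non-zero exit status means the endpoint never reported healthy.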
Check node and machine status:
oc get nodes
oc get machines
oc describe node <node-name>
Review events:
oc get events --sort-by='.metadata.creationTimestamp'
Examine operator status:
oc get clusteroperators
oc describe co <operator-name>
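On a cluster with many operators it can help to narrow the list to the ones that need attention before describing them. The sketch below filters the table printed by oc get clusteroperators. The function name unhealthy_operators is hypothetical, and it assumes the usual column order (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE); rows with an empty VERSION column would shift the fields and be misread.

```shell
# Print cluster operators that are not Available or are Degraded.
# Reads the table from `oc get clusteroperators` on stdin.
# Assumed column order: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE [MESSAGE]
# Usage: oc get clusteroperators | unhealthy_operators
unhealthy_operators() {
    awk '
        $1 == "NAME" { next }                 # skip the header row
        $3 != "True" || $5 == "True" {
            print $1 " available=" $3 " degraded=" $5
        }
    '
}
```

Each name it prints is a candidate for oc describe co <operator-name>.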
Check machine configuration:
oc get pods -n openshift-machine-config-operator
oc logs -n openshift-machine-config-operator -l k8s-app=machine-config-server
Check installation logs:
openshift-install gather bootstrap --dir /root/cluster
Network Issues
Verify DNS resolution:
# Check API server resolution
dig api.demo.k8s.local +short
# Check internal API server resolution
dig api-int.demo.k8s.local +short
# Check application wildcard DNS (quoted so the shell does not expand the *)
dig '*.apps.demo.k8s.local' +short
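The three lookups above can be combined into one pass that reports which names fail to resolve. This is a minimal sketch: the function name check_cluster_dns is hypothetical, and test.apps is an arbitrary host name chosen here to exercise the wildcard record rather than anything the cluster defines.

```shell
# Report which cluster DNS names resolve and which do not.
# Hypothetical helper: the default base domain mirrors the examples above.
# Usage: check_cluster_dns [base-domain]
check_cluster_dns() {
    base="${1:-demo.k8s.local}"
    failed=0
    # "test.apps" is an arbitrary label; any host under *.apps should resolve.
    for name in "api.$base" "api-int.$base" "test.apps.$base"; do
        answer="$(dig +short "$name" || true)"
        if [ -n "$answer" ]; then
            # Join multiple answer lines with spaces for a one-line report.
            echo "OK   $name -> $(printf '%s' "$answer" | tr '\n' ' ')"
        else
            echo "FAIL $name"
            failed=1
        fi
    done
    return "$failed"
}
```

The function exits non-zero if any name returned no answer, which makes it usable as a pre-flight check in a script.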
Check pod networking:
oc get pods -n openshift-sdn
oc logs -n openshift-sdn -l app=sdn
oc get network.config.openshift.io cluster -o yaml
Review service endpoints:
oc get endpoints -A
oc get svc -A
Test network connectivity:
oc debug node/<node-name> -- chroot /host ip addr show
oc debug node/<node-name> -- chroot /host ping <target-ip>
Resource Constraints
Check resource usage:
oc adm top nodes
oc adm top pods --containers=true --all-namespaces
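To spot nodes under pressure quickly, the node usage table can be filtered by a threshold. The sketch below assumes the usual column order of oc adm top nodes (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); the function name hot_nodes and the default limit of 80 percent are illustrative choices, not recommended values.

```shell
# List nodes whose CPU% or MEMORY% meets or exceeds a threshold (default 80).
# Reads the table from `oc adm top nodes` on stdin.
# Assumed columns: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
# Usage: oc adm top nodes | hot_nodes 90
hot_nodes() {
    awk -v limit="${1:-80}" '
        $1 == "NAME" { next }                 # skip the header row
        {
            cpu = $3; mem = $5
            sub(/%/, "", cpu); sub(/%/, "", mem)   # strip the percent signs
            if (cpu + 0 >= limit + 0 || mem + 0 >= limit + 0)
                print $1 " cpu=" cpu "% mem=" mem "%"
        }
    '
}
```

Nodes that report no metrics are treated as 0 percent by this sketch and therefore never flagged.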
Review quota usage:
oc get resourcequota -A
oc describe quota -n <namespace>
Monitor storage:
oc get pv,pvc --all-namespaces
oc get volumeattachment
Certificate Issues
Check certificate status:
oc get csr
oc get secret -n openshift-config
Review certificate expiration:
oc get secret -n openshift-kube-apiserver-operator kube-apiserver-to-kubelet-signer -o jsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}'
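The annotation holds a timestamp, which is easier to act on once converted to days remaining. The following sketch does that conversion. It assumes the value is an RFC 3339 timestamp such as 2031-01-01T00:00:00Z and that GNU date (with -d) is available; the function name days_until is hypothetical.

```shell
# Print whole days until an RFC 3339 timestamp (negative if already past,
# truncated toward zero). Assumes GNU date for the -d option.
# The optional second argument overrides "now" as epoch seconds.
# Usage:
#   not_after="$(oc get secret ... -o jsonpath='...')"   # command shown above
#   days_until "$not_after"
days_until() {
    expiry_epoch="$(date -u -d "$1" +%s)" || return 1
    now_epoch="${2:-$(date -u +%s)}"
    echo $(( (expiry_epoch - now_epoch) / 86400 ))
}
```

A small or negative result indicates the signer certificate is close to or past expiry and should be investigated.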
Verify API server certificates:
oc get apiserver cluster -o yaml
Authentication and Authorization
Check identity provider configuration:
oc get oauth cluster -o yaml
oc get identity
Review role bindings:
oc get clusterrolebinding
oc get rolebinding --all-namespaces
Registry Issues
Check registry status:
oc get pods -n openshift-image-registry
oc get configs.imageregistry.operator.openshift.io cluster -o yaml
Review storage configuration:
oc get pvc -n openshift-image-registry
oc describe pvc -n openshift-image-registry
Collecting Diagnostics
Gather must-gather data:
# General cluster data
oc adm must-gather
# Specific component data
oc adm must-gather --image=registry.redhat.io/rhacm2/acm-must-gather-rhel8:v2.8
Review cluster logs:
# Control plane logs
oc logs -n openshift-controller-manager deployment/controller-manager
# Node logs
oc adm node-logs <node-name> -u kubelet
# Specific pod logs
oc logs -n <namespace> <pod-name> --previous
Export cluster state:
# Full cluster state
oc get all -A -o yaml > cluster-state.yaml
# Specific component state
oc get nodes -o yaml > nodes-state.yaml
oc get co -o yaml > operators-state.yaml
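When state is exported repeatedly during an incident, writing each export into its own directory keeps snapshots from overwriting each other. This is a minimal sketch along those lines; the function name snapshot_cluster_state, the directory naming scheme, and the chosen resource types are assumptions for illustration. Note that the "all" category covers only a subset of resource types, which is why specific types are listed explicitly here.

```shell
# Export selected resource types into one directory, one YAML file per type.
# Hypothetical helper: the directory name and resource list are illustrative.
# Usage: snapshot_cluster_state [output-dir]
snapshot_cluster_state() {
    outdir="${1:-cluster-state-$(date -u +%Y%m%dT%H%M%SZ)}"
    mkdir -p "$outdir" || return 1
    for kind in nodes clusteroperators machines; do
        # -A is harmless for cluster-scoped types and required for machines.
        if oc get "$kind" -A -o yaml > "$outdir/$kind.yaml" 2> "$outdir/$kind.err"; then
            rm -f "$outdir/$kind.err"          # keep error files only on failure
        else
            echo "warning: could not export $kind (see $outdir/$kind.err)" >&2
        fi
    done
    echo "$outdir"                             # report where the snapshot landed
}
```

The function prints the directory it wrote to, so the result can be archived or attached to a support case directly.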
Check etcd health:
oc rsh -n openshift-etcd etcd-<control-plane-node> etcdctl endpoint health
oc rsh -n openshift-etcd etcd-<control-plane-node> etcdctl endpoint status -w table
Monitor API server metrics:
oc get --raw /metrics | grep apiserver_request_duration_seconds
DRP-Specific Troubleshooting
Check DRP machine status:
drpcli machines show <machine-uuid>
Examine task execution:
drpcli tasks status <task-uuid>
drpcli tasks logs <task-uuid>
API Health Verification
Test API server health:
# Test API server health directly
curl -k https://api.demo.k8s.local:6443/healthz
# Get API server version
curl -k https://api.demo.k8s.local:6443/version
Best Practices
Maintain cluster documentation, including network configuration, storage layout, authentication setup, and custom configurations
Implement systematic log collection and retention
Create and maintain runbooks for common issues
Document configuration changes and their rationale
Establish clear escalation paths for different types of issues