Debugging Kubernetes Issues
Kubernetes problems often span several layers: the container, the pod, the node, and the cluster. This guide covers the
kubectl commands used for diagnosis, three common failure states and their fixes, and practices for resource optimization, monitoring, and logging.
Pod Status & Information
# Get pod status
kubectl get pods -n default
# Detailed pod info
kubectl describe pod <pod-name>
# View pod events
kubectl get events --sort-by='.lastTimestamp'
# Check pod logs
kubectl logs <pod-name>
kubectl logs -f <pod-name>  # Follow logs
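In a namespace with many pods, it can help to narrow the listing to the pods that need attention. The following is a minimal sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE); the function name `unhealthy_pods` is hypothetical and not part of kubectl.

```shell
# Print the name and status of every pod whose STATUS is neither
# Running nor Completed.
# Reads `kubectl get pods --no-headers` output on stdin.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (assumes a reachable cluster):
#   kubectl get pods -n default --no-headers | unhealthy_pods
```

This filter looks only at the STATUS column, so a pod that is Running but not ready (for example `0/1` in the READY column) is not reported.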
Exec & Port Forward
# Execute commands in pod
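kubectl exec -it <pod-name> -- /bin/sh
# Port forwarding (local port 8080 to pod port 8080)
kubectl port-forward <pod-name> 8080:8080
# Copy a file from the pod to the local machine
kubectl cp <pod-name>:/path/to/file ./local/path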
Resource & Node Info
# Node status
kubectl get nodes
kubectl describe node <node-name>
# Resource usage
kubectl top nodes
kubectl top pods
# Node capacity
kubectl describe nodes | grep Capacity -A 5
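To spot nodes that are close to saturation, the `kubectl top nodes` output can be filtered by utilization. This is a rough sketch, assuming the default column layout (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%) and a working metrics server; `busy_nodes` and its default threshold of 80 are hypothetical choices.

```shell
# Print nodes whose CPU% or MEMORY% is at or above a threshold.
# $1 = threshold in percent (defaults to 80).
# Reads `kubectl top nodes --no-headers` output on stdin.
busy_nodes() {
  awk -v t="${1:-80}" '{
    cpu = $3; mem = $5
    sub(/%/, "", cpu); sub(/%/, "", mem)   # strip the percent sign
    if (cpu + 0 >= t || mem + 0 >= t) print $1, $3, $5
  }'
}

# Usage (assumes a reachable cluster with metrics available):
#   kubectl top nodes --no-headers | busy_nodes 80
```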
Issue: Pod Stuck in Pending
Pod is created but not assigned to any node.
Solution:
# Check node resources
kubectl top nodes
# Check for node selector issues
kubectl describe pod <pod-name>
# Check resource quota
kubectl describe quota -n <namespace>
Common causes: Insufficient cluster resources, node affinity issues, or resource quotas exceeded.
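The scheduler usually records why it could not place a pod as a `FailedScheduling` event. The sketch below, under the assumption that the events have not yet expired, narrows the event list to those entries for one pod; the function name `scheduling_failures` is hypothetical.

```shell
# List scheduler failure events for a single pod.
# $1 = pod name, $2 = namespace (defaults to "default").
scheduling_failures() {
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.name=$1,reason=FailedScheduling" \
    --sort-by='.lastTimestamp'
}

# Usage (assumes a reachable cluster):
#   scheduling_failures <pod-name> <namespace>
```

The message column of these events typically names the unmet constraint, such as insufficient CPU or a node selector that matched no node.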
Issue: CrashLoopBackOff Status
Container starts, crashes, and restarts continuously.
Solution:
# Check logs
kubectl logs <pod-name> --previous
# Detailed event information
kubectl describe pod <pod-name>
# Check liveness probe settings
kubectl get pod <pod-name> -o yaml
Common causes: Application errors, misconfigured health probes, missing dependencies, or failed initialization.
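The exit code of the last terminated container often narrows down the cause before the logs are read. The following sketch pulls the restart count, termination reason, and exit code from the pod status, then turns the exit code into a rough hint. Both function names are hypothetical, and the hints are general conventions rather than definitive diagnoses.

```shell
# Print name, restart count, last termination reason, and exit code
# for each container in a pod.
# $1 = pod name, $2 = namespace (defaults to "default").
last_termination() {
  kubectl get pod "$1" -n "${2:-default}" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Translate a container exit code into a rough hint.
# Codes above 128 conventionally mean "terminated by signal (code - 128)".
explain_exit_code() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "exited cleanly; check whether the main process is expected to keep running"
  elif [ "$code" -gt 128 ]; then
    echo "terminated by signal $((code - 128))"
  else
    echo "application exited with error code $code; check the logs"
  fi
}

# Usage (assumes a reachable cluster):
#   last_termination <pod-name> <namespace>
#   explain_exit_code 137
```

An exit code of 137 corresponds to signal 9 (SIGKILL), which in a pod is frequently paired with the termination reason `OOMKilled`.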
Issue: ImagePullBackOff
Kubernetes cannot pull the container image.
Solution:
# Verify image name and tag
kubectl describe pod <pod-name>
# Check credentials
kubectl get secrets
# Create image pull secret
kubectl create secret docker-registry myregistrykey \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password>
Common causes: Wrong image name, non-existent tag, authentication issues, or private registry access problems.
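Creating a pull secret has no effect until pods reference it. One common way to do this, sketched here as an assumption about the setup rather than the only option, is to attach the secret to the service account the pods run under. The function name `attach_pull_secret` is hypothetical; `myregistrykey` is the secret name used in the command above.

```shell
# Attach an image pull secret to a service account so that new pods
# using that account can pull from the private registry.
# $1 = secret name, $2 = namespace (defaults to "default"),
# $3 = service account (defaults to "default").
# Note: this patch replaces any imagePullSecrets already on the account.
attach_pull_secret() {
  kubectl patch serviceaccount "${3:-default}" -n "${2:-default}" \
    -p "{\"imagePullSecrets\": [{\"name\": \"$1\"}]}"
}

# Usage (assumes a reachable cluster):
#   attach_pull_secret myregistrykey <namespace>
```

Pods that already exist are not updated; they need to be recreated to pick up the secret.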
Resource Optimization
- Right-size CPU/memory requests based on actual usage
- Use Horizontal Pod Autoscaling (HPA)
- Implement pod disruption budgets
- Use node affinity for optimal placement
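As a starting point for right-sizing, current usage from `kubectl top pods` can be turned into candidate request values with some headroom. This is only a sketch: it assumes CPU is reported in millicores (`m`) and memory in mebibytes (`Mi`), the 20% default headroom is an arbitrary illustration, and a single sample is no substitute for usage history from a monitoring system. The function name `suggest_requests` is hypothetical.

```shell
# Suggest CPU and memory requests as current usage plus headroom.
# $1 = headroom in percent (defaults to 20).
# Reads `kubectl top pods --no-headers` output on stdin
# (columns: NAME, CPU(cores), MEMORY(bytes)).
suggest_requests() {
  awk -v h="${1:-20}" '{
    cpu = $2; mem = $3
    sub(/m$/, "", cpu); sub(/Mi$/, "", mem)   # strip the unit suffixes
    printf "%s cpu=%dm memory=%dMi\n", $1, cpu * (100 + h) / 100, mem * (100 + h) / 100
  }'
}

# Usage (assumes a reachable cluster with metrics available):
#   kubectl top pods -n <namespace> --no-headers | suggest_requests 20
```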
Monitoring Tools
- Prometheus - Metrics collection
- Grafana - Visualization
- ELK Stack - Logging
- Jaeger - Distributed tracing
Key Metrics to Monitor
- CPU and memory usage
- Pod restart counts
- Network I/O throughput
- API server response times
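Of these metrics, pod restart counts can be checked directly from kubectl without a monitoring stack. The sketch below assumes the default `kubectl get pods` column layout, where RESTARTS is the fourth column; the function name `restarting_pods` and the default threshold of 3 are hypothetical.

```shell
# Print pods whose restart count is at or above a threshold.
# $1 = minimum restart count (defaults to 3).
# Reads `kubectl get pods --no-headers` output on stdin.
restarting_pods() {
  awk -v t="${1:-3}" '$4 + 0 >= t { print $1, $4 }'
}

# Usage (assumes a reachable cluster):
#   kubectl get pods -n <namespace> --no-headers | restarting_pods 3
```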
Logging Best Practices
# View pod logs
kubectl logs <pod-name>
# Tail logs in real-time
kubectl logs -f <pod-name>
# View logs for previous crashed container
kubectl logs <pod-name> --previous
# Get logs from all containers in a pod
kubectl logs <pod-name> --all-containers=true
# Stream logs from multiple pods
kubectl logs -f -l app=myapp --max-log-requests=10
Log aggregation setup:
- Use structured logging (JSON format)
- Include correlation IDs for tracing
- Forward logs to centralized system (ELK, Splunk)
- Set retention policies
- Create alerts for errors and warnings
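Structured logs with correlation IDs pay off when a single request has to be followed across pods. As a minimal sketch, assuming each log line is compact JSON (no space after the colon) with a field named `correlation_id`, the lines for one request can be filtered as follows. The field name and the function name `logs_for_request` are hypothetical.

```shell
# Keep only the log lines that carry a given correlation ID.
# $1 = correlation ID. Reads log lines on stdin.
logs_for_request() {
  grep -F "\"correlation_id\":\"$1\""
}

# Usage (assumes a reachable cluster):
#   kubectl logs -l app=myapp --all-containers=true --tail=-1 | logs_for_request <id>
```

The `--tail=-1` flag is there because kubectl returns only the last 10 lines per pod by default when a label selector is used.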