Kubernetes Troubleshooting & Debugging | Common Issues & Solutions

Debugging Kubernetes Issues

Issues in Kubernetes can be complex and multi-layered. This guide provides systematic approaches to troubleshooting, common problems and their solutions, debugging techniques, and performance optimization strategies.

Essential Debugging Tools & Commands

Master these commands for effective troubleshooting

Pod Status & Information

# Get pod status
kubectl get pods -n default

# Detailed pod info
kubectl describe pod <pod-name>

# View events in the current namespace, oldest first
kubectl get events --sort-by='.lastTimestamp'

# Check pod logs
kubectl logs <pod-name>
kubectl logs -f <pod-name>   # Follow logs

Exec & Port Forward

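# Execute commands in pod
kubectl exec -it <pod-name> -- /bin/sh

# Port forwarding (local port 8080 to pod port 8080)
kubectl port-forward <pod-name> 8080:8080

# Copy files from the pod to the local machine
kubectl cp <pod-name>:/path/to/file ./local/path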

Resource & Node Info

# Node status
kubectl get nodes
kubectl describe node <node-name>

# Resource usage
kubectl top nodes
kubectl top pods

# Node capacity
kubectl describe nodes | grep Capacity -A 5

Common Issues & Solutions

Quick reference for typical problems

Issue: Pod Stuck in Pending

Pod is created but not assigned to any node.

Solution:

# Check node resources
kubectl top nodes

# Check scheduling events, node selectors, and tolerations
kubectl describe pod <pod-name>

# Check resource quota
kubectl describe quota -n <namespace>

Common causes: Insufficient cluster resources, node affinity issues, or resource quotas exceeded.
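The scheduler usually records why a pod could not be placed in a FailedScheduling event, which appears under Events in the describe output. The sketch below maps such a message to one of the causes listed above. The helper name classify_scheduling_message is hypothetical, and the matched phrases are assumptions about the scheduler's wording, which can differ between Kubernetes versions.

```shell
# Hypothetical helper: map a FailedScheduling message to a likely cause.
# The matched phrases are assumptions about the scheduler's wording.
classify_scheduling_message() {
  case "$1" in
    *Insufficient*)                      echo "insufficient resources" ;;
    *"node affinity"*|*"node selector"*) echo "affinity or selector mismatch" ;;
    *taint*)                             echo "untolerated taint" ;;
    *)                                   echo "unknown" ;;
  esac
}

# Against a live cluster, the messages can be listed with:
#   kubectl get events --field-selector reason=FailedScheduling
```

Anything classified as "unknown" still needs a manual read of the full event text.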

Issue: CrashLoopBackOff Status

Container starts, crashes, and restarts continuously.

Solution:

# Check logs from the previous (crashed) container
kubectl logs <pod-name> --previous

# Detailed event information
kubectl describe pod <pod-name>

# Check liveness probe settings
kubectl get pod <pod-name> -o yaml

Common causes: Application errors, misconfigured health probes, missing dependencies, or failed initialization.
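The exit code of the last terminated container often narrows down which of these causes applies. The sketch below turns an exit code into a short hint. The helper name explain_exit_code and the hint wording are illustrative assumptions; the signal arithmetic (128 plus the signal number) is the standard shell convention.

```shell
# Hypothetical helper: translate a container exit code into a debugging hint.
explain_exit_code() {
  case "$1" in
    0)       echo "clean exit" ;;
    126|127) echo "command not executable or not found" ;;
    137)     echo "SIGKILL, possibly OOM kill" ;;   # 128 + 9
    143)     echo "SIGTERM" ;;                      # 128 + 15
    *)       echo "application error" ;;
  esac
}

# Against a live cluster, the exit code of the first container can be read with:
#   kubectl get pod <pod-name> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```

A "clean exit" in a crash loop usually means the main process finished instead of staying in the foreground.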

Issue: ImagePullBackOff

Kubernetes cannot pull the container image.

Solution:

# Verify image name and tag
kubectl describe pod <pod-name>

# Check credentials
kubectl get secrets

# Create image pull secret
kubectl create secret docker-registry myregistrykey \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password>

Common causes: Wrong image name, non-existent tag, authentication issues, or private registry access problems.
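An image reference without an explicit tag resolves to latest, which is a frequent source of "non-existent tag" surprises. The sketch below extracts the effective tag from a reference so it can be compared with what the registry actually holds. The helper name image_tag is hypothetical, and the parsing is a simplification that assumes a well-formed reference.

```shell
# Hypothetical helper: print the tag an image reference resolves to.
image_tag() {
  ref="${1%%@*}"      # drop a trailing @sha256:... digest, if any
  name="${ref##*/}"   # keep the last path component so a registry port is not mistaken for a tag
  case "$name" in
    *:*) printf '%s\n' "${name##*:}" ;;
    *)   printf 'latest\n' ;;
  esac
}
```

For example, image_tag localhost:5000/app prints latest, because the colon belongs to the registry port rather than to a tag.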

Performance Optimization & Monitoring

Improve cluster and application performance

Resource Optimization

Right-size CPU/memory requests based on actual usage
Use Horizontal Pod Autoscaling (HPA)
Implement pod disruption budgets
Use node affinity for optimal placement
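To make the HPA item concrete: the autoscaler's documented core rule is desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). The sketch below reproduces that arithmetic with integer math. The function name hpa_desired_replicas is hypothetical, and the sketch deliberately ignores the tolerance band, min/max bounds, and stabilization windows that the real controller applies.

```shell
# Hypothetical helper: core HPA scaling rule, integer metrics only.
# Usage: hpa_desired_replicas <current-replicas> <current-metric> <target-metric>
hpa_desired_replicas() {
  current_replicas=$1
  current_metric=$2
  target_metric=$3
  # Integer ceiling of (current_replicas * current_metric) / target_metric
  echo $(( (current_replicas * current_metric + target_metric - 1) / target_metric ))
}

# An autoscaler with a 70% CPU target can be created with:
#   kubectl autoscale deployment <deployment-name> --cpu-percent=70 --min=2 --max=10
```

With 4 replicas at 90% average CPU and a 60% target, the rule asks for 6 replicas.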

Monitoring Tools

Prometheus - Metrics collection
Grafana - Visualization
ELK Stack - Logging
Jaeger - Distributed tracing

Key Metrics to Monitor

CPU and memory usage
Pod restart counts
Network I/O throughput
API server response times
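Pod restart counts can be checked without a monitoring stack. The sketch below filters the output of kubectl get pods --no-headers for pods whose restart count exceeds a threshold. The function name pods_over_restart_threshold is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE) without the extra NAMESPACE column that -A adds.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print "name restarts" for pods above the given restart threshold.
pods_over_restart_threshold() {
  awk -v limit="$1" '$4 + 0 > limit + 0 { print $1, $4 }'
}

# Against a live cluster:
#   kubectl get pods --no-headers | pods_over_restart_threshold 5
```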

Logging & Log Analysis

Effective logging strategies

Logging Best Practices

# View pod logs
kubectl logs <pod-name>

# Tail logs in real-time
kubectl logs -f <pod-name>

# View logs from the previous (crashed) container
kubectl logs --previous <pod-name>

# Get logs from all containers in a pod
kubectl logs <pod-name> --all-containers=true

# Stream logs from multiple pods
kubectl logs -f -l app=myapp --max-log-requests=10

Log aggregation setup:

Use structured logging (JSON format)
Include correlation IDs for tracing
Forward logs to centralized system (ELK, Splunk)
Set retention policies
Create alerts for errors and warnings
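As an illustration of the first two items, the sketch below emits one JSON object per log line with a correlation ID. The function name log_json and the field names (ts, level, correlation_id, msg) are assumptions rather than a standard, and the sketch does not escape quotes or backslashes inside the message.

```shell
# Hypothetical helper: emit a single structured (JSON) log line.
# Usage: log_json <level> <correlation-id> <message>
# Assumes the arguments contain no double quotes or backslashes.
log_json() {
  level=$1
  correlation_id=$2
  message=$3
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  printf '{"ts":"%s","level":"%s","correlation_id":"%s","msg":"%s"}\n' \
    "$ts" "$level" "$correlation_id" "$message"
}

# Lines for one request can then be pulled out of pod logs with:
#   kubectl logs <pod-name> | grep '"correlation_id":"req-42"'
```

In a real application a logging library would normally handle escaping and field naming.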