Audit Logging & Monitoring Overview
Comprehensive logging and monitoring are essential for detecting and responding to security incidents. Kubernetes API audit logs record requests made to the API server (who did what, to which resource, and when), while runtime tools such as Falco watch for suspicious activity inside nodes and containers. Together, they cover both the control plane and the running workloads.
💡 Key Insight: Logging without monitoring has little value. You must actively monitor
logs for suspicious activity and alert on anomalies.
Kubernetes Audit Logging
Audit Log Levels
- None: Don't log requests matching this rule
- Metadata: Log request metadata (user, timestamp, verb, resource) but not the request or response body
- Request: Log metadata and the request body, but not the response body
- RequestResponse: Log metadata, the request body, and the response body
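To make the difference between levels concrete, here is a toy Python model (not the API server's code) that trims a full audit event down to what each level would record. It assumes an event is a plain dict using the `requestObject` and `responseObject` field names from the `audit.k8s.io/v1` Event type; the sample event is hypothetical.

```python
from typing import Optional

# Body fields each level drops (field names from the audit.k8s.io/v1 Event type).
DROPPED_FIELDS = {
    "Metadata": {"requestObject", "responseObject"},
    "Request": {"responseObject"},
    "RequestResponse": set(),
}


def apply_level(event: dict, level: str) -> Optional[dict]:
    """Return the parts of a full event that would be recorded at `level`."""
    if level == "None":
        return None  # the request is not logged at all
    if level not in DROPPED_FIELDS:
        raise ValueError(f"unknown audit level: {level}")
    recorded = {k: v for k, v in event.items() if k not in DROPPED_FIELDS[level]}
    recorded["level"] = level
    return recorded


# Hypothetical event for a ConfigMap creation.
full = {
    "verb": "create",
    "user": {"username": "alice"},
    "objectRef": {"resource": "configmaps", "namespace": "default"},
    "requestObject": {"data": {"key": "value"}},
    "responseObject": {"metadata": {"name": "demo"}},
}
print(sorted(apply_level(full, "Metadata")))  # ['level', 'objectRef', 'user', 'verb']
```

The jump from Metadata to RequestResponse is where sensitive payloads (and log volume) enter the picture, which is why verbose levels are usually reserved for a few resources.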
Sample Audit Policy
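apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Rules are evaluated in order and the first match wins,
  # so specific rules must come before the catch-all.
  # Log pod exec commands at RequestResponse
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec"]
  # Log secret access at Metadata only, so secret values
  # never end up in the audit log
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # Log all other requests at Metadata level
  - level: Metadata
    omitStages:
      - RequestReceived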
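Because the API server evaluates policy rules in order and the first matching rule sets the event's level, a catch-all rule placed ahead of more specific rules silently shadows them. The toy Python model below illustrates that first-match behavior. It is a simplification that matches only on API group and resource name (ignoring verbs, users, namespaces, and wildcards), and the three-rule policy in it is hypothetical.

```python
from typing import List, Optional, Tuple

# (level, apiGroup, resources); apiGroup=None marks a catch-all rule that matches everything.
Rule = Tuple[str, Optional[str], List[str]]

# Hypothetical policy: specific rules first, catch-all last.
POLICY: List[Rule] = [
    ("RequestResponse", "", ["pods/exec"]),
    ("Metadata", "", ["secrets"]),
    ("Metadata", None, []),
]


def audit_level(policy: List[Rule], group: str, resource: str) -> str:
    """Return the level of the first rule matching the request.

    `resource` includes any subresource, e.g. "pods/exec"; "" is the core group.
    A request that matches no rule is not logged.
    """
    for level, rule_group, resources in policy:
        if rule_group is None or (rule_group == group and resource in resources):
            return level
    return "None"


print(audit_level(POLICY, "", "pods/exec"))  # RequestResponse

# Moving the catch-all to the top shadows the exec rule.
shadowed = [POLICY[-1]] + POLICY[:-1]
print(audit_level(shadowed, "", "pods/exec"))  # Metadata
```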
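Enable Audit Logging on the API Server
Add these flags to the kube-apiserver command. On kubeadm clusters that means the static Pod manifest at /etc/kubernetes/manifests/kube-apiserver.yaml, which must also mount the policy file and the log directory into the container. --audit-log-maxage is measured in days and --audit-log-maxsize in megabytes:
--audit-log-path=/var/log/kubernetes/audit.log
--audit-log-maxage=30
--audit-log-maxsize=100
--audit-policy-file=/etc/kubernetes/audit-policy.yaml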
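With `--audit-log-path` set, the API server writes one JSON-encoded event per line by default (`--audit-log-format=json`). A minimal Python sketch for pulling pod exec and secret access events out of such a file might look like this. It assumes the `objectRef` fields `apiGroup`, `resource`, and `subresource` from the `audit.k8s.io/v1` Event type, and the sample lines are made up.

```python
import json
from typing import Iterable, Iterator

# (apiGroup, resource, subresource) triples to watch; "" is the core API group.
WATCHED = {("", "pods", "exec"), ("", "secrets", "")}


def sensitive_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield audit events touching pods/exec or secrets from JSON-lines input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        ref = event.get("objectRef") or {}
        key = (ref.get("apiGroup", ""), ref.get("resource", ""), ref.get("subresource", ""))
        if key in WATCHED:
            yield event


# Hypothetical sample lines standing in for /var/log/kubernetes/audit.log.
sample = [
    '{"verb": "create", "user": {"username": "alice"}, "objectRef": {"resource": "pods", "subresource": "exec", "name": "web-0"}}',
    '{"verb": "get", "user": {"username": "bob"}, "objectRef": {"resource": "secrets", "name": "db-creds"}}',
    '{"verb": "list", "user": {"username": "carol"}, "objectRef": {"resource": "configmaps"}}',
    "not json",
]
for ev in sensitive_events(sample):
    print(ev["user"]["username"], ev["verb"], ev["objectRef"]["name"])
```

On a real node you would pass an open file object for the audit log path instead of `sample`. Keep in mind that a single request can produce one event per stage, so the same exec may appear more than once.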
Runtime Security with Falco
Falco Overview
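Falco is an open-source runtime security tool that watches system calls and flags suspicious process, network, and file activity:
- Real-time threat detection
- Kernel-level syscall capture via an eBPF probe or a kernel module
- A default ruleset covering common attack techniques
- Relatively low performance overhead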
Install Falco
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco \
--namespace falco --create-namespace \
--set falco.grpc.enabled=true
Common Falco Rules
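- Writes below the root or system binary directories
- Interactive shell spawned in a container (e.g., the default "Terminal shell in container" rule)
- Unexpected process execution
- Reads of sensitive files (e.g., /etc/shadow)
- Unexpected outbound network connections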
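Falco rules are YAML conditions written over syscall event fields such as `evt.type`, `proc.name`, `container.id`, and `proc.tty`. The Python sketch below is not Falco's rule engine; it only mirrors the logic of a simplified shell-in-container condition so the building blocks are easy to see. Events are assumed to be flat dicts keyed by Falco field names, and the sample values are invented.

```python
# Simplified event: a flat dict keyed by Falco field names (assumed shape).
SHELL_BINARIES = {"ash", "bash", "csh", "ksh", "sh", "tcsh", "zsh", "dash"}


def spawned_process(evt: dict) -> bool:
    # Falco's macro also checks the event direction (evt.dir); omitted here.
    return evt.get("evt.type") in {"execve", "execveat"}


def container(evt: dict) -> bool:
    # Falco reports container.id == "host" for processes outside containers.
    return evt.get("container.id", "host") != "host"


def terminal_shell_in_container(evt: dict) -> bool:
    """Simplified shell-in-container check; proc.tty != 0 means a terminal is attached."""
    return (
        spawned_process(evt)
        and container(evt)
        and evt.get("proc.name") in SHELL_BINARIES
        and evt.get("proc.tty", 0) != 0
    )


exec_bash = {"evt.type": "execve", "container.id": "3f2a9c1b7d4e",
             "proc.name": "bash", "proc.tty": 34816}
host_bash = {**exec_bash, "container.id": "host"}
print(terminal_shell_in_container(exec_bash), terminal_shell_in_container(host_bash))
# True False
```

The real default rule adds exceptions for expected shells; composing small macros like this is what keeps those exceptions manageable.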
✓ Best Practice: Forward Falco alerts to a SIEM or log platform (e.g., Splunk, ELK) for centralized
threat detection and response.
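When Falco's JSON output is enabled (`json_output: true` in `falco.yaml`), each alert is a JSON object with fields such as `time`, `rule`, `priority`, and `output`. A minimal Python sketch of a filtering step before forwarding alerts to a SIEM could look like the following. The threshold and sample alerts are assumptions, and because priority spellings may vary between Falco versions, the sketch compares them case-insensitively and treats "info" as "informational".

```python
import json
from typing import Iterable, List

# Falco priorities from most to least severe (lowercased for comparison).
PRIORITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]
ALIASES = {"info": "informational"}


def severity_rank(priority: str) -> int:
    """Lower rank means more severe; unknown priorities raise ValueError."""
    p = priority.lower()
    return PRIORITIES.index(ALIASES.get(p, p))


def alerts_to_forward(lines: Iterable[str], threshold: str = "warning") -> List[dict]:
    """Parse Falco JSON output lines and keep alerts at or above `threshold`."""
    limit = severity_rank(threshold)
    kept = []
    for line in lines:
        try:
            alert = json.loads(line)
            rank = severity_rank(alert["priority"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError, AttributeError):
            continue  # skip malformed lines and unknown priorities
        if rank <= limit:
            kept.append({"rule": alert.get("rule"), "priority": alert["priority"],
                         "time": alert.get("time"), "message": alert.get("output")})
    return kept


# Hypothetical alerts; rule names are placeholders.
sample = [
    '{"time": "2024-01-01T00:00:00Z", "rule": "Example notice rule", "priority": "Notice", "output": "shell spawned"}',
    '{"time": "2024-01-01T00:00:05Z", "rule": "Example warning rule", "priority": "Warning", "output": "sensitive file opened"}',
]
print([a["rule"] for a in alerts_to_forward(sample)])  # ['Example warning rule']
```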
Alerting & Response
Set Up Alerting
- Prometheus + Alertmanager: Alert on metrics
- ELK Stack: Alert on log patterns
- SIEM: Centralized alerting
- Webhooks: Send alerts to Slack, PagerDuty, etc.
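Most chat and paging tools accept alerts through an HTTP webhook. As a hedged sketch, the Python below builds a JSON POST in the `{"text": ...}` shape used by Slack incoming webhooks. The URL is a placeholder, the alert dict layout is an assumption, and PagerDuty and other services expect their own payload formats.

```python
import json
import urllib.request


def build_webhook_request(url: str, alert: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST with a Slack-style {"text": ...} body."""
    text = f"[{alert['priority']}] {alert['rule']}: {alert['message']}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real one comes from your chat or paging tool.
req = build_webhook_request(
    "https://hooks.example.invalid/services/PLACEHOLDER",
    {"priority": "Critical", "rule": "Example rule", "message": "shell in pod web-0"},
)
# To actually send it: urllib.request.urlopen(req, timeout=5)
```

Separating payload construction from sending keeps the formatting logic easy to test without network access.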
Incident Response Playbook
1. Alert triggered (Falco/audit log)
2. Investigation (check logs, processes, network)
3. Containment (isolate pod, preserve evidence)
4. Analysis (determine the root cause, entry point, and scope of the attack)
5. Recovery (remediate, restore)
6. Post-incident (update rules, improve detection)
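For the containment step, one common approach is to cut a compromised pod off from the network while leaving it running, so its processes and filesystem remain available for investigation. The Python sketch below generates a deny-all NetworkPolicy for pods carrying a hypothetical `quarantine=true` label; you would then label the suspect pod and apply the manifest (for example with `kubectl apply -f`). Two caveats: NetworkPolicy is only enforced if the cluster's CNI plugin supports it, and policies are additive, so any other policy that already allows traffic to the pod still applies.

```python
import json


def quarantine_policy(namespace: str, label_key: str = "quarantine",
                      label_value: str = "true") -> dict:
    """Deny-all NetworkPolicy for pods carrying a (hypothetical) quarantine label.

    Listing both policy types with no ingress/egress rules allows no traffic
    to or from the selected pods; the pods keep running, preserving evidence.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{label_key}-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {label_key: label_value}},
            "policyTypes": ["Ingress", "Egress"],
        },
    }


# Hypothetical namespace; the JSON output can be piped to kubectl apply -f -
print(json.dumps(quarantine_policy("payments"), indent=2))
```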
Example Alert Rules
- Unauthorized API access attempts
- Suspicious process execution in container
- Network connections to suspicious IPs
- Privilege escalation attempts
- Credential access patterns
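For the first rule above, a simple starting point is to count 401 (unauthenticated) and 403 (forbidden) responses per user in audit events from a recent time window. The Python sketch below assumes events are already parsed into dicts with `responseStatus.code` and `user.username` fields; the threshold of 5 is an arbitrary placeholder to tune for your cluster, and the sample events are invented.

```python
from collections import Counter
from typing import Iterable, List


def failed_access_by_user(events: Iterable[dict], threshold: int = 5) -> List[str]:
    """Return usernames with at least `threshold` 401/403 responses in `events`."""
    counts: Counter = Counter()
    for ev in events:
        code = (ev.get("responseStatus") or {}).get("code")
        if code in (401, 403):
            user = (ev.get("user") or {}).get("username", "<unknown>")
            counts[user] += 1
    return sorted(user for user, n in counts.items() if n >= threshold)


# Hypothetical events from a single time window.
events = (
    [{"user": {"username": "mallory"}, "responseStatus": {"code": 403}}] * 6
    + [{"user": {"username": "bob"}, "responseStatus": {"code": 403}}] * 2
    + [{"user": {"username": "alice"}, "responseStatus": {"code": 200}}]
)
print(failed_access_by_user(events))  # ['mallory']
```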
Monitoring Best Practices
- Centralize Logs: Ship logs to ELK, Splunk, or a cloud-native logging service
- Long Retention: Keep audit logs for at least 30 days (longer if regulations require); --audit-log-maxage only prunes the local files on the API server node
- Real-Time Analysis: Monitor for threats in real time
- Alerting: Alert on critical security events
- Regular Review: Review logs weekly for anomalies
- Compliance: Ensure log retention and access controls meet regulatory requirements
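A weekly review goes faster with a summary than with raw events. The Python sketch below counts who did what to which resource over the last seven days. It assumes events are already parsed into dicts and uses the `stageTimestamp`, `user.username`, `verb`, and `objectRef` fields of the `audit.k8s.io/v1` Event type; the sample events are invented.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable


def weekly_digest(events: Iterable[dict], now: datetime, days: int = 7) -> Counter:
    """Count (user, verb, resource) triples for events newer than `now - days`."""
    cutoff = now - timedelta(days=days)
    digest: Counter = Counter()
    for ev in events:
        ts = ev.get("stageTimestamp")
        if not ts:
            continue
        # Timestamps end in "Z"; fromisoformat needs "+00:00" before Python 3.11.
        when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        if when < cutoff:
            continue
        ref = ev.get("objectRef") or {}
        resource = ref.get("resource", "<non-resource>")
        if ref.get("subresource"):
            resource += "/" + ref["subresource"]
        user = (ev.get("user") or {}).get("username", "<unknown>")
        digest[(user, ev.get("verb"), resource)] += 1
    return digest


now = datetime(2024, 5, 8, tzinfo=timezone.utc)
events = [
    {"stageTimestamp": "2024-05-07T10:00:00.000000Z", "verb": "get",
     "user": {"username": "alice"}, "objectRef": {"resource": "secrets"}},
    {"stageTimestamp": "2024-05-07T11:00:00.000000Z", "verb": "get",
     "user": {"username": "alice"}, "objectRef": {"resource": "secrets"}},
    {"stageTimestamp": "2024-04-20T09:00:00.000000Z", "verb": "create",
     "user": {"username": "bob"}, "objectRef": {"resource": "pods", "subresource": "exec"}},
]
for (user, verb, resource), n in weekly_digest(events, now).most_common():
    print(user, verb, resource, n)  # alice get secrets 2
```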