Introduction

Practice Kubernetes troubleshooting with realistic error scenarios.

Each scenario is started with a kubectl apply command. To clean up, run kubectl delete -f on the same manifest.
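
As a minimal sketch of the apply and cleanup cycle, the two helper functions below wrap kubectl apply and kubectl delete around a manifest path in this repository. The names DEMO_BASE, demo_apply and demo_delete are hypothetical and exist only for illustration.

```shell
# Base URL of the raw manifests used throughout this README.
DEMO_BASE="https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main"

# Hypothetical helper: start a scenario, e.g. demo_apply crashpod/broken.yaml
demo_apply() {
  kubectl apply -f "$DEMO_BASE/$1"
}

# Hypothetical helper: delete everything the same manifest created.
demo_delete() {
  kubectl delete -f "$DEMO_BASE/$1"
}
```

For example, demo_apply crashpod/broken.yaml starts the crashing pod scenario and demo_delete crashpod/broken.yaml removes it again.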

Simple Scenarios

Crashing Pod (CrashLoopBackOff)
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
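
To practice investigating the crash by hand, a sketch like the following may help. The function name inspect_crash is hypothetical, and the pod name is not assumed here: look it up with kubectl get pods first.

```shell
# Hypothetical helper: inspect a crash-looping pod.
# Pass the pod name reported by `kubectl get pods`.
inspect_crash() {
  pod="$1"
  # Events and container state, including restart count and last exit code.
  kubectl describe pod "$pod"
  # Logs from the previous (crashed) container instance.
  kubectl logs "$pod" --previous
}
```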

To get notified when this happens, install Robusta.

OOMKilled Pod (Out of Memory Kill)
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/oomkill/oomkill_job.yaml
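
One way to confirm an out-of-memory kill by hand is to read the termination reason from the pod status. This is a sketch: the function name termination_reasons is hypothetical, and the pod name has to be looked up with kubectl get pods because it is not assumed here.

```shell
# Hypothetical helper: for each container in a pod, print its name, the
# reason its current state terminated, and the reason its previous state
# terminated. A value of "OOMKilled" indicates an out-of-memory kill.
termination_reasons() {
  kubectl get pod "$1" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.terminated.reason}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```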

To get notified when this happens, install Robusta.

High CPU Throttling (CPUThrottlingHigh)

Apply the following YAML and wait 15 minutes. (CPU throttling is only an issue if it occurs for a meaningful period of time. Less than 15 minutes of throttling typically does not trigger an alert.)

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/cpu_throttling/throttling.yaml

To get notified when this happens, install Robusta.

Pending Pod (Unschedulable due to Node Selectors)

Apply the following YAML and wait 15 minutes. (By default, most systems only alert after pods are pending for 15 minutes. This prevents false alarms on autoscaled clusters, where it's OK for pods to be temporarily pending.)

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/pending_pods/pending_pod_node_selector.yaml
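
While waiting, the cause can be investigated by hand. The sketch below compares what the pod asks for with what the nodes offer. The function name why_pending is hypothetical, and the pod name must be taken from kubectl get pods.

```shell
# Hypothetical helper: gather the usual evidence for an unschedulable pod.
why_pending() {
  pod="$1"
  # Events for the pod, such as FailedScheduling messages from the scheduler.
  kubectl get events --field-selector "involvedObject.name=$pod"
  # The node selector the pod requests.
  kubectl get pod "$pod" -o jsonpath='{.spec.nodeSelector}'
  echo  # jsonpath output has no trailing newline
  # Labels actually present on the nodes, for comparison.
  kubectl get nodes --show-labels
}
```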

To get notified when this happens, install Robusta.

ImagePullBackOff
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/image_pull_backoff/no_such_image.yaml

To get notified when this happens, install Robusta.

Liveness Probe Failure
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/liveness_probe_fail/failing_liveness_probe.yaml

To get notified when this happens, install Robusta.

Readiness Probe Failure
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/readiness_probe_fail/failing_readiness_probe.yaml

Job Failure

The job will fail after 60 seconds, then attempt to run again. After two attempts, it will fail for good.

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/job_failure/job_crash.yaml

To get notified when this happens, install Robusta.

Failed Helm Releases

Deliberately deploy a failing Helm release:

helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
helm install kubewatch robusta/kubewatch --set='rbac.create=true,updateStrategy.type=Error' --namespace demo-namespace --create-namespace
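
To look at the failed release before fixing it, a sketch like this can be used. The function name release_state is hypothetical; helm status and helm history are standard Helm commands.

```shell
# Hypothetical helper: show the current status and the revision history
# of a Helm release in a given namespace.
release_state() {
  release="$1"
  namespace="$2"
  helm status "$release" --namespace "$namespace"
  helm history "$release" --namespace "$namespace"
}
```

For this scenario the call would be release_state kubewatch demo-namespace.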

Upgrade the release so it succeeds:

helm upgrade kubewatch robusta/kubewatch --set='rbac.create=true' --namespace demo-namespace --create-namespace

Clean up by removing the release and deleting the namespace:

helm del kubewatch --namespace demo-namespace
kubectl delete namespace demo-namespace

To get notified about failed releases, install Robusta and set up Helm Releases Monitoring.

Advanced Scenarios

Correlate Changes and Errors

Deploy a healthy pod. Then break it.

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/healthy.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml

If someone else made this change, would you be able to immediately pinpoint the change that broke the application?

To get notified when this happens, install Robusta.

Track Deployment Changes

Create an nginx deployment. Then simulate multiple unexpected changes to this deployment.

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/deployment_image_change/before_image_change.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/deployment_image_change/after_image_change.yaml
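
To see the change by hand, the sketch below previews a manifest against the live cluster and lists a deployment's revisions. The function names are hypothetical, and the deployment name is not assumed here: take it from kubectl get deployments.

```shell
# Preview what a manifest would change on the cluster before applying it.
# Note: kubectl diff exits with status 1 when it finds differences.
preview_change() {
  kubectl diff -f "$1"
}

# Hypothetical helper: list the recorded revisions of a deployment.
deployment_history() {
  kubectl rollout history "deployment/$1"
}
```

Running preview_change on the after_image_change.yaml URL before the second kubectl apply shows the fields that are about to change.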

To get notified about changes like these, install Robusta and set up Kubernetes change tracking.

Track Ingress Changes

Create an ingress. Then change its path and secretName to simulate an unexpected ingress modification.

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/ingress_port_path_change/before_port_path_change.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/ingress_port_path_change/after_port_path_change.yaml

To get notified about changes like these, install Robusta and set up Kubernetes change tracking.

Drift Detection and Namespace Diff

Deploy two variants of the same application in different namespaces:

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/namespace_drift/example.yaml

Can you quickly tell the difference between the compare1 and compare2 namespaces? What is the drift between them?
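
Without a UI, one rough way to compare the two namespaces is to diff the container images they run. This is only a sketch: the function names are hypothetical, and it assumes the demo workloads are Deployments, which this README does not state.

```shell
# Print "name<TAB>images" for every Deployment in a namespace.
# Assumption: the workloads are Deployments; change the kind if they are not.
ns_images() {
  kubectl get deployments -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].image}{"\n"}{end}'
}

# Print a unified diff of the image lists of two namespaces.
# Returns diff's exit status (1 when the namespaces differ).
ns_diff() {
  left="$(mktemp)"
  right="$(mktemp)"
  ns_images "$1" > "$left"
  ns_images "$2" > "$right"
  status=0
  diff -u "$left" "$right" || status=$?
  rm -f "$left" "$right"
  return "$status"
}
```

For this scenario the call would be ns_diff compare1 compare2. Images are only one dimension of drift; replicas, resource requests and environment variables can be compared the same way by changing the jsonpath expression.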

To do so with Robusta, install Robusta and enable the UI.

Inefficient GKE Nodes

On GKE, nodes can reserve more than 50% of CPU for themselves. Users pay for CPU that is unavailable to applications.

Reproduction:

  1. Create a default GKE cluster with Autopilot disabled. Don't change any other settings.
  2. Deploy the following pod:
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/gke_node_allocatable/gke_issue.yaml
  3. Run kubectl get pods -o wide gke-node-allocatable-issue

The pod will be Pending. A Pod requesting 1 CPU cannot run on an empty node with 2 CPUs!
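
To see where the CPU goes, it may help to compare each node's total CPU with the CPU it offers to pods, and then review what is already requested on a node. The function names below are hypothetical; the fields and commands are standard kubectl.

```shell
# Show total CPU (capacity) next to schedulable CPU (allocatable) per node.
node_cpu() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,CAPACITY:.status.capacity.cpu,ALLOCATABLE:.status.allocatable.cpu'
}

# Hypothetical helper: describe one node. The "Allocated resources" section
# of the output lists CPU already requested by pods running on that node.
node_requests() {
  kubectl describe node "$1"
}
```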

To see problems like this with Robusta, install Robusta and enable the UI.