Module 01 · Foundations
What Is Kubernetes?
Beginner
~15 min

Kubernetes (K8s) is an open-source container orchestration platform originally developed by Google, open-sourced in 2014, and donated to the CNCF in 2015. It automates the deployment, scaling, and management of containerized workloads across a cluster of machines.

Before K8s, running containers at scale meant writing custom scripts to restart crashed containers, balance load, and roll out updates. Kubernetes solves all of that declaratively — you describe what you want, and the control plane makes it happen.
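
In practice, the declarative workflow is just a couple of commands; a minimal sketch, assuming a manifest file named deployment.yaml:
bash
# Describe desired state in YAML and hand it to the API server
kubectl apply -f deployment.yaml

# Edit the YAML (new image, more replicas) and apply again;
# the control plane diffs desired vs. actual state and reconciles
kubectl apply -f deployment.yaml

# Preview what would change without applying anything
kubectl diff -f deployment.yaml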

The Problem Kubernetes Solves
Without K8s
Crashed containers stay dead. Rolling updates require custom scripts. Scaling is manual. Load balancing is external glue. Config drift is constant.
With K8s
Self-healing restarts failed pods. Rolling/canary deployments built-in. Horizontal autoscaling on metrics. Internal DNS and load balancing included.
Declarative Model
You write YAML manifests describing desired state. K8s continuously reconciles actual state toward desired state — the control loop.
Portable
Runs on-prem (bare metal, VMs), any cloud (EKS, GKE, AKS), or a laptop (k3s, minikube). Same YAML everywhere.
Key Concepts at a Glance
Concept | Analogy | What it does
Pod | A shipping container | Smallest deployable unit; wraps one or more containers
Deployment | A fleet manager | Manages replicas, rolling updates, rollbacks
Service | A DNS name + load balancer | Stable network endpoint in front of pods
Namespace | A department | Logical isolation boundary within a cluster
Node | A server | Physical/virtual machine that runs pods
Cluster | A data center | The full set of control-plane + worker nodes
Kubernetes vs. Docker

Docker builds and runs containers on a single host. Kubernetes orchestrates containers across many hosts. They are complementary: Docker (or containerd/CRI-O) provides the container runtime, and Kubernetes uses it to schedule and manage workloads cluster-wide.

💡
Note: K8s 1.24 removed the dockershim, so Docker Engine is no longer supported as a node runtime out of the box. Clusters use containerd or CRI-O directly via the Container Runtime Interface (CRI).
⚡ Quick Check
Which best describes Kubernetes' operational model?
A
Imperative — you tell it every step to execute
B
Declarative — you describe desired state; K8s reconciles
C
Event-driven — you subscribe to container lifecycle hooks
D
Scripted — you provide shell scripts for each operation
Module 02 · Foundations
Cluster Architecture
Beginner
~20 min

A Kubernetes cluster has two logical planes: the Control Plane (formerly "master") which makes global decisions about the cluster, and Worker Nodes which run your application workloads.

Cluster Architecture Overview
Control Plane
API Server
kube-apiserver
etcd
cluster state store
Scheduler
kube-scheduler
Controller Mgr
control loops
↕ HTTPS
Worker Nodes
kubelet
node agent
kube-proxy
network rules
Container RT
containerd/CRI-O
Pod · Pod · Pod
workloads
Control Plane Components
kube-apiserver
The front door. All kubectl commands, controllers, and nodes communicate through the REST API it exposes. Validates and persists objects to etcd.
etcd
Distributed key-value store. The single source of truth for all cluster state. Back this up. If etcd is lost without a backup, the cluster loses its state.
kube-scheduler
Watches for unscheduled pods and assigns them to nodes based on resource requests, taints/tolerations, affinity rules, and available capacity.
kube-controller-manager
Runs control loops: Node controller, Deployment controller, ReplicaSet controller, Job controller, etc. Each loop reconciles actual → desired state.
Worker Node Components
kubelet
Agent on every node. Registers the node with the API server, pulls pod specs, instructs the container runtime to start/stop containers, and reports back node/pod status.
kube-proxy
Maintains iptables/IPVS rules on each node that implement Service virtual IPs and load balancing across pod endpoints.
Container Runtime
The software that actually runs containers. containerd and CRI-O are the standard choices. kubelet talks to them via the CRI gRPC interface.
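To see these components on a running cluster, list the pods in kube-system. A quick sketch; on kubeadm-style clusters the control-plane components run there as static pods, while k3s embeds them in its single binary, so you may not see separate pods for them:
bash
# Control-plane and node components (names vary by distro)
kubectl get pods -n kube-system -o wide

# Node status as reported by each kubelet
kubectl get nodes -o wide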
The Reconciliation Loop

Every controller in K8s runs a watch → diff → act loop:

Desired State (YAML in etcd) → Controller (watches API) → Diff (desired vs actual) → Reconcile (create/delete/update)
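You can watch a reconcile loop in action by deleting a pod that a Deployment owns; the ReplicaSet controller notices the difference and recreates it. A sketch, assuming pods labeled app=web-app (such as the Deployment created in Module 05):
bash
# Terminal 1: watch the pods continuously
kubectl get pods -l app=web-app -w

# Terminal 2: delete one pod; a replacement appears within seconds
kubectl delete "$(kubectl get pods -l app=web-app -o name | head -n 1)"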
⚠️
Exam tip (CKA): The API server is the only component that talks directly to etcd. Everything else (scheduler, controllers, kubelet) communicates exclusively through the API server.
⚡ Quick Check
Which component is responsible for assigning a newly created pod to a node?
A
kube-controller-manager
B
kubelet
C
kube-scheduler
D
kube-proxy
Module 03 · Foundations
Lab Setup
Hands-On
~25 min

You have several options for a local Kubernetes lab. We'll use k3s — a lightweight, CNCF-certified K8s distro ideal for learning. It runs on a single Linux VM or bare metal host.

Option Comparison
Tool | Best For | Overhead | Notes
k3s | Linux VM / bare metal lab | Low (~512MB RAM) | Single binary, full K8s API
minikube | Mac/Windows dev laptop | Medium | VM or Docker driver
kind | CI pipelines, Docker hosts | Low | Nodes as Docker containers
k3d | k3s inside Docker | Low | Multi-node on one host
Install k3s (Single-Node)
1
Install k3s
Run the official install script. It will set up the server and a systemd service automatically.
bash
# As root or with sudo
curl -sfL https://get.k3s.io | sh -

# Verify the service is running
systemctl status k3s
2
Configure kubectl access
k3s writes its kubeconfig to /etc/rancher/k3s/k3s.yaml. Copy it to your user's config directory.
bash
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config

# Or use k3s's embedded kubectl
sudo k3s kubectl get nodes
3
Verify the cluster
Confirm your node is Ready and system pods are running.
bash
kubectl get nodes
# NAME         STATUS   ROLES                  AGE   VERSION
# k3s-node     Ready    control-plane,master   1m    v1.30.x

kubectl get pods -n kube-system
4
Install kubectl bash completion (optional but useful)
bash
kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl
echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc
source ~/.bashrc
💡
Multi-node with k3d: k3d cluster create lab --servers 1 --agents 2 spins up a 3-node cluster (1 control-plane + 2 workers) as Docker containers in under 30 seconds.
Module 04 · Core Resources
Working With Pods
Hands-On
~25 min

A Pod is the smallest deployable unit in Kubernetes — a wrapper around one or more containers that share a network namespace (same IP, same localhost) and storage volumes. In practice, most pods run a single container.

Pods are ephemeral: they are created, run, and die. They do not reschedule themselves. That's what Deployments are for.

Your First Pod
imperative (bash)
# Create a pod imperatively (useful for quick debugging)
kubectl run nginx-pod --image=nginx:alpine --port=80

# Watch it come up
kubectl get pod nginx-pod -w

# Inspect it
kubectl describe pod nginx-pod

# Get shell access
kubectl exec -it nginx-pod -- sh
Pod Manifest (YAML)

The declarative approach — what you'll use in production:

pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-nginx
  labels:
    app: nginx
    env: dev
spec:
  containers:
  - name: nginx
    image: nginx:alpine
    ports:
    - containerPort: 80
    resources:
      requests:
        cpu: "100m"      # 0.1 CPU cores
        memory: "64Mi"
      limits:
        cpu: "250m"
        memory: "128Mi"
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /
        port: 80
      failureThreshold: 3
Health Probes
readinessProbe
Gates traffic. Pod won't receive Service traffic until probe passes. Use this to prevent requests hitting a container that hasn't finished starting up.
livenessProbe
Detects deadlocks. If it fails, kubelet restarts the container. Use for apps that can get stuck in a broken state without crashing.
startupProbe
For slow-starting apps. Disables liveness/readiness until it passes. Prevents premature restarts on slow JVM or DB init.
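A hedged sketch of a startupProbe for a slow-starting container; the endpoint, port, and thresholds are illustrative and should be tuned to your app's worst-case startup time:
yaml
    startupProbe:
      httpGet:
        path: /healthz      # assumes the app serves a health endpoint here
        port: 8080
      failureThreshold: 30  # 30 x 10s = up to 5 minutes allowed to start
      periodSeconds: 10
    # liveness/readiness probes are held off until the startupProbe succeeds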
🚨
Don't run naked pods in production. If a pod is deleted or its node fails, nothing recreates it. Always use a Deployment (or StatefulSet/DaemonSet) to manage pod lifecycle.
⚡ Quick Check
Two containers in the same Pod want to communicate. What address should they use?
A
The pod's cluster IP address
B
localhost (they share a network namespace)
C
The node's IP address
D
A Kubernetes Service ClusterIP
Module 05 · Core Resources
Deployments & Rollouts
Hands-On
~30 min

A Deployment manages a ReplicaSet, which manages pods. It's the primary way to run stateless applications. Deployments handle rolling updates, rollbacks, and scaling, with zero downtime by default.
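
You can see the ownership chain (Deployment → ReplicaSet → Pods) directly; a quick sketch using the web-app example from this module:
bash
# The Deployment, its current ReplicaSet, and the pods it owns
kubectl get deployment web-app
kubectl get rs,pods -l app=web-app

# ReplicaSet and pod names embed a pod-template-hash tying them to a revision
kubectl get pods -l app=web-app --show-labels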

Deployment Manifest
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app           # must match template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # extra pods during update
      maxUnavailable: 0  # zero downtime
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:1.25
        ports:
        - containerPort: 80
        resources:
          requests: { cpu: "100m", memory: "64Mi" }
          limits:   { cpu: "500m", memory: "256Mi" }
Rolling Update Workflow
bash
# Trigger a rolling update (new image)
kubectl set image deployment/web-app web=nginx:1.26

# Watch the rollout
kubectl rollout status deployment/web-app

# Inspect revision history
kubectl rollout history deployment/web-app

# Roll back to previous revision
kubectl rollout undo deployment/web-app

# Roll back to specific revision
kubectl rollout undo deployment/web-app --to-revision=2

# Scale manually
kubectl scale deployment/web-app --replicas=5
Horizontal Pod Autoscaler
bash + yaml
# Imperative
kubectl autoscale deployment/web-app --cpu-percent=70 --min=2 --max=10

# Declarative (HPA v2); requires metrics-server for resource metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa        # name is illustrative
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Module 06 · Core Resources
Services & Networking
Hands-On
~30 min

Pods are ephemeral — their IPs change on every restart. A Service provides a stable virtual IP (ClusterIP) and DNS name that load-balances traffic to the pods matching its selector.

Service Types
ClusterIP (default)
Accessible only within the cluster. Used for internal service-to-service communication. Every service gets a DNS entry: <svc>.<namespace>.svc.cluster.local
NodePort
Exposes a static port (30000–32767) on every node's IP. Useful for dev/testing. Not recommended for production — use a LoadBalancer or Ingress instead.
LoadBalancer
Provisions a cloud provider load balancer (AWS ELB, GCP LB, etc.) with a public IP. The standard way to expose services externally on managed K8s.
ExternalName
Maps a service to a DNS name outside the cluster (CNAME). Useful for migrating external services into the cluster namespace gradually.
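A quick way to try a Service without writing YAML is kubectl expose; a sketch, assuming the web-app Deployment from the previous module:
bash
# Create a ClusterIP service in front of the deployment's pods
kubectl expose deployment web-app --name=web-svc --port=80 --target-port=80

# Hit it from inside the cluster with a throwaway pod (removed on exit)
kubectl run svc-test --image=busybox -it --rm --restart=Never -- \
  wget -qO- http://web-svc.default.svc.cluster.local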
Service + Deployment Example
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    app: web-app     # matches Deployment pod labels
  ports:
  - port: 80         # Service port (ClusterIP)
    targetPort: 80  # container port
    protocol: TCP
  type: ClusterIP
Ingress

An Ingress routes HTTP/HTTPS traffic by hostname or path to backend services — like a reverse proxy. You need an Ingress Controller installed (nginx-ingress, Traefik, etc.) for Ingress resources to work.

ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-svc
            port: { number: 80 }
💡
DNS: Within a cluster, services are reachable at <svc>.<ns>.svc.cluster.local. Within the same namespace, just <svc> works. This is served by CoreDNS running in kube-system.
Module 07 · Configuration
ConfigMaps & Secrets
Hands-On
~25 min

Kubernetes separates configuration from container images. ConfigMaps store non-sensitive configuration (env vars, config files). Secrets store sensitive data (passwords, tokens, TLS certs) — base64-encoded at rest by default, though you should enable encryption at rest in production.

ConfigMap
configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_ENV: "production"
  LOG_LEVEL: "info"
  app.conf: |
    server.port=8080
    db.pool.size=10
Secret
secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  # values must be base64-encoded: echo -n 'mypassword' | base64
  DB_PASSWORD: bXlwYXNzd29yZA==
  DB_USER: YWRtaW4=
stringData:          # alternative: plain text (K8s encodes it)
  API_KEY: "my-api-key-plain"
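Both objects can also be created imperatively, which is handy because kubectl handles the base64 encoding for you; the values below are illustrative:
bash
# ConfigMap from literals (use --from-file for config files)
kubectl create configmap app-config \
  --from-literal=APP_ENV=production \
  --from-literal=LOG_LEVEL=info

# Secret from literals; kubectl base64-encodes the values
kubectl create secret generic db-secret \
  --from-literal=DB_USER=admin \
  --from-literal=DB_PASSWORD=mypassword

# Add --dry-run=client -o yaml to either command to emit a manifest instead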
Consuming in a Pod
pod consuming config (yaml)
spec:
  containers:
  - name: app
    image: myapp:v1
    env:
    - name: LOG_LEVEL             # single key from ConfigMap
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: LOG_LEVEL
    - name: DB_PASSWORD           # single key from Secret
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: DB_PASSWORD
    envFrom:                       # ALL keys as env vars
    - configMapRef:
        name: app-config
    volumeMounts:
    - name: config-vol
      mountPath: /etc/app        # mount as files
  volumes:
  - name: config-vol
    configMap:
      name: app-config
⚠️
Secrets are not encrypted by default — only base64-encoded in etcd. For production, enable Encryption at Rest in the API server config and consider external secret managers like HashiCorp Vault, AWS Secrets Manager, or the External Secrets Operator.
Module 08 · Configuration
Persistent Storage
Intermediate
~25 min

Container filesystems are ephemeral. Volumes persist data. The storage workflow in K8s has three layers: StorageClass (how to provision), PersistentVolume (the actual storage), and PersistentVolumeClaim (a pod's request for storage).

Storage Binding Flow
StorageClass
provisioner + params
PersistentVolume
actual storage resource
PVC
claim (size + accessMode)
Pod
mounts the volume
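To watch the binding flow on a real cluster, list all three layers and check the PVC's STATUS column (Bound vs. Pending); the claim name below matches the example that follows:
bash
kubectl get storageclass
kubectl get pv
kubectl get pvc data-pvc

# If the PVC stays Pending, the Events in describe output usually explain why
kubectl describe pvc data-pvc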
PersistentVolumeClaim
pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
  - ReadWriteOnce      # RWO: one node; RWX: many nodes (NFS)
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi

---  # Mount in a pod (spec fragment, not a complete manifest)
spec:
  containers:
  - name: db
    image: postgres:16
    volumeMounts:
    - name: pgdata
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: pgdata
    persistentVolumeClaim:
      claimName: data-pvc
Access Modes
Mode | Short | Meaning
ReadWriteOnce | RWO | One node can mount read/write. Most block storage (EBS, local disk).
ReadOnlyMany | ROX | Many nodes can mount read-only.
ReadWriteMany | RWX | Many nodes can mount read/write. Requires shared storage: NFS, CephFS, Azure Files.
ReadWriteOncePod | RWOP | Only one pod cluster-wide. Strictest isolation (K8s 1.22+).
Module 09 · Operations
RBAC & Security
Intermediate
~30 min

Role-Based Access Control (RBAC) governs who can do what to which resources. It is the primary authorization mechanism in K8s and should be enabled on every cluster.

RBAC Objects
Object | Scope | Purpose
Role | Namespace | Grants permissions within one namespace
ClusterRole | Cluster | Grants permissions cluster-wide or for non-namespaced resources (nodes, PVs)
RoleBinding | Namespace | Binds a Role (or ClusterRole) to a user/group/ServiceAccount in a namespace
ClusterRoleBinding | Cluster | Binds a ClusterRole to a subject cluster-wide
Role + Binding Example
rbac.yaml
# Role: read-only access to pods in 'dev' namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]            # "" = core API group
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: ci-runner
  namespace: dev
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
Verifying Permissions
bash
# Can alice delete pods in dev?
kubectl auth can-i delete pods -n dev --as=alice
# no

# Can ci-runner serviceaccount list pods?
kubectl auth can-i list pods -n dev --as=system:serviceaccount:dev:ci-runner
# yes

# What can I do? (verbose)
kubectl auth can-i --list -n dev
💡
Principle of Least Privilege: The default ServiceAccount in each namespace has only minimal permissions. Always create dedicated ServiceAccounts for workloads that need API access, and bind only what they need, as in the sketch below.
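A minimal sketch of that pattern with imperative commands, reusing the pod-reader Role and ci-runner ServiceAccount from the example above:
bash
# Dedicated identity for the workload
kubectl create serviceaccount ci-runner -n dev

# Bind only the pod-reader Role to that ServiceAccount
kubectl create rolebinding ci-runner-read-pods \
  --role=pod-reader \
  --serviceaccount=dev:ci-runner \
  -n dev

# Reference it in the pod spec via spec.serviceAccountName: ci-runner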
Module 10 · Operations
Observability & Debugging
Intermediate
~30 min

When something breaks in K8s, there's a logical triage path. Start broad (cluster, nodes), narrow to workload (deployment, replicaset, pods), then drill into the container itself (logs, exec, events).

Triage Workflow
1
Check Pod Status
Start by listing pods and looking for non-Running status.
bash
kubectl get pods -n mynamespace
# Look for: Pending, CrashLoopBackOff, ImagePullBackOff, OOMKilled, Error
2
Describe the Pod
Events at the bottom of describe output are the most useful signal.
bash
kubectl describe pod mypod -n mynamespace
# Check: Events section, resource limits, node assignments, image name
3
Read Logs
Current logs, plus the previous container run if it crashed.
bash
kubectl logs mypod
kubectl logs --previous mypod     # crashed container
kubectl logs -f mypod             # follow live
kubectl logs -l app=myapp --all-containers  # all pods by label
4
Exec into the Container
For live inspection when logs aren't enough.
bash
kubectl exec -it mypod -- bash
# or sh if bash not available:
kubectl exec -it mypod -- sh
5
Deploy a Debug Pod
For network troubleshooting — run a temporary pod with networking tools.
bash
# Ephemeral debug pod (deleted on exit)
kubectl run netdbg --image=nicolaka/netshoot -it --rm -- bash

# From inside: test DNS and connectivity
nslookup web-svc.default.svc.cluster.local
curl http://web-svc/healthz
Common Pod Failure States
Status | Cause | Fix
Pending | No node with sufficient resources, or node selector mismatch | describe pod → check Events; check node capacity
ImagePullBackOff | Wrong image name/tag, or missing registry credentials | Check image name; create imagePullSecret
CrashLoopBackOff | Container exits repeatedly | logs --previous; fix app crash or wrong command
OOMKilled | Container exceeded memory limit | Increase resources.limits.memory
Terminating (stuck) | Finalizers blocking deletion, or node unreachable | Remove the finalizer (kubectl patch), or kubectl delete pod --force --grace-period=0
Evicted | Node disk/memory pressure | Free node resources; check eviction thresholds
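To confirm an OOMKill (or any container termination reason) from the CLI, check the last terminated state in the pod status; the pod name is illustrative:
bash
kubectl get pod mypod \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# e.g. OOMKilled

# The same detail appears under "Last State" in describe output
kubectl describe pod mypod | grep -A 3 "Last State"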
Resource Monitoring
bash
# Requires metrics-server installed
kubectl top nodes
kubectl top pods -A --sort-by=memory

# Events across all resources (sorted)
kubectl get events -A --sort-by=.lastTimestamp

# Check API server health
kubectl get componentstatuses  # deprecated but sometimes useful
kubectl get --raw=/healthz
💡
For production observability, deploy the kube-prometheus-stack Helm chart (Prometheus + Grafana + Alertmanager + node-exporter + kube-state-metrics). It gives you cluster, node, and workload dashboards out of the box — familiar territory if you're running Prometheus/Grafana already.
⚡ Final Check
A pod is in CrashLoopBackOff. What's the best first step?
A
Delete and recreate the deployment
B
Drain the node it's running on
C
Run kubectl logs --previous to read the crash output
D
Increase the memory limit