DevOps & Docker Interview
Questions & Answers
🌱 Beginner Questions (Q1–Q14)
DevOps is a culture, philosophy, and set of practices that unify software development (Dev) and operations (Ops) teams to deliver software faster, more reliably, and continuously.
The problem it solves: Historically, Dev teams wrote code and threw it "over the wall" to Ops teams to deploy and maintain. Different goals (Dev: ship fast, Ops: keep stable) caused friction, slow releases, blame culture, and fragile deployments.
- Continuous Integration (CI): Developers frequently merge code; automated tests run on every change.
- Continuous Delivery (CD): Code is always in a deployable state; releases are automated to staging.
- Continuous Deployment: Every passing build is automatically deployed to production.
- Infrastructure as Code (IaC): Servers and infrastructure defined in code, version-controlled.
- Monitoring & Feedback: Measure production metrics, feed insights back to development.
Docker is an open platform for building, shipping, and running applications in containers — lightweight, portable, self-contained units that package code along with all its dependencies.
The "works on my machine" problem: Before Docker, apps would work on a developer's laptop but fail in staging/production due to different OS versions, library versions, or environment variables. Docker solves this by shipping the environment along with the code.
Docker vs VMs: VMs virtualise an entire machine (OS + hardware). Containers share the host OS kernel — much lighter (MBs vs GBs), start in milliseconds vs minutes.
A Dockerfile is a text file containing a series of instructions that Docker reads to automatically build an image. Each instruction creates a new layer in the image.
| Instruction | Purpose |
|---|---|
| FROM | Base image to build from |
| WORKDIR | Set working directory inside container |
| COPY / ADD | Copy files into image (ADD also handles URLs & tar extraction) |
| RUN | Execute command during build (creates a layer) |
| ENV | Set environment variables |
| EXPOSE | Document which port the container listens on |
| CMD | Default command when container starts (overridable) |
| ENTRYPOINT | Main command (not easily overridden) |
| ARG | Build-time variables (not in final image) |
| VOLUME | Declare mount point for persistent data |
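Several of these instructions come together in even a small Dockerfile. Here is a minimal sketch for a hypothetical Node.js app (the file names, port, and start command are assumptions):

```dockerfile
# Base image to build from
FROM node:20-alpine
# Set working directory inside the container
WORKDIR /app
# Copy dependency manifests first, so this layer caches well
COPY package*.json ./
RUN npm ci --omit=dev
# Copy the application source
COPY . .
# Document the listening port
EXPOSE 3000
# Default command, overridable at docker run time
CMD ["node", "server.js"]
```

Build and run with `docker build -t myapp .` followed by `docker run -p 3000:3000 myapp`.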
| Feature | Docker Image | Docker Container |
|---|---|---|
| What it is | Blueprint / template (read-only layers) | Running instance of an image |
| State | Immutable — never changes | Has writable layer on top |
| Analogy | Like a class definition | Like an object/instance |
| Storage | Shared across containers | Adds thin writable layer per container |
| Created by | docker build | docker run |
| Data persistence | Persists until deleted | Data lost when removed (use volumes) |
Docker caches each layer. If a layer hasn't changed since the last build, Docker reuses the cached version — making subsequent builds much faster. Once a layer is invalidated, all subsequent layers are rebuilt.
Docker Compose is a tool for defining and running multi-container Docker applications using a single YAML file. Instead of running multiple docker run commands, you declare all services, networks, and volumes in one file.
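A minimal sketch of such a file (service names, images, ports, and the placeholder credentials are illustrative assumptions):

```yaml
# docker-compose.yml: one file describes the whole stack
services:
  web:
    build: .                 # build from the local Dockerfile
    ports:
      - "8080:3000"          # host:container
    depends_on:
      - db
    environment:
      DATABASE_URL: postgres://app:secret@db:5432/app
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret   # placeholder, not for production
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data   # named volume for persistence
volumes:
  db-data:
```

`docker compose up -d` starts everything; `docker compose down` tears it down.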
Container filesystems are ephemeral — data written inside a container is lost when the container is removed. Volumes provide persistent storage that survives container restarts and removals.
| Type | Description | Best for |
|---|---|---|
| Named Volume | Managed by Docker, stored in Docker's storage area | Production databases, app state |
| Bind Mount | Maps a host directory into the container | Development (live code reload) |
| tmpfs Mount | Stored in host memory only (not persisted) | Sensitive data, temp files |
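The three mount types can be declared side by side in a Compose service (image name and paths are illustrative):

```yaml
services:
  app:
    image: myapp:latest            # assumed image name
    volumes:
      - app-state:/data            # named volume: survives container removal
      - ./src:/app/src             # bind mount: live code reload in dev
    tmpfs:
      - /tmp/scratch               # tmpfs: in-memory only, never persisted
volumes:
  app-state:
```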
CI/CD is the practice of automating the integration, testing, and delivery of code changes.
| Stage | What happens | Trigger |
|---|---|---|
| Continuous Integration | Merge frequently → run automated tests + static analysis on every push | Every git push / PR |
| Continuous Delivery | Every passing build is automatically deployed to staging/pre-prod. Release to prod requires manual approval. | After CI passes |
| Continuous Deployment | Every passing build automatically goes all the way to production. No manual step. | After CI passes (fully automated) |
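As a concrete sketch, the CI stage might look like this hypothetical GitHub Actions workflow (the repo layout, Node version, and test command are assumptions):

```yaml
# .github/workflows/ci.yml
name: CI
on: [push, pull_request]       # runs on every push / PR
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci            # install exact locked dependencies
      - run: npm test          # automated tests gate every change
```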
Kubernetes (K8s) is an open-source container orchestration platform that automates deployment, scaling, and management of containerised applications across a cluster of machines.
Problems it solves:
- Container scheduling: Decides which node in the cluster to run each container on based on resource availability.
- Auto-scaling: Automatically scales pods up/down based on CPU, memory, or custom metrics.
- Self-healing: Automatically restarts failed containers, replaces unhealthy nodes, kills containers that fail health checks.
- Rolling updates & rollbacks: Deploy new versions with zero downtime; instantly roll back if something goes wrong.
- Service discovery & load balancing: Pods get DNS names; Services load-balance traffic across them.
- Secret & config management: Store sensitive data and configuration separately from container images.
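A minimal Deployment plus Service sketch illustrating several of these features (names, image, and replica count are assumptions):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                      # K8s keeps 3 pods running (self-healing)
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myorg/web:1.0.0   # assumed image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service                      # load-balances across the pods
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```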
Infrastructure as Code defines and provisions infrastructure (servers, networks, databases, load balancers) using machine-readable configuration files — stored in version control — instead of manual processes or UIs.
- Benefits: Reproducible environments, version-controlled changes, peer review via PRs, automated provisioning, disaster recovery (recreate from code in minutes).
- Terraform (HashiCorp): Declarative HCL language. Cloud-agnostic — works with AWS, GCP, Azure. Manages state file tracking what's deployed. Most widely used IaC tool.
- AWS CloudFormation: AWS-native IaC in YAML/JSON. Tight AWS integration but vendor-locked.
- Pulumi: IaC using real programming languages (TypeScript, Python, Go). Great for complex logic.
- Ansible: Configuration management + provisioning in YAML (playbooks). Agentless — uses SSH.
- Helm: Package manager for Kubernetes. Charts are templated K8s manifests.
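A minimal Terraform sketch of the declarative style (the region, AMI ID, and tags are placeholders, not a tested configuration):

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "eu-west-1"            # assumed region
}

resource "aws_instance" "web" {
  ami           = "ami-12345678"  # placeholder AMI ID
  instance_type = "t3.micro"
  tags = {
    Name = "web-server"
  }
}
```

Running `terraform plan` shows what would change; `terraform apply` converges real infrastructure to this declared state.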
| Feature | Virtual Machine (VM) | Container |
|---|---|---|
| OS | Full OS per VM (GBs) | Shares host OS kernel (MBs) |
| Start time | Minutes | Milliseconds |
| Size | GBs | MBs |
| Isolation | Strong (hardware-level) | Process-level (namespace/cgroups) |
| Portability | Heavy, hypervisor dependent | Highly portable |
| Use case | Run different OS, strong security isolation | Microservices, CI/CD, cloud-native apps |
| Examples | VMware, VirtualBox, AWS EC2 instances | Docker, containerd, podman |
A .dockerignore file works like .gitignore — it tells Docker which files and directories to exclude when sending the build context to the Docker daemon. Smaller build context = faster builds and smaller images.
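A typical .dockerignore sketch (entries are common examples, adjust per project):

```
# .dockerignore: excluded from the build context
.git
node_modules
*.log
# never ship local secrets into the build context
.env
dist
```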
Never bake .env files or other secrets into Docker images. Secrets baked into images can be extracted from the image layers even if you delete them in a later layer. Use runtime environment variables or secret managers (Vault, AWS Secrets Manager) instead.

A container registry is a repository for storing, versioning, and distributing Docker images. Like GitHub for code, but for container images.
| Registry | Type | Notes |
|---|---|---|
| Docker Hub | Public/Private | Default registry. Free tier has pull limits. |
| AWS ECR | Private (AWS) | Tight IAM integration, lifecycle policies |
| Google Artifact Registry | Private (GCP) | Replaced GCR, supports multiple formats |
| Azure Container Registry | Private (Azure) | Integrated with AKS |
| GitHub Container Registry | Public/Private | Free for public, integrated with Actions |
| Harbor | Self-hosted | Open-source, vulnerability scanning |
⚡ Intermediate Questions (Q15–Q28)
Multi-stage builds use multiple FROM instructions in a single Dockerfile, allowing you to use a heavy build environment but produce a tiny final image containing only what's needed at runtime.
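A sketch for a hypothetical Go service (the module layout and binary name are assumptions):

```dockerfile
# Stage 1: heavy build environment with the full Go toolchain
FROM golang:1.22 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/server ./cmd/server

# Stage 2: tiny runtime image containing only the compiled binary
FROM gcr.io/distroless/static
COPY --from=builder /bin/server /server
ENTRYPOINT ["/server"]
```

The builder stage can weigh a gigabyte; the final image is only the binary plus a minimal base.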
| Feature | ConfigMap | Secret |
|---|---|---|
| Use for | Non-sensitive config (URLs, feature flags, ports) | Sensitive data (passwords, API keys, certs) |
| Encoding | Plain text | Base64 encoded (not encrypted by default!) |
| etcd storage | Unencrypted | Can be encrypted at rest (requires config) |
| Access | Env vars or mounted files | Env vars or mounted files |
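A side-by-side sketch (names and values are illustrative; the password is a placeholder):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  FEATURE_FLAG: "true"          # non-sensitive, stored as plain text
  API_URL: "https://api.example.com"
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
stringData:                     # Kubernetes base64-encodes this on write
  DB_PASSWORD: changeme         # placeholder, use a secret manager in practice
```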
A Service routes traffic to pods inside the cluster. An Ingress is a layer 7 (HTTP/HTTPS) routing resource that routes external traffic to internal Services based on hostname/path rules.
Ingress Controllers: NGINX Ingress (most popular), Traefik, AWS ALB Ingress Controller, HAProxy. The controller reads Ingress resources and configures the underlying load balancer accordingly.
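A minimal Ingress sketch routing by host and path (the hostname and Service names are assumptions):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  ingressClassName: nginx          # assumes the NGINX Ingress Controller
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-svc      # internal Service for the API
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-svc      # internal Service for the frontend
                port:
                  number: 80
```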
The Horizontal Pod Autoscaler automatically scales the number of pod replicas in a Deployment based on observed CPU utilisation, memory, or custom metrics — ensuring your app handles traffic spikes without manual intervention.
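A sketch of an HPA targeting 70% average CPU (the Deployment name and replica bounds are assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```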
| Probe | When fails | K8s action |
|---|---|---|
| Liveness | Container is alive but stuck (deadlock) | Restart the container |
| Readiness | Container is not ready to serve traffic (warming up, db connecting) | Remove from Service endpoints (stop routing traffic) |
| Startup | Slow-starting app hasn't started yet | Delays liveness/readiness checks until startup succeeds |
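All three probes on one container, shown as a fragment of a pod spec (endpoints and timings are illustrative):

```yaml
containers:
  - name: web
    image: myorg/web:1.0.0         # assumed image
    livenessProbe:                 # restart the container if stuck
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
    readinessProbe:                # gate traffic until ready
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    startupProbe:                  # allow slow startup before liveness kicks in
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
```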
| Network Type | Isolation | Use case |
|---|---|---|
| bridge (default) | Private network on a single host. On user-defined bridge networks, containers can reach each other by container name. | Multi-container apps on same host (Compose) |
| host | Container shares host's network namespace. No network isolation. | Max performance, when port mapping overhead matters |
| none | Complete network isolation | Batch jobs, maximum security |
| overlay | Spans multiple Docker hosts (swarm) | Docker Swarm, multi-host communication |
| macvlan | Container gets own MAC address on physical network | Legacy apps expecting direct network access |
Prometheus is a time-series metrics database that scrapes metrics from targets (apps, nodes, K8s). Grafana is a visualisation platform that queries Prometheus and displays dashboards.
Key Prometheus concepts: Scrape interval (how often to collect), retention period, PromQL (query language for metrics), AlertManager (route alerts to PagerDuty, Slack, email).
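A minimal prometheus.yml sketch (the job name and target are assumptions):

```yaml
global:
  scrape_interval: 15s            # how often Prometheus pulls metrics
scrape_configs:
  - job_name: myapp               # assumed job name
    static_configs:
      - targets: ["myapp:8080"]   # endpoint exposing /metrics
```

A PromQL query such as rate(http_requests_total[5m]) then turns a raw counter into a per-second rate for a Grafana panel (the metric name is an assumption).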
| Feature | Deployment | StatefulSet |
|---|---|---|
| Pod identity | Random names (myapp-xyz123) | Stable, ordered names (myapp-0, myapp-1) |
| Storage | Shared or ephemeral | Stable PersistentVolume per pod |
| Scaling order | Any order | Sequential (0→1→2 up, 2→1→0 down) |
| DNS | Service DNS only | Each pod gets stable DNS hostname |
| Use case | Stateless apps (web servers, APIs) | Stateful apps (databases, Kafka, Zookeeper) |
| Strategy | How it works | Risk | Cost |
|---|---|---|---|
| Rolling Update | Gradually replace old pods with new. Traffic shifts as pods become ready. | Both versions live simultaneously briefly | No extra cost |
| Blue/Green | Run two full environments (blue=current, green=new). Switch traffic all at once via DNS/LB. | All users cut over at once, but rollback is instant (switch back) | 2× infrastructure cost during switch |
| Canary | Send small % of traffic (5–10%) to new version. Monitor. Gradually increase or roll back. | Only affects small % of users if fails | Slightly more infra |
| Recreate | Stop all old pods, start all new ones. Downtime. | Downtime during update | Minimal |
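A fragment of a Deployment spec setting rolling-update parameters (values are illustrative):

```yaml
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # at most 1 extra pod during the rollout
      maxUnavailable: 0      # never drop below the desired replica count
```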
Resource requests and limits tell Kubernetes how much CPU and memory a container needs and the maximum it can use. This enables the scheduler to place pods efficiently and prevent resource contention.
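A fragment of a pod spec with requests and limits (the image and values are illustrative):

```yaml
containers:
  - name: api
    image: myorg/api:1.0.0       # assumed image
    resources:
      requests:                  # what the scheduler reserves for the pod
        cpu: "250m"              # 0.25 of a CPU core
        memory: "256Mi"
      limits:                    # hard cap enforced by the kernel
        cpu: "500m"
        memory: "512Mi"          # exceeding this gets the container OOMKilled
```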
Debugging tip: run kubectl describe pod and look for OOMKilled events, then increase the memory limit or fix the memory leak.

Helm is the package manager for Kubernetes. It bundles related K8s manifests into a Chart — a versioned, parameterisable package. Instead of managing dozens of YAML files, you deploy with one command and customise with values.
Chart structure: Chart.yaml (metadata), values.yaml (default values), templates/ (K8s manifests with Go templating), charts/ (dependencies).
GitOps is a DevOps practice where Git is the single source of truth for infrastructure and application configuration. The cluster continuously syncs itself to match the desired state declared in Git.
- Push-based (traditional CI/CD): CI pipeline pushes changes to the cluster (kubectl apply). Problem: pipeline needs cluster credentials, drift goes undetected.
- Pull-based (GitOps): An agent running inside the cluster watches a Git repo. When it detects drift (cluster ≠ Git), it automatically reconciles. Credentials never leave the cluster.
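A sketch of the pull-based model using an Argo CD Application resource (the repo URL, paths, and namespaces are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/infra   # placeholder repo
    targetRevision: main
    path: apps/web                # manifests live here in Git
  destination:
    server: https://kubernetes.default.svc
    namespace: web
  syncPolicy:
    automated:
      selfHeal: true              # reconcile drift automatically
      prune: true                 # delete resources removed from Git
```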
Containers aren't magic — they're Linux processes using two kernel features: namespaces for isolation and cgroups for resource limits.
| Namespace | Isolates |
|---|---|
| pid | Process IDs — container sees only its own processes |
| net | Network interfaces — container gets its own network stack |
| mnt | Mount points — container has its own filesystem view |
| uts | Hostname and domain name |
| ipc | Inter-process communication |
| user | User/group IDs — map container root to non-root on host |
cgroups (control groups): Limit and account for resource usage (CPU, memory, I/O, network) per group of processes. When you set --memory=512m on a container, Docker configures a cgroup to enforce that limit.
🔥 Advanced Questions (Q29–Q40)
A Kubernetes Operator extends K8s with a custom controller that encodes operational knowledge for managing stateful applications. It uses Custom Resource Definitions (CRDs) to define new resource types, and a controller loop to reconcile desired vs actual state.
- CRD: Defines a new resource type (e.g., PostgresCluster). Users create instances of this resource just like Deployments.
- Controller: Watches CRD instances, compares desired state to actual state, makes changes to converge them — automated DBA/SRE knowledge.
- When to build one: Complex stateful app lifecycle that kubectl alone can't manage — automated backup/restore, version upgrades with data migration, failover, scaling with rebalancing.
Operator frameworks: Operator SDK (Go), Kopf (Python), kubebuilder. Popular operators: Prometheus Operator, Cert-Manager, Strimzi (Kafka), CloudNativePG.
A service mesh adds a transparent infrastructure layer for microservice-to-microservice communication — implementing cross-cutting concerns without code changes via sidecar proxies.
- Sidecar pattern: Each pod gets an injected proxy (Envoy for Istio) that intercepts all traffic in/out of the pod.
- mTLS: Mutual TLS encryption and authentication between all services — automatically, without app code changes.
- Traffic management: Fine-grained routing (canary, A/B, weight-based), retries, timeouts, circuit breaking — all configured as K8s resources.
- Observability: Distributed tracing (Jaeger), metrics (Prometheus), and access logs — automatically for all services.
- Istio: Feature-rich, more complex. Uses Envoy sidecar + Istiod control plane.
- Linkerd: Simpler, lighter, written in Rust. Ultra-low latency overhead (<1ms). Easier to operate.
- Cilium (eBPF-based): Next-gen — implements service mesh at the kernel level using eBPF, no sidecar required. Lower overhead.
The ELK Stack (now Elastic Stack) is the most popular open-source log management solution.
| Component | Role |
|---|---|
| Elasticsearch | Distributed search and analytics engine. Stores and indexes logs. Near real-time full-text search. |
| Logstash | Log pipeline — collects, transforms, and ships logs from multiple sources to Elasticsearch. |
| Kibana | Web UI for searching, visualising, and dashboarding Elasticsearch data. |
| Beats/Filebeat | Lightweight log shippers. Run on each server/pod to tail log files and send to Logstash or Elasticsearch. |
Alternative: Grafana Loki — Like Prometheus but for logs. Only indexes metadata (labels), not full text — much cheaper to run at scale. Pairs with Promtail (shipper) and Grafana (visualisation).
RBAC controls who can do what with which Kubernetes resources. It uses four main objects: Role, ClusterRole, RoleBinding, and ClusterRoleBinding.
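A sketch granting read-only access to pods in one namespace (the namespace and user name are assumptions):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
  - apiGroups: [""]              # "" means the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding                # binds the Role to a subject
metadata:
  namespace: dev
  name: read-pods
subjects:
  - kind: User
    name: jane                   # assumed user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

A ClusterRole and ClusterRoleBinding follow the same shape but apply cluster-wide rather than to one namespace.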
Supply chain attacks target the software build process itself — injecting malicious code into dependencies, base images, or build systems. Container image security involves scanning, signing, and policy enforcement at every stage.
SLSA framework (Supply chain Levels for Software Artifacts): Google's framework for hardening the build pipeline — hermetic builds, provenance attestation, two-person review. Level 4 = highest assurance.
eBPF (extended Berkeley Packet Filter) allows custom programs to run safely inside the Linux kernel without changing kernel source or loading modules. It's revolutionising networking, security, and observability.
- How it works: Write eBPF programs in restricted C. Kernel verifier ensures safety. JIT-compiled to native instructions. Runs at hook points (syscalls, network, tracing events) with minimal overhead.
- Observability (Pixie, Hubble): Capture any system event — network connections, file I/O, syscalls — without modifying application code. Get golden signal metrics automatically for every service.
- Networking (Cilium): Replace kube-proxy with eBPF-based load balancing. Faster than iptables at scale. Service mesh capabilities without sidecars.
- Security (Falco, Tetragon): Runtime security — detect suspicious syscalls (privilege escalation, unexpected network connections) instantly at kernel level.
- Performance profiling (Parca, Pyroscope): Always-on continuous profiling with near-zero overhead. Profile any process in production without instrumentation.
Distributed tracing tracks a single request as it flows through multiple microservices, showing exactly where time is spent. Each request gets a unique trace ID propagated through all services via HTTP headers.
Tools: OpenTelemetry (standard SDK), Jaeger/Tempo (backends), Grafana Tempo (cheap, pairs with Loki+Prometheus), AWS X-Ray, Datadog APM.
By default, all pods in a Kubernetes cluster can communicate with each other. Network Policies act as a firewall — restricting which pods can talk to which other pods and on which ports.
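A sketch that only lets pods labelled app: api reach the database pods on port 5432 (labels and port are assumptions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-api
spec:
  podSelector:
    matchLabels:
      app: db                # policy applies to database pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api       # only api pods may connect
      ports:
        - protocol: TCP
          port: 5432
```

Note that enforcing Network Policies requires a CNI plugin that supports them (e.g. Calico or Cilium).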
Chaos engineering deliberately injects failures into a system to discover weaknesses before they cause incidents. The goal is to build confidence that the system will withstand turbulent, real-world conditions.
- Principles: Start with a hypothesis (the system will recover from X). Run experiments. Measure impact. Fix weaknesses discovered.
- Types of experiments: Kill random pods, add network latency/packet loss, exhaust CPU/memory, kill nodes, trigger DNS failures, cut off database connections.
- SLOs & Error Budgets: Define SLOs (e.g. 99.9% availability). Track error budget consumption. When budget is at risk, freeze feature work, focus on reliability.
- Blameless postmortems: After incidents, analyse what happened systematically — not who to blame. Document timeline, root cause, contributing factors, action items. Share learnings broadly.
- On-call rotation: Developers who write code carry pagers. Shared ownership means better designed systems (you don't write fragile code if you're the one woken at 3am).
- Runbooks / Playbooks: Document step-by-step response procedures for known failure modes. Reduces MTTR when engineers are stressed at 2am.
- Feature flags: Decouple deployment from release. Deploy dark, enable per-user/percentage/cohort. Instant rollback without re-deploy.
- Disaster recovery drills: Regularly practice restoring from backup, failing over to secondary region, recovering from a database corruption scenario. DR that's never tested doesn't work.
- Capacity planning: Monitor resource trends, project growth, provision ahead of demand. Avoid reactive scaling that causes outages.
- Toil reduction: SRE principle — automate repetitive operational tasks. If a human does the same thing > twice, automate it. Track toil as a metric and reduce it.
🎉 The Complete Interview Hub is Live!
You've now covered all 11 topics — JavaScript, Git, Python, React, HTML & CSS, Node.js, SQL, TypeScript, System Design, DSA, and DevOps & Docker. Share RankWeb3 with anyone preparing for tech interviews.