System Design Interview
Questions & Answers
🌱 Fundamental Concepts (Q1–Q14)
System design is the process of defining the architecture, components, modules, interfaces, and data flow of a system to satisfy specified requirements. In interviews, it tests whether you can think holistically about building large-scale, reliable, and maintainable software.
- What interviewers assess: Ambiguity handling, requirement clarification, trade-off reasoning, component knowledge, and communication skills.
- Typical format: 45–60 minutes. Open-ended prompt like "Design Twitter" or "Design a URL shortener."
- Framework to follow: Clarify requirements → Estimate scale → High-level design → Deep dive components → Identify bottlenecks → Propose solutions.
| Feature | Vertical Scaling (Scale Up) | Horizontal Scaling (Scale Out) |
|---|---|---|
| Method | Add more CPU/RAM/SSD to one machine | Add more machines to the pool |
| Limit | Hardware ceiling — finite | Theoretically unlimited |
| Downtime | Often requires restart | No downtime (add nodes live) |
| Complexity | Simple — no code changes | Higher — needs load balancer, stateless design |
| Cost | Expensive at high end | Commodity servers — cost-effective |
| Fault tolerance | Single point of failure | Redundant — node failure is survivable |
| Best for | Databases (often), early stage | Web/app servers, stateless services |
A load balancer distributes incoming traffic across multiple servers to prevent any single server from becoming overloaded. It acts as a reverse proxy and is the entry point for all client requests.
- Round Robin: Requests distributed sequentially to each server. Simple, good for uniform workloads.
- Weighted Round Robin: More powerful servers get proportionally more traffic.
- Least Connections: Routes to server with fewest active connections. Good for varying request durations.
- IP Hash: Same client IP always goes to the same server. Useful for session stickiness.
- Random: Pick a server randomly. Simple and effective at scale.
- Resource Based: Route based on actual CPU/memory usage. Most accurate but needs health reporting.
L4 vs L7: Layer 4 balancers operate at the transport layer (TCP/UDP) — fast but unaware of content. Layer 7 balancers operate at the application layer — can route by URL, headers, cookies, enabling advanced rules.
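The selection logic behind two of these algorithms can be sketched in a few lines of Python. This is a minimal in-process illustration, not a real balancer; the class and server names are hypothetical:

```python
from itertools import count

class RoundRobinBalancer:
    """Cycle through servers in order, one request each."""
    def __init__(self, servers):
        self.servers = servers
        self._counter = count()

    def pick(self):
        return self.servers[next(self._counter) % len(self.servers)]

class LeastConnectionsBalancer:
    """Route to the server currently holding the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # the caller decrements when the request completes
        return server

rr = RoundRobinBalancer(["a", "b", "c"])
print([rr.pick() for _ in range(4)])  # ['a', 'b', 'c', 'a']
```

A real L4/L7 balancer also tracks health checks and removes dead servers from the pool before picking.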
Caching stores copies of expensive-to-compute data closer to the requester to reduce latency and database load. A cache hit serves data from memory in microseconds; a cache miss falls back to the slower data source.
| Strategy | How it works | Best for |
|---|---|---|
| Cache-Aside (Lazy) | App checks cache → miss → load from DB → populate cache | Read-heavy, infrequently updated data |
| Write-Through | Write to cache AND DB synchronously | Read-heavy, consistency important |
| Write-Behind (Write-Back) | Write to cache immediately, DB asynchronously | Write-heavy, tolerable data lag |
| Read-Through | Cache handles DB read on miss — app only talks to cache | Simplified app code |
| Refresh-Ahead | Proactively refresh cache before expiry | Predictable access patterns |
Cache eviction policies: LRU (Least Recently Used — most common), LFU (Least Frequently Used), FIFO, TTL (Time To Live — expire after N seconds).
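The cache-aside flow from the table can be sketched in Python. Plain dicts stand in for Redis and the database here; the key names and TTL are illustrative:

```python
import time

db = {"user:1": {"name": "Ada"}}   # stand-in for the database
cache = {}                         # stand-in for Redis: key -> (value, expires_at)
TTL = 60                           # seconds

def get_user(key):
    """Cache-aside: check cache, fall back to DB on miss, then populate cache."""
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]            # cache hit: served from memory
    value = db.get(key)            # cache miss: load from the slower data source
    if value is not None:
        cache[key] = (value, time.time() + TTL)
    return value
```

Note that on a popular key expiring, many requests can miss at once (a cache stampede); production systems add locking or probabilistic early refresh on top of this basic flow.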
A CDN is a geographically distributed network of servers (Points of Presence / PoPs) that cache and serve content from the location closest to the user — reducing latency and origin server load.
- Static assets: Images, JS, CSS, fonts — cache with long TTL (months).
- Dynamic content: Can be cached short-term (seconds) with cache-control headers.
- Benefits: Reduced latency (ms vs 100ms+ round trips), DDoS protection, reduced bandwidth cost, SSL offloading.
- Providers: Cloudflare, AWS CloudFront, Fastly, Akamai, Vercel Edge.
An index is a separate data structure (usually a B-tree or hash) that the database maintains to speed up data retrieval. Without an index, the database scans every row (full table scan — O(n)). With an index, it jumps directly to matching rows (O(log n) for B-tree).
- When to add indexes: Columns frequently used in WHERE, JOIN ON, ORDER BY, GROUP BY.
- Types: Single-column, composite (multi-column), unique, partial (filtered), full-text, spatial.
- Write overhead: Every INSERT/UPDATE/DELETE must update all affected indexes — too many indexes slow writes.
- Composite index column order matters: Index on (A, B) helps queries filtering on A or (A, B), but not B alone.
- Index selectivity: High-cardinality columns (email, ID) benefit most. Low-cardinality (boolean, gender) rarely worth indexing alone.
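The O(n) scan vs O(log n) index distinction can be demonstrated with a sorted list and binary search, which is roughly what a B-tree's leaf level gives you. The table and column names here are made up for illustration:

```python
import bisect

rows = [{"id": i, "email": f"user{i}@example.com"} for i in range(100_000)]

# "Index" on email: a sorted list of (email, row position) pairs.
email_index = sorted((r["email"], i) for i, r in enumerate(rows))
index_keys = [k for k, _ in email_index]

def find_by_email(email):
    """Index lookup: binary search, O(log n), instead of scanning every row, O(n)."""
    i = bisect.bisect_left(index_keys, email)
    if i < len(index_keys) and index_keys[i] == email:
        return rows[email_index[i][1]]
    return None
```

This also makes the write overhead tangible: every INSERT would have to splice a new entry into `email_index` to keep it sorted, which is exactly the cost the bullet above warns about.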
| Feature | SQL (Relational) | NoSQL |
|---|---|---|
| Schema | Fixed — must define upfront | Flexible — schema-less |
| Scaling | Vertical (+ read replicas) | Horizontal sharding natively |
| ACID | ✅ Full support | Varies (eventual → strong) |
| Joins | ✅ Built-in | ❌ Must embed or reference manually |
| Best for | Complex relationships, financial, reporting | Flexible schema, high write throughput, huge scale |
NoSQL database types and when to use each:
- Document (MongoDB, Firestore): Flexible JSON documents. Good for catalogs, user profiles, CMS.
- Key-Value (Redis, DynamoDB): Ultra-fast simple lookups. Sessions, caching, leaderboards.
- Column-family (Cassandra, HBase): Write-optimized, time-series. IoT, logs, analytics.
- Graph (Neo4j, Amazon Neptune): Relationship traversal. Social networks, recommendations, fraud detection.
- Search (Elasticsearch, OpenSearch): Full-text search with relevance scoring.
A message queue is a component that enables asynchronous communication between services by holding messages until a consumer processes them. The producer doesn't wait for the consumer to finish.
Problems it solves:
- Decoupling: Producer and consumer don't need to know about each other.
- Load leveling: Absorbs traffic spikes — queue fills up, workers process steadily.
- Reliability: If consumer crashes, messages persist in queue and can be retried.
- Async processing: User gets instant response; email sending / PDF generation happens in background.
- Fan-out: One message delivered to multiple consumers (e.g., new user → send email + provision account + log analytics).
Tools: RabbitMQ (AMQP, complex routing), Kafka (high-throughput streaming, replay), AWS SQS (managed, simple), Redis Streams, Google Pub/Sub.
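The decoupling and async-processing points can be shown with Python's stdlib `queue.Queue` standing in for a real broker like RabbitMQ or SQS. This is a single-process sketch; the message contents are hypothetical:

```python
import queue
import threading

jobs = queue.Queue()  # stand-in for the message broker
done = []

def worker():
    """Consumer: process messages until a None sentinel arrives."""
    while True:
        msg = jobs.get()
        if msg is None:
            break
        done.append(f"emailed {msg}")  # e.g. slow email sending happens here

t = threading.Thread(target=worker)
t.start()

# Producer: enqueue and return immediately -- it never waits for the worker.
for user in ["ada", "grace"]:
    jobs.put(user)
jobs.put(None)  # sentinel: tell the worker to stop
t.join()
print(done)  # ['emailed ada', 'emailed grace']
```

A real broker adds what an in-process queue cannot: persistence across crashes, acknowledgements, retries, and delivery to consumers on other machines.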
The CAP theorem states that a distributed system can guarantee at most two of three properties simultaneously. Since network partitions are unavoidable in real networks, the practical choice is between consistency and availability whenever a partition occurs:
- C — Consistency: Every read receives the most recent write or an error. All nodes see the same data at the same time.
- A — Availability: Every request to a non-failing node receives a response, though not necessarily the most recent data.
- P — Partition Tolerance: The system continues operating even when network partitions (node communication failures) occur.
| Choice | Trade-off | Real examples |
|---|---|---|
| CP | May reject requests to stay consistent | HBase, MongoDB (strong), Zookeeper |
| AP | May return stale data — eventual consistency | Cassandra, DynamoDB, CouchDB |
| CA | Cannot tolerate network partition — single node | Traditional RDBMS (single instance) |
| Feature | Monolith | Microservices |
|---|---|---|
| Deployment | Single deployable unit | Each service deploys independently |
| Scaling | Scale everything together | Scale only bottleneck services |
| Team ownership | Shared codebase — coordination needed | Each team owns their service |
| Technology | Single language/framework | Each service can use best tool for the job |
| Latency | In-process function calls (fast) | Network calls between services (slower) |
| Complexity | Simpler initially | Distributed systems complexity |
| Fault isolation | One bug can crash everything | Failures contained to one service |
| Best for | Small teams, early stage, simple domains | Large teams, complex domains, high scale |
An API Gateway is a single entry point for all client requests in a microservices architecture. It handles cross-cutting concerns so individual services don't have to implement them.
- Authentication & Authorization: Validate JWT/API keys before forwarding to services.
- Rate Limiting: Throttle clients to prevent abuse.
- Routing: Route /api/users to User Service, /api/orders to Order Service.
- SSL Termination: Decrypt HTTPS at the gateway; internal traffic can use HTTP.
- Request/Response Transformation: Translate protocols, aggregate multiple service calls.
- Caching: Cache responses at the edge.
- Logging & Monitoring: Centralized observability for all traffic.
Tools: AWS API Gateway, Kong, NGINX, Traefik, Envoy.
Replication copies data from one database (primary) to one or more databases (replicas) for high availability and read scalability.
| Type | How | Trade-off |
|---|---|---|
| Synchronous | Commit only after replica confirms write | Strong consistency, slower writes |
| Asynchronous | Commit immediately, replica catches up | Fast writes, possible replication lag (stale reads) |
| Semi-synchronous | Wait for at least one replica to confirm | Balance of both |
| Logical | Replicate row-level changes | Flexible, selective tables |
| Physical/Streaming | Copy raw WAL (write-ahead log) bytes | Fast, same DB version required |
| Feature | Forward Proxy | Reverse Proxy |
|---|---|---|
| Sits between | Client and internet | Internet and servers |
| Represents | The client (hides client identity) | The server (hides server identity) |
| Client knows about it? | Usually yes (configured) | Transparent to client |
| Use cases | VPNs, content filtering, anonymity | Load balancing, SSL termination, caching, security |
| Examples | Squid, corporate proxies | NGINX, Cloudflare, AWS ALB |
A reverse proxy is the workhorse of modern web architecture. NGINX acting as a reverse proxy can handle: SSL termination (HTTPS → HTTP internally), static file serving, caching, compression, rate limiting, and load balancing — all before the request reaches your app server.
Back-of-envelope estimations help size systems during design interviews. Memorise a few approximate reference numbers: main memory reference ~100 ns; SSD random read ~100 µs; HDD seek ~10 ms; round trip within a datacenter ~0.5 ms; cross-continental round trip ~150 ms. Also remember 1 day ≈ 86,400 seconds, so 1M requests/day ≈ 12 RPS and 100M requests/day ≈ 1,200 RPS.
⚡ Intermediate Topics (Q15–Q28)
This is a classic system design question. Walk through each layer systematically.
Requirements: Shorten URL → 7-char code. Redirect short → original. 100M URLs/day created, 10B redirects/day (100:1 read:write). Links expire optionally. Analytics.
- ID generation: Auto-increment ID → Base62 encode (a-zA-Z0-9) to get 7-char code. Or pre-generate 7-char tokens in batches.
- Database schema: short_code (PK), long_url, user_id, created_at, expires_at, click_count.
- Redirect flow: GET /{code} → check Redis → check MySQL → 301/302 redirect. Cache hot codes in Redis with 24h TTL.
- 301 vs 302: 301 (permanent) is cached by browsers — reduces future load but loses analytics. 302 (temporary) hits your server every time — better for analytics.
- Analytics: Log clicks to Kafka → stream processor → ClickHouse/BigQuery for analysis.
- Scale: 10B redirects/day ≈ 115,000 RPS — multiple app server replicas + Redis cluster handles this comfortably.
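The Base62 encoding step above is short enough to write out. A minimal sketch, assuming an auto-increment integer ID as input:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative auto-increment ID as a Base62 short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)   # peel off the least-significant base-62 digit
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))
```

Seven Base62 characters give 62⁷ ≈ 3.5 trillion distinct codes, which comfortably covers 100M new URLs/day for decades.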
Consistent hashing is a technique for distributing data across servers so that adding or removing a server requires remapping only a minimal fraction of keys — unlike simple modular hashing which remaps everything.
The problem with key % N: If you have 4 servers and add a 5th, key % 4 vs key % 5 gives different results for almost every key → massive cache invalidation, thundering herd.
Consistent hashing solution:
- Imagine a ring (hash space 0 to 2³²).
- Hash each server to a position on the ring.
- Hash each key to a position on the ring.
- Each key is served by the first server clockwise from its position.
- Adding/removing a server only affects keys between that server and its predecessor — typically 1/N of all keys.
- Virtual nodes (vnodes): Each physical server maps to multiple positions on the ring for better load distribution and smaller rebalancing impact.
Sharding (horizontal partitioning) splits a database table across multiple database servers. Each shard holds a subset of the data — enabling horizontal scaling beyond what a single machine can handle.
| Strategy | How | Pros | Cons |
|---|---|---|---|
| Range-based | Shard by value range (A-M, N-Z or dates) | Simple, range queries easy | Hotspots if data not uniform |
| Hash-based | shard = hash(key) % N | Even distribution | Range queries span all shards |
| Directory-based | Lookup table maps key → shard | Flexible, easy to rebalance | Lookup table is a bottleneck/SPOF |
| Geographic | Data lives in user's region | Low latency, data residency | Cross-region queries difficult |
Sharding challenges: Cross-shard JOINs are expensive. Transactions spanning shards need distributed transaction protocols (2PC). Resharding when adding nodes is complex. Choose the shard key carefully — changing it later is very painful.
Rate limiting controls how many requests a client (by IP, user ID, or API key) can make in a given time window — protecting APIs from abuse, DoS attacks, and fair usage enforcement.
| Algorithm | How it works | Best for |
|---|---|---|
| Token Bucket | Bucket holds N tokens, fills at rate R. Each request consumes 1 token. | Smooth average with bursting allowed |
| Leaky Bucket | Requests queue; processed at fixed rate. Overflow dropped. | Smooth outflow, no burst |
| Fixed Window Counter | Count requests per N-second window. Reset on tick. | Simple, but edge spikes possible |
| Sliding Window Log | Track timestamps of each request. Count those in last N seconds. | Accurate but memory-heavy |
| Sliding Window Counter | Hybrid — approximate sliding window with low memory | Best accuracy/memory balance |
At scale, rate limiting state must be stored in a shared layer (Redis) accessible by all API server instances — a single server's memory won't work when you have many replicas.
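The token bucket from the table is simple to implement. A single-process sketch; as the paragraph above notes, at scale the token count would live in shared Redis (commonly updated atomically via a Lua script) rather than in local memory:

```python
import time

class TokenBucket:
    """Bucket holds up to `capacity` tokens, refilled at `rate` tokens/second."""
    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1   # each request consumes one token
            return True
        return False
```

Because a full bucket can be drained at once, bursts up to `capacity` are allowed while the long-run average stays at `rate`, which is exactly the trade-off the table describes.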
A notification system needs to send messages across multiple channels (push, email, SMS) reliably, at scale, with minimal latency.
- User preferences: Store per-user channel + opt-out settings. Check before sending.
- Deduplication: Use idempotency keys to prevent duplicate sends on retry.
- Priority queues: Separate queues for real-time (OTP, alerts) vs batch (newsletters).
- Retry logic: Exponential backoff for failed sends. Dead letter queue for permanently failed.
- Template engine: Personalised content per user, localised per language.
- Delivery tracking: Webhooks from providers → update delivery status in DB → analytics.
| Feature | Push | Pull (Polling) |
|---|---|---|
| Who initiates | Server pushes to client | Client polls server |
| Latency | Near real-time | Up to poll interval |
| Server load | Must manage connections | Predictable (polling rate) |
| Client complexity | Must maintain connection | Simple HTTP requests |
| Missed data | Must handle reconnects | Client controls what it needs |
| Examples | WebSockets, SSE, push notifications | REST polling, long-polling, cron jobs |
Hybrid approaches:
- Long polling: Client makes request, server holds it open until data available (or timeout). Better than short polling but still HTTP-based.
- Server-Sent Events (SSE): One-directional push over HTTP. Server streams updates, client can't send back. Good for dashboards, live feeds.
- WebSockets: Full duplex — both sides can send at any time. Best for chat, gaming, collaborative tools. Higher infrastructure complexity.
Eventual consistency means that if no new updates are made, all replicas will eventually converge to the same value — but at any given moment, different nodes may return different data (stale reads).
- Read-your-writes: After a user writes, route their subsequent reads to the primary (or the same replica) to avoid seeing stale data. Common for profile updates.
- Monotonic reads: Ensure a user never sees older data than they've already seen — use sticky sessions or consistent hashing to route to same replica.
- CRDT (Conflict-free Replicated Data Types): Data structures that can be merged without conflicts — counters, sets, last-write-wins registers.
- Idempotent operations: Make operations safe to replay — critical when retrying after network failures.
- Sagas: For distributed transactions, break into steps with compensating actions for rollback instead of 2PC.
The circuit breaker pattern prevents cascading failures in distributed systems by stopping calls to a failing service and allowing it time to recover — like an electrical circuit breaker that trips to prevent damage.
- Fallback behaviour: When OPEN, return cached data, default value, or friendly error instead of hanging.
- Libraries: Netflix Hystrix (legacy), Resilience4j (Java), Polly (.NET), pybreaker (Python).
- Service mesh: Envoy, Istio implement circuit breaking transparently at infrastructure level.
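The CLOSED → OPEN → HALF_OPEN cycle can be sketched in a small class. This is a simplified illustration (hypothetical names, no thread safety), not a substitute for Resilience4j or a service mesh:

```python
import time

class CircuitBreaker:
    """CLOSED -> OPEN after `threshold` consecutive failures; trial call after `reset_after` seconds."""
    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means CLOSED

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()      # OPEN: fail fast instead of hanging on a dead service
            self.opened_at = None      # HALF_OPEN: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback()
        self.failures = 0              # success resets the failure count
        return result
```

The fallback is where the cached data or default value mentioned above gets returned.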
- Data structure: In-memory hash table (O(1) get/set). Keys are strings; values can be strings, lists, hashes, sets, sorted sets.
- Persistence: AOF (Append Only File) — log every write command for durability. RDB (point-in-time snapshots) — faster recovery but potential data loss. Both can be combined.
- Single-threaded event loop: Redis processes commands sequentially — no locking needed, very fast. Network I/O is multiplexed (epoll/kqueue).
- Replication: Primary-replica async replication. Replica streams AOF from primary.
- Cluster mode: Data sharded across 16,384 hash slots distributed among nodes. Each key is assigned to a slot via CRC16(key) % 16384.
- Eviction policies: LRU, LFU, random, TTL-based. Configure maxmemory-policy.
- Pub/Sub: Lightweight message broadcasting — publishers send to channels, subscribers receive.
In microservices, services spin up and down dynamically (containers, auto-scaling). Hard-coded IPs/ports break immediately. Service discovery lets services find each other dynamically by name.
| Pattern | How | Examples |
|---|---|---|
| Client-side discovery | Client queries registry, picks instance, calls directly | Eureka + Ribbon (Netflix OSS) |
| Server-side discovery | Client calls load balancer → LB queries registry | AWS ALB + ECS, Kubernetes Service |
| DNS-based | Service registered as DNS record; clients resolve normally | Consul DNS, Kubernetes DNS |
- Service registry tools: Consul, Zookeeper, etcd, AWS Cloud Map, Kubernetes built-in.
- Health checks: Registry periodically pings services; removes unhealthy instances from pool.
- Service mesh (Envoy/Istio): Handles discovery, load balancing, mTLS, circuit breaking at the sidecar proxy layer — no code changes needed.
A Saga is a sequence of local transactions where each step publishes an event that triggers the next step. If any step fails, compensating transactions are run to undo previous steps — instead of using a two-phase commit (2PC) which locks resources across services.
| Pattern | Choreography | Orchestration |
|---|---|---|
| Coordinator | None — services react to events | Central saga orchestrator |
| Coupling | Lower | Services coupled to orchestrator |
| Visibility | Hard to trace across services | Easy — orchestrator tracks state |
CQRS (Command Query Responsibility Segregation) separates write operations (commands) from read operations (queries) — using different models, and often different databases, for each.
Event Sourcing stores the history of all events that led to the current state instead of just the current state. The current state is derived by replaying events.
- Benefits: Full audit trail, time-travel queries, scale reads/writes independently, rebuild any view by replaying events.
- Challenges: Eventual consistency between write and read models, higher complexity, event schema evolution.
- Good for: Financial transactions, audit systems, collaborative editing, high-traffic e-commerce.
- Tools: EventStoreDB, Kafka (as event log), Axon Framework.
- Eliminate single points of failure (SPOF): Every component should have at least one backup. Load balancer pairs (active-passive or active-active). Multi-AZ database deployments.
- Redundancy: N+1 redundancy minimum. Critical systems N+2 or more. Active-active (both handle traffic) vs active-passive (standby takes over).
- Health checks & auto-healing: Load balancers and orchestrators (Kubernetes) automatically route away from unhealthy instances and restart failed containers.
- Geographic distribution: Multi-region deployment. DNS failover routes traffic to healthy region if one region goes down.
- Graceful degradation: System continues with reduced functionality rather than total failure. Feature flags to disable expensive features under load.
- Timeouts & retries: Never make calls without timeouts. Retry with exponential backoff + jitter. Distinguish retryable (5xx, network) from non-retryable (4xx) errors.
- Bulkhead pattern: Isolate resources (thread pools, connection pools) per dependency so one slow service can't exhaust all resources.
- Chaos engineering: Deliberately inject failures (kill random pods, add latency) to validate resilience. Netflix Chaos Monkey pioneered this.
Observability is the ability to understand the internal state of a system from its external outputs. The three pillars are Logs, Metrics, and Traces.
| Pillar | What it is | Tools | Answers |
|---|---|---|---|
| Logs | Timestamped records of discrete events | ELK Stack, Loki, CloudWatch | "What happened?" |
| Metrics | Numeric measurements over time (CPU, latency, error rate) | Prometheus, Grafana, Datadog | "How is the system performing?" |
| Traces | End-to-end request journey across services (spans) | Jaeger, Zipkin, AWS X-Ray, Tempo | "Where is the bottleneck?" |
- Golden signals (Google SRE): Latency, Traffic (requests/sec), Errors (rate), Saturation (resource utilisation). Monitor these for every service.
- SLI/SLO/SLA: Service Level Indicator (what you measure), Service Level Objective (target %, e.g. 99.9% uptime), Service Level Agreement (contract with penalty).
- Error budgets: If SLO is 99.9% uptime, you have 0.1% = ~43 min/month error budget. Once consumed, freeze new deployments until budget resets.
- Alerting: Alert on symptoms (user-visible impact), not causes. Avoid alert fatigue — too many alerts and engineers ignore them all.
🔥 Advanced Architecture (Q29–Q40)
Scale: 500M users, 500M tweets/day, 28B timeline reads/day (reads >> writes, 56:1 ratio). This is the canonical fan-out problem.
Two approaches for feed generation:
| Approach | Fan-out on Write (Push) | Fan-out on Read (Pull) |
|---|---|---|
| When happens | On tweet creation — push to all followers' feeds | On read — merge tweets from all followees |
| Read speed | ⚡ Instant — pre-computed feed in Redis | 🐌 Slow — N DB queries per read |
| Write cost | High — celebrity with 10M followers = 10M writes | Low — one write |
| Problem | Celebrity problem (huge write amplification) | Hot read problem (all followers read at once) |
Twitter's hybrid approach: Fan-out on write for regular users (push tweets to follower Redis feeds). Fan-out on read for celebrities (>10K followers) — inject their tweets at read time. This limits write amplification while keeping reads fast.
- Feed storage: Redis sorted set per user (score = timestamp). Only keep last ~800 tweets per user. Older tweets fetched from DB on scroll.
- Feed service: Merges pre-computed feed with celebrity tweets, applies ranking/algorithm, paginates with cursor.
- Ranking: Engagement signals (likes, retweets, replies, recency, relationship strength) scored per item.
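The hybrid fan-out can be sketched with plain dicts standing in for the Redis structures. The threshold, names, and data shapes here are illustrative only:

```python
FOLLOWER_CAP = 10_000  # above this, treat the author as a "celebrity"

followers = {"alice": ["bob", "carol"]}  # author -> followers
feeds = {}              # user -> [(timestamp, tweet)]; stand-in for per-user Redis sorted sets
celebrity_tweets = {}   # celebrity -> [(timestamp, tweet)]; merged at read time

def post(author, ts, text):
    """Fan-out on write for regular users; defer celebrities to read time."""
    if len(followers.get(author, [])) > FOLLOWER_CAP:
        celebrity_tweets.setdefault(author, []).append((ts, text))
        return  # no write amplification: one stored tweet, zero feed writes
    for f in followers.get(author, []):
        feeds.setdefault(f, []).append((ts, text))

def read_feed(user, following_celebs=()):
    """Merge the pre-computed feed with followed celebrities' tweets, newest first."""
    items = list(feeds.get(user, []))
    for c in following_celebs:
        items.extend(celebrity_tweets.get(c, []))
    return [text for ts, text in sorted(items, reverse=True)]
```

The ranking step described above would replace the simple timestamp sort in `read_feed`.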
Requirements: Store/retrieve files of any size, 1B users, sync across devices, share files, 10 PB storage.
- Chunking: Split large files into 4MB chunks. Upload chunks in parallel. Only re-upload changed chunks on edit (delta sync).
- Deduplication: Hash each chunk (SHA-256). If hash exists in store, reference existing chunk — save storage for common files.
- Metadata DB: Tracks files, directories, versions, sharing permissions, chunk references.
- Object storage: S3-compatible (AWS S3, GCS, Azure Blob) — durable, cheap, infinitely scalable.
- Sync protocol: Long polling or WebSocket to notify clients of changes. Conflict resolution: last-write-wins or create conflict copy.
- CDN for downloads: Serve frequently accessed files from CDN edge nodes.
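Chunking plus hash-based deduplication fits in a few lines. A dict stands in for object storage here, and the chunk size is shrunk for illustration (the design above uses 4 MB):

```python
import hashlib

chunk_store = {}  # sha256 hex digest -> bytes; stand-in for S3-style object storage

def upload(data: bytes, chunk_size: int) -> list[str]:
    """Split into fixed-size chunks, store each unique chunk once, return the manifest."""
    manifest = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # dedup: skip if this chunk already exists
        manifest.append(digest)
    return manifest

def download(manifest: list[str]) -> bytes:
    """Reassemble a file from its ordered list of chunk hashes."""
    return b"".join(chunk_store[d] for d in manifest)
```

Delta sync falls out of the same structure: editing one chunk changes one hash in the manifest, so only that chunk is re-uploaded.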
A multi-region distributed rate limiter must coordinate counters across geographically distributed nodes without introducing prohibitive latency or single points of failure.
- Centralized Redis: All API servers write to a single Redis cluster. Simple but the Redis cluster becomes a potential bottleneck and adds latency for distant regions.
- Local + global hybrid: Each region maintains a local counter (fast). Periodically sync to global count (async). Allow small over-limit bursts between sync cycles.
- Token bucket with sync: Each node holds a portion of the token budget. When depleted, request more from a central token service.
- CRDTs for counters: Use increment-only CRDT counters that can be merged from multiple nodes without coordination. Eventually consistent but accurate over time.
- Sliding window at the edge: Implement rate limiting at CDN/edge (Cloudflare Workers, Fastly Compute) — closest to users, before traffic hits your servers.
- Accept slight over-limiting: In a globally distributed system, perfect accuracy requires coordination that adds latency. For most APIs, being slightly over or under limit briefly is acceptable.
Real-time collaboration requires handling concurrent edits from multiple users simultaneously without losing changes or corrupting the document.
- Operational Transformation (OT): Google Docs' original approach. Each operation is transformed against concurrent operations to preserve intent. Complex to implement correctly.
- CRDT (Conflict-free Replicated Data Type): Newer approach (Figma, Linear). Data structures designed so concurrent edits automatically merge without conflict. More predictable.
- Architecture: WebSocket connection per user to a document server. Document server broadcasts operations to all connected users. Operations stored as event log.
- Cursors & presence: Broadcast cursor positions via WebSocket. Each user's cursor shown in real-time with their colour/name.
- Document sharding: Assign each document to a document server. Use consistent hashing so all users of the same document connect to the same server.
- Persistence: Operations appended to append-only log. Document state snapshotted periodically. On load, replay from latest snapshot + recent ops.
- Offline support: Queue operations locally, merge on reconnect using OT/CRDT.
A Bloom filter is a space-efficient probabilistic data structure that tests whether an element is in a set. It can have false positives (says "yes" when answer is "no") but never false negatives (never says "no" when answer is "yes").
- How it works: A bit array of m bits + k hash functions. To add element: set bits at k hash positions. To check: if all k positions are set → "probably in set". If any is 0 → "definitely not in set".
- Space advantage: 1 billion items stored with 1% false positive rate needs ~1.14 GB vs ~8 GB for a hash set.
Real-world use cases:
- URL shortener: Check if a short code is already taken before DB query — avoid expensive lookups for definitely-new keys.
- Username availability: Instantly check if a username might exist. Only query DB on "might exist" response.
- Spam filter: Check if an email domain is in a list of known spam domains.
- Database caching: Don't bother checking cache if you know data isn't there (cache miss avoidance). Used by RocksDB, Cassandra, HBase.
- Cryptocurrency: Bitcoin uses Bloom filters for lightweight node wallet filtering.
- Password breachlist: Check if a password appears in a breached password list without storing the full list.
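The bit-array mechanism described above is compact enough to implement directly. SHA-256 with a per-function salt stands in for the k independent hash functions, and `m` and `k` here are toy values rather than tuned ones:

```python
import hashlib

class BloomFilter:
    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = 0  # a Python int used as an m-bit array

    def _positions(self, item: str):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        # All k bits set -> "probably in set"; any bit clear -> "definitely not in set".
        return all(self.bits >> p & 1 for p in self._positions(item))
```

For real workloads, m and k are derived from the expected item count and target false-positive rate (roughly 9.6 bits per item for 1%).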
Generating unique IDs at scale across distributed servers without coordination is a classic system design problem. IDs should ideally be unique, sortable by time, and generated without a single coordinator.
| Approach | Structure | Pros | Cons |
|---|---|---|---|
| UUID v4 | 128-bit random | Simple, no coordination | Not sortable, large, random index fragmentation |
| Auto-increment DB | Sequential integer | Sortable, compact | Single point of failure, bottleneck |
| Twitter Snowflake | 41-bit timestamp + 10-bit machine ID + 12-bit sequence | Sortable, distributed, 4096/ms/machine | Clock skew issues, 69-year range |
| ULID | 48-bit time + 80-bit random | Sortable, URL-safe, 128-bit | Slightly larger than Snowflake |
| Ticket Server | Dedicated DB server serving ranges | Simple, sequential | SPOF if not replicated |
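The Snowflake layout from the table can be assembled with bit shifts. A simplified single-node sketch (Twitter's published epoch, no clock-rollback handling):

```python
import threading
import time

class SnowflakeGenerator:
    """64-bit ID: 41-bit ms timestamp | 10-bit machine ID | 12-bit per-ms sequence."""
    EPOCH = 1288834974657  # Twitter's custom epoch (2010-11-04), in milliseconds

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            ms = int(time.time() * 1000) - self.EPOCH
            if ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 4096 IDs per ms per machine
                if self.sequence == 0:                       # exhausted: spin to the next ms
                    while ms <= self.last_ms:
                        ms = int(time.time() * 1000) - self.EPOCH
            else:
                self.sequence = 0
            self.last_ms = ms
            return (ms << 22) | (self.machine_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, IDs sort by creation time, which keeps B-tree index inserts sequential (unlike UUID v4).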
Requirements: 5 suggestions per keystroke. Return in <100ms. Rank by popularity. Handle 10M queries/day.
- Trie data structure: Each node represents a character. At terminal nodes, store top-N suggestions ranked by frequency. Prefix traversal gives all suggestions starting with typed text.
- Frequency tracking: Log all search queries → stream to Kafka → aggregate weekly/daily → update suggestion scores.
- Caching strategy: Most popular prefixes (first 1-3 chars) cover most traffic. Cache these in Redis sorted sets (score = frequency). Remaining prefixes hit Trie DB.
- Offline aggregation: Batch job (MapReduce/Spark) computes top suggestions per prefix weekly. Incremental updates for trending terms.
- Filtering: Remove offensive/adult terms. Support personalisation (user's past searches ranked higher).
- Latency target: CDN serves static prefix cache → <10ms. API server with Redis → <50ms. Accept eventual consistency for suggestion counts.
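The core trick, storing pre-ranked top-N suggestions at every prefix node, can be sketched as follows (class names and the naive insert-time sorting are illustrative; the batch job above would precompute these lists):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.top = []  # pre-computed top suggestions for this prefix: (freq, query)

class AutocompleteTrie:
    """Top-N completions stored at each prefix node -> O(len(prefix)) lookups."""
    def __init__(self, n: int = 5):
        self.root = TrieNode()
        self.n = n

    def insert(self, query: str, freq: int):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((freq, query))
            node.top.sort(reverse=True)  # highest frequency first
            del node.top[self.n:]        # keep only the top N

    def suggest(self, prefix: str):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [q for f, q in node.top]
```

Lookup cost is independent of how many queries are stored, which is what makes the <100ms target achievable.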
- Upload: Direct-to-S3 upload via pre-signed URL (bypasses app servers). Resume with multipart upload for large files.
- Transcoding: Convert raw video to multiple resolutions (240p, 480p, 720p, 1080p, 4K) and formats (HLS, DASH) using FFmpeg workers. HLS splits video into ~10-second segments for adaptive streaming.
- Adaptive bitrate streaming (ABR): Client player switches quality dynamically based on bandwidth. Starts low, increases as connection stabilises.
- CDN delivery: Video segments cached at CDN PoPs globally. Popular videos served from edge — no origin hit.
- Metadata DB: PostgreSQL for video metadata (title, description, tags, counts). Elasticsearch for search indexing.
- Comments & likes: Cassandra for comments (high write throughput). Redis for like counts (approximate is fine).
- Recommendation: ML-based (watch history, engagement signals) — separate ML pipeline.
Leader election is the process of designating one node in a cluster as the coordinator (leader) — responsible for decision-making, while others are followers. When the leader fails, a new leader must be elected quickly.
- Raft consensus algorithm: Nodes can be leader, follower, or candidate. Candidates request votes from peers. First to get majority becomes leader. Leader sends heartbeats to prevent new elections. Used by etcd, CockroachDB, TiKV.
- Paxos: The theoretical foundation. More complex than Raft — harder to understand and implement correctly. Multi-Paxos used by Google Chubby, Spanner.
- Zookeeper (ZAB): Atomic broadcast protocol. Clients race to create an ephemeral node at /leader. First to create it wins. Others watch for deletion to trigger re-election.
- Bully algorithm: Node with the highest ID sends an election message to all higher-ID nodes. If none responds, it becomes leader. Simple but causes high message overhead.
Payment systems require the highest levels of consistency, reliability, and security. Money cannot be lost or double-charged.
- Idempotency keys: Each payment request carries a unique key. Duplicate requests return the same result without processing twice. Stored in DB with TTL.
- Ledger (double-entry bookkeeping): Every debit has a corresponding credit. Ledger is append-only — never update/delete records. Balance = sum of all transactions for an account.
- Exactly-once delivery: Use idempotency + outbox pattern. Write payment record + event in same DB transaction. Separate poller reads outbox and publishes events.
- Retry strategy: Network failures are common with payment providers. Retry with exponential backoff, checking idempotency to prevent double charge.
- PCI DSS compliance: Never store raw card numbers. Use tokenization (Stripe Elements handles card data before it touches your servers).
- Reconciliation: Daily batch job reconciles your ledger against payment processor statements to catch discrepancies.
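The idempotency-key check can be sketched in a few lines. A dict stands in for the DB table here, and the key/result shapes are hypothetical; a real implementation must make the key insert and the charge atomic (same DB transaction, or a SETNX-style reservation) to be safe under concurrency:

```python
payments_db = {}  # idempotency_key -> stored result; stand-in for a DB table
charges = []      # side effects that actually reached the payment provider

def charge(idempotency_key: str, amount_cents: int) -> dict:
    """Replay-safe charge: a duplicate key returns the stored result, no second charge."""
    if idempotency_key in payments_db:
        return payments_db[idempotency_key]   # retry detected: replay the first outcome
    charges.append(amount_cents)              # the real provider call would happen here
    result = {"status": "succeeded", "amount": amount_cents}
    payments_db[idempotency_key] = result
    return result
```

This is why retrying with exponential backoff (as recommended above) is safe: the retry carries the same key, so the customer is charged exactly once.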
Core challenge: Real-time location tracking of millions of drivers, fast matching of drivers to riders, ETA calculation, and price surging — all at city scale.
- Location storage: Driver location updates every ~5 seconds via mobile. Store in Redis with geospatial indexing (GEOADD, GEODIST, GEORADIUS). Redis holds recent locations; DB holds historical trips.
- Matching algorithm: When a rider requests → find nearest N available drivers using a geospatial query → rank by ETA, rating, vehicle type → offer to top candidate → if declined, offer to next. Time limit per offer (10–15s).
- Geospatial indexing: Divide city into S2 cells (Google S2 library). Map drivers to cells. Search radiating cells outward from pickup point.
- WebSocket for real-time: Persistent WebSocket connection per driver and rider. Location updates, trip status, notifications pushed via WebSocket.
- Trip state machine: REQUESTED → DRIVER_ACCEPTED → DRIVER_ARRIVED → TRIP_STARTED → COMPLETED → PAYMENT_PROCESSED.
- Surge pricing: Demand (ride requests per area) / supply (available drivers per area) ratio → multiplier applied. Updated every ~1 minute per geohash zone.
- ETA calculation: Routing engine (OSRM, Google Maps API) calculates drive time. Factor in real-time traffic data.
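The core of the matching step is a nearest-neighbour ranking. A naive pure-Python sketch using straight-line (haversine) distance; real systems use the S2/geohash cells and road-network ETAs described above, and the driver records here are hypothetical:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius

def nearest_drivers(pickup, drivers, n=3):
    """Rank available drivers by straight-line distance to the pickup point."""
    lat, lon = pickup
    return sorted(drivers, key=lambda d: haversine_km(lat, lon, d["lat"], d["lon"]))[:n]
```

In production, this candidate list would then be re-ranked by routing-engine ETA, since straight-line distance ignores one-way streets, rivers, and traffic.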
- Premature optimisation: Building for 1B users when you have 1K. Complexity is expensive to maintain. Build what you need now, design for horizontal scaling later.
- Distributed monolith: Microservices that are tightly coupled and must deploy together. Gets the worst of both worlds — network latency without independence.
- Chatty services: Service A makes 20 calls to Service B to complete one user request. High latency, fragile. Fix with aggregation, batching, or combining services.
- Shared database between services: Multiple services directly accessing the same DB schema. Breaks service independence — schema changes break all consumers. Each service should own its data store.
- Synchronous chains: Request must wait for A → B → C → D. One slow service blocks everything. Use async (queues) where response isn't immediately needed.
- Missing idempotency: Operations that aren't safe to retry. Crucial for payments, notifications, and any external API call.
- N+1 queries: Fetching a list then making one query per item. Use JOINs, eager loading, or batch APIs.
- No graceful degradation: System shows error page when a non-critical service is down. Feature flags and fallbacks maintain core functionality.
- Over-relying on a single cache layer: If Redis goes down and your app has no fallback to DB, everything breaks. Plan for cache failure.
- Not testing at scale: Load tests reveal bottlenecks invisible at low traffic. Use tools like k6, Gatling, or Locust before launch.