
Pillar 5 — High-Level Design (HLD)

System design end-to-end — from blank page to production-ready

HLD Methodology & Framework — Full Focus

Tip: Practice this framework on every system. The structured process matters as much as the answer itself.

Interview Framework

Requirement Clarification (Functional vs Non-Functional)

The first two minutes of every design interview. Separate what the system does (functional) from how well it does it (non-functional: latency, throughput, durability).

  • Ask clarifying questions before drawing anything — interviewers expect this
  • Functional: core use-cases, user personas, read vs write heavy
  • Non-functional: availability target, consistency model, expected scale
  • Write requirements on the whiteboard so both you and the interviewer can reference them

Capacity Estimation (QPS, Storage, Bandwidth)

Back-of-the-envelope math that proves your design can handle the expected load. Interviewers use this to gauge your system intuition.

  • Start from DAU, derive QPS (peak = 2-3x average)
  • Estimate storage per record, project for 3-5 years
  • Bandwidth = QPS x average response size
  • Know common numbers: 86,400 sec/day; 1M req/day ≈ 12 QPS
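
As a worked example, here is the arithmetic for a hypothetical service; every input (DAU, requests per user, record size) is an illustrative assumption, not a real workload:

```python
# Back-of-the-envelope sizing for a hypothetical service.
# Every input below (DAU, requests/user, record size) is an assumption.

dau = 10_000_000              # daily active users
requests_per_user = 10        # average requests per user per day
seconds_per_day = 86_400

avg_qps = dau * requests_per_user / seconds_per_day
peak_qps = avg_qps * 3        # rule of thumb: peak = 2-3x average

record_size_bytes = 500       # average stored record
writes_per_user = 2           # writes per user per day
daily_storage_bytes = dau * writes_per_user * record_size_bytes
five_year_storage_tb = daily_storage_bytes * 365 * 5 / 1e12

print(f"avg QPS ~{avg_qps:.0f}, peak QPS ~{peak_qps:.0f}")   # ~1157 avg, ~3472 peak
print(f"5-year storage ~{five_year_storage_tb:.0f} TB")
```

Round aggressively in the interview; the point is the order of magnitude, not the third digit.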

API Design First

Define the contract before drawing boxes. A clear REST/gRPC API grounds the design and prevents scope creep during the interview.

  • List 3-5 core endpoints with HTTP methods, params, and response shapes
  • Decide REST vs gRPC vs GraphQL based on use-case
  • Consider pagination, idempotency keys, and auth headers up front
  • API versioning strategy (path-based vs header-based)
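
For a URL shortener, the 3-5 core endpoints might be sketched like this (paths, field names, and the version prefix are illustrative conventions, not a fixed standard):

```
POST /v1/urls                       # create a short link
  body: { "long_url", "custom_alias"?, "ttl_days"? }
  201 → { "short_url", "expires_at" }
  Idempotency-Key header prevents duplicate creation on retry

GET /v1/urls/{code}                 # redirect
  301 → Location: <long_url>

GET /v1/urls/{code}/stats?cursor=   # paginated click analytics
  200 → { "clicks": [...], "next_cursor" }
```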

Data Model Design

Choose the right schema and storage engine before scaling discussions. The data model drives every downstream decision.

  • Identify core entities and relationships (1:1, 1:N, M:N)
  • SQL vs NoSQL decision based on query patterns, not hype
  • Denormalization tradeoffs for read-heavy workloads
  • Index strategy: what queries will hit this table?

High-Level Component Diagram

The core deliverable of the interview: a box-and-arrow diagram showing clients, load balancers, services, databases, caches, and queues.

  • Draw the happy path first, then add failure handling
  • Label every arrow with protocol (HTTP, gRPC, TCP) and data flow direction
  • Keep it to 5-8 major components — too many boxes signals unclear thinking
  • Narrate while drawing: explain why each component exists

Deep-Dive on the Bottleneck Component

After the high-level sketch, the interviewer will zoom in. Identify the bottleneck yourself before they ask — it shows senior-level thinking.

  • Common bottlenecks: database writes, fan-out, hotspot keys, network hops
  • Discuss scaling strategies: sharding, caching, async processing
  • Mention monitoring: how would you detect this bottleneck in production?
  • Propose a fallback or degradation strategy

SLI / SLO / SLA Definitions

Quantify reliability. SLIs are metrics (latency p99), SLOs are targets (99.9%), SLAs are contractual promises. Mentioning these elevates your answer.

  • SLI: the metric you measure (error rate, latency percentile, throughput)
  • SLO: internal target for an SLI (p99 latency < 200ms)
  • SLA: external contract with penalties if SLO is breached
  • Error budgets: how much downtime is acceptable before halting deployments
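
The error-budget arithmetic converts an availability SLO into allowed downtime; a minimal sketch (window length and SLO values are illustrative):

```python
# Convert an availability SLO into an error budget (allowed downtime).

def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Allowed downtime in minutes over the window for a given SLO."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - slo)

for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%} -> {error_budget_minutes(slo):.1f} min per 30 days")
# 99.9% allows 43.2 minutes of downtime per 30-day month
```

Quoting "three nines is about 43 minutes a month" from memory lands well in interviews.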

Core Systems to Practice — Full Focus

Tier 1 — Must Do

URL Shortener

The classic warm-up design problem. Interesting because it covers hashing, collision handling, read-heavy workloads, and analytics at scale.

  • Key components: hashing service, key-value store, redirect service, analytics pipeline
  • Base62 encoding vs MD5-truncation tradeoffs
  • Handle collisions with retry or counter-based approach
  • Read:write ratio is extremely high — perfect cache candidate
  • Custom alias support and TTL expiration
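
A minimal sketch of the counter-based approach: encode a monotonically increasing ID in Base62 (the alphabet ordering here is a convention, not a standard):

```python
# Base62 encoder/decoder for a counter-based URL shortener.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode(125))   # "21" — a 7-char code covers 62^7 ≈ 3.5 trillion IDs
```

Counter-based encoding sidesteps collisions entirely, at the cost of guessable sequential codes.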

Rate Limiter

Tests understanding of distributed state, sliding windows, and protecting upstream services. Appears frequently in SDE 2 interviews.

  • Key components: rule engine, counter store (Redis), middleware/gateway integration
  • Algorithms: fixed window, sliding window log, sliding window counter, token bucket, leaky bucket
  • Distributed rate limiting with Redis INCR + TTL atomicity
  • Race conditions in multi-node deployments
  • Return 429 with Retry-After header; degrade gracefully
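
A single-node token bucket sketch; a distributed version would keep this state in a shared store such as Redis and update it atomically (e.g., via a Lua script). Capacity and refill rate are illustrative:

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # max burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False    # caller should return 429 with Retry-After

bucket = TokenBucket(capacity=5, refill_rate=1.0)
results = [bucket.allow() for _ in range(7)]
print(results)   # first 5 allowed (burst), next 2 rejected (bucket drained)
```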

Notification System

Interesting because it combines multiple delivery channels (push, SMS, email), priority queues, and exactly-once delivery challenges.

  • Key components: notification service, priority queue, channel adapters, template engine, preference store
  • Fan-out to millions of devices with rate limiting per channel
  • Idempotency to prevent duplicate notifications
  • User preference and opt-out management
  • Retry with exponential backoff per delivery channel
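
The retry schedule above can be sketched as exponential backoff with full jitter; the function name and defaults are hypothetical, and it returns the delay schedule rather than sleeping:

```python
import random

def backoff_schedule(attempts: int, base: float = 1.0, cap: float = 60.0,
                     jitter: bool = True) -> list[float]:
    """Delay (seconds) before each retry attempt."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))   # exponential growth, capped
        if jitter:
            delay = random.uniform(0, delay)      # "full jitter" variant
        delays.append(delay)
    return delays

print(backoff_schedule(5, jitter=False))   # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Jitter matters: without it, a burst of failures retries in lockstep and hammers the channel again.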

Chat System (WhatsApp-like)

Tests real-time communication, presence detection, and message ordering. Interesting for its WebSocket management and offline message sync challenges.

  • Key components: WebSocket gateway, chat service, presence service, message store, push notification fallback
  • Connection management: WebSocket with heartbeat, reconnection logic
  • Message ordering with sequence IDs per conversation
  • Offline message queue and read receipts
  • Group chat fan-out strategy and media storage

News Feed / Timeline

The classic fan-out problem. Interesting because it forces a decision between push (fan-out on write) and pull (fan-out on read) models with hybrid approaches.

  • Key components: feed service, fan-out service, post store, social graph, feed cache
  • Push model for normal users, pull model for celebrity accounts (hybrid)
  • Feed ranking: chronological vs ML-based relevance scoring
  • Cache invalidation when posts are deleted or edited
  • Pagination with cursor-based approach for infinite scroll

Search Autocomplete

Fascinating for its latency requirements (<100ms), trie data structures, and the interplay between real-time popularity updates and pre-computed suggestions.

  • Key components: trie service, query aggregation pipeline, ranking service, suggestion cache
  • Trie with frequency counters at each node, top-K heap
  • Offline pipeline updates trie periodically from query logs
  • Prefix matching with character-by-character API calls (debounce on client)
  • Sharding trie by prefix range for horizontal scaling
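
A toy version of the trie-plus-top-K idea; real systems precompute and cache the top-K list at each node instead of walking the subtree per query (class names and query data are illustrative):

```python
import heapq

class TrieNode:
    def __init__(self):
        self.children = {}
        self.freq = 0            # > 0 only for complete queries

class Autocomplete:
    def __init__(self):
        self.root = TrieNode()

    def add(self, query: str, freq: int):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.freq += freq

    def suggest(self, prefix: str, k: int = 3):
        node = self.root
        for ch in prefix:                      # walk to the prefix node
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        stack = [(node, prefix)]
        while stack:                           # collect completions under prefix
            cur, word = stack.pop()
            if cur.freq:
                results.append((cur.freq, word))
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        return [w for _, w in heapq.nlargest(k, results)]

ac = Autocomplete()
for q, f in [("system", 50), ("system design", 90), ("sys call", 20)]:
    ac.add(q, f)
print(ac.suggest("sys"))   # ['system design', 'system', 'sys call']
```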

Distributed Cache

A foundational building block that appears in almost every design. Interesting for consistent hashing, eviction policies, and cache coherence in multi-node setups.

  • Key components: cache nodes, consistent hash ring, client library, eviction engine
  • Eviction policies: LRU, LFU, TTL-based — when to use each
  • Consistent hashing with virtual nodes for balanced distribution
  • Cache-aside, write-through, write-behind patterns
  • Thundering herd / cache stampede mitigation (locking, probabilistic early refresh)
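
A minimal consistent hash ring with virtual nodes (node names and the vnode count are illustrative; MD5 is used only for its distribution, not for security):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.ring = []                 # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):    # virtual nodes smooth the distribution
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        # first ring position clockwise from the key's hash (wraps around)
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:42"))   # deterministic: same key, same node
```

Removing one node remaps only the keys that landed on its vnodes, which is the whole point versus `hash(key) % N`.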

Job Scheduler

Tests distributed coordination, exactly-once execution, and fault tolerance. Interesting because of priority handling, retries, and DAG-based dependency management.

  • Key components: scheduler service, job queue, worker pool, state store, dead-letter queue
  • Cron-based vs event-driven vs DAG-based scheduling
  • Exactly-once execution with distributed locking (Redlock or DB-based)
  • Worker heartbeat and task reassignment on failure
  • Priority queues and resource-based scheduling

Tier 2 — Important

Payment System

Interesting for its strict consistency requirements, idempotency, and the double-entry bookkeeping model. Mistakes here cost real money.

  • Key components: payment gateway, ledger service, reconciliation engine, fraud detector, PSP integration
  • Idempotency keys to prevent duplicate charges
  • Double-entry ledger for every transaction (debit + credit)
  • Saga pattern for distributed transactions across services
  • PCI-DSS compliance: tokenize card data, never store raw PAN
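
The double-entry and idempotency ideas combine naturally: every transfer writes a balanced debit/credit pair, and a replayed transaction ID is a no-op. A sketch with illustrative account names:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    txn_id: str
    account: str
    amount_cents: int    # positive = credit, negative = debit

ledger: list[Entry] = []
seen_txns: set[str] = set()

def transfer(txn_id: str, src: str, dst: str, amount_cents: int):
    if txn_id in seen_txns:     # idempotency: duplicate charges become no-ops
        return
    seen_txns.add(txn_id)
    ledger.append(Entry(txn_id, src, -amount_cents))   # debit source
    ledger.append(Entry(txn_id, dst, +amount_cents))   # credit destination

transfer("txn-1", "buyer:alice", "merchant:bob", 2500)
transfer("txn-1", "buyer:alice", "merchant:bob", 2500)   # retry: ignored
print(sum(e.amount_cents for e in ledger))   # 0 — the ledger always balances
print(len(ledger))                           # 2 — duplicate was dropped
```

Using integer cents rather than floats avoids rounding drift, which matters when mistakes cost real money.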

Ride-Sharing Location Service

Interesting for geospatial indexing, real-time location streaming, and the matching algorithm between riders and drivers within a radius.

  • Key components: location ingestion service, geospatial index, matching engine, trip service, ETA calculator
  • Geohash or QuadTree for spatial queries ("drivers within 3km")
  • High-frequency location updates (every 3-5 seconds) via UDP or WebSocket
  • Supply-demand matching with surge pricing signals
  • Eventual consistency acceptable for location; strong consistency for trip state

Video Upload & Streaming

Interesting for its large-blob handling, async transcoding pipeline, adaptive bitrate streaming, and CDN edge-caching strategy.

  • Key components: upload service, transcoding pipeline, video store (S3), CDN, metadata service
  • Chunked/resumable uploads for large files (tus protocol)
  • Transcoding to multiple resolutions and codecs in parallel (DAG pipeline)
  • Adaptive bitrate streaming (HLS/DASH) based on client bandwidth
  • Pre-signed URLs for upload; CDN origin-pull for playback

E-Commerce Order System

Interesting for its inventory consistency challenges, distributed transaction coordination, and the complex state machine of order lifecycle.

  • Key components: order service, inventory service, payment service, fulfillment service, event bus
  • Order state machine: created → payment pending → confirmed → shipped → delivered
  • Inventory reservation with TTL to prevent overselling
  • Saga or 2PC for cross-service consistency (order + payment + inventory)
  • Event sourcing for full audit trail of order changes
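
The order lifecycle can be made explicit as a transition table, so illegal jumps are rejected instead of silently applied. A sketch mirroring the states above; the cancellation edges are added for illustration:

```python
# Allowed transitions per state; anything not listed is rejected.
TRANSITIONS = {
    "created":         {"payment_pending", "cancelled"},
    "payment_pending": {"confirmed", "cancelled"},
    "confirmed":       {"shipped", "cancelled"},
    "shipped":         {"delivered"},
    "delivered":       set(),
    "cancelled":       set(),
}

def advance(state: str, new_state: str) -> str:
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

state = "created"
for nxt in ("payment_pending", "confirmed", "shipped", "delivered"):
    state = advance(state, nxt)
print(state)   # delivered
```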

Distributed ID Generator

Interesting for its strict uniqueness guarantee, ordering requirements, and the tradeoffs between coordination-based and coordination-free approaches.

  • Key components: ID service nodes, clock sync mechanism, Zookeeper/etcd for node assignment
  • Snowflake approach: timestamp + node ID + sequence number (64-bit)
  • UUID v4 (random) vs UUID v7 (time-ordered) tradeoffs
  • Database auto-increment with stride (node1: 1,3,5; node2: 2,4,6)
  • Clock skew handling and monotonicity guarantees
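
A sketch of the Snowflake bit layout: 41-bit millisecond timestamp, 10-bit node ID, 12-bit per-millisecond sequence. The custom epoch and node ID are illustrative, and a backward clock jump is not handled here:

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000    # 2020-01-01 as a custom epoch

class SnowflakeGenerator:
    def __init__(self, node_id: int):
        assert 0 <= node_id < 1024      # must fit in 10 bits
        self.node_id = node_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF    # 12-bit sequence
                if self.seq == 0:                    # 4096 IDs this ms: spin
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now
            return (now << 22) | (self.node_id << 12) | self.seq

gen = SnowflakeGenerator(node_id=7)
ids = [gen.next_id() for _ in range(1000)]
print(len(set(ids)) == len(ids), ids == sorted(ids))   # unique and increasing
```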

Logging & Metrics Pipeline

Interesting for its extreme write throughput, the tension between real-time alerting and batch analysis, and storage tiering strategies.

  • Key components: agents (Fluentd), message bus (Kafka), stream processor, time-series DB, alerting engine
  • Push vs pull collection model (Prometheus pull vs StatsD push)
  • Log levels, structured logging (JSON), and correlation IDs
  • Hot/warm/cold storage tiering for cost optimization
  • Sampling strategies at high volume to control cost

Tier 3 — Good to Have

Google Docs Collaborative Editing

Interesting for its real-time conflict resolution, OT/CRDT algorithms, and the challenge of maintaining document consistency across concurrent editors.

  • Key components: WebSocket gateway, operation transform engine, document store, presence service, version history
  • OT (Operational Transformation) vs CRDT tradeoffs
  • Cursor position synchronization across clients
  • Undo/redo in a multi-user context
  • Offline editing with conflict merge on reconnect

Hotel/Flight Booking

Interesting for its double-booking prevention, distributed lock management, and complex pricing that changes in real-time based on availability.

  • Key components: search service, inventory service, booking service, pricing engine, payment integration
  • Optimistic locking for seat/room reservation (version-based)
  • Temporary hold with TTL before payment confirmation
  • Search with heavy caching and async price refresh
  • Overbooking strategy and waitlist management

Ad Click Aggregation

Interesting for its extreme write throughput, exactly-once counting guarantees, and the need for both real-time dashboards and batch reconciliation.

  • Key components: click ingestion, stream processor (Flink/Kafka Streams), aggregation store, reconciliation pipeline
  • Exactly-once semantics with Kafka transactions or deduplication
  • Windowed aggregation: tumbling, sliding, session windows
  • Lambda architecture: real-time stream + batch correction layer
  • Click fraud detection and filtering at ingestion

Distributed File Storage (S3-like)

Interesting for its erasure coding for durability, metadata management at scale, and the separation between data plane and control plane.

  • Key components: API gateway, metadata service, data nodes, placement service, garbage collector
  • Data plane (actual bytes) vs control plane (metadata, routing) separation
  • Erasure coding (Reed-Solomon) vs simple replication for durability
  • Consistent hashing for object placement across data nodes
  • Multipart upload for large objects with parallel chunk transfers

Architecture Patterns — Light

Arch Patterns

Microservices vs Monolith Tradeoffs

Know when to split and when to stay monolithic. The answer is almost always "start monolith, split when you have clear domain boundaries."

  • Monolith: simpler deployment, easier debugging, lower latency (in-process calls)
  • Microservices: independent scaling, team autonomy, technology heterogeneity
  • Hidden costs: network latency, distributed tracing, data consistency
  • Modular monolith as a pragmatic middle ground

Event-Driven Architecture

Decouple producers from consumers using events. Essential for scaling write-heavy systems and enabling asynchronous workflows.

  • Event notification vs event-carried state transfer vs event sourcing
  • Message brokers: Kafka (log-based) vs RabbitMQ (queue-based)
  • Eventual consistency is the default; plan for it
  • Dead-letter queues for poison messages

CQRS & Event Sourcing

Separate read and write models for different scaling needs. Event sourcing stores every state change as an immutable event.

  • CQRS: write to normalized store, read from denormalized projections
  • Event sourcing: append-only event log is the source of truth
  • Replay events to rebuild state or create new projections
  • Increased complexity — only use when read/write patterns diverge significantly

Sidecar & Service Mesh

Offload cross-cutting concerns (TLS, retries, observability) to a sidecar proxy instead of embedding them in every service.

  • Sidecar pattern: co-located proxy handles networking (Envoy)
  • Service mesh (Istio, Linkerd): fleet of sidecars with centralized control plane
  • Benefits: mTLS everywhere, traffic shifting, circuit breaking without code changes
  • Cost: increased latency per hop, memory overhead per pod

API Gateway Patterns

Single entry point that handles routing, authentication, rate limiting, and request transformation for all backend services.

  • Request routing, composition, and protocol translation
  • Cross-cutting: auth, rate limiting, logging, CORS
  • Avoid putting business logic in the gateway — keep it thin
  • AWS API Gateway, Kong, or custom (Spring Cloud Gateway)

BFF (Backend for Frontend)

A dedicated backend per frontend type (mobile, web, internal). Prevents a generic API from becoming a bloated "one-size-fits-none" contract.

  • Mobile BFF returns minimal payloads; web BFF returns richer data
  • Each BFF calls shared microservices underneath
  • Reduces over-fetching and under-fetching per client type
  • Tradeoff: more services to maintain; mitigate with shared libraries

Strangler Fig Migration

Incrementally replace a legacy monolith by routing new traffic to new services while the old system still handles existing routes.

  • Route by URL path, feature flag, or percentage-based traffic split
  • Facade layer intercepts requests and decides old vs new backend
  • Gradually migrate one module at a time; never big-bang rewrite
  • Roll back easily by rerouting traffic to the old system

Data Patterns

Fan-Out on Write vs Read

The fundamental tradeoff in feed systems: precompute feeds at write time (fast reads) or assemble at read time (fast writes).

  • Fan-out on write: push post to all follower feeds at publish time
  • Fan-out on read: pull from followed users at read time
  • Hybrid: push for normal users, pull for celebrities (high follower count)
  • Trade storage and write amplification for read latency

Hot Partition Problem

When one shard receives disproportionate traffic (celebrity post, viral event). A make-or-break topic for scaling discussions.

  • Causes: skewed partition key, viral content, time-based keys
  • Solutions: salting keys, splitting hot partitions, dedicated shard
  • Monitoring: track per-partition QPS to detect hotspots early
  • DynamoDB adaptive capacity and Kafka partition reassignment as examples
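
Key salting fits in a few lines: writes for one hot logical key scatter across N sub-partitions, and reads fan out over all salts and merge. The salt count is illustrative:

```python
import random

NUM_SALTS = 8

def salted_key(key: str) -> str:
    """Write path: pick a random sub-partition for this write."""
    return f"{key}#{random.randrange(NUM_SALTS)}"

def all_salted_keys(key: str) -> list[str]:
    """Read path: fan out over every sub-partition and merge results."""
    return [f"{key}#{i}" for i in range(NUM_SALTS)]

print(all_salted_keys("celebrity:post:9001"))
# writes spread over 8 partitions; reads query all 8 and combine
```

The tradeoff is explicit: write hotspots shrink by roughly N, while every read pays N lookups.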

Write Amplification

When a single logical write triggers many physical writes (LSM compaction, fan-out, denormalized copies). Understand the cost of your data model.

  • LSM-tree compaction in Cassandra/RocksDB is a primary source
  • Denormalized writes: updating one entity across multiple tables/caches
  • Fan-out on write multiplies writes by follower count
  • Measure write amplification factor to predict disk/IO costs

Read-Your-Writes Consistency

After a user writes data, they should always see their own update, even if other users see eventual consistency.

  • Route reads to the primary/leader for the writing user
  • Use session stickiness or read-after-write tokens
  • Client-side: optimistic UI update before server confirmation
  • Stronger than eventual, weaker than strong consistency — a practical middle ground

Infrastructure & Cloud (AWS) — Light

AWS Services

Lambda Cold Start Problem

The initialization delay when a Lambda function runs for the first time or after being idle. Critical to mention when proposing serverless in HLD.

  • Cold start = download code + init runtime + init dependencies
  • Typical: 100ms-2s for Java, 50-200ms for Python/Node
  • Mitigations: provisioned concurrency, keep-warm pings, smaller package size
  • SnapStart (Java) pre-initializes and snapshots the execution environment

SQS vs SNS vs EventBridge

Know which AWS messaging service to pick. SQS = point-to-point queue, SNS = pub-sub fan-out, EventBridge = event routing with rules.

  • SQS: buffering, decoupling, exactly-once (FIFO), at-least-once (standard)
  • SNS: fan-out to multiple SQS queues, Lambda, HTTP endpoints
  • EventBridge: content-based routing with rules, schema registry, cross-account
  • Common pattern: SNS → SQS fan-out for reliable multi-consumer delivery

S3 Consistency Model

Since December 2020, S3 provides strong read-after-write consistency for all operations. Know this — interviewers may test outdated knowledge.

  • Strong consistency for PUTs and DELETEs (no more eventual consistency)
  • List operations are also strongly consistent
  • Storage classes: Standard, IA, Glacier for cost tiering
  • S3 Select for server-side filtering before download

RDS vs Aurora

Aurora is AWS's cloud-native relational DB. Know the architectural differences and when Aurora's higher cost is justified.

  • Aurora: shared storage layer (up to 128TB auto-scaling), 6-way replication
  • 5x throughput improvement over standard MySQL on RDS
  • Aurora Serverless v2 for variable workloads (scales to zero)
  • RDS: simpler, cheaper for predictable workloads; supports more engines

ElastiCache Setup

Managed Redis or Memcached for caching. Know when to use each and how to configure replication and failover.

  • Redis: rich data structures, persistence, pub/sub, Lua scripting
  • Memcached: simpler, multi-threaded, no persistence — pure cache
  • Cluster mode: hash slots across shards for horizontal scaling
  • Multi-AZ with automatic failover for high availability

API Gateway Throttling

AWS API Gateway's built-in throttling at account, stage, and method levels. A managed rate limiter for your APIs.

  • Token bucket algorithm at the account level (10,000 RPS default)
  • Usage plans with API keys for per-client throttling
  • Burst vs sustained rate limits
  • 429 TooManyRequests response with Retry-After header

CloudFront CDN Patterns

AWS's CDN for caching static and dynamic content at edge locations. Essential for reducing latency in global-scale systems.

  • Origin pull: edge fetches from origin on cache miss, caches response
  • Cache behaviors: different TTLs for /api/* vs /static/*
  • Lambda@Edge for request/response manipulation at the edge
  • Origin shield to reduce load on your origin server

Containers & K8s

Docker Layer Caching

Each Dockerfile instruction creates a layer. Proper ordering dramatically reduces build times in CI/CD pipelines.

  • Order instructions from least-changing to most-changing
  • COPY package.json first, then npm install, then COPY source code
  • Multi-stage builds to reduce final image size
  • Use .dockerignore to exclude node_modules, .git, etc.
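
A sketch of the ordering rule for a hypothetical Node.js service (base image, filenames, and commands are illustrative): dependency layers change rarely, so they stay cached across source-only edits.

```dockerfile
# Layers ordered least-changing to most-changing.
FROM node:20-slim AS build
WORKDIR /app

# Dependency manifests first: this layer stays cached until package.json changes
COPY package.json package-lock.json ./
RUN npm ci

# Source last: editing code invalidates only the layers from here down
COPY . .
RUN npm run build

# Multi-stage: the final image carries only the build output
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
```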

K8s Pod Scheduling Basics

How Kubernetes decides which node to place a pod on. Know the basics for discussing deployment strategies in HLD.

  • Scheduler evaluates resource requests, node affinity, taints/tolerations
  • Resource requests (guaranteed) vs limits (max allowed)
  • Node affinity: soft (preferred) vs hard (required) placement rules
  • Pod anti-affinity to spread replicas across failure domains

HPA vs VPA

Horizontal Pod Autoscaler adds more pods; Vertical Pod Autoscaler resizes existing pods. Know when each applies.

  • HPA: scales pod count based on CPU, memory, or custom metrics
  • VPA: adjusts resource requests/limits per pod (requires restart)
  • HPA preferred for stateless services; VPA for stateful or single-replica workloads
  • Do not use HPA and VPA on the same metric (CPU) simultaneously

Health Probes (Liveness vs Readiness)

Kubernetes uses probes to know when to restart a container (liveness) and when to route traffic to it (readiness).

  • Liveness: is the process alive? Restart if failing (e.g., deadlock detection)
  • Readiness: can this pod handle traffic? Remove from service if failing
  • Startup probe: gives slow-starting apps time before liveness kicks in
  • HTTP, TCP, or exec-based probe types

Service Mesh (Istio Concept)

A dedicated infrastructure layer for managing service-to-service communication with observability, security, and traffic management.

  • Data plane: Envoy sidecar proxies deployed alongside every pod
  • Control plane (istiod): configuration, certificate management, service discovery
  • Features: mTLS, traffic splitting, circuit breaking, distributed tracing
  • Overhead: added latency per hop (commonly cited as a few ms) and ~50MB memory per sidecar