
Pillar 5 — High-Level Design (HLD)

System design end-to-end — from blank page to production-ready

HLD Methodology & Framework — Full Focus

Tip: Practice this framework on every system. The structured process matters as much as the answer itself.

Interview Framework

Requirement Clarification (Functional vs Non-Functional)

The first two minutes of every design interview. Separate what the system does (functional) from how well it does it (non-functional: latency, throughput, durability).

  • Ask clarifying questions before drawing anything — interviewers expect this
  • Functional: core use-cases, user personas, read vs write heavy
  • Non-functional: availability target, consistency model, expected scale
  • Write requirements on the whiteboard so both you and the interviewer can reference them

Capacity Estimation (QPS, Storage, Bandwidth)

Back-of-the-envelope math that proves your design can handle the expected load. Interviewers use this to gauge your system intuition.

  • Start from DAU, derive QPS (peak = 2-3x average)
  • Estimate storage per record, project for 3-5 years
  • Bandwidth = QPS x average response size
  • Know common numbers: 86,400 sec/day; 1M req/day ≈ 12 QPS
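
As a worked example, here is the arithmetic for a hypothetical service; every input (DAU, requests per user, record size) is an illustrative assumption, not a real workload:

```python
# Back-of-the-envelope sizing for a hypothetical service.
# Every input below (DAU, requests/user, record size) is an assumption.

dau = 10_000_000              # daily active users
requests_per_user = 10        # average requests per user per day
seconds_per_day = 86_400

avg_qps = dau * requests_per_user / seconds_per_day
peak_qps = avg_qps * 3        # rule of thumb: peak = 2-3x average

record_size_bytes = 500       # average stored record
writes_per_user = 2           # writes per user per day
daily_storage_bytes = dau * writes_per_user * record_size_bytes
five_year_storage_tb = daily_storage_bytes * 365 * 5 / 1e12

print(f"avg QPS ~{avg_qps:.0f}, peak QPS ~{peak_qps:.0f}")   # ~1157 avg, ~3472 peak
print(f"5-year storage ~{five_year_storage_tb:.0f} TB")
```

Round aggressively in the interview; the point is the order of magnitude, not the third digit.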

API Design First

Define the contract before drawing boxes. A clear REST/gRPC API grounds the design and prevents scope creep during the interview.

  • List 3-5 core endpoints with HTTP methods, params, and response shapes
  • Decide REST vs gRPC vs GraphQL based on use-case
  • Consider pagination, idempotency keys, and auth headers up front
  • API versioning strategy (path-based vs header-based)
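
For a URL shortener, the 3-5 core endpoints might be sketched like this (paths, field names, and the version prefix are illustrative conventions, not a fixed standard):

```
POST /v1/urls                       # create a short link
  body: { "long_url", "custom_alias"?, "ttl_days"? }
  201 → { "short_url", "expires_at" }
  Idempotency-Key header prevents duplicate creation on retry

GET /v1/urls/{code}                 # redirect
  301 → Location: <long_url>

GET /v1/urls/{code}/stats?cursor=   # paginated click analytics
  200 → { "clicks": [...], "next_cursor" }
```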

Data Model Design

Choose the right schema and storage engine before scaling discussions. The data model drives every downstream decision.

  • Identify core entities and relationships (1:1, 1:N, M:N)
  • SQL vs NoSQL decision based on query patterns, not hype
  • Denormalization tradeoffs for read-heavy workloads
  • Index strategy: what queries will hit this table?

High-Level Component Diagram

The core deliverable of the interview: a box-and-arrow diagram showing clients, load balancers, services, databases, caches, and queues.

  • Draw the happy path first, then add failure handling
  • Label every arrow with protocol (HTTP, gRPC, TCP) and data flow direction
  • Keep it to 5-8 major components — too many boxes signals unclear thinking
  • Narrate while drawing: explain why each component exists

Deep-Dive on the Bottleneck Component

After the high-level sketch, the interviewer will zoom in. Identify the bottleneck yourself before they ask — it shows senior-level thinking.

  • Common bottlenecks: database writes, fan-out, hotspot keys, network hops
  • Discuss scaling strategies: sharding, caching, async processing
  • Mention monitoring: how would you detect this bottleneck in production?
  • Propose a fallback or degradation strategy

SLI / SLO / SLA Definitions

Quantify reliability. SLIs are metrics (latency p99), SLOs are targets (99.9%), SLAs are contractual promises. Mentioning these elevates your answer.

  • SLI: the metric you measure (error rate, latency percentile, throughput)
  • SLO: internal target for an SLI (p99 latency < 200ms)
  • SLA: external contract with penalties if SLO is breached
  • Error budgets: how much downtime is acceptable before halting deployments
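
The error-budget arithmetic converts an availability SLO into allowed downtime; a minimal sketch (window length and SLO values are illustrative):

```python
# Convert an availability SLO into an error budget (allowed downtime).

def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Allowed downtime in minutes over the window for a given SLO."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - slo)

for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%} -> {error_budget_minutes(slo):.1f} min per 30 days")
# 99.9% allows 43.2 minutes of downtime per 30-day month
```

Quoting "three nines is about 43 minutes a month" from memory lands well in interviews.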

Core Systems to Practice — Full Focus

Tier 1 — Must Do

URL Shortener

The classic warm-up design problem. Interesting because it covers hashing, collision handling, read-heavy workloads, and analytics at scale.

  • Key components: hashing service, key-value store, redirect service, analytics pipeline
  • Base62 encoding vs MD5-truncation tradeoffs
  • Handle collisions with retry or counter-based approach
  • Read:write ratio is extremely high — perfect cache candidate
  • Custom alias support and TTL expiration
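
A minimal sketch of the counter-based approach: encode a monotonically increasing ID in Base62 (the alphabet ordering here is a convention, not a standard):

```python
# Base62 encoder/decoder for a counter-based URL shortener.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode(125))   # "21" — a 7-char code covers 62^7 ≈ 3.5 trillion IDs
```

Counter-based encoding sidesteps collisions entirely, at the cost of guessable sequential codes.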

Rate Limiter

Tests understanding of distributed state, sliding windows, and protecting upstream services. Appears frequently in SDE 2 interviews.

  • Key components: rule engine, counter store (Redis), middleware/gateway integration
  • Algorithms: fixed window, sliding window log, sliding window counter, token bucket, leaky bucket
  • Distributed rate limiting with Redis INCR + TTL atomicity
  • Race conditions in multi-node deployments
  • Return 429 with Retry-After header; degrade gracefully
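
A single-node token bucket sketch; a distributed version would keep this state in a shared store such as Redis and update it atomically (e.g., via a Lua script). Capacity and refill rate are illustrative:

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # max burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False    # caller should return 429 with Retry-After

bucket = TokenBucket(capacity=5, refill_rate=1.0)
results = [bucket.allow() for _ in range(7)]
print(results)   # first 5 allowed (burst), next 2 rejected (bucket drained)
```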

Notification System

Interesting because it combines multiple delivery channels (push, SMS, email), priority queues, and exactly-once delivery challenges.

  • Key components: notification service, priority queue, channel adapters, template engine, preference store
  • Fan-out to millions of devices with rate limiting per channel
  • Idempotency to prevent duplicate notifications
  • User preference and opt-out management
  • Retry with exponential backoff per delivery channel
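
The retry schedule above can be sketched as exponential backoff with full jitter; the function name and defaults are hypothetical, and it returns the delay schedule rather than sleeping:

```python
import random

def backoff_schedule(attempts: int, base: float = 1.0, cap: float = 60.0,
                     jitter: bool = True) -> list[float]:
    """Delay (seconds) before each retry attempt."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))   # exponential growth, capped
        if jitter:
            delay = random.uniform(0, delay)      # "full jitter" variant
        delays.append(delay)
    return delays

print(backoff_schedule(5, jitter=False))   # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Jitter matters: without it, a burst of failures retries in lockstep and hammers the channel again.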

Chat System (WhatsApp-like)

Tests real-time communication, presence detection, and message ordering. Interesting for its WebSocket management and offline message sync challenges.

  • Key components: WebSocket gateway, chat service, presence service, message store, push notification fallback
  • Connection management: WebSocket with heartbeat, reconnection logic
  • Message ordering with sequence IDs per conversation
  • Offline message queue and read receipts
  • Group chat fan-out strategy and media storage

News Feed / Timeline

The classic fan-out problem. Interesting because it forces a decision between push (fan-out on write) and pull (fan-out on read) models with hybrid approaches.

  • Key components: feed service, fan-out service, post store, social graph, feed cache
  • Push model for normal users, pull model for celebrity accounts (hybrid)
  • Feed ranking: chronological vs ML-based relevance scoring
  • Cache invalidation when posts are deleted or edited
  • Pagination with cursor-based approach for infinite scroll

Search Autocomplete

Fascinating for its latency requirements (<100ms), trie data structures, and the interplay between real-time popularity updates and pre-computed suggestions.

  • Key components: trie service, query aggregation pipeline, ranking service, suggestion cache
  • Trie with frequency counters at each node, top-K heap
  • Offline pipeline updates trie periodically from query logs
  • Prefix matching with character-by-character API calls (debounce on client)
  • Sharding trie by prefix range for horizontal scaling
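
A toy version of the trie-plus-top-K idea; real systems precompute and cache the top-K list at each node instead of walking the subtree per query (class names and query data are illustrative):

```python
import heapq

class TrieNode:
    def __init__(self):
        self.children = {}
        self.freq = 0            # > 0 only for complete queries

class Autocomplete:
    def __init__(self):
        self.root = TrieNode()

    def add(self, query: str, freq: int):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.freq += freq

    def suggest(self, prefix: str, k: int = 3):
        node = self.root
        for ch in prefix:                      # walk to the prefix node
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        stack = [(node, prefix)]
        while stack:                           # collect completions under prefix
            cur, word = stack.pop()
            if cur.freq:
                results.append((cur.freq, word))
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        return [w for _, w in heapq.nlargest(k, results)]

ac = Autocomplete()
for q, f in [("system", 50), ("system design", 90), ("sys call", 20)]:
    ac.add(q, f)
print(ac.suggest("sys"))   # ['system design', 'system', 'sys call']
```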

Distributed Cache

A foundational building block that appears in almost every design. Interesting for consistent hashing, eviction policies, and cache coherence in multi-node setups.

  • Key components: cache nodes, consistent hash ring, client library, eviction engine
  • Eviction policies: LRU, LFU, TTL-based — when to use each
  • Consistent hashing with virtual nodes for balanced distribution
  • Cache-aside, write-through, write-behind patterns
  • Thundering herd / cache stampede mitigation (locking, probabilistic early refresh)
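
A minimal consistent hash ring with virtual nodes (node names and the vnode count are illustrative; MD5 is used only for its distribution, not for security):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.ring = []                 # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):    # virtual nodes smooth the distribution
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        # first ring position clockwise from the key's hash (wraps around)
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:42"))   # deterministic: same key, same node
```

Removing one node remaps only the keys that landed on its vnodes, which is the whole point versus `hash(key) % N`.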

Job Scheduler

Tests distributed coordination, exactly-once execution, and fault tolerance. Interesting because of priority handling, retries, and DAG-based dependency management.

  • Key components: scheduler service, job queue, worker pool, state store, dead-letter queue
  • Cron-based vs event-driven vs DAG-based scheduling
  • Exactly-once execution with distributed locking (Redlock or DB-based)
  • Worker heartbeat and task reassignment on failure
  • Priority queues and resource-based scheduling

Tier 2 — Important

Payment System

Interesting for its strict consistency requirements, idempotency, and the double-entry bookkeeping model. Mistakes here cost real money.

  • Key components: payment gateway, ledger service, reconciliation engine, fraud detector, PSP integration
  • Idempotency keys to prevent duplicate charges
  • Double-entry ledger for every transaction (debit + credit)
  • Saga pattern for distributed transactions across services
  • PCI-DSS compliance: tokenize card data, never store raw PAN
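
The double-entry and idempotency ideas combine naturally: every transfer writes a balanced debit/credit pair, and a replayed transaction ID is a no-op. A sketch with illustrative account names:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    txn_id: str
    account: str
    amount_cents: int    # positive = credit, negative = debit

ledger: list[Entry] = []
seen_txns: set[str] = set()

def transfer(txn_id: str, src: str, dst: str, amount_cents: int):
    if txn_id in seen_txns:     # idempotency: duplicate charges become no-ops
        return
    seen_txns.add(txn_id)
    ledger.append(Entry(txn_id, src, -amount_cents))   # debit source
    ledger.append(Entry(txn_id, dst, +amount_cents))   # credit destination

transfer("txn-1", "buyer:alice", "merchant:bob", 2500)
transfer("txn-1", "buyer:alice", "merchant:bob", 2500)   # retry: ignored
print(sum(e.amount_cents for e in ledger))   # 0 — the ledger always balances
print(len(ledger))                           # 2 — duplicate was dropped
```

Using integer cents rather than floats avoids rounding drift, which matters when mistakes cost real money.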

Ride-Sharing Location Service

Interesting for geospatial indexing, real-time location streaming, and the matching algorithm between riders and drivers within a radius.

  • Key components: location ingestion service, geospatial index, matching engine, trip service, ETA calculator
  • Geohash or QuadTree for spatial queries ("drivers within 3km")
  • High-frequency location updates (every 3-5 seconds) via UDP or WebSocket
  • Supply-demand matching with surge pricing signals
  • Eventual consistency acceptable for location; strong consistency for trip state

Video Upload & Streaming

Interesting for its large-blob handling, async transcoding pipeline, adaptive bitrate streaming, and CDN edge-caching strategy.

  • Key components: upload service, transcoding pipeline, video store (S3), CDN, metadata service
  • Chunked/resumable uploads for large files (tus protocol)
  • Transcoding to multiple resolutions and codecs in parallel (DAG pipeline)
  • Adaptive bitrate streaming (HLS/DASH) based on client bandwidth
  • Pre-signed URLs for upload; CDN origin-pull for playback

E-Commerce Order System

Interesting for its inventory consistency challenges, distributed transaction coordination, and the complex state machine of order lifecycle.

  • Key components: order service, inventory service, payment service, fulfillment service, event bus
  • Order state machine: created → payment pending → confirmed → shipped → delivered
  • Inventory reservation with TTL to prevent overselling
  • Saga or 2PC for cross-service consistency (order + payment + inventory)
  • Event sourcing for full audit trail of order changes
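
The order lifecycle can be made explicit as a transition table, so illegal jumps are rejected instead of silently applied. A sketch mirroring the states above; the cancellation edges are added for illustration:

```python
# Allowed transitions per state; anything not listed is rejected.
TRANSITIONS = {
    "created":         {"payment_pending", "cancelled"},
    "payment_pending": {"confirmed", "cancelled"},
    "confirmed":       {"shipped", "cancelled"},
    "shipped":         {"delivered"},
    "delivered":       set(),
    "cancelled":       set(),
}

def advance(state: str, new_state: str) -> str:
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

state = "created"
for nxt in ("payment_pending", "confirmed", "shipped", "delivered"):
    state = advance(state, nxt)
print(state)   # delivered
```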

Distributed ID Generator

Interesting for its strict uniqueness guarantee, ordering requirements, and the tradeoffs between coordination-based and coordination-free approaches.

  • Key components: ID service nodes, clock sync mechanism, Zookeeper/etcd for node assignment
  • Snowflake approach: timestamp + node ID + sequence number (64-bit)
  • UUID v4 (random) vs UUID v7 (time-ordered) tradeoffs
  • Database auto-increment with stride (node1: 1,3,5; node2: 2,4,6)
  • Clock skew handling and monotonicity guarantees
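
A sketch of the Snowflake bit layout: 41-bit millisecond timestamp, 10-bit node ID, 12-bit per-millisecond sequence. The custom epoch and node ID are illustrative, and a backward clock jump is not handled here:

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000    # 2020-01-01 as a custom epoch

class SnowflakeGenerator:
    def __init__(self, node_id: int):
        assert 0 <= node_id < 1024      # must fit in 10 bits
        self.node_id = node_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF    # 12-bit sequence
                if self.seq == 0:                    # 4096 IDs this ms: spin
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now
            return (now << 22) | (self.node_id << 12) | self.seq

gen = SnowflakeGenerator(node_id=7)
ids = [gen.next_id() for _ in range(1000)]
print(len(set(ids)) == len(ids), ids == sorted(ids))   # unique and increasing
```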

Logging & Metrics Pipeline

Interesting for its extreme write throughput, the tension between real-time alerting and batch analysis, and storage tiering strategies.

  • Key components: agents (Fluentd), message bus (Kafka), stream processor, time-series DB, alerting engine
  • Push vs pull collection model (Prometheus pull vs StatsD push)
  • Log levels, structured logging (JSON), and correlation IDs
  • Hot/warm/cold storage tiering for cost optimization
  • Sampling strategies at high volume to control cost

Tier 3 — Good to Have

Google Docs Collaborative Editing

Interesting for its real-time conflict resolution, OT/CRDT algorithms, and the challenge of maintaining document consistency across concurrent editors.

  • Key components: WebSocket gateway, operation transform engine, document store, presence service, version history
  • OT (Operational Transformation) vs CRDT tradeoffs
  • Cursor position synchronization across clients
  • Undo/redo in a multi-user context
  • Offline editing with conflict merge on reconnect

Hotel/Flight Booking

Interesting for its double-booking prevention, distributed lock management, and complex pricing that changes in real-time based on availability.

  • Key components: search service, inventory service, booking service, pricing engine, payment integration
  • Optimistic locking for seat/room reservation (version-based)
  • Temporary hold with TTL before payment confirmation
  • Search with heavy caching and async price refresh
  • Overbooking strategy and waitlist management

Ad Click Aggregation

Interesting for its extreme write throughput, exactly-once counting guarantees, and the need for both real-time dashboards and batch reconciliation.

  • Key components: click ingestion, stream processor (Flink/Kafka Streams), aggregation store, reconciliation pipeline
  • Exactly-once semantics with Kafka transactions or deduplication
  • Windowed aggregation: tumbling, sliding, session windows
  • Lambda architecture: real-time stream + batch correction layer
  • Click fraud detection and filtering at ingestion

Distributed File Storage (S3-like)

Interesting for its erasure coding for durability, metadata management at scale, and the separation between data plane and control plane.

  • Key components: API gateway, metadata service, data nodes, placement service, garbage collector
  • Data plane (actual bytes) vs control plane (metadata, routing) separation
  • Erasure coding (Reed-Solomon) vs simple replication for durability
  • Consistent hashing for object placement across data nodes
  • Multipart upload for large objects with parallel chunk transfers

Architecture Patterns — Light

Arch Patterns

Microservices vs Monolith Tradeoffs

Know when to split and when to stay monolithic. The answer is almost always "start monolith, split when you have clear domain boundaries."

  • Monolith: simpler deployment, easier debugging, lower latency (in-process calls)
  • Microservices: independent scaling, team autonomy, technology heterogeneity
  • Hidden costs: network latency, distributed tracing, data consistency
  • Modular monolith as a pragmatic middle ground

Event-Driven Architecture

Decouple producers from consumers using events. Essential for scaling write-heavy systems and enabling asynchronous workflows.

  • Event notification vs event-carried state transfer vs event sourcing
  • Message brokers: Kafka (log-based) vs RabbitMQ (queue-based)
  • Eventual consistency is the default; plan for it
  • Dead-letter queues for poison messages

CQRS & Event Sourcing

Separate read and write models for different scaling needs. Event sourcing stores every state change as an immutable event.

  • CQRS: write to normalized store, read from denormalized projections
  • Event sourcing: append-only event log is the source of truth
  • Replay events to rebuild state or create new projections
  • Increased complexity — only use when read/write patterns diverge significantly

Sidecar & Service Mesh

Offload cross-cutting concerns (TLS, retries, observability) to a sidecar proxy instead of embedding them in every service.

  • Sidecar pattern: co-located proxy handles networking (Envoy)
  • Service mesh (Istio, Linkerd): fleet of sidecars with centralized control plane
  • Benefits: mTLS everywhere, traffic shifting, circuit breaking without code changes
  • Cost: increased latency per hop, memory overhead per pod

API Gateway Patterns

Single entry point that handles routing, authentication, rate limiting, and request transformation for all backend services.

  • Request routing, composition, and protocol translation
  • Cross-cutting: auth, rate limiting, logging, CORS
  • Avoid putting business logic in the gateway — keep it thin
  • AWS API Gateway, Kong, or custom (Spring Cloud Gateway)

BFF (Backend for Frontend)

A dedicated backend per frontend type (mobile, web, internal). Prevents a generic API from becoming a bloated "one-size-fits-none" contract.

  • Mobile BFF returns minimal payloads; web BFF returns richer data
  • Each BFF calls shared microservices underneath
  • Reduces over-fetching and under-fetching per client type
  • Tradeoff: more services to maintain; mitigate with shared libraries

Strangler Fig Migration

Incrementally replace a legacy monolith by routing new traffic to new services while the old system still handles existing routes.

  • Route by URL path, feature flag, or percentage-based traffic split
  • Facade layer intercepts requests and decides old vs new backend
  • Gradually migrate one module at a time; never big-bang rewrite
  • Roll back easily by rerouting traffic to the old system

Data Patterns

Fan-Out on Write vs Read

The fundamental tradeoff in feed systems: precompute feeds at write time (fast reads) or assemble at read time (fast writes).

  • Fan-out on write: push post to all follower feeds at publish time
  • Fan-out on read: pull from followed users at read time
  • Hybrid: push for normal users, pull for celebrities (high follower count)
  • Trade storage and write amplification for read latency

Hot Partition Problem

When one shard receives disproportionate traffic (celebrity post, viral event). A make-or-break topic for scaling discussions.

  • Causes: skewed partition key, viral content, time-based keys
  • Solutions: salting keys, splitting hot partitions, dedicated shard
  • Monitoring: track per-partition QPS to detect hotspots early
  • DynamoDB adaptive capacity and Kafka partition reassignment as examples
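
Key salting fits in a few lines: writes for one hot logical key scatter across N sub-partitions, and reads fan out over all salts and merge. The salt count is illustrative:

```python
import random

NUM_SALTS = 8

def salted_key(key: str) -> str:
    """Write path: pick a random sub-partition for this write."""
    return f"{key}#{random.randrange(NUM_SALTS)}"

def all_salted_keys(key: str) -> list[str]:
    """Read path: fan out over every sub-partition and merge results."""
    return [f"{key}#{i}" for i in range(NUM_SALTS)]

print(all_salted_keys("celebrity:post:9001"))
# writes spread over 8 partitions; reads query all 8 and combine
```

The tradeoff is explicit: write hotspots shrink by roughly N, while every read pays N lookups.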

Write Amplification

When a single logical write triggers many physical writes (LSM compaction, fan-out, denormalized copies). Understand the cost of your data model.

  • LSM-tree compaction in Cassandra/RocksDB is a primary source
  • Denormalized writes: updating one entity across multiple tables/caches
  • Fan-out on write multiplies writes by follower count
  • Measure write amplification factor to predict disk/IO costs

Read-Your-Writes Consistency

After a user writes data, they should always see their own update, even if other users see eventual consistency.

  • Route reads to the primary/leader for the writing user
  • Use session stickiness or read-after-write tokens
  • Client-side: optimistic UI update before server confirmation
  • Stronger than eventual, weaker than strong consistency — a practical middle ground

Infrastructure & Cloud (AWS) — Light

AWS Services

Lambda Cold Start Problem

The initialization delay when a Lambda function runs for the first time or after being idle. Critical to mention when proposing serverless in HLD.

  • Cold start = download code + init runtime + init dependencies
  • Typical: 100ms-2s for Java, 50-200ms for Python/Node
  • Mitigations: provisioned concurrency, keep-warm pings, smaller package size
  • SnapStart (Java) pre-initializes and snapshots the execution environment

SQS vs SNS vs EventBridge

Know which AWS messaging service to pick. SQS = point-to-point queue, SNS = pub-sub fan-out, EventBridge = event routing with rules.

  • SQS: buffering, decoupling, exactly-once (FIFO), at-least-once (standard)
  • SNS: fan-out to multiple SQS queues, Lambda, HTTP endpoints
  • EventBridge: content-based routing with rules, schema registry, cross-account
  • Common pattern: SNS → SQS fan-out for reliable multi-consumer delivery

S3 Consistency Model

Since December 2020, S3 provides strong read-after-write consistency for all operations. Know this — interviewers may test outdated knowledge.

  • Strong consistency for PUTs and DELETEs (no more eventual consistency)
  • List operations are also strongly consistent
  • Storage classes: Standard, IA, Glacier for cost tiering
  • S3 Select for server-side filtering before download

RDS vs Aurora

Aurora is AWS's cloud-native relational DB. Know the architectural differences and when Aurora's higher cost is justified.

  • Aurora: shared storage layer (up to 128TB auto-scaling), 6-way replication
  • 5x throughput improvement over standard MySQL on RDS
  • Aurora Serverless v2 for variable workloads (scales to zero)
  • RDS: simpler, cheaper for predictable workloads; supports more engines

ElastiCache Setup

Managed Redis or Memcached for caching. Know when to use each and how to configure replication and failover.

  • Redis: rich data structures, persistence, pub/sub, Lua scripting
  • Memcached: simpler, multi-threaded, no persistence — pure cache
  • Cluster mode: hash slots across shards for horizontal scaling
  • Multi-AZ with automatic failover for high availability

API Gateway Throttling

AWS API Gateway's built-in throttling at account, stage, and method levels. A managed rate limiter for your APIs.

  • Token bucket algorithm at the account level (10,000 RPS default)
  • Usage plans with API keys for per-client throttling
  • Burst vs sustained rate limits
  • 429 TooManyRequests response with Retry-After header

CloudFront CDN Patterns

AWS's CDN for caching static and dynamic content at edge locations. Essential for reducing latency in global-scale systems.

  • Origin pull: edge fetches from origin on cache miss, caches response
  • Cache behaviors: different TTLs for /api/* vs /static/*
  • Lambda@Edge for request/response manipulation at the edge
  • Origin shield to reduce load on your origin server

Containers & K8s

Docker Layer Caching

Each Dockerfile instruction creates a layer. Proper ordering dramatically reduces build times in CI/CD pipelines.

  • Order instructions from least-changing to most-changing
  • COPY package.json first, then npm install, then COPY source code
  • Multi-stage builds to reduce final image size
  • Use .dockerignore to exclude node_modules, .git, etc.
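
A sketch of the ordering rule for a hypothetical Node.js service (base image, filenames, and commands are illustrative): dependency layers change rarely, so they stay cached across source-only edits.

```dockerfile
# Layers ordered least-changing to most-changing.
FROM node:20-slim AS build
WORKDIR /app

# Dependency manifests first: this layer stays cached until package.json changes
COPY package.json package-lock.json ./
RUN npm ci

# Source last: editing code invalidates only the layers from here down
COPY . .
RUN npm run build

# Multi-stage: the final image carries only the build output
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
```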

K8s Pod Scheduling Basics

How Kubernetes decides which node to place a pod on. Know the basics for discussing deployment strategies in HLD.

  • Scheduler evaluates resource requests, node affinity, taints/tolerations
  • Resource requests (guaranteed) vs limits (max allowed)
  • Node affinity: soft (preferred) vs hard (required) placement rules
  • Pod anti-affinity to spread replicas across failure domains

HPA vs VPA

Horizontal Pod Autoscaler adds more pods; Vertical Pod Autoscaler resizes existing pods. Know when each applies.

  • HPA: scales pod count based on CPU, memory, or custom metrics
  • VPA: adjusts resource requests/limits per pod (requires restart)
  • HPA preferred for stateless services; VPA for stateful or single-replica workloads
  • Do not use HPA and VPA on the same metric (CPU) simultaneously

Health Probes (Liveness vs Readiness)

Kubernetes uses probes to know when to restart a container (liveness) and when to route traffic to it (readiness).

  • Liveness: is the process alive? Restart if failing (e.g., deadlock detection)
  • Readiness: can this pod handle traffic? Remove from service if failing
  • Startup probe: gives slow-starting apps time before liveness kicks in
  • HTTP, TCP, or exec-based probe types

Service Mesh (Istio Concept)

A dedicated infrastructure layer for managing service-to-service communication with observability, security, and traffic management.

  • Data plane: Envoy sidecar proxies deployed alongside every pod
  • Control plane (istiod): configuration, certificate management, service discovery
  • Features: mTLS, traffic splitting, circuit breaking, distributed tracing
  • Overhead: added latency per hop (commonly cited as a few ms) and ~50MB memory per sidecar