MySQL Internals
Indexing
B-Tree Structure
InnoDB stores data in a B+ tree. Understanding the node layout, page splits, and pointer-following is key to explaining why some queries are fast and others aren't.
- B+ tree leaf nodes hold row data (clustered) or PK pointers (secondary)
- Page size defaults to 16 KB; controls how many keys fit per node
- Balanced height (typically 3-4 levels for millions of rows) means O(log n) lookups
- Inserts into a full page cause a page split, impacting write performance
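As a back-of-envelope illustration of why the tree stays shallow, here is a rough height estimate in Python. The per-entry byte sizes are illustrative assumptions, not InnoDB's exact on-page layout:

```python
import math

def btree_height(total_rows, page_size=16 * 1024,
                 internal_entry=12, leaf_row=160):
    """Rough B+ tree height estimate.

    internal_entry and leaf_row are illustrative byte sizes per index
    entry / per row, not InnoDB's exact page format.
    """
    fanout = page_size // internal_entry            # children per internal page
    rows_per_leaf = page_size // leaf_row           # rows per leaf page
    nodes = math.ceil(total_rows / rows_per_leaf)   # number of leaf pages
    height = 1
    while nodes > 1:                                # add levels up to one root
        nodes = math.ceil(nodes / fanout)
        height += 1
    return height
```

With these assumptions, 10 million rows need only a 3-level tree (fanout ≈ 1365, ~102 rows per leaf), which is why point lookups cost just a few page reads.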
Clustered vs Secondary Index
InnoDB has exactly one clustered index (the PK). Every secondary index stores a copy of the PK, so lookups via secondary indexes require a double lookup unless the index covers the query.
- Clustered index = data is physically ordered by PK
- Secondary index leaf stores PK value, not a row pointer
- Double lookup: secondary index → PK value → clustered index → row
- Wide PKs bloat every secondary index
Composite Index — Column Order Matters
A composite index on (A, B, C) can serve queries filtering on A, (A,B), or (A,B,C) efficiently, but not on B alone or C alone. The leftmost prefix rule is one of the most frequently tested MySQL concepts.
- Leftmost prefix rule: index is useful only from the leftmost column
- Range condition on a column stops index use for subsequent columns
- ORDER BY can also leverage the index if column order matches
- Use EXPLAIN's key_len column to verify how much of the composite index is actually used
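The leftmost prefix rule can be sketched with a sorted list of key tuples standing in for the composite index (a simplification of real B+ tree pages, but the seek-vs-scan contrast is the same):

```python
from bisect import bisect_left

# Hypothetical composite index on (a, b), modeled as a sorted list of keys.
index = sorted((a, b) for a in range(100) for b in range(100))

def seek_prefix(idx, a):
    """Range-seek on the leftmost column — the access path the index supports."""
    lo = bisect_left(idx, (a,))        # (a,) sorts before every (a, b)
    hi = bisect_left(idx, (a + 1,))
    return idx[lo:hi]                  # O(log n) seek + one contiguous read

def filter_b(idx, b):
    """Filter on the second column alone — degenerates to a full index scan."""
    return [k for k in idx if k[1] == b]   # O(n): matching keys are scattered
```

Both functions return 100 rows here, but the first touches one contiguous slice while the second walks all 10,000 keys — exactly the difference EXPLAIN shows between `ref`/`range` and `index`/`ALL`.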
Covering Index
When an index contains all the columns a query needs, MySQL reads data entirely from the index without touching the clustered index. EXPLAIN shows "Using index" in the Extra column.
- Eliminates the secondary-to-clustered lookup ("bookmark lookup")
- Significantly reduces I/O for read-heavy queries
- Trade-off: wider indexes consume more storage and slow writes
Index Selectivity
Selectivity measures how many distinct values an index column has relative to total rows. High selectivity (e.g., email) makes an index effective; low selectivity (e.g., boolean) often doesn't justify an index.
- Selectivity = COUNT(DISTINCT col) / COUNT(*)
- MySQL optimizer may skip a low-selectivity index in favour of a full scan
- Cardinality stats in SHOW INDEX help assess selectivity
- Prefix indexes can help on long strings, but reduce selectivity
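The selectivity formula is easy to check directly. This sketch uses sqlite3 as a stand-in (the SELECT runs unchanged on MySQL; the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, is_active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(f"user{i}@example.com", i % 2) for i in range(1000)],
)

def selectivity(col):
    # Selectivity = COUNT(DISTINCT col) / COUNT(*)
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {col}), COUNT(*) FROM users"
    ).fetchone()
    return distinct / total
```

Here email scores 1.0 (every value unique — ideal index candidate) while the boolean is_active scores 0.002, which is why the optimizer would likely prefer a full scan over an is_active index.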
EXPLAIN Output Reading
The EXPLAIN statement shows the query execution plan: which indexes MySQL chose, the join order, estimated rows, and access type. Mastering this output is essential for diagnosing slow queries.
- Key columns: type (ALL, ref, range, const), key, rows, Extra
- type=ALL means full table scan — usually bad
- Look for "Using filesort" and "Using temporary" as red flags
- EXPLAIN ANALYZE (MySQL 8.0.18+) shows actual execution times
- FORMAT=JSON gives cost estimates and optimizer decisions
DB Transactions
ACID Guarantees
Atomicity, Consistency, Isolation, Durability form the reliability contract of relational databases. Interview discussions often test whether you understand the trade-offs each property implies.
- Atomicity: undo log ensures all-or-nothing via rollback
- Consistency: application + DB constraints enforce valid state transitions
- Isolation: MVCC + locking prevent concurrent transaction interference
- Durability: redo log (WAL) ensures committed data survives crash
MVCC in InnoDB
Multi-Version Concurrency Control lets readers see a consistent snapshot without blocking writers. InnoDB implements this using hidden row versions, undo log chains, and read views.
- Each row has hidden DB_TRX_ID and DB_ROLL_PTR columns
- Readers construct a read view at the first read of the transaction (REPEATABLE READ) or at each statement (READ COMMITTED)
- Undo log chains let old versions be reconstructed on the fly
- Purge thread cleans up versions no longer visible to any transaction
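The visibility rule a read view applies to each row version can be modeled in a few lines. Field names here are illustrative, not InnoDB's actual struct members:

```python
def visible(row_trx_id, view):
    """Simplified InnoDB-style visibility test for one row version.

    view fields (illustrative names):
      creator   - trx id of the reading transaction
      active    - trx ids that were active when the view was built
      up_limit  - smallest active trx id at view creation
      low_limit - next trx id to be assigned at view creation
    """
    if row_trx_id == view["creator"]:
        return True                              # our own change
    if row_trx_id < view["up_limit"]:
        return True                              # committed before the view
    if row_trx_id >= view["low_limit"]:
        return False                             # started after the view
    return row_trx_id not in view["active"]      # committed in the window

view = {"creator": 105, "active": {100, 103}, "up_limit": 100, "low_limit": 106}
```

If a version is not visible, InnoDB follows the DB_ROLL_PTR into the undo chain and re-tests each older version until one passes.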
Isolation Levels — Read Phenomena
The four standard isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE) trade off consistency against concurrency. Understand which read phenomena each level permits.
- Dirty read: seeing uncommitted data (only RU)
- Non-repeatable read: row changes between two reads in same txn (RU, RC)
- Phantom read: new rows appear in a repeated range query (RU, RC)
- InnoDB default is REPEATABLE READ, which avoids phantoms via MVCC snapshots for plain reads and next-key locks for locking reads (SELECT ... FOR UPDATE)
- SERIALIZABLE adds shared locks on all reads — safe but slow
Gap Locks vs Record Locks vs Next-Key Locks
InnoDB uses three lock types to prevent phantoms and ensure isolation. Understanding the difference is critical for debugging deadlocks and designing concurrent writes.
- Record lock: locks a single index record
- Gap lock: locks the gap between two index records (prevents inserts)
- Next-key lock = record lock + gap lock on the gap before the record
- Gap locks are only used at REPEATABLE READ and above
- Can cause unexpected deadlocks when concurrent transactions lock overlapping gaps
Deadlock in DB Layer
A deadlock occurs when two transactions each hold a lock the other needs. InnoDB automatically detects deadlocks and rolls back one transaction. Knowing how to prevent and debug them is essential.
- InnoDB runs a wait-for graph detector; the victim is the transaction that modified the fewest rows (smallest undo log)
- SHOW ENGINE INNODB STATUS shows the last detected deadlock
- Prevention: always lock rows in the same order across transactions
- Keep transactions short and avoid holding locks during external calls
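The lock-ordering rule can be sketched with in-process locks standing in for row locks (a toy model, not real InnoDB locking):

```python
import threading

# Toy model: two "rows" with one lock each standing in for InnoDB row locks.
accounts = {1: 100, 2: 100}
locks = {1: threading.Lock(), 2: threading.Lock()}

def transfer(src, dst, amount):
    # Always lock the lower account id first: a 1→2 transfer and a 2→1
    # transfer then acquire locks in the same order and cannot deadlock.
    first, second = sorted((src, dst))
    with locks[first], locks[second]:
        accounts[src] -= amount
        accounts[dst] += amount

t1 = threading.Thread(target=transfer, args=(1, 2, 30))
t2 = threading.Thread(target=transfer, args=(2, 1, 10))
t1.start(); t2.start()
t1.join(); t2.join()
```

Without the `sorted()`, the two threads could each grab their source-account lock and wait forever for the other — the classic cycle an InnoDB wait-for graph would detect and break by rolling one back.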
Query Optimization
Query Execution Plan
MySQL's optimizer transforms SQL into a physical execution plan: choosing join order, index access paths, and temp-table strategies. Understanding this pipeline helps you write queries the optimizer can handle well.
- Parser → Resolver → Optimizer → Executor pipeline
- Cost-based optimizer uses table statistics and index cardinality
- ANALYZE TABLE refreshes statistics for better plans
- Subquery vs JOIN: optimizer may convert correlated subqueries to joins
Index Hints
USE INDEX, FORCE INDEX, and IGNORE INDEX let you override the optimizer's index choice. Use sparingly — the optimizer is usually right, but sometimes stale statistics lead it astray.
- USE INDEX(idx): suggest an index, optimizer may still ignore
- FORCE INDEX(idx): only consider this index or full scan
- IGNORE INDEX(idx): exclude a specific index from consideration
- Hints are a band-aid; prefer fixing statistics or restructuring queries
Slow Query Log
MySQL can log queries that exceed a time threshold. This is the primary tool for identifying performance bottlenecks in production without code changes.
- Enable with slow_query_log=ON and long_query_time (seconds)
- log_queries_not_using_indexes captures unindexed queries
- mysqldumpslow or pt-query-digest aggregate and rank slow queries
- Performance Schema provides more granular profiling data
Partitioning Strategies
Table partitioning splits a large table into smaller physical pieces while keeping it logically unified. It can dramatically speed up range scans and bulk deletes on very large tables.
- RANGE partitioning: ideal for time-series data (partition by month/year)
- LIST partitioning: discrete categories (status codes, regions)
- HASH partitioning: even distribution when no natural range exists
- Partition pruning: optimizer skips irrelevant partitions when WHERE matches the partition key
- Unique indexes must include the partition key
Sharding vs Partitioning
Partitioning is a single-server optimization; sharding distributes data across multiple database servers. Knowing when to use each (and the operational cost of sharding) is a common system design question.
- Partitioning: same server, transparent to application, limited by single-node resources
- Sharding: multiple servers, application-aware routing, near-linear horizontal scale
- Shard key selection is critical: poor keys cause hot spots
- Cross-shard joins and transactions are expensive — design to avoid them
- Resharding is painful; plan the key carefully upfront
MongoDB
MongoDB Core
Document Model — Embedding vs Referencing
MongoDB stores data as BSON documents. The key design decision is whether to embed related data inside one document or use references across collections — each has performance and consistency trade-offs.
- Embed when data is read together and the subdocument won't grow unboundedly
- Reference when the related data is large, changes independently, or is shared
- Embedded reads are atomic and fast (single document fetch)
- References require application-level joins ($lookup) which are slower
- 16 MB document size limit constrains embedding
Index Types (Compound, Text, Geo, TTL)
MongoDB supports several specialized index types beyond the default single-field B-tree. Choosing the right one directly impacts query performance and enables features like full-text search and auto-expiry.
- Compound index: multi-field, follows leftmost prefix rule like MySQL
- Text index: full-text search with stemming and scoring
- 2dsphere / 2d index: geospatial queries ($near, $geoWithin)
- TTL index: automatically deletes documents after a specified time
- Wildcard index: indexes all fields in a document (useful for dynamic schemas)
Aggregation Pipeline Stages
The aggregation pipeline processes documents through a sequence of stages ($match, $group, $project, etc.). It is MongoDB's equivalent of SQL GROUP BY + subqueries and is heavily used in interviews.
- $match early to reduce pipeline volume (uses indexes)
- $group: accumulate values (sum, avg, push to array)
- $lookup: left outer join to another collection
- $unwind: flatten arrays for per-element processing
- $facet: run multiple pipelines in parallel on the same input
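A toy interpreter for $match and $group (equality match and $sum only) makes the stage-by-stage data flow concrete. This mimics the pipeline semantics in plain Python; it is not how MongoDB executes it:

```python
def aggregate(docs, pipeline):
    """Toy interpreter: equality $match and $group with $sum only."""
    for stage in pipeline:
        if "$match" in stage:
            cond = stage["$match"]
            docs = [d for d in docs
                    if all(d.get(k) == v for k, v in cond.items())]
        elif "$group" in stage:
            spec = stage["$group"]
            groups = {}
            for d in docs:
                key = d[spec["_id"].lstrip("$")]
                acc = groups.setdefault(key, {"_id": key})
                for field, expr in spec.items():
                    if field != "_id":
                        src = expr["$sum"].lstrip("$")
                        acc[field] = acc.get(field, 0) + d[src]
            docs = list(groups.values())
    return docs

orders = [
    {"status": "paid", "region": "eu", "amount": 10},
    {"status": "paid", "region": "us", "amount": 25},
    {"status": "open", "region": "eu", "amount": 99},
    {"status": "paid", "region": "eu", "amount": 5},
]
pipeline = [
    {"$match": {"status": "paid"}},    # filter first to shrink the stream
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},
]
```

Note how placing $match first means $group only sees three documents instead of four — the same reason MongoDB can use an index only when $match is at the front of the pipeline.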
Write Concern & Read Preference
Write concern controls how many replica nodes must acknowledge a write before it's confirmed. Read preference controls which node serves reads. Together they define the consistency-availability trade-off.
- w:1: acknowledged by the primary only (the server default before MongoDB 5.0; "majority" is the default since 5.0)
- w:"majority": acknowledged by a majority of voting members — durable across failover
- Read preference: primary (strong consistency), secondary (eventual, reduced primary load)
- primaryPreferred: reads from primary, falls back to secondary if unavailable
Replica Sets
A replica set is a group of MongoDB nodes that maintain the same data. One primary handles writes; secondaries replicate via the oplog. This is the foundation of MongoDB high availability.
- Minimum 3 members (or 2 + arbiter) for automatic failover
- Oplog: capped collection on primary that secondaries tail
- Election uses Raft-like consensus; priority and votes are configurable
- Secondaries can serve reads (read preference), reducing primary load
Transactions in MongoDB 4+
Starting with version 4.0, MongoDB supports multi-document ACID transactions across replica sets (and sharded clusters from 4.2). They work like SQL transactions but have performance caveats.
- Use sessions and startTransaction / commitTransaction APIs
- Snapshot isolation: reads see a consistent point-in-time view
- 60-second default time limit; long transactions risk oplog pressure
- Design for single-document atomicity first; use multi-doc transactions only when necessary
Schema Design
Design for Access Patterns
In MongoDB, schema design is driven by how you query the data, not by normalization rules. Map out your read/write patterns first, then structure documents to serve the most common queries efficiently.
- Group data that is queried together into the same document
- Separate data that is updated independently or has different access frequencies
- Denormalize read-heavy paths; accept some write duplication
- Use the bucket pattern for time-series or IoT data
Avoid Unbounded Arrays
Appending endlessly to an array field will eventually hit the 16 MB document limit and degrade performance due to document growth and in-place update failures. This is a classic MongoDB anti-pattern.
- Arrays that grow without bound cause document migrations (larger BSON)
- Large arrays slow down reads even if you only need a subset
- Solution: cap arrays or move overflow to a separate collection (bucket pattern)
- Use $slice or $push with $each and $position to limit array size
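The capped-array update looks like this; the field name and the cap of 100 are made up for illustration, and in pymongo the dict would be passed straight to update_one:

```python
# Hypothetical "recent_events" field capped at 100 entries: push new
# items and trim to the newest 100 in the same single-document update
# (single-document updates are atomic in MongoDB).
update = {
    "$push": {
        "recent_events": {
            "$each": [{"ts": 1_700_000_000, "type": "click"}],
            "$slice": -100,   # negative slice keeps the last N elements
        }
    }
}
# e.g. collection.update_one({"_id": user_id}, update)
```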
When to Normalize vs Denormalize
Normalization (references) avoids duplication but requires joins. Denormalization (embedding) duplicates data but gives fast, atomic reads. The right balance depends on your read/write ratio and consistency needs.
- Denormalize for read-heavy workloads where consistency lag is acceptable
- Normalize for data that changes frequently and is shared across documents
- Hybrid: embed summary fields, reference the full record
- Duplicated data requires an update strategy (change streams, batch sync)
ClickHouse & OLAP
OLAP Concepts
Column-Store Advantages
Column-oriented storage reads only the columns a query needs, enabling massive compression ratios and vectorized processing. This is why OLAP databases like ClickHouse can scan billions of rows per second.
- Only touched columns are loaded from disk — dramatic I/O reduction
- Same-type values compress 5-10x better than row-oriented storage
- Vectorized execution: operates on column batches, CPU-cache friendly
- Poor for point lookups or row-level updates (OLTP workloads)
MergeTree Engine
MergeTree is ClickHouse's primary table engine. Data is written in sorted "parts" that are merged in the background. The primary key defines sort order and enables efficient range scans.
- Data is stored in sorted parts; background merges reduce part count
- Primary key is a sparse index (not unique like RDBMS) — stores one entry per granule (8192 rows default)
- ORDER BY clause defines sort order; primary key defaults to it
- Supports data skipping indexes (minmax, set, bloom_filter)
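The sparse index can be modeled as one mark per granule plus a binary search — a conceptual sketch, not ClickHouse's implementation:

```python
from bisect import bisect_right

GRANULE = 8192                      # rows per granule (ClickHouse default)

rows = list(range(100_000))         # one part, sorted by primary key
marks = rows[::GRANULE]             # sparse index: first key of each granule

def granules_to_read(key_lo, key_hi):
    """Granule range [first, last) that may contain keys in [key_lo, key_hi]."""
    first = max(bisect_right(marks, key_lo) - 1, 0)
    last = bisect_right(marks, key_hi)
    return first, last
```

With 100k rows the index holds only 13 marks, and a narrow range query reads a single granule — which is why the primary key need not be unique: it only needs to bound which granules to scan.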
Materialized Views
ClickHouse materialized views act as insert triggers: they transform data on the fly as it arrives and store results in a target table. This enables real-time pre-aggregation without ETL pipelines.
- Triggered on INSERT to the source table; processes only new data
- Target table typically uses AggregatingMergeTree or SummingMergeTree
- Great for dashboards that need sub-second aggregations on live data
- Multiple materialized views can consume the same source table
ReplicatedMergeTree
ReplicatedMergeTree adds replication to MergeTree using ZooKeeper (or ClickHouse Keeper) for coordination. Each replica independently fetches and merges data parts, providing high availability.
- ZooKeeper/Keeper stores metadata and coordinates part transfers
- Each replica is eventually consistent; INSERT waits for configurable quorum
- insert_quorum setting controls how many replicas must confirm a write
- Replicas auto-recover missing parts from peers
ClickHouse vs Druid vs BigQuery
All three are OLAP systems, but they differ in architecture, operational model, and cost. Being able to articulate when you'd pick one over another shows depth in data platform thinking.
- ClickHouse: self-hosted, SQL-native, fastest for single-table scans, needs ops investment
- Druid: real-time ingestion focus, segment-based, good for mixed real-time + historical
- BigQuery: fully managed, serverless, pay-per-query, best for irregular/batch workloads
- ClickHouse shines when you need low-latency on high-QPS analytic queries
OLAP Query Patterns
Aggregation at Scale
ClickHouse excels at aggregating billions of rows in seconds. Understanding how it parallelizes GROUP BY across cores and merges partial states is key to writing efficient analytic queries.
- Parallel GROUP BY: each thread aggregates its data part, then results are merged
- Approximate functions (uniq, quantileTDigest) trade accuracy for speed
- Hash tables used for GROUP BY must fit in memory; large cardinality may spill to disk
- Pre-filter with WHERE before GROUP BY to reduce data volume
Pre-aggregation Strategies
For dashboards requiring sub-second latency, pre-aggregating data at insert time (via materialized views or summary tables) avoids scanning raw data on every query.
- SummingMergeTree: auto-sums numeric columns during merges
- AggregatingMergeTree: stores intermediate aggregate states (count, avg, quantile)
- Materialized views populate summary tables on INSERT
- Trade-off: more storage and complexity for faster reads
TTL for Data Lifecycle
ClickHouse supports TTL rules at column and table level to automatically delete or move old data. This is essential for managing storage costs on high-volume event data.
- Table TTL: DELETE or MOVE TO VOLUME/DISK after a time expression
- Column TTL: set individual columns to default value after expiry
- Enables tiered storage: hot SSD → cold HDD → delete
- TTL is evaluated during part merges, not continuously
Redis Deep Dive
Redis Data Structures
String / Hash / List / Set / ZSet Internals
Redis's five core data structures each use optimized internal encodings. Knowing the underlying representations helps you pick the right structure and predict memory usage.
- String: SDS (Simple Dynamic String); int encoding for values that fit a 64-bit signed integer
- Hash: listpack (ziplist before Redis 7.0) for small hashes, hashtable for larger ones
- List: quicklist (a linked list of listpacks, formerly ziplists) since Redis 3.2
- Set: intset for small all-integer sets, listpack for other small sets (Redis 7.2+), hashtable otherwise
- ZSet (Sorted Set): skiplist + hashtable for O(log n) rank operations; listpack for small sets
HyperLogLog
HyperLogLog is a probabilistic data structure that estimates the cardinality of a set using ~12 KB of memory, regardless of set size. Perfect for counting unique visitors, IPs, or events at scale.
- PFADD adds elements; PFCOUNT returns approximate unique count
- Standard error: ~0.81% — accurate enough for analytics
- PFMERGE combines multiple HLLs (e.g., merge daily counts into weekly)
- At most 12 KB per HLL (dense encoding) whether you add thousands or billions of elements; small HLLs use a more compact sparse encoding
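A miniature HLL (1024 registers instead of Redis's 2^14) shows the mechanism: hash each element, use the low bits to pick a register, track the maximum leading-zero run, then combine registers harmonically:

```python
import hashlib
import math

M = 1 << 10          # 1024 registers here; Redis uses 2^14 (hence ~12 KB)

def _hash64(item):
    return int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")

class TinyHLL:
    def __init__(self):
        self.reg = [0] * M

    def add(self, item):
        h = _hash64(item)
        j = h & (M - 1)                     # low 10 bits pick a register
        w = h >> 10                         # remaining 54 bits
        self.reg[j] = max(self.reg[j], 55 - w.bit_length())  # zeros run + 1

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / M)
        est = alpha * M * M / sum(2.0 ** -r for r in self.reg)
        if est <= 2.5 * M:                  # small-range correction
            zeros = self.reg.count(0)
            if zeros:
                est = M * math.log(M / zeros)
        return int(est)

# PFMERGE is just the register-wise max of two instances.
```

With 1024 registers the standard error is about 3.25%; Redis's 16384 registers bring that down to the quoted 0.81%.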
Bloom Filter (RedisBloom)
A Bloom filter answers "is this element in the set?" with no false negatives but possible false positives. RedisBloom module adds native BF.ADD / BF.EXISTS commands to Redis.
- Space-efficient: uses bit array + multiple hash functions
- False positive rate configurable at creation (e.g., 1%)
- Use case: prevent duplicate processing, cache penetration protection
- Cannot delete elements (use Cuckoo filter if deletions needed)
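A minimal Bloom filter sketch — the sizes and the salted-hash scheme are arbitrary illustrative choices, not RedisBloom's internals:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a bit array plus k derived hash positions."""

    def __init__(self, size_bits=8192, hashes=4):
        self.m, self.k = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k positions by salting one hash function (a common trick).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False → definitely absent; True → present or a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

The asymmetry is the whole point: `might_contain` returning False is a guarantee (safe to skip the DB lookup), while True only means "probably".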
Streams
Redis Streams is an append-only log data structure with consumer group support, similar to Kafka. With acknowledgements and the pending entries list it enables event-driven architectures with at-least-once processing semantics within Redis.
- XADD appends entries; XREAD reads from a position; XRANGE scans a range
- Consumer groups: multiple consumers share work, with per-consumer acknowledgement
- XACK confirms processing; pending entries list (PEL) tracks unacked messages
- Suitable for lightweight event streaming without deploying Kafka
Caching Patterns
Cache-Aside vs Write-Through vs Write-Behind
The three fundamental caching strategies differ in how they synchronize cache and database. Choosing the right one affects consistency, latency, and complexity.
- Cache-aside (lazy loading): app checks cache, loads from DB on miss, populates cache
- Write-through: every write goes to cache AND DB synchronously — strong consistency but higher write latency
- Write-behind (write-back): writes go to cache first, async flush to DB — fast writes but risk data loss
- Cache-aside is most common in practice; write-through for strict consistency needs
TTL Expiry Strategies
Redis uses a combination of lazy expiry (check on access) and active expiry (periodic sampling) to remove expired keys. Understanding this prevents surprises with memory usage and stale data.
- Lazy expiry: key is checked on GET; if expired, it is deleted and returns nil
- Active expiry: Redis samples 20 random keys with TTL 10 times/sec, deleting expired ones
- If >25% of sampled keys are expired, it repeats immediately (adaptive)
- Very short TTLs on many keys can cause CPU spikes during active expiry
Cache Stampede — Mutex Lock Solution
When a popular key expires, many concurrent requests may simultaneously attempt to rebuild the cache, overwhelming the database. The mutex lock pattern ensures only one thread rebuilds while others wait.
- SET key value NX EX ttl (an atomic set-if-absent with expiry) acquires a distributed lock on the rebuild key
- The lock winner fetches from DB and populates cache; losers retry with backoff
- Alternative: probabilistic early expiration (stagger TTLs)
- Another approach: never expire, use background refresh with logical TTL
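The mutex pattern in miniature — a dict stands in for Redis here, so unlike the real SET ... NX EX this check-then-set is not atomic across threads; it only illustrates the "one winner rebuilds" behavior:

```python
import time

lock_store = {}   # stands in for Redis; value is the lock's expiry time

def acquire(key, ttl):
    """Models SET key token NX EX ttl: first caller wins until expiry.
    NOT thread-safe, unlike the real Redis command it imitates."""
    now = time.time()
    if lock_store.get(key, 0) > now:
        return False                  # NX failed: someone holds the lock
    lock_store[key] = now + ttl
    return True

# Five requests hit the expired hot key "at once": one rebuilds, four back off.
winners = [acquire("lock:hot_key", ttl=5) for _ in range(5)]
```

The single winner fetches from the database and repopulates the cache; the losers sleep briefly and re-read the cache, so the database sees one rebuild instead of five.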
Distributed Lock (Redlock)
Redlock is an algorithm for distributed locking across multiple independent Redis instances. It is used when correctness matters more than what a single-instance SETNX lock can guarantee.
- Acquire lock on N/2+1 of N independent Redis masters with clock-bounded TTL
- If majority acquired within validity time, lock is held
- Controversial: Martin Kleppmann argued it is unsafe under clock skew and GC pauses
- For most practical cases, single-instance SETNX with fencing tokens suffices
Redis Clustering
Sentinel vs Cluster Mode
Sentinel provides automatic failover for a single master-replica setup. Cluster mode shards data across multiple masters. The choice depends on whether you need HA only or HA + horizontal scaling.
- Sentinel: monitors one master + replicas, promotes replica on failure, no sharding
- Cluster: data partitioned across 16384 hash slots on multiple masters
- Sentinel is simpler and sufficient when data fits on one node
- Cluster is required when dataset exceeds single-node memory
Hash Slots
Redis Cluster divides the key space into 16384 hash slots. Each master owns a subset of slots, and clients route commands based on CRC16(key) mod 16384.
- CRC16 hash of the key determines the slot; slot maps to a specific master
- Hash tags: {user:123}.profile forces keys to the same slot for multi-key commands
- Resharding moves slots between nodes with minimal downtime (MIGRATE)
- MOVED and ASK redirections tell clients to retry on the correct node
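Slot routing can be reproduced exactly: CRC-16/XMODEM over the key (or its hash tag), mod 16384:

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM — the checksum Redis Cluster applies to keys."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash-tag rule: if the key contains a non-empty {...} section,
    # only that substring is hashed, pinning related keys to one slot.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

This is why `{user:123}.profile` and `{user:123}.settings` can participate in the same MULTI or Lua script: both hash only "user:123" and land on the same slot.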
Hot Key Problem
When a single key receives disproportionate traffic, it overloads the node owning that slot. This is a common Redis scaling bottleneck in high-traffic systems like flash sales or viral content.
- All requests for one key route to one node — no cluster parallelism
- Solution 1: key splitting — shard the value across key:1, key:2, ..., key:N
- Solution 2: local caching with short TTL to absorb reads
- Solution 3: read replicas for the hot key's slot (cluster read replicas)
Memory Eviction Policies
When Redis hits maxmemory, it must evict keys to make room. The eviction policy determines which keys are removed. Choosing the wrong policy can silently drop important data.
- noeviction: return errors on writes when full (safe but disruptive)
- allkeys-lru: evict least recently used across all keys (most common)
- volatile-lru: evict LRU only among keys with TTL set
- allkeys-lfu: evict least frequently used (Redis 4.0+, better for skewed access)
- volatile-ttl: evict keys closest to expiration
Database Selection & CAP Theory
CAP Theory
CAP Theorem Applied to Real Systems
CAP states a distributed system can only guarantee two of Consistency, Availability, and Partition tolerance simultaneously. In practice, P is non-negotiable, so the real choice is between C and A during partitions.
- CP systems (e.g., ZooKeeper, HBase): reject requests during partition to stay consistent
- AP systems (e.g., Cassandra, DynamoDB): serve requests during partition, may return stale data
- MySQL single-node is CA — but CA is meaningless in distributed contexts
- Most real systems are tunable: MongoDB (w:majority = CP, w:1 = AP-leaning)
Eventual Consistency Patterns
In eventually consistent systems, replicas converge over time. Understanding the practical patterns (read repair, anti-entropy, hinted handoff) helps you design systems that behave well under eventual consistency.
- Read repair: fix stale replica when a read detects divergence
- Anti-entropy: background Merkle-tree comparison to sync replicas
- Hinted handoff: temporarily store writes for an unavailable node, replay later
- Last-write-wins (LWW) is simple but can lose concurrent updates
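Last-write-wins fits in two lines, and the sketch below (timestamps are made up) shows exactly how a concurrent update gets lost:

```python
def lww_merge(a, b):
    """Keep the (value, timestamp) pair with the later timestamp."""
    return a if a[1] >= b[1] else b

# Two replicas accept concurrent writes to the same register:
replica_1 = ("draft edited by Alice", 1_700_000_010)
replica_2 = ("draft edited by Bob",   1_700_000_009)

merged = lww_merge(replica_1, replica_2)   # Bob's edit is silently discarded
```

Both replicas converge (the merge is deterministic), but Bob's write vanishes without any error — the trade-off LWW makes for its simplicity.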
Read-Your-Writes
Read-your-writes consistency guarantees that after a client writes, subsequent reads by the same client reflect that write. This is critical for user experience even in eventually consistent systems.
- Achieved by routing reads to the same node that handled the write
- Alternative: use a version token / timestamp; reject reads behind that token
- MongoDB: use readConcern "majority" + readPreference "primary" for the same session
- Without this, a user may save data and then see the old version on page refresh
Monotonic Reads
Monotonic read consistency guarantees a client never sees an older version of data after having seen a newer one. Without it, successive reads from different replicas can appear to "go back in time."
- Problem: load balancer routes read 1 to an up-to-date replica, read 2 to a stale one
- Solution: sticky sessions — pin a client to one replica for reads
- Alternative: track read timestamp; only read from replicas at or past that point
- MongoDB causal sessions provide monotonic reads automatically
CRDT Basics
Conflict-free Replicated Data Types are data structures that can be updated independently on different replicas and merged without conflicts. They are the theoretical foundation for many AP system designs.
- State-based (CvRDT): replicas periodically send full state, merge via join
- Operation-based (CmRDT): replicas broadcast operations, must be commutative
- Examples: G-Counter, PN-Counter, LWW-Register, OR-Set
- Used in Redis Enterprise (CRDB), Riak, and collaborative editing
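A state-based G-Counter sketch: because the merge is a per-slot max — commutative, associative, and idempotent — replicas converge to the same value regardless of merge order:

```python
class GCounter:
    """Grow-only counter (state-based CRDT): one slot per replica.
    Merge takes the per-slot max, so merges commute and converge."""

    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.counts = [0] * n_replicas

    def increment(self, by=1):
        self.counts[self.id] += by          # a replica only touches its slot

    def value(self):
        return sum(self.counts)

    def merge(self, other):
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]
```

A PN-Counter is just two G-Counters (one for increments, one for decrements) with value = P − N.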
DB Tradeoffs
RDBMS vs Document vs Column vs Graph
Each database paradigm optimizes for different access patterns. Knowing when to use which is a staple system design interview question.
- RDBMS (MySQL, PostgreSQL): structured data, complex joins, ACID — default choice for transactional workloads
- Document (MongoDB): flexible schema, nested data, horizontal scale — good for content, catalogs
- Column (ClickHouse, Cassandra): analytic aggregations or wide-row time-series
- Graph (Neo4j): relationship-heavy queries (social networks, fraud detection, recommendations)
- Pick based on primary access pattern, not hype
NewSQL Tradeoffs
NewSQL systems (CockroachDB, TiDB, Spanner) promise SQL semantics with horizontal scalability. They solve the sharding pain of MySQL but introduce new operational complexity and latency characteristics.
- Distributed SQL: automatic sharding, distributed transactions, strong consistency
- Spanner uses TrueTime (atomic clocks) for global serializable isolation
- CockroachDB and TiDB use Raft consensus for replication
- Higher write latency than single-node MySQL due to consensus overhead
- Good fit when you outgrow single-node RDBMS but need ACID guarantees
Polyglot Persistence Rationale
Polyglot persistence means using different databases for different parts of the same application, each chosen for its strengths. It is the practical application of "no single database does everything well."
- Example: MySQL for orders (ACID), Redis for sessions (speed), ClickHouse for analytics (OLAP), MongoDB for product catalog (flexible schema)
- Benefits: each store is used where it excels, better performance and scalability
- Costs: operational overhead, data synchronization complexity, more failure modes
- Need a clear data flow strategy (CDC, event sourcing, or dual writes)
Recommended Resources
- MySQL 8.0 Reference Manual — Official docs covering InnoDB internals, optimization, and administration
- MongoDB Manual — Complete reference for data modeling, aggregation, replication, and transactions
- ClickHouse Documentation — Engine types, query optimization, and operational guides
- Redis Documentation — Data types, commands, clustering, persistence, and patterns
- Baeldung: Database Concepts — Tutorials on indexing, transactions, caching, and distributed DB theory
- MySQL Optimization Guide — EXPLAIN, query tuning, index best practices, and server configuration