There is a pervasive misconception in modern backend development that "single-threaded" implies "slow" or "unscalable." In an era where even budget laptops boast eight cores, relying on a single processing unit seems almost archaic. Yet, Redis—the ubiquitous in-memory data store—shatters this assumption.
Despite running its core command execution loop on a single CPU core, Redis routinely handles over 100,000 requests per second on standard hardware, with sub-millisecond latency. How is this possible? The answer lies in a fundamental architectural distinction that many developers overlook: Concurrency is not the same as Parallelism.
Redis proves that high throughput does not require doing many things at the exact same instant (parallelism); rather, it requires dealing with many things efficiently over overlapping time periods (concurrency). In this article, we will move from the high-level concepts of the Event Loop down to the metal of CPU Cache Locality to understand why Redis remains the gold standard for performance.
Concurrency vs. Parallelism: A Developer's Distinction
To understand Redis, we must first agree on two terms that are often used interchangeably but mean vastly different things in systems programming.
Concurrency is the ability of an algorithm or program to handle multiple tasks in overlapping time periods. It is about structure. For example, a web server can initiate a database query, and while waiting for the response, it accepts a new incoming HTTP request.
Parallelism is the simultaneous execution of multiple tasks. It is about execution. This requires hardware support, such as multiple CPU cores running different threads at the exact same nanosecond.
The Barista Analogy
Think of a coffee shop:
- Parallelism: You have three baristas. They can make three lattes simultaneously. If one is slow, the others continue. This scales well but requires coordination (who uses the milk steamer? who takes the next order?).
- Concurrency (The Redis Way): You have one super-efficient barista. They take an order, initiate the espresso shot, and while the machine is pouring, they steam the milk. They never stand still waiting for a process to finish. They are "context switching" effectively between tasks, handling hundreds of orders an hour alone.
Redis is that single, hyper-optimized barista. It is concurrent because it interleaves thousands of client connections on one event loop, but it is not parallel in its command execution: it processes one command at a time, sequentially.
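To make the distinction concrete, here is a minimal sketch using Python's asyncio: one thread, one event loop, one hundred "orders" in flight at once. The timings and names are invented for illustration.

```python
import asyncio
import time

# One "barista": a single thread running an event loop. Each order has a
# waiting phase (the espresso machine pouring) during which the loop is
# free to make progress on other orders.

async def serve_order(n: int) -> None:
    await asyncio.sleep(0.1)   # waiting on the machine: yields to the loop

async def main() -> None:
    start = time.perf_counter()
    # 100 overlapping orders finish in ~0.1s total, not 100 x 0.1s:
    # concurrency without parallelism; only one coroutine runs at any instant.
    await asyncio.gather(*(serve_order(n) for n in range(100)))
    print(f"100 orders served in {time.perf_counter() - start:.2f}s")

asyncio.run(main())
```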
The Architecture: Reactor Pattern and Non-Blocking I/O
Redis achieves this high concurrency through the Reactor Pattern. In this architecture, a single thread runs an event loop that watches for I/O events (like a socket becoming readable) and dispatches them to the appropriate handler.
I/O Multiplexing
At the system level, Redis delegates the heavy lifting of waiting for connections to the operating system kernel using I/O Multiplexing technologies: epoll (on Linux) or kqueue (on BSD/macOS).
Unlike older blocking I/O models where a thread sleeps until data arrives, or polling methods like select and poll (which scale poorly because every call scans the entire list of descriptors), epoll is event-driven: the kernel tracks readiness and hands back only the descriptors that are ready, so the cost of a wakeup stays roughly constant no matter how many connections are registered.
When a client sends a command, the flow looks like this:
- Socket Readable: The network card receives packets; the kernel signals via epoll that a specific file descriptor is ready for reading.
- Event Loop Triggers: The Redis main thread wakes up and reads the data from the socket.
- Command Execution: Redis parses the command and executes it in memory.
- Reply: The result is written back to the socket buffer.
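Here is a stripped-down sketch of that loop using Python's standard selectors module, which picks epoll on Linux and kqueue on BSD/macOS automatically. The port and the toy uppercase "protocol" are stand-ins for illustration, not anything Redis actually does:

```python
import selectors
import socket

# A toy reactor: one thread, one selector, many clients. The kernel tells us
# which sockets are ready; we never block waiting on any single client.
sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS

def accept(server: socket.socket) -> None:
    conn, _ = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)   # watch the new client

def handle(conn: socket.socket) -> None:
    data = conn.recv(4096)            # step 2: read from the ready socket
    if not data:                      # client hung up
        sel.unregister(conn)
        conn.close()
        return
    reply = data.upper()              # step 3: "execute" the command in memory
    conn.sendall(reply)               # step 4: write the reply back
    # (A real server would buffer partial writes and wait for EVENT_WRITE.)

server = socket.create_server(("127.0.0.1", 7000))   # port chosen arbitrarily
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:                           # the event loop
    for key, _ in sel.select():       # step 1: kernel reports ready descriptors
        key.data(key.fileobj)         # dispatch to the registered handler
```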
Because Redis operations are mostly in-memory logic (Hash map lookups, list pushes), the "execution" phase is incredibly fast—often taking nanoseconds. The thread rarely blocks, keeping the loop spinning rapidly.
The 'Why': Performance Gains by Avoiding Threads
If multi-threading allows for parallelism, why did Redis's creator, Salvatore Sanfilippo, staunchly avoid it for so long? The answer is that threads come with significant overhead.
The Cost of Context Switching
In a multi-threaded environment, the OS scheduler must constantly swap threads in and out of the CPU. This is a Context Switch. To switch threads, the CPU must save the current thread's state (registers, stack pointer, program counter) and load the next thread's state. Each switch costs on the order of microseconds, and the hidden cost is worse: the incoming thread resumes against caches and TLB entries polluted by the outgoing one. Done thousands of times per second, this erodes throughput.
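A rough way to feel this cost: force two threads to alternate strictly, so every iteration makes the scheduler park one thread and wake the other. A minimal sketch; absolute numbers will vary by OS and hardware:

```python
import threading
import time

# Each round trip below forces two context switches: main wakes the worker,
# then sleeps until the worker wakes it back.
N = 100_000
ping, pong = threading.Event(), threading.Event()

def worker() -> None:
    for _ in range(N):
        ping.wait(); ping.clear()
        pong.set()

t = threading.Thread(target=worker)
t.start()

start = time.perf_counter()
for _ in range(N):
    ping.set()
    pong.wait(); pong.clear()
t.join()
elapsed = time.perf_counter() - start

# Expect microseconds per round trip, versus nanoseconds for a function call.
print(f"{elapsed / N * 1e6:.1f} microseconds per ping-pong round trip")
```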
Lock-Free Execution
Multi-threading introduces the "Shared State" problem. If two threads update the same key simultaneously, data corruption can occur. To prevent this, developers use Mutexes or Semaphores.
Locks introduce latency. Threads must wait for locks to be released, and managing lock contention is complex. Because Redis is single-threaded, it requires zero locks for data manipulation. Every command is atomic by default. INCR key is safe because no other thread can touch key while the command executes.
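You can observe this atomicity from any client. The sketch below assumes the redis-py package and a Redis server on localhost's default port; ten threads hammer the same counter with no client-side locking, and no increment is ever lost:

```python
import threading
import redis

# INCR is atomic inside Redis because the single main thread executes one
# command at a time; the client needs no locks of its own.
r = redis.Redis(host="localhost", port=6379)   # assumes a local server

r.delete("counter")

def hammer() -> None:
    for _ in range(1_000):
        r.incr("counter")

threads = [threading.Thread(target=hammer) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(r.get("counter"))   # always b'10000'; no lost updates
```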
CPU Cache Locality
This is perhaps the biggest hidden performance booster. CPUs have L1, L2, and L3 caches that are orders of magnitude faster than main RAM.
- L1 Cache: ~1 nanosecond latency
- RAM: ~100 nanoseconds latency
In a multi-threaded system, when a thread migrates to a different core, the cache is cold. Furthermore, cores must synchronize their caches (Cache Coherency), leading to "Cache Thrashing." A single-threaded process stays pinned to a core, keeping hot data in the L1/L2 cache, resulting in maximum CPU efficiency.
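Redis 6.0 leans into this on Linux by exposing CPU-affinity directives in redis.conf, so the main thread can be pinned to one core while background work is kept off it. A sketch, assuming the Linux-only affinity settings shipped with Redis 6.0+ (taskset achieves the same pinning externally); core numbers are arbitrary examples:

```conf
# inside redis.conf (Redis 6.0+, Linux only)
# Pin the main thread to core 0 so its working set stays hot in L1/L2:
server_cpulist 0
# Keep fork-heavy background jobs (RDB snapshots, AOF rewrites) off that core:
bgsave_cpulist 1
aof_rewrite_cpulist 1
```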
Modern Redis: Why 6.0 Introduced I/O Threads
For years, the single-threaded dogma held firm. However, Redis 6.0 introduced a major change: Threaded I/O. Why the shift?
The Bottleneck Shift
Redis operations are fast, but reading requests from the network and writing responses back involves system calls and memory copies. With 100k+ connections or large payloads, the time spent on socket reads, protocol parsing, and replies began to dominate CPU time, starving the actual command execution.
The Solution
Redis 6.0 allows you to configure io-threads. Crucially, this does not make command execution parallel. The architecture remains:
- Main Thread: Offloads the reading/parsing of sockets to background threads.
- I/O Threads: Parse the bytes into commands.
- Main Thread: Executes the commands atomically (sequentially).
- I/O Threads: Write the response bytes back to the sockets.
This approach preserves the lock-free simplicity of the data store while parallelizing the network stack overhead. On multi-core machines, enabling I/O threading can roughly double throughput for network-heavy workloads.
```conf
# inside redis.conf
io-threads 4
io-threads-do-reads yes
```
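To check whether I/O threads actually help your workload, measure before and after with the bundled redis-benchmark tool; the flags below are standard, but absolute numbers vary widely with hardware, payload size, and client count:

```bash
# 1M requests, 50 parallel clients; --threads (Redis 6.0+) lets the benchmark
# itself use several threads so it is not the bottleneck being measured.
redis-benchmark -t set,get -n 1000000 -c 50 --threads 4
```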
Conclusion: Simplicity as a Feature

Redis teaches us a valuable lesson in software architecture: complexity is not a prerequisite for scale. By utilizing the Event Loop and Non-blocking I/O, Redis outperforms systems that rely on complex, lock-heavy multi-threading architectures.
Scaling isn't always about throwing more cores at a problem; often, it is about removing the overhead that slows those cores down—context switching, lock contention, and cache misses. When designing high-performance backends, consider your bottleneck. If it's CPU computation, add threads. If it's I/O and coordination, a single-threaded event loop might just be the fastest path forward.
Building secure, privacy-first tools means staying ahead of security threats. At ToolShelf, all operations happen locally in your browser—your data never leaves your device.
Check out our Hash Generator and other developer utilities to streamline your workflow.
Stay secure & happy coding,
— ToolShelf Team