Elasticsearch vs. ClickHouse: The Modern Log Management Showdown

It is a ritual as old as the cloud itself: you set up an ELK (Elasticsearch, Logstash, Kibana) stack to monitor your application. At first, it feels like magic. You have total visibility. Then, your traffic scales. Suddenly, the magic fades, replaced by 'bill shock' and cluster instability. The cost of indexing every single log line starts to rival the cost of the production infrastructure generating those logs.

For over a decade, Elasticsearch has been the undisputed champion of log management. Its ability to provide Google-like search over text data made it the default choice. However, the observability landscape is shifting. As data volumes explode, the overhead of the inverted index (Elasticsearch's core mechanism) is becoming a liability.

Enter ClickHouse. Originally designed for high-performance analytics (OLAP), ClickHouse has emerged as a formidable challenger in the observability space. This article argues a core thesis that is reshaping our industry: while Elasticsearch remains superior for complex full-text relevance search, ClickHouse is vastly superior for modern observability, offering a far more efficient architecture based on columnar storage that solves the log explosion problem.

Figure 1: The Observability Landscape Shift (Elasticsearch's row-oriented inverted index vs. ClickHouse's columnar storage).

Under the Hood: Architecture Wars

To understand the performance delta, we must look at how these two systems store data on disk. It is a battle between the Inverted Index and Columnar Storage.

Elasticsearch: The Inverted Index

At its heart, Elasticsearch is a distributed wrapper around Apache Lucene. When a log line is ingested, it is tokenized. The sentence "Error connecting to database" is broken down into ["error", "connecting", "to", "database"]. Elasticsearch then builds an inverted index—essentially a massive glossary mapping every unique term to the document IDs containing it.

This architecture allows for near-instant term lookups. If you need to find a 'needle in a haystack,' Lucene is unbeatable. However, this speed comes with a heavy tax. Every write requires CPU-intensive tokenization and index updates, the index itself consumes significant disk space, and the background segment merges that keep it searchable rewrite data repeatedly, causing write amplification.

ClickHouse: Columnar Storage & Compression

ClickHouse takes a radically different approach. It uses the MergeTree engine, which stores data by column rather than by row. Instead of indexing every unique term, ClickHouse relies on a sparse primary index (usually one entry per 8,192 rows) and data skipping indices.
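As a rough sketch, here is what a minimal log table might look like (the table and column names here are illustrative, not a prescribed schema):

-- Minimal illustrative log table; names and types are assumptions.
CREATE TABLE logs
(
    timestamp  DateTime,
    host       LowCardinality(String),
    status     UInt16,
    latency_ms UInt32,
    message    String
)
ENGINE = MergeTree
ORDER BY (host, timestamp)          -- the sparse primary index follows this sort key
SETTINGS index_granularity = 8192;  -- one index entry per 8,192 rows (the default)

Note that ORDER BY doubles as the primary index definition: choosing a sort key that matches your most common filters (host, then time) is the single biggest performance lever in ClickHouse.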

When you search for a log in ClickHouse, it doesn't look up a term in a glossary; it scans the column using vectorized execution on SIMD-capable CPUs. Because it scans data so fast (gigabytes per second), it avoids the overhead of maintaining a heavy index. It relies on brute-force scan speed paired with aggressive compression, making it significantly more write-efficient.

Performance Showdown: Ingest vs. Query

Architecture dictates performance. Let’s look at how this plays out in production scenarios.

Ingestion Throughput

Elasticsearch is notoriously CPU-bound during ingestion. Parsing JSON, tokenizing strings, and merging Lucene segments consumes significant resources. During a production outage—exactly when you need your logs the most—traffic spikes can cause Elasticsearch ingestion queues to back up, resulting in significant visibility lag.

ClickHouse, conversely, sustains high, predictable insert throughput. Because it doesn't need to build a heavyweight index on write, it can ingest millions of rows per second on modest hardware. It effectively treats logs as a stream to be sorted, compressed, and appended to disk, ensuring real-time visibility even during massive traffic surges.
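A minimal ingestion sketch, reusing the hypothetical logs table from above. ClickHouse strongly prefers large batches, and the async_insert setting tells the server to buffer small writes into batches for you:

-- Illustrative insert; async_insert buffers small writes server-side.
INSERT INTO logs (timestamp, host, status, latency_ms, message)
SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES
    (now(), 'web-01', 500, 1234, 'Error connecting to database'),
    (now(), 'web-02', 200, 87,   'GET /health OK');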

This is where the use case matters most. We must distinguish between Log Search (debugging) and Log Analytics (trends).

Scenario A: "Find all logs containing Error 500"
Elasticsearch wins on pure retrieval speed for rare terms. However, ClickHouse is surprisingly competitive. Using token bloom filter (tokenbf_v1) or n-gram (ngrambf_v1) skipping indices, ClickHouse delivers "good enough" search performance for debugging, often returning results in under a second even on billion-row datasets.
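If your messages live in the hypothetical logs table above, a token bloom filter is a one-line addition. The parameters below (filter size, hash count, seed, granularity) are illustrative starting points, not tuned values:

-- Skipping index: lets ClickHouse skip granules that cannot contain the token.
ALTER TABLE logs
    ADD INDEX message_tokens message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

-- hasToken() can use the index; a bare LIKE '%Error%' generally cannot.
SELECT timestamp, host, message
FROM logs
WHERE hasToken(message, 'Error') AND status = 500
ORDER BY timestamp DESC
LIMIT 100;

(The index only covers newly written parts; run ALTER TABLE logs MATERIALIZE INDEX message_tokens to build it for existing data.)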

Scenario B: "Calculate P99 latency over the last month grouped by host"
This is where ClickHouse dominates. Calculating aggregations over billions of rows in Elasticsearch means walking doc values for every matching document and holding large aggregation state in the JVM heap. ClickHouse, being columnar, reads only the columns the query touches (e.g., latency and host), ignoring the rest of the payload. It can often answer such a query in seconds or less, whereas Elasticsearch may time out or trigger an OutOfMemory (OOM) error.
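As a concrete sketch against the same hypothetical table:

-- Touches only the timestamp, host, and latency_ms columns.
SELECT
    host,
    quantile(0.99)(latency_ms) AS p99_latency
FROM logs
WHERE timestamp >= now() - INTERVAL 30 DAY
GROUP BY host
ORDER BY p99_latency DESC;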

Storage Efficiency and Cost

The deciding factor for many CTOs is the bottom line. The storage footprint difference between the two technologies is staggering.

Because ClickHouse stores data by column, values of the same type sit adjacent to each other on disk. This lets aggressive compression algorithms (like ZSTD or LZ4) work extremely well. It is common to see 10x to 20x better compression in ClickHouse than in Elasticsearch: a 10TB log cluster in Elasticsearch might shrink to 1TB or less in ClickHouse.
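You don't have to take compression ratios on faith; ClickHouse exposes them per column in its system tables. A sketch, assuming the logs table from earlier:

-- Per-column compression ratio for a table named 'logs'.
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = currentDatabase()
  AND table = 'logs'
  AND data_compressed_bytes > 0
ORDER BY ratio DESC;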

Furthermore, hardware requirements differ. Elasticsearch relies heavily on the JVM Heap. Managing Garbage Collection (GC) pauses on large heaps is a specialized skill. ClickHouse is written in C++ and relies on the OS filesystem cache, making it far more memory-efficient.

Realistic Cost Comparison (30-day retention, 1TB/day ingest):

  • Elasticsearch: Requires multiple hot/warm data nodes with high-IOPS SSDs and massive RAM to prevent GC thrashing. Estimated Cost: $$$$
  • ClickHouse: Can run on fewer nodes, with standard object storage (S3) or cheaper HDDs for cold tiering thanks to superior compression (see the tiering sketch below). Estimated Cost: $
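The cold-tiering point deserves a sketch. Assuming a storage policy named 's3_tiered' (a local 'hot' volume plus an S3-backed 'cold' volume) has already been defined in the server configuration, TTL clauses move and expire data automatically; the policy name and intervals below are assumptions:

-- Hypothetical tiering: move to S3 after 7 days, delete after 30.
CREATE TABLE logs_tiered
(
    timestamp DateTime,
    host      LowCardinality(String),
    message   String
)
ENGINE = MergeTree
ORDER BY (host, timestamp)
TTL timestamp + INTERVAL 7 DAY TO VOLUME 'cold',
    timestamp + INTERVAL 30 DAY DELETE
SETTINGS storage_policy = 's3_tiered';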

Developer Experience (DX) & Operations

Operational complexity and developer usability are just as critical as raw performance.

Query Language: DSL vs. SQL

Elasticsearch uses a proprietary JSON DSL. While powerful, it is verbose and difficult to memorize. A simple aggregation query can quickly become a 50-line JSON object nested five levels deep.

ClickHouse uses SQL. It is the lingua franca of data. If you know how to write a SELECT statement, you can query your logs. This lowers the barrier to entry significantly.

Elasticsearch DSL:

GET /logs/_search
{
  "aggs": {
    "status_codes": {
      "terms": { "field": "status" }
    }
  }
}

ClickHouse SQL:

SELECT status, count() FROM logs GROUP BY status

Schema Flexibility

Elasticsearch is famously "schemaless" by default. You throw JSON at it, and it guesses the types. This is great on day one, but leads to "mapping explosions" on day 100, where too many unique fields bloat the cluster state and destabilize the cluster.

ClickHouse is strictly typed. You must define your schema. While this adds friction initially, it forces better data pipeline discipline. However, ClickHouse has recently introduced Map and JSON object types, bridging the gap by allowing semi-structured data storage without the performance penalties of a schemaless system.
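A sketch of what that looks like in practice using the Map type (table and key names are illustrative):

-- Typed columns for the hot fields, a Map for the long tail of attributes.
CREATE TABLE logs_semi
(
    timestamp DateTime,
    message   String,
    attrs     Map(String, String)
)
ENGINE = MergeTree
ORDER BY timestamp;

-- Query an arbitrary key without declaring a column for it up front.
SELECT timestamp, message
FROM logs_semi
WHERE attrs['request_id'] = 'abc-123';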

The Verdict: Choosing the Right Tool

Should you rip out Elasticsearch tomorrow? Not necessarily. Here is the decision matrix:

Choose Elasticsearch if:

  • Relevance is King: You are building a search engine for an e-commerce site or documentation where relevance scoring (TF-IDF/BM25) is critical.
  • Low Volume: You have manageable log volumes (<50GB/day) and the operational overhead is acceptable.
  • Kibana Dependency: Your team is deeply entrenched in Kibana-specific visualizations that cannot be easily replicated in Grafana or Superset.

Choose ClickHouse if:

  • Scale is the Problem: You are dealing with TBs of logs and costs are spiraling.
  • Analytics Focus: You care more about aggregations (latency trends, error rates, throughput) than full-text matching.
  • Unified Observability: You want to store Logs, Metrics, and Traces in a single high-performance store.
  • SQL Competency: Your team prefers SQL over learning a proprietary DSL.

Conclusion

The observability market is undergoing a correction. For years, we treated logs solely as text documents to be searched. Today, we realize logs are also events to be aggregated.

While Elasticsearch remains the king of search relevance, ClickHouse has proven that for high-volume log management, a columnar approach offers a superior balance of performance and cost. It provides "good enough" search capabilities paired with "excellent" analytical power at a fraction of the price.

Don't just follow the defaults. Benchmark your specific workload. You might find that moving to ClickHouse doesn't just save you money—it finally gives you the real-time answers you've been waiting for.

Building secure, privacy-first tools means staying ahead of infrastructure trends. If you are dealing with large JSON log payloads, try our JSON Formatter to inspect your data structure before ingestion.

Stay secure & happy coding,
— ToolShelf Team