
# ClickHouse

Build with ClickHouse for real-time analytics and OLAP workloads.

## Quick Summary
You are an analytics database specialist who integrates ClickHouse into projects.
ClickHouse is a columnar OLAP database designed for real-time analytics on large
datasets. It processes billions of rows per second for aggregation queries, making
it ideal for log analytics, event tracking, and business intelligence.

## Key Points

- **Always insert in batches.** ClickHouse is designed for bulk inserts of thousands of rows at once, not single-row writes.
- **Choose the ORDER BY key carefully.** It determines how data is sorted on disk and which queries are fast.
- **Use LowCardinality(String)** for columns with few distinct values (status, country, event_type).
- **Use materialized views** for dashboards. Let ClickHouse pre-aggregate on insert rather than scanning raw data on every query.
- **Partition by month** (toYYYYMM) for most use cases. Avoid over-partitioning; too many partitions degrade performance.
- **Set TTL** to auto-expire old data instead of running manual DELETE.
- **Use parameterized queries** ({param:Type} syntax) to prevent SQL injection.
- **Avoid single-row inserts.** Each INSERT creates a new data part on disk; thousands of tiny inserts cause merge storms.
- **Don't use ClickHouse for OLTP.** It has no transactions, no row-level updates, and no efficient single-row lookups.
- **Don't forget FINAL with ReplacingMergeTree.** Without FINAL, you'll see duplicate rows until background merges run.
- **Don't over-partition.** Partitioning by day (or worse, hour) on low-volume tables creates thousands of tiny parts.
- **Don't ignore column types.** Using String where UInt32 suffices wastes storage and slows queries.

## Quick Example

```bash
docker run -d --name clickhouse \
  -p 8123:8123 \
  -p 9000:9000 \
  clickhouse/clickhouse-server
```

```bash
npm install @clickhouse/client
```

# ClickHouse Integration

You are an analytics database specialist who integrates ClickHouse into projects. ClickHouse is a columnar OLAP database designed for real-time analytics on large datasets. It processes billions of rows per second for aggregation queries, making it ideal for log analytics, event tracking, and business intelligence.

## Core Philosophy

### Columnar storage for fast aggregation

ClickHouse stores data by column, not by row. Aggregations like COUNT, SUM, AVG only read the columns they need, skipping everything else. This is why it can scan billions of rows in seconds.

### Append-mostly, not OLTP

ClickHouse is optimized for bulk inserts and analytical reads. It is not a transactional database. Don't use it for single-row lookups, frequent updates, or delete-heavy workloads. Insert in batches, query in aggregates.

### MergeTree is the engine

Almost everything in ClickHouse uses the MergeTree table engine family. MergeTree sorts data on disk by the ORDER BY key, enabling fast range scans. ReplacingMergeTree, AggregatingMergeTree, and SummingMergeTree add deduplication and pre-aggregation.

## Setup

### Install (Docker)

```bash
docker run -d --name clickhouse \
  -p 8123:8123 \
  -p 9000:9000 \
  clickhouse/clickhouse-server
```

### Install client (Node.js)

```bash
npm install @clickhouse/client
```

```typescript
import { createClient } from '@clickhouse/client';

const client = createClient({
  url: process.env.CLICKHOUSE_URL ?? 'http://localhost:8123',
  username: 'default',
  password: process.env.CLICKHOUSE_PASSWORD ?? '',
  database: 'default',
});
```

## Core Patterns

### Create a table (MergeTree)

```sql
CREATE TABLE events (
  event_time  DateTime,
  event_date  Date DEFAULT toDate(event_time),
  user_id     UInt64,
  event_type  LowCardinality(String),
  properties  String,  -- JSON stored as string
  revenue     Float64 DEFAULT 0
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_type, user_id, event_time)
TTL event_date + INTERVAL 1 YEAR;
```

### Insert data (batched)

```typescript
await client.insert({
  table: 'events',
  values: [
    {
      event_time: '2026-03-17 10:00:00',
      user_id: 12345,
      event_type: 'page_view',
      properties: '{"page":"/pricing"}',
      revenue: 0,
    },
    {
      event_time: '2026-03-17 10:01:00',
      user_id: 12345,
      event_type: 'purchase',
      properties: '{"product":"pro_plan"}',
      revenue: 49.99,
    },
  ],
  format: 'JSONEachRow',
});
```
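For backfills, a large array should be split so each insert stays within a comfortable batch size. A sketch of that splitting, with the insert function injected so it can wrap `client.insert()` (`chunkRows` and `insertInChunks` are hypothetical helper names, not library APIs):

```typescript
// Split a large row array into fixed-size chunks.
function chunkRows<T>(rows: T[], size: number): T[][] {
  if (size <= 0) throw new Error('chunk size must be positive');
  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}

// One bulk insert per chunk, awaited sequentially to avoid
// overwhelming the server with concurrent part creation.
async function insertInChunks<T>(
  insert: (rows: T[]) => Promise<void>,
  rows: T[],
  size = 50_000,
): Promise<void> {
  for (const chunk of chunkRows(rows, size)) {
    await insert(chunk);
  }
}
```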

### Analytical queries

```sql
-- Events per day by type (scans billions of rows in seconds)
SELECT
  event_date,
  event_type,
  count() AS event_count,
  uniq(user_id) AS unique_users
FROM events
WHERE event_date >= today() - 30
GROUP BY event_date, event_type
ORDER BY event_date DESC, event_count DESC;

-- Funnel analysis
SELECT
  level,
  count() AS users
FROM (
  SELECT
    user_id,
    windowFunnel(86400)(
      event_time,
      event_type = 'page_view',
      event_type = 'signup',
      event_type = 'purchase'
    ) AS level
  FROM events
  WHERE event_date >= today() - 7
  GROUP BY user_id
)
GROUP BY level
ORDER BY level;
```

### Query from Node.js

```typescript
async function getDailyStats(days: number) {
  const result = await client.query({
    query: `
      SELECT
        event_date,
        count() AS total_events,
        uniq(user_id) AS unique_users,
        sum(revenue) AS total_revenue
      FROM events
      WHERE event_date >= today() - {days:UInt32}
      GROUP BY event_date
      ORDER BY event_date DESC
    `,
    query_params: { days },
    format: 'JSONEachRow',
  });

  // 64-bit integers arrive as strings in JSON formats to avoid
  // precision loss; Float64 values arrive as plain numbers.
  return result.json<{
    event_date: string;
    total_events: string;
    unique_users: string;
    total_revenue: number;
  }>();
}
```
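Because ClickHouse's JSON formats quote 64-bit integers by default, counts come back as strings. A small mapper can coerce them to JS numbers when they fit safely (the `coerceBigInts` helper is an illustrative sketch, not a library function):

```typescript
// Coerce string-encoded 64-bit integers (as returned by JSONEachRow)
// into JS numbers when they fit in Number.MAX_SAFE_INTEGER;
// leave them as strings otherwise to avoid silent precision loss.
function coerceBigInts<T extends Record<string, unknown>>(
  row: T,
  fields: string[],
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...row };
  for (const f of fields) {
    const v = out[f];
    if (typeof v === 'string' && /^-?\d+$/.test(v)) {
      const n = Number(v);
      out[f] = Number.isSafeInteger(n) ? n : v;
    }
  }
  return out;
}
```

Applied to the rows above: `rows.map(r => coerceBigInts(r, ['total_events', 'unique_users']))`.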

### Materialized views (real-time aggregation)

```sql
-- Destination table for pre-aggregated data
CREATE TABLE daily_stats (
  event_date  Date,
  event_type  LowCardinality(String),
  event_count UInt64,
  unique_users AggregateFunction(uniq, UInt64),
  total_revenue Float64
)
ENGINE = AggregatingMergeTree()
ORDER BY (event_date, event_type);

-- Materialized view that auto-populates on INSERT
CREATE MATERIALIZED VIEW daily_stats_mv
TO daily_stats AS
SELECT
  event_date,
  event_type,
  count() AS event_count,
  uniqState(user_id) AS unique_users,
  sum(revenue) AS total_revenue
FROM events
GROUP BY event_date, event_type;

-- Query the pre-aggregated table (fast: rows are already summarized).
-- Re-aggregate here because each insert block produces its own
-- partial rows until background merges combine them.
SELECT
  event_date,
  event_type,
  sum(event_count) AS event_count,
  uniqMerge(unique_users) AS unique_users,
  sum(total_revenue) AS total_revenue
FROM daily_stats
GROUP BY event_date, event_type
ORDER BY event_date DESC;
```

### ReplacingMergeTree (deduplication)

```sql
-- Last-write-wins by user_id, based on updated_at
CREATE TABLE user_profiles (
  user_id    UInt64,
  name       String,
  email      String,
  updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- Query with FINAL to get deduplicated results
SELECT * FROM user_profiles FINAL WHERE user_id = 12345;
```
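FINAL adds query-time cost, so a common alternative on small result sets is to skip it and deduplicate client-side with the same last-write-wins rule. A sketch of that rule (`lastWriteWins` is an illustrative helper, and it assumes `updated_at` strings in `YYYY-MM-DD HH:MM:SS` form, which sort lexicographically):

```typescript
// Mimic ReplacingMergeTree(updated_at) semantics client-side:
// keep only the newest row per user_id.
interface Versioned {
  user_id: number;
  updated_at: string; // 'YYYY-MM-DD HH:MM:SS' sorts lexicographically
}

function lastWriteWins<T extends Versioned>(rows: T[]): T[] {
  const latest = new Map<number, T>();
  for (const row of rows) {
    const prev = latest.get(row.user_id);
    if (!prev || row.updated_at > prev.updated_at) {
      latest.set(row.user_id, row);
    }
  }
  return [...latest.values()];
}
```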

## Best Practices

- **Always insert in batches.** ClickHouse is designed for bulk inserts of thousands of rows at once, not single-row inserts. Buffer in your application and flush periodically (every 1-10 seconds or every 10k-100k rows).
- **Choose the ORDER BY key carefully.** It determines how data is sorted on disk and which queries are fast. Put your most common filter columns first.
- **Use LowCardinality(String)** for columns with few distinct values (status, country, event_type). It compresses dramatically and speeds up queries.
- **Use materialized views for dashboards.** Let ClickHouse pre-aggregate on insert rather than scanning raw data on every query.
- **Partition by month** (toYYYYMM) for most use cases. Avoid over-partitioning; too many partitions degrade performance.
- **Set TTL** to auto-expire old data instead of running manual DELETE.
- **Use parameterized queries** ({param:Type} syntax) to prevent SQL injection.

## Common Pitfalls

- **Single-row inserts.** Each INSERT creates a new data part on disk. Thousands of individual inserts cause merge storms. Always batch.
- **Using ClickHouse for OLTP.** It has no transactions, no row-level updates, no efficient single-row lookups. Use PostgreSQL for OLTP, ClickHouse for analytics.
- **Forgetting FINAL with ReplacingMergeTree.** Without FINAL, you'll see duplicate rows until background merges consolidate them.
- **Over-partitioning.** Partitioning by day (or worse, hour) on low-volume tables creates thousands of tiny parts. Stick to monthly partitions unless you have very high volume.
- **Ignoring column types.** Using String where UInt32 suffices wastes storage and slows queries. ClickHouse is strongly typed; use the smallest appropriate type.
- **Large JOINs.** ClickHouse JOINs load the right-hand table into memory. Keep the right side of a JOIN small, or use dictionaries for dimension lookups.

## Anti-Patterns

Using the service without understanding its pricing model. Cloud services bill differently — per request, per GB, per seat. Deploying without modeling expected costs leads to surprise invoices.

Hardcoding configuration instead of using environment variables. API keys, endpoints, and feature flags change between environments. Hardcoded values break deployments and leak secrets.

Ignoring the service's rate limits and quotas. Every external API has throughput limits. Failing to implement backoff, queuing, or caching results in dropped requests under load.
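A minimal sketch of the backoff piece, assuming capped exponential delays (the `backoffDelayMs` function and its defaults are illustrative; production clients usually add random jitter as well):

```typescript
// Exponential backoff with a cap: the delay doubles per retry
// attempt, bounded by maxDelayMs.
function backoffDelayMs(
  attempt: number, // 0-based retry attempt
  baseDelayMs = 100,
  maxDelayMs = 30_000,
): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}
```

A retry loop would sleep `backoffDelayMs(attempt)` milliseconds before re-issuing a rate-limited request.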

Treating the service as always available. External services go down. Without circuit breakers, fallbacks, or graceful degradation, a third-party outage becomes your outage.

Coupling your architecture to a single provider's API. Building directly against provider-specific interfaces makes migration painful. Wrap external services in thin adapter layers.
