Designing a Top-K System with Sub-Minute Latency

When companies like YouTube, Twitter, or Instagram show Trending Now, what they’re really computing is a Top-K problem under strict latency constraints. Any product that surfaces “popular right now” — whether it’s trending restaurants on a food app or top-performing ads — faces the same fundamental challenge.

But what changes when the requirement becomes:

“Top 10 videos in the last 1 minute — updated every few seconds.”

Let’s walk through the full architecture evolution.


The Core Problem

We need to:

  • Continuously ingest millions of events (views, likes, hashtag mentions)
  • Aggregate counts within time windows (hour/day/week/minute)
  • Maintain a Top-K ranking
  • Serve results with <1 minute end-to-end latency
  • Support restarts and failure recovery

Step-by-Step Architecture

1. Event Generation

Events originate from:

  • Frontend clients (web/mobile)
  • Backend APIs
  • Log pipelines
  • CDC streams

Example event:

{
  "event": "video_view",
  "video_id": "abc123",
  "region": "IN",
  "timestamp": "2026-03-10T10:15:23Z"
}

These events are pushed into a streaming system.


2. Ingestion Layer — Kafka

Use Kafka as the distributed commit log.

For sub-minute latency, tune Kafka:

  • linger.ms = 0-1
  • acks = 1 (if the weaker durability guarantee is acceptable)
  • LZ4 compression
  • Enough partitions for parallelism
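Put together, a low-latency producer configuration might look like the following sketch. The key names are standard Kafka producer settings; the exact values are assumptions to tune per workload, and the client library shown in the comment is one common choice, not a requirement:

```python
# Producer settings for low-latency ingestion (a sketch; values are
# starting points, not universal recommendations).
producer_config = {
    "linger.ms": 1,             # flush almost immediately instead of batching
    "acks": "1",                # leader-only ack: lower latency, weaker durability
    "compression.type": "lz4",  # cheap on CPU, shrinks network payloads
}

# With a client such as confluent-kafka, this dict would be passed as:
#   producer = Producer(producer_config)
```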

Retention strategy:

  • 24 hours is typically enough
  • Kafka is not your permanent storage
  • For long-term reprocessing → offload to S3

Kafka acts as:

  • Buffer
  • Backpressure absorber
  • Replay source (within retention window)

3. Stream Processing — Flink

This is where the real work happens.

We use sliding windows for near-real-time aggregation (if you're unfamiliar with Flink watermarks and event-time windowing, see my Ad Click Aggregator post for a deeper explanation):

stream.keyBy(event -> event.videoId)
      .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))

Meaning:

  • 1-minute window
  • Updated every 10 seconds

This ensures the Top-K updates frequently.

Incremental Aggregation (Critical)

Do NOT recompute the full window every time. Instead:

  • Maintain counts in state (RocksDB or memory)
  • Increment per event
  • Emit partial updates

This keeps latency low.
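The bucket trick behind incremental aggregation fits in a few lines: each event increments one 10-second bucket as it arrives, and a window read simply sums the last six buckets. This is a minimal sketch; the names and the in-memory dict are illustrative stand-ins for Flink's keyed state:

```python
from collections import defaultdict

BUCKET_SECONDS = 10   # slide interval
WINDOW_BUCKETS = 6    # 6 x 10s = one 1-minute window

# bucket_counts[bucket_start_ts][video_id] -> count within that 10s slice
bucket_counts = defaultdict(lambda: defaultdict(int))

def record_view(video_id: str, event_ts: int) -> None:
    """O(1) per event: increment one bucket, never rescan the window."""
    bucket = event_ts - (event_ts % BUCKET_SECONDS)
    bucket_counts[bucket][video_id] += 1

def window_counts(now_ts: int) -> dict:
    """Merge the last 6 buckets into the current 1-minute window."""
    newest = now_ts - (now_ts % BUCKET_SECONDS)
    totals = defaultdict(int)
    for i in range(WINDOW_BUCKETS):
        for vid, c in bucket_counts.get(newest - i * BUCKET_SECONDS, {}).items():
            totals[vid] += c
    return dict(totals)
```

As old buckets fall outside the window they simply stop being read (and can be expired), so no recomputation ever touches the full event stream.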


4. Computing Top-K Efficiently

Two common approaches:

Option A — Redis Sorted Sets (Most Practical)

Each window gets a key:

topk:videos:2026-03-10T10:15

Update:

ZINCRBY topk:videos:2026-03-10T10:15 1 abc123

Query:

ZREVRANGE topk:videos:2026-03-10T10:15 0 9 WITHSCORES

Redis gives:

  • O(log N) updates
  • O(K) reads
  • Millisecond latency

For a deeper look at how Redis Lua scripts can enforce atomicity in high-concurrency scenarios, see the Planet-Scale Auction System post.

Option B — Local Top-K + Global Merge

Maintain:

  • Local top-K per partition
  • Global merge stage

This reduces Redis writes under heavy traffic.
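A minimal sketch of the two-stage pattern, with heapq standing in for the per-partition operators and the merge stage (function names here are assumptions, not a real API):

```python
import heapq
from collections import Counter

def local_top_k(counts: dict, k: int) -> list:
    """Per-partition top-K: only these K pairs cross the network."""
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

def global_top_k(partials: list, k: int) -> list:
    """Merge partial top-K lists from all partitions into one ranking."""
    merged = Counter()
    for partial in partials:
        for video_id, count in partial:
            merged[video_id] += count
    return merged.most_common(k)
```

One caveat: summing partials is exact only when each video_id is routed to a single partition (e.g. Kafka keyed by video ID). If a key's counts are spread across partitions, truncating to a local top-K can silently drop part of its total.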


5. Serving Layer

API:

GET /top-k/videos?window=1m&region=IN

The API:

  • Reads from Redis
  • Returns instantly (<10ms)

For extreme performance:

  • Add an in-memory cache layer
  • Or use CDN for globally popular queries
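The read path can be sketched as a plain lookup, with a dict standing in for Redis and the key layout following the topk:videos:<window> convention above (the function name is an assumption):

```python
def get_top_k(store: dict, window_key: str, k: int = 10) -> list:
    """Serving-layer read: fetch a precomputed ranking, no aggregation
    at request time (the dict stands in for Redis ZREVRANGE)."""
    ranking = store.get(f"topk:videos:{window_key}", {})
    return sorted(ranking.items(), key=lambda kv: -kv[1])[:k]
```

The point is that the request handler does zero counting: all aggregation already happened upstream, so the API is a constant-time lookup plus an O(K) slice.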

Full Sub-Minute Pipeline

graph TD
    A[User Action] --> B[Kafka - < 10ms ingest]
    B --> C[Flink Sliding Window - < 100ms processing]
    C --> D[Redis ZSET - < 5ms update]
    D --> E[API - < 10ms read]
    E --> F[Client]

Total latency target: < 1 second — well below the 1-minute requirement.


What Changes Compared to Normal Top-K?

| Component | Standard Design | Sub-Minute Design |
| --- | --- | --- |
| Windows | Hourly/Daily | Sliding 1-min |
| Kafka | Default batching | Minimal batching |
| Flink | Tumbling windows | Sliding + incremental |
| State | Larger window state | Small fast state |
| Store | Redis or OLAP | Redis (fast path) |
| Updates | Every hour | Every few seconds |

Failure Handling

To maintain reliability:

  • Enable Flink checkpointing (5–10 seconds)
  • Use RocksDB state backend
  • Store checkpoints in S3
  • Kafka retention ≥ worst-case downtime

If Flink crashes:

  1. It restores from checkpoint
  2. Resumes from Kafka offset
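The recovery contract can be sketched without Flink at all: snapshot the state and the input offset together, then replay from that offset on restart. In this toy version, json stands in for the S3 checkpoint payload and a Python list for the Kafka log:

```python
import json

def checkpoint(state: dict, offset: int) -> str:
    """Snapshot counts AND the input offset as one atomic payload,
    mimicking what Flink persists on each checkpoint."""
    return json.dumps({"offset": offset, "state": state})

def restore(snapshot: str, log: list) -> tuple:
    """Reload the snapshot, then replay the log from the saved offset,
    as Flink resumes from the checkpointed Kafka offset."""
    data = json.loads(snapshot)
    state, offset = dict(data["state"]), data["offset"]
    for event in log[offset:]:
        state[event] = state.get(event, 0) + 1
        offset += 1
    return state, offset
```

Because state and offset are saved together, no event is double-counted and none is skipped: everything after the snapshot is replayed exactly once into the restored counts.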

Scaling Strategy

Scale by:

  • Increasing Kafka partitions
  • Parallel Flink operators
  • Sharding Redis by:
    • Region
    • Category
    • Time bucket

For global systems:

  • Maintain per-region Top-K
  • Compute global Top-K via aggregation layer

When to Use Approximate Algorithms?

At extreme scale (billions of events/minute):

  • Use Count-Min Sketch
  • Use Heavy Hitters algorithms
  • Accept slight accuracy loss
  • Dramatically reduce memory footprint
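To make the trade concrete, here is a minimal Count-Min Sketch: a fixed width × depth grid of counters whose memory never grows with the number of distinct IDs, at the cost of possible overestimates (never underestimates). The parameters below are illustrative, not tuned:

```python
import hashlib

class CountMinSketch:
    """Fixed-memory approximate counter: width*depth cells total,
    no matter how many distinct video IDs flow through."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One independent hash per row, derived by salting with the row id.
        h = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Min across rows: collisions only inflate counts, so the
        # smallest cell is the tightest (over-)estimate.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))
```

Pairing this with a small heap of the current heavy hitters gives an approximate Top-K whose memory is constant, which is exactly the property that matters at billions of events per minute.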

Key Engineering Insight

Kafka is not your storage. Flink is not your database. Redis is not your source of truth.

Each layer has a clear responsibility:

  • Kafka → streaming buffer
  • Flink → stateful computation
  • Redis → fast serving layer
  • S3/OLAP → historical reprocessing

Final Thought

Top-K seems simple — “just count and sort” — until you add the constraint of sub-minute freshness at scale. The gap between a batch Top-K (run a query every hour) and a real-time one (update every 10 seconds across millions of events) is enormous.

What closes that gap is incremental computation: sliding windows that don’t recompute from scratch, Redis sorted sets that update in O(log N), and Flink state that survives failures. And when even that isn’t enough — billions of events per minute — approximate algorithms like Count-Min Sketch become the frontier, trading a small accuracy margin for an order-of-magnitude reduction in memory and compute.

The real engineering lesson: simple-sounding features (“show what’s trending”) often hide the hardest distributed systems problems.