Designing a Top-K System with Sub-Minute Latency

When companies like YouTube, Twitter, or Instagram show Trending Now, what they’re really computing is a Top-K problem under strict latency constraints. Any product that surfaces “popular right now” — whether it’s trending restaurants on a food app or top-performing ads — faces the same fundamental challenge.

But what changes when the requirement becomes:

“Top 10 videos in the last 1 minute — updated every few seconds.”

Let’s walk through the full architecture evolution.


The Core Problem

We need to:

  • Continuously ingest millions of events (views, likes, hashtag mentions)
  • Aggregate counts within time windows (hour/day/week/minute)
  • Maintain a Top-K ranking
  • Serve results with <1 minute end-to-end latency
  • Support restarts and failure recovery

Step-by-Step Architecture

1. Event Generation

Events originate from:

  • Frontend clients (web/mobile)
  • Backend APIs
  • Log pipelines
  • CDC streams

Example event:

{
  "event": "video_view",
  "video_id": "abc123",
  "region": "IN",
  "timestamp": "2026-03-10T10:15:23Z"
}

These events are pushed into a streaming system.


2. Ingestion Layer — Kafka

Use Kafka as the distributed commit log.

For sub-minute latency, tune Kafka:

  • linger.ms = 0-1
  • acks = 1 (if the weaker durability guarantee is acceptable)
  • LZ4 compression
  • Enough partitions for parallelism
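Put together, a low-latency producer configuration might look like the following sketch. The key names are standard Kafka producer settings; the exact values are assumptions to tune per workload, and the client library shown in the comment is one common choice, not a requirement:

```python
# Producer settings for low-latency ingestion (a sketch; values are
# starting points, not universal recommendations).
producer_config = {
    "linger.ms": 1,             # flush almost immediately instead of batching
    "acks": "1",                # leader-only ack: lower latency, weaker durability
    "compression.type": "lz4",  # cheap on CPU, shrinks network payloads
}

# With a client such as confluent-kafka, this dict would be passed as:
#   producer = Producer(producer_config)
```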

Retention strategy:

  • 24 hours is typically enough
  • Kafka is not your permanent storage
  • For long-term reprocessing → offload to S3

Kafka acts as:

  • Buffer
  • Backpressure absorber
  • Replay source (within retention window)

3. Stream Processing — Flink

This is where the real work happens.

We use sliding windows for near-real-time aggregation (if you're unfamiliar with Flink watermarks and event-time windowing, see my Ad Click Aggregator post for a deeper explanation):

stream.keyBy(event -> event.videoId)
      .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(10)))

Meaning:

  • 1-minute window
  • Updated every 10 seconds

This ensures the Top-K updates frequently.

Incremental Aggregation (Critical)

Do NOT recompute the full window every time. Instead:

  • Maintain counts in state (RocksDB or memory)
  • Increment per event
  • Emit partial updates

This keeps latency low.
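The bucket trick behind incremental aggregation fits in a few lines: each event increments one 10-second bucket as it arrives, and a window read simply sums the last six buckets. This is a minimal sketch; the names and the in-memory dict are illustrative stand-ins for Flink's keyed state:

```python
from collections import defaultdict

BUCKET_SECONDS = 10   # slide interval
WINDOW_BUCKETS = 6    # 6 x 10s = one 1-minute window

# bucket_counts[bucket_start_ts][video_id] -> count within that 10s slice
bucket_counts = defaultdict(lambda: defaultdict(int))

def record_view(video_id: str, event_ts: int) -> None:
    """O(1) per event: increment one bucket, never rescan the window."""
    bucket = event_ts - (event_ts % BUCKET_SECONDS)
    bucket_counts[bucket][video_id] += 1

def window_counts(now_ts: int) -> dict:
    """Merge the last 6 buckets into the current 1-minute window."""
    newest = now_ts - (now_ts % BUCKET_SECONDS)
    totals = defaultdict(int)
    for i in range(WINDOW_BUCKETS):
        for vid, c in bucket_counts.get(newest - i * BUCKET_SECONDS, {}).items():
            totals[vid] += c
    return dict(totals)
```

As old buckets fall outside the window they simply stop being read (and can be expired), so no recomputation ever touches the full event stream.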


4. Computing Top-K Efficiently

Two common approaches:

Option A — Redis Sorted Sets (Most Practical)

Each window gets a key:

topk:videos:2026-03-10T10:15

Update:

ZINCRBY topk:videos:2026-03-10T10:15 1 abc123

Query:

ZREVRANGE topk:videos:2026-03-10T10:15 0 9 WITHSCORES

Redis gives:

  • O(log N) updates
  • O(K) reads
  • Millisecond latency

For a deeper look at how Redis Lua scripts can enforce atomicity in high-concurrency scenarios, see the Planet-Scale Auction System post.

Option B — Local Top-K + Global Merge

Maintain:

  • Local top-K per partition
  • Global merge stage

This reduces Redis writes under heavy traffic.
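A minimal sketch of the two-stage pattern, with heapq standing in for the per-partition operators and the merge stage (function names here are assumptions, not a real API):

```python
import heapq
from collections import Counter

def local_top_k(counts: dict, k: int) -> list:
    """Per-partition top-K: only these K pairs cross the network."""
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

def global_top_k(partials: list, k: int) -> list:
    """Merge partial top-K lists from all partitions into one ranking."""
    merged = Counter()
    for partial in partials:
        for video_id, count in partial:
            merged[video_id] += count
    return merged.most_common(k)
```

One caveat: summing partials is exact only when each video_id is routed to a single partition (e.g. Kafka keyed by video ID). If a key's counts are spread across partitions, truncating to a local top-K can silently drop part of its total.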


5. Serving Layer

API:

GET /top-k/videos?window=1m&region=IN

The API:

  • Reads from Redis
  • Returns instantly (<10ms)

For extreme performance:

  • Add an in-memory cache layer
  • Or use CDN for globally popular queries
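The read path can be sketched as a plain lookup, with a dict standing in for Redis and the key layout following the topk:videos:<window> convention above (the function name is an assumption):

```python
def get_top_k(store: dict, window_key: str, k: int = 10) -> list:
    """Serving-layer read: fetch a precomputed ranking, no aggregation
    at request time (the dict stands in for Redis ZREVRANGE)."""
    ranking = store.get(f"topk:videos:{window_key}", {})
    return sorted(ranking.items(), key=lambda kv: -kv[1])[:k]
```

The point is that the request handler does zero counting: all aggregation already happened upstream, so the API is a constant-time lookup plus an O(K) slice.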

Full Sub-Minute Pipeline

graph TD
    A[User Action] --> B[Kafka - < 10ms ingest]
    B --> C[Flink Sliding Window - < 100ms processing]
    C --> D[Redis ZSET - < 5ms update]
    D --> E[API - < 10ms read]
    E --> F[Client]

Total latency target: < 1 second — well below the 1-minute requirement.


What Changes Compared to Normal Top-K?

| Component | Standard Design | Sub-Minute Design |
| --- | --- | --- |
| Windows | Hourly/Daily | Sliding 1-min |
| Kafka | Default batching | Minimal batching |
| Flink | Tumbling windows | Sliding + incremental |
| State | Larger window state | Small fast state |
| Store | Redis or OLAP | Redis (fast path) |
| Updates | Every hour | Every few seconds |

Failure Handling

To maintain reliability:

  • Enable Flink checkpointing (5–10 seconds)
  • Use RocksDB state backend
  • Store checkpoints in S3
  • Kafka retention ≥ worst-case downtime

If Flink crashes:

  1. It restores from checkpoint
  2. Resumes from Kafka offset
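The recovery contract can be sketched without Flink at all: snapshot the state and the input offset together, then replay from that offset on restart. In this toy version, json stands in for the S3 checkpoint payload and a Python list for the Kafka log:

```python
import json

def checkpoint(state: dict, offset: int) -> str:
    """Snapshot counts AND the input offset as one atomic payload,
    mimicking what Flink persists on each checkpoint."""
    return json.dumps({"offset": offset, "state": state})

def restore(snapshot: str, log: list) -> tuple:
    """Reload the snapshot, then replay the log from the saved offset,
    as Flink resumes from the checkpointed Kafka offset."""
    data = json.loads(snapshot)
    state, offset = dict(data["state"]), data["offset"]
    for event in log[offset:]:
        state[event] = state.get(event, 0) + 1
        offset += 1
    return state, offset
```

Because state and offset are saved together, no event is double-counted and none is skipped: everything after the snapshot is replayed exactly once into the restored counts.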

Scaling Strategy

Scale by:

  • Increasing Kafka partitions
  • Parallel Flink operators
  • Sharding Redis by:
    • Region
    • Category
    • Time bucket

For global systems:

  • Maintain per-region Top-K
  • Compute global Top-K via aggregation layer

When to Use Approximate Algorithms?

At extreme scale (billions of events/minute):

  • Use Count-Min Sketch
  • Use Heavy Hitters algorithms
  • Accept slight accuracy loss
  • Dramatically reduce memory footprint
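To make the trade concrete, here is a minimal Count-Min Sketch: a fixed width × depth grid of counters whose memory never grows with the number of distinct IDs, at the cost of possible overestimates (never underestimates). The parameters below are illustrative, not tuned:

```python
import hashlib

class CountMinSketch:
    """Fixed-memory approximate counter: width*depth cells total,
    no matter how many distinct video IDs flow through."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One independent hash per row, derived by salting with the row id.
        h = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Min across rows: collisions only inflate counts, so the
        # smallest cell is the tightest (over-)estimate.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))
```

Pairing this with a small heap of the current heavy hitters gives an approximate Top-K whose memory is constant, which is exactly the property that matters at billions of events per minute.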

Key Engineering Insight

Kafka is not your storage. Flink is not your database. Redis is not your source of truth.

Each layer has a clear responsibility:

  • Kafka → streaming buffer
  • Flink → stateful computation
  • Redis → fast serving layer
  • S3/OLAP → historical reprocessing

Final Thought

Top-K seems simple — “just count and sort” — until you add the constraint of sub-minute freshness at scale. The gap between a batch Top-K (run a query every hour) and a real-time one (update every 10 seconds across millions of events) is enormous.

What closes that gap is incremental computation: sliding windows that don’t recompute from scratch, Redis sorted sets that update in O(log N), and Flink state that survives failures. And when even that isn’t enough — billions of events per minute — approximate algorithms like Count-Min Sketch become the frontier, trading a small accuracy margin for an order-of-magnitude reduction in memory and compute.

The real engineering lesson: simple-sounding features (“show what’s trending”) often hide the hardest distributed systems problems.