Designing a Planet-Scale Auction System

Auction systems look simple on the surface.

They are not. Working on real-time bidding and pricing systems at Bolt — where delivery promotions compete for budget in milliseconds — gave me a deep appreciation for how hard concurrent state updates really are.

They combine:

  • High write concurrency
  • Real-time correctness requirements
  • Financial incentives (which attract fraud)
  • Global latency sensitivity
  • Hard consistency boundaries

This is not a CRUD system.
This is a correctness-under-load system.

In this article, I’ll walk through how I would design it — focusing on trade-offs, consistency models, and scaling strategy.


1. Clarifying Requirements

Functional

  • Create auctions
  • Place bids
  • Highest valid bid wins
  • Auctions have start/end times
  • Notify users when outbid or auction ends
  • Admin moderation support

Non-Functional

  • Low latency (<100ms bid response)
  • Strong consistency for highest bid
  • Horizontal scalability
  • Auditability
  • High availability
  • Fraud resistance

2. The First Senior Insight: Identify the True Bottleneck

The hardest part of the system is:

Concurrent bid updates on the same auction.

Everything else is standard microservices work.

So we optimize around:

  • Atomicity
  • Serialization of competing bids
  • Low latency validation
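To make the contention problem concrete, here is a minimal sketch of what goes wrong when bid validation is a non-atomic read-then-write. Plain in-memory TypeScript stands in for Redis; the `await` simulates network latency between the read and the write.

```typescript
// Minimal sketch: a non-atomic read-then-write loses updates.
// In-memory state stands in for Redis; the await simulates the
// network round-trip between reading and writing.
let highestBid = 0;

async function naivePlaceBid(amount: number): Promise<void> {
  const current = highestBid;   // read
  await Promise.resolve();      // another bid can interleave here
  if (amount > current) {
    highestBid = amount;        // write based on a stale read
  }
}

async function demo(): Promise<number> {
  highestBid = 0;
  // Both bidders read highestBid = 0 before either writes,
  // so the later, lower write clobbers the higher bid.
  await Promise.all([naivePlaceBid(200), naivePlaceBid(150)]);
  return highestBid; // 150, not 200: a lost update
}
```

This is exactly the window that atomic execution closes: the read, compare, and write must happen as one indivisible step.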

3. High-Level Architecture

flowchart LR
    User([User]) --> AG[API Gateway]
    AG --> BS[Bid Service]
    BS --> Redis[(Redis)]
    BS --> Kafka[[Kafka]]
    Kafka --> PW[Persistence Worker]
    PW --> DB[(Database)]
    BS --> NS[Notification Service]

Why this architecture?

  • Redis handles real-time, atomic bid updates.
  • Kafka decouples durability from latency (for a deeper dive into Kafka ingestion patterns and streaming pipelines, see the Ad Click Aggregator post).
  • DB stores immutable audit history.
  • Services scale horizontally.

4. The Critical Path: Placing a Bid

This is the heart of the system.

Request

POST /api/auctions/{auctionId}/bids
{
  "amount": 150.00
}

5. TypeScript Implementation (Core Logic)

This is simplified but production-oriented.

interface PlaceBidRequest {
  auctionId: string;
  userId: string;
  amount: number;
}

class BidService {
  constructor(
    private redis: RedisClient,
    private kafka: KafkaProducer
  ) {}

  async placeBid(req: PlaceBidRequest) {
    // Validate and apply the bid atomically inside Redis.
    // KEYS[1] = auction status, KEYS[2] = highest bid, KEYS[3] = highest bidder
    // ARGV[1] = bid amount, ARGV[2] = bidder's user ID
    const script = `
      local status = redis.call("GET", KEYS[1])
      if status ~= "ACTIVE" then
        return "INVALID_AUCTION"
      end

      local current = tonumber(redis.call("GET", KEYS[2]) or "0")
      local newBid = tonumber(ARGV[1])

      if newBid <= current then
        return "BID_TOO_LOW"
      end

      redis.call("SET", KEYS[2], newBid)
      redis.call("SET", KEYS[3], ARGV[2])

      return "OK"
    `;

    const result = await this.redis.eval(script, {
      keys: [
        `auction:${req.auctionId}:status`,
        `auction:${req.auctionId}:highestBid`,
        `auction:${req.auctionId}:highestBidder`
      ],
      arguments: [req.amount.toString(), req.userId]
    });

    if (result !== "OK") {
      throw new Error(String(result));
    }

    // Durability is asynchronous: the accepted bid is published to Kafka
    // and persisted by a downstream worker.
    await this.kafka.publish("bid_created", {
      auctionId: req.auctionId,
      userId: req.userId,
      amount: req.amount,
      timestamp: Date.now()
    });

    return { success: true };
  }

Why Lua?

Because Redis guarantees:

Lua scripts execute atomically.

That eliminates race conditions without database locking.


6. What Happens if Two Bids Arrive at the Same Millisecond?

Redis serializes execution.

One script runs first. The second runs after.

Only one wins.

This guarantees:

  • No double highest bid
  • No inconsistent state
  • Deterministic behavior

If bids are equal, business logic defines tie-breaking (timestamp or deterministic ID ordering).
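That tie-breaking rule can be sketched as a pure comparison. The `timestamp` and `bidId` fields here are illustrative assumptions: both would be assigned server-side so ordering never depends on client clocks.

```typescript
// Sketch of deterministic tie-breaking (field names are illustrative).
interface Bid {
  bidId: string;     // unique, server-generated
  userId: string;
  amount: number;    // bid amount
  timestamp: number; // ms since epoch, assigned on arrival
}

// Highest amount wins; ties go to the earliest timestamp;
// identical timestamps fall back to lexicographic bidId order.
function pickWinner(a: Bid, b: Bid): Bid {
  if (a.amount !== b.amount) return a.amount > b.amount ? a : b;
  if (a.timestamp !== b.timestamp) return a.timestamp < b.timestamp ? a : b;
  return a.bidId < b.bidId ? a : b;
}
```

Because every comparison bottoms out in a unique ID, two replicas comparing the same pair of bids always agree on the winner.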


7. Sequence Diagram (Full Bid Flow)

sequenceDiagram
    actor User
    participant AG as API Gateway
    participant BS as Bid Service
    participant Redis
    participant Kafka
    participant Worker
    participant DB as Database

    User->>AG: POST /bids
    AG->>BS: placeBid()
    BS->>Redis: atomic Lua validation

    alt Valid Bid
        BS->>Kafka: publish bid_created
        BS-->>AG: 200 OK
    else Invalid Bid
        BS-->>AG: 400 Error
    end

    Worker->>Kafka: consume event
    Worker->>DB: insert bid record
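The Persistence Worker's core job is a mapping from event to audit row. A sketch of that transform, with hypothetical event and row shapes (the Kafka consumer loop and DB client are elided so the mapping itself stays testable):

```typescript
// Sketch of the Persistence Worker's transform (shapes are illustrative).
interface BidCreatedEvent {
  auctionId: string;
  userId: string;
  amount: number;    // decimal currency units, as published by BidService
  timestamp: number; // ms since epoch
}

interface BidRow {
  auction_id: string;
  user_id: string;
  amount_cents: number; // store money as integer cents, never floats
  created_at: string;   // ISO 8601, for the immutable audit trail
}

function toBidRow(e: BidCreatedEvent): BidRow {
  return {
    auction_id: e.auctionId,
    user_id: e.userId,
    amount_cents: Math.round(e.amount * 100),
    created_at: new Date(e.timestamp).toISOString(),
  };
}
```

Inside the consume loop, each `bid_created` event is mapped with `toBidRow` and inserted append-only, keeping the audit history immutable.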

8. Scaling to 1 Million Concurrent Bids

You do not scale this with bigger machines.

You scale it horizontally.

Strategy

1. Shard by auction_id

const shard = hash(auctionId) % totalShards;

Each shard has:

  • Dedicated Redis instance
  • Dedicated BidService cluster

This prevents hot auctions from blocking others.
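A concrete version of that routing, assuming a fixed shard count and FNV-1a as the stable hash (any stable hash works; what matters is that the same auctionId always lands on the same shard):

```typescript
// Stable 32-bit FNV-1a hash over the auction ID.
function fnv1a(s: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept in uint32 range
  }
  return h >>> 0;
}

// Deterministic shard routing: same auction, same Redis + BidService shard.
function shardFor(auctionId: string, totalShards: number): number {
  return fnv1a(auctionId) % totalShards;
}
```

One caveat: changing `totalShards` remaps most keys, so resharding live auctions needs a migration plan (or consistent hashing).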


2. Cache-First Architecture

Redis = source of truth for active auctions
Database = historical durability layer

You trade immediate durability for throughput.

That’s intentional.


3. Backpressure Strategy

Under extreme load:

  • Rate limit per user
  • Reject bids if queue depth exceeds threshold
  • Apply request TTL (e.g., 2 seconds)
  • Use circuit breakers if downstream fails

Fail fast > fail catastrophically.
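The per-user rate limit can be sketched as a token bucket. Capacity, refill rate, and the injectable clock below are illustrative assumptions:

```typescript
// Sketch of per-user rate limiting with a token bucket.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    private now: () => number = () => Date.now()
  ) {
    this.tokens = capacity;
    this.last = this.now();
  }

  // Returns true if the request may proceed, false if it should be
  // rejected immediately (fail fast rather than queue).
  tryTake(): boolean {
    const t = this.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((t - this.last) / 1000) * this.refillPerSec
    );
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

The same shape extends to the other bullets: check queue depth before enqueueing, drop requests whose TTL has expired, and trip a circuit breaker when downstream error rates spike.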


9. Fraud & Shill Bidding Prevention

Shill bidding = fake bids to inflate price.

This is not a backend-only problem. This is a data science + behavioral problem.

Detection Approaches

  • Same IP/device fingerprint across accounts
  • Bid clustering on single seller
  • Abnormal bid escalation patterns
  • Graph-based account relationship analysis
  • ML anomaly detection

Every bid must be immutably logged.

Never trust surface-level heuristics alone.
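As one illustration of those heuristics, here is a sketch that flags device fingerprints shared across bidder accounts. Field names are illustrative; a real system feeds signals like this into downstream graph/ML analysis rather than acting on them alone.

```typescript
// Sketch of a single fraud signal: fingerprints spanning multiple accounts.
interface BidSignal {
  userId: string;
  deviceFingerprint: string;
}

function sharedFingerprints(bids: BidSignal[]): string[] {
  const usersByDevice = new Map<string, Set<string>>();
  for (const b of bids) {
    const users = usersByDevice.get(b.deviceFingerprint) ?? new Set<string>();
    users.add(b.userId);
    usersByDevice.set(b.deviceFingerprint, users);
  }
  // A fingerprint used by 2+ distinct accounts is a shill-bidding signal.
  return Array.from(usersByDevice.entries())
    .filter(([, users]) => users.size > 1)
    .map(([fp]) => fp);
}
```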


10. Multi-Region Global Scaling

Now it gets interesting.

The Problem:

  • Users are global
  • Auctions are time-sensitive
  • Latency matters

Strategy:

  • Geo-partition auctions by origin region
  • Route bids to home region
  • Replicate bid events asynchronously
  • Use timestamp-based conflict resolution
  • Region-local notifications

You do NOT want cross-region locking.

That kills latency.
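The timestamp-based conflict resolution above can be sketched as last-writer-wins with a deterministic region tiebreak, so every region converges on the same winner regardless of replication order. This assumes reasonably synchronized region clocks and is a simplification, not a full CRDT:

```typescript
// Sketch of conflict resolution for asynchronously replicated bid events.
interface ReplicatedBid {
  amount: number;
  timestamp: number; // assigned in the bid's home region
  region: string;    // e.g. "eu-west", "us-east" (illustrative IDs)
}

// Total order: highest amount, then earliest timestamp, then region ID.
function resolve(a: ReplicatedBid, b: ReplicatedBid): ReplicatedBid {
  if (a.amount !== b.amount) return a.amount > b.amount ? a : b;
  if (a.timestamp !== b.timestamp) return a.timestamp < b.timestamp ? a : b;
  return a.region < b.region ? a : b; // deterministic final tiebreak
}

// Because resolve picks the max under a total order, merging is
// order-independent: any arrival order yields the same winner.
function mergeReplicas(events: ReplicatedBid[]): ReplicatedBid {
  return events.reduce(resolve); // events must be non-empty
}
```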


11. Trade-Off Discussion (Senior-Level Framing)

  • Redis as live state: in-memory risk vs ultra-low latency
  • Async persistence: eventual durability vs throughput
  • Partition by auction: complexity vs isolation
  • Region-based ownership: simpler consistency vs flexibility

Every design decision here is about:

Reducing contention while preserving correctness.


12. Final Thoughts

Auction systems are deceptively complex because the core challenge isn’t throughput — it’s correctness under contention. Two users bidding on the same item at the same millisecond must produce a deterministic, auditable result every single time.

That’s what makes this different from a typical high-write system. Financial incentives mean every bid must be immutably logged, every race condition eliminated, and every edge case (shill bidding, region failover, tie-breaking) explicitly handled. You can’t paper over bugs with eventual consistency when real money is on the line.

The architecture here — Redis Lua for atomic validation, Kafka for durable event streaming, sharding by auction ID — is designed around one principle: serialize the contention point, parallelize everything else.