Designing a Planet-Scale Auction System

Auction systems look simple on the surface.

They are not. Working on real-time bidding and pricing systems at Bolt — where delivery promotions compete for budget in milliseconds — gave me a deep appreciation for how hard concurrent state updates really are.

They combine:

  • High write concurrency
  • Real-time correctness requirements
  • Financial incentives (which attract fraud)
  • Global latency sensitivity
  • Hard consistency boundaries

This is not a CRUD system.
This is a correctness-under-load system.

In this article, I’ll walk through how I would design it — focusing on trade-offs, consistency models, and scaling strategy.


1. Clarifying Requirements

Functional

  • Create auctions
  • Place bids
  • Highest valid bid wins
  • Auctions have start/end times
  • Notify users when outbid or auction ends
  • Admin moderation support

Non-Functional

  • Low latency (<100ms bid response)
  • Strong consistency for highest bid
  • Horizontal scalability
  • Auditability
  • High availability
  • Fraud resistance

2. The First Senior Insight: Identify the True Bottleneck

The hardest part of the system is:

Concurrent bid updates on the same auction.

Everything else is standard microservices work.

So we optimize around:

  • Atomicity
  • Serialization of competing bids
  • Low latency validation
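To make the contention problem concrete, here is a minimal sketch of what goes wrong when bid validation is a non-atomic read-then-write. Plain in-memory TypeScript stands in for Redis; the `await` simulates network latency between the read and the write.

```typescript
// Minimal sketch: a non-atomic read-then-write loses updates.
// In-memory state stands in for Redis; the await simulates the
// network round-trip between reading and writing.
let highestBid = 0;

async function naivePlaceBid(amount: number): Promise<void> {
  const current = highestBid;   // read
  await Promise.resolve();      // another bid can interleave here
  if (amount > current) {
    highestBid = amount;        // write based on a stale read
  }
}

async function demo(): Promise<number> {
  highestBid = 0;
  // Both bidders read highestBid = 0 before either writes,
  // so the later, lower write clobbers the higher bid.
  await Promise.all([naivePlaceBid(200), naivePlaceBid(150)]);
  return highestBid; // 150, not 200: a lost update
}
```

This is exactly the window that atomic execution closes: the read, compare, and write must happen as one indivisible step.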

3. High-Level Architecture

flowchart LR
    User([User]) --> AG[API Gateway]
    AG --> BS[Bid Service]
    BS --> Redis[(Redis)]
    BS --> Kafka[[Kafka]]
    Kafka --> PW[Persistence Worker]
    PW --> DB[(Database)]
    BS --> NS[Notification Service]

Why this architecture?

  • Redis handles real-time, atomic bid updates.
  • Kafka decouples durability from latency (for a deeper dive into Kafka ingestion patterns and streaming pipelines, see the Ad Click Aggregator post).
  • DB stores immutable audit history.
  • Services scale horizontally.

4. The Critical Path: Placing a Bid

This is the heart of the system.

Request

POST /api/auctions/{auctionId}/bids
{
  "amount": 150.00
}

5. TypeScript Implementation (Core Logic)

This is simplified but production-oriented.

interface PlaceBidRequest {
  auctionId: string;
  userId: string;
  amount: number;
}

class BidService {
  constructor(
    private redis: RedisClient,
    private kafka: KafkaProducer
  ) {}

  async placeBid(req: PlaceBidRequest) {
    // Validate and apply the bid atomically inside Redis.
    // KEYS[1] = auction status, KEYS[2] = highest bid, KEYS[3] = highest bidder
    // ARGV[1] = bid amount, ARGV[2] = bidder's user ID
    const script = `
      local status = redis.call("GET", KEYS[1])
      if status ~= "ACTIVE" then
        return "INVALID_AUCTION"
      end

      local current = tonumber(redis.call("GET", KEYS[2]) or "0")
      local newBid = tonumber(ARGV[1])

      if newBid <= current then
        return "BID_TOO_LOW"
      end

      redis.call("SET", KEYS[2], newBid)
      redis.call("SET", KEYS[3], ARGV[2])

      return "OK"
    `;

    const result = await this.redis.eval(script, {
      keys: [
        `auction:${req.auctionId}:status`,
        `auction:${req.auctionId}:highestBid`,
        `auction:${req.auctionId}:highestBidder`
      ],
      arguments: [req.amount.toString(), req.userId]
    });

    if (result !== "OK") {
      throw new Error(String(result));
    }

    // Durability is asynchronous: the accepted bid is published to Kafka
    // and persisted by a downstream worker.
    await this.kafka.publish("bid_created", {
      auctionId: req.auctionId,
      userId: req.userId,
      amount: req.amount,
      timestamp: Date.now()
    });

    return { success: true };
  }

Why Lua?

Because Redis guarantees:

Lua scripts execute atomically.

That eliminates race conditions without database locking.


6. What Happens if Two Bids Arrive at the Same Millisecond?

Redis serializes execution.

One script runs first. The second runs after.

Only one wins.

This guarantees:

  • No double highest bid
  • No inconsistent state
  • Deterministic behavior

If bids are equal, business logic defines tie-breaking (timestamp or deterministic ID ordering).
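That tie-breaking rule can be sketched as a pure comparison. The `timestamp` and `bidId` fields here are illustrative assumptions: both would be assigned server-side so ordering never depends on client clocks.

```typescript
// Sketch of deterministic tie-breaking (field names are illustrative).
interface Bid {
  bidId: string;     // unique, server-generated
  userId: string;
  amount: number;    // bid amount
  timestamp: number; // ms since epoch, assigned on arrival
}

// Highest amount wins; ties go to the earliest timestamp;
// identical timestamps fall back to lexicographic bidId order.
function pickWinner(a: Bid, b: Bid): Bid {
  if (a.amount !== b.amount) return a.amount > b.amount ? a : b;
  if (a.timestamp !== b.timestamp) return a.timestamp < b.timestamp ? a : b;
  return a.bidId < b.bidId ? a : b;
}
```

Because every comparison bottoms out in a unique ID, two replicas comparing the same pair of bids always agree on the winner.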


7. Sequence Diagram (Full Bid Flow)

sequenceDiagram
    actor User
    participant AG as API Gateway
    participant BS as Bid Service
    participant Redis
    participant Kafka
    participant Worker
    participant DB as Database

    User->>AG: POST /bids
    AG->>BS: placeBid()
    BS->>Redis: atomic Lua validation

    alt Valid Bid
        BS->>Kafka: publish bid_created
        BS-->>AG: 200 OK
    else Invalid Bid
        BS-->>AG: 400 Error
    end

    Worker->>Kafka: consume event
    Worker->>DB: insert bid record
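The Persistence Worker's core job is a mapping from event to audit row. A sketch of that transform, with hypothetical event and row shapes (the Kafka consumer loop and DB client are elided so the mapping itself stays testable):

```typescript
// Sketch of the Persistence Worker's transform (shapes are illustrative).
interface BidCreatedEvent {
  auctionId: string;
  userId: string;
  amount: number;    // decimal currency units, as published by BidService
  timestamp: number; // ms since epoch
}

interface BidRow {
  auction_id: string;
  user_id: string;
  amount_cents: number; // store money as integer cents, never floats
  created_at: string;   // ISO 8601, for the immutable audit trail
}

function toBidRow(e: BidCreatedEvent): BidRow {
  return {
    auction_id: e.auctionId,
    user_id: e.userId,
    amount_cents: Math.round(e.amount * 100),
    created_at: new Date(e.timestamp).toISOString(),
  };
}
```

Inside the consume loop, each `bid_created` event is mapped with `toBidRow` and inserted append-only, keeping the audit history immutable.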

8. Scaling to 1 Million Concurrent Bids

You do not scale this with bigger machines.

You scale it horizontally.

Strategy

1. Shard by auction_id

const shard = hash(auctionId) % totalShards;

Each shard has:

  • Dedicated Redis instance
  • Dedicated BidService cluster

This prevents hot auctions from blocking others.
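A concrete version of that routing, assuming a fixed shard count and FNV-1a as the stable hash (any stable hash works; what matters is that the same auctionId always lands on the same shard):

```typescript
// Stable 32-bit FNV-1a hash over the auction ID.
function fnv1a(s: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept in uint32 range
  }
  return h >>> 0;
}

// Deterministic shard routing: same auction, same Redis + BidService shard.
function shardFor(auctionId: string, totalShards: number): number {
  return fnv1a(auctionId) % totalShards;
}
```

One caveat: changing `totalShards` remaps most keys, so resharding live auctions needs a migration plan (or consistent hashing).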


2. Cache-First Architecture

Redis = source of truth for active auctions
Database = historical durability layer

You trade immediate durability for throughput.

That’s intentional.


3. Backpressure Strategy

Under extreme load:

  • Rate limit per user
  • Reject bids if queue depth exceeds threshold
  • Apply request TTL (e.g., 2 seconds)
  • Use circuit breakers if downstream fails

Fail fast > fail catastrophically.
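The per-user rate limit can be sketched as a token bucket. Capacity, refill rate, and the injectable clock below are illustrative assumptions:

```typescript
// Sketch of per-user rate limiting with a token bucket.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    private now: () => number = () => Date.now()
  ) {
    this.tokens = capacity;
    this.last = this.now();
  }

  // Returns true if the request may proceed, false if it should be
  // rejected immediately (fail fast rather than queue).
  tryTake(): boolean {
    const t = this.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((t - this.last) / 1000) * this.refillPerSec
    );
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

The same shape extends to the other bullets: check queue depth before enqueueing, drop requests whose TTL has expired, and trip a circuit breaker when downstream error rates spike.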


9. Fraud & Shill Bidding Prevention

Shill bidding = fake bids to inflate price.

This is not a backend-only problem. This is a data science + behavioral problem.

Detection Approaches

  • Same IP/device fingerprint across accounts
  • Bid clustering on single seller
  • Abnormal bid escalation patterns
  • Graph-based account relationship analysis
  • ML anomaly detection

Every bid must be immutably logged.

Never trust surface-level heuristics alone.
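As one illustration of those heuristics, here is a sketch that flags device fingerprints shared across bidder accounts. Field names are illustrative; a real system feeds signals like this into downstream graph/ML analysis rather than acting on them alone.

```typescript
// Sketch of a single fraud signal: fingerprints spanning multiple accounts.
interface BidSignal {
  userId: string;
  deviceFingerprint: string;
}

function sharedFingerprints(bids: BidSignal[]): string[] {
  const usersByDevice = new Map<string, Set<string>>();
  for (const b of bids) {
    const users = usersByDevice.get(b.deviceFingerprint) ?? new Set<string>();
    users.add(b.userId);
    usersByDevice.set(b.deviceFingerprint, users);
  }
  // A fingerprint used by 2+ distinct accounts is a shill-bidding signal.
  return Array.from(usersByDevice.entries())
    .filter(([, users]) => users.size > 1)
    .map(([fp]) => fp);
}
```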


10. Multi-Region Global Scaling

Now it gets interesting.

The Problem:

  • Users are global
  • Auctions are time-sensitive
  • Latency matters

Strategy:

  • Geo-partition auctions by origin region
  • Route bids to home region
  • Replicate bid events asynchronously
  • Use timestamp-based conflict resolution
  • Region-local notifications

You do NOT want cross-region locking.

That kills latency.
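The timestamp-based conflict resolution above can be sketched as last-writer-wins with a deterministic region tiebreak, so every region converges on the same winner regardless of replication order. This assumes reasonably synchronized region clocks and is a simplification, not a full CRDT:

```typescript
// Sketch of conflict resolution for asynchronously replicated bid events.
interface ReplicatedBid {
  amount: number;
  timestamp: number; // assigned in the bid's home region
  region: string;    // e.g. "eu-west", "us-east" (illustrative IDs)
}

// Total order: highest amount, then earliest timestamp, then region ID.
function resolve(a: ReplicatedBid, b: ReplicatedBid): ReplicatedBid {
  if (a.amount !== b.amount) return a.amount > b.amount ? a : b;
  if (a.timestamp !== b.timestamp) return a.timestamp < b.timestamp ? a : b;
  return a.region < b.region ? a : b; // deterministic final tiebreak
}

// Because resolve picks the max under a total order, merging is
// order-independent: any arrival order yields the same winner.
function mergeReplicas(events: ReplicatedBid[]): ReplicatedBid {
  return events.reduce(resolve); // events must be non-empty
}
```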


11. Trade-Off Discussion (Senior-Level Framing)

  • Redis as live state: in-memory risk vs ultra-low latency
  • Async persistence: eventual durability vs throughput
  • Partition by auction: complexity vs isolation
  • Region-based ownership: simpler consistency vs flexibility

Every design decision here is about:

Reducing contention while preserving correctness.


12. Final Thoughts

Auction systems are deceptively complex because the core challenge isn’t throughput — it’s correctness under contention. Two users bidding on the same item at the same millisecond must produce a deterministic, auditable result every single time.

That’s what makes this different from a typical high-write system. Financial incentives mean every bid must be immutably logged, every race condition eliminated, and every edge case (shill bidding, region failover, tie-breaking) explicitly handled. You can’t paper over bugs with eventual consistency when real money is on the line.

The architecture here — Redis Lua for atomic validation, Kafka for durable event streaming, sharding by auction ID — is designed around one principle: serialize the contention point, parallelize everything else.