Building a Multi-Agent System for Intelligent Ad Delivery: A Step-by-Step Guide

Overview

Modern advertising platforms face a structural challenge: balancing relevance, budget efficiency, and user experience across thousands of campaigns in real time. A single monolithic AI model often struggles to handle conflicting objectives and dynamic constraints. At Spotify Engineering, we encountered exactly this problem when scaling our ad system. Instead of shipping a one-size-fits-all AI feature, we rearchitected our ad decision engine around multi-agent systems—a collection of specialized AI agents that collaborate and negotiate to deliver smarter advertising outcomes. This tutorial walks you through the core concepts, prerequisites, and implementation steps for building such a system, complete with code examples and common pitfalls to avoid.

Source: engineering.atspotify.com

Prerequisites

Knowledge

- Reinforcement learning fundamentals (policies, reward functions, PPO)
- Basics of real-time ad auctions and budget pacing
- Python and asynchronous service design

Tools & Libraries

- Python 3.8+
- Ray RLlib for multi-agent PPO training
- Redis for shared state and Kafka for event streaming
- FastAPI and uvicorn for inference endpoints

Infrastructure

- A container runtime (e.g. Docker) and environments for running Redis and Kafka

Step-by-Step Instructions

1. Define Agent Roles and Objectives

In a multi-agent advertising system, each agent specialises in a subproblem. Typical roles include:

- Bidding Agent: computes a bid price for each auction from user context and expected conversion value
- Pacing Agent: throttles spend so campaigns stay within hourly and daily budget limits
- Feedback Agent: streams live outcomes back into each agent's experience buffer (see step 5)

For each agent, define a clear reward function. For example, the Bidding Agent’s reward might be (conversion_value - bid_cost) while the Pacing Agent’s reward is a penalty for exceeding hourly spend limits.

# Example: BiddingAgent reward function
def bidding_reward(conversion, cost, value_per_conversion=1.0):
    # Reward is the value captured by a conversion minus what we paid to win it
    return conversion * value_per_conversion - cost
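The Pacing Agent's penalty-style reward mentioned above might look like the sketch below; the `hourly_limit` parameter and the quadratic penalty shape are illustrative assumptions, not a production formula:

```python
# Hypothetical PacingAgent reward: penalise spend above the hourly limit.
# The quadratic penalty is an illustrative choice.
def pacing_reward(hourly_spend, hourly_limit):
    overspend = max(0.0, hourly_spend - hourly_limit)
    # No penalty while under the limit; grows quadratically once exceeded
    return -(overspend ** 2)
```

A quadratic penalty keeps small overruns cheap while strongly discouraging large ones, which tends to produce smoother pacing behaviour than a hard cutoff.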

2. Design Inter-Agent Communication

Agents cannot operate in isolation. They must share state (e.g., remaining budget, current bid landscape) and negotiate decisions. We used a blackboard architecture where agents write and read from a shared Redis store.

import redis

r = redis.Redis()
# BiddingAgent writes its current bid price
r.set('current_bid', 0.45)

# PacingAgent reads the bid to adjust pace
current_bid = float(r.get('current_bid'))

3. Train Agents with Multi-Agent Reinforcement Learning

We used a centralised training, decentralised execution (CTDE) framework. Each agent learns its own policy using PPO (Proximal Policy Optimization). The environment is a custom ad auction simulator.

import gym
from gym.spaces import Box
import ray
from ray.rllib.agents.ppo import PPOTrainer

# Register the custom ad auction environment
gym.envs.register(id='AdAuction-v0', entry_point='ad_env:AdAuctionEnv')

ray.init()

# Spaces must match what AdAuctionEnv exposes; shapes here are illustrative
obs_space = Box(low=0.0, high=1.0, shape=(10,))
act_space = Box(low=0.0, high=1.0, shape=(1,))

trainer = PPOTrainer(
    config={
        'multiagent': {
            'policies': {
                'bidder': (None, obs_space, act_space, {}),
                'pacer': (None, obs_space, act_space, {}),
            },
            # Environment agent ids ('bidder', 'pacer') map directly to policy ids
            'policy_mapping_fn': lambda agent_id: agent_id,
        },
        'num_workers': 4,
    },
    env='AdAuction-v0',
)

for i in range(100):
    result = trainer.train()
    print(f'Iteration {i}: reward={result["episode_reward_mean"]}')

4. Deploy Agents as Microservices

Each agent runs in its own container or separate process, communicating via gRPC or message queues. Use Ray Serve or FastAPI to expose inference endpoints.

# BiddingAgent service (FastAPI)
from fastapi import FastAPI
import uvicorn

app = FastAPI()

# Trained bidding policy, loaded once at startup rather than per request
# (e.g. restored from the RLlib checkpoint produced in step 3)
model = load_bidding_model()  # hypothetical loader for the trained policy

@app.post('/bid')
def get_bid(user_context: dict, campaign_id: str):
    # user_context arrives as the JSON request body; campaign_id as a query parameter
    bid = model.predict(user_context)
    return {'bid_price': bid}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8000)

5. Implement Monitoring and Feedback Loops

Agents must continuously learn from live data. Stream ad outcomes (impressions, clicks, conversions) into a Kafka topic, have a feedback agent update each agent's experience buffer, and periodically retrain the models.
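The feedback path can be sketched as follows. The `ad-outcomes` topic name, the kafka-python client, and the reward weights are illustrative assumptions; the event-to-experience conversion is kept as a pure function so it can be exercised without a broker:

```python
import json

def outcome_to_experience(event):
    """Convert a raw ad outcome event into an (agent_id, reward) entry.

    Assumes events carry agent_id, clicks, conversions and cost fields;
    the reward weights below are illustrative, not production values.
    """
    reward = (event.get('conversions', 0) * 1.0
              + event.get('clicks', 0) * 0.1
              - event.get('cost', 0.0))
    return event['agent_id'], reward

def run_feedback_loop(buffers):
    # Requires a running Kafka broker; kafka-python is an assumed client choice
    from kafka import KafkaConsumer
    consumer = KafkaConsumer('ad-outcomes',  # hypothetical topic name
                             value_deserializer=lambda v: json.loads(v))
    for msg in consumer:
        agent_id, reward = outcome_to_experience(msg.value)
        # Per-agent experience buffers are later drained by periodic retraining
        buffers.setdefault(agent_id, []).append(reward)
```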

Common Mistakes

Mistake 1: No Shared State Coordination

Agents that don’t share budget or bid information will overspend or underspend. Always implement a centralised state store (such as Redis) and enforce read-after-write consistency.
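One way to enforce that coordination, sketched assuming the Redis store from step 2 with budgets tracked in integer cents under a hypothetical `budget:<campaign_id>` key: reserve spend atomically with DECRBY, so two agents can never both claim the last of the budget. The `reserve_spend` wrapper needs a live Redis server; the decision helper is pure.

```python
def can_spend(remaining_cents, bid_cents):
    # Pure decision: only bid if the reservation would not take the budget negative
    return remaining_cents - bid_cents >= 0

def reserve_spend(r, campaign_id, bid_cents):
    # Atomic budget reservation (assumes a live redis-py client `r` and a
    # 'budget:<campaign_id>' key initialised in cents).
    remaining = r.decrby(f'budget:{campaign_id}', bid_cents)
    if remaining < 0:
        # Another agent got there first; roll back the over-reservation
        r.incrby(f'budget:{campaign_id}', bid_cents)
        return False
    return True
```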

Mistake 2: Training Agents in Isolation

If agents are trained separately against static versions of the others, they won’t learn to cooperate. Use multi-agent training with concurrent environment steps.

Mistake 3: Overly Complex Communication Protocols

Don’t build a full-fledged negotiation system on day one. Start with simple broadcast+subscribe; add complexity only after validating core behavior.
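A broadcast+subscribe starting point can be as small as the in-process sketch below, a stand-in for Redis pub/sub with illustrative channel names:

```python
class Broadcast:
    """Minimal publish/subscribe hub: agents register callbacks per channel."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, message):
        # Deliver the message to every subscriber of this channel
        for callback in self.subscribers.get(channel, []):
            callback(message)

# Usage: the PacingAgent listens for bid updates broadcast by the BiddingAgent
bus = Broadcast()
seen = []
bus.subscribe('bids', seen.append)
bus.publish('bids', {'campaign': 'c42', 'bid': 0.45})
```

Once this pattern is validated, the hub can be swapped for Redis pub/sub or a message queue without changing the agents' interface.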

Mistake 4: Neglecting Latency and Throughput

Advertising decisions must be made in milliseconds. Cache frequent inferences, use async I/O, and precompute features.
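Caching frequent inferences can be as simple as memoising on a hashable context key; `functools.lru_cache` is one stdlib option, and the feature computation below is a hypothetical stand-in for an expensive step:

```python
from functools import lru_cache

@lru_cache(maxsize=10_000)
def cached_features(user_segment, campaign_id):
    # Hypothetical stand-in for an expensive feature-computation step;
    # results for hot (segment, campaign) pairs are served from the cache.
    return (hash(user_segment) % 100 / 100.0, hash(campaign_id) % 100 / 100.0)

# Repeated calls with the same key hit the cache instead of recomputing
cached_features('power_listener', 'c42')
cached_features('power_listener', 'c42')
```

In production, the cache key must capture everything the model depends on, and entries need a TTL so stale features don't drive bids.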

Summary

Building a multi-agent architecture for advertising solves the structural problem of competing objectives by decoupling decision-making into specialized, collaborative agents. This guide covered the essential steps: defining agent roles, designing communication, training with multi-agent RL, deploying as microservices, and monitoring. By avoiding common pitfalls (isolated training, lack of coordination, excessive complexity), you can create a system that dynamically adapts to user behavior and campaign goals—just as we did at Spotify. The result is smarter, more efficient ad delivery without sacrificing user experience.
