Scaling to a Distributed Fleet

Once you’re comfortable with multi-machine setups, the next step is designing a distributed architecture that can handle high task volumes, provide redundancy, and make efficient use of a fleet of agents.

Fleet Topology

A distributed Mission Control fleet has three tiers:

Coordinator — The Mission Control server. Manages task state, routes dispatch requests, and aggregates activity events. There is typically one coordinator per team or project.

Agent nodes — Machines running agent processes. Each node can run one or multiple concurrent agents, depending on available memory and CPU.

Shared storage — If you run multiple coordinator instances for redundancy, they must share a database. SQLite’s single-writer model restricts a deployment to one coordinator; for multi-coordinator setups, consider migrating to PostgreSQL via Mission Control’s database adapter.

Load Distribution Strategies

Round-robin dispatch — The simplest approach. Distribute incoming tasks evenly across all available agents regardless of their current load. Works well when tasks are roughly similar in size and complexity.
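As a rough illustration, round-robin dispatch can be sketched in a few lines. The function name and the task and agent identifiers below are hypothetical and are not part of Mission Control's API; a real dispatcher would send each pair to the coordinator instead of collecting them in a list.

```python
from itertools import cycle
from typing import Iterable, Iterator


def round_robin_dispatch(
    tasks: Iterable[str], agents: list[str]
) -> Iterator[tuple[str, str]]:
    """Pair each task with the next agent in rotation, ignoring agent load."""
    if not agents:
        raise ValueError("at least one agent is required")
    rotation = cycle(agents)  # endlessly repeats agent-a, agent-b, agent-a, ...
    for task in tasks:
        yield task, next(rotation)


# Hypothetical task and agent names, for illustration only.
assignments = list(
    round_robin_dispatch(["t1", "t2", "t3", "t4", "t5"], ["agent-a", "agent-b"])
)
```

Because the rotation never looks at load, a single long-running task can leave one agent backed up while the others sit idle, which is why this approach suits uniform workloads.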

Capability-based routing — Route tasks to agents based on declared capabilities. An agent node with a fast GPU might handle embedding or analysis tasks, while agents with large working memory handle context-heavy refactoring tasks.
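One way to picture capability-based routing is as a subset check: a task lists the capabilities it needs, and any agent whose declared set covers them is eligible. The sketch below assumes capabilities are plain string tags; the `Agent` class, the tag names, and the node names are invented for illustration and do not reflect how Mission Control stores capabilities.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    name: str
    capabilities: set[str] = field(default_factory=set)


def route_by_capability(required: set[str], agents: list[Agent]) -> Optional[Agent]:
    """Return the first agent whose declared capabilities cover the task's needs."""
    for agent in agents:
        if required <= agent.capabilities:  # subset test
            return agent
    return None  # no capable agent; the caller decides whether to queue or fail


# Hypothetical fleet and capability tags.
fleet = [
    Agent("gpu-node", {"gpu", "embedding"}),
    Agent("bigmem-node", {"large-context", "refactoring"}),
]
embedding_agent = route_by_capability({"embedding"}, fleet)
refactor_agent = route_by_capability({"large-context", "refactoring"}, fleet)
```

Returning `None` rather than falling back to an arbitrary agent keeps a mismatch visible, so a task that needs a GPU never lands silently on a node without one.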

Queue depth balancing — Before dispatching, check each agent’s current queue depth via the API. Send new tasks to the agent with the shortest queue. This naturally balances load when task durations vary significantly.
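The selection step of queue depth balancing might look like the sketch below. It assumes the depths have already been fetched and are passed in as a plain dictionary; the API call that would supply them is deliberately left out because its shape is not described here. The function and agent names are hypothetical.

```python
def pick_least_loaded(queue_depths: dict[str, int]) -> str:
    """Return the agent with the shortest queue; ties break by name for determinism."""
    if not queue_depths:
        raise ValueError("no agents available")
    return min(queue_depths, key=lambda name: (queue_depths[name], name))


def dispatch_balanced(
    tasks: list[str], queue_depths: dict[str, int]
) -> list[tuple[str, str]]:
    """Assign each task to the currently shortest queue, tracking depths locally."""
    depths = dict(queue_depths)  # copy so the caller's snapshot is not mutated
    plan = []
    for task in tasks:
        target = pick_least_loaded(depths)
        depths[target] += 1  # account for the task just assigned
        plan.append((task, target))
    return plan


# Hypothetical snapshot: agent-a already has two tasks queued, agent-b has none.
plan = dispatch_balanced(["t1", "t2", "t3"], {"agent-a": 2, "agent-b": 0})
```

The depths are only a snapshot. If several dispatchers read them at the same moment, they can all pick the same agent, so a real deployment would likely refresh the values often or let the coordinator make the choice.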

Coordination Patterns at Scale

Central queue — All tasks go into a shared pending queue. Agents poll for available work. Simple to implement, easy to reason about.
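The central queue pattern can be illustrated with an in-process stand-in, using Python's thread-safe `queue.Queue` as the shared pending queue and one thread per agent as the poller. This is a sketch of the pattern only: in an actual fleet the queue lives in the coordinator and agents poll over the network, and all names below are hypothetical.

```python
import queue
import threading


def run_agents(tasks: list[str], agent_names: list[str]) -> list[tuple[str, str]]:
    """Let each agent pull from one shared queue until it is empty."""
    pending: queue.Queue[str] = queue.Queue()
    for task in tasks:
        pending.put(task)

    claims: list[tuple[str, str]] = []
    claims_lock = threading.Lock()

    def poll(agent: str) -> None:
        while True:
            try:
                task = pending.get_nowait()  # each task is handed out exactly once
            except queue.Empty:
                return  # nothing left to do
            with claims_lock:
                claims.append((task, agent))

    threads = [threading.Thread(target=poll, args=(name,)) for name in agent_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return claims


claims = run_agents(
    ["t1", "t2", "t3", "t4", "t5", "t6"], ["agent-a", "agent-b", "agent-c"]
)
```

Which agent claims which task varies from run to run, but every task is claimed once. That pull-based property is what makes the pattern easy to reason about: a slow agent simply takes fewer tasks.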

Agent affinity — Assign tasks to specific agents or machine types based on the codebase they’re working in. An agent with the codebase already warmed up in context will be more efficient than a cold agent starting from scratch.
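A minimal sketch of agent affinity, assuming the dispatcher keeps a map from codebase to the agent that last worked in it. The map, the function name, and the fallback rule (take the first idle agent when the warm one is busy) are illustrative choices, not documented Mission Control behavior.

```python
from typing import Optional


def assign_with_affinity(
    repo: str, affinity: dict[str, str], idle: list[str]
) -> Optional[str]:
    """Prefer the idle agent that last worked in this repo; otherwise take any idle agent."""
    preferred = affinity.get(repo)
    if preferred in idle:
        chosen = preferred  # warm agent is free
    elif idle:
        chosen = idle[0]  # cold start on whichever agent is free
    else:
        return None  # nobody is idle; the caller should queue the task
    affinity[repo] = chosen  # remember who is now warm for this repo
    return chosen


# Hypothetical repo and agent names.
affinity = {"billing": "agent-b"}
warm_pick = assign_with_affinity("billing", affinity, ["agent-a", "agent-b"])
cold_pick = assign_with_affinity("search", affinity, ["agent-a", "agent-b"])
```

Falling back to a cold agent trades some efficiency for latency. A stricter variant could wait for the warm agent instead, which may pay off when reloading context is expensive.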

Priority lanes — Maintain separate queues for different priority levels. High-priority tasks always dispatch to a reserved pool of agents, ensuring critical work isn’t blocked behind a queue of low-priority batch jobs.
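Priority lanes can be sketched as two FIFO queues plus a rule about which pool may draw from which lane. The sketch below assumes just two levels, "high" and "low", with reserved agents serving only the high lane and general agents serving only the low lane. The class and all names are hypothetical.

```python
from collections import deque
from typing import Optional


class PriorityLanes:
    """Two FIFO lanes; reserved agents serve 'high', general agents serve 'low'."""

    def __init__(self, reserved_agents: list[str], general_agents: list[str]) -> None:
        self.lanes: dict[str, deque[str]] = {"high": deque(), "low": deque()}
        self.reserved = set(reserved_agents)
        self.general = set(general_agents)

    def submit(self, task: str, priority: str) -> None:
        self.lanes[priority].append(task)  # raises KeyError on an unknown priority

    def next_task(self, agent: str) -> Optional[str]:
        if agent in self.reserved:
            lane = self.lanes["high"]
        elif agent in self.general:
            lane = self.lanes["low"]
        else:
            raise KeyError(f"unknown agent: {agent}")
        return lane.popleft() if lane else None


# Hypothetical agents and tasks.
lanes = PriorityLanes(reserved_agents=["reserved-1"], general_agents=["general-1"])
lanes.submit("batch-1", "low")
lanes.submit("batch-2", "low")
lanes.submit("hotfix", "high")
first_for_reserved = lanes.next_task("reserved-1")
first_for_general = lanes.next_task("general-1")
```

With this strict split, reserved agents sit idle whenever the high lane is empty. Letting them fall back to low-priority work when idle is a common relaxation, at the cost of a possible delay if a critical task arrives mid-job.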

Observability at Scale

As fleet size grows, observability becomes critical. Use Mission Control’s SSE stream to feed a centralized logging system. Key metrics to track:

Task throughput — tasks completed per hour, broken down by agent
Queue depth — how many pending tasks are waiting at any given time
Failure rate — percentage of dispatched tasks that fail, by agent type
P95 task duration — tail latency reveals agents or task types that need attention
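To make these metrics concrete, the sketch below computes throughput, failure rate, and P95 duration from a list of finished-task records. The `TaskEvent` shape is an assumption for illustration, since the actual payload of the SSE stream is not described here. P95 uses the nearest-rank method, and throughput counts only successful tasks as completed.

```python
from dataclasses import dataclass


@dataclass
class TaskEvent:
    agent: str
    duration_s: float
    succeeded: bool


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling avoids float rounding
    return ordered[rank - 1]


def summarize(events: list[TaskEvent], window_hours: float) -> dict[str, float]:
    """Aggregate fleet-wide metrics for one observation window."""
    if not events:
        raise ValueError("no events in window")
    completed = [e for e in events if e.succeeded]
    return {
        "throughput_per_hour": len(completed) / window_hours,
        "failure_rate_pct": 100 * (len(events) - len(completed)) / len(events),
        "p95_duration_s": p95([e.duration_s for e in events]),
    }


# Hypothetical window: ten tasks over two hours, the slowest of which failed.
events = [TaskEvent("agent-a", 10.0 * i, succeeded=(i != 10)) for i in range(1, 11)]
summary = summarize(events, window_hours=2)
```

The per-agent and per-agent-type breakdowns would follow by grouping the events on the relevant field and calling `summarize` on each group.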