Agentic Systems · Multi-agent
A designed environment where AI agents with distinct personalities deliberate on news topics. The characters are the method, not the product. The question: what emerges when autonomous entities with different positions interact within a designed system?
Most multi-agent AI research treats agents as computational units with tasks to complete. Character Arena approached the question differently: what if agents have distinct personalities, values, and positions — and the system is designed to let those interact?
The name is deliberately misleading. This is not about characters in the narrative sense. The characters are instruments for studying something else: deliberation, emergence, disagreement, and synthesis in a designed multi-agent environment.
V1 was an exploration of the basic architecture: multiple agents with distinct system prompts, a shared topic, turn-taking, and an observer mechanism. It established that the approach was viable and that distinct personalities produced meaningfully different responses to the same prompts.
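The V1 ingredients (distinct system prompts, a shared topic, turn-taking, and an observer) can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `Agent`, `Observer`, `run_round_robin`, and `canned` are hypothetical names, and the model call is replaced by a deterministic stub so the sketch runs without any API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model call:
# (system_prompt, topic, transcript so far) -> reply text.
Responder = Callable[[str, str, List[Tuple[str, str]]], str]


@dataclass
class Agent:
    name: str
    system_prompt: str  # the "personality": values and positions
    respond: Responder


@dataclass
class Observer:
    """Watches the conversation without taking part in it."""
    notes: List[str] = field(default_factory=list)

    def observe(self, speaker: str, message: str) -> None:
        self.notes.append(f"{speaker} spoke ({len(message.split())} words)")


def run_round_robin(agents: List[Agent], topic: str, rounds: int,
                    observer: Observer) -> List[Tuple[str, str]]:
    """Strict turn-taking: every agent speaks once per round, in order."""
    transcript: List[Tuple[str, str]] = []
    for _ in range(rounds):
        for agent in agents:
            reply = agent.respond(agent.system_prompt, topic, transcript)
            transcript.append((agent.name, reply))
            observer.observe(agent.name, reply)
    return transcript


def canned(stance: str) -> Responder:
    """Deterministic stub (an assumption) used in place of a real model."""
    def _respond(system_prompt, topic, transcript):
        return f"On {topic}: {stance} (turn {len(transcript) + 1})"
    return _respond


agents = [
    Agent("Skeptic", "You doubt bold claims.", canned("I want evidence")),
    Agent("Optimist", "You look for upside.", canned("I see an opportunity")),
]
observer = Observer()
transcript = run_round_robin(agents, "a news topic", rounds=2, observer=observer)
```

Keeping the observer outside the speaking order means it can record or score the exchange without influencing what the agents see.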
V2 rebuilt on more serious infrastructure. GPT-5.5 replaced the prior model, bringing the combination of reasoning depth and conversational quality that V2 needed. Supabase replaced Google Sheets for conversation memory, enabling proper persistence and query patterns. Two significant new capabilities arrived:
The most significant observation from V2 was not what any individual character said. It was what emerged from their interaction.
Positions arose in synthesis that no single character had stated. The deliberation process — the exchange of arguments, the challenge and response, the moments where one character's position forced another to refine theirs — produced conclusions that couldn't have come from any character operating alone. The system was producing something beyond its components.
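One way to make "positions no single character had stated" checkable is a set difference between the synthesis and everything the characters said. The sketch below assumes positions have already been reduced to comparable labels, which is the hard part in practice and is not shown here. `novel_positions` is a hypothetical helper, and the labels are invented placeholders, not data from the V2 conversations.

```python
from typing import Dict, Set


def novel_positions(character_positions: Dict[str, Set[str]],
                    synthesis: Set[str]) -> Set[str]:
    """Return positions in the synthesis that no character stated."""
    # Union of everything any individual character put forward.
    stated = set().union(*character_positions.values())
    return synthesis - stated


# Placeholder labels for illustration only.
character_positions = {
    "Skeptic": {"position-A"},
    "Optimist": {"position-B"},
}
synthesis = {"position-A", "position-C"}

novel = novel_positions(character_positions, synthesis)
```

Here `novel` contains only `position-C`, the one label that appears in the synthesis but in no character's stated set.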
This is a finding about multi-agent dynamics, not about AI characters. When you design an environment with structured disagreement and a mechanism for resolution, the environment generates emergent intelligence. The design of the conversation structure matters as much as the design of the individual agents.
V2 was closed in June 2026, not because it failed but because the project had gone as far as the V2 architecture could take it. The open questions for V3 require architectural changes that amount to a new system: universe-gap enrichment (researching what each character doesn't know), a deliberation pass (structured rounds before synthesis), and multi-reaction interjections (characters responding to specific claims rather than taking full turns). These required a rebuild, not an extension.
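A deliberation pass, read as "structured rounds before synthesis", might look like the following. This is a sketch under stated assumptions rather than the V3 design: `deliberation_pass`, `synthesize`, and the reviser functions are hypothetical, and each reviser stands in for a model call that would revise a position after seeing the others.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (own position, others' positions) -> revised position.
Reviser = Callable[[str, List[str]], str]


def deliberation_pass(positions: Dict[str, str],
                      revisers: Dict[str, Reviser],
                      rounds: int) -> List[Dict[str, str]]:
    """Run fixed rounds of revision; return the position history per round."""
    history = [dict(positions)]
    for _ in range(rounds):
        current = history[-1]
        revised = {}
        for name, own in current.items():
            # Everyone reacts to the same snapshot from the previous round.
            others = [p for other, p in current.items() if other != name]
            revised[name] = revisers[name](own, others)
        history.append(revised)
    return history


def synthesize(final_positions: Dict[str, str]) -> str:
    """Placeholder synthesis step: just joins the final positions."""
    return " | ".join(f"{n}: {p}" for n, p in sorted(final_positions.items()))


def stubborn(own: str, others: List[str]) -> str:
    return own  # never moves


def concede_once(own: str, others: List[str]) -> str:
    # Qualifies the position the first time it is challenged, then holds.
    if others and "(qualified)" not in own:
        return own + " (qualified)"
    return own


opening = {"A": "ban it", "B": "allow it"}
history = deliberation_pass(opening, {"A": stubborn, "B": concede_once}, rounds=2)
summary = synthesize(history[-1])
```

Updating all positions from the same snapshot keeps a round order-independent; letting agents see revisions made earlier in the same round would be a different, equally defensible design.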
V2 is archived with its conversation data and workflow JSONs. The conversations are available as a dataset for understanding multi-agent deliberation patterns.
V3 is in early design. The core questions it's built around: what happens when agents are explicitly aware of the gaps in their knowledge (universe-gap enrichment) — does the deliberation become more productive or more uncertain? What does a deliberation pass before synthesis change about the quality of conclusions? Can multi-reaction interjections — where a character responds to a specific sentence rather than a full turn — produce richer dialogue?
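Responding to a specific sentence rather than a full turn implies that reactions need an address finer than the turn. A minimal sketch of that data shape follows. `Interjection`, `split_sentences`, and `attach` are hypothetical names, the sentence splitter is deliberately naive, and the sample text is invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interjection:
    speaker: str
    turn_index: int      # which turn in the transcript
    sentence_index: int  # which sentence inside that turn
    text: str


def split_sentences(turn_text: str) -> List[str]:
    # Naive splitter (an assumption): break after ., ! or ? plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", turn_text.strip()) if s]


def attach(turn_text: str, interjections: List[Interjection],
           turn_index: int) -> List[Tuple[str, List[Interjection]]]:
    """Pair each sentence of a turn with the reactions aimed at it."""
    threaded = []
    for i, sentence in enumerate(split_sentences(turn_text)):
        reactions = [x for x in interjections
                     if x.turn_index == turn_index and x.sentence_index == i]
        threaded.append((sentence, reactions))
    return threaded


turn = "Claim one holds. Claim two follows. Claim three is open."
reactions = [
    Interjection("Skeptic", 0, 0, "Holds under what conditions?"),
    Interjection("Optimist", 0, 0, "Agreed, and it generalises."),
    Interjection("Skeptic", 0, 2, "Open is generous."),
]
threaded = attach(turn, reactions, turn_index=0)
```

Because several characters can target the same sentence, one turn can carry many reactions, which is what separates this from strict turn-taking.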
These are questions about the design of multi-agent deliberation systems, not about the characters. Character Arena's deeper subject is always the system, not the participants.
V2 stack
V3 backlog
The deeper question
When does the deliberation structure produce emergent conclusions that no individual agent could reach?