The LLM Pipeline
How Tapestria splits responsibility between a deterministic engine and the LLM — the architectural commitments that mean the AI can narrate the world without hallucinating the rules.
The single architectural claim that distinguishes Tapestria from every other "AI RPG" is that the LLM can narrate but cannot adjudicate. This page covers the commitments that back that claim — the layers, the contracts between them, and the audit trail that makes the claim verifiable rather than aspirational.
Two responsibilities, two layers
The most common pattern in AI-driven games is "ask the LLM what happens." Its failure modes are well-known: hallucinated stats, fudged rolls, inconsistent enforcement of rules, forgotten consequences, narrative drift. The pattern fails because it asks a single component to do two fundamentally different jobs — interpret player intent, and adjudicate the mechanical outcome.
Tapestria splits those responsibilities into two layers:
- A deterministic core owns mechanics. Dice, modifiers, skill checks, attacks, saves, conditions, effects, spells, leveling: all implemented per the D&D 5e SRD in pure code. No LLM access, no judgment. Given the same inputs, it produces the same outputs every time. The randomness is real; the result is the result.
- A narrative layer owns voice. An LLM-driven parser turns free text into a structured intent; an LLM-driven narrator turns the engine's outcome into the prose the player reads. Neither component decides anything mechanical. They communicate with the engine and with each other through strict typed contracts.
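One way to read "same inputs, same outputs" alongside real randomness is to treat the random source as an input to the core. The sketch below illustrates that reading only; it is not Tapestria's engine. `CheckResult`, `resolve_skill_check`, and the seeded `random.Random` are hypothetical choices made for the example. It applies the 5e rule that a check succeeds when the d20 roll plus modifier meets or beats the DC.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    roll: int       # the raw d20 result
    total: int      # roll plus modifier
    success: bool   # True when total meets or beats the DC


def resolve_skill_check(rng: random.Random, modifier: int, dc: int) -> CheckResult:
    """Resolve one skill check. No LLM is involved anywhere in this function."""
    roll = rng.randint(1, 20)  # inclusive on both ends
    total = roll + modifier
    return CheckResult(roll=roll, total=total, success=total >= dc)


# Assumption: the random source is passed in, so a turn can be replayed
# exactly by recreating the generator with the same seed.
first = resolve_skill_check(random.Random(42), modifier=3, dc=15)
replay = resolve_skill_check(random.Random(42), modifier=3, dc=15)
```

Under this assumption `first` and `replay` are equal, which is what makes a turn reproducible in a test or an audit. Live play would presumably draw its seed from a real entropy source.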
The narrative layer in more detail
The narrative layer has three roles, each with its own LLM call:
- The parser turns player text into a structured intent the engine can act on. It is the only component that handles free-form input. Its output is server-bound (it can only reference entities that exist in the current scene) and strictly typed, so the model can't fabricate fields or actions that aren't in the catalog. The parser also decides whether the action implies a skill check, and which one, against a set of options the engine has surfaced as available for the scene.
- The narrator turns the engine's outcome into prose. It can color how something is described but cannot change what happened. The result of the turn is decided before the narrator writes a word.
- The impersonator speaks for an NPC when the player addresses one. It runs as part of the narrator stage but with hermetic context: it sees the world from that NPC's perspective, not the scene's omniscient context. The narrator then composes the NPC's response into the surrounding prose without re-authoring it. This is what makes theory-of-mind an architectural property of the system, not a prompt instruction: an NPC's response can only draw on what that NPC has access to.
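A minimal sketch of hermetic context, assuming each scene fact is tagged with the NPCs who know it. `Fact`, `known_by`, and `npc_context` are hypothetical names; the page does not describe how Tapestria stores NPC knowledge. The idea shown is that filtering happens before the prompt is assembled, so the model never receives what the NPC could not know.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Fact:
    text: str
    known_by: FrozenSet[str]  # hypothetical: ids of the NPCs who know this fact


def npc_context(npc_id: str, scene_facts: List[Fact]) -> List[str]:
    """Return only the facts this NPC has access to, in their original order."""
    return [fact.text for fact in scene_facts if npc_id in fact.known_by]


facts = [
    Fact("The vault key is under the floorboard.", frozenset({"innkeeper"})),
    Fact("A stranger arrived at dusk.", frozenset({"innkeeper", "guard"})),
]

# The guard's impersonator call would be built from this list alone.
guard_view = npc_context("guard", facts)
```

Here `guard_view` holds only the fact about the stranger, so no prompt wording could make the guard reveal where the key is.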
NPCs in the scene who aren't addressed stay silent for the turn. They witness what happens in their surroundings, which can shape their later behavior, but they don't speak unprompted.
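The parser's server-bound, strictly typed output can be pictured as a binding step that runs on the server after the model responds. The sketch below makes several assumptions: the field names, the `ACTION_CATALOG` contents, and `bind_intent` are all hypothetical, and the real contract may look quite different.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical action catalog; the real one is not documented on this page.
ACTION_CATALOG = frozenset({"attack", "talk", "move", "use_item"})
ALLOWED_FIELDS = frozenset({"action", "target_id", "skill"})


@dataclass(frozen=True)
class Intent:
    action: str
    target_id: Optional[str] = None
    skill: Optional[str] = None


def bind_intent(raw: dict, scene_entity_ids: Set[str], available_skills: Set[str]) -> Intent:
    """Turn raw parser output into an Intent, rejecting anything off-catalog."""
    extra = set(raw) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"unknown fields: {sorted(extra)}")
    action = raw.get("action")
    if action not in ACTION_CATALOG:
        raise ValueError(f"action not in catalog: {action!r}")
    target_id = raw.get("target_id")
    if target_id is not None and target_id not in scene_entity_ids:
        raise ValueError(f"entity not in scene: {target_id!r}")
    skill = raw.get("skill")
    if skill is not None and skill not in available_skills:
        raise ValueError(f"skill not offered for this scene: {skill!r}")
    return Intent(action=action, target_id=target_id, skill=skill)


intent = bind_intent(
    {"action": "talk", "target_id": "npc_guard", "skill": "persuasion"},
    scene_entity_ids={"npc_guard", "door_1"},
    available_skills={"persuasion", "stealth"},
)
```

A response that names an entity outside the scene, or that smuggles in a field such as `damage`, fails at this step and never reaches the engine.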
How mutations get committed
The narrator (and the impersonator) can propose changes to the world: an NPC's disposition shifts, an item changes hands, a quest advances, a new event is recorded. They don't write those changes directly. They emit proposals — typed entries from a fixed catalog — and a validation layer commits or rejects each one.
The validation layer checks three things: that the proposed mutation references entities that actually exist and were in scope for the call, that the change is allowed for the call type that proposed it, and that the underlying state hasn't shifted since the call read it. Mutations that fail validation are dropped; the call can be re-prompted with the rejection reasons. The narrator cannot force a change through simply by asserting it; a proposal that fails validation never reaches the world state.
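The three checks can be sketched as a single function that returns rejection reasons. Everything named here is an assumption for illustration: the `Proposal` fields, the per-entity version counter used to detect stale reads, and the contents of `ALLOWED_KINDS`.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical allow-lists: which mutation kinds each call type may propose.
ALLOWED_KINDS: Dict[str, Set[str]] = {
    "narrator": {"shift_disposition", "transfer_item", "advance_quest", "record_event"},
    "impersonator": {"shift_disposition"},
}


@dataclass(frozen=True)
class Proposal:
    kind: str
    entity_id: str
    read_version: int  # version of the entity that the LLM call was shown


def validate(
    proposal: Proposal,
    call_type: str,
    scope_ids: Set[str],
    world_versions: Dict[str, int],
) -> List[str]:
    """Return rejection reasons; an empty list means the proposal may commit."""
    reasons: List[str] = []
    if proposal.entity_id not in world_versions:
        reasons.append("entity does not exist")
    elif proposal.entity_id not in scope_ids:
        reasons.append("entity was not in scope for this call")
    elif world_versions[proposal.entity_id] != proposal.read_version:
        reasons.append("state changed since the call read it")
    if proposal.kind not in ALLOWED_KINDS.get(call_type, set()):
        reasons.append(f"{proposal.kind} is not allowed for {call_type} calls")
    return reasons


world = {"npc_guard": 3, "quest_1": 1}
accepted = validate(Proposal("shift_disposition", "npc_guard", 3), "narrator", {"npc_guard"}, world)
stale = validate(Proposal("shift_disposition", "npc_guard", 2), "narrator", {"npc_guard"}, world)
```

In this sketch `accepted` is empty and `stale` carries one reason. Returning reasons instead of raising makes it straightforward to feed them back into a re-prompt.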
Every committed change lands in an append-only event log alongside the call that proposed it. The log is the audit trail — at any point you can ask "what changed?" or "what proposed this?" and the answers are concrete.
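A minimal in-memory sketch of such a log, assuming each event records the id and type of the call that proposed it. `Event`, `EventLog`, and the query method names are hypothetical; a real log would presumably live in durable storage.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    seq: int         # position in the log, assigned on append
    call_id: str     # the LLM call that proposed the change
    call_type: str
    kind: str        # mutation kind from the catalog
    entity_id: str


class EventLog:
    """Append-only: events are immutable and there is no update or delete."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, call_id: str, call_type: str, kind: str, entity_id: str) -> Event:
        event = Event(len(self._events), call_id, call_type, kind, entity_id)
        self._events.append(event)
        return event

    def changes_to(self, entity_id: str) -> Tuple[Event, ...]:
        """Answer "what changed?" for one entity."""
        return tuple(e for e in self._events if e.entity_id == entity_id)

    def proposed_by(self, call_id: str) -> Tuple[Event, ...]:
        """Answer "what did this call propose that was committed?"."""
        return tuple(e for e in self._events if e.call_id == call_id)


log = EventLog()
log.append("call_17", "narrator", "shift_disposition", "npc_guard")
log.append("call_17", "narrator", "transfer_item", "item_key")
log.append("call_18", "impersonator", "shift_disposition", "npc_guard")

guard_history = log.changes_to("npc_guard")
```

Here `guard_history` holds two events, one from each call, in the order they were committed.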
When a quest ends
Most turns are local — a single scene, a single resolution. Some turns terminate a quest, and quest termination has consequences that have to outlive the session.
When a quest completes or fails, a separate aftermath call runs off the hot path, so it does not block the turn. It authors the durable, cross-session record of what the quest did: shifts to NPC dispositions, relocations for the NPCs the quest moved, new world events anchored to the locations where they happened, knowledge updates that propagate through the social fabric. The narrator stays focused on the immediate turn; aftermath owns the world's memory of what just happened.
Aftermath mutations flow through the same validation layer as everything else. The call has its own allow-list of what it can change — it can mark a quest complete; the narrator cannot.
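Running aftermath off the hot path can be sketched with a background task. This is an assumption about scheduling only: the page does not say how Tapestria dispatches the call, and `run_aftermath`, `resolve_turn`, and the in-memory `log` are hypothetical stand-ins.

```python
import asyncio
from typing import List


async def run_aftermath(quest_id: str, log: List[str]) -> None:
    await asyncio.sleep(0)  # stand-in for the slow aftermath LLM call
    log.append(f"aftermath:{quest_id}")


async def resolve_turn(quest_id: str, log: List[str], background: set) -> str:
    log.append("narration")
    # Schedule aftermath without awaiting it, so the turn returns immediately.
    task = asyncio.create_task(run_aftermath(quest_id, log))
    background.add(task)  # keep a strong reference until the task finishes
    task.add_done_callback(background.discard)
    return "The dragon falls."


async def main():
    log: List[str] = []
    background: set = set()
    prose = await resolve_turn("quest_1", log, background)
    log_at_return = list(log)  # what exists when the player sees the prose
    await asyncio.gather(*background)  # e.g. at shutdown or before the next read
    return prose, log_at_return, log


prose, log_at_return, final_log = asyncio.run(main())
```

When the prose is returned, `log_at_return` contains only the narration entry; the aftermath entry appears once the background task has been given time to run.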
Why the claim is verifiable
"AI can't fudge the rules" is the kind of claim that's easy to assert and hard to back up. The commitments that make it auditable:
- Layer boundary. Mechanics code cannot call LLM code; the constraint is enforced at build time, not by convention.
- Typed contracts. Every LLM call has a strict response schema. Mechanical outcomes are not in any of those schemas — they only enter the system from the deterministic engine.
- Mutation catalog. Every world-state change has a typed entry. New mutation kinds require a code change and a code review. The LLM cannot invent them.
- Audit trail. Every change is in the event log with its origin recorded. The history is concrete and queryable.
The combination is what we'd point at if someone asked "how do you know the AI didn't make it up?"
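The layer-boundary commitment is the one most easily checked by a machine. The page does not say how the build enforces it; one plausible approach, sketched here as an assumption, is a build step that parses every mechanics module and fails if it imports from the LLM package. The package name `narrative` and the function `forbidden_imports` are hypothetical.

```python
import ast
from typing import List

# Hypothetical name of the package that holds all LLM-facing code.
FORBIDDEN_PREFIXES = ("narrative",)


def forbidden_imports(source: str) -> List[str]:
    """Return every imported module name that falls under a forbidden package."""
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for name in names:
            if any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES):
                found.append(name)
    return found


clean_module = "import random\nimport math\n"
leaky_module = "import random\nfrom narrative.client import ask\n"

violations = forbidden_imports(leaky_module)
```

A build script could run this over the mechanics source tree and exit non-zero when any list is non-empty. A static scan like this does not catch dynamic imports, so it is a sketch of the idea and not a complete guarantee.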
What this doesn't solve
A few honest caveats:
- Narrative coherence over many sessions. A single turn the narrator can write well; a 50-session campaign arc is a harder problem. Hierarchical summaries and projected context are the current shape of our answer, and they evolve.
- SRD edge cases. The D&D 5e SRD has corners. We've covered what comes up at low-to-mid levels; high-level spell interactions and ruleset edges still surprise us. The engine's test suite is the ground truth.
- Latency. Multiple sequential LLM calls cost more time than one. We cache where we can, use the fastest viable model per call, and parallelize where possible — but a turn is seconds, not milliseconds. Aftermath fires off the hot path and doesn't block the player. The cost is the price of the architectural choice; we don't pretend otherwise.
Read on
- The Architecture overview has the system-level picture.
- Gameplay as Graph Mutation explains the design lens — world as graph, gameplay as mutation, skills and spells as atomic operators — that the pipeline implements.
- The Mechanics overview is the player-facing version of the loop.