Ubisoft leveraged AI21 Labs' LLM capabilities to automate tedious scriptwriting tasks and generate training data for their internal models. By implementing a writer-in-the-loop workflow for NPC dialogue generation and using AI21's models for data augmentation, they successfully scaled their content production while maintaining creative control. The solution included optimized token pricing for extensive prompt experimentation and resulted in significant efficiency gains in their game development process.
Ubisoft, the creator of major gaming franchises like Assassin’s Creed, Just Dance, and Watch Dogs, partnered with AI21 Labs to address scaling challenges in their game content production pipeline. This case study, published in March 2023, illustrates how a major gaming company integrated large language models into their creative workflow while maintaining a human-centric approach to content creation. The partnership focuses on augmenting the work of script writers rather than replacing them, with LLMs serving as tools for inspiration and efficiency rather than autonomous content generators.
The core use case centers on the creation of “bark trees” — collections of standalone NPC (non-player character) dialogue lines that require numerous variations to prevent repetitive player experiences. Each variation needs to express a shared motivation (such as hunger or hostility) in distinct ways. This is a classic example of where LLMs can provide significant value: generating diverse variations of content that follow specific patterns and constraints.
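The bark-tree pattern can be sketched as a simple prompt-construction step: given one motivation and a few example lines, ask the model for many distinct phrasings. The following is a minimal illustration only; `call_llm` is a placeholder for whatever completion API is used, and the prompt template is an assumption, not Ubisoft's actual one.

```python
# Sketch of bark-tree generation: produce many distinct phrasings of one
# NPC motivation. `call_llm` is a placeholder for any text-completion API;
# the prompt structure below is illustrative, not Ubisoft's actual template.

def build_bark_prompt(motivation: str, examples: list[str], n: int) -> str:
    shots = "\n".join(f"- {line}" for line in examples)
    return (
        f"Write {n} standalone NPC dialogue lines, each expressing the "
        f"motivation '{motivation}' in a distinct way. Keep each line short "
        f"and avoid repeating phrasing.\n"
        f"Examples:\n{shots}\nNew lines:"
    )

prompt = build_bark_prompt(
    motivation="hunger",
    examples=[
        "My stomach's been growling for hours.",
        "I'd trade my boots for a hot meal.",
    ],
    n=5,
)
# completions = call_llm(prompt)  # candidate barks, sent to a writer for review
```

The few-shot examples constrain tone and length while the instruction pushes for diversity, which is the core tension in bark generation.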
Ubisoft’s approach demonstrates several important LLMOps considerations. They chose AI21 Labs based on three key factors: alignment of vision around creator-friendly tools, the readiness of AI21’s vanilla model for their use case, and the resolution of legal compliance concerns. Notably, Ubisoft was able to use AI21’s model without extensive customization or fine-tuning for their initial data augmentation needs, which suggests careful upfront evaluation of model capabilities against requirements.
The integration was facilitated through AI21’s API, which the case study describes as “intuitive.” This API-first approach is a common pattern in LLMOps, allowing companies to leverage external LLM capabilities without managing the infrastructure complexity of hosting large models themselves.
The most technically interesting aspect of this case study is the data augmentation workflow. Ubisoft uses AI21’s LLM not as a direct content generator for games, but as a data generation tool for training their own internal models. This is a multi-stage approach: the external LLM generates candidate content at scale, human writers curate and validate those outputs, and the curated data is then used to train Ubisoft’s internal models for production use.
This architecture addresses several practical concerns. By training internal models on curated LLM outputs, Ubisoft gains models that are specifically tuned to their content needs, can be run at lower marginal cost, and maintain consistency with their creative standards. The human-in-the-loop validation step ensures quality control while still dramatically accelerating the data creation process.
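The loop described above can be sketched in a few lines. Everything here is an illustrative stand-in: the case study does not describe Ubisoft's actual interfaces, so `generate` and `review` are hypothetical hooks for the external LLM call and the writer's accept/reject decision.

```python
# Sketch of the multi-stage augmentation loop: an external LLM proposes
# candidate lines, writers accept or reject them, and accepted lines become
# training data for an internal model. All names are illustrative placeholders.

import json

def augment(motivations, generate, review):
    """generate(motivation) -> list[str]; review(line) -> bool (writer decision)."""
    dataset = []
    for motivation in motivations:
        for line in generate(motivation):
            if review(line):  # writer-in-the-loop quality gate
                dataset.append({"motivation": motivation, "text": line})
    return dataset

# Toy stand-ins so the pipeline runs end to end:
fake_generate = lambda m: [f"{m} line {i}" for i in range(3)]
fake_review = lambda line: not line.endswith("2")  # writer rejects one of three

data = augment(["hunger", "hostility"], fake_generate, fake_review)
print(json.dumps(data[0]))  # a curated record, ready for internal fine-tuning
```

The key structural point is that the expensive external model sits only in `generate`; everything downstream of the human gate is owned and controlled internally.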
The case study emphasizes a “writer-in-the-loop” philosophy rather than fully autonomous generation. This design choice reflects both practical quality requirements and organizational values around supporting human creativity. The workflow is explicitly designed to shift writers from the “writing brain” (generating from nothing) to the “editing brain” (polishing and refining), which is presented as a significant productivity improvement.
The pairwise comparison feedback mechanism mentioned for the embedded models suggests implementation of a reinforcement learning from human feedback (RLHF) or similar preference-based training approach. This continuous feedback loop allows the internal models to improve over time based on writer preferences.
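The mechanics of pairwise feedback can be shown with a toy ranking function: writers pick the better of two candidates, and aggregated wins order the candidates for preference-based training. This is a sketch of the general mechanism only; the case study does not describe Ubisoft's actual training setup, and the length-based preference below is a deliberately trivial stand-in for a human judgment.

```python
# Minimal sketch of pairwise-comparison feedback: a writer chooses the better
# of two candidate lines, and win counts rank candidates for downstream
# preference-based training. The preference function here is a toy stand-in.

from collections import Counter
from itertools import combinations

def rank_by_preference(candidates, prefer):
    """prefer(a, b) -> the chosen line. Returns candidates sorted by win count."""
    wins = Counter({c: 0 for c in candidates})
    for a, b in combinations(candidates, 2):
        wins[prefer(a, b)] += 1
    return [c for c, _ in wins.most_common()]

lines = ["I'm starving.", "Food. Now.", "Could eat a horse."]
ranked = rank_by_preference(lines, prefer=lambda a, b: min(a, b, key=len))
# the shortest line wins every comparison under this toy preference
```

In a real RLHF-style setup, these pairwise judgments would train a reward or preference model rather than produce a direct ranking, but the data collected is the same shape.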
An interesting LLMOps challenge highlighted in this case study is managing data augmentation costs when using external LLM APIs. Ubisoft’s use case involved a high input-to-output token ratio (30:1), meaning prompts with extensive context were being used to generate relatively short outputs. Since most LLM providers charge for both input and output tokens, this ratio meant most of their costs came from input tokens.
The challenge was compounded by the need to experiment with multiple prompt variations to fine-tune output quality. Standard pricing would make this experimentation prohibitively expensive. AI21 addressed this by implementing custom pricing that charged only for output tokens, enabling Ubisoft to iterate extensively on prompt engineering without linear cost scaling.
This highlights an important consideration in LLMOps: the economic model of LLM usage needs to align with the specific use case patterns. Data augmentation workloads with high prompt engineering requirements have different cost profiles than conversational use cases with more balanced input/output ratios.
A telling quote from the case study: “Before AI21 we wouldn’t do this manually — we weren’t doing it at all. It was THAT tedious.” This suggests the LLM integration didn’t just make an existing process more efficient; it enabled an entirely new capability. The ability to generate diverse, high-quality training data on demand (“an unlimited fountain of training data of whatever precise format we require”) unlocked workflows that were previously impractical.
This represents a common pattern in successful LLMOps implementations: the technology doesn’t just marginally improve existing processes but enables qualitatively different approaches to problems.
While this case study presents a positive picture of the partnership, it’s worth noting several caveats. This is a vendor case study published by AI21 Labs, so it naturally emphasizes positive outcomes. Some specific metrics that would strengthen the claims are absent, such as quantified productivity improvements, specific quality benchmarks, or details on the actual volume of content generated.
The legal compliance aspects are mentioned but not detailed. Given the ongoing discussions around AI training data, copyright, and creative works, understanding how these issues were resolved would be valuable context for other organizations considering similar implementations.
The case study also focuses heavily on the data augmentation use case but mentions future directions including building “a statistical reference model for video games” that could offer facts to players on command. However, details on these more ambitious applications are sparse.
The partnership appears oriented toward expanding use cases, with mention of building documentation systems for complex game worlds. The emphasis on LLMs as tools in a human-centric workflow rather than chatbot-style direct generation reflects a considered approach to AI integration in creative industries.
The architecture described — using external LLMs for data generation, human curation for quality control, and internal fine-tuned models for production deployment — represents a mature pattern that balances external capability leverage with internal control and customization. This multi-model approach allows organizations to benefit from large foundation models while developing specialized capabilities tailored to their specific needs.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
Smartling operates an enterprise-scale AI-first agentic translation delivery platform serving major corporations like Disney and IBM. The company addresses challenges around automation, centralization, compliance, brand consistency, and handling diverse content types across global markets. Their solution employs multi-step agentic workflows where different model functions validate each other's outputs, combining neural machine translation with large language models, RAG for accessing validated linguistic assets, sophisticated prompting, and automated post-editing for hyper-localization. The platform demonstrates measurable improvements in throughput (from 2,000 to 6,000-7,000 words per day), cost reduction (4-10x cheaper than human translation), and quality approaching 70% human parity for certain language pairs and content types, while maintaining enterprise requirements for repeatability, compliance, and brand voice consistency.
Roblox has implemented a comprehensive suite of generative AI features across their gaming platform, addressing challenges in content moderation, code assistance, and creative tools. Starting with safety features using transformer models for text and voice moderation, they expanded to developer tools including AI code assistance, material generation, and specialized texture creation. The company releases new AI features weekly, emphasizing rapid iteration and public testing, while maintaining a balance between automation and creator control. Their approach combines proprietary solutions with open-source contributions, demonstrating successful large-scale deployment of AI in a production gaming environment serving 70 million daily active users.