Ubisoft leveraged AI21 Labs' LLM capabilities to automate tedious scriptwriting tasks and generate training data for their internal models. By implementing a writer-in-the-loop workflow for NPC dialogue generation and using AI21's models for data augmentation, they successfully scaled their content production while maintaining creative control. The solution included optimized token pricing for extensive prompt experimentation and resulted in significant efficiency gains in their game development process.
Ubisoft, the creator of major gaming franchises like Assassin’s Creed, Just Dance, and Watch Dogs, partnered with AI21 Labs to address scaling challenges in their game content production pipeline. This case study, published in March 2023, illustrates how a major gaming company integrated large language models into their creative workflow while maintaining a human-centric approach to content creation. The partnership focuses on augmenting the work of script writers rather than replacing them, with LLMs serving as tools for inspiration and efficiency rather than autonomous content generators.
The core use case centers on the creation of “bark trees” — collections of standalone NPC (non-player character) dialogue lines that require numerous variations to prevent repetitive player experiences. Each variation needs to express a shared motivation (such as hunger or hostility) in distinct ways. This is a classic example of where LLMs can provide significant value: generating diverse variations of content that follow specific patterns and constraints.
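The bark-tree pattern can be sketched as a simple prompt-construction step: given one motivation and a few example lines, ask the model for many distinct phrasings. The following is a minimal illustration only; `call_llm` is a placeholder for whatever completion API is used, and the prompt template is an assumption, not Ubisoft's actual one.

```python
# Sketch of bark-tree generation: produce many distinct phrasings of one
# NPC motivation. `call_llm` is a placeholder for any text-completion API;
# the prompt structure below is illustrative, not Ubisoft's actual template.

def build_bark_prompt(motivation: str, examples: list[str], n: int) -> str:
    shots = "\n".join(f"- {line}" for line in examples)
    return (
        f"Write {n} standalone NPC dialogue lines, each expressing the "
        f"motivation '{motivation}' in a distinct way. Keep each line short "
        f"and avoid repeating phrasing.\n"
        f"Examples:\n{shots}\nNew lines:"
    )

prompt = build_bark_prompt(
    motivation="hunger",
    examples=[
        "My stomach's been growling for hours.",
        "I'd trade my boots for a hot meal.",
    ],
    n=5,
)
# completions = call_llm(prompt)  # candidate barks, sent to a writer for review
```

The few-shot examples constrain tone and length while the instruction pushes for diversity, which is the core tension in bark generation.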
Ubisoft’s approach demonstrates several important LLMOps considerations. They chose AI21 Labs based on three key factors: alignment of vision around creator-friendly tools, the readiness of AI21’s vanilla model for their use case, and the resolution of legal compliance concerns. Notably, Ubisoft was able to use AI21’s model without extensive customization or fine-tuning for their initial data augmentation needs, which suggests careful upfront evaluation of model capabilities against requirements.
The integration was facilitated through AI21’s API, which the case study describes as “intuitive.” This API-first approach is a common pattern in LLMOps, allowing companies to leverage external LLM capabilities without managing the infrastructure complexity of hosting large models themselves.
The most technically interesting aspect of this case study is the data augmentation workflow. Ubisoft uses AI21’s LLM not as a direct content generator for games, but as a data generation tool for training their own internal models. This is a multi-stage approach: the external LLM generates candidate content at scale, human writers curate and validate those outputs, and the curated data is then used to train Ubisoft’s internal models for production use.
This architecture addresses several practical concerns. By training internal models on curated LLM outputs, Ubisoft gains models that are specifically tuned to their content needs, can be run at lower marginal cost, and maintain consistency with their creative standards. The human-in-the-loop validation step ensures quality control while still dramatically accelerating the data creation process.
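The loop described above can be sketched in a few lines. Everything here is an illustrative stand-in: the case study does not describe Ubisoft's actual interfaces, so `generate` and `review` are hypothetical hooks for the external LLM call and the writer's accept/reject decision.

```python
# Sketch of the multi-stage augmentation loop: an external LLM proposes
# candidate lines, writers accept or reject them, and accepted lines become
# training data for an internal model. All names are illustrative placeholders.

import json

def augment(motivations, generate, review):
    """generate(motivation) -> list[str]; review(line) -> bool (writer decision)."""
    dataset = []
    for motivation in motivations:
        for line in generate(motivation):
            if review(line):  # writer-in-the-loop quality gate
                dataset.append({"motivation": motivation, "text": line})
    return dataset

# Toy stand-ins so the pipeline runs end to end:
fake_generate = lambda m: [f"{m} line {i}" for i in range(3)]
fake_review = lambda line: not line.endswith("2")  # writer rejects one of three

data = augment(["hunger", "hostility"], fake_generate, fake_review)
print(json.dumps(data[0]))  # a curated record, ready for internal fine-tuning
```

The key structural point is that the expensive external model sits only in `generate`; everything downstream of the human gate is owned and controlled internally.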
The case study emphasizes a “writer-in-the-loop” philosophy rather than fully autonomous generation. This design choice reflects both practical quality requirements and organizational values around supporting human creativity. The workflow is explicitly designed to shift writers from the “writing brain” (generating from nothing) to the “editing brain” (polishing and refining), which is presented as a significant productivity improvement.
The pairwise comparison feedback mechanism mentioned for the embedded models suggests implementation of a reinforcement learning from human feedback (RLHF) or similar preference-based training approach. This continuous feedback loop allows the internal models to improve over time based on writer preferences.
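The mechanics of pairwise feedback can be shown with a toy ranking function: writers pick the better of two candidates, and aggregated wins order the candidates for preference-based training. This is a sketch of the general mechanism only; the case study does not describe Ubisoft's actual training setup, and the length-based preference below is a deliberately trivial stand-in for a human judgment.

```python
# Minimal sketch of pairwise-comparison feedback: a writer chooses the better
# of two candidate lines, and win counts rank candidates for downstream
# preference-based training. The preference function here is a toy stand-in.

from collections import Counter
from itertools import combinations

def rank_by_preference(candidates, prefer):
    """prefer(a, b) -> the chosen line. Returns candidates sorted by win count."""
    wins = Counter({c: 0 for c in candidates})
    for a, b in combinations(candidates, 2):
        wins[prefer(a, b)] += 1
    return [c for c, _ in wins.most_common()]

lines = ["I'm starving.", "Food. Now.", "Could eat a horse."]
ranked = rank_by_preference(lines, prefer=lambda a, b: min(a, b, key=len))
# the shortest line wins every comparison under this toy preference
```

In a real RLHF-style setup, these pairwise judgments would train a reward or preference model rather than produce a direct ranking, but the data collected is the same shape.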
An interesting LLMOps challenge highlighted in this case study is managing data augmentation costs when using external LLM APIs. Ubisoft’s use case involved a high input-to-output token ratio (30:1), meaning prompts with extensive context were being used to generate relatively short outputs. Since most LLM providers charge for both input and output tokens, this ratio meant most of their costs came from input tokens.
The challenge was compounded by the need to experiment with multiple prompt variations to fine-tune output quality. Standard pricing would make this experimentation prohibitively expensive. AI21 addressed this by implementing custom pricing that charged only for output tokens, enabling Ubisoft to iterate extensively on prompt engineering without linear cost scaling.
This highlights an important consideration in LLMOps: the economic model of LLM usage needs to align with the specific use case patterns. Data augmentation workloads with high prompt engineering requirements have different cost profiles than conversational use cases with more balanced input/output ratios.
A telling quote from the case study: “Before AI21 we wouldn’t do this manually — we weren’t doing it at all. It was THAT tedious.” This suggests the LLM integration didn’t just make an existing process more efficient; it enabled an entirely new capability. The ability to generate diverse, high-quality training data on demand (“an unlimited fountain of training data of whatever precise format we require”) unlocked workflows that were previously impractical.
This represents a common pattern in successful LLMOps implementations: the technology doesn’t just marginally improve existing processes but enables qualitatively different approaches to problems.
While this case study presents a positive picture of the partnership, it’s worth noting several caveats. This is a vendor case study published by AI21 Labs, so it naturally emphasizes positive outcomes. Some specific metrics that would strengthen the claims are absent, such as quantified productivity improvements, specific quality benchmarks, or details on the actual volume of content generated.
The legal compliance aspects are mentioned but not detailed. Given the ongoing discussions around AI training data, copyright, and creative works, understanding how these issues were resolved would be valuable context for other organizations considering similar implementations.
The case study also focuses heavily on the data augmentation use case but mentions future directions including building “a statistical reference model for video games” that could offer facts to players on command. However, details on these more ambitious applications are sparse.
The partnership appears oriented toward expanding use cases, with mention of building documentation systems for complex game worlds. The emphasis on LLMs as tools in a human-centric workflow rather than chatbot-style direct generation reflects a considered approach to AI integration in creative industries.
The architecture described — using external LLMs for data generation, human curation for quality control, and internal fine-tuned models for production deployment — represents a mature pattern that balances external capability leverage with internal control and customization. This multi-model approach allows organizations to benefit from large foundation models while developing specialized capabilities tailored to their specific needs.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
Smartling operates an enterprise-scale AI-first agentic translation delivery platform serving major corporations like Disney and IBM. The company addresses challenges around automation, centralization, compliance, brand consistency, and handling diverse content types across global markets. Their solution employs multi-step agentic workflows where different model functions validate each other's outputs, combining neural machine translation with large language models, RAG for accessing validated linguistic assets, sophisticated prompting, and automated post-editing for hyper-localization. The platform demonstrates measurable improvements in throughput (from 2,000 to 6,000-7,000 words per day), cost reduction (4-10x cheaper than human translation), and quality approaching 70% human parity for certain language pairs and content types, while maintaining enterprise requirements for repeatability, compliance, and brand voice consistency.
Roblox has implemented a comprehensive suite of generative AI features across their gaming platform, addressing challenges in content moderation, code assistance, and creative tools. Starting with safety features using transformer models for text and voice moderation, they expanded to developer tools including AI code assistance, material generation, and specialized texture creation. The company releases new AI features weekly, emphasizing rapid iteration and public testing, while maintaining a balance between automation and creator control. Their approach combines proprietary solutions with open-source contributions, demonstrating successful large-scale deployment of AI in a production gaming environment serving 70 million daily active users.