Rakuten Group leveraged LangChain and LangSmith to build and deploy multiple AI applications for both their business clients and employees. They developed Rakuten AI for Business, a comprehensive AI platform that includes tools like AI Analyst for market intelligence, AI Agent for customer support, and AI Librarian for documentation management. The team also created an employee-focused chatbot platform using the OpenGPTs package, achieving rapid development and deployment while maintaining enterprise-grade security and scalability.
Rakuten is a major Japanese technology conglomerate best known for operating one of the largest online shopping malls in Japan. The company spans over 70 businesses across e-commerce, travel, digital content, fintech, and telecommunications. This case study describes how Rakuten’s AI team built a suite of LLM-powered applications to serve both their business clients (merchants on their platform) and their internal workforce of 32,000 employees.
The initiative centers on “Rakuten AI for Business,” a comprehensive AI platform designed to support business clients with essential operations including market analysis and customer support. The platform aims to enhance productivity across sales, marketing, and IT support functions. The case study was published in February 2024 and describes work that began with LangChain adoption in January 2023.
Rakuten faces the dual challenge of supporting a massive ecosystem of business clients on their e-commerce platform while also empowering their substantial internal workforce. When business clients onboard to the Rakuten marketplace, they receive support from dedicated onboarding consultants and ongoing assistance. The AI team identified opportunities to augment this human support with LLM-powered tools, with a stated goal of improving productivity by 20%.
On the internal side, with 32,000 employees (“Rakutenians”) across 70+ businesses, knowledge management and employee enablement present significant challenges at scale. The company sought to democratize AI access across the entire employee base rather than limiting it to select teams.
Rakuten developed several distinct AI products using LangChain and LangSmith:
Rakuten AI Analyst serves as a research assistant providing market intelligence to business clients. It delivers business insights backed by relevant data and charts, helping merchants understand market trends and make data-driven decisions.
Rakuten AI Agent functions as a customer support automation tool, enabling self-serve support for clients with questions about listing and transacting on the marketplace. This represents a classic use case for reducing support ticket volume and improving response times.
Rakuten AI Librarian is a document summarization and Q&A system that processes client documentation to answer questions from end customers and prospects in real time. This appears to be a RAG (Retrieval-Augmented Generation) implementation that ingests client knowledge bases.
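The case study does not describe AI Librarian's internals, but the RAG pattern it suggests can be sketched in a few lines. This is an illustrative stand-in, not Rakuten's code: a word-overlap score substitutes for the vector-embedding similarity a production system would use, and the sample documents and function names are hypothetical.

```python
# Minimal RAG sketch (hypothetical): retrieve the most relevant client-document
# chunks for a question, then assemble a grounded prompt for the LLM.
# A real deployment would use embeddings and a vector store; word overlap
# stands in for semantic similarity here.

def score(question: str, chunk: str) -> float:
    """Crude relevance score: fraction of question words found in the chunk."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / max(len(q_words), 1)

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the top-k chunks ranked by relevance score."""
    return sorted(chunks, key=lambda c: score(question, c), reverse=True)[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Constrain the LLM to answer only from the retrieved documentation."""
    joined = "\n---\n".join(context)
    return f"Answer using only this documentation:\n{joined}\n\nQuestion: {question}"

# Hypothetical marketplace documentation chunks.
docs = [
    "Listing fees are charged monthly per active product listing.",
    "Returns must be processed within 14 days of delivery.",
    "Sellers can enable international shipping in account settings.",
]
question = "How are listing fees charged?"
prompt = build_prompt(question, retrieve(question, docs))
```

The key design point is that the LLM never answers from its own pretrained knowledge alone: every answer is anchored to retrieved passages from the client's knowledge base.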
Internal Employee Chatbot Platform leverages LangChain’s OpenGPTs package to enable teams to build their own chatbots over internal documentation. Notably, the case study claims this was built by just three engineers in one week, demonstrating the rapid development capabilities enabled by the framework.
The LLMOps aspects of this case study are particularly instructive for understanding how large enterprises operationalize LLM applications:
Rakuten was an early LangChain adopter, beginning in January 2023. The framework provided “common, successful interaction patterns for building with LLMs,” and the off-the-shelf chain and agent architectures enabled rapid iteration. This early adoption gave them significant experience with the evolving LLM development landscape.
As Rakuten moved from prototyping to production scale, they adopted LangSmith to “harden their work and provide visibility into what’s happening and why.” This is a common pattern in LLMOps where observability becomes critical as applications move beyond development into production with real users.
General Manager Yusuke Kaji highlighted a key organizational challenge: “At a large company, usually multiple teams develop their ideas independently. Some teams find good approaches, while others don’t.” LangSmith Hub addresses this by enabling distribution of best prompts across teams, promoting collaboration rather than siloed development. This is particularly relevant for an organization with 70+ businesses where redundant effort is a real risk.
The case study emphasizes a scientific approach to LLM development. Kaji states: “By using LangSmith Testing and Eval with our custom evaluation metrics, we can run experiments on multiple approaches (models, cognitive architecture, etc.) and measure the results.” The ability to define custom evaluation metrics is important because generic metrics often fail to capture domain-specific quality requirements.
The experimentation capability allows the team to systematically compare different models, prompt strategies, and cognitive architectures rather than relying on intuition. This is essential for enterprise LLMOps where decisions need to be justifiable and reproducible.
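The shape of such an experiment can be sketched in plain Python. The dict shape returned by each metric (`key`/`score`) mirrors what LangSmith custom evaluators return, but the metrics themselves, the dataset, and the tiny experiment runner below are hypothetical stand-ins, not Rakuten's actual evaluation suite.

```python
# Hypothetical custom evaluation metrics and a minimal experiment runner.

def cites_sources(outputs: dict, reference: dict) -> dict:
    """Domain metric: did the answer name the expected data source?"""
    cited = reference["expected_source"] in outputs["answer"]
    return {"key": "cites_sources", "score": 1.0 if cited else 0.0}

def within_length(outputs: dict, reference: dict) -> dict:
    """Domain metric: merchants want concise answers (<= 50 words)."""
    ok = len(outputs["answer"].split()) <= 50
    return {"key": "within_length", "score": 1.0 if ok else 0.0}

def run_experiment(predict, dataset, evaluators) -> dict:
    """Average each metric over a dataset, as an evaluation run would."""
    totals: dict[str, list[float]] = {}
    for example in dataset:
        outputs = predict(example["inputs"])
        for evaluator in evaluators:
            result = evaluator(outputs, example["reference"])
            totals.setdefault(result["key"], []).append(result["score"])
    return {key: sum(vals) / len(vals) for key, vals in totals.items()}

# Compare a candidate approach against the same dataset and metrics.
dataset = [
    {"inputs": "Q1", "reference": {"expected_source": "sales_db"}},
    {"inputs": "Q2", "reference": {"expected_source": "market_report"}},
]
def candidate(question: str) -> dict:
    return {"answer": "Per sales_db and market_report: demand is up."}

scores = run_experiment(candidate, dataset, [cites_sources, within_length])
```

Running two candidate approaches (different models, prompts, or cognitive architectures) through the same `run_experiment` call yields directly comparable scorecards, which is what makes model and architecture decisions reproducible rather than intuition-driven.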
For the OpenGPTs-based employee platform, Rakuten valued “maximal flexibility and control on designing the cognitive architecture and user experience.” At a scale of 32,000 employees, cost/performance tradeoffs become critical business decisions. LangChain’s abstraction layer enables the team to swap models or providers without major refactoring, which is an important consideration for enterprise AI procurement.
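The abstraction-layer benefit can be illustrated with a minimal sketch. The provider classes below are stubs standing in for real model integrations (in LangChain, the chat-model interface plays this role); the names and the employee-question helper are hypothetical.

```python
# Hypothetical sketch of the model-abstraction pattern: application code
# depends only on a minimal invoke() interface, so swapping providers is a
# one-line change rather than a refactor.
from typing import Protocol

class ChatModel(Protocol):
    def invoke(self, prompt: str) -> str: ...

class ProviderA:
    """Stub for one hosted model provider."""
    def invoke(self, prompt: str) -> str:
        return f"[provider-a] {prompt}"

class ProviderB:
    """Stub for a cheaper or faster alternative provider."""
    def invoke(self, prompt: str) -> str:
        return f"[provider-b] {prompt}"

def answer_employee_question(model: ChatModel, question: str) -> str:
    """Application logic is written once, against the interface."""
    return model.invoke(f"Answer for an internal employee: {question}")

# Swapping providers changes only the object passed in:
reply_a = answer_employee_question(ProviderA(), "How do I reset my VPN?")
reply_b = answer_employee_question(ProviderB(), "How do I reset my VPN?")
```

At 32,000-employee scale this matters because cost/performance tradeoffs shift as providers update pricing and models: the team can re-run evaluations against a new provider and switch without touching application logic.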
LangSmith provides enterprise-grade assurances that Rakuten requires. Specifically mentioned are: data staying within Rakuten’s environment, and separation of access between development and production workflows. This environment isolation is crucial for organizations handling sensitive business client data and maintaining compliance standards.
While this case study presents an impressive scope of LLM deployment, several observations warrant balanced consideration:
The case study originates from LangChain’s own blog, so it naturally emphasizes the benefits of their tools. The 20% productivity improvement goal is mentioned as a target rather than an achieved result, and specific metrics on actual performance improvements are not provided.
The claim that three engineers built the employee chatbot platform in one week is notable but should be understood in context—OpenGPTs provides significant scaffolding, and the one-week timeline likely refers to initial deployment rather than full production hardening with all the evaluation, monitoring, and edge case handling that enterprise systems require.
The case study doesn’t detail specific challenges encountered, failure modes, or lessons learned during development, which would provide more practical insights for other practitioners.
The planned rollout to 32,000 employees represents significant scale for an internal AI platform. The distribution of Rakuten AI for Business across merchants, hotels, retail stores, and local economies suggests ambitions for widespread adoption across their marketplace ecosystem.
The organizational structure—with a dedicated AI Team, Data Science & ML team, and AI for Business division—indicates substantial investment in AI capabilities beyond just technology adoption.
This case study demonstrates a large enterprise’s approach to deploying LLM applications across both B2B customer-facing and internal employee scenarios. The emphasis on prompt management, systematic evaluation, and enterprise security reflects mature LLMOps practices. The multi-product strategy (Analyst, Agent, Librarian) shows how different LLM architectures can address distinct use cases within a unified platform approach. While the specific quantitative results are not detailed, the technical architecture decisions and organizational considerations provide valuable reference points for similar enterprise LLM deployments.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variation across frontier models, ranging from single-digit to roughly 80% accuracy, with notable error modes including tool-use failures (36% of conversations) and hallucinations driven by pretrained domain knowledge; OpenAI models in particular hallucinated non-existent insurance products 15-45% of the time.
Digits, a company providing automated accounting services for startups and small businesses, implemented production-scale LLM agents to handle complex workflows including vendor hydration, client onboarding, and natural language queries about financial books. The company evolved from a simple 200-line agent implementation to a sophisticated production system incorporating LLM proxies, memory services, guardrails, observability tooling (Phoenix from Arize), and API-based tool integration using Kotlin and Golang backends. Their agents achieve a 96% acceptance rate on classification tasks with only 3% requiring human review, handling approximately 90% of requests asynchronously and 10% synchronously through a chat interface.
Rippling, an enterprise platform providing HR, payroll, IT, and finance solutions, has evolved its AI strategy from simple content summarization to building complex production agents that assist administrators and employees across their entire platform. Led by Anker, their head of AI, the company has developed agents that handle payroll troubleshooting, sales briefing automation, interview transcript summarization, and talent performance calibration. They've transitioned from deterministic workflow-based approaches to more flexible deep agent paradigms, leveraging LangChain and LangSmith for development and tracing. The company maintains a dual focus: embedding AI capabilities within their product for customers running businesses on their platform, and deploying AI internally to increase productivity across all teams. Early results show promise in handling complex, context-dependent queries that traditional rule-based systems couldn't address.