Schneider Electric partnered with the AWS Machine Learning Solutions Lab to automate their CRM account linking process using Retrieval Augmented Generation (RAG) with the Flan-T5 XXL model. The solution combines LangChain, the Google Search API, and SEC 10K data to identify and maintain up-to-date parent-subsidiary relationships between customer accounts, improving accuracy from 55% to 71% through domain-specific prompt engineering.
Schneider Electric, a global leader in digital transformation of energy management and industrial automation, faced a significant operational challenge in maintaining accurate customer relationship data across their CRM systems. As their customer base grew, new customer accounts needed to be manually linked to their proper parent entities—a process that required domain-specific knowledge and access to the most current information about corporate acquisitions, market news, and organizational restructuring. The manual nature of this process was time-consuming and struggled to keep pace with the dynamic nature of corporate relationships.
In early 2023, Schneider Electric partnered with the AWS Machine Learning Solutions Lab (MLSL) to develop an AI-powered solution that would automate significant portions of this account linking workflow. The resulting system demonstrates a practical implementation of LLMs in a production environment, addressing both the capabilities and limitations of large language models through thoughtful architectural decisions.
A fundamental limitation of LLMs is their knowledge cutoff date—the model only knows information up to the point at which it was trained. For Schneider Electric’s use case, this was a critical problem because account linking decisions often depend on recent corporate events like acquisitions. The case study provides a concrete example: the acquisition of One Medical by Amazon occurred in February 2023, which would not be captured by many LLMs trained before that date.
This limitation necessitated an architecture that could supplement the LLM’s inherent knowledge with real-time external information, leading to the adoption of a Retrieval Augmented Generation (RAG) approach.
The team selected the Flan-T5-XXL model from the Flan-T5 family, an 11-billion parameter instruction-tuned model. This choice was deliberate and reflects thoughtful consideration of the task requirements. The case study notes that for their downstream task, there was no need to accommodate vast amounts of world knowledge—rather, the model needed to perform well on question answering given a context of texts provided through search results. The instruction-tuned nature of Flan-T5 made it capable of performing various zero-shot NLP tasks without fine-tuning.
The model was deployed using Amazon SageMaker JumpStart, which provides convenient deployment options through both Amazon SageMaker Studio and the SageMaker SDK. JumpStart offers the entire Flan-T5 family (Small, Base, Large, XL, and XXL) and provides multiple versions of Flan-T5 XXL at different levels of quantization, offering flexibility in balancing model performance against computational requirements.
The deployment process is relatively straightforward, with the model being spun up as a SageMaker endpoint:
llm = SagemakerEndpoint(...)
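The snippet above elides the endpoint configuration. LangChain's SagemakerEndpoint delegates request and response shaping to a user-supplied content handler; a minimal sketch of that shaping is below. The `text_inputs` and `generated_texts` field names follow common JumpStart Flan-T5 conventions and are assumptions here, not details from the case study.

```python
import json


def transform_input(prompt: str, model_kwargs: dict) -> bytes:
    # Serialize the prompt and generation parameters into the JSON body
    # the JumpStart Flan-T5 container expects (field names assumed).
    payload = {"text_inputs": prompt, **model_kwargs}
    return json.dumps(payload).encode("utf-8")


def transform_output(output: bytes) -> str:
    # Extract the generated text from the endpoint's JSON response
    # (response schema assumed).
    response = json.loads(output.decode("utf-8"))
    return response["generated_texts"][0]
```

In a LangChain content handler, these two functions would become the `transform_input` and `transform_output` methods of the handler class passed to SagemakerEndpoint.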
LangChain was selected as the orchestration framework, described as a “popular and fast growing framework” for developing LLM-powered applications. The framework’s concept of chains—combinations of different components designed to improve LLM functionality for specific tasks—proved well-suited to the use case.
The RAG implementation consists of two core steps:
Retrieval: The system uses Google Serper API (via LangChain’s GoogleSerperAPIWrapper) to perform web searches. Given a company name, the system constructs a query like “{company} parent company” and retrieves relevant text chunks from external sources.
Augmentation: The retrieved information is combined with a prompt template and the original question, then passed to the LLM for processing. This approach ensures the model has access to the most current publicly available information about corporate relationships.
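The retrieval step can be sketched as follows. The query pattern comes from the case study; the snippet-joining helper is illustrative, standing in for how results from GoogleSerperAPIWrapper would be collapsed into a single context string, and the character limit is an assumption.

```python
def build_search_query(company: str) -> str:
    # Query pattern described in the case study: "{company} parent company".
    return f"{company} parent company"


def aggregate_snippets(snippets: list[str], max_chars: int = 2000) -> str:
    # Concatenate retrieved text chunks into one context string,
    # truncated so it fits the model's input window (limit assumed).
    return " ".join(snippets)[:max_chars]
```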
The LangChain implementation chains these components together using a custom prompt template:
my_template = """
Answer the following question using the information. \n
Question : {question}? \n
Information : {search_result} \n
Answer: """
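Filling this template is plain string substitution; LangChain's PromptTemplate performs the same interpolation that `str.format` does here. The question and search result below are illustrative values, not from the case study.

```python
my_template = """
Answer the following question using the information. \n
Question : {question}? \n
Information : {search_result} \n
Answer: """

# PromptTemplate(template=my_template, ...) would produce the same string.
prompt = my_template.format(
    question="Who is the parent company of One Medical",
    search_result="Amazon acquired One Medical in February 2023.",
)
```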
One of the more interesting LLMOps insights from this case study is the significant impact of domain-specific prompt engineering. The team discovered that a blanket prompt asking for “the parent company” performed well for most business sectors but failed to generalize to education and healthcare, where the concept of a parent company may not be meaningful.
To address this, they implemented a two-step process:
Step 1 - Domain Classification: A RAG query first determines what domain a given account belongs to using a multiple-choice question: “What is the domain of {account}?” with options including healthcare, education, oil and gas, banking, pharma, and other domains.
Step 2 - Domain-Specific Query: Based on the identified domain, the system selects an appropriate prompt template. While the case study doesn’t specify the exact alternative prompts for education and healthcare, it notes that different terminology is used to query relationships in these sectors.
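The two-step routing can be sketched as below. The domain list and the multiple-choice classification question come from the case study; the education and healthcare prompt wordings are placeholders, since the exact alternative prompts are not published.

```python
DOMAINS = ["healthcare", "education", "oil and gas", "banking", "pharma", "other"]

# Placeholder templates: the case study confirms sector-specific wording
# exists for education and healthcare but does not publish it.
PROMPTS = {
    "education": "What organization operates or governs {account}?",
    "healthcare": "What health system or network does {account} belong to?",
}
DEFAULT_PROMPT = "Who is the parent company of {account}?"


def domain_classification_question(account: str) -> str:
    # Step 1: multiple-choice RAG query to identify the sector.
    options = ", ".join(DOMAINS)
    return f"What is the domain of {account}? Choose one of: {options}."


def relationship_query(account: str, domain: str) -> str:
    # Step 2: select the sector-appropriate prompt template.
    template = PROMPTS.get(domain, DEFAULT_PROMPT)
    return template.format(account=account)
```

Keeping the templates in a plain dictionary is what makes the system extensible: Schneider Electric's team can add new sectors or reword prompts without touching the chain logic.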
The impact of this prompt engineering work was substantial: overall accuracy improved from 55% to 71%, representing a 16 percentage point improvement. The case study emphasizes that “the effort and time invested to develop effective prompts appear to significantly improve the quality of LLM response”—a valuable lesson for production LLM deployments.
Beyond web search, the solution also incorporates SEC 10K filings as an additional data source. These annual filings from publicly traded companies contain reliable information about subsidiaries and corporate structures, available through SEC EDGAR or the CorpWatch API.
For working with this tabular data, the team used LangChain’s create_pandas_dataframe_agent abstraction. The key advantage is that domain users can pose questions in natural language while the agent handles the pandas mechanics, translating each query into a dataframe operation, as demonstrated in the case study:
query = "Who is the parent of WHOLE FOODS MARKET?"
agent.run(query)
# Agent translates to a lookup like: df[df['subsidiary'] == 'WHOLE FOODS MARKET']['parent']
# Returns: AMAZON
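The pandas operation the agent generates can be reproduced directly on a toy frame. The `subsidiary`/`parent` column names and the rows are assumptions inferred from the example's output, not the actual SEC-derived table.

```python
import pandas as pd

# Toy subsidiary table standing in for the parsed SEC 10K data.
df = pd.DataFrame(
    {
        "subsidiary": ["WHOLE FOODS MARKET", "ONE MEDICAL"],
        "parent": ["AMAZON", "AMAZON"],
    }
)

# The filter the agent emits, plus the column selection that
# yields the answer string rather than a full row.
parent = df.loc[df["subsidiary"] == "WHOLE FOODS MARKET", "parent"].iloc[0]
print(parent)  # AMAZON
```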
The case study provides concrete accuracy metrics, which is valuable for understanding the real-world performance of the system. The baseline accuracy of 55% with generic prompts improved to 71% with domain-specific prompts. While 71% is not perfect, it represents a significant reduction in manual effort—the system can confidently handle a large portion of account linking decisions, with human review reserved for uncertain cases or edge situations.
The architecture is designed for scalability, leveraging managed AWS services such as SageMaker endpoints for model hosting.
The solution is positioned to enable Schneider Electric to “maintain up-to-date and accurate organizational structures of their customers, and unlock the ability to do analytics on top of this data.”
The case study notes that Schneider Electric’s team will be able to extend and design their own prompts, mimicking the way they classify public sector accounts. This extensibility is important for production systems, as business requirements evolve and domain experts identify new patterns or edge cases.
While the case study presents a well-architected solution, a few considerations merit attention:
Accuracy Ceiling: A 71% accuracy rate means nearly 30% of decisions still require human intervention or correction. For critical business processes, organizations should plan for appropriate human-in-the-loop workflows.
External API Dependencies: The reliance on Google Search API introduces external dependencies that could affect availability, cost, and consistency of results over time.
SEC Data Limitations: SEC 10K filings only cover publicly traded US companies, limiting the utility of this data source for private companies or international entities.
Prompt Maintenance: Domain-specific prompts may require ongoing maintenance as business terminology evolves or new sectors are added.
Despite these considerations, the case study demonstrates a practical, production-grade implementation of RAG for a real business problem, with measurable improvements in efficiency and accuracy. The combination of web search, structured data sources, and domain-specific prompt engineering represents a thoughtful approach to deploying LLMs in production environments where up-to-date, accurate information is essential.