ZenML

Custom RAG Implementation for Enterprise Technology Research and Knowledge Management

Trace3 2024

Trace3's Innovation Team developed Innovation-GPT, a custom solution to streamline their technology research and knowledge management processes. The system uses LLMs and RAG architecture to automate the collection and analysis of data about enterprise technology companies, combining web scraping, structured data generation, and natural language querying capabilities. The solution addresses the challenges of managing large volumes of company research data while maintaining human oversight for quality control.

Industry

Consulting

Overview

Trace3, a technology consulting firm, developed an internal generative AI solution called “Innovation-GPT” to address operational inefficiencies within their Innovation Team. The team is responsible for tracking emerging enterprise technology solutions and funding events across the market—a task that involves substantial manual research and knowledge retrieval challenges. This case study, published in September 2024, provides a practical example of how organizations can build custom LLM-powered solutions tailored to their specific business processes rather than relying solely on generic third-party AI implementations.

The case study is authored by Justin “Hutch” Hutchens, an Innovation Principal at Trace3, who frames the discussion within the broader context of organizations struggling to operationalize GenAI beyond surface-level adoption. The narrative emphasizes that while enabling generic AI features in existing tools can improve efficiency, the true potential of GenAI lies in custom implementations designed around unique business challenges.

Problem Statement

The Trace3 Innovation Team faced two interconnected challenges:

The first challenge was research efficiency. When new funding events occur in the enterprise technology space, the team manually researches each company by combing through company websites, press releases, and third-party sources. This process is time-consuming and repetitive, creating a bottleneck in their ability to track the rapidly evolving technology landscape.

The second challenge was knowledge management and retrieval. As the number of companies and solutions being tracked grows, the accumulated information becomes overwhelming. Team members frequently need to sift through notes and documents to find solutions matching specific use cases or customer inquiries. This retrieval process is inefficient and relies heavily on individual memory and ad-hoc searching.

Technical Architecture

The Innovation-GPT solution addresses both challenges through a two-phase architecture that combines automated research with a custom RAG (Retrieval Augmented Generation) system.

Automated Research Pipeline

The research component automates the initial data gathering and processing workflow. The system begins by spidering and scraping the websites of newly funded enterprise technology solutions. This web scraping approach gathers unstructured information from company websites, press releases, and other publicly available sources.
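The article does not specify the scraping stack used. As a rough sketch of the spidering step, a crawler might restrict itself to in-domain links on each fetched page so it stays within the target company's site; the helper below is illustrative, stdlib-only, and not drawn from the case study:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html: str, base_url: str) -> list[str]:
    """Resolve relative hrefs and keep only links on the same domain,
    so the spider stays within the company's own website."""
    parser = LinkExtractor()
    parser.feed(html)
    base_domain = urlparse(base_url).netloc
    resolved = (urljoin(base_url, href) for href in parser.links)
    return [u for u in resolved if urlparse(u).netloc == base_domain]
```

A production crawler would also need politeness delays, robots.txt handling, and URL deduplication; this shows only the same-domain filtering logic.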

The scraped data is then processed through LLMs to compile robust company profiles. A key step in this pipeline is the aggregation of unstructured content into structured JSON data records, organized by category. The system generates verbose metadata annotations for these records, which enhances the quality of downstream retrieval operations.
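The case study does not publish Trace3's record schema or prompts. A minimal sketch of this structuring step, assuming a hypothetical category schema and a generic text-in/text-out LLM call, might look like:

```python
import json

# Illustrative category schema -- the case study does not publish
# Trace3's actual record structure.
CATEGORIES = ["company_overview", "products", "funding", "customers"]

def build_profile_prompt(scraped_text: str) -> str:
    """Ask the LLM to aggregate unstructured scraped content into a
    structured JSON record, one field per category, plus a verbose
    metadata annotation used later to improve retrieval quality."""
    schema = {c: "string" for c in CATEGORIES}
    schema["metadata_annotation"] = "verbose description of this record"
    return (
        "Summarize the following company research into JSON matching "
        f"this schema: {json.dumps(schema)}\n\n"
        f"Source text:\n{scraped_text}"
    )

def parse_profile(llm_response: str) -> dict:
    """Validate that the model returned the expected fields; raise on
    malformed output so a human can review it before storage."""
    record = json.loads(llm_response)
    missing = [c for c in CATEGORIES if c not in record]
    if missing:
        raise ValueError(f"LLM output missing fields: {missing}")
    return record
```

Validating the model's JSON before it enters the database fits the human-in-the-loop stance described later in the article: malformed records surface for review rather than silently degrading retrieval.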

Finally, the structured data is vectorized using embedding models and stored in a vector database for subsequent retrieval. This vectorization step is essential for enabling semantic search capabilities in the knowledge management component.
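As an illustration of the store-and-search mechanics only (not Trace3's implementation, which is not described at this level of detail), a toy in-memory vector store with a placeholder hashed bag-of-words embedding could look like the following; a real deployment would substitute a learned embedding model and a proper vector database:

```python
import math

DIM = 256  # toy dimensionality; real embedding models use 768 or more

def embed(text: str) -> list[float]:
    """Placeholder embedding: a normalized hashed bag-of-words vector.
    A real pipeline would call an embedding model here instead."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[hash(token) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorStore:
    """Minimal in-memory stand-in for a vector database."""
    def __init__(self):
        self.records = []  # (vector, payload) pairs

    def add(self, text: str, payload: dict):
        self.records.append((embed(text), payload))

    def search(self, query: str, k: int = 3) -> list[dict]:
        """Return the k payloads whose vectors have the highest
        cosine similarity to the query (vectors are unit-length,
        so the dot product is the cosine similarity)."""
        qv = embed(query)
        scored = sorted(
            self.records,
            key=lambda r: -sum(a * b for a, b in zip(qv, r[0])),
        )
        return [payload for _, payload in scored[:k]]
```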

RAG Architecture for Knowledge Retrieval

The second component implements a custom RAG architecture that enables natural language interaction with the accumulated research data. Users can query the system through a chatbot interface, asking questions about companies, solutions, or specific use cases. The RAG system retrieves relevant information from the vectorized database and generates contextually appropriate responses.

This approach directly addresses the knowledge retrieval challenge by replacing manual document searches with semantic, conversational queries. Team members can ask complex questions about the technology landscape and receive synthesized answers drawing from the entire corpus of tracked companies.
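A hedged sketch of this retrieval-then-generation flow follows, with the retriever and model call left as injectable stand-ins since the article names neither the retrieval stack nor the model:

```python
def answer_query(question: str, retrieve, llm_call, k: int = 3) -> str:
    """Compose a grounded RAG prompt. `retrieve(question, k)` returns
    the top-k company records from the vector database; `llm_call` is
    any text-in/text-out completion function. Both are assumptions --
    the case study does not specify these components."""
    records = retrieve(question, k)
    context = "\n---\n".join(str(r) for r in records)
    prompt = (
        "Answer the question using ONLY the company records below. "
        "If the records do not contain the answer, say so explicitly.\n\n"
        f"Records:\n{context}\n\nQuestion: {question}"
    )
    return llm_call(prompt)
```

Restricting the model to the retrieved records, and telling it to say when they are insufficient, is the standard grounding pattern that the article's hallucination-mitigation section echoes.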

LLMOps Considerations and Risk Management

The case study offers valuable insights into the operational considerations for deploying LLMs in production environments, though it should be noted that this appears to be primarily an internal tool rather than a customer-facing production system.

Use Case Selection Framework

The authors present a framework for determining when GenAI is the appropriate tool versus simpler automation or classical analytics. They characterize the trade-off as generalization versus predictability: LLMs excel at handling variable, unstructured inputs but sacrifice predictable outputs. This trade-off is only worthwhile when input data is highly variable and unstructured, or when decision-making requires complex, variable rationale.
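The rule can be condensed into a small decision helper; this is a paraphrase of the article's framework, not code from it:

```python
def genai_is_appropriate(input_is_unstructured: bool,
                         input_is_highly_variable: bool,
                         needs_complex_rationale: bool) -> bool:
    """Encodes the generalization-vs-predictability trade-off: accept
    the loss of predictable outputs only when inputs are highly
    variable AND unstructured, or when decision-making requires
    complex, variable rationale. Otherwise prefer simpler automation
    or classical analytics."""
    return ((input_is_unstructured and input_is_highly_variable)
            or needs_complex_rationale)
```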

For Trace3’s use case, the team determined that tracking enterprise technology solutions inherently involves highly variable, unstructured data (company descriptions, solution details) and requires answering complex, variable questions from customers and internal stakeholders. This made it an appropriate candidate for GenAI rather than simpler approaches.

Data Flow Mapping

The case study recommends data flow mapping as a methodology for identifying GenAI integration opportunities. By documenting the lifecycle of information through business processes—creation, collection, validation, storage, enrichment, access, and deprecation—organizations can establish a baseline for identifying where LLMs might add value. This structured approach to use case identification is a practical LLMOps consideration that helps avoid implementing AI for its own sake.
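One way to make such a map concrete is a structured record per lifecycle step; the stage names below come from the article, while the record shape and helper are illustrative:

```python
from dataclasses import dataclass

# Lifecycle stages named in the article's data flow mapping method.
STAGES = ["creation", "collection", "validation", "storage",
          "enrichment", "access", "deprecation"]

@dataclass
class DataFlowStep:
    stage: str
    description: str
    genai_candidate: bool  # could an LLM plausibly add value here?

def genai_opportunities(steps: list[DataFlowStep]) -> list[DataFlowStep]:
    """Return the steps flagged as GenAI integration opportunities,
    after checking each stage name against the methodology's list."""
    for step in steps:
        if step.stage not in STAGES:
            raise ValueError(f"Unknown lifecycle stage: {step.stage}")
    return [s for s in steps if s.genai_candidate]
```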

Hallucination Mitigation

The team implemented specific techniques to address the risk of LLM hallucinations:

Multi-shot in-context learning (referred to as “in-context fine-tuning” in the article) was used to provide examples of expected and desirable output. This technique helps constrain the model’s responses to match the desired format and quality.

Additionally, the model was explicitly instructed to disclose when requested information was not included in its available dataset. This approach to handling knowledge gaps is a common pattern for reducing confident but incorrect responses.
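Combining the two mitigations, a prompt builder might interleave few-shot exchanges (including one that models the disclose-the-gap behavior) with an explicit instruction. The examples below are hypothetical; the article does not publish Trace3's actual prompts:

```python
# Hypothetical few-shot examples of desirable output, including one
# that demonstrates disclosing a knowledge gap.
FEW_SHOT = [
    ("What does Acme Corp sell?",
     "Acme Corp sells a cloud security platform for SaaS applications."),
    ("What is Acme Corp's stock price?",
     "That information is not included in my available dataset."),
]

def build_chat_prompt(question: str, context: str) -> str:
    """Combine multi-shot examples of expected output with an explicit
    instruction to disclose knowledge gaps -- the two hallucination
    mitigations described in the case study."""
    examples = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT)
    return (
        "Answer questions about tracked companies using the context "
        "provided. If the answer is not in the context, state that the "
        "information is not included in your available dataset.\n\n"
        f"Example exchanges:\n{examples}\n\n"
        f"Context:\n{context}\n\nQ: {question}\nA:"
    )
```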

Human-in-the-Loop Approach

Perhaps the most important operational safeguard described is the human-in-the-loop approach. All outputs generated by the model are treated as starting points rather than final answers. Every output is fact-checked and supplemented with manual research and analysis. The authors are candid that “the day may come when critical processes can be confidently handed over to GenAI systems without human oversight, but we are certainly not there yet.”

This approach represents a mature perspective on current LLM capabilities—using AI to augment human operations rather than replace them entirely. The risk-adjusted approach prioritizes accuracy over full automation.

Governance Framework Reference

The case study references the NIST AI Risk Management Framework (AI RMF) as providing high-level guidance for mapping, measuring, and managing AI-related risks. While specific implementation details are not provided, the acknowledgment of formal governance frameworks indicates awareness of the broader operational and compliance considerations for AI deployment.

Critical Assessment

While this case study provides useful practical insights, several limitations should be acknowledged:

The article comes from a consulting firm that offers AI implementation services, so there is an inherent promotional aspect to the content. Specific quantitative results or efficiency gains are not disclosed, making it difficult to assess the actual impact of the solution.

The technical details provided are high-level, with limited information about specific models used, infrastructure choices, or the precise implementation of the RAG architecture. Terms like “in-context fine-tuning” conflate in-context learning with fine-tuning, which are technically distinct approaches.

The solution appears to be an internal tool rather than a customer-facing production system, which may limit the transferability of lessons learned to organizations building external-facing AI products with stricter reliability requirements.

Despite these limitations, the case study offers a grounded perspective on custom GenAI implementations that acknowledges both the potential and the limitations of current technology. The emphasis on appropriate use case selection, risk management, and human oversight provides a responsible framework for organizations considering similar implementations.

Key Takeaways for LLMOps Practitioners

The case study reinforces several important principles for operationalizing LLMs: custom solutions tailored to specific business processes often deliver more value than generic AI implementations; understanding when GenAI is the right tool versus simpler automation is crucial for resource efficiency; hallucination risks can be partially mitigated through in-context learning and explicit handling of knowledge gaps; human oversight remains essential for most production use cases; and formal governance frameworks like NIST AI RMF provide useful guardrails for managing AI risks. Organizations should approach GenAI adoption with “a careful balance of enthusiasm and caution,” focusing on augmenting rather than replacing human capabilities in most current use cases.
