ZenML

LLMOps Tag: unstructured_data

172 tools with this tag

← Back to LLMOps Database

Common industries

View all industries →

Advancing Patient Experience and Business Operations Analytics with Generative AI in Healthcare

Huron

Huron Consulting Group implemented generative AI solutions to transform healthcare analytics across patient experience and business operations. The consulting firm faced challenges with analyzing unstructured data from patient rounding sessions and revenue cycle management notes, which previously required manual review and resulted in delayed interventions due to the 3-4 month lag in traditional HCAHPS survey feedback. Using AWS services including Amazon Bedrock with the Nova LLM model, Redshift, and S3, Huron built sentiment analysis capabilities that automatically process survey responses, staff interactions, and financial operation notes. The solution achieved 90% accuracy in sentiment classification (up from 75% initially) and now processes over 10,000 notes per week automatically, enabling real-time identification of patient dissatisfaction, revenue opportunities, and staff coaching needs that directly impact hospital funding and operational efficiency.

Agent-Based AI Assistants for Enterprise and E-commerce Applications

Prosus

Prosus developed two major AI agent applications: Toan, an internal enterprise AI assistant used by 15,000+ employees across 24 companies, and OLX Magic, an e-commerce assistant that enhances product discovery. Toan achieved significant reduction in hallucinations (from 10% to 1%) through agent-based architecture, while saving users approximately 50 minutes per day. OLX Magic transformed the traditional e-commerce experience by incorporating generative AI features for smarter product search and comparison.

Agent-Based Workflow Automation in Spreadsheets for Non-Technical Users

Otto

Otto, founded by Suli Omar, addresses the challenge of making AI agents accessible to non-technical users by embedding agent workflows directly into spreadsheet interfaces. The company transforms unstructured data processing tasks into spreadsheet-based workflows where each cell acts as an autonomous agent capable of executing tasks, waiting for dependencies, and outputting structured results. By leveraging the familiar spreadsheet UX instead of traditional chatbot interfaces, Otto enables finance teams, accountants, and other business users to harness agent capabilities without requiring technical expertise. The solution involves sophisticated model selection across three tiers (workhorse, middle-tier, and heavy reasoning models) to optimize cost and performance, continuous evaluation through customer usage patterns, and iterative model testing to maintain service quality as new LLM capabilities emerge.

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.

Agentic AI Systems for Drug Discovery and Business Intelligence

Loka

Loka, an AWS partner specializing in generative AI solutions, and Domo, a business intelligence platform, demonstrate production implementations of agentic AI systems across multiple industries. Loka showcases their drug discovery assistant (ADA) that integrates multiple AI models and databases to accelerate pharmaceutical research workflows, while Domo presents agentic solutions for call center optimization and financial analysis. Both companies emphasize the importance of systematic approaches to AI implementation, moving beyond simple chatbots to multi-agent systems that can take autonomous actions while maintaining human oversight through human-in-the-loop architectures.

Agentic News Analysis Platform for Digital Asset Market Making

FSI

Digital asset market makers face the challenge of rapidly analyzing news events and social media posts to adjust trading strategies within seconds to avoid adverse selection and inventory risk. Traditional dictionary-based and statistical machine learning approaches proved too slow or required extensive labeled data. The solution involved building an agentic LLM-based platform on AWS that processes streaming news in near real-time, using fine-tuned embeddings for deduplication, reasoning models for sentiment analysis and impact assessment, and optimized inference infrastructure. Through progressive optimization from SageMaker JumpStart to VLLM to SGLNG, the team achieved 180 output tokens per second, enabling end-to-end latency under 10 seconds and doubling news processing capacity compared to initial deployment.

Agentic RAG Implementation for Retail Personalization and Customer Support

MongoDB

MongoDB and Dataworkz partnered to implement an agentic RAG (Retrieval Augmented Generation) solution for retail and e-commerce applications. The solution combines MongoDB Atlas's vector search capabilities with Dataworkz's RAG builder to create a scalable system that integrates operational data with unstructured information. This enables personalized customer experiences through intelligent chatbots, dynamic product recommendations, and enhanced search functionality, while maintaining context-awareness and real-time data access.

AI Agent for Automated Merchant Classification and Transaction Matching

Ramp

Ramp built an AI agent using LLMs, embeddings, and RAG to automatically fix incorrect merchant classifications that previously required hours of manual intervention from customer support teams. The agent processes user requests to reclassify transactions in under 10 seconds, handling nearly 100% of requests compared to the previous 1.5-3% manual handling rate, while maintaining 99% accuracy according to LLM-based evaluation and reducing customer support costs from hundreds of dollars to cents per request.

AI Agent System for Automated B2B Research and Sales Pipeline Generation

Unify

UniFi built an AI agent system that automates B2B research and sales pipeline generation by deploying research agents at scale to answer customer-defined questions about companies and prospects. The system evolved from initial React-based agents using GPT-4 and O1 models to a more sophisticated architecture incorporating browser automation, enhanced internet search capabilities, and cost-optimized model selection, ultimately processing 36+ billion tokens monthly while reducing per-query costs from 35 cents to 10 cents through strategic model swapping and architectural improvements.

AI Data Analyst with Multi-Stage LLM Architecture for Enterprise Data Discovery

Delivery Hero

The BADA team at Woowa Brothers (part of Delivery Hero) developed QueryAnswerBird (QAB), an LLM-based agentic system to improve employee data literacy across the organization. The problem addressed was that employees with varying levels of data expertise struggled to discover, understand, and utilize the company's vast internal data resources, including structured tables and unstructured log data. The solution involved building a multi-layered architecture with question understanding (Router Supervisor) and information acquisition stages, implementing various features including query/table explanation, syntax verification, table/column guidance, and log data utilization. Through two rounds of beta testing with data analysts, engineers, and product managers, the team iteratively refined the system to handle diverse question types beyond simple Text-to-SQL, ultimately creating a comprehensive data discovery platform that integrates with existing tools like Data Catalog and Log Checker to provide contextualized answers and improve organizational productivity.

AI-Assisted Activity Onboarding in Travel Marketplace

GetYourGuide

GetYourGuide faced challenges with their lengthy 16-step activity creation process, where suppliers spent up to an hour manually entering content that often had quality issues, leading to traveler confusion and lower conversion rates. They implemented a generative AI solution that allows activity providers to paste existing content and automatically generates descriptions and fills structured fields across 8 key onboarding steps. After an initial failed experiment due to UX confusion and measurement challenges, they iterated with improved UI/UX design and developed a novel permutation testing framework. The second rollout successfully increased activity completion rates, improved content quality, and reduced onboarding time to as little as 14 minutes, ultimately achieving positive impacts on both supplier efficiency and traveler engagement metrics.

AI-Augmented Cybersecurity Triage Using Graph RAG for Cloud Security Operations

Deloitte

Deloitte developed a Cybersecurity Intelligence Center to help SecOps engineers manage the overwhelming volume of security alerts generated by cloud security platforms like Wiz and CrowdStrike. Using AWS's open-source Graph RAG Toolkit, Deloitte built "AI for Triage," a human-in-the-loop system that combines long-term organizational memory (stored in hierarchical lexical graphs) with short-term operational data (document graphs) to generate AI-assisted triage records. The solution reduced 50,000 security issues across 7 AWS domains to approximately 1,300 actionable items, converting them into over 6,500 nodes and 19,000 relationships for contextual analysis. This approach enables SecOps teams to make informed remediation decisions based on organizational policies, historical experiences, and production system context, while maintaining human accountability and creating automation recipes rather than brittle code-based solutions.

AI-Driven Media Analysis and Content Assembly Platform for Large-Scale Video Archives

Bloomberg Media

Bloomberg Media, facing challenges in analyzing and leveraging 13 petabytes of video content growing at 3,000 hours per day, developed a comprehensive AI-driven platform to analyze, search, and automatically create content from their massive media archive. The solution combines multiple analysis approaches including task-specific models, vision language models (VLMs), and multimodal embeddings, unified through a federated search architecture and knowledge graphs. The platform enables automated content assembly using AI agents to create platform-specific cuts from long-form interviews and documentaries, dramatically reducing time to market while maintaining editorial trust and accuracy. This "disposable AI strategy" emphasizes modularity, versioning, and the ability to swap models and embeddings without re-engineering entire workflows, allowing Bloomberg to adapt quickly to evolving AI capabilities while expanding reach across multiple distribution platforms.

AI-Driven User Memory System for Dynamic Real Estate Personalization

Zillow

Zillow developed a sophisticated user memory system to address the challenge of personalizing real estate discovery for home shoppers whose preferences evolve significantly over time. The solution combines AI-driven preference profiles, embedding models, affordability-aware quantile models, and raw interaction history into a unified memory layer that operates across three dimensions: recency/frequency, flexibility/rigidity, and prediction/planning. This system is powered by a dual-layered architecture blending batch processing for long-term preferences with real-time streaming pipelines for short-term behavioral signals, enabling personalized experiences across search, recommendations, and notifications while maintaining user trust through privacy-centered design.

AI-Powered Clinical Outcome Assessment Review Using Generative AI

Clario

Clario, a clinical trials endpoint data provider, developed an AI-powered solution to automate the analysis of Clinical Outcome Assessment (COA) interviews in clinical trials for psychosis, anxiety, and mood disorders. The traditional approach of manually reviewing audio-video recordings was time-consuming, logistically complex, and introduced variability that could compromise trial reliability. Using Amazon Bedrock and other AWS services, Clario built a system that performs speaker diarization, multi-lingual transcription, semantic search, and agentic AI-powered quality review to evaluate interviews against standardized criteria. The solution demonstrates potential for reducing manual review effort by over 90%, providing 100% data coverage versus subset sampling, and decreasing review turnaround time from weeks to hours, while maintaining regulatory compliance and improving data quality for submissions.

AI-Powered Community Voice Intelligence for Local Government

ZenCity

ZenCity builds AI-powered platforms that help local governments understand and act on community voices by synthesizing diverse data sources including surveys, social media, 311 requests, and public engagement data. The company faced the challenge of processing millions of data points daily and delivering actionable insights to government officials who need to make informed decisions about budgets, policies, and services. Their solution involves a multi-layered AI architecture that enriches raw data with sentiment analysis and topic modeling, creates trend highlights, generates topic-specific insights, and produces automated briefs for specific government workflows like annual budgeting or crisis management. By implementing LLM-driven agents with MCP (Model Context Protocol) servers, they created an AI assistant that allows government officials to query data on-demand while maintaining data accuracy through citation requirements and multi-tenancy security. The system successfully delivers personalized, timely briefs to different government roles, reducing the need for manual analysis while ensuring community voices inform every decision.

AI-Powered Contact Center Transformation with Amazon Connect

Traeger

Traeger Grills transformed their customer experience operations from a legacy contact center with poor performance metrics (35% CSAT, 30% first contact resolution) into a modern AI-powered system built on Amazon Connect. The company implemented generative AI capabilities for automated case note generation, email composition, and chatbot interactions while building a "single pane of glass" agent experience using Amazon Connect Cases. This eliminated their legacy CRM, reduced new hire training time by 40%, improved agent satisfaction, and enabled seamless integration of their acquired Meater thermometer brand. The implementation leveraged AI to handle non-value-added work while keeping human agents focused on building emotional connections with customers in the "Traeger Hood" community, demonstrating a shift from cost center to profit center thinking.

AI-Powered Content Generation and Shot Commentary System for Live Golf Tournament Coverage

PGA Tour

The PGA Tour faced the challenge of engaging fans with golf content across multiple tournaments running nearly every week of the year, generating meaningful content from 31,000+ shots per tournament across 156 players, and maintaining relevance during non-tournament days. They implemented an agentic AI system using AWS Bedrock that generates up to 800 articles per week across eight different content types (betting profiles, tournament previews, player recaps, round recaps, purse breakdowns, etc.) and a real-time shot commentary system that provides contextual narration for live tournament play. The solution achieved 95% cost reduction (generating articles at $0.25 each), enabled content publication within 5-10 minutes of live events, resulted in billions of annual page views for AI-generated content, and became their highest-engaged content on non-tournament days while maintaining brand voice and factual accuracy through multi-agent validation workflows.

AI-Powered Customer Feedback Analysis at Scale

Github

GitHub faced the challenge of manually processing vast amounts of customer feedback from support tickets, with data scientists spending approximately 80% of their time on data collection and organization tasks. To address this, GitHub's Customer Success Engineering team developed an internal AI analytics tool that combines open-source machine learning models (BERTopic with BERT embeddings and HDBSCAN clustering) to identify patterns in feedback, and GPT-4 to generate human-readable summaries of customer pain points. This system transformed their feedback analysis from manual classification to automated trend identification, enabling faster identification of common issues, improved feature prioritization, data-driven decision making, and discovery of self-service opportunities for customers.

AI-Powered Customer Feedback Analysis Using RAG and LLMs in Product Analytics

Meta

Meta's Reality Labs developed a self-service AI tool powered by their open-source Llama 4 LLM to analyze customer feedback for their Quest VR headsets and Ray-Ban Meta products. The challenge was that customer feedback dataโ€”from reviews, bug reports, surveys, and social mediaโ€”was underutilized due to noise, bias, and lack of structure. By building a comprehensive feedback repository from internal and external sources and implementing a Retrieval Augmented Generation (RAG) system with embedding-based similarity search, Meta created a production system that transforms qualitative feedback into actionable insights. The tool is being used for bug deduplication, internal testing summaries, and strategic planning, enabling the company to bridge quantitative metrics with qualitative customer insights and dramatically reduce manual analysis time from hours to minutes.

AI-Powered Developer Productivity and Product Discovery at Wholesale Marketplace

Faire

Faire, a wholesale marketplace connecting brands and retailers, implemented multiple AI initiatives across their engineering organization to enhance both internal developer productivity and external customer-facing features. The company deployed agentic development workflows using GitHub Copilot and custom orchestration systems to automate repetitive coding tasks, introduced natural-language and image-based search capabilities for retailers seeking products, and built a hybrid Python-Kotlin architecture to support multi-step AI agents that compose purchasing recommendations. These efforts aimed to reduce manual workflows, accelerate product discovery, and deliver more personalized experiences for their wholesale marketplace customers.

AI-Powered Ecommerce Content Optimization Platform

Pattern

Pattern developed Content Brief, an AI-driven tool that processes over 38 trillion ecommerce data points to optimize product listings across multiple marketplaces. Using Amazon Bedrock and other AWS services, the system analyzes consumer behavior, content performance, and competitive data to provide actionable insights for product content optimization. In one case study, their solution helped Select Brands achieve a 21% month-over-month revenue increase and 14.5% traffic improvement through optimized product listings.

AI-Powered Fax Processing Automation for Healthcare Referrals

Providence

Providence Health System automated the processing of over 40 million annual faxes using GenAI and MLflow on Databricks to transform manual referral workflows into real-time automated triage. The system combines OCR with GPT-4.0 models to extract referral data from diverse document formats and integrates seamlessly with Epic EHR systems, eliminating months-long backlogs and freeing clinical staff to focus on patient care across 1,000+ clinics.

AI-Powered Multi-Agent Decision Support System for Enterprise Strategic Planning

Coinbase

Coinbase developed RAPID-D, an AI-powered decision support tool to augment their existing RAPID decision-making framework used for critical strategic choices. The system employs a multi-agent architecture where specialized AI agents collaborate to analyze decision documents, surface risks, challenge assumptions, and provide comprehensive recommendations to human decision-makers. By implementing a modular approach with agents serving as analysts, contextual seekers, devil's advocates, and synthesizers, Coinbase created a transparent and auditable system that helps mitigate cognitive bias while maintaining human oversight. The solution was iteratively developed based on leadership feedback, achieving strong accuracy benchmarks with Claude 3.7 Sonnet, and incorporates real-time feedback mechanisms to continuously improve recommendation quality.

AI-Powered Music Lyric Analysis and Semantic Search Platform

LyricLens

LyricLens, developed by Music Smatch, is a production AI system that extracts semantic meaning, themes, entities, cultural references, and sentiment from music lyrics at scale. The platform analyzes over 11 million songs using Amazon Bedrock's Nova family of foundation models to provide real-time insights for brands, artists, developers, and content moderators. By migrating from a previous provider to Amazon Nova models, Music Smatch achieved over 30% cost savings while maintaining accuracy, processing over 2.5 billion tokens. The system employs a multi-level semantic engine with knowledge graphs, supports content moderation with granular PG ratings, and enables natural language queries for playlist generation and trend analysis across demographics, genres, and time periods.

AI-Powered Product Description Generation for E-commerce Marketplaces

Handmade.com

Handmade.com, a hand-crafts marketplace with over 60,000 products, automated their product description generation process to address scalability challenges and improve SEO performance. The company implemented an end-to-end AI pipeline using Amazon Bedrock's Anthropic Claude 3.7 Sonnet for multimodal content generation, Amazon Titan Text Embeddings V2 for semantic search, and Amazon OpenSearch Service for vector storage. The solution employs Retrieval Augmented Generation (RAG) to enrich product descriptions by leveraging a curated dataset of 1 million handmade products, reducing manual processing time from 10 hours per week while improving content quality and search discoverability.

AI-Powered Sales Intelligence and Go-to-Market Orchestration Platform

Clay

Clay is a creative sales and marketing platform that helps companies execute go-to-market strategies by turning unstructured data about companies and people into actionable insights. The platform addresses the challenge of finding unique competitive advantages in sales ("go-to-market alpha") by integrating with over 150 data providers and using LLM-powered agents to research prospects, enrich data, and automate outreach. Their flagship agent "Claygent" performs web research to extract custom data points that aren't available in traditional sales databases, while their newer "Navigator" agent can interact with web forms and complex websites. Clay has achieved significant scale, crossing one billion agent runs and targeting two billion runs annually, while maintaining a philosophy that data will be imperfect and building tools for rapid iteration, validation, and trust-building through features like session replay.

AI-Powered Shift-Left Testing Platform with Multiple LLM Agents

QyrusAI

QyrusAI developed a comprehensive shift-left testing platform that integrates multiple AI agents powered by Amazon Bedrock's foundation models. The solution addresses the challenge of maintaining quality while accelerating development cycles by implementing AI-driven testing throughout the software development lifecycle. Their implementation resulted in an 80% reduction in defect leakage, 20% reduction in UAT effort, and 36% faster time to market.

AI-Powered Similar Issues Detection for Project Management

Linear

Linear developed a Similar Issues matching feature to address the persistent challenge of duplicate issues and backlog management in large team workflows. The solution uses large language models to generate vector embeddings that capture the semantic meaning of issue descriptions, enabling accurate detection of related or duplicate issues across their project management platform. The feature integrates at multiple touchpointsโ€”during issue creation, in the Triage inbox, and within support integrations like Intercomโ€”allowing teams to identify duplicates before they enter the system. The implementation uses PostgreSQL with pgvector on Google Cloud Platform for vector storage and search, with partitioning strategies to handle tens of millions of issues at scale.

AI-Powered Social Intelligence for Life Sciences

Indegene

Indegene developed an AI-powered social intelligence solution to help pharmaceutical companies extract insights from digital healthcare conversations on social media. The solution addresses the challenge that 52% of healthcare professionals now prefer receiving medical content through social channels, while the life sciences industry struggles with analyzing complex medical discussions at scale. Using Amazon Bedrock, SageMaker, and other AWS services, the platform provides healthcare-focused analytics including HCP identification, sentiment analysis, brand monitoring, and adverse event detection. The layered architecture delivers measurable improvements in time-to-insight generation and operational cost savings while maintaining regulatory compliance.

AI-Powered Tour Guide for Financial Platform Navigation

Ramp

Ramp developed an AI-powered Tour Guide agent to help users navigate their financial operations platform more effectively. The solution guides users through complex tasks by taking control of cursor movements while providing step-by-step explanations. Using an iterative action-taking approach and optimized prompt engineering, the Tour Guide increases user productivity and platform accessibility while maintaining user trust through transparent human-agent collaboration.

Automated Prompt Optimization for Intelligent Text Processing using Amazon Bedrock

Yuewen Group

Yuewen Group, a global online literature platform, transitioned from traditional NLP models to Claude 3.5 Sonnet on Amazon Bedrock for intelligent text processing. Initially facing challenges with unoptimized prompts performing worse than traditional models, they implemented Amazon Bedrock's Prompt Optimization feature to automatically enhance their prompts. This led to significant improvements in accuracy for tasks like character dialogue attribution, achieving 90% accuracy compared to the previous 70% with unoptimized prompts and 80% with traditional NLP models.

Automating Enterprise Workflows with Foundation Models in Healthcare

Various

The researchers present aCLAr (Demonstrate, Execute, Validate framework), a system that uses multimodal foundation models to automate enterprise workflows, particularly in healthcare settings. The system addresses limitations of traditional RPA by enabling passive learning from demonstrations, human-like UI navigation, and self-monitoring capabilities. They successfully demonstrated the system automating a real healthcare workflow in Epic EHR, showing how foundation models can be leveraged for complex enterprise automation without requiring API integration.

Autonomous Semiconductor Manufacturing with Multi-Modal LLMs and Reinforcement Learning

Samsung

Samsung is implementing a comprehensive LLMOps system for autonomous semiconductor fabrication, using multi-modal LLMs and reinforcement learning to transform manufacturing processes. The system combines sensor data analysis, knowledge graphs, and LLMs to automate equipment control, defect detection, and process optimization. Early results show significant improvements in areas like RF matching efficiency and anomaly detection, though challenges remain in real-time processing and time series prediction accuracy.

BM25 vs Vector Search for Large-Scale Code Repository Search

Github

Github faces the challenge of providing efficient search across 100+ billion documents while maintaining low latency and supporting diverse search use cases. They chose BM25 over vector search due to its computational efficiency, zero-shot capabilities, and ability to handle diverse query types. The solution involves careful optimization of search infrastructure, including strategic data routing and field-specific indexing approaches, resulting in a system that effectively serves Github's massive scale while keeping costs manageable.

Building a Conversational Shopping Assistant with Multi-Modal Search and Agent Architecture

OLX

OLX developed "OLX Magic", a conversational AI shopping assistant for their secondhand marketplace. The system combines traditional search with LLM-powered agents to handle natural language queries, multi-modal searches (text, image, voice), and comparative product analysis. The solution addresses challenges in e-commerce personalization and search refinement, while balancing user experience with technical constraints like latency and cost. Key innovations include hybrid search combining keyword and semantic matching, visual search with modifier capabilities, and an agent architecture that can handle both broad and specific queries.

Building a Horizontal Enterprise Agent Platform with Infrastructure-First Approach

Dust.tt

Dust.tt evolved from a developer framework competitor to LangChain into a horizontal enterprise platform for deploying AI agents, achieving remarkable 88% daily active user rates in some deployments. The company focuses on building robust infrastructure for agent deployment, maintaining its own integrations with enterprise systems like Notion and Slack, while making agent creation accessible to non-technical users through careful UX design and abstraction of technical complexities.

Building a Modern Search Engine for Parliamentary Records with RAG Capabilities

Hansard

The Singapore government developed Pair Search, a modern search engine for accessing Parliamentary records (Hansard), addressing the limitations of traditional keyword-based search. The system combines semantic search using e5 embeddings with ColbertV2 reranking, and is designed to serve both human users and as a retrieval backend for RAG applications. Early deployment shows significant user satisfaction with around 150 daily users and 200 daily searches, demonstrating improved search result quality over the previous system.

Building a Multi-Model LLM API Marketplace and Infrastructure Platform

OpenRouter

OpenRouter was founded in early 2023 to address the fragmented landscape of large language models by creating a unified API marketplace that aggregates over 400 models from 60+ providers. The company identified that the LLM inference market would not be winner-take-all, and built infrastructure to normalize different model APIs, provide intelligent routing, caching, and uptime guarantees. Their platform enables developers to switch between models with near-zero switching costs while providing better prices, uptime, and choice compared to using individual model providers directly.

Building a Multi-Model LLM Marketplace and Routing Platform

OpenRouter

OpenRouter was founded in 2023 to address the challenge of choosing between rapidly proliferating language models by creating a unified API marketplace that aggregates over 400 models from 60+ providers. The platform solves the problem of model selection, provider heterogeneity, and high switching costs by providing normalized access, intelligent routing, caching, and real-time performance monitoring. Results include 10-100% month-over-month growth, sub-30ms latency, improved uptime through provider aggregation, and evidence that the AI inference market is becoming multi-model rather than winner-take-all.

Building a RAG System for Cybersecurity Research and Reporting

Trainingracademy

TrainGRC developed a Retrieval Augmented Generation (RAG) system for cybersecurity research and reporting to address the challenge of fragmented knowledge in the cybersecurity domain. The system tackles issues with LLM censorship of security topics while dealing with complex data processing challenges including PDF extraction, web scraping, and vector search optimization. The implementation focused on solving data quality issues, optimizing search quality through various embedding algorithms, and establishing effective context chunking strategies.

Building a Reliable AI Quote Generation Assistant with LangGraph

Tradestack

Tradestack developed an AI-powered WhatsApp assistant to automate quote generation for trades businesses, reducing quote creation time from 3.5-10 hours to under 15 minutes. Using LangGraph Cloud, they built and launched their MVP in 6 weeks, improving end-to-end performance from 36% to 85% through rapid iteration and multimodal input processing. The system incorporated sophisticated agent architectures, human-in-the-loop interventions, and robust evaluation frameworks to ensure reliability and accuracy.

Building a Rust-Based AI Agentic Framework for Multimodal Data Quality Monitoring

Zectonal

Zectonal, a data quality monitoring company, developed a custom AI agentic framework in Rust to scale their multimodal data inspection capabilities beyond traditional rules-based approaches. The framework enables specialized AI agents to autonomously call diagnostic function tools for detecting defects, errors, and anomalous conditions in large datasets, while providing full audit trails through "Agent Provenance" tracking. The system supports multiple LLM providers (OpenAI, Anthropic, Ollama) and can operate both online and on-premise, packaged as a single binary executable that the company refers to as their "genie-in-a-binary."

Building a Scalable Conversational Video Agent with LangGraph and Twelve Labs APIs

Jockey

Jockey is an open-source conversational video agent that leverages LangGraph and Twelve Labs' video understanding APIs to process and analyze video content intelligently. The system evolved from v1.0 to v1.1, transitioning from basic LangChain to a more sophisticated LangGraph architecture, enabling better scalability and precise control over video workflows through a multi-agent system consisting of a Supervisor, Planner, and specialized Workers.

Building a Search Engine for AI Agents: Infrastructure, Product Development, and Production Deployment

Exa.ai

Exa.ai has built the first search engine specifically designed for AI agents rather than human users, addressing the fundamental problem that existing search engines like Google are optimized for consumer clicks and keyword-based queries rather than semantic understanding and agent workflows. The company trained its own models, built its own index, and invested heavily in compute infrastructure (including purchasing their own GPU cluster) to enable meaning-based search that returns raw, primary data sources rather than listicles or summaries. Their solution includes both an API for developers building AI applications and an agentic search tool called Websites that can find and enrich complex, multi-criteria queries. The results include serving hundreds of millions of queries across use cases like sales intelligence, recruiting, market research, and research paper discovery, with 95% inbound growth and expanding from 7 to 28+ employees within a year.

Building a Secure AI Assistant for Visual Effects Artists Using Amazon Bedrock

Untold Studios

Untold Studios developed an AI assistant integrated into Slack to help their visual effects artists access internal resources and tools more efficiently. Using Amazon Bedrock with Claude 3.5 Sonnet and a serverless architecture, they created a natural language interface that handles 120 queries per day, reducing information search time from minutes to seconds while maintaining strict data security. The solution combines RAG capabilities with function calling to access multiple knowledge bases and internal systems, significantly reducing the support team's workload.

Building a Silicon Brain for Universal Enterprise Search

Dropbox

Dropbox is transforming from a file storage company to an AI-powered universal search and organization platform. Through their Dash product, they are implementing LLM-powered search and organization capabilities across enterprise content, while maintaining strict data privacy and security. The engineering approach combines open-source LLMs, custom inference stacks, and hybrid architectures to deliver AI features to 700M+ users cost-effectively.

Building a Universal Search Product with RAG and AI Agents

Dropbox

Dropbox developed Dash, a universal search and knowledge management product that addresses the challenges of fragmented business data across multiple applications and formats. The solution combines retrieval-augmented generation (RAG) and AI agents to provide powerful search capabilities, content summarization, and question-answering features. They implemented a custom Python interpreter for AI agents and developed a sophisticated RAG system that balances latency, quality, and data freshness requirements for enterprise use.

Building a Video Q&A System with RAG and Speaker Detection

Vimeo

Vimeo developed a sophisticated video Q&A system that enables users to interact with video content through natural language queries. The system uses RAG (Retrieval Augmented Generation) to process video transcripts at multiple granularities, combined with an innovative speaker detection system that identifies speakers without facial recognition. The solution generates accurate answers, provides relevant video timestamps, and suggests related questions to maintain user engagement.

Building AI Assist: LLM Integration for E-commerce Product Listings

Mercari

Mercari developed an AI Assist feature to help sellers create better product listings using LLMs. They implemented a two-part system using GPT-4 for offline attribute extraction and GPT-3.5-turbo for real-time title suggestions, conducting both offline and online evaluations to ensure quality. The team focused on practical implementation challenges including prompt engineering, error handling, and addressing LLM output inconsistencies in a production environment.

Building an AI Agent Platform with Cloud-Based Virtual Machines and Extended Context

Manus

Manus AI, founded in late 2024, developed a consumer-focused AI agent platform that addresses the limitation of frontier LLMs having intelligence but lacking the ability to take action in digital environments. The company built a system where each user task is assigned a fully functional cloud-based virtual machine (Linux, with plans for Windows and Android) running real applications including file systems, terminals, VS Code, and Chromium browsers. By adopting a "less structure, more intelligence" philosophy that avoids predefined workflows and multi-role agent systems, and instead provides rich context to foundation models (primarily Anthropic's Claude), Manus created an agent capable of handling diverse long-horizon tasks from office location research to furniture shopping to data extraction, with users reporting up to 2 hours of daily GPU consumption. The platform launched publicly in March 2024 after five months of development and reportedly spent $1 million on Claude API usage in its first 14 days.

Building an AI API Gateway for Streamlined GenAI Service Development

DeliveryHero

DeliveryHero's Woowa Brothers division developed an AI API Gateway to address the challenges of managing multiple GenAI providers and streamlining development processes. The gateway serves as a central infrastructure component to handle credential management, prompt management, and system stability while supporting various GenAI services like AWS Bedrock, Azure OpenAI, and GCP Imagen. The initiative was driven by extensive user interviews and aims to democratize AI usage across the organization while maintaining security and efficiency.

Building an AI Co-Pilot Application: Patterns and Best Practices

Thoughtworks

Thoughtworks built Boba, an experimental AI co-pilot for product strategy and ideation, to learn about building generative AI experiences beyond chat interfaces. The team implemented several key patterns including templated prompts, structured responses, real-time progress streaming, context management, and external knowledge integration. The case study provides detailed insights into practical LLMOps patterns for building production LLM applications with enhanced user experiences.

Building an Asynchronous Event-Driven Agentic Framework for AI-Powered App Building

Airtable

Airtable built a custom agentic framework to power AI features including Omni (conversational app builder) and Field Agents (AI-powered fields). The problem was that early AI capabilities couldn't handle complex tasks requiring dynamic decision-making, data retrieval, or multi-step reasoning. The solution was an asynchronous event-driven state machine architecture with three core components: a context manager for maintaining information, a tool dispatcher for executing predefined actions, and a decision engine (LLM-powered) for autonomous planning. The framework enables agents to reason through complex tasks, self-correct errors, and handle large context windows through trimming and summarization strategies, resulting in production AI agents capable of automating thousands of hours of work.

Building and Debugging Web Automation Agents with LangChain Ecosystem

Airtop

Airtop developed a web automation platform that enables AI agents to interact with websites through natural language commands. They leveraged the LangChain ecosystem (LangChain, LangSmith, and LangGraph) to build flexible agent architectures, integrate multiple LLM models, and implement robust debugging and testing processes. The platform successfully enables structured information extraction and real-time website interactions while maintaining reliability and scalability.

Building and Deploying AI-Powered Visual and Semantic Search in Design Tools

Figma

Figma tackled the challenge of designers spending excessive time searching for existing designs by implementing AI-powered search capabilities. They developed both visual search (using screenshots or sketches) and semantic search features, using RAG and custom embedding systems. The team focused on solving real user workflows, developing systematic quality evaluations, and scaling the infrastructure to handle billions of embeddings while managing costs. The project evolved from an initial autocomplete prototype to a full-featured search system that helps designers find and reuse existing work more efficiently.

Building and Evaluating Legal AI at Scale with Domain Expert Integration

Harvey

Harvey, a legal AI company, has developed a comprehensive approach to building and evaluating AI systems for legal professionals, serving nearly 400 customers including one-third of the largest 100 US law firms. The company addresses the complex challenges of legal document analysis, contract review, and legal drafting through a suite of AI products ranging from general-purpose assistants to specialized workflows for large-scale document extraction. Their solution integrates domain experts (lawyers) throughout the entire product development process, implements multi-layered evaluation systems combining human preference judgments with automated LLM-based evaluations, and has built custom benchmarks and tooling to assess quality in this nuanced domain where mistakes can have career-impacting consequences.

Building and Implementing a Company-Wide GenAI Strategy

Thumbtack

Thumbtack developed and implemented a comprehensive generative AI strategy focusing on three key areas: enhancing their consumer product with LLMs for improved search and data analysis, transforming internal operations through AI-powered business processes, and boosting employee productivity. They established new infrastructure and policies for secure LLM deployment, demonstrated value through early wins in policy violation detection, and successfully drove company-wide adoption through executive sponsorship and careful expectation management.

Building and Optimizing a RAG-based Customer Service Chatbot

HDI

HDI, a German insurance company, implemented a RAG-based chatbot system to help customer service agents quickly find and access information across multiple knowledge bases. The system processes complex insurance documents, including tables and multi-column layouts, using various chunking strategies and vector search optimizations. After 120 experiments to optimize performance, the production system now serves 800+ users across multiple business lines, handling 26 queries per second with 88% recall rate and 6ms query latency.

Building and Scaling AI-Powered Visual Search Infrastructure

Figma

Figma implemented AI-powered search features to help users find designs and components across their organization using text descriptions or visual references. The solution leverages the CLIP multimodal embedding model, with infrastructure built to handle billions of embeddings while keeping costs down. The system combines traditional lexical search with vector similarity search, using AWS services including SageMaker, OpenSearch, and DynamoDB to process and index designs at scale. Key optimizations included vector quantization, software rendering, and cluster autoscaling to manage computational and storage costs.

Building Customer Intelligence MCP Server for AI Agent Integration

Dovetail

Dovetail, a customer intelligence platform, developed an MCP (Model Context Protocol) server to enable AI agents to access and utilize customer feedback data stored in their platform. The solution addresses the challenge of teams wanting to integrate their customer intelligence into internal AI workflows, allowing for automated report generation, roadmap development, and faster decision-making across product management, customer success, and design teams.

Building Deep Research: A Production AI Research Assistant Agent

Google Deepmind

Google Deepmind developed Deep Research, a feature that acts as an AI research assistant using Gemini to help users learn about any topic in depth. The system takes a query, browses the web for about 5 minutes, and outputs a comprehensive research report that users can review and ask follow-up questions about. The system uses iterative planning, transparent research processes, and a sophisticated orchestration backend to manage long-running autonomous research tasks.

Building Personalized Financial and Gardening Experiences with LLMs

Bud Financial / Scotts Miracle-Gro

This case study explores how Bud Financial and Scotts Miracle-Gro leverage Google Cloud's AI capabilities to create personalized customer experiences. Bud Financial developed a conversational AI solution for personalized banking interactions, while Scotts Miracle-Gro implemented an AI assistant called MyScotty for gardening advice and product recommendations. Both companies utilize various Google Cloud services including Vertex AI, GKE, and AI Search to deliver contextual, regulated, and accurate responses to their customers.

Building Production AI Agents with Vector Databases and Automated Data Collection

Devin Kearns

Over 18 months, a company built and deployed autonomous AI agents for business automation, focusing on lead generation and inbox management. They developed a comprehensive approach using vector databases (Pinecone), automated data collection, structured prompt engineering, and custom tools through n8n for deployment. Their solution emphasizes the importance of up-to-date data, proper agent architecture, and tool integration, resulting in scalable AI agent teams that can effectively handle complex business workflows.

Building Production-Grade AI Agents: Overcoming Reasoning and Tool Challenges

Kentauros AI

Kentauros AI presents their experience building production-grade AI agents, detailing the challenges in developing agents that can perform complex, open-ended tasks in real-world environments. They identify key challenges in agent reasoning (big brain, little brain, and tool brain problems) and propose solutions through reinforcement learning, generalizable algorithms, and scalable data approaches. Their evolution from G2 to G5 agent architectures demonstrates practical solutions to memory management, task-specific reasoning, and skill modularity.

Building Production-Grade Heterogeneous RAG Systems

AWS GenAIIC

AWS GenAIIC shares practical insights from implementing RAG systems with heterogeneous data formats in production. The case study explores using routers for managing diverse data sources, leveraging LLMs' code generation capabilities for structured data analysis, and implementing multimodal RAG solutions that combine text and image data. The solutions include modular components for intent detection, data processing, and retrieval across different data types with examples from multiple industries.

Building Production-Ready Agentic AI Systems in Financial Services

Fitch Group

Jayeeta Putatunda, Director of AI Center of Excellence at Fitch Group, shares lessons learned from deploying agentic AI systems in the financial services industry. The discussion covers the challenges of moving from proof-of-concept to production, emphasizing the importance of evaluation frameworks, observability, and the "data prep tax" required for reliable AI agent deployments. Key insights include the need to balance autonomous agents with deterministic workflows, implement comprehensive logging at every checkpoint, combine LLMs with traditional predictive models for numerical accuracy, and establish strong business-technical partnerships to define success metrics. The conversation highlights that while agentic frameworks enable powerful capabilities, production success requires careful system design, multi-layered evaluation, human-in-the-loop validation patterns, and a focus on high-ROI use cases rather than chasing the latest model architectures.

Building Production-Scale AI Search with Knowledge Graphs, MCP, and DSPy

Dropbox

Dropbox faced the challenge of enabling users to search and query their work content scattered across 50+ SaaS applications and tabs, which proprietary LLMs couldn't access. They built Dash, an AI-powered universal search and agent platform using a sophisticated context engine that combines custom connectors, content understanding, knowledge graphs, and index-based retrieval (primarily BM25) over federated approaches. The system addresses MCP scalability challenges through "super tools," uses LLM-as-a-judge for relevancy evaluation (achieving high agreement with human evaluators), and leverages DSPy for prompt optimization across 30+ prompts in their stack. This infrastructure enables cross-app intelligence with fast, accurate, and ACL-compliant retrieval for agentic queries at enterprise scale.

Building Robust Enterprise Search with LLMs and Traditional IR

Glean

Glean tackles enterprise search by combining traditional information retrieval techniques with modern LLMs and embeddings. Rather than relying solely on AI techniques, they emphasize the importance of rigorous ranking algorithms, personalization, and hybrid approaches that combine classical IR with vector search. The company has achieved unicorn status and serves major enterprises by focusing on holistic search solutions that include personalization, feed recommendations, and cross-application integrations.

Building Synthetic Filesystems for AI Agent Navigation Across Enterprise Data Sources

Dust.tt

Dust.tt observed that their AI agents were attempting to navigate company data using filesystem-like syntax, prompting them to build synthetic filesystems that map disparate data sources (Notion, Slack, Google Drive, GitHub) into Unix-inspired navigable structures. They implemented five filesystem commands (list, find, cat, search, locate_in_tree) that allow agents to both structurally explore and semantically search across organizational data, transforming agents from search engines into knowledge workers capable of complex multi-step information tasks.

Climate Tech Foundation Models for Environmental AI Applications

Various

Climate tech startups are leveraging Amazon SageMaker HyperPod to build specialized foundation models that address critical environmental challenges including weather prediction, sustainable material discovery, ecosystem monitoring, and geological modeling. Companies like Orbital Materials and Hum.AI are training custom models from scratch on massive environmental datasets, achieving significant breakthroughs such as tenfold performance improvements in carbon capture materials and the ability to see underwater from satellite imagery. These startups are moving beyond traditional LLM fine-tuning to create domain-specific models with billions of parameters that process multimodal environmental data including satellite imagery, sensor networks, and atmospheric measurements at scale.

Context Engineering Platform for Multi-Domain RAG and Agentic Systems

Contextual

Contextual has developed an end-to-end context engineering platform designed to address the challenges of building production-ready RAG and agentic systems across multiple domains including e-commerce, code generation, and device testing. The platform combines multimodal ingestion, hierarchical document processing, hybrid search with reranking, and dynamic agents to enable effective reasoning over large document collections. In a recent context engineering hackathon, Contextual's dynamic agent achieved competitive results on a retail dataset of nearly 100,000 documents, demonstrating the value of constrained sub-agents, turn limits, and intelligent tool selection including MCP server management.

Context Rot: Evaluating LLM Performance Degradation with Increasing Input Tokens

ChromaDB

ChromaDB's technical report examines how large language models (LLMs) experience performance degradation as input context length increases, challenging the assumption that models process context uniformly. Through evaluation of 18 state-of-the-art models including GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 across controlled experiments, the research reveals that model reliability decreases significantly with longer inputs, even on simple tasks like retrieval and text replication. The study demonstrates that factors like needle-question similarity, presence of distractors, haystack structure, and semantic relationships all impact performance non-uniformly as context length grows, suggesting that current long-context benchmarks may not adequately reflect real-world performance challenges.

Custom RAG Implementation for Enterprise Technology Research and Knowledge Management

Trace3

Trace3's Innovation Team developed Innovation-GPT, a custom solution to streamline their technology research and knowledge management processes. The system uses LLMs and RAG architecture to automate the collection and analysis of data about enterprise technology companies, combining web scraping, structured data generation, and natural language querying capabilities. The solution addresses the challenges of managing large volumes of company research data while maintaining human oversight for quality control.

Data Engineering Challenges and Best Practices in LLM Production

QuantumBlack

Data engineers from QuantumBlack discuss the evolving landscape of data engineering with the rise of LLMs, highlighting key challenges in handling unstructured data, maintaining data quality, and ensuring privacy. They share experiences dealing with vector databases, data freshness in RAG applications, and implementing proper guardrails when deploying LLM solutions in enterprise settings.

Developing and Deploying Domain-Adapted LLMs for E-commerce Through Continued Pre-training

eBay

eBay tackled the challenge of incorporating LLMs into their e-commerce platform by developing e-Llama, a domain-adapted version of Llama 3.1. Through continued pre-training on a mix of e-commerce and general domain data, they created 8B and 70B parameter models that achieved 25% improvement in e-commerce tasks while maintaining strong general performance. The training was completed efficiently using 480 NVIDIA H100 GPUs and resulted in production-ready models aligned with human feedback and safety requirements.

Domain-Specific AI Platform for Manufacturing and Supply Chain Optimization

Articul8

Articul8 developed a generative AI platform to address enterprise challenges in manufacturing and supply chain management, particularly for a European automotive manufacturer. The platform combines public AI models with domain-specific intelligence and proprietary data to create a comprehensive knowledge graph from vast amounts of unstructured data. The solution reduced incident response time from 90 seconds to 30 seconds (3x improvement) and enabled automated root cause analysis for manufacturing defects, helping experts disseminate daily incidents and optimize production processes that previously required manual analysis by experienced engineers.

Email Classification System Using Foundation Models and Prompt Engineering

Travelers Insurance

Travelers Insurance developed an automated email classification system using Amazon Bedrock and Anthropic's Claude models to categorize millions of service request emails into 13 different categories. Through advanced prompt engineering techniques and without model fine-tuning, they achieved 91% classification accuracy, potentially saving tens of thousands of manual processing hours. The system combines email text analysis, PDF processing using Amazon Textract, and foundation model-based classification in a serverless architecture.

Enhancing E-commerce Search with Vector Embeddings and Generative AI

Mercado Libre / Grupo Boticario

Mercado Libre, Latin America's largest e-commerce platform, addressed the challenge of handling complex search queries by implementing vector embeddings and Google's Vector Search database. Their traditional word-matching search system struggled with contextual queries, leading to irrelevant results. The new system significantly improved search quality for complex queries, which constitute about half of all search traffic, resulting in increased click-through and conversion rates.

Enhancing Workplace Assessment Tools with RAG and Vector Search

Thomas

Thomas, a company specializing in workplace behavioral assessments, transformed their traditional paper-based psychometric assessment system by implementing generative AI solutions through Databricks. They leveraged RAG and Vector Search to make their extensive content database more accessible and interactive, enabling automated personalized insights generation from unstructured data while maintaining data security. This modernization allowed them to integrate their services into platforms like Microsoft Teams and develop their new "Perform" product, significantly improving user experience and scaling capabilities.

Enterprise Agent Orchestration Platform for Secure LLM Deployment

Airia

This case study explores how Airia developed an orchestration platform to help organizations deploy AI agents in production environments. The problem addressed is the significant complexity and security challenges that prevent businesses from moving beyond prototype AI agents to production-ready systems. The solution involves a comprehensive platform that provides agent building capabilities, security guardrails, evaluation frameworks, red teaming, and authentication controls. Results include successful deployments across multiple industries including hospitality (customer profiling across hotel chains), HR, legal (contract analysis), marketing (personalized content generation), and operations (real-time incident response through automated data aggregation), with customers reporting significant efficiency gains while maintaining enterprise security standards.

Enterprise Data Extraction Evolution from Simple RAG to Multi-Agent Architecture

Box

Box, a B2B unstructured data platform serving Fortune 500 companies, initially built a straightforward LLM-based metadata extraction system that successfully processed 10 million pages but encountered limitations with complex documents, OCR challenges, and scale requirements. They evolved from a simple pre-process-extract-post-process pipeline to a sophisticated multi-agent architecture that intelligently handles document complexity, field grouping, and quality feedback loops, resulting in a more robust and easily evolving system that better serves enterprise customers' diverse document processing needs.

Enterprise Document Data Extraction Using Agentic AI Workflows

Box

Box, an enterprise content platform serving over 115,000 customers including two-thirds of the Fortune 500, transformed their document data extraction capabilities by evolving from simple single-shot LLM prompting to sophisticated agentic AI workflows. Initially successful with basic document extraction using off-the-shelf models like GPT, Box encountered significant challenges when customers demanded extraction from complex 300-page documents with hundreds of fields, multilingual content, and poor OCR quality. The company implemented an agentic architecture using directed graphs that orchestrate multiple AI models, tools for validation and cross-checking, and iterative refinement processes. This approach dramatically improved accuracy and reliability while maintaining the flexibility to handle diverse document types and complex extraction requirements across their enterprise customer base.

Enterprise LLM Implementation Panel: Lessons from Box, Glean, Tyace, Security AI and Citibank

Various

A panel discussion featuring leaders from multiple enterprises sharing their experiences implementing LLMs in production. The discussion covers key challenges including data privacy, security, cost management, and enterprise integration. Speakers from Box discuss content management challenges, Glean covers enterprise search implementations, Tyace shares content generation experiences, Security AI addresses data safety, and Citibank provides CIO perspective on enterprise-wide AI deployment. The panel emphasizes the importance of proper data governance, security controls, and the need for systematic approach to move from POCs to production.

Enterprise LLM Playground Development for Internal AI Experimentation

Thomson Reuters

Thomson Reuters developed Open Arena, an enterprise-wide LLM playground, in under 6 weeks using AWS services. The platform enables non-technical employees to experiment with various LLMs in a secure environment, combining open-source and in-house models with company data. The solution saw rapid adoption with over 1,000 monthly users and helped drive innovation across the organization by allowing safe experimentation with generative AI capabilities.

Enterprise RAG System with Coveo Passage Retrieval and Amazon Bedrock Agents

Coveo

Coveo addresses the challenge of LLM accuracy and trustworthiness in enterprise environments by integrating their AI-Relevance Platform with Amazon Bedrock Agents. The solution uses Coveo's Passage Retrieval API to provide contextually relevant, permission-aware enterprise knowledge to LLMs through a two-stage retrieval process. This RAG implementation combines semantic and lexical search with machine learning-driven relevance tuning, unified indexing across multiple data sources, and enterprise-grade security to deliver grounded responses while maintaining data protection and real-time performance.

Enterprise Unstructured Data Quality Management for Production AI Systems

Anomalo

Anomalo addresses the critical challenge of unstructured data quality in enterprise AI deployments by building an automated platform on AWS that processes, validates, and cleanses unstructured documents at scale. The solution automates OCR and text parsing, implements continuous data observability to detect anomalies, enforces governance and compliance policies including PII detection, and leverages Amazon Bedrock for scalable LLM-based document quality analysis. This approach enables enterprises to transform their vast collections of unstructured text data into trusted assets for production AI applications while reducing operational burden, optimizing costs, and maintaining regulatory compliance.

Enterprise-Grade Memory Agents for Patent Processing with Deep Lake

Activeloop

Activeloop developed a solution for processing and generating patents using enterprise-grade memory agents and their Deep Lake vector database. The system handles 600,000 annual patent filings and 80 million total patents, reducing the typical 2-4 week patent generation process through specialized AI agents for different tasks like claim search, abstract generation, and question answering. The solution combines vector search, lexical search, and their proprietary Deep Memory technology to improve information retrieval accuracy by 5-10% without changing the underlying vector search architecture.

Enterprise-Grade RAG System for Internal Knowledge Management

PDI

PDI Technologies, a global leader in convenience retail and petroleum wholesale, built PDIQ (PDI Intelligence Query), an AI-powered internal knowledge assistant to address the challenge of fragmented information across websites, Confluence, SharePoint, and other enterprise systems. The solution implements a custom Retrieval Augmented Generation (RAG) system on AWS using serverless technologies including Lambda, ECS, DynamoDB, S3, Aurora PostgreSQL, and Amazon Bedrock models (Nova Pro, Nova Micro, Nova Lite, and Titan Embeddings V2). The system features sophisticated document processing with image captioning, dynamic token management for chunking (70% content, 10% overlap, 20% summary), and role-based access control. PDIQ improved customer satisfaction scores, reduced resolution times, increased accuracy approval rates from 60% to 79%, and enabled cost-effective scaling through serverless architecture while supporting multiple business units with configurable data sources.

Enterprise-Grade RAG Systems for Legal AI Platform

Harvey

Harvey, a legal AI platform serving professional services firms, addresses the complex challenge of building enterprise-grade Retrieval-Augmented Generation (RAG) systems that can handle sensitive legal documents while maintaining high performance, accuracy, and security. The company leverages specialized vector databases like LanceDB Enterprise and Postgres with PGVector to power their RAG systems across three key data sources: user-uploaded files, long-term vault projects, and third-party legal databases. Through careful evaluation of vector database options and collaboration with domain experts, Harvey has built a system that achieves 91% preference over ChatGPT in tax law applications while serving users in 45 countries with strict privacy and compliance requirements.

Enterprise-Scale Healthcare LLM System for Unified Patient Journeys

John Snow Labs

John Snow Labs developed a comprehensive healthcare LLM system that integrates multimodal medical data (structured, unstructured, FHIR, and images) into unified patient journeys. The system enables natural language querying across millions of patient records while maintaining data privacy and security. It uses specialized healthcare LLMs for information extraction, reasoning, and query understanding, deployed on-premises via Kubernetes. The solution significantly improves clinical decision support accuracy and enables broader access to patient data analytics while outperforming GPT-4 in medical tasks.

Enterprise-Scale RAG Implementation for E-commerce Product Discovery

Grainger

Grainger, managing 2.5 million MRO products, faced challenges with their e-commerce product discovery and customer service efficiency. They implemented a RAG-based search system using Databricks Mosaic AI and Vector Search to handle 400,000 daily product updates and improve search accuracy. The solution enabled better product discovery through conversational interfaces and enhanced customer service capabilities while maintaining real-time data synchronization.

Evaluation-Driven LLM Production Workflows with Morgan Stanley and Grab Case Studies

OpenAI

OpenAI's applied evaluation team presented best practices for implementing LLMs in production through two case studies: Morgan Stanley's internal document search system for financial advisors and Grab's computer vision system for Southeast Asian mapping. Both companies started with simple evaluation frameworks using just 5 initial test cases, then progressively scaled their evaluation systems while maintaining CI/CD integration. Morgan Stanley improved their RAG system's document recall from 20% to 80% through iterative evaluation and optimization, while Grab developed sophisticated vision fine-tuning capabilities for recognizing road signs and lane counts in Southeast Asian contexts. The key insight was that effective evaluation systems enable rapid iteration cycles and clear communication between teams and external partners like OpenAI for model improvement.

Evolving LLMOps Architecture for Enterprise Supplier Discovery

Various

A detailed case study of implementing LLMs in a supplier discovery product at Scoutbee, evolving from simple API integration to a sophisticated LLMOps architecture. The team tackled challenges of hallucinations, domain adaptation, and data quality through multiple stages: initial API integration, open-source LLM deployment, RAG implementation, and finally a comprehensive data expansion phase. The result was a production-ready system combining knowledge graphs, Chain of Thought prompting, and custom guardrails to provide reliable supplier discovery capabilities.

Evolving ML Infrastructure for Production Systems: From Traditional ML to LLMs

Doordash

A comprehensive overview of ML infrastructure evolution and LLMOps practices at major tech companies, focusing on Doordash's approach to integrating LLMs alongside traditional ML systems. The discussion covers how ML infrastructure needs to adapt for LLMs, the importance of maintaining guard rails, and strategies for managing errors and hallucinations in production systems, while balancing the trade-offs between traditional ML models and LLMs in production environments.

Fact-Centric Legal Document Review with Custom AI Pipeline

Mary Technology

Mary Technology, a Sydney-based legal tech firm, developed a specialized AI platform to automate document review for law firms handling dispute resolution cases. Recognizing that standard large language models (LLMs) with retrieval-augmented generation (RAG) are insufficient for legal work due to their compression nature, lack of training data access for sensitive documents, and inability to handle the nuanced fact extraction required for litigation, Mary built a custom "fact manufacturing pipeline" that treats facts as first-class citizens. This pipeline extracts entities, events, actors, and issues with full explainability and metadata, allowing lawyers to verify information before using downstream AI applications. Deployed across major firms including A&O Shearman, the platform has achieved a 75-85% reduction in document review time and a 96/100 Net Promoter Score.

Field AI Assistant for Sales Team Automation

Databricks

Databricks developed an AI-powered assistant to transform their sales operations by automating routine tasks and improving data access. The Field AI Assistant, built on their Mosaic AI agent framework, integrates multiple data sources including their Lakehouse, CRM, and collaboration platforms to provide conversational interactions, automate document creation, and execute actions based on data insights. The solution streamlines workflows for sales teams, allowing them to focus on high-value activities while ensuring proper governance and security measures.

Fine-Tuning and Quantizing LLMs for Dynamic Attribute Extraction

Mercari

Mercari tackled the challenge of extracting dynamic attributes from user-generated marketplace listings by fine-tuning a 2B parameter LLM using QLoRA. The team successfully created a model that outperformed GPT-3.5-turbo while being 95% smaller and 14 times more cost-effective. The implementation included careful dataset preparation, parameter efficient fine-tuning, and post-training quantization using llama.cpp, resulting in a production-ready model with better control over hallucinations.

Fine-tuning Custom Embedding Models for Enterprise Search

Glean

Glean implements enterprise search and RAG systems by developing custom embedding models for each customer. They tackle the challenge of heterogeneous enterprise data by using a unified data model and fine-tuning embedding models through continued pre-training and synthetic data generation. Their approach combines traditional search techniques with semantic search, achieving a 20% improvement in search quality over 6 months through continuous learning from user feedback and company-specific language adaptation.

From Simple RAG to Multi-Agent Architecture for Document Data Extraction

Box

Box evolved their document data extraction system from a simple single-model approach to a sophisticated multi-agent architecture to handle enterprise-scale unstructured data processing. The initial straightforward approach of preprocessing documents and feeding them to an LLM worked well for basic use cases but failed when customers presented complex challenges like 300-page documents, poor OCR quality, hundreds of extraction fields, and confidence scoring requirements. By redesigning the system using an agentic approach with specialized sub-agents for different tasks, Box achieved better accuracy, easier system evolution, and improved maintainability while processing millions of pages for enterprise customers.

GenAI Transformation of Manufacturing and Supply Chain Operations

Jabil

Jabil, a global manufacturing company with $29B in revenue and 140,000 employees, implemented Amazon Q to transform their manufacturing and supply chain operations. They deployed GenAI solutions across three key areas: shop floor operations assistance (Ask Me How), procurement intelligence (PIP), and supply chain management (V-command). The implementation helped reduce downtime, improve operator efficiency, enhance procurement decisions, and accelerate sales cycles for their supply chain services. The company established robust governance through AI and GenAI councils while ensuring responsible AI usage and clear value creation.

Generative AI-Powered Knowledge Sharing System for Travel Expertise

Hotelplan Suisse

Hotelplan Suisse implemented a generative AI solution to address the challenge of sharing travel expertise across their 500+ travel experts. The system integrates multiple data sources and uses semantic search to provide instant, expert-level travel recommendations to sales staff. The solution reduced response time from hours to minutes and includes features like chat history management, automated testing, and content generation capabilities for marketing materials.

Global News Organization's AI-Powered Content Production and Verification System

Reuters

Reuters has implemented a comprehensive AI strategy to enhance its global news operations, focusing on reducing manual work, augmenting content production, and transforming news delivery. The organization developed three key tools: a press release fact extraction system, an AI-integrated CMS called Leon, and a content packaging tool called LAMP. They've also launched the Reuters AI Suite for clients, offering transcription and translation capabilities while maintaining strict ethical guidelines around AI-generated imagery and maintaining journalistic integrity.

Healthcare Patient Journey Analysis Platform with Multimodal LLMs

John Snow Labs

John Snow Labs developed a comprehensive healthcare analytics platform that uses specialized medical LLMs to process and analyze patient data across multiple modalities including unstructured text, structured EHR data, FIR resources, and images. The platform enables healthcare professionals to query patient histories and build cohorts using natural language, while handling complex medical terminology mapping and temporal reasoning. The system runs entirely within the customer's infrastructure for security, uses Kubernetes for deployment, and significantly outperforms GPT-4 on medical tasks while maintaining consistency and explainability in production.

Human-AI Co-Annotation System for Efficient Data Labeling

Appen

Appen developed a hybrid approach combining LLMs with human annotators to address the growing challenges in data annotation for AI models. They implemented a co-annotation engine that uses model uncertainty metrics to efficiently route annotation tasks between LLMs and human annotators. Using GPT-3.5 Turbo for initial annotations and entropy-based confidence scoring, they achieved 87% accuracy while reducing costs by 62% and annotation time by 63% compared to purely human annotation, demonstrating an effective balance between automation and human expertise.

Implementing RAG for Call Center Operations with Hybrid Data Sources

Manulife

Manulife implemented a Retrieval Augmented Generation (RAG) system in their call center to help customer service representatives quickly access and utilize information from both structured and unstructured data sources. They developed an innovative approach combining document chunks and structured data embeddings, achieving an optimized response time of 7.33 seconds in production. The system successfully handles both policy documents and database information, using GPT-3.5 for answer generation with additional validation from Llama 3 or GPT-4.

Improving Local Search with Multimodal LLMs and Vector Search

OfferUp

OfferUp transformed their traditional keyword-based search system to a multimodal search solution using Amazon Bedrock's Titan Multimodal Embeddings and Amazon OpenSearch Service. The new system processes both text and images to generate vector embeddings, enabling more contextually relevant search results. The implementation led to significant improvements, including a 27% increase in relevance recall, 54% reduction in geographic spread for more local results, and a 6.5% increase in search depth.

Intelligent Document Processing at Scale with AI-Powered Tax Compliance and Invoice Analysis

Syngenta

Syngenta, a global agricultural company processing over one million invoices annually across 90 countries, implemented "Wingman," an AI-powered intelligent document processing system to automate complex document analysis tasks. The solution leverages Amazon Bedrock Data Automation (BDA) for document parsing and LLMs (primarily Anthropic Claude) for intelligent content extraction and policy comparison. Starting with tax compliance in Argentina, where complex regional tax laws required manual verification of 4,000 invoices monthly, Wingman automatically extracts invoice content, compares it against tax policies, and identifies discrepancies with human-readable explanations. The system achieved near-perfect accuracy and is being scaled to additional use cases including indirect spend reduction, vendor master data accuracy, and expense compliance across multiple countries.

Large Recommender Models: Adapting Gemini for YouTube Video Recommendations

Google / YouTube

YouTube developed Large Recommender Models (LRM) by adapting Google's Gemini LLM for video recommendations, addressing the challenge of serving personalized content to billions of users. The solution involved creating semantic IDs to tokenize videos, continuous pre-training to teach the model both English and YouTube-specific video language, and implementing generative retrieval systems. While the approach delivered significant improvements in recommendation quality, particularly for challenging cases like new users and fresh content, the team faced substantial serving cost challenges that required 95%+ cost reductions and offline inference strategies to make production deployment viable at YouTube's scale.

Large-Scale Legal RAG Implementation with Multimodal Data Infrastructure

Harvey / Lance

Harvey, a legal AI assistant company, partnered with LanceDB to address complex retrieval-augmented generation (RAG) challenges across massive datasets of legal documents. The case study demonstrates how they built a scalable system to handle diverse legal queries ranging from small on-demand uploads to large data corpuses containing millions of documents from various jurisdictions. Their solution combines advanced vector search capabilities with a multimodal lakehouse architecture, emphasizing evaluation-driven development and flexible infrastructure to support the complex, domain-specific nature of legal AI applications.

Large-Scale Personalization and Product Knowledge Graph Enhancement Through LLM Integration

DoorDash

DoorDash faced challenges in scaling personalization and maintaining product catalogs as they expanded beyond restaurants into new verticals like grocery, retail, and convenience stores, dealing with millions of SKUs and cold-start scenarios for new customers and products. They implemented a layered approach combining traditional machine learning with fine-tuned LLMs, RAG systems, and LLM agents to automate product knowledge graph construction, enable contextual personalization, and provide recommendations even without historical user interaction data. The solution resulted in faster, more cost-effective catalog processing, improved personalization for cold-start scenarios, and the foundation for future agentic shopping experiences that can adapt to real-time contexts like emergency situations.

Large-Scale Personalization System Using LLMs for Buyer Profile Generation

Etsy

Etsy tackled the challenge of personalizing shopping experiences for nearly 90 million buyers across 100+ million listings by implementing an LLM-based system to generate detailed buyer profiles from browsing and purchasing behaviors. The system analyzes user session data including searches, views, purchases, and favorites to create structured profiles capturing nuanced interests like style preferences and shopping missions. Through significant optimization efforts including data source improvements, token reduction, batch processing, and parallel execution, Etsy reduced profile generation time from 21 days to 3 days for 10 million users while cutting costs by 94% per million users, enabling economically viable large-scale personalization for search query rewriting and refinement pills.

LLM-Enhanced Topic Modeling System for Qualitative Text Analysis

QualIT

QualIT developed a novel topic modeling system that combines large language models with traditional clustering techniques to analyze qualitative text data more effectively. The system uses LLMs to extract key phrases and employs a two-stage hierarchical clustering approach, demonstrating significant improvements over baseline methods with 70% topic coherence (vs 65% and 57% for benchmarks) and 95.5% topic diversity (vs 85% and 72%). The system includes safeguards against LLM hallucinations and has been validated through human evaluation.

LLM-Generated Entity Profiles for Personalized Food Delivery Platform

DoorDash

DoorDash evolved from traditional numerical embeddings to LLM-generated natural language profiles for representing consumers, merchants, and food items to improve personalization and explainability. The company built an automated system that generates detailed, human-readable profiles by feeding structured data (order history, reviews, menu metadata) through carefully engineered prompts to LLMs, enabling transparent recommendations, editable user preferences, and richer input for downstream ML models. While the approach offers scalability and interpretability advantages over traditional embeddings, the implementation requires careful evaluation frameworks, robust serving infrastructure, and continuous iteration cycles to maintain profile quality in production.

LLM-Powered Analysis of User Bug Reports for Product Quality Improvement

Meta

Meta's Facebook product team faced challenges in analyzing large volumes of unstructured user bug reports at scale using traditional methods. They developed an LLM-based system that classifies user feedback into predefined categories, monitors trends through automated dashboards, and performs root cause analysis to identify product issues. Through iterative prompt engineering and integration with data pipelines, the system successfully detected major outages in real-time, identified less visible bugs that might have been missed, and contributed to reducing overall bug reports by double digits over several months by enabling targeted product improvements and cross-functional collaboration.

LLM-Powered Information Extraction from Pediatric Cardiac MRI Reports

UK National Health Service (NHS)

Great Ormond Street Hospital NHS Trust developed a solution to extract information from 15,000 unstructured cardiac MRI reports spanning 10 years. They implemented a hybrid approach using small LLMs for entity extraction and few-shot learning for table structure classification. The system successfully extracted patient identifiers and clinical measurements from heterogeneous reports, enabling linkage with structured data and improving clinical research capabilities. The solution demonstrated significant improvements in extraction accuracy when using contextual prompting with models like FLAN-T5 and RoBERTa, while operating within NHS security constraints.

LLM-Powered Multi-Tool Architecture for Oil & Gas Data Exploration

DXC

DXC developed an AI assistant to accelerate oil and gas data exploration by integrating multiple specialized LLM-powered tools. The solution uses a router to direct queries to specialized tools optimized for different data types including text, tables, and industry-specific formats like LAS files. Built using Anthropic's Claude on Amazon Bedrock, the system includes conversational capabilities and semantic search to help users efficiently analyze complex datasets, reducing exploration time from hours to minutes.

LLM-Powered Product Attribute Extraction from Unstructured Marketplace Data

Etsy

Etsy faced the challenge of understanding and categorizing over 100 million unique, handmade items listed by 5 million sellers, where most product information existed only as unstructured text and images rather than structured attributes. The company deployed large language models to extract product attributes at scale from listing titles, descriptions, and photos, transforming unstructured data into structured attributes that could power search filters and product comparisons. The implementation increased complete attribute coverage from 31% to 91% in target categories, improved engagement with search filters, and increased overall post-click conversion rates, while establishing robust evaluation frameworks using both human-annotated ground truth and LLM-generated silver labels.

LLM-Powered Search Relevance Re-Ranking System

LeBonCoin

leboncoin, France's largest second-hand marketplace, implemented a neural re-ranking system using large language models to improve search relevance across their 60 million classified ads. The system uses a two-tower architecture with separate Ad and Query encoders based on fine-tuned LLMs, achieving up to 5% improvement in click and contact rates and 10% improvement in user experience KPIs while maintaining strict latency requirements for their high-throughput search system.

LLM-Powered User Feedback Analysis for Bug Report Classification and Product Improvement

Meta

Meta (Facebook) developed an LLM-based system to analyze unstructured user bug reports at scale, addressing the challenge of processing free-text feedback that was previously resource-intensive and difficult to analyze with traditional methods. The solution uses prompt engineering to classify bug reports into predefined categories, enabling automated monitoring through dashboards, trend detection, and root cause analysis. This approach successfully identified critical issues during outages, caught less visible bugs that might have been missed, and resulted in double-digit reductions in topline bug reports over several months by enabling cross-functional teams to implement targeted fixes and product improvements.

Migrating from Elasticsearch to Vespa for Large-Scale Search Platform

Vinted

Vinted, a major e-commerce platform, successfully migrated their search infrastructure from Elasticsearch to Vespa to handle their growing scale of 1 billion searchable items. The migration resulted in halving their server count, improving search latency by 2.5x, reducing indexing latency by 3x, and decreasing visibility time for changes from 300 to 5 seconds. The project, completed between May 2023 and April 2024, demonstrated significant improvements in search relevance and operational efficiency through careful architectural planning and phased implementation.

Migrating LLM Fine-tuning Workflows from Slurm to Kubernetes Using Metaflow and Argo

Adept.ai

Adept.ai, building an AI model for computer interaction, faced challenges with complex fine-tuning pipelines running on Slurm. They implemented a migration strategy to Kubernetes using Metaflow and Argo for workflow orchestration, while maintaining existing Slurm workloads through a hybrid approach. This allowed them to improve pipeline management, enable self-service capabilities for data scientists, and establish robust monitoring infrastructure, though complete migration to Kubernetes remains a work in progress.

MLflow's Production-Ready Agent Framework and LLM Tracing

MLflow

MLflow addresses the challenges of moving LLM agents from demo to production by introducing comprehensive tooling for tracing, evaluation, and experiment tracking. The solution includes LLM tracing capabilities to debug black-box agent systems, evaluation tools for retrieval relevance and prompt engineering, and integrations with popular agent frameworks like Autogen and LlamaIndex. This enables organizations to effectively monitor, debug, and improve their LLM-based applications in production environments.

Model Context Protocol (MCP): A Universal Standard for AI Application Extensions

Anthropic

Anthropic developed the Model Context Protocol (MCP) to solve the challenge of extending AI applications with plugins and external functionality in a standardized way. Inspired by the Language Server Protocol (LSP), MCP provides a universal connector that enables AI applications to interact with various tools, resources, and prompts through a client-server architecture. The protocol has gained significant community adoption and contributions from companies like Shopify, Microsoft, and JetBrains, demonstrating its potential as an open standard for AI application integration.

Multi-Agent AI System for Financial Intelligence and Risk Analysis

Moodyโ€™s

Moody's Analytics, a century-old financial institution serving over 1,500 customers across 165 countries, transformed their approach to serving high-stakes financial decision-making by evolving from a basic RAG chatbot to a sophisticated multi-agent AI system on AWS. Facing challenges with unstructured financial data (PDFs with complex tables, charts, and regulatory documents), context window limitations, and the need for 100% accuracy in billion-dollar decisions, they architected a serverless multi-agent orchestration system using Amazon Bedrock, specialized task agents, custom workflows supporting up to 400 steps, and intelligent document processing pipelines. The solution processes over 1 million tokens daily in production, achieving 60% faster insights and 30% reduction in task completion times while maintaining the precision required for credit ratings, risk intelligence, and regulatory compliance across credit, climate, economics, and compliance domains.

Multi-Agent AI System for Investment Thesis Validation Using Devil's Advocate

Linqalpha

LinqAlpha, a Boston-based AI platform serving over 170 institutional investors, developed Devil's Advocate, an AI agent that systematically pressure-tests investment theses by identifying blind spots and generating evidence-based counterarguments. The system addresses the challenge of confirmation bias in investment research by automating the manual process of challenging investment ideas, which traditionally required time-consuming cross-referencing of expert calls, broker reports, and filings. Using a multi-agent architecture powered by Claude Sonnet 3.7 and 4.0 on Amazon Bedrock, integrated with Amazon Textract, Amazon OpenSearch Service, Amazon RDS, and Amazon S3, the solution decomposes investment theses into assumptions, retrieves counterevidence from uploaded documents, and generates structured, citation-linked rebuttals. The system enables investors to conduct rigorous due diligence at 5-10 times the speed of traditional reviews while maintaining auditability and compliance requirements critical to institutional finance.

Multi-Agent Financial Analysis System for Equity Research

Captide

Captide developed a platform to automate and enhance equity research by deploying an intelligent multi-agent system for processing financial documents. Using LangGraph and LangSmith hosted on LangGraph Platform, they implemented parallel document processing capabilities and structured output generation for financial metrics extraction. The system allows analysts to query complex financial data using natural language, significantly improving efficiency in processing regulatory filings and investor relations documents while maintaining high accuracy standards through continuous monitoring and feedback loops.

Multi-Agent Financial Research and Question Answering System

Yahoo! Finance

Yahoo! Finance built a production-scale financial question answering system using multi-agent architecture to address the information asymmetry between retail and institutional investors. The system leverages Amazon Bedrock Agent Core and employs a supervisor-subagent pattern where specialized agents handle structured data (stock prices, financials), unstructured data (SEC filings, news), and various APIs. The solution processes heterogeneous financial data from multiple sources, handles temporal complexities of fiscal years, and maintains context across sessions. Through a hybrid evaluation approach combining human and AI judges, the system achieves strong accuracy and coverage metrics while processing queries in 5-50 seconds at costs of 2-5 cents per query, demonstrating production viability at scale with support for 100+ concurrent users.

Multi-Agent Personalization Engine with Proactive Memory System for Batch Processing

Personize.ai

Personize.ai, a Canadian startup, developed a multi-agent personalization engine called "Cortex" to generate personalized content at scale for emails, websites, and product pages. The company faced challenges with traditional RAG and function calling approaches when processing customer databases autonomously, including inconsistency across agents, context overload, and lack of deep customer understanding. Their solution implements a proactive memory system that infers and synthesizes customer insights into standardized attributes shared across all agents, enabling centralized recall and compressed context. Early testing with 20+ B2B companies showed the system can perform deep research in 5-10 minutes and generate highly personalized, domain-specific content that matches senior-level quality without human-in-the-loop intervention.

Multi-Agent Web Research System with Dynamic Task Generation

Exa

Exa evolved from providing a search API to building a production-ready multi-agent web research system that processes hundreds of research queries daily, delivering structured results in 15 seconds to 3 minutes. Using LangGraph for orchestration and LangSmith for observability, their system employs a three-component architecture with a planner that dynamically generates parallel tasks, independent research units with specialized tools, and an observer maintaining full context across all components. The system intelligently balances between search snippets and full content retrieval to optimize token usage while maintaining research quality, ultimately providing structured JSON outputs specifically designed for API consumption.

Multimodal AI Vector Search for Advanced Video Understanding

Twelve Labs

Twelve Labs developed an integration with Databricks Mosaic AI to enable advanced video understanding capabilities through multimodal embeddings. The solution addresses challenges in processing large-scale video datasets and providing accurate multimodal content representation. By combining Twelve Labs' Embed API for generating contextual vector representations with Databricks Mosaic AI Vector Search's scalable infrastructure, developers can implement sophisticated video search, recommendation, and analysis systems with reduced development time and resource needs.

Multimodal Art Collection Search Using Vector Databases and LLMs

Actum Digital

An art institution implemented a sophisticated multimodal search system for their collection of 40 million art assets using vector databases and LLMs. The system combines text and image-based search capabilities, allowing users to find artworks based on various attributes including style, content, and visual similarity. The solution evolved from using basic cloud services to a more cost-effective and flexible approach, reducing infrastructure costs to approximately $1,000 per region while maintaining high search accuracy.

Multimodal Feature Stores and Research-Engineering Collaboration

Runway

Runway, a leader in generative AI for creative tools, developed a novel approach to managing multimodal training data through what they call a "multimodal feature store". This system enables efficient storage and retrieval of diverse data types (video, images, text) along with their computed features and embeddings, facilitating large-scale distributed training while maintaining researcher productivity. The solution addresses challenges in data management, feature computation, and the research-to-production pipeline, while fostering better collaboration between researchers and engineers.

Multimodal Healthcare Data Integration with Specialized LLMs

John Snow Labs

John Snow Labs developed a comprehensive healthcare data integration system that leverages multiple specialized LLMs to unify and analyze patient data from various sources. The system processes structured, unstructured, and semi-structured medical data (including EHR, PDFs, HL7, FHIR) to create complete patient journeys, enabling natural language querying while maintaining consistency, accuracy, and scalability. The solution addresses key healthcare challenges like terminology mapping, date normalization, and data deduplication, all while operating within secure environments and handling millions of patient records.

Multimodal RAG Architecture Optimization for Production

Microsoft

Microsoft explored optimizing a production Retrieval-Augmented Generation (RAG) system that incorporates both text and image content to answer domain-specific queries. The team conducted extensive experiments on various aspects of the system including prompt engineering, metadata inclusion, chunk structure, image enrichment strategies, and model selection. Key improvements came from using separate image chunks, implementing a classifier for image relevance, and utilizing GPT-4V for enrichment while using GPT-4o for inference. The resulting system achieved better search precision and more relevant LLM-generated responses while maintaining cost efficiency.

Multimodal RAG Solution for Oil and Gas Drilling Data Processing

Infosys

Infosys developed an advanced multimodal Retrieval-Augmented Generation (RAG) solution using Amazon Bedrock to process complex oil and gas drilling documentation containing text, images, charts, and technical diagrams. The solution addresses the challenge of extracting insights from thousands of technical documents including well completion reports, drilling logs, and lithology diagrams that traditional document processing methods struggle to handle effectively. Through iterative development exploring various chunking strategies, embedding models, and search approaches, the team ultimately implemented a hybrid search system with parent-child chunking hierarchy, achieving 92% retrieval accuracy, sub-2-second response times, and delivering significant operational efficiency gains including 40-50% reduction in manual document processing costs and 60% time savings for field engineers and geologists.

On-Device Grammar Correction with Sequence-to-Sequence Models

Google

Google Research developed an on-device grammar correction system for Gboard on Pixel 6 that detects and suggests corrections for grammatical errors as users type. The solution addresses the challenge of implementing neural grammar correction within the constraints of mobile devices (limited memory, computational power, and latency requirements) while preserving user privacy by keeping all processing local. The team built a 20MB hybrid Transformer-LSTM model using hard distillation from a cloud-based system, achieving inference on 60 characters in under 22ms on the Pixel 6 CPU, enabling real-time grammar correction for both complete sentences and partial sentence prefixes across English text in nearly any app using Gboard.

Optimizing Production Vision Pipelines for Planet Image Generation

Prem AI

At Prem AI, they tackled the challenge of generating realistic ethereal planet images at scale with specific constraints like aspect ratio and controllable parameters. The solution involved fine-tuning Stable Diffusion XL with a curated high-quality dataset, implementing custom upscaling pipelines, and optimizing performance through various techniques including LoRA fusion, model quantization, and efficient serving frameworks like Ray Serve.

Optimizing RAG Systems: Lessons from Production

AWS GenAIIC

AWS GenAIIC shares comprehensive lessons learned from implementing Retrieval-Augmented Generation (RAG) systems across multiple industries. The case study covers key challenges in RAG implementation and provides detailed solutions for improving retrieval accuracy, managing context, and ensuring response reliability. Solutions include hybrid search techniques, metadata filtering, query rewriting, and advanced prompting strategies to reduce hallucinations.

Panel Discussion: Real-World LLM Production Use Cases

Various

A panel discussion featuring multiple companies and consultants sharing their experiences with LLMs in production. Key highlights include Resides using LLMs to improve property management customer service (achieving 95-99% question answering rates), applications in sales optimization with 30% improvement in sales through argument analysis, and insights on structured outputs and validation for executive coaching use cases.

Practical Lessons from Deploying LLMs in Production at Scale

Mercado Libre

Mercado Libre explored multiple production applications of Large Language Models across their e-commerce and technology platform, tackling challenges in knowledge retrieval, documentation generation, and natural language processing. The company implemented a RAG system for developer documentation using Llama Index, automated documentation generation for thousands of database tables, and built natural language input interpretation systems using function calling. Through iterative development, they learned critical lessons about the importance of underlying data quality, prompt engineering iteration, quality assurance for generated outputs, and the necessity of simplifying tasks for LLMs through proper data preprocessing and structured output formats.

Production AI Agents for Insurance Policy Management with Amazon Bedrock

CDL

CDL, a UK-based insurtech company, has developed a comprehensive AI agent system using Amazon Bedrock to handle insurance policy management tasks in production. The solution includes a supervisor agent architecture that routes customer intents to specialized domain agents, enabling customers to manage their insurance policies through conversational AI interfaces available 24/7. The implementation addresses critical production concerns through rigorous model evaluation processes, guardrails for safety, and comprehensive monitoring, while preparing their APIs to be AI-ready for future digital assistant integrations.

Production AI Deployment: Lessons from Real-World Agentic AI Systems

Databricks / Various

This case study presents lessons learned from deploying generative AI applications in production, with a specific focus on Flo Health's implementation of a women's health chatbot on the Databricks platform. The presentation addresses common failure points in GenAI projects including poor constraint definition, over-reliance on LLM autonomy, and insufficient engineering discipline. The solution emphasizes deterministic system architecture over autonomous agents, comprehensive observability and tracing, rigorous evaluation frameworks using LLM judges, and proper DevOps practices. Results demonstrate that successful production deployments require treating agentic AI as modular system architectures following established software engineering principles rather than monolithic applications, with particular emphasis on cost tracking, quality monitoring, and end-to-end deployment pipelines.

Production RAG Best Practices: Implementation Lessons at Scale

Kapa.ai

Based on experience with over 100 technical teams including Docker, CircleCI, and Reddit, this case study examines key challenges and solutions in implementing production-grade RAG systems. The analysis covers critical aspects from data curation and refresh pipelines to evaluation frameworks and security practices, highlighting how most RAG implementations fail at the POC stage while providing concrete guidance for successful production deployments.

Production-Scale RAG System for Real-Time News Processing and Analysis

Emergent Methods

Emergent Methods built a production-scale RAG system processing over 1 million news articles daily, using a microservices architecture to deliver real-time news analysis and context engineering. The system combines multiple open-source tools including Quadrant for vector search, VLM for GPU optimization, and their own Flow.app for orchestration, addressing challenges in news freshness, multilingual processing, and hallucination prevention while maintaining low latency and high availability.

RAG-based Chatbot for Utility Operations and Customer Service

Xcel Energy

Xcel Energy implemented a RAG-based chatbot system to streamline operations including rate case reviews, legal contract analysis, and earnings call report processing. Using Databricks' Data Intelligence Platform, they developed a production-grade GenAI system incorporating Vector Search, MLflow, and Foundation Model APIs. The solution reduced rate case review times from 6 months to 2 weeks while maintaining strict security and governance requirements for sensitive utility data.

Real-Time AI Chief of Staff for Product Teams

Earmark

Earmark built a productivity suite for product teams that transforms meeting conversations into finished work in real-time, addressing the problem of endless context-switching and manual follow-up work that plagues modern product development. Founded by Mark Barb and Sandon, who both came from the product management SaaS space, Earmark uses live transcription and multiple parallel AI agents to generate product specs, tickets, summaries, and other artifacts during meetings rather than after them. The company pivoted from an Apple Vision Pro communication training tool to a web-based real-time meeting assistant after discovering through 60 customer interviews that few people actually prepare for presentations. With 78% of survey respondents saying they'd be "super bummed" if the product disappeared, Earmark has achieved strong product-market fit by focusing specifically on product managers, engineering leaders, and adjacent roles who spend most of their time in back-to-back meetings with different audiences and deliverables.

Responsible LLM Adoption for Fraud Detection with RAG Architecture

Mastercard

Mastercard successfully implemented LLMs in their fraud detection systems, achieving up to 300% improvement in detection rates. They approached this by focusing on responsible AI adoption, implementing RAG (Retrieval Augmented Generation) architecture to handle their large amounts of unstructured data, and carefully considering access controls and security measures. The case study demonstrates how enterprise-scale LLM deployment requires careful consideration of technical debt, infrastructure scaling, and responsible AI principles.

Revenue Intelligence Platform with Ambient AI Agents

Tabs

Tabs, a vertical AI company in the finance space, has built a revenue intelligence platform for B2B companies that uses ambient AI agents to automate financial workflows. The company extracts information from sales contracts to create a "commercial graph" and deploys AI agents that work autonomously in the background to handle billing, collections, and reporting tasks. Their approach moves beyond traditional guided AI experiences toward fully ambient agents that monitor communications and trigger actions automatically, with the goal of creating "beautiful operational software that no one ever has to go into."

Scaling AI Applications with LLMs: Dynamic Context Injection and Few-Shot Learning for Order Processing

Choco

Choco built a comprehensive AI system to automate food supply chain order processing, addressing challenges with diverse order formats across text messages, PDFs, and voicemails. The company developed a production LLM system using few-shot learning with dynamically retrieved examples, semantic embedding-based retrieval, and context injection techniques to improve information extraction accuracy. Their approach prioritized prompt-based improvements over fine-tuning, enabling faster iteration and model flexibility while building towards more autonomous AI systems through continuous learning from human annotations.

Scaling AI Development with DGX Cloud: ServiceNow and SLB Production Deployments

Nvidia

ServiceNow and SLB (formerly Schlumberger) leveraged Nvidia DGX Cloud on AWS to develop and deploy foundation models for their respective industries. ServiceNow focused on building efficient small language models (5B-15B parameters) for enterprise process automation and agentic systems that match frontier model performance at a fraction of the cost and size, achieving nearly 100% GPU utilization through Run AI orchestration. SLB developed domain-specific multi-modal foundation models for seismic and petrophysical data to assist geoscientists and engineers in the energy sector, accelerating time-to-market for two major product releases over two years. Both organizations benefited from the fully optimized, turnkey infrastructure stack combining high-performance GPUs, networking, Lustre storage, EKS optimization, and enterprise-grade support, enabling them to focus on model development rather than infrastructure management while achieving zero or near-zero downtime.

Scaling AI Product Development with Rigorous Evaluation and Observability

Notion

Notion AI, serving over 100 million users with multiple AI features including meeting notes, enterprise search, and deep research tools, demonstrates how rigorous evaluation and observability practices are essential for scaling AI product development. The company uses Brain Trust as their evaluation platform to manage the complexity of supporting multilingual workspaces, rapid model switching, and maintaining product polish while building at the speed of AI industry innovation. Their approach emphasizes that 90% of AI development time should be spent on evaluation and observability rather than prompting, with specialized data specialists creating targeted datasets and custom LLM-as-a-judge scoring functions to ensure consistent quality across their diverse AI product suite.

Scaling AI Systems for Unstructured Data Processing: Logical Data Models and Embedding Optimization

CoActive AI

CoActive AI addresses the challenge of processing unstructured data at scale through AI systems. They identified two key lessons: the importance of logical data models in bridging the gap between data storage and AI processing, and the strategic use of embeddings for cost-effective AI operations. Their solution involves creating data+AI hybrid teams to resolve impedance mismatches and optimizing embedding computations to reduce redundant processing, ultimately enabling more efficient and scalable AI operations.

Scaling AI-Powered File Understanding with Efficient Embedding and LLM Architecture

Dropbox

Dropbox implemented AI-powered file understanding capabilities for previews on the web, enabling summarization and Q&A features across multiple file types. They built a scalable architecture using their Riviera framework for text extraction and embeddings, implemented k-means clustering for efficient summarization, and developed an intelligent chunk selection system for Q&A. The system achieved significant improvements with a 93% reduction in cost-per-summary, 64% reduction in cost-per-query, and latency improvements from 115s to 4s for summaries and 25s to 5s for queries.

Scaling Data Infrastructure for AI Features and RAG

Notion

Notion faced challenges with rapidly growing data volume (10x in 3 years) and needed to support new AI features. They built a scalable data lake infrastructure using Apache Hudi, Kafka, Debezium CDC, and Spark to handle their update-heavy workload, reducing costs by over a million dollars and improving data freshness from days to minutes/hours. This infrastructure became crucial for successfully rolling out Notion AI features and their Search and AI Embedding RAG infrastructure.

Scaling Enterprise RAG with Advanced Vector Search Migration

Danswer

Danswer, an enterprise search solution, migrated their core search infrastructure to Vespa to overcome limitations in their previous vector database setup. The migration enabled them to better handle team-specific terminology, implement custom boost and decay functions, and support multiple vector embeddings per document while maintaining performance at scale. The solution improved search accuracy and resource efficiency for their RAG-based enterprise search product.

Scaling Generative AI for Manufacturing Operations with RAG and Multi-Model Architecture

Georgia-Pacific

Georgia-Pacific, a forest products manufacturing company with 30,000+ employees and 140+ facilities, deployed generative AI to address critical knowledge transfer challenges as experienced workers retire and new employees struggle with complex equipment. The company developed an "Operator Assistant" chatbot using AWS Bedrock, RAG architecture, and vector databases to provide real-time troubleshooting guidance to factory operators. Starting with a 6-8 week MVP deployment in December 2023, they scaled to 45 use cases across multiple facilities within 7-8 months, serving 500+ users daily with improved operational efficiency and reduced waste.

Scaling Local News Coverage with AI-Powered Newsletter Generation

Patch

Patch transformed its local news coverage by implementing AI-powered newsletter generation, enabling them to expand from 1,100 to 30,000 communities while maintaining quality and trust. The system combines curated local data sources, weather information, event calendars, and social media content, processed through AI to create relevant, community-specific newsletters. This approach resulted in over 400,000 new subscribers and a 93.6% satisfaction rating, while keeping costs manageable and maintaining editorial standards.

Scaling Meta AI's Feed Deep Dive from Launch to Product-Market Fit

Meta

Meta launched Feed Deep Dive as an AI-powered feature on Facebook in April 2024 to address information-seeking and context enrichment needs when users encounter posts they want to learn more about. The challenge was scaling from launch to product-market fit while maintaining high-quality responses at Meta scale, dealing with LLM hallucinations and refusals, and providing more value than users would get from simply scrolling Facebook Feed. Meta's solution involved evolving from traditional orchestration to agentic models with planning, tool calling, and reflection capabilities; implementing auto-judges for online quality evaluation; using smart caching strategies focused on high-traffic posts; and leveraging ML-based user cohort targeting to show the feature to users who derived the most value. The results included achieving product-market fit through improved quality and engagement, with the team now moving toward monetization and expanded use cases.

Scaling Multimedia Search with Metadata-First Indexing and On-Demand Preview Generation

Dropbox

Dropbox Dash faced the challenge of enabling fast, accurate search across multimedia content (images, videos, audio) that typically lacks meaningful metadata and requires significantly more compute and storage resources than text documents. The team built a scalable multimedia search solution by implementing metadata-first indexing (extracting lightweight features like file paths, titles, and EXIF data), just-in-time preview generation to minimize upfront costs, location-aware query logic with reverse geocoding, and intelligent caching strategies. This infrastructure leveraged Dropbox's existing Riviera compute framework and preview services, enabling parallel processing and reducing latency while balancing cost with user value. The result is a system that makes visual content as searchable as text documents within the Dash universal search product.

Scaling Order Processing Automation Using Modular LLM Architecture

Choco

Choco developed an AI system to automate the order intake process for food and beverage distributors, handling unstructured orders from various channels (email, voicemail, SMS, WhatsApp). By implementing a modular LLM architecture with specialized components for transcription, information extraction, and product matching, along with comprehensive evaluation pipelines and human feedback loops, they achieved over 95% prediction accuracy. One customer reported 60% reduction in manual order entry time and 50% increase in daily order processing capacity without additional staffing.

Semantic Caching for E-commerce Search Optimization

Walmart

Walmart implemented semantic caching to enhance their e-commerce search functionality, moving beyond traditional exact-match caching to understand query intent and meaning. The system achieved unexpectedly high cache hit rates of around 50% for tail queries (compared to anticipated 10-20%), while handling the challenges of latency and cost optimization in a production environment. The solution enables more relevant product recommendations and improves the overall customer search experience.

Semantic Data Processing at Scale with AI-Powered Query Optimization

DocETL

Shreyaa Shankar presents DocETL, an open-source system for semantic data processing that addresses the challenges of running LLM-powered operators at scale over unstructured data. The system tackles two major problems: how to make semantic operator pipelines scalable and cost-effective through novel query optimization techniques, and how to make them steerable through specialized user interfaces. DocETL introduces rewrite directives that decompose complex tasks and data to improve accuracy and reduce costs, achieving up to 86% cost reduction while maintaining target accuracy. The companion tool Doc Wrangler provides an interactive interface for iteratively authoring and debugging these pipelines. Real-world applications include public defenders analyzing court transcripts for racial bias and medical analysts extracting information from doctor-patient conversations, demonstrating significant accuracy improvements (2x in some cases) compared to baseline approaches.

Semantic Search for Aviation Safety Reports Using Embeddings and Hybrid Search

Beams

Beams, a startup operating in aviation safety, built a semantic search system to help airlines analyze thousands of safety reports written daily by pilots and ground crew. The problem they addressed was the manual, time-consuming process of reading through unstructured, technical, jargon-filled free-text reports to identify trends and manage risks. Their solution combined vector embeddings (using Azure OpenAI's text-embedding-3-large model) with PostgreSQL and PG Vector for similarity search, alongside a two-stage retrieval and reranking pipeline. They also integrated structured filtering with semantic search to create a hybrid search system. The system was deployed on AWS using Lambda functions, RDS with PostgreSQL, and SQS for event-driven orchestration. Results showed that users could quickly search through hundreds of thousands of reports using natural language queries, finding semantically similar incidents even when terminology varied, significantly improving efficiency in safety analysis workflows.

Source-Grounded LLM Assistant with Multi-Modal Output Capabilities

Google / NotebookLLM

Google's NotebookLM tackles the challenge of making large language models more focused and personalized by introducing source grounding - allowing users to upload their own documents to create a specialized AI assistant. The system combines Gemini 1.5 Pro with sophisticated audio generation to create human-like podcast-style conversations about user content, complete with natural speech patterns and disfluencies. The solution includes built-in safety features, privacy protections through transient context windows, and content watermarking, while enabling users to generate insights from personal documents without contributing to model training data.

Strategic LLM Implementation in Chemical Manufacturing with Focus on Documentation and Virtual Agents

Chevron Philips Chemical

Chevron Phillips Chemical is implementing generative AI with a focus on virtual agents and document processing, taking a measured approach to deployment. They formed a cross-functional team including legal, IT security, and data science to educate leadership and identify appropriate use cases. The company is particularly focusing on processing unstructured documents and creating virtual agents for specific topics, while carefully considering bias, testing challenges, and governance in their implementation strategy.

Supervised Fine-Tuning for AI-Powered Travel Recommendations

Booking.com

Booking.com built an AI Trip Planner to handle unstructured, natural language queries from travelers seeking personalized recommendations. The challenge was combining LLMs' ability to understand conversational requests with years of structured behavioral data (searches, clicks, bookings). Instead of relying solely on prompt engineering with external APIs, they used supervised fine-tuning on open-source LLMs with parameter-efficient methods. This approach delivered superior recommendation metrics while achieving 3x faster inference compared to prompt-based solutions, while maintaining data privacy and security by keeping all processing internal.

Systematic Approach to Building Reliable LLM Data Processing Pipelines Through Iterative Development

DocETL

UC Berkeley researchers studied how organizations struggle with building reliable LLM pipelines for unstructured data processing, identifying two critical gaps: data understanding and intent specification. They developed DocETL, a research framework that helps users systematically iterate on LLM pipelines by first understanding failure modes in their data, then clarifying prompt specifications, and finally applying accuracy optimization strategies, moving beyond the common advice of simply "iterate on your prompts."

Two-Stage Fine-Tuning of Language Models for Hyperlocal Food Search

Swiggy

Swiggy, a major food delivery platform in India, implemented a novel two-stage fine-tuning approach for language models to improve search relevance in their hyperlocal food delivery service. They first performed unsupervised fine-tuning using historical search queries and order data, followed by supervised fine-tuning with manually curated query-item pairs. The solution leverages TSDAE and Multiple Negatives Ranking Loss approaches, achieving superior search relevance metrics compared to baseline models while meeting strict latency requirements of 100ms.

Unified Property Management Search and Digital Assistant Using Amazon Bedrock

CBRE

CBRE, the world's largest commercial real estate services firm, faced challenges with fragmented property data scattered across 10 distinct sources and four separate databases, forcing property management professionals to manually search through millions of documents and switch between multiple systems. To address this, CBRE partnered with AWS to build a next-generation unified search and digital assistant experience within their PULSE system using Amazon Bedrock, Amazon OpenSearch Service, and other AWS services. The solution combines retrieval augmented generation (RAG), multiple foundation models (Amazon Nova Pro for SQL generation and Claude Haiku for document interaction), and advanced prompt engineering to provide natural language query capabilities across both structured and unstructured data. The implementation achieved significant results including a 67% reduction in SQL query generation time (from 12 seconds to 4 seconds with Amazon Nova Pro), 80% improvement in database query performance, 60% reduction in token usage through optimized prompt architecture, and 95% accuracy in search results, ultimately enhancing operational efficiency and enabling property managers to make faster, more informed decisions.

Vector Search and RAG Implementation for Enhanced User Search Experience

Couchbase

This case study explores how vector search and RAG (Retrieval Augmented Generation) are being implemented to improve search experiences across different applications. The presentation covers two specific implementations: Revolut's Sherlock fraud detection system using vector search to identify dissimilar transactions, saving customers over $3 million in one year, and Seen.it's video clip search system enabling natural language search across half a million video clips for marketing campaigns.

Video Content Summarization and Metadata Enrichment for Streaming Platform

Paramount+

Paramount+ partnered with Google Cloud Consulting to develop two key AI use cases: video summarization and metadata extraction for their streaming platform containing over 50,000 videos. The project used Gen AI jumpstarts to prototype solutions, implementing prompt chaining, embedding generation, and fine-tuning approaches. The system was designed to enhance content discoverability and personalization while reducing manual labor and third-party costs. The implementation included a three-component architecture handling transcription creation, content generation, and personalization integration.