ZenML

Industry: Healthcare

95 tools in this industry

← Back to LLMOps Database

Accelerating Drug Development with AI-Powered Clinical Trial Transformation

Novartis

Novartis partnered with AWS Professional Services and Accenture to modernize their drug development infrastructure and integrate AI across clinical trials with the ambitious goal of reducing trial development cycles by at least six months. The initiative involved building a next-generation GXP-compliant data platform on AWS that consolidates fragmented data from multiple domains, implements data mesh architecture with self-service capabilities, and enables AI use cases including protocol generation and an intelligent decision system (digital twin). Early results from the patient safety domain showed 72% query speed improvements, 60% storage cost reduction, and 160+ hours of manual work eliminated. The protocol generation use case achieved 83-87% acceleration in producing compliant protocols, demonstrating significant progress toward their goal of bringing life-saving medicines to patients faster.

Advancing Patient Experience and Business Operations Analytics with Generative AI in Healthcare

Huron

Huron Consulting Group implemented generative AI solutions to transform healthcare analytics across patient experience and business operations. The consulting firm faced challenges with analyzing unstructured data from patient rounding sessions and revenue cycle management notes, which previously required manual review and resulted in delayed interventions due to the 3-4 month lag in traditional HCAHPS survey feedback. Using AWS services including Amazon Bedrock with the Nova LLM model, Redshift, and S3, Huron built sentiment analysis capabilities that automatically process survey responses, staff interactions, and financial operation notes. The solution achieved 90% accuracy in sentiment classification (up from 75% initially) and now processes over 10,000 notes per week automatically, enabling real-time identification of patient dissatisfaction, revenue opportunities, and staff coaching needs that directly impact hospital funding and operational efficiency.

Agentic AI Platform for Clinical Development and Commercial Operations in Pharmaceutical Drug Development

AstraZeneca

AstraZeneca partnered with AWS to deploy agentic AI systems across their clinical development and commercial operations to accelerate their goal of delivering 20 new medicines by 2030. The company built two major production systems: a Development Assistant serving over 1,000 users across 21 countries that integrates 16 data products with 9 agents to enable natural language queries across clinical trials, regulatory submissions, patient safety, and quality domains; and an AZ Brain commercial platform that uses 500+ AI models and agents to provide precision insights for patient identification, HCP engagement, and content generation. The implementation reduced time-to-market for various workflows from months to weeks, with field teams using the commercial assistant generating 2x more prescriptions, and reimbursement dossier authoring timelines dramatically shortened through automated agent workflows.

AI-Driven Clinical Trial Transformation with Next-Generation Data Platform

Novartis

Novartis embarked on a comprehensive data and AI modernization journey to accelerate drug development by at least 6 months per clinical trial. The company partnered with AWS Professional Services and Accenture to build a next-generation, GXP-compliant data platform that integrates fragmented data across multiple domains (including patient safety, medical imaging, and regulatory data), enabling both operational AI use cases and ambitious moonshot projects like a digital twin for clinical trial simulation. The initial implementation with the patient safety domain achieved significant results: 16 data pipelines processing 17 terabytes of data, 72% faster query speeds, 60% storage cost reduction, and over 160 hours of manual work eliminated, while protocol generation use cases demonstrated 83-87% acceleration in generating compliance-acceptable protocols.

AI-Powered Call Center Agents for Healthcare Operations

HeyRevia

HeyRevia has developed an AI call center solution specifically for healthcare operations, where over 30% of operations run on phone calls. Their system uses AI agents to handle complex healthcare-related calls, including insurance verifications, prior authorizations, and claims processing. The solution incorporates real-time audio processing, context understanding, and sophisticated planning capabilities to achieve performance that reportedly exceeds human operators while maintaining compliance with healthcare regulations.

AI-Powered Clinical Decision Support Platform for Healthcare Providers

Healio

Healio, a medical information platform serving healthcare providers across 20+ specialties for 125 years, developed Healio AI to address the challenge of physicians experiencing information overload while working under extreme time pressure. The solution uses a RAG-based system that combines Healio's proprietary clinical content with trusted sources like PubMed journals to provide physicians with accurate, contextual, and trustworthy answers at point of care. Through extensive user testing with over 300 healthcare professionals, the team discovered physicians primarily used the tool to prepare for patient interactions and improve patient communication rather than just diagnostic queries. The product launched successfully with predominantly positive feedback, featuring HIPAA compliance, citation transparency, and contextual advertising for monetization.

AI-Powered Clinical Documentation and Data Infrastructure for Point-of-Care Transformation

Veradigm

Veradigm, a healthcare IT company, partnered with AWS to integrate generative AI into their Practice Fusion electronic health record (EHR) system to address clinician burnout caused by excessive documentation tasks. The solution leverages AWS HealthScribe for autonomous AI scribing that generates clinical notes from patient-clinician conversations, and AWS HealthLake as a FHIR-based data foundation to provide patient context at scale. The implementation resulted in clinicians saving approximately 2 hours per day on charting, 65% of users requiring no training to adopt the technology, and high satisfaction with note quality. The system processes 60 million patient visits annually and enables ambient documentation that allows clinicians to focus on patient care rather than typing, with a clear path toward zero-edit note generation.

AI-Powered Clinical Documentation with Multi-Region Healthcare Compliance

Heidi Health

Heidi Health developed an ambient AI scribe to reduce the administrative burden on healthcare clinicians by automatically generating clinical notes from patient consultations. The company faced significant LLMOps challenges including building confidence in non-deterministic AI outputs through "clinicians in the loop" evaluation processes, scaling clinical validation beyond small teams using synthetic data generation and LLM-as-judge approaches, and managing global expansion across regions with different data sovereignty requirements, model availability constraints, and regulatory compliance needs. Their solution involved standardizing infrastructure-as-code deployments across AWS regions, using a hybrid approach of Amazon Bedrock for immediate availability and EKS for self-hosted model control, and integrating clinical ambassadors in each region to validate medical accuracy and local practice patterns. The platform now serves over 370,000 clinicians processing 10 million consultations per month globally.

AI-Powered Clinical Outcome Assessment Review Using Generative AI

Clario

Clario, a clinical trials endpoint data provider, developed an AI-powered solution to automate the analysis of Clinical Outcome Assessment (COA) interviews in clinical trials for psychosis, anxiety, and mood disorders. The traditional approach of manually reviewing audio-video recordings was time-consuming, logistically complex, and introduced variability that could compromise trial reliability. Using Amazon Bedrock and other AWS services, Clario built a system that performs speaker diarization, multi-lingual transcription, semantic search, and agentic AI-powered quality review to evaluate interviews against standardized criteria. The solution demonstrates potential for reducing manual review effort by over 90%, providing 100% data coverage versus subset sampling, and decreasing review turnaround time from weeks to hours, while maintaining regulatory compliance and improving data quality for submissions.

AI-Powered Clinical Trial Software Configuration Automation

Clario

Clario, a leading provider of endpoint data solutions for clinical trials, faced significant challenges with their manual software configuration process, which involved extracting data from multiple sources including PDF forms, study databases, and standardized protocols. The manual process was time-consuming, prone to transcription errors, and created version control challenges. To address this, Clario developed the Genie AI Service powered by Amazon Bedrock using Anthropic's Claude 3.7 Sonnet, orchestrated through Amazon ECS. The solution automates data extraction from transmittal forms, centralizes information from multiple sources, provides an interactive review dashboard for validation, and automatically generates Software Configuration Specification documents and XML configurations for their medical imaging software. This has reduced study configuration execution time while improving quality, minimizing transcription errors, and allowing teams to focus on higher-value activities like study design optimization.

AI-Powered Conversational Contact Center for Healthcare Patient Communication

Clarus Care

Clarus Care, a healthcare contact center solutions provider serving over 16,000 users and handling 15 million patient calls annually, partnered with AWS Generative AI Innovation Center to transform their traditional menu-driven IVR system into a generative AI-powered conversational contact center. The solution uses Amazon Connect, Amazon Lex, and Amazon Bedrock (with Claude 3.5 Sonnet and Amazon Nova models) to enable natural language interactions that can handle multiple patient intents in a single conversation—such as appointment scheduling, prescription refills, and billing inquiries. The system achieves sub-3-second latency requirements, maintains 99.99% availability SLA, supports both voice and web chat interfaces, and includes smart transfer capabilities for urgent cases. The architecture leverages multi-model selection through Bedrock to optimize for specific tasks based on accuracy and latency requirements, with comprehensive analytics pipelines for monitoring system performance and patient interactions.

AI-Powered Customer Service Agent for Healthcare Navigation

Alan

Alan, a healthcare company supporting 1 million members, built AI agents to help members navigate complex healthcare questions and processes. The company transitioned from traditional workflows to playbook-based agent architectures, implementing a multi-agent system with classification and specialized agents (particularly for claims handling) that uses a ReAct loop for tool calling. The solution achieved 30-35% automation of customer service questions with quality comparable to human care experts, with 60% of reimbursements processed in under 5 minutes. Critical to their success was building custom orchestration frameworks and extensive internal tooling that empowered domain experts (customer service operators) to configure, debug, and maintain agents without engineering bottlenecks.

AI-Powered Epilepsy Diagnosis Platform Reducing Diagnostic Time Through Multimodal Data Processing

Australian Epilepsy Project

The Australian Epilepsy Project (AEP) developed a cloud-based precision medicine platform on AWS that integrates multimodal patient data (MRI scans, neuropsychological assessments, genetic data, and medical histories) to support epilepsy diagnosis and treatment planning. The platform leverages various AI/ML techniques including machine learning models for automated brain region analysis, large language models for medical text processing through RAG approaches, and generative AI for patient summaries. This resulted in a 70% reduction in diagnosis time for language area mapping prior to surgery, 10% higher lesion detection rates, and improved patient outcomes including 9% better work productivity and 8% reduction in seizures over two years.

AI-Powered Fax Processing Automation for Healthcare Referrals

Providence

Providence Health System automated the processing of over 40 million annual faxes using GenAI and MLflow on Databricks to transform manual referral workflows into real-time automated triage. The system combines OCR with GPT-4.0 models to extract referral data from diverse document formats and integrates seamlessly with Epic EHR systems, eliminating months-long backlogs and freeing clinical staff to focus on patient care across 1,000+ clinics.

AI-Powered Healthcare: Building Reliable Care Agents in Production

Sword Health

Sword Health, a digital health company specializing in remote physical therapy, developed Phoenix, an AI care agent that provides personalized support to patients during and after rehabilitation sessions while acting as a co-pilot for physical therapists. The company faced challenges deploying LLMs in a highly regulated healthcare environment, requiring robust guardrails, evaluation frameworks, and human oversight. Through iterative development focusing on prompt engineering, RAG for domain knowledge, comprehensive evaluation systems combining human and LLM-based ratings, and continuous data monitoring, Sword Health successfully shipped AI-powered features that improve care accessibility and efficiency while maintaining clinical safety through human-in-the-loop validation for all clinical decisions.

AI-Powered Hormonal Health Platform Built in 8 Weeks

FemmFlo

FemmFlo, a women's health tech startup, developed an LLM-powered platform to address the massive data gap in women's hormonal health, where millions of women wait over seven years for accurate diagnoses. Working with Millio AI and leveraging AWS services, they built a full MVP in just eight weeks that integrates hormonal tracking, lab diagnostics, mental health support, and personalized care recommendations through an AI agent named Gabby. The platform was designed for rapid deployment with beta users, lab integrations, and partnerships, specifically targeting underserved women with culturally relevant, localized healthcare guidance. The solution uses AWS Bedrock agents, API Gateway, DynamoDB, S3, and other managed services to deliver a scalable, cost-effective system that translates complex lab results into actionable health insights while maintaining clinical rigor through a controlled testing environment.

AI-Powered Medical Content Review and Revision at Scale

Flo Health

Flo Health, a leading women's health app, partnered with AWS Generative AI Innovation Center to develop MACROS (Medical Automated Content Review and Revision Optimization Solution), an AI-powered system for verifying and maintaining the accuracy of thousands of medical articles. The solution uses Amazon Bedrock foundation models to automatically review medical content against established guidelines, identify outdated or inaccurate information, and propose evidence-based revisions while maintaining Flo's editorial style. The proof of concept achieved 80% accuracy and over 90% recall in identifying content requiring updates, significantly reduced processing time from hours to minutes per guideline, and demonstrated more consistent application of medical guidelines compared to manual reviews while reducing the workload on medical experts.

AI-Powered Neurosurgery: From Brain Tumor Classification to Surgical Planning

Cedars Sinai

Cedars Sinai and various academic institutions have implemented AI and machine learning solutions to improve neurosurgical outcomes across multiple areas. The applications include brain tumor classification using CNNs achieving 95% accuracy (surpassing traditional radiologists), hematoma prediction and management using graph neural networks with 80%+ accuracy, and AI-assisted surgical planning and intraoperative guidance. The implementations demonstrate significant improvements in patient outcomes while highlighting the importance of balanced innovation with appropriate regulatory oversight.

AI-Powered Nutrition Guidance with Fine-Tuned Llama Models

Omada Health

Omada Health, a virtual healthcare provider, developed OmadaSpark, an AI-powered nutrition education feature that provides real-time motivational interviewing and personalized nutritional guidance to members in their chronic condition management programs. The solution uses a fine-tuned Llama 3.1 8B model deployed on Amazon SageMaker AI, trained on 1,000 question-answer pairs derived from internal care protocols and peer-reviewed medical literature. The implementation was completed in 4.5 months and resulted in members who used the tool being three times more likely to return to the Omada app, while reducing response times from days to seconds. The solution maintains strict HIPAA compliance and includes human-in-the-loop review by registered dietitians for quality assurance.

AI-Powered Personal Health Coach Using Gemini Models

Fitbit

Fitbit developed an AI-powered personal health coach to address the fragmented and generic nature of traditional health and fitness guidance. Using Gemini models within a multi-agent framework, the system provides proactive, personalized, and adaptive coaching grounded in behavioral science and individual health metrics such as sleep and activity data. The solution employs a conversational agent for orchestration, a data science agent for numerical reasoning on physiological time series, and domain expert agents for specialized guidance. The system underwent extensive validation through the SHARP evaluation framework, involving over 1 million human annotations and 100k hours of expert evaluation across multiple health disciplines. The health coach entered public preview for eligible US-based Fitbit Premium users, providing personalized insights, goal setting, and adaptive plans to build sustainable health habits.

AI-Powered Sleep Coach for CBTI Protocol Delivery

Rest

Rest, a company that evolved from developing a podcast player app, built an AI sleep coach to help people solve chronic sleep problems through an 8-week protocol based on Cognitive Behavioral Therapy for Insomnia (CBTI). The problem they identified was that while CBTI is clinically proven to be effective for 80% of people with insomnia, it typically costs thousands of dollars, requires specialized practitioners who have year-long waitlists, and isn't accessible to most people. Rest's solution uses voice-first AI agents powered by OpenAI's GPT-4 and integrated with Vapi for voice capabilities, creating daily check-ins where the AI coaches users through the CBTI protocol with personalized guidance based on their sleep logs, behavioral patterns, and personal context stored in a custom memory system. The product evolved iteratively from a text-based chatbot to a sophisticated voice agent with RAG for knowledge retrieval, dynamic agenda generation tailored to each user's program stage and recent sleep data, and multi-layered memory systems that track user context over time. The company now logs hundreds of hours of voice conversations monthly with users preferring voice interactions for the intimacy and ease it provides in discussing sleep challenges.

AI-Powered Social Intelligence for Life Sciences

Indegene

Indegene developed an AI-powered social intelligence solution to help pharmaceutical companies extract insights from digital healthcare conversations on social media. The solution addresses the challenge that 52% of healthcare professionals now prefer receiving medical content through social channels, while the life sciences industry struggles with analyzing complex medical discussions at scale. Using Amazon Bedrock, SageMaker, and other AWS services, the platform provides healthcare-focused analytics including HCP identification, sentiment analysis, brand monitoring, and adverse event detection. The layered architecture delivers measurable improvements in time-to-insight generation and operational cost savings while maintaining regulatory compliance.

AI-Powered Text Message-Based Healthcare Treatment Management System

Stride

Stride developed an AI-powered text message-based healthcare treatment management system for Aila Science to assist patients through self-administered telemedicine regimens, particularly for early pregnancy loss treatment. The system replaced manual human operators with LLM-powered agents that can interpret patient responses, provide medically-approved guidance, schedule messages, and escalate complex situations to human reviewers. The solution achieved approximately 10x capacity improvement while maintaining treatment quality and safety through a hybrid human-in-the-loop approach.

Automated Clinical Document Generation Platform for Pharmaceutical R&D

AbbVie

AbbVie developed Gaia, a generative AI platform to automate the creation of clinical and regulatory documents in their R&D organization. The platform addresses the challenge of producing hundreds of complex, regulated documents required throughout the clinical trial lifecycle, from study startup through regulatory submissions. By the end of 2024, Gaia automated 26 document types, saving 20,000 hours annually, with plans to scale to over 350 document types by 2030, targeting 115,000+ hours in annual savings. The platform uses a modular "Lego block" approach with reusable components, integrates with over 90 data sources, employs AWS Bedrock for LLM access, and implements human-in-the-loop workflows to maintain quality standards while being "GXP-ready" for future validation in life sciences regulatory environments.

Automated HCC Code Extraction from Clinical Notes Using Healthcare NLP

WVU Medicine

WVU Medicine implemented an automated system for extracting Hierarchical Condition Category (HCC) codes from clinical notes using John Snow Labs' Healthcare NLP models. The system processes radiology notes for upcoming patient appointments, extracts relevant diagnoses, converts them to CPT codes, and then maps them to HCC codes. The solution went live in December 2023 and has processed over 27,000 HCC codes with an 18.4% acceptance rate by providers, positively impacting over 5,000 patients.

Automated Medical Literature Review System Using Domain-Specific LLMs

John Snow Labs

John Snow Labs developed a medical chatbot system that automates the traditionally time-consuming process of medical literature review. The solution combines proprietary medical-domain-tuned LLMs with a comprehensive medical research knowledge base, enabling researchers to analyze hundreds of papers in minutes instead of weeks or months. The system includes features for custom knowledge base integration, intelligent data extraction, and automated filtering based on user-defined criteria, while maintaining explainability and citation tracking.

Automating Enterprise Workflows with Foundation Models in Healthcare

Various

The researchers present aCLAr (Demonstrate, Execute, Validate framework), a system that uses multimodal foundation models to automate enterprise workflows, particularly in healthcare settings. The system addresses limitations of traditional RPA by enabling passive learning from demonstrations, human-like UI navigation, and self-monitoring capabilities. They successfully demonstrated the system automating a real healthcare workflow in Epic EHR, showing how foundation models can be leveraged for complex enterprise automation without requiring API integration.

Automating Healthcare Documentation and Rule Management with GenAI

Orizon

Orizon, a healthcare tech company, faced challenges with manual code documentation and rule interpretation for their medical billing fraud detection system. They implemented a GenAI solution using Databricks' platform to automate code documentation and rule interpretation, resulting in 63% of tasks being automated and reducing documentation time to under 5 minutes. The solution included fine-tuned Llama2-code and DBRX models deployed through Mosaic AI Model Serving, with strict governance and security measures for protecting sensitive healthcare data.

Automating Healthcare Procedure Code Selection Through Domain-Specific LLM Platform

Hasura / PromptQL

A large public healthcare company specializing in radiology software deployed an AI-powered automation solution to streamline the complex process of procedure code selection during patient appointment scheduling. The traditional manual process took 12-15 minutes per call, requiring operators to navigate complex UIs and select from hundreds of procedure codes that varied by clinic, regulations, and patient circumstances. Using PromptQL's domain-specific LLM platform, non-technical healthcare administrators can now write automation logic in natural language that gets converted into executable code, reducing call times and potentially delivering $50-100 million in business impact through increased efficiency and reduced training costs.

Automating Radiology Report Generation with Fine-tuned LLMs

Heidelberg University

Researchers at Heidelberg University developed a novel approach to address the growing workload of radiologists by automating the generation of detailed radiology reports from medical images. They implemented a system using Vision Transformers for image analysis combined with a fine-tuned Llama 3 model for report generation. The solution achieved promising results with a training loss of 0.72 and validation loss of 1.36, demonstrating the potential for efficient, high-quality report generation while running on a single GPU through careful optimization techniques.

Best Practices for Implementing LLMs in High-Stakes Applications

Moonhub

The presentation discusses implementing LLMs in high-stakes use cases, particularly in healthcare and therapy contexts. It addresses key challenges including robustness, controllability, bias, and fairness, while providing practical solutions such as human-in-the-loop processes, task decomposition, prompt engineering, and comprehensive evaluation strategies. The speaker emphasizes the importance of careful consideration when implementing LLMs in sensitive applications and provides a framework for assessment and implementation.

Building a Comprehensive LLM Platform for Healthcare Applications

IncludedHealth

IncludedHealth built Wordsmith, a comprehensive platform for GenAI applications in healthcare, starting in early 2023. The platform includes a proxy service for multi-provider LLM access, model serving capabilities, training and evaluation libraries, and prompt engineering tools. This enabled multiple production applications including automated documentation, coverage checking, and clinical documentation, while maintaining security and compliance in a regulated healthcare environment.

Building a Healthcare Copilot for Biology and Life Science Research

Owkin

Owkin, a company focused on drug discovery and AI for healthcare, developed a copilot system in four months to help biology and life science researchers navigate complex healthcare data and answer scientific questions. The system addresses challenges unique to healthcare including strict regulations, semantic complexity, and data sensitivity by implementing two main tools: a text-to-SQL system that queries structured biological databases (using natural language to SQL translation with Polars), and a RAG-based literature search tool that retrieves relevant information from PubMed's 26 million abstracts. The copilot was deployed for academic researchers with monitoring via LangFuse and OpenTelemetry, though the team faced challenges with evaluation in a domain where questions rarely have binary answers, and noted that frameworks and models change rapidly in the LLM space.

Building a Multi-Agent Healthcare Analytics Assistant with LLM-Powered Natural Language Queries

Komodo Health

Komodo Health, a company with a large database of anonymized American patient medical events, developed an AI assistant over two years to answer complex healthcare analytics queries through natural language. The system evolved from a simple chaining architecture with fine-tuned models to a sophisticated multi-agent system using a supervisor pattern, where an intelligent agent-based supervisor routes queries to either deterministic workflows or sub-agents as needed. The architecture prioritizes trust by ensuring raw database outputs are presented directly to users rather than LLM-generated content, with LLMs primarily handling natural language to structured query conversion and explanations. The production system balances autonomous AI capabilities with control, avoiding the cost and latency issues of pure agentic approaches while maintaining flexibility for unexpected user queries.

Building an Agentic AI System for Healthcare Support Using LangGraph

Doctolib

Doctolib developed an agentic AI system called Alfred to handle customer support requests for their healthcare platform. The system uses multiple specialized AI agents powered by LLMs, working together in a directed graph structure using LangGraph. The initial implementation focused on managing calendar access rights, combining RAG for knowledge base integration with careful security measures and human-in-the-loop confirmation for sensitive actions. The system was designed to maintain high customer satisfaction while managing support costs efficiently.

Building an On-Premise Health Insurance Appeals Generation System

HealthInsuranceLLM

Development of an LLM-based system to help generate health insurance appeals, deployed on-premise with limited resources. The system uses fine-tuned models trained on publicly available medical review board data to generate appeals for insurance claim denials. The implementation includes Kubernetes deployment, GPU inference, and a Django frontend, all running on personal hardware with multiple internet providers for reliability.

Building and Evaluating a RAG-based Menopause Information Chatbot

Vira Health

Vira Health developed and evaluated an AI chatbot to provide reliable menopause information using peer-reviewed position statements from The Menopause Society. They implemented a RAG (Retrieval Augmented Generation) architecture using GPT-4, with careful attention to clinical safety and accuracy. The system was evaluated using both AI judges and human clinicians across four criteria: faithfulness, relevance, harmfulness, and clinical correctness, showing promising results in terms of safety and effectiveness while maintaining strict adherence to trusted medical sources.

Building and Evaluating Production Voice Agents: From Custom Infrastructure to Platform Solutions

Nomore Engineering

A team explored building a phone agent system for handling doctor appointments in Polish primary care, initially attempting to build their own infrastructure before evaluating existing platforms. They implemented a complex system involving speech-to-text, LLMs, text-to-speech, and conversation orchestration, along with comprehensive testing approaches. After building the complete system, they ultimately decided to use a third-party platform (Vapi.ai) due to the complexities of maintaining their own infrastructure, while gaining valuable insights into voice agent architecture and testing methodologies.

Building Custom AI Review Dashboards for Production LLM Monitoring

Anterior

Anterior developed "Scalpel," a custom review dashboard enabling a small team of clinicians to review over 100,000 medical decisions made by their AI system. The dashboard was built around three core principles: optimizing context surfacing for high-quality reviews, streamlining review flow sequences to minimize time per review, and designing reviews to generate actionable data for AI system improvements. This approach allowed domain experts to efficiently evaluate AI outputs while providing structured feedback that could be directly translated into system enhancements, demonstrating how custom tooling can bridge the gap between production AI performance and iterative improvement processes.

Building Evaluation Systems for AI-Powered Healthcare at Scale

Sword Health

Sword Health developed Phoenix, an AI care specialist that provides clinical support to patients during physical therapy sessions and between appointments. The company addressed the challenge of deploying large language models safely in healthcare by implementing a comprehensive evaluation framework combining offline and online assessments. Their approach includes building diverse evaluation datasets through strategic sampling and synthetic data generation, developing multiple types of evaluators (human-based, code-based, and LLM-as-judge), conducting vibe checks before release, and maintaining continuous monitoring in production through guardrails, A/B testing, manual audits, and automated evaluation of production traces. This eval-driven development process enables iterative improvement, quality assurance, objective model comparison, and cost optimization while ensuring patient safety.

Building Healthcare-Specific LLM Pipelines for Oncology Patient Timelines

Roche Diagnostics / John Snow Labs

Roche Diagnostics developed an AI-assisted data abstraction solution using healthcare-specific LLMs to extract and structure oncology patient timelines from unstructured clinical notes. The system leverages natural language processing and machine learning to automatically detect medical concepts, focusing particularly on chemotherapy treatment timelines. The solution addresses the challenge of processing diverse, unstructured healthcare data formats while maintaining high accuracy through domain-specific LLMs and carefully engineered prompts.

Building Production-Ready Healthcare AI That Scales With Model Progress

Anterior

This case study examines Anterior's experience building LLM-powered products for healthcare prior authorization over three years. The company faced the challenge of building production systems around rapidly evolving AI capabilities, where approaches designed around current model limitations could quickly become obsolete. Through experimentation with techniques like hierarchical query reasoning, finetuning, domain knowledge injection, and expert review systems, they learned which approaches compound with model progress versus those that compete with it. The result was a framework for "Sour Lesson-pilled" product development that emphasizes building systems that benefit from model improvements rather than being made redundant by them, with key surviving techniques including dynamic domain knowledge injection and scalable expert review infrastructure.

Building Reliable LLM Workflows in Biotech Research

Moderna

Moderna Therapeutics applies large language models primarily for document reformatting and regulatory submission preparation within their research organization, deliberately avoiding autonomous agents in favor of highly structured workflows. The team, led by Eric Maher in research data science, focuses on automating what they term "intellectual drudgery" - reformatting laboratory records and experiment documentation into regulatory-compliant formats. Their approach prioritizes reliability over novelty, implementing rigorous evaluation processes matched to consequence levels, with particular emphasis on navigating the complex security and permission mapping challenges inherent in regulated biotech environments. The team employs a "non-LLM filter" methodology, only reaching for generative AI after exhausting simpler Python or traditional ML approaches, and leverages serverless infrastructure like Modal and reactive notebooks with Marimo to enable rapid experimentation and deployment.

Building Scalable LLM Evaluation Systems for Healthcare Prior Authorization

Anterior

Anterior, a healthcare AI company, developed a scalable evaluation system for their LLM-powered prior authorization decision support tool. They faced the challenge of maintaining accuracy while processing over 100,000 medical decisions daily, where errors could have serious consequences. Their solution combines real-time reference-free evaluation using LLMs as judges with targeted human expert review, achieving an F1 score of 96% while keeping their clinical review team under 10 people, compared to competitors who employ hundreds of nurses.

Clinical-Grade Patient Education Agent with LangGraph and LangSmith

Lubu Labs

Lubu Labs built a production AI agent for a digital health platform that helps patients understand their health test results from camera-based scans measuring 30+ vital signs. The system needed to provide plain-language medical explanations, answer follow-up questions conversationally, and route uncertain cases to clinicians—all while meeting healthcare regulatory requirements. The solution used LangGraph for explicit control flow with confidence-based routing decisions, RAG over a versioned medical knowledge base, and LangSmith for audit-grade observability. Key results included approximately 15% of conversations appropriately triggering human review, an 80% accuracy rate in routing decisions validated by clinicians, a 40% reduction in false positive reviews after threshold tuning, and very low rates of inappropriate clinical advice in production validated through weekly audits.

Cloud-Based Integrated Diagnostics Platform with AI-Assisted Digital Pathology

Philips

Philips partnered with AWS to transform medical imaging and diagnostics by moving their entire healthcare informatics portfolio to the cloud, with particular focus on digital pathology. The challenge was managing petabytes of medical imaging data across multiple modalities (radiology, cardiology, pathology) stored in disparate silos, making it difficult for clinicians to access comprehensive patient information efficiently. Philips leveraged AWS Health Imaging and other cloud services to build a scalable, cloud-native integrated diagnostics platform that reduces workflow time from 11+ hours to 36 minutes in pathology, enables real-time collaboration across geographies, and supports AI-assisted diagnosis. The solution now manages 134 petabytes of data covering 34 million patient exams and 11 billion medical records, with 95 of the top 100 US hospitals using Philips healthcare informatics solutions.

Context-Seeking Conversational AI for Health Information Navigation

Google

Google Research developed a "Wayfinding AI" prototype based on Gemini to address the challenge of people struggling to find relevant, personalized health information online. Through formative user research with 33 participants and iterative design, they created an AI agent that proactively asks clarifying questions to understand user goals and context before providing answers. In a randomized study with 130 participants, the Wayfinding AI was significantly preferred over a baseline Gemini model across multiple dimensions including helpfulness, relevance, goal understanding, and tailoring, demonstrating that a context-seeking, conversational approach creates more empowering health information experiences than traditional question-answering systems.

Cost Reduction Through Fine-tuning: Healthcare Chatbot and E-commerce Product Classification

Airtrain

Two case studies demonstrate significant cost reduction through LLM fine-tuning. A healthcare company reduced costs and improved privacy by fine-tuning Mistral-7B to match GPT-3.5's performance for patient intake, while an e-commerce unicorn improved product categorization accuracy from 47% to 94% using a fine-tuned model, reducing costs by 94% compared to using GPT-4.

Data Quality Assessment and Enhancement Framework for GenAI Applications

QuantumBlack

QuantumBlack developed AI4DQ Unstructured, a comprehensive toolkit for assessing and improving data quality in generative AI applications. The solution addresses common challenges in unstructured data management by providing document clustering, labeling, and de-duplication workflows. In a case study with an international health organization, the system processed 2.5GB of data, identified over ten high-priority data quality issues, removed 100+ irrelevant documents, and preserved critical information in 5% of policy documents that would have otherwise been lost, leading to a 20% increase in RAG pipeline accuracy.

Deploying Agentic AI for Clinical Trial Protocol Deviation Monitoring

Bayezian Limited

Bayezian Limited deployed a multi-agent AI system to monitor protocol deviations in clinical trials, where traditional manual review processes were time-consuming and error-prone. The system used specialized LLM agents, each responsible for checking specific protocol rules (visit timing, medication use, inclusion criteria, etc.), working on top of a pipeline that processed clinical documents and used FAISS for semantic retrieval of protocol requirements. While the system successfully identified patterns early and improved reviewer efficiency by shifting focus from manual checking to intelligent triage, it encountered significant challenges including handover failures between agents, memory lapses causing coordination breakdowns, and difficulties handling real-world data ambiguities like time windows and exceptions. The team improved performance through structured memory snapshots, flexible prompt engineering, stronger handoff signals, and process tracking, ultimately creating a useful but imperfect system that highlighted the gap between agentic AI theory and production reality.

Developing a Multilingual Ayurvedic Medical LLM: Challenges and Learnings

Trigent Software

Trigent Software attempted to develop IRGPT, a fine-tuned LLM for multilingual Ayurvedic medical consultations. The project aimed to combine traditional Ayurvedic medicine with modern AI capabilities, targeting multiple South Indian languages. Despite assembling a substantial dataset and implementing a fine-tuning pipeline using GPT-2 medium, the team faced significant challenges with multilingual data quality and cultural context. While the English-only version showed promise, the full multilingual implementation remains a work in progress.

Domain-Native LLM Application for Healthcare Insurance Administration

Anterior

Anterior, a clinician-led healthcare technology company, developed an AI system called Florence to automate medical necessity reviews for health insurance providers covering 50 million lives in the US. The company addressed the "last mile problem" in LLM applications by building an adaptive domain intelligence engine that enables domain experts to continuously improve model performance through systematic failure analysis, domain knowledge injection, and iterative refinement. Through this approach, they achieved 99% accuracy in care request approvals, moving beyond the 95% baseline achieved through model improvements alone.

Domain-Specific LLMs for Drug Discovery Biomarker Identification

BenchSci

BenchSci developed an AI platform for drug discovery that combines domain-specific LLMs with extensive scientific data processing to assist scientists in understanding disease biology. They implemented a RAG architecture that integrates their structured biomedical knowledge base with Google's Med-PaLM model to identify biomarkers in preclinical research, resulting in a reported 40% increase in productivity and reduction in processing time from months to days.

Enhancing Healthcare Service Delivery with RAG and LLM-Powered Search

Accolade

Accolade, facing challenges with fragmented healthcare data across multiple platforms, implemented a Retrieval Augmented Generation (RAG) solution using Databricks' DBRX model to improve their internal search capabilities and customer service. By consolidating their data in a lakehouse architecture and leveraging LLMs, they enabled their teams to quickly access accurate information and better understand customer commitments, resulting in improved response times and more personalized care delivery.

Enterprise Knowledge Base Assistant Using Multi-Model GenAI Architecture

Accenture

Accenture developed Knowledge Assist, a generative AI solution for a public health sector client to transform how enterprise knowledge is accessed and utilized. The solution combines multiple foundation models through Amazon Bedrock to provide accurate, contextual responses to user queries in multiple languages. Using a hybrid intent approach and RAG architecture, the system achieved over 50% reduction in new hire training time and 40% reduction in query escalations while maintaining high accuracy and compliance requirements.

Enterprise-Scale Deployment of AI Ambient Scribes Across Multiple Healthcare Systems

Memorial Sloan Kettering / McLeod Health / UCLA

This panel discussion features three major healthcare systems—McLeod Health, Memorial Sloan Kettering Cancer Center, and UCLA Health—discussing their experiences deploying generative AI-powered ambient clinical documentation (AI scribes) at scale. The organizations faced challenges in vendor evaluation, clinician adoption, and demonstrating ROI while addressing physician burnout and documentation burden. Through rigorous evaluation processes including randomized controlled trials, head-to-head vendor comparisons, and structured pilots, these systems successfully deployed AI scribes to hundreds to thousands of physicians. Results included significant reductions in burnout (20% at UCLA), improved patient satisfaction scores (5-6% increases at McLeod), time savings of 1.5-2 hours per day, and positive financial ROI through improved coding and RVU capture. Key learnings emphasized the importance of robust training, encounter-based pricing models, workflow integration, and managing expectations that AI scribes are not a universal solution for all specialties and clinicians.

Enterprise-Scale Healthcare LLM System for Unified Patient Journeys

John Snow Labs

John Snow Labs developed a comprehensive healthcare LLM system that integrates multimodal medical data (structured, unstructured, FHIR, and images) into unified patient journeys. The system enables natural language querying across millions of patient records while maintaining data privacy and security. It uses specialized healthcare LLMs for information extraction, reasoning, and query understanding, deployed on-premises via Kubernetes. The solution significantly improves clinical decision support accuracy and enables broader access to patient data analytics while outperforming GPT-4 in medical tasks.

Generative AI for Secondary Manuscript Generation in Life Sciences

Sorcero

Sorcero, a life sciences AI company, addresses the challenge of generating secondary manuscripts (particularly patient-reported outcomes manuscripts) from clinical study reports, a process that traditionally takes months and is costly, inconsistent, and delays patient access to treatments. Their solution uses generative AI to create foundational manuscript drafts within hours from source materials including clinical study reports, statistical analysis plans, and protocols. The system emphasizes trust, traceability, and regulatory compliance through rigorous validation frameworks, industry benchmarks (like CONSORT guidelines), comprehensive audit trails, and human oversight. The approach generates complete manuscripts with proper structure, figures, and tables while ensuring all assertions are traceable to source data, hallucinations are controlled, and industry standards are met.

Generative AI-Powered Intelligent Document Processing for Healthcare Operations

Myriad Genetics

Myriad Genetics, a genetic testing and precision medicine provider, faced challenges processing thousands of healthcare documents daily with their existing Amazon Comprehend and Amazon Textract solution, which cost $15,000 monthly per business unit with 8.5-minute processing times and required manual information extraction involving up to 10 full-time employees. Partnering with AWS Generative AI Innovation Center, they deployed the open-source GenAI IDP Accelerator using Amazon Bedrock with Amazon Nova models, implementing advanced prompt engineering techniques including AI-driven prompt engineering, negative prompting, few-shot learning, and chain-of-thought reasoning. The solution increased classification accuracy from 94% to 98%, reduced classification costs by 77%, decreased processing time by 80% (from 8.5 to 1.5 minutes), and automated key information extraction at 90% accuracy, projected to save $132K annually while reducing prior authorization processing time by 2 minutes per submission.

GPT-4 Visit Notes System

Summer Health

Summer Health successfully deployed GPT-4 to revolutionize pediatric visit note generation, addressing both provider burnout and parent communication challenges. The implementation reduced note-writing time from 10 to 2 minutes per visit (80% reduction) while making medical information more accessible to parents. By carefully considering HIPAA compliance through BAAs and implementing robust clinical review processes, they demonstrated how LLMs can be safely and effectively deployed in healthcare settings. The case study showcases how AI can simultaneously improve healthcare provider efficiency and patient experience, while maintaining high standards of medical accuracy and regulatory compliance.

Healthcare Conversational AI and Multi-Model Cost Management in Production

Amberflo / Interactly.ai

A panel discussion featuring Interactly.ai's development of conversational AI for healthcare appointment management, and Amberflo's approach to usage tracking and cost management for LLM applications. The case study explores how Interactly.ai handles the challenges of deploying LLMs in healthcare settings with privacy and latency constraints, while Amberflo addresses the complexities of monitoring and billing for multi-model LLM applications in production.

Healthcare Data Analytics Democratization with MapAI and LLM Integration

Komodo

Komodo Health developed MapAI, an NLP-powered AI assistant integrated into their MapLab enterprise platform, to democratize healthcare data analytics. The solution enables non-technical users to query complex healthcare data using natural language, transforming weeks-long data analysis processes into instant insights. The system leverages multiple foundation models, LangChain, and LangGraph for deployment, with an API-first approach for seamless integration with their Healthcare Map platform.

Healthcare NLP Pipeline for HIPAA-Compliant Patient Data De-identification

Dandelion Health

Dandelion Health developed a sophisticated de-identification pipeline for processing sensitive patient healthcare data while maintaining HIPAA compliance. The solution combines John Snow Labs' Healthcare NLP with custom pre- and post-processing steps to identify and transform protected health information (PHI) in free-text patient notes. Their approach includes risk categorization by medical specialty, context-aware processing, and innovative "hiding in plain sight" techniques to achieve high-quality de-identification while preserving data utility for medical research.

Healthcare Patient Journey Analysis Platform with Multimodal LLMs

John Snow Labs

John Snow Labs developed a comprehensive healthcare analytics platform that uses specialized medical LLMs to process and analyze patient data across multiple modalities including unstructured text, structured EHR data, FIR resources, and images. The platform enables healthcare professionals to query patient histories and build cohorts using natural language, while handling complex medical terminology mapping and temporal reasoning. The system runs entirely within the customer's infrastructure for security, uses Kubernetes for deployment, and significantly outperforms GPT-4 on medical tasks while maintaining consistency and explainability in production.

Healthcare Search Discovery Using ML and Generative AI on E-commerce Platform

Amazon Health Services

Amazon Health Services faced the challenge of integrating healthcare services into Amazon's e-commerce search experience, where traditional product search algorithms weren't designed to handle complex relationships between symptoms, conditions, treatments, and healthcare services. They developed a comprehensive solution combining machine learning for query understanding, vector search for product matching, and large language models for relevance optimization. The solution uses AWS services including Amazon SageMaker for ML models, Amazon Bedrock for LLM capabilities, and Amazon EMR for data processing, implementing a three-component architecture: query understanding pipeline to classify health searches, LLM-enhanced product knowledge base for semantic search, and hybrid relevance optimization using both human labeling and LLM-based classification. This system now serves daily health-related search queries, helping customers find everything from prescription medications to primary care services through improved discovery pathways.

HIPAA-Compliant LLM-Based Chatbot for Pharmacy Customer Service

Amazon

Amazon Pharmacy developed a HIPAA-compliant LLM-based chatbot to help customer service agents quickly retrieve and provide accurate information to patients. The solution uses a Retrieval Augmented Generation (RAG) pattern implemented with Amazon SageMaker JumpStart foundation models, combining embedding-based search and LLM-based response generation. The system includes agent feedback collection for continuous improvement while maintaining security and compliance requirements.

Human-AI Synergy in Pharmaceutical Research and Document Processing

Merantix

Merantix has implemented AI systems that focus on human-AI collaboration across multiple domains, particularly in pharmaceutical research and document processing. Their approach emphasizes progressive automation where AI systems learn from human input, gradually taking over more tasks while maintaining high accuracy. In pharmaceutical applications, they developed a system for analyzing rodent behavior videos, while in document processing, they created solutions for legal and compliance cases where error tolerance is minimal. The systems demonstrate a shift from using AI as mere tools to creating collaborative AI-human workflows that maintain high accuracy while improving efficiency.

Implementing LLMs for Patient Education and Healthcare Communication

National Healthcare Group

National Healthcare Group addressed the challenge of inconsistent and time-consuming patient education by implementing LLM-powered chatbots integrated into their existing healthcare apps and messaging platforms. The solution provides 24/7 multilingual patient education, focusing on conditions like eczema and medical test preparation, while ensuring privacy and accuracy. The implementation emphasizes integration with existing platforms rather than creating new standalone solutions, combined with careful monitoring and refinement of responses.

Implementing Multi-Agent RAG Architecture for Customer Care Automation

Doctolib

Doctolib evolved their customer care system from basic RAG to a sophisticated multi-agent architecture using LangGraph. The system employs a primary assistant for routing and specialized agents for specific tasks, incorporating safety checks and API integrations. While showing promise in automating customer support tasks like managing calendar access rights, they faced challenges with LLM behavior variance, prompt size limitations, and unstructured data handling, highlighting the importance of robust data structuration and API documentation for production deployment.

Implementing RAG for Enhanced Customer Care at Scale

Doctolib

Doctolib, a European e-health company, implemented a RAG-based system to improve their customer care services. Using GPT-4 hosted on Azure OpenAI, combined with OpenSearch as a vector database and a custom reranking system, they achieved a 20% reduction in customer care cases. The system includes comprehensive evaluation metrics through the Ragas framework, and overcame significant latency challenges to achieve response times under 5 seconds. While successful, they identified limitations with complex queries that led them to explore agentic frameworks as a next step.

Iterative Prompt Optimization and Model Selection for Nutritional Calorie Estimation

Taralli

Taralli, a calorie tracking application, demonstrates systematic LLM improvement through rigorous evaluation and prompt optimization. The developer addressed the challenge of accurate nutritional estimation by creating a 107-example evaluation dataset, testing multiple prompt optimization techniques (vanilla, few-shot bootstrapping, MIPROv2, and GEPA) across several models (Gemini 2.5 Flash, Gemini 3 Flash, and DeepSeek v3.2). Through this methodical approach, they achieved a 15% accuracy improvement by switching from Gemini 2.5 Flash to Gemini 3 Flash while using a few-shot learning approach with 16 examples, reaching 60% accuracy within a 10% calorie prediction threshold. The system was deployed with fallback model configurations and extended to support fully offline on-device inference for iOS.

LLM Applications in Drug Discovery and Call Center Analytics

QuantumBlack

QuantumBlack presented two distinct LLM applications: molecular discovery for pharmaceutical research and call center analytics for banking. The molecular discovery system used chemical language models and RAG to analyze scientific literature and predict molecular properties. The call center analytics solution processed audio files through a pipeline of diarization, transcription, and LLM analysis to extract insights from customer calls, achieving 60x performance improvement through domain-specific optimizations and efficient resource utilization.

LLM-Powered Crisis Counselor Training and Conversation Simulation

Crisis Text Line

Crisis Text Line transformed their mental health support services by implementing LLM-based solutions on the Databricks platform. They developed a conversation simulator using fine-tuned Llama 2 models to train crisis counselors, and created a conversation phase classifier to maintain quality standards. The implementation helped centralize their data infrastructure, enhance volunteer training, and scale their crisis intervention services more effectively, supporting over 1.3 million conversations in the past year.

LLM-Powered Information Extraction from Pediatric Cardiac MRI Reports

UK National Health Service (NHS)

Great Ormond Street Hospital NHS Trust developed a solution to extract information from 15,000 unstructured cardiac MRI reports spanning 10 years. They implemented a hybrid approach using small LLMs for entity extraction and few-shot learning for table structure classification. The system successfully extracted patient identifiers and clinical measurements from heterogeneous reports, enabling linkage with structured data and improving clinical research capabilities. The solution demonstrated significant improvements in extraction accuracy when using contextual prompting with models like FLAN-T5 and RoBERTa, while operating within NHS security constraints.

Medical AI Assistant for Battlefield Care Using LLMs

Johns Hopkins

Johns Hopkins Applied Physics Laboratory (APL) is developing CPG-AI, a conversational AI system using Large Language Models to provide medical guidance to untrained soldiers in battlefield situations. The system interprets clinical practice guidelines and tactical combat casualty care protocols into plain English guidance, leveraging APL's RALF framework for LLM application development. The prototype successfully demonstrates capabilities in condition inference, natural dialogue, and algorithmic care guidance for common battlefield injuries.

Medical Transcript Summarization Using Multiple LLM Models: An Evaluation Study

Oracle

A comparative study evaluating different LLM models (Claude, GPT-4, LLaMA, and Pi 3.1) for medical transcript summarization aimed at reducing administrative burden in healthcare. The study processed over 5,000 medical transcripts, comparing model performance using ROUGE scores and cosine similarity metrics. GPT-4 emerged as the top performer, followed by Pi 3.1, with results showing potential to reduce care coordinator preparation time by over 50%.

Multi-Agent AI Development Assistant for Clinical Trial Data Analysis

AstraZeneca

AstraZeneca developed a "Development Assistant" - an interactive AI agent that enables researchers to query clinical trial data using natural language. The system evolved from a single-agent approach to a multi-agent architecture using Amazon Bedrock, allowing users across different R&D domains to access insights from their 3DP data platform. The solution went from concept to production MVP in six months, addressing the challenge of scaling AI initiatives beyond isolated proof-of-concepts while ensuring proper governance and user adoption through comprehensive change management practices.

Multi-Agent Architecture for Addiction Recovery Support

OpenRecovery

OpenRecovery developed an AI-powered assistant for addiction recovery support using a sophisticated multi-agent architecture built on LangGraph. The system provides personalized, 24/7 support via text and voice, bridging the gap between expensive inpatient care and generic self-help programs. By leveraging LangGraph Platform for deployment, LangSmith for observability, and implementing human-in-the-loop features, they created a scalable solution that maintains empathy and accuracy in addiction recovery guidance.

MultiCare: A Large-Scale Medical Case Report Dataset for AI Model Training

National University of the South

The MultiCare dataset project addresses the challenge of training AI models for medical applications by creating a comprehensive, multimodal dataset of clinical cases. The dataset contains over 75,000 case report articles, including 135,000 medical images with associated labels and captions, spanning multiple medical specialties. The project implements sophisticated data processing pipelines to extract, clean, and structure medical case reports, images, and metadata, making it suitable for training language models, computer vision models, or multimodal AI systems in the healthcare domain.

Multimodal Healthcare Data Integration with Specialized LLMs

John Snow Labs

John Snow Labs developed a comprehensive healthcare data integration system that leverages multiple specialized LLMs to unify and analyze patient data from various sources. The system processes structured, unstructured, and semi-structured medical data (including EHR, PDFs, HL7, FHIR) to create complete patient journeys, enabling natural language querying while maintaining consistency, accuracy, and scalability. The solution addresses key healthcare challenges like terminology mapping, date normalization, and data deduplication, all while operating within secure environments and handling millions of patient records.

Natural Language Interface for Healthcare Data Analytics using LLMs

Aachen Uniklinik / Aurea Software

A UK-based NLQ (Natural Language Query) company developed an AI-powered interface for Aachen Uniklinik to make intensive care unit databases more accessible to healthcare professionals. The system uses a hybrid approach combining vector databases, large language models, and traditional SQL to allow non-technical medical staff to query complex patient data using natural language. The solution includes features for handling dirty data, intent detection, and downstream complication analysis, ultimately improving clinical decision-making processes.

NLP and Machine Learning for Early Sepsis Detection in Neonatal Care

Wroclaw Medical University

Wroclaw Medical University, in collaboration with the Institute of Mother and Child, is developing an AI-powered clinical decision support system to detect and manage sepsis in neonatal intensive care units. The system uses NLP to process unstructured medical records in real-time, combined with machine learning models to identify early sepsis symptoms before they become clinically apparent. Early results suggest the system can reduce diagnosis time from 24 hours to 2 hours while maintaining high sensitivity and specificity, potentially leading to reduced antibiotic usage and improved patient outcomes.

Open-Source Protein Structure Prediction and Generative Design Platform for Drug Discovery

Boltz

Boltz, founded by Gabriele Corso and Jeremy Wohlwend, developed an open-source suite of AI models (Boltz-1, Boltz-2, and BoltzGen) for structural biology and protein design, democratizing access to capabilities previously held by proprietary systems like AlphaFold 3. The company addresses the challenge of predicting complex molecular interactions (protein-ligand, protein-protein) and designing novel therapeutic proteins by combining generative diffusion models with specialized equivariant architectures. Their approach achieved validated nanomolar binders for two-thirds of nine previously unseen protein targets, demonstrating genuine generalization beyond training data. The newly launched Boltz Lab platform provides a production-ready infrastructure with optimized GPU kernels running 10x faster than open-source versions, offering agents for protein and small molecule design with collaborative interfaces for medicinal chemists and researchers.

Optimizing Medical Record Processing with Prompt Caching at Scale

Care Access

Care Access, a global health services and clinical research organization, faced significant operational challenges when processing 300-500+ medical records daily for their health screening program. Each medical record required multiple LLM-based analyses through Amazon Bedrock, but the approach of reprocessing substantial portions of medical data for each separate analysis question led to high costs and slower processing times. By implementing Amazon Bedrock's prompt caching feature—caching the static medical record content while varying only the analysis questions—Care Access achieved an 86% reduction in data processing costs (7x decrease) and 66% faster processing times (3x speedup), saving 4-8+ hours of processing time daily. This optimization enabled the organization to scale their health screening program efficiently while maintaining strict HIPAA compliance and privacy standards, allowing them to connect more participants with personalized health resources and clinical trial opportunities.

Production AI Deployment: Lessons from Real-World Agentic AI Systems

Databricks / Various

This case study presents lessons learned from deploying generative AI applications in production, with a specific focus on Flo Health's implementation of a women's health chatbot on the Databricks platform. The presentation addresses common failure points in GenAI projects including poor constraint definition, over-reliance on LLM autonomy, and insufficient engineering discipline. The solution emphasizes deterministic system architecture over autonomous agents, comprehensive observability and tracing, rigorous evaluation frameworks using LLM judges, and proper DevOps practices. Results demonstrate that successful production deployments require treating agentic AI as modular system architectures following established software engineering principles rather than monolithic applications, with particular emphasis on cost tracking, quality monitoring, and end-to-end deployment pipelines.

Production Evolution of an AI-Powered Medical Consultation Assistant

Doctolib

Doctolib developed and deployed an AI-powered consultation assistant for healthcare professionals that combines speech recognition, summarization, and medical content codification. Through a comprehensive approach involving simulated consultations, extensive testing, and careful metrics tracking, they evolved from MVP to production while maintaining high quality standards. The system achieved widespread adoption and positive feedback through iterative improvements based on both explicit and implicit user feedback, combining short-term prompt engineering optimizations with longer-term model and data improvements.

Responsible AI Implementation for Healthcare Form Automation

WellSky

WellSky, serving over 2,000 hospitals and handling 100 million forms annually, partnered with Google Cloud to address clinical documentation burden and clinician burnout. They developed an AI-powered solution focusing on form automation, implementing a comprehensive responsible AI framework with emphasis on evidence citation, governance, and technical foundations. The project aimed to reduce "pajama time" - where 75% of nurses complete documentation after hours - while ensuring patient safety through careful AI deployment.

Scalable Intelligent Document Processing with Multi-Tenant Serverless Architecture

Ricoh

Ricoh USA faced significant scalability challenges in their healthcare document processing operations, where each new customer implementation required 40-60 hours of custom engineering work involving unique prompt engineering, model fine-tuning, and integration testing. To address anticipated sevenfold growth in document volume (from 10,000 to 70,000 documents monthly), Ricoh partnered with AWS to implement the GenAI IDP Accelerator using a serverless architecture combining Amazon Textract for OCR and Amazon Bedrock foundation models for intelligent classification and extraction. The solution reduced customer onboarding time from 4-6 weeks to 2-3 days, decreased engineering hours per deployment by over 90% (from ~80 hours to <5 hours), and created a reusable, multi-tenant framework that maintains strict healthcare compliance standards (HITRUST, HIPAA, SOC 2) while enabling effective human-in-the-loop workflows through confidence scoring mechanisms.

Scientific Intent Translation System for Healthcare Analytics Using Amazon Bedrock

Aetion

Aetion developed a Measures Assistant to help healthcare professionals translate complex scientific queries into actionable analytics measures using generative AI. By implementing Amazon Bedrock with Claude 3 Haiku and a custom RAG system, they created a production system that allows users to express scientific intent in natural language and receive immediate guidance on implementing complex healthcare data analyses. This reduced the time required to implement measures from days to minutes while maintaining high accuracy and security standards.

Smart Business Analyst: Automating Competitor Analysis in Medical Device Procurement

Philips

A procurement team developed an advanced LLM-powered system called "Smart Business Analyst" to automate competitor analysis in the medical device industry. The system addresses the challenge of gathering and analyzing competitor data across multiple dimensions, including features, pricing, and supplier relationships. Unlike general-purpose LLMs like ChatGPT, this solution provides precise numerical comparisons and leverages multiple data sources to deliver accurate, industry-specific insights, significantly reducing the time required for competitive analysis from hours to seconds.

Streamlining Clinical Trial Documentation Generation with RAG and LLMs

Clario

Clario, a clinical trials endpoint data solutions provider, transformed their time-consuming manual documentation process by implementing a generative AI solution using Amazon Bedrock. The system automates the generation of business requirement specifications from medical imaging charter documents using RAG architecture with Amazon OpenSearch for vector storage and Claude 3.7 Sonnet for text generation. The solution improved accuracy, reduced manual errors, and significantly streamlined their documentation workflow while maintaining security and compliance requirements.

Text-to-SQL System for Complex Healthcare Database Queries

MSD

MSD collaborated with AWS Generative Innovation Center to implement a text-to-SQL solution using Amazon Bedrock and Anthropic's Claude models to translate natural language queries into SQL for complex healthcare databases. The system addresses challenges like coded columns, non-intuitive naming, and complex medical code lists through custom lookup tools and prompt engineering, significantly reducing query time from hours to minutes while democratizing data access for non-technical staff.

Unified Healthcare Data Platform with LLMOps Integration

Doctolib

Doctolib is transforming their healthcare data platform from a reporting-focused system to an AI-enabled unified platform. The company is implementing a comprehensive LLMOps infrastructure as part of their new architecture, including features for model training, inference, and GenAI assistance for data exploration. The platform aims to support both traditional analytics and advanced AI capabilities while ensuring security, governance, and scalability for healthcare data.

Unlocking Patient Population Insights Using Smart Subgroups and LLMs

Aetion

Aetion developed a system to help healthcare researchers discover patterns in patient populations using natural language queries. The solution combines unsupervised machine learning for patient clustering with Amazon Bedrock and Claude 3 LLMs to enable natural language interaction with the data. This allows users unfamiliar with real-world healthcare data to quickly discover patterns and generate hypotheses, reducing analysis time from days to minutes while maintaining scientific rigor.

Using LLMs to Combat Health Insurance Claim Denials

Fight Health Insurance

Fight Health Insurance is an open-source project that uses fine-tuned large language models to help people appeal denied health insurance claims in the United States. The system processes denial letters, extracts relevant information, and generates appeal letters based on training data from independent medical review boards. The project addresses the widespread problem of insurance claim denials by automating the complex and time-consuming process of crafting effective appeals, making it accessible to individuals who lack the resources or knowledge to navigate the appeals process themselves. The tool is available both as an open-source Python package and as a free hosted service, though the sustainability model is still being developed.