ZenML

Scientific Intent Translation System for Healthcare Analytics Using Amazon Bedrock

Aetion 2025

Aetion developed a Measures Assistant to help healthcare professionals translate complex scientific queries into actionable analytics measures using generative AI. Using Amazon Bedrock with Anthropic's Claude 3 Haiku and a custom RAG pipeline, they built a production system that lets users express scientific intent in natural language and receive immediate guidance on implementing complex healthcare data analyses. This reduced the time required to implement measures from days to minutes while maintaining high accuracy and security standards.

Industry

Healthcare

Aetion Measures Assistant: LLMOps Case Study

Company and Use Case Overview

Aetion is a healthcare software and services company specializing in generating real-world evidence (RWE) on the safety, effectiveness, and value of medications and clinical interventions. The company serves major biopharma companies, regulatory agencies (including FDA and EMA), payors, and health technology assessment customers across the US, Canada, Europe, and Japan. Their core platform, the Aetion Evidence Platform (AEP), is a longitudinal analytic engine capable of applying causal inference and statistical methods to hundreds of millions of patient journeys.

The challenge Aetion faced was bridging the gap between scientific intent expressed in natural language and the technical implementation of complex patient variables in real-world data. Scientists, epidemiologists, and biostatisticians need to capture complex, clinically relevant patient variables that often involve sequences of events, combinations of occurrences and non-occurrences, and detailed numeric calculations. Previously, translating questions like “I want to find patients with a diagnosis of diabetes and a subsequent metformin fill” into working algorithms required specialized expertise and could take days.

The Measures Assistant Solution

Aetion developed “Measures Assistant” as part of their AetionAI suite, embedded within their Substantiate application. This feature enables users to express scientific intent in natural language and receive instructions on how to implement complex patient variables using AEP’s Measures system. Measures are described as “logical building blocks used to flexibly capture complex patient variables,” and they can be chained together to address nuanced research questions.

The solution architecture demonstrates thoughtful LLMOps practices. Measures Assistant is deployed as a microservice in a Kubernetes environment on AWS and accessed through a REST API. This containerized, microservices approach provides the scalability and maintainability benefits typical of production ML systems. All data transmission is encrypted using TLS 1.2, addressing the security requirements essential in healthcare applications where patient data sensitivity is paramount.
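
The case study does not show deployment manifests, but the shape described above can be sketched with a minimal Kubernetes Deployment and Service. Every name, image, and port below is an illustrative assumption, not Aetion's actual configuration, and TLS termination details would depend on the cluster's ingress setup:

```yaml
# Hypothetical manifest; service name, image, and ports are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: measures-assistant
spec:
  replicas: 2
  selector:
    matchLabels:
      app: measures-assistant
  template:
    metadata:
      labels:
        app: measures-assistant
    spec:
      containers:
        - name: api
          image: registry.example.com/measures-assistant:latest
          ports:
            - containerPort: 8443   # REST API served over TLS
---
apiVersion: v1
kind: Service
metadata:
  name: measures-assistant
spec:
  selector:
    app: measures-assistant
  ports:
    - port: 443
      targetPort: 8443
```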

LLM Selection and Integration

Aetion chose Amazon Bedrock as their foundation for working with large language models. According to the case study, the decision was based on Bedrock’s “vast model selection from multiple providers, security posture, extensibility, and ease of use.” This reflects a common pattern in enterprise LLMOps where managed services are preferred over self-hosted solutions to reduce operational burden and leverage built-in security features.

For the specific model, Aetion selected Anthropic’s Claude 3 Haiku, noting it was “more efficient in runtime and cost than available alternatives.” This model choice reflects practical production considerations where latency and cost per request matter significantly for user-facing applications. Haiku’s positioning as the fastest and most compact model in the Claude 3 family makes it suitable for interactive applications requiring quick responses.
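
The case study does not include implementation code, but invoking Claude 3 Haiku through Amazon Bedrock typically uses boto3's `bedrock-runtime` client with the Anthropic Messages request format. A minimal sketch, in which the helper names, system prompt, and token limit are illustrative assumptions:

```python
import json

# Public Bedrock model ID for Claude 3 Haiku
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_request(system_prompt, user_question, max_tokens=1024):
    """Assemble the Anthropic Messages-format body that Bedrock expects."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_question}],
    })

def invoke(client, body):
    """client is a boto3 'bedrock-runtime' client, e.g.
    boto3.client('bedrock-runtime', region_name='us-east-1')."""
    resp = client.invoke_model(modelId=MODEL_ID, body=body)
    payload = json.loads(resp["body"].read())
    return payload["content"][0]["text"]
```

Keeping the request assembly separate from the network call makes the prompt construction unit-testable without AWS credentials.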

Prompt Engineering and Knowledge Management

The prompt engineering approach used in Measures Assistant demonstrates production-grade practices; the prompt template combines several key components:

The system uses a hybrid static-dynamic template approach. Static portions contain fixed instructions covering a broad range of well-defined behaviors. Dynamic portions select questions and answers from a local knowledge base based on semantic proximity to the user’s query. This is described as modeling “a small-scale, optimized, in-process knowledge base for a Retrieval Augmented Generation (RAG) pattern.”
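
A hybrid static-dynamic template of this kind can be sketched as follows. The instruction text and function names are invented for illustration; only the overall structure (fixed instructions plus retrieved Q&A pairs) comes from the case study:

```python
# Fixed portion of the template: broad, well-defined behaviors.
STATIC_INSTRUCTIONS = """\
You translate scientific intent into AEP Measures.
Only emit Measure configurations that exist in the platform.
If a request is ambiguous, ask a clarifying question."""

def build_prompt(question, retrieve_similar):
    """Combine the static instructions with Q&A pairs selected from the
    local knowledge base by semantic proximity (the dynamic portion)."""
    examples = retrieve_similar(question, k=3)
    example_text = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return (
        f"{STATIC_INSTRUCTIONS}\n\n"
        f"Relevant examples from the knowledge base:\n{example_text}\n\n"
        f"User question: {question}"
    )
```

In the deployed system, `retrieve_similar` would be the embedding-based semantic search the case study describes; any callable with that shape can be swapped in for testing.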

Embedding and Retrieval System

For semantic search within the knowledge base, Aetion fine-tuned Mixedbread’s mxbai-embed-large-v1 Sentence Transformer to generate embeddings for their question-and-answer pairs and user questions. Similarity between questions is calculated using cosine similarity between embedding vectors. The decision to fine-tune an existing embedding model rather than using it off-the-shelf suggests Aetion invested in optimizing retrieval quality for their specific domain vocabulary and use cases.

This approach represents a pragmatic middle ground between pure RAG systems that rely on external vector databases and fully fine-tuned models. By maintaining a local, optimized knowledge base with pre-computed embeddings, Aetion can achieve fast retrieval while keeping the system architecture simpler than a full vector database deployment.
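
The retrieval step described above reduces to cosine similarity over pre-computed vectors. The sketch below assumes the knowledge-base questions have already been embedded (in Aetion's system, by the fine-tuned mxbai-embed-large-v1 model; here the vectors are supplied directly):

```python
import numpy as np

def cosine_sim(query_vec, kb_vecs):
    """Cosine similarity between one query vector and a matrix of
    knowledge-base vectors (one row per stored question)."""
    q = query_vec / np.linalg.norm(query_vec)
    m = kb_vecs / np.linalg.norm(kb_vecs, axis=1, keepdims=True)
    return m @ q

def top_k(query_vec, kb_vecs, kb_pairs, k=3):
    """Return the k knowledge-base Q&A pairs closest to the query."""
    scores = cosine_sim(query_vec, kb_vecs)
    best = np.argsort(scores)[::-1][:k]
    return [kb_pairs[i] for i in best]
```

Because the pool of expert-curated Q&A pairs is small, a brute-force matrix product like this is fast enough in-process, which is what lets the design avoid an external vector database.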

Guardrails and Quality Control

The case study emphasizes the importance of guardrails in ensuring response quality. Aetion maintains a local knowledge base created by scientific experts, and this information is incorporated into responses as guardrails. According to the text, “These guardrails make sure the service returns valid instructions to the user, and compensates for logical reasoning errors that the core model might exhibit.” This acknowledgment that LLMs can exhibit logical reasoning errors and need domain-specific constraints demonstrates mature LLMOps thinking.

Notably, the generation and maintenance of the question-and-answer pool involves a human-in-the-loop process. Subject matter experts continuously test Measures Assistant, and the resulting question-answer pairs are used to refine the system. This continuous improvement cycle, combining automated systems with expert human oversight, is essential for maintaining quality in production LLM applications, particularly in regulated industries like healthcare.
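
The human-in-the-loop refinement cycle above amounts to folding expert-approved Q&A pairs back into the retrievable pool. A minimal sketch, assuming a simple list-of-dicts knowledge base and a pluggable embedding function (both are hypothetical representations, not Aetion's data model):

```python
def merge_reviewed_pairs(pool, reviewed, embed):
    """Fold expert-approved Q&A pairs into the knowledge base,
    embedding each new question so it becomes retrievable."""
    for item in reviewed:
        if item["approved"]:
            pool.append({
                "question": item["question"],
                "answer": item["answer"],
                "embedding": embed(item["question"]),
            })
    return pool
```

Gating on explicit approval keeps unvetted model outputs from contaminating the guardrail knowledge base, which matters in a regulated domain.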

Architecture and Security Considerations

The deployment on Kubernetes provides containerized infrastructure benefits including scalability, fault tolerance, and easier deployment management. The REST API interface enables clean integration with the existing Substantiate application. The emphasis on TLS 1.2 encryption for both user-provided prompts and requests to Amazon Bedrock reflects the healthcare industry’s stringent security and compliance requirements.

The case study mentions that Amazon Bedrock’s security posture was a factor in the platform selection, which aligns with healthcare organizations’ needs to maintain HIPAA compliance and protect sensitive patient data. By using a managed service with enterprise security features, Aetion can leverage AWS’s compliance certifications rather than building and maintaining these capabilities internally.

Claimed Outcomes and Critical Assessment

The reported outcomes indicate that users can now turn natural language questions into measures “in a matter of minutes as opposed to days, without the need of support staff and specialized training.” This represents a significant claimed improvement in efficiency, though it’s worth noting this is from an AWS blog post that naturally highlights positive results.

Some aspects not fully addressed in the case study include specific accuracy metrics, how edge cases or ambiguous queries are handled, the volume of queries processed in production, and how model updates or changes are managed. The continuous refinement process with human experts suggests ongoing maintenance requirements that organizations considering similar implementations should factor into their planning.

Future Directions

Aetion indicates they are continuing to refine the knowledge base and expand generative AI capabilities across their product suite. This suggests an evolving LLMOps practice where the initial deployment serves as a foundation for broader AI integration. The approach of starting with a focused use case (Measures translation) before expanding demonstrates a measured adoption strategy that allows teams to build expertise and infrastructure incrementally.

The case study represents a practical example of LLM deployment in healthcare analytics, combining managed LLM services, custom knowledge bases, embedding-based retrieval, and human-in-the-loop quality assurance to create a production system that bridges the gap between domain experts and complex technical systems.
