Building a Secure AI Assistant for Visual Effects Artists Using Amazon Bedrock

Untold Studios 2025

Untold Studios developed an AI assistant integrated into Slack to help their visual effects artists access internal resources and tools more efficiently. Using Amazon Bedrock with Claude 3.5 Sonnet and a serverless architecture, they created a natural language interface that handles up to 120 queries per day, reducing information search time from minutes to seconds while maintaining strict data security. The solution combines RAG capabilities with function calling to access multiple knowledge bases and internal systems, significantly reducing the support team's workload.

Industry

Media & Entertainment

Overview

Untold Studios is a tech-driven creative studio specializing in high-end visual effects and animation. The company faced a common enterprise challenge: how to enable diverse teams of artists with varying levels of technical expertise to efficiently access internal resources, tools, and workflows while maintaining strict data security requirements. Their solution was to build “Untold Assistant,” an AI-powered natural language interface integrated directly into Slack, leveraging Amazon Bedrock and Anthropic’s Claude 3.5 Sonnet model.

The case study is notable for its practical, production-focused approach to deploying LLMs in an enterprise setting. Rather than building complex custom infrastructure, Untold Studios leveraged managed AWS services and pre-built connectors to accelerate development while maintaining control over their data. The result is a system that handles up to 120 queries per day, with 10-20% of those queries triggering additional tool calls like image generation or knowledge base search.

Architecture and Infrastructure

The solution is built on a serverless AWS architecture, which provides scalability and responsiveness without the overhead of managing servers. The key services are Amazon Bedrock, AWS Lambda, Amazon API Gateway, Amazon DynamoDB, and Amazon S3, each of which appears in the workflow described below.

The architecture follows a two-function Lambda pattern to meet Slack’s 3-second acknowledgment requirement. When an incoming event arrives from Slack via API Gateway, the first Lambda function (with reserved concurrency) quickly acknowledges the event and forwards the request to a second Lambda function for processing without time restrictions. Importantly, they chose to invoke the second function directly rather than using Amazon SNS or SQS as an intermediary, prioritizing low latency over loose coupling.
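The two-function pattern can be sketched as follows. This is a minimal illustration, not Untold Studios' actual code: the processor function name is hypothetical, and the async handoff is done with the standard `boto3` `invoke` call using `InvocationType="Event"` (fire-and-forget), which is what "invoking the second function directly" typically means in this pattern.

```python
import json

PROCESSOR_FUNCTION = "untold-assistant-processor"  # hypothetical function name

def ack_response():
    """Immediate 200 response that satisfies Slack's 3-second deadline."""
    return {"statusCode": 200, "body": ""}

def handler(event, context):
    """First Lambda: acknowledge fast, then hand off without waiting."""
    import boto3  # imported lazily so the ack path stays dependency-free
    lambda_client = boto3.client("lambda")
    # InvocationType="Event" is an asynchronous invoke: we do not wait for
    # the second function, so the ack returns well under 3 seconds.
    lambda_client.invoke(
        FunctionName=PROCESSOR_FUNCTION,
        InvocationType="Event",
        Payload=json.dumps({"slack_event": event}).encode(),
    )
    return ack_response()
```

Skipping SNS/SQS removes one hop of latency but also removes the retry and buffering behavior a queue would provide, which matches the stated trade-off of latency over loose coupling.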

Slack Integration as the User Interface

A key decision in this implementation was to use Slack as the primary user interface rather than building a custom frontend. This approach offers practical advantages for production deployment, chief among them meeting artists where they already work instead of asking them to adopt and authenticate with a new tool.

This decision reflects a pragmatic approach to LLMOps: leveraging existing infrastructure and user habits rather than requiring adoption of new tools. It’s worth noting that this approach does create a dependency on Slack’s platform and API, which could be a consideration for other organizations.

Retrieval Augmented Generation (RAG) Implementation

The RAG implementation is a highlight of this case study, demonstrating effective use of managed connectors to reduce development complexity. The team uses Amazon Bedrock’s pre-built connectors to ingest content from their existing internal knowledge sources directly into Bedrock knowledge bases.

For data sources without pre-built connectors, they export content to Amazon S3 and use the S3 connector. A notable example is their asset library, where they export pre-chunked asset metadata to S3 and let Amazon Bedrock handle embeddings, vector storage, and search automatically.
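The pre-chunking approach can be sketched as below. This is an assumed shape, not the studio's code: bucket name, record fields, and one-object-per-chunk layout are all hypothetical. The key idea is that each uploaded S3 object is already one retrieval unit, so Bedrock's knowledge base sync only has to embed and index it.

```python
import json

def chunk_asset_metadata(assets):
    """Turn asset records into small, self-contained text chunks.

    Pre-chunking means the knowledge base's default chunking is bypassed:
    each uploaded object is already one retrieval unit.
    """
    chunks = []
    for asset in assets:
        text = (
            f"Asset: {asset['name']}\n"
            f"Category: {asset['category']}\n"
            f"Description: {asset['description']}"
        )
        chunks.append({"key": f"assets/{asset['id']}.json",
                       "body": json.dumps({"text": text})})
    return chunks

def upload_chunks(chunks, bucket="untold-kb-assets"):  # hypothetical bucket
    """Upload each chunk; the Bedrock knowledge base then syncs from S3."""
    import boto3
    s3 = boto3.client("s3")
    for chunk in chunks:
        s3.put_object(Bucket=bucket, Key=chunk["key"], Body=chunk["body"])
```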

This approach significantly decreased development time and complexity by eliminating the need to build custom vector store integrations and embedding pipelines. The trade-off is less control over the embedding and retrieval process, but for their use case, the speed of development was more valuable than fine-grained control.

Access to data sources is controlled through a tool-based permission system, where every tool encapsulates a data source and the LLM’s access to tools is restricted based on user roles and clearance levels.
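A tool-based permission system of this kind can be sketched as follows. The tool names and role sets here are invented for illustration; the mechanism is simply that the tool list handed to the LLM is filtered per request, so the model can never call a tool the user is not cleared for.

```python
# Hypothetical tool registry: each tool declares the roles allowed to use it.
ALL_TOOLS = {
    "search_asset_library": {"roles": {"artist", "producer", "support"}},
    "query_hr_knowledge_base": {"roles": {"hr", "support"}},
    "generate_image": {"roles": {"artist"}},
}

def tools_for_user(user_roles):
    """Return only the tools this user's roles permit; the model never
    sees (and so can never invoke) tools outside this list."""
    return sorted(
        name for name, spec in ALL_TOOLS.items()
        if spec["roles"] & set(user_roles)
    )
```

Because each tool encapsulates exactly one data source, restricting the tool list is equivalent to restricting data access, without any permission logic inside the prompt itself.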

Function Calling for Extensibility

The system uses Claude’s function calling capabilities to extend the assistant’s abilities beyond text generation. Rather than implementing a complex agentic loop, they opted for a single-pass function call approach to keep things “simple and robust.” However, in certain cases, the function itself may use the LLM internally to process and format data for the end user.

The implementation uses a Python base class pattern (AiTool) that enables automatic discovery and registration of new tools. To add a new capability, developers simply create a new class that inherits from AiTool; the system then automatically discovers the tool, registers it, and exposes it to the model as a callable function.
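One common way to implement such a base-class registration pattern is Python's `__init_subclass__` hook, sketched below. The class and method names other than `AiTool` are assumptions, not the studio's actual API.

```python
TOOL_REGISTRY = {}

class AiTool:
    """Base class; subclassing alone is enough to register a tool."""
    name = None
    description = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Automatic discovery: every subclass is recorded at class-creation
        # time, so no manual registration list has to be maintained.
        TOOL_REGISTRY[cls.name] = cls

    def run(self, **params):
        raise NotImplementedError

class AssetSearchTool(AiTool):
    name = "search_asset_library"
    description = "Search the internal asset library by keyword."

    def run(self, query=""):
        return f"results for {query!r}"

def tool_specs():
    """Tool metadata in the shape a function-calling API expects."""
    return [{"name": t.name, "description": t.description}
            for t in TOOL_REGISTRY.values()]
```

With this structure, `tool_specs()` can be passed to the model's function-calling interface, and a returned tool call is dispatched by looking up the class in `TOOL_REGISTRY` and calling `run`.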

Current capabilities include knowledge base queries, internal asset library search, image generation via Stable Diffusion, and user-specific memory management. The modular architecture allows for rapid addition of new tools as needs arise.

The team explicitly chose not to use frameworks like LangChain, opting instead for a “lightweight, custom approach tailored to [their] specific needs.” This decision allowed them to maintain a smaller footprint and focus on essential features.

User Memory and Personalization

A notable production feature is the user memory system. Users can tell the assistant to remember preferences like “Keep all your replies as short as possible” or “If I ask for code it’s always Python.” These preferences are stored in DynamoDB and added to the context for every subsequent request from that user.
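The memory mechanism described above can be sketched as follows. An in-memory dict stands in for the DynamoDB table (keyed by Slack user ID); the class and method names are hypothetical.

```python
class UserMemory:
    """Per-user preference store. A plain dict stands in here for the
    DynamoDB table used in production (one item per Slack user ID)."""

    def __init__(self):
        self._table = {}

    def remember(self, user_id, preference):
        """Store a preference the user asked the assistant to remember."""
        self._table.setdefault(user_id, []).append(preference)

    def build_context(self, user_id, system_prompt):
        """Prepend stored preferences to the context of every request."""
        prefs = self._table.get(user_id, [])
        if not prefs:
            return system_prompt
        lines = "\n".join(f"- {p}" for p in prefs)
        return f"{system_prompt}\n\nUser preferences:\n{lines}"
```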

User management maps Slack user IDs to an internal user pool in DynamoDB (designed to be compatible with Amazon Cognito for future migration). This enables personalized capabilities based on each user’s role and clearance level.

Logging, Monitoring, and Analytics

For production observability, the system maintains comprehensive logging of user queries, responses, and tool invocations.

The comprehensive logging provides a rich dataset for analyzing usage patterns and optimizing performance. The team plans to use this data proactively: by analyzing saved queries using Amazon Titan Text Embeddings and agglomerative clustering, they can identify semantically similar questions. When cluster frequency exceeds a threshold (e.g., more than 10 similar questions from different users in a week), they enhance their knowledge base or update onboarding materials to address common queries.
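The clustering idea can be illustrated with a toy sketch. The production pipeline uses Amazon Titan Text Embeddings and proper agglomerative clustering; here, for a self-contained example, cosine similarity over small hand-made vectors and a greedy single-linkage grouping stand in for both.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_queries(embeddings, threshold=0.9):
    """Greedy single-linkage grouping: a query joins the first cluster
    containing a sufficiently similar member, else starts a new cluster.
    (A stand-in for the agglomerative clustering used in production.)"""
    clusters = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if any(cosine(emb, embeddings[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def frequent_clusters(clusters, min_size=10):
    """Clusters large enough to justify a knowledge-base or docs update."""
    return [c for c in clusters if len(c) >= min_size]
```

Once a cluster crosses the frequency threshold, the associated raw queries tell the team exactly which topic the knowledge base or onboarding materials should cover.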

Security Considerations

The case study emphasizes that security and control are “paramount” in their AI adoption strategy. By keeping all data within the AWS ecosystem, they eliminated dependencies on third-party AI tools and associated data privacy risks. The tool-based permission system ensures users only access data appropriate to their role.

It’s worth noting that while this approach provides control over data residency and access, it still relies on Amazon Bedrock’s managed service, meaning Anthropic’s Claude model is processing the queries. Organizations with stricter data handling requirements may need to consider self-hosted model options.

Production Results and Impact

The Untold Assistant currently handles up to 120 queries per day, with 10-20% triggering additional tool calls. The team reports that for new users unfamiliar with internal workflows, the assistant can reduce information retrieval time from minutes to seconds by serving as a “virtual member of the support team.”

Reported benefits include faster access to internal information, a lighter support-team workload, and easier access to tools for artists with less technical expertise.

The development timeline is also notable: using managed services and pre-built connectors reduced development from “months to weeks.”

Future Development Plans

The team has further technical improvements planned, including the proactive clustering analysis of logged queries described above to identify gaps in documentation and onboarding materials.

These plans demonstrate a mature approach to LLMOps, where production systems are continuously improved based on real usage data.

Balanced Assessment

This case study presents a practical, well-executed production deployment of LLM technology. The strengths include pragmatic architecture decisions (Slack as UI, managed connectors over custom RAG), a modular design for extensibility, and thoughtful attention to security and user permissions.

However, as with any vendor-authored case study (this was published on the AWS blog with AWS employees as co-authors), the presented results should be considered with appropriate skepticism. The usage metrics (120 queries/day, time savings) are self-reported without detailed methodology. The claim that development was reduced from “months to weeks” is difficult to verify and may not translate to other organizations with different requirements.

The single-pass function calling approach, while pragmatic, limits the system’s ability to handle complex multi-step tasks that would benefit from agentic reasoning. For organizations with more complex workflow automation needs, additional development would be required.

Overall, this case study provides a useful template for organizations looking to deploy LLM-powered assistants using AWS managed services, with realistic scope and practical architectural patterns.
