ZenML

Building Secure and Private Enterprise LLM Infrastructure

Slack 2024
View original source

Slack implemented AI features by developing a secure architecture that ensures customer data privacy and compliance. They used AWS SageMaker to host LLMs in their VPC, implemented RAG instead of fine-tuning models, and maintained strict data access controls. The solution resulted in 90% of AI-adopting users reporting increased productivity while maintaining enterprise-grade security and compliance requirements.

Industry

Tech

Technologies

Overview

Slack, the enterprise collaboration platform owned by Salesforce, embarked on building AI-powered features (collectively called “Slack AI”) that leverage large language models while adhering to their stringent data security and privacy requirements. The case study, published in April 2024, provides insights into how Slack approached the architectural and operational challenges of deploying LLMs in a production environment that serves enterprise customers with strict compliance needs.

Slack positions itself as a “conservative technologist” that takes a rigorous approach to adopting new infrastructure categories. Their experience with machine learning dates back to 2016, but the rapid advancement of commercially available LLMs presented both an opportunity and a challenge: how to deliver AI features that address user pain points (information overload, difficulty finding content) while maintaining their reputation for data stewardship.

Core Principles Driving Architecture

Before diving into implementation, Slack established four guiding principles specifically for their AI implementation:

These principles shaped every subsequent architectural decision and represent a thoughtful approach to LLMOps in enterprise contexts where data governance is paramount.

Infrastructure and Hosting Strategy

One of the most significant LLMOps decisions Slack made was around model hosting. The standard industry approach of calling hosted foundational model services directly was incompatible with their requirements, particularly their FedRAMP Moderate authorization which prohibits sending customer data outside their trust boundary.

Slack found a creative solution through AWS, which acts as a “trusted broker” between foundational model providers and customers. This arrangement allows Slack to host closed-source LLMs in an escrow VPC, meaning they control the data lifecycle while using top-tier commercial models. This is an important pattern for enterprises that want access to leading foundational models but cannot send data to third-party APIs.

The case study notes that most foundational models are closed-source since “their models are their secret sauce,” making traditional self-hosting difficult. The AWS escrow approach allows Slack to guarantee that model providers have no access to customer data while still leveraging advanced model capabilities. This represents a pragmatic middle ground in the LLMOps landscape between fully self-hosted open-source models and direct API calls to model providers.

Retrieval Augmented Generation (RAG) Architecture

A cornerstone of Slack AI’s architecture is the use of Retrieval Augmented Generation (RAG) rather than fine-tuning or training models on customer data. This decision was driven by privacy considerations: Slack has long-standing principles that data should not leak across workspaces and that customers should have choice around data practices. Given what they describe as the “young state” of generative AI, they felt they couldn’t make strong enough guarantees about these practices if they trained models on customer data.

With RAG, all context needed to perform a task is included within each request, making interactions stateless. For example, when summarizing a channel, Slack sends the LLM a prompt containing the messages to be summarized along with instructions. This statelessness provides both privacy benefits (no data retention by the model) and product benefits (results are grounded in company-specific knowledge rather than the public internet).

However, the case study acknowledges RAG’s operational challenges:

Slack evaluated multiple models to find one with a large context window and acceptable latency for their summarization and search use cases. They also combined prompt tuning with chaining traditional ML models to generative models to improve results. The article notes optimism that RAG is becoming “easier and faster with each iteration of models” as context windows grow and models improve at synthesizing information across large contexts.

Access Control and Authorization

Slack implemented a critical constraint that AI features can only access data the requesting user can already see. This means Slack AI’s search never surfaces results that standard search wouldn’t show, and summaries never include content the user couldn’t read in the channel directly.

Implementation leverages the requesting user’s Access Control List (ACLs) when fetching data and reuses existing libraries that fetch data for display in channels or search results. While the team notes this wasn’t technically difficult, it required an explicit architectural choice to build on top of existing core feature sets rather than creating separate data access paths.

An additional privacy measure ensures that only the user who invokes Slack AI can see the AI-generated output, reinforcing the principle of minimal data exposure.

Compliance Integration

Slack AI integrates with the platform’s existing enterprise compliance and security offerings, following a “least data” principle: storing only data needed to complete tasks, and only for necessary durations.

Where possible, outputs are ephemeral—conversation summaries and search answers generate point-in-time responses that aren’t stored on disk. For cases where persistence is necessary, Slack built special support to maintain awareness of the relationship between derived content (like summaries) and source messages.

A notable example is Data Loss Protection (DLP) integration: if a message is tombstoned due to DLP policies, any summaries derived from that message are automatically invalidated. This extends existing administrative controls to AI-generated content seamlessly.

Other compliance capabilities mentioned include:

Model Evaluation and Prompt Engineering

The case study mentions that Slack engaged in “prompt tuning” and evaluation of multiple models, though specific details are not provided. The authors note that this is just the beginning of their story, promising future content on “how we build prompts, evaluate models, or handle spiky demand.”

This suggests an ongoing operational practice around model evaluation and prompt engineering that is typical of mature LLMOps environments, though the specifics remain undisclosed in this particular article.

Results and Claimed Benefits

Slack reports that 90% of users who adopted AI features reported higher productivity compared to those who didn’t. While this is a strong claim, the case study doesn’t provide detailed methodology for this finding, so it should be interpreted as a general indicator of user satisfaction rather than a rigorous measure of productivity gains.

Critical Assessment

The case study presents a thoughtful, privacy-first approach to LLMOps that will resonate with enterprises facing similar constraints. However, several aspects deserve critical consideration:

The AWS escrow VPC arrangement, while creative, still relies on trust in AWS as the broker. Organizations with stricter requirements may need to evaluate whether this meets their specific compliance needs.

The decision to use RAG exclusively rather than any form of fine-tuning trades off potential quality improvements for privacy guarantees. As the generative AI industry matures, this trade-off may shift.

The claimed 90% productivity improvement lacks methodological transparency, making it difficult to assess its validity or generalizability.

The case study is notably vague on specific model choices, prompt engineering practices, and evaluation methodologies—likely for competitive reasons, but limiting its utility as a technical reference.

Despite these caveats, Slack’s approach represents a mature pattern for enterprise LLMOps that prioritizes trust and compliance while still delivering AI capabilities. Their emphasis on reusing existing infrastructure (ACLs, compliance systems) rather than building parallel systems is a pragmatic approach that other organizations could learn from.

More Like This

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.

healthcare fraud_detection customer_support +90

Building and Scaling Enterprise LLMOps Platforms: From Team Topology to Production

Various 2023

A comprehensive overview of how enterprises are implementing LLMOps platforms, drawing from DevOps principles and experiences. The case study explores the evolution from initial AI adoption to scaling across teams, emphasizing the importance of platform teams, enablement, and governance. It highlights the challenges of testing, model management, and developer experience while providing practical insights into building robust AI infrastructure that can support multiple teams within an organization.

code_generation high_stakes_application regulatory_compliance +32

AI-Powered Conversational Assistant for Streamlined Home Buying Experience

Rocket 2025

Rocket Companies, a Detroit-based FinTech company, developed Rocket AI Agent to address the overwhelming complexity of the home buying process by providing 24/7 personalized guidance and support. Built on Amazon Bedrock Agents, the AI assistant combines domain knowledge, personalized guidance, and actionable capabilities to transform client engagement across Rocket's digital properties. The implementation resulted in a threefold increase in conversion rates from web traffic to closed loans, 85% reduction in transfers to customer care, and 68% customer satisfaction scores, while enabling seamless transitions between AI assistance and human support when needed.

customer_support chatbot question_answering +40