Telus developed FuelEx, an enterprise-scale LLM platform that provides centralized management of multiple AI models and services. The platform enables the creation of customized co-pilots for different use cases, with over 30,000 custom co-pilots built and more than 35,000 active users. Key features include flexible model switching, enterprise security, RAG capabilities, and integration with workplace tools such as Slack and Google Chat. Results show significant impact, including a 46% self-resolution rate for internal support queries and a 21% reduction in agent interactions.
This case study presents FuelEx, an enterprise generative AI platform developed by Telus, a major Canadian telecommunications company. The presentation was delivered at the Toronto Machine Learning Summit by Liz (Developer Advocate and Engineering Manager) and Sarah (Software Developer) from Telus’s platform engineering team. FuelEx represents Telus’s approach to democratizing generative AI across their organization while maintaining enterprise-grade security, flexibility, and responsible AI practices.
The platform emerged from an engineering productivity team initiative approximately 18 months prior to the presentation. What began as a Slack integration for generative AI grew into a comprehensive platform used by thousands of employees daily. The team eventually merged expertise from Telus Digital Experience (formerly Telus International) and WillowTree to form a unified vision and roadmap.
FuelEx operates as a centralized management layer sitting atop foundational AI services and cloud infrastructure. The architecture is deliberately positioned in what the presenters describe as the “AI value chain”: above hardware providers (like NVIDIA), cloud hosting layers (AWS, Azure, GCP), and foundational models, but below the application layer that end users interact with.
The platform consists of two main components: FuelEx Core and FuelEx Apps. The Core component handles all centralized management functions, including integrations, observation/monitoring, orchestration across different models, moderation, and validation. The application layer includes a web application plus Slack and Google Chat integrations, with Microsoft Teams support planned for the future.
A notable architectural decision is the multi-cloud, multi-model approach. The platform is not locked into a single cloud provider: OpenAI models are hosted on Azure, Claude models on AWS Bedrock, and GCP is used as well. This is managed through a proxy layer that enables load balancing, fallback mechanisms, and retries across the various model providers.
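The proxy layer's retry-and-fallback behavior can be sketched roughly as follows. This is a minimal illustration, not FuelEx's implementation: the provider functions are stubs standing in for real Azure OpenAI, AWS Bedrock, and GCP SDK calls, and the ordering and retry policy are assumptions.

```python
# Hypothetical provider stubs; in production these would wrap the
# Azure OpenAI, AWS Bedrock, and GCP client SDKs respectively.
def call_azure_openai(prompt: str) -> str:
    raise TimeoutError("azure unavailable")  # simulate an outage

def call_bedrock_claude(prompt: str) -> str:
    return f"[claude] {prompt}"

def call_gcp_model(prompt: str) -> str:
    return f"[gcp] {prompt}"

# Priority order is an assumption for illustration only.
PROVIDERS = [call_azure_openai, call_bedrock_claude, call_gcp_model]

def proxy_complete(prompt: str, retries_per_provider: int = 2) -> str:
    """Try each provider in order, retrying transient failures,
    then fall back to the next provider on repeated failure."""
    last_error = None
    for provider in PROVIDERS:
        for _ in range(retries_per_provider):
            try:
                return provider(prompt)
            except TimeoutError as err:
                last_error = err
    raise RuntimeError("all providers failed") from last_error
```

With the Azure stub failing, a call such as `proxy_complete("hello")` falls through to the Bedrock stub, mirroring how the proxy keeps requests flowing when one provider is degraded.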
For Canadian data residency requirements (a critical concern for a Canadian telecom), the platform supports models and infrastructure hosted entirely within Canada. Their vector database solution uses Turbo Puffer, a Canadian company with Canadian hosting. They also leverage Cohere Command R models and other Canada-hosted model options.
The core user-facing concept in FuelEx is the “co-pilot” - customized AI bots created for specific use cases. These are analogous to OpenAI’s GPTs but built entirely as Telus’s own solution rather than layered on top of OpenAI’s Assistants API. Each co-pilot can have its own system prompt, connected knowledge bases, model selection, access controls, and guardrail configurations.
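A co-pilot's per-instance configuration could be modeled along these lines; the field names and values are illustrative assumptions, not FuelEx's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Copilot:
    """Illustrative co-pilot configuration record (hypothetical schema)."""
    name: str
    system_prompt: str
    model: str                                   # selected backing model
    knowledge_bases: list = field(default_factory=list)  # RAG sources
    allowed_groups: list = field(default_factory=list)   # access control
    guardrails: dict = field(default_factory=dict)       # moderation config

# Example instance loosely based on the Spock IT-support co-pilot;
# every value here is a placeholder, not Telus's real configuration.
spock = Copilot(
    name="Spock",
    system_prompt="You are an internal IT support assistant.",
    model="gpt-4",
    knowledge_bases=["it-support-kb"],
    allowed_groups=["all-employees"],
    guardrails={"content_filter": "medium"},
)
```

Because the configuration lives at the platform level rather than in any one channel, the same record can back the web, Slack, and Google Chat surfaces.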
The platform uses Retrieval Augmented Generation (RAG) architecture for knowledge retrieval. When documents are uploaded, they are processed (including OCR for images), embedded, and stored in a vector database. At query time, similarity search retrieves relevant context which is then provided to the LLM along with the user’s question and system prompt.
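The ingest-then-retrieve flow can be sketched as below. The word-overlap "embedding" and in-memory store are toy stand-ins for a real embedding model and vector database, and the prompt assembly is a simplified assumption.

```python
def embed(text: str) -> set:
    """Toy embedding: a bag of lowercased words (stand-in for a real model)."""
    return set(text.lower().split())

class VectorStore:
    """In-memory stand-in for a vector database."""
    def __init__(self):
        self.docs = []

    def add(self, text: str) -> None:
        # Ingestion: embed the document and store it alongside its text.
        self.docs.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list:
        # Similarity search: rank by word overlap (proxy for cosine similarity).
        qv = embed(query)
        ranked = sorted(self.docs, key=lambda d: len(qv & d[0]), reverse=True)
        return [text for _, text in ranked[:k]]

def build_prompt(store: VectorStore, question: str) -> str:
    # Query time: retrieved context + system prompt + user question go to the LLM.
    context = "\n".join(store.search(question))
    return f"Answer from the context only.\nContext: {context}\nQ: {question}"

store = VectorStore()
store.add("VPN issues: restart the client and re-authenticate.")
store.add("Printer setup: install the driver from the IT portal.")
prompt = build_prompt(store, "How do I fix VPN issues?")
```

The resulting prompt carries the most relevant chunk, so the LLM answers from retrieved knowledge rather than its pretraining alone.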
For tool use and function calling, FuelEx implements a planner-executor architecture. When a user submits a query, the system sends all available tools to the LLM along with the original question and asks it to devise a plan. The LLM determines which tools to call (e.g., internet search, image generation with DALL-E), the execution layer runs those tools, and then the responses are sent back to the LLM for final summarization and response generation.
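A minimal planner-executor sketch follows; the planner here is a deterministic stub standing in for the LLM's tool-selection step, and the tool names are illustrative, not FuelEx's actual tool registry.

```python
# Toy tool registry; real tools would call a search API, DALL-E, etc.
TOOLS = {
    "internet_search": lambda q: f"search results for '{q}'",
    "generate_image": lambda q: f"image for '{q}'",
}

def plan(question: str) -> list:
    """Stub planner: a real system sends the tool list plus the question
    to the LLM and parses the plan it returns."""
    if "image" in question or "picture" in question:
        return [("generate_image", question)]
    return [("internet_search", question)]

def execute(question: str) -> str:
    # Execution layer: run each planned tool call.
    results = [TOOLS[name](arg) for name, arg in plan(question)]
    # Final step: in production the tool outputs go back to the LLM
    # for summarization; here we simply join them.
    return " | ".join(results)
```

So `execute("latest Telus news")` routes through the search tool, while an image request would route to the image tool, mirroring the plan-then-execute split.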
The presentation highlighted several production co-pilots deployed within Telus:
Spock (Single Point of Contact): An internal IT support bot used by all team members. After several months of operation, 46% of interactions are resolved by the co-pilot without escalation to human agents. In one month (June), they observed a 21% decrease in agent interactions month-over-month.
One Source: A customer service agent co-pilot that helps call center representatives pull information more quickly to respond to customer inquiries.
Milo: A co-pilot for retail store representatives to answer questions and process customer requests more efficiently.
Telus J Co-pilot: A generic co-pilot available to all Telus employees with access to tools like internet search and image generation for day-to-day queries.
The overall platform metrics are impressive: over 30,000 custom co-pilots have been built on the platform, with more than 35,000 active users. Use cases span meeting summaries, image generation, business requirements documentation, and coding assistance.
A key design philosophy is meeting users “where they work” rather than forcing them to adopt another web application. The platform integrates with Slack, Google Chat, and soon Microsoft Teams. The same co-pilot with identical configurations (system prompt, model, settings) can be accessed across different channels because it’s built at the platform level rather than being channel-specific.
Future integration plans include VS Code extensions, GitHub Actions, and Chrome extensions. The Chrome extension concept would enable scraping and contextualizing web content directly.
FuelEx includes a “Lab” feature that lets developers and builders experiment with different models and prompts before deploying a co-pilot.
The presenters demonstrated this by showing how the same translation system prompt produced different behaviors across models: some followed the instructions precisely while others deviated.
Telus emphasizes responsible AI as a core differentiator. They have a dedicated Responsible AI Squad, over 500 trained data stewards, and thousands trained in prompt engineering. Their responsible AI framework includes a data enablement plan process and “purple teaming” (combining adversarial and defensive testing) with team members across the organization.
The platform implements configurable guardrails at multiple layers. For streaming responses, output-side guardrails are challenging to implement, so they leverage model-level guardrails (e.g., Azure’s content filtering) where configurable severity and category settings can be specified. For RAG-based use cases, guardrails are implemented by constraining the LLM to respond based on provided context rather than generating content independently.
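The RAG-side guardrail can be sketched as a constrained prompt plus an early refusal when nothing is retrieved; the wording and structure here are assumptions for illustration, not FuelEx's actual guardrail configuration.

```python
# Hypothetical guardrail instruction prepended to every RAG prompt.
GUARDRAIL_PROMPT = (
    "Answer ONLY using the provided context. "
    "If the context does not contain the answer, say you don't know."
)

def guarded_prompt(context_chunks: list, question: str):
    """Build a context-constrained prompt, refusing early when retrieval
    returned nothing (an input-side guardrail)."""
    if not context_chunks:
        return None  # nothing retrieved: refuse before calling the LLM
    context = "\n".join(context_chunks)
    return f"{GUARDRAIL_PROMPT}\n\nContext:\n{context}\n\nQuestion: {question}"
```

Model-level filters (such as configurable content-filter severities) would then apply on top of this prompt-level constraint.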
Telus received the Outstanding Organization 2023 prize from the Responsible AI Institute, and their Telus.com support use case achieved Privacy by Design ISO certification.
For performance, the platform employs asynchronous processing where possible. For example, chat naming (where an LLM generates a title for a conversation) is done asynchronously after the main response is delivered. Streaming is used to provide perceived responsiveness even when underlying LLM calls are slow. The multi-model proxy layer also enables optimization through load balancing and fallback to alternative models when one is slow or unavailable.
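The deferred chat-naming pattern might look like this with `asyncio`; the naming call is stubbed with a short sleep standing in for the real LLM request, and the function names are assumptions.

```python
import asyncio

async def name_chat(chat: dict, first_message: str) -> None:
    """Background naming task: a slow LLM call in production, stubbed here."""
    await asyncio.sleep(0.01)           # stands in for LLM latency
    chat["title"] = first_message[:30]  # truncated stub title

async def handle_message(chat: dict, message: str) -> str:
    if "title" not in chat:
        # Fire-and-forget: naming runs in the background and
        # does not delay the user-facing reply.
        asyncio.create_task(name_chat(chat, message))
    return f"answer to: {message}"      # main response, returned immediately

async def demo():
    chat = {}
    reply = await handle_message(chat, "How do I reset my VPN token?")
    assert "title" not in chat          # reply was delivered before naming
    await asyncio.sleep(0.05)           # give the background task time to run
    return reply, chat["title"]
```

The same principle applies to streaming: the user starts seeing output while slower work completes behind the scenes.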
Monitoring spans multiple cloud platforms (GCP, AWS, Azure) with customizable solutions for different organizational needs. Key metrics include response time, cost per query, and accuracy evaluation. For accuracy assessment, they maintain databases with “source of truth” answers and use LLMs to compare and score responses - a form of LLM-as-judge evaluation.
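A toy version of the source-of-truth comparison is shown below; the token-overlap judge is a cheap stand-in for the LLM-as-judge scoring, and the golden-answer data is invented for illustration.

```python
# Hypothetical "source of truth" database of golden answers.
GOLDEN = {
    "How do I reset my password?": "Use the self-service portal to reset it.",
}

def judge(reference: str, candidate: str) -> float:
    """Stub judge: fraction of reference tokens present in the candidate.
    A real judge prompt would ask an LLM to score semantic agreement."""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    return len(ref & cand) / len(ref)

def evaluate(answers: dict) -> float:
    """Average judge score over all questions with a golden answer."""
    scores = [judge(GOLDEN[q], a) for q, a in answers.items() if q in GOLDEN]
    return sum(scores) / len(scores)
```

Tracking this score alongside response time and cost per query gives the three monitoring axes the team describes.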
While the presentation is largely promotional (the team is actively marketing FuelEx), several practical limitations and honest assessments emerge. The presenters acknowledge that guardrails on streaming output remain challenging without model-level support. They note that LLM response times are fundamentally limited: “if it’s a slow LLM there’s not a lot that you can do.” The platform is not open source, though they emphasize extensibility through APIs for organizations to connect their own applications and data sources.
The multi-language support relies on the underlying LLM capabilities (specifically citing GPT-4’s multilingual abilities) rather than platform-specific handling, and the embedding-based retrieval should work across languages due to semantic similarity matching.
This case study represents a mature enterprise LLMOps implementation with approximately 18 months of production experience, significant scale (35,000+ users, 30,000+ co-pilots), and a thoughtful multi-model, multi-cloud architecture designed for flexibility and Canadian regulatory compliance.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
Stripe, which processes approximately 1.3% of global GDP, has evolved from traditional ML-based fraud detection to transformer-based foundation models for payments that score every transaction in under 100ms. The company built a domain-specific foundation model that treats charges as tokens and behavior sequences as context windows, ingesting tens of billions of transactions to power fraud detection and improving card-testing detection from 59% to 97% accuracy for large merchants. Stripe also launched the Agentic Commerce Protocol (ACP) jointly with OpenAI to standardize how agents discover and purchase from merchant catalogs. This is complemented by internal AI adoption: 8,500 employees use LLM tools daily, 65-70% of engineers use AI coding assistants, and productivity gains include reducing payment method integrations from two months to two weeks.
Wealthsimple, a Canadian FinTech company, developed a comprehensive LLM platform to securely leverage generative AI while protecting sensitive financial data. They built an LLM gateway with built-in security features, PII redaction, and audit trails, eventually expanding to include self-hosted models, RAG capabilities, and multi-modal inputs. The platform achieved widespread adoption with over 50% of employees using it monthly, leading to improved productivity and operational efficiencies in client service workflows.