Minimal developed a sophisticated multi-agent customer support system for e-commerce businesses using LangGraph and LangSmith, claiming 80%+ efficiency gains in ticket resolution. Their system combines three specialized agents (Planner, Research, and Tool-Calling) to handle complex support queries, automate responses, and execute order management tasks while maintaining compliance with business protocols. The company projects that up to 90% of support tickets will be handled autonomously, with human intervention required for only 10% of cases.
Minimal is a Dutch startup founded by Titus Ex (a machine learning engineer) and Niek Hogenboom (an aerospace engineering graduate) that focuses on automating customer support workflows for e-commerce businesses. The company has developed an AI-powered system that integrates with popular helpdesk platforms like Zendesk, Front, and Gorgias, aiming to handle customer queries autonomously while maintaining the ability to execute real actions such as order cancellations, refunds, and address updates through direct integrations with e-commerce services like Shopify.
The case study, published via LangChain’s blog in January 2025, presents Minimal’s approach to building a production-ready multi-agent system for customer support. It’s worth noting that this is a promotional piece from LangChain’s ecosystem, so the claimed metrics (80%+ efficiency gains, 90% autonomous ticket resolution) should be considered with appropriate caution, as independent verification is not provided.
E-commerce customer support presents a tiered complexity challenge. While basic support tickets (referred to as T1) are relatively straightforward to handle, more complex issues (T2 and T3) require deeper integration with business systems and nuanced understanding of company policies. Traditional approaches using monolithic LLM prompts were found to conflate multiple tasks, leading to errors and inefficient token usage. The team discovered that attempting to handle all aspects of customer support within a single prompt led to reliability issues and made it difficult to maintain and extend the system.
Minimal’s core technical innovation lies in their multi-agent architecture, which decomposes the customer support workflow into specialized components. This approach represents a significant LLMOps pattern for managing complexity in production LLM systems.
The architecture consists of three main agent types working in coordination:
Planner Agent: This agent serves as the orchestration layer, receiving incoming customer queries and breaking them down into discrete sub-problems. For example, a complex query might be decomposed into separate concerns like “Return Policy” and “Troubleshooting Front-End Issues.” The Planner Agent coordinates with specialized research agents and determines the overall flow of information through the system.
Research Agents: These specialized agents handle individual sub-problems identified by the Planner Agent. They perform retrieval and re-ranking operations against the company’s knowledge base, which includes documentation like returns guidelines, shipping rules, and other customer protocols stored in what Minimal calls their “training center.” This represents a RAG (Retrieval-Augmented Generation) pattern where agents pull relevant context before generating responses.
Tool-Calling Agent: This agent receives the consolidated “tool plan” from the Planner Agent and executes actual business operations, including consequential actions like processing refunds via Shopify or updating shipping addresses. Importantly, it also consolidates logs for post-processing and chain-of-thought validation, which is crucial for maintaining auditability in a production environment.
The final step in the pipeline produces a reasoned draft reply to the customer that references correct protocols, checks relevant data, and ensures compliance with business rules around refunds and returns.
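The Planner → Research → Tool-Calling flow described above can be sketched in plain Python. This is an illustrative stand-in only: the real system uses LangGraph with LLM-backed agents, and every name, keyword heuristic, and knowledge-base entry below is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    query: str
    log: list = field(default_factory=list)  # consolidated audit trail

def planner_agent(ticket: Ticket) -> list:
    """Decompose the query into sub-problems (stand-in for an LLM planner)."""
    subproblems = []
    if "return" in ticket.query.lower():
        subproblems.append("Return Policy")
    if "refund" in ticket.query.lower():
        subproblems.append("Refund Eligibility")
    ticket.log.append(("planner", subproblems))
    return subproblems

def research_agent(subproblem: str, knowledge_base: dict) -> str:
    """Retrieve protocol text for one sub-problem (RAG stand-in)."""
    return knowledge_base.get(subproblem, "no protocol found")

def tool_calling_agent(ticket: Ticket, contexts: dict) -> str:
    """Execute the tool plan and draft a reply referencing the protocols."""
    ticket.log.append(("tools", list(contexts)))
    return "Draft reply citing: " + ", ".join(contexts)

# Hypothetical "training center" knowledge base
kb = {"Return Policy": "Items may be returned within 30 days.",
      "Refund Eligibility": "Refunds require an unused item."}

ticket = Ticket("I want to return my order and get a refund")
contexts = {s: research_agent(s, kb) for s in planner_agent(ticket)}
draft = tool_calling_agent(ticket, contexts)
```

The key property the sketch preserves is that every agent step appends to one shared log, which is what makes the final draft auditable.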
The team’s decision to adopt a multi-agent architecture over a monolithic approach was driven by several production concerns. They found that combining all tasks in a single prompt conflated responsibilities and increased error rates; splitting tasks across specialized agents reduced errors, made token usage more efficient, and left the system easier to maintain and extend.
This architectural pattern is increasingly common in production LLM systems where complexity needs to be managed systematically.
A significant portion of Minimal’s LLMOps practice centers on their use of LangSmith for testing and benchmarking. Their development workflow leverages LangSmith’s capabilities for:
Performance Tracking: The team tracks model responses and performance metrics over time, enabling them to detect regressions and measure improvements as they iterate on the system.
Prompt Comparison: LangSmith enables side-by-side comparisons of different prompting strategies, including few-shot versus zero-shot approaches and chain-of-thought variants. This systematic experimentation is essential for optimizing production LLM systems.
Sub-Agent Logging: Each sub-agent’s output is logged, allowing the team to catch unexpected reasoning loops or erroneous tool calls. This visibility into the internal workings of the multi-agent system is critical for debugging and quality assurance.
Test-Driven Iteration: When errors are discovered—such as policy misunderstandings or missing steps—the team creates new tests based on LangSmith’s trace logs. They then add more few-shot examples or further decompose sub-problems to address the issues. This iterative, test-driven approach helps maintain velocity while improving system reliability.
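One way to operationalize this test-driven loop, sketched here with stdlib Python rather than LangSmith’s actual API, is to turn each failing trace into a regression case that every new prompt or agent version must pass. The case contents and the stub agent are hypothetical.

```python
# Hypothetical regression suite: each entry is derived from a failing
# trace (e.g. a policy misunderstanding caught in a trace log).
REGRESSION_CASES = [
    {"query": "Can I return a used item after 45 days?",
     "must_mention": ["30 days"],           # correct policy window
     "must_not_mention": ["full refund"]},  # previously erroneous claim
]

def run_agent(query: str) -> str:
    """Stand-in for the real multi-agent pipeline."""
    return "Our policy allows returns within 30 days for unused items."

def run_regressions(agent) -> list:
    """Check every archived failure case against the current agent."""
    failures = []
    for case in REGRESSION_CASES:
        reply = agent(case["query"])
        for phrase in case["must_mention"]:
            if phrase not in reply:
                failures.append(f"missing: {phrase!r}")
        for phrase in case["must_not_mention"]:
            if phrase in reply:
                failures.append(f"forbidden: {phrase!r}")
    return failures

failures = run_regressions(run_agent)
```

Running the suite on every iteration gives the regression detection described above: a change that reintroduces an old policy misunderstanding surfaces as a named failure rather than a silent production error.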
Minimal built their system using the LangChain ecosystem, with LangGraph serving as the orchestration framework for their multi-agent architecture. The choice of LangGraph was motivated by several factors:
Modularity: LangGraph’s modular design allows flexible management of sub-agents without the constraints of a monolithic framework. This enables customization for specific workflow needs.
Integration Capabilities: The code-like design of the framework facilitated the development of proprietary connectors for e-commerce services including Shopify, Monta Warehouse Management Services, and Firmhouse (for recurring e-commerce).
Future-Proofing: The architecture supports easy addition of new agents and transition to next-generation LLMs. New subgraphs for new tasks can be added and connected back to the Planner Agent without major refactoring.
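The extensibility claim can be illustrated with a minimal registry pattern. This is my own simplification: LangGraph expresses the same idea as subgraphs wired into the parent graph, and the task names and handlers below are hypothetical.

```python
# Hypothetical sub-agent registry: new task handlers plug in without
# touching the planner's dispatch logic.
SUB_AGENTS = {}

def register(task_name):
    """Decorator that wires a handler into the registry by task name."""
    def decorator(fn):
        SUB_AGENTS[task_name] = fn
        return fn
    return decorator

@register("Return Policy")
def handle_returns(query):
    return "Checked return protocol."

# Adding a new capability later is one new function, no refactor:
@register("Address Update")
def handle_address_update(query):
    return "Updated shipping address via e-commerce connector."

def planner_dispatch(tasks, query):
    """The planner routes each sub-problem to its registered handler."""
    return [SUB_AGENTS[t](query) for t in tasks if t in SUB_AGENTS]

results = planner_dispatch(["Return Policy", "Address Update"],
                           "please change my address")
```

Because the planner only consults the registry, swapping in a next-generation model or a new task handler changes one entry, not the orchestration code.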
The system integrates with helpdesk platforms (Zendesk, Front, Gorgias) to provide a unified interface for handling customer queries, operating in either draft mode (co-pilot, where responses are suggested for human review) or fully automated mode.
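The two operating modes amount to a gate between the drafted reply and the send action. A minimal sketch follows; the mode names and function are illustrative assumptions, not Minimal’s API.

```python
from enum import Enum

class Mode(Enum):
    DRAFT = "draft"          # co-pilot: replies suggested for human review
    AUTOMATED = "automated"  # replies sent without review

def route_reply(draft: str, mode: Mode):
    """Gate a drafted reply based on the configured operating mode."""
    if mode is Mode.AUTOMATED:
        return ("send", draft)
    return ("queue_for_review", draft)

action, _ = route_reply("Your refund has been processed.", Mode.DRAFT)
```

A single configuration switch of this kind lets a business start in co-pilot mode and move to full automation once it trusts the system, matching the different risk tolerances discussed below.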
The case study reveals several important production considerations. The system maintains chain-of-thought validation through consolidated logging, which is essential for compliance and auditability in e-commerce contexts where refund and return policies must be followed precisely. The dual-mode operation (draft vs. automated) provides flexibility for businesses with different risk tolerances.
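Consolidated logging for auditability can be sketched as structured records per agent step, validated after the fact against business rules. The record fields and the example rule (a refund must be preceded by a policy lookup) are hypothetical illustrations of the pattern, not Minimal’s actual schema.

```python
import time

audit_log = []

def log_step(agent: str, action: str, detail: dict) -> None:
    """Append one structured, timestamped record per agent action."""
    audit_log.append({"agent": agent, "action": action,
                      "detail": detail, "ts": time.time()})

def validate_refund_steps(log: list) -> bool:
    """Post-hoc compliance check: refunds require a prior policy lookup."""
    actions = [(r["agent"], r["action"]) for r in log]
    if ("tools", "refund") in actions:
        return (("research", "policy_lookup") in actions
                and actions.index(("research", "policy_lookup"))
                    < actions.index(("tools", "refund")))
    return True

log_step("planner", "decompose", {"subproblems": ["Refund Eligibility"]})
log_step("research", "policy_lookup", {"doc": "refund_policy"})
log_step("tools", "refund", {"order": "hypothetical-123"})
ok = validate_refund_steps(audit_log)
```

Checks like this turn the chain-of-thought log from a debugging aid into an enforceable compliance artifact.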
The integration with multiple e-commerce and helpdesk platforms demonstrates the importance of robust connectors and API integrations for production LLM systems. Real-world utility depends not just on the quality of LLM outputs but on the ability to execute actual business operations.
Minimal claims 80%+ efficiency gains across a variety of e-commerce stores and projects that 90% of customer support tickets will be handled autonomously with only 10% escalated to human agents. They report having earned revenue from Dutch e-commerce clients and plan to expand across Europe.
However, these claims should be evaluated carefully. The source is a promotional case study published by LangChain, which has a commercial interest in showcasing successful applications of their ecosystem. No independent verification of the efficiency metrics is provided, and the specific methodology for measuring “efficiency gains” is not detailed. The 90% autonomous resolution target is stated as an expectation rather than a demonstrated achievement.
This case study illustrates several important patterns for production LLM deployments: decomposing a complex workflow into specialized agents, grounding responses with retrieval against a curated knowledge base, test-driven iteration backed by trace logging, and a staged rollout from human-reviewed drafts to full automation.
The case study represents an interesting example of a production multi-agent system, though practitioners should seek additional evidence before adopting similar approaches, particularly regarding the claimed efficiency metrics.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models, which hallucinated non-existent insurance products 15-45% of the time.
Snorkel developed a comprehensive benchmark dataset and evaluation framework for AI agents in commercial insurance underwriting, working with Chartered Property and Casualty Underwriters (CPCUs) to create realistic scenarios for small business insurance applications. The system leverages LangGraph and Model Context Protocol to build ReAct agents capable of multi-tool reasoning, database querying, and user interaction. Evaluation across multiple frontier models revealed tool-use errors in 36% of conversations, hallucination issues where models introduced domain knowledge not present in the guidelines, and substantial variance in performance across different underwriting tasks, with accuracy ranging from single digits to roughly 80% depending on the model and task complexity.
This panel discussion features three AI-native companies—Delphi (personal AI profiles), Seam AI (sales/marketing automation agents), and APIsec (API security testing)—discussing their journeys building production LLM systems over three years. The companies address infrastructure evolution from single-shot prompting to fully agentic systems, the shift toward serverless and scalable architectures, managing costs at scale (including burning through a trillion OpenAI tokens), balancing deterministic workflows with model autonomy, and measuring ROI through outcome-based metrics rather than traditional productivity gains. Key technical themes include moving away from opinionated architectures to let models reason autonomously, implementing state machines for high-confidence decisions, using tools like Pydantic AI and Logfire for instrumentation, and leveraging Pinecone for vector search at scale.