ZenML

Scaling Enterprise RAG with Advanced Vector Search Migration

Danswer 2024

Danswer, an enterprise search solution, migrated their core search infrastructure to Vespa to overcome limitations in their previous vector database setup. The migration enabled them to better handle team-specific terminology, implement custom boost and decay functions, and support multiple vector embeddings per document while maintaining performance at scale. The solution improved search accuracy and resource efficiency for their RAG-based enterprise search product.

Industry

Tech

Overview

Danswer is an open-source enterprise search platform that connects disparate knowledge sources within organizations—including Google Drive, Slack, Salesforce, and other enterprise tools—and makes this information accessible through a unified search and chat interface powered by GenAI. The company’s core value proposition centers on making Large Language Models more intelligent by providing them with relevant organizational context through Retrieval Augmented Generation (RAG). This case study documents their decision to migrate from their previous vector database infrastructure to Vespa, a more capable search engine, to meet the demands of enterprise-scale deployments.

The Problem: Scaling RAG for Enterprise

As Danswer scaled to serve enterprise customers with millions of documents, they encountered significant limitations with their previous search infrastructure. Their original architecture used a vector-only search approach, which proved insufficient for handling the nuances of enterprise search queries. A critical discovery was that team-specific terminology (internal project names, product codenames, and organizational jargon) could not be adequately captured by deep learning embedding models alone. Such terms have no general English meaning for an embedding model to encode, so queries containing them returned poor results.

To address this, Danswer initially added a separate keyword search component alongside their vector search. However, this dual-system approach created new problems around weighting and normalization. When vector similarity scores and keyword relevance scores come from completely separate systems, they are not directly comparable, so combining them into a unified ranking is difficult. The lack of a coherent normalization strategy across search types degraded the overall search quality.

Technical Challenges and Solutions

Hybrid Search with Proper Normalization

One of the key technical requirements was the ability to combine vector-based semantic search with traditional keyword search in a meaningful way. Vespa’s architecture allows for easy normalization across multiple search types within a single query execution, enabling Danswer to finally achieve the search accuracy they needed. This hybrid approach means that queries benefit from both the semantic understanding of embeddings and the precision of exact keyword matching, with proper score normalization ensuring that results from both approaches are fairly ranked together.
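To make the normalization idea concrete, here is a minimal sketch in plain Python. It is not Vespa's API or Danswer's implementation: the function names, the min-max scaling, the `alpha` weight, and the sample scores are all assumptions chosen for illustration. It rescales each scorer's output to [0, 1] before blending, so that neither the keyword scores nor the vector scores dominate simply because of their scale.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so different scorers become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates scored the same; treat them as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    keyword_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend normalized scores; alpha is the (hypothetical) vector-side weight."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    combined = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up raw scores: keyword scores are unbounded, cosine similarities are not.
keyword_scores = {"doc_a": 12.4, "doc_b": 3.1, "doc_c": 0.5}
vector_scores = {"doc_a": 0.62, "doc_b": 0.81, "doc_c": 0.79}
ranking = hybrid_rank(keyword_scores, vector_scores, alpha=0.5)
```

Without the normalization step, a naive sum would be decided almost entirely by the keyword scores, since they are an order of magnitude larger than the similarities in this example.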

Time-Based Decay Functions

Enterprise document collections often suffer from version control issues—multiple versions of the same document may exist with conflicting information, and outdated documents aren’t always properly archived or deleted. Danswer’s users requested the ability to apply time-based decay to document relevance, so that documents that haven’t been touched or read for extended periods would be ranked lower. This requirement translated to needing flexible document ranking functions that could incorporate a “time last touched” attribute and apply decay calculations based on the difference between that timestamp and the current query time.

Vespa’s ranking expression language proved to be one of the most flexible options available for implementing such custom ranking functions. The platform even provided documentation and examples for this exact use case, significantly simplifying the implementation. This flexibility in ranking expressions is a key differentiator for production RAG systems where relevance needs to account for factors beyond just semantic similarity.
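As a rough illustration of the decay calculation described above, the following Python sketch multiplies a relevance score by a recency factor derived from a "time last touched" timestamp. The formula, the `decay_rate` and `floor` parameters, and the function names are assumptions made for this example; the source does not give Danswer's actual ranking expression, and in Vespa this logic would live in a ranking expression rather than application code.

```python
import time
from typing import Optional

SECONDS_PER_YEAR = 365 * 24 * 3600


def recency_multiplier(
    last_touched_ts: float,
    now_ts: Optional[float] = None,
    decay_rate: float = 0.5,  # hypothetical: how fast relevance fades per year
    floor: float = 0.5,       # hypothetical: old documents never drop below this
) -> float:
    """Return a factor in [floor, 1.0] that shrinks as the document ages."""
    if now_ts is None:
        now_ts = time.time()
    age_years = max(0.0, now_ts - last_touched_ts) / SECONDS_PER_YEAR
    return max(floor, 1.0 / (1.0 + decay_rate * age_years))


def decayed_score(
    relevance: float, last_touched_ts: float, now_ts: Optional[float] = None
) -> float:
    """Combine base relevance with the recency factor at query time."""
    return relevance * recency_multiplier(last_touched_ts, now_ts)


# Usage: a document last touched one year before a fixed query time.
query_time = 1_700_000_000.0
one_year_old = decayed_score(0.9, query_time - SECONDS_PER_YEAR, query_time)
```

The floor is a design choice in this sketch: it keeps an old but highly relevant document from being suppressed entirely.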

Multi-Pass Indexing with Multiple Vector Embeddings

To capture both overarching document context and specific details, Danswer implemented a multi-pass indexing approach. Each document is split into different sections for processing, with each pass using a different context window size. This creates multiple vector embeddings per document—one representing the broader context and others representing specific chunks or sections.

According to the case study, Vespa is the only hybrid search engine capable of storing and querying multiple vector embeddings for a single document without requiring document duplication. This optimization is particularly important for Danswer’s deployment model. Since Danswer is primarily a self-hosted solution (chosen by enterprises for data security reasons), reducing resource requirements is crucial. The ability to avoid duplicating documents for every chunk and every context size significantly reduces the compute and storage resources needed to serve the document index. This efficiency enables more organizations to deploy Danswer even without access to powerful servers or large cloud computing budgets.
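The storage idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, the window sizes are arbitrary, and scoring by the best-matching embedding is one plausible choice rather than Danswer's documented method. The point is only that the document text is stored once while embeddings from several passes hang off the same record.

```python
import math
import zlib
from dataclasses import dataclass, field
from typing import List, Sequence

DIM = 64  # arbitrary dimensionality for the toy embedding


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def windows(words: List[str], size: int) -> List[str]:
    """Split a word list into consecutive, non-overlapping windows."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


@dataclass
class IndexedDocument:
    doc_id: str
    text: str  # stored once, not once per chunk
    embeddings: List[List[float]] = field(default_factory=list)


def index_document(
    doc_id: str, text: str, window_sizes: Sequence[int] = (64, 8)
) -> IndexedDocument:
    """Run one pass per window size and attach every embedding to one record."""
    doc = IndexedDocument(doc_id, text)
    words = text.split()
    for size in window_sizes:
        for chunk in windows(words, size):
            doc.embeddings.append(toy_embed(chunk))
    return doc


def score(doc: IndexedDocument, query: str) -> float:
    """Score a document by its best-matching embedding (max cosine similarity)."""
    q = toy_embed(query)
    return max(
        (sum(a * b for a, b in zip(q, emb)) for emb in doc.embeddings),
        default=0.0,
    )


# Usage: a 20-word document yields 1 broad embedding plus 3 narrow ones.
text = " ".join(f"word{i}" for i in range(20))
doc = index_document("d1", text)
best = score(doc, " ".join(f"word{i}" for i in range(8)))
```

In an engine that allows only one vector per record, the same coverage would require four separate records, each repeating the document's metadata.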

Technology Selection Process

As an open-source project themselves, Danswer limited their evaluation to self-hosted search engine options. Their previous stack consisted of two separate search engines—one for vector search and one for keyword search—both relatively new players in the space. While these newer solutions were surprisingly stable, they were designed primarily for ease of initial setup rather than advanced functionality.

Danswer attempted workarounds for these limitations, such as applying time decay as a post-processing step after the initial search. However, these approaches suffered accuracy degradation once the document count exceeded several million, because the initial retrieval step could not account for all ranking factors.
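A small Python sketch shows why applying decay after retrieval can lose good results. The candidate data, field names, and cutoff values are invented for illustration and do not come from Danswer's system. When the first stage keeps only the top `k` documents by raw similarity, a fresh document just below the cutoff is discarded before the decay step ever sees it.

```python
from typing import Dict, List


def rank_with_post_decay(candidates: List[Dict], k: int, final_n: int) -> List[str]:
    """Retrieve top-k by raw similarity, then apply decay only to those k."""
    top_k = sorted(candidates, key=lambda c: c["similarity"], reverse=True)[:k]
    reranked = sorted(
        top_k, key=lambda c: c["similarity"] * c["recency"], reverse=True
    )
    return [c["id"] for c in reranked[:final_n]]


def rank_with_inline_decay(candidates: List[Dict], final_n: int) -> List[str]:
    """Apply decay inside the ranking itself, so every candidate is considered."""
    ranked = sorted(
        candidates, key=lambda c: c["similarity"] * c["recency"], reverse=True
    )
    return [c["id"] for c in ranked[:final_n]]


# Hypothetical candidates: two stale near-duplicates and one fresh document.
candidates = [
    {"id": "stale_1", "similarity": 0.90, "recency": 0.5},
    {"id": "stale_2", "similarity": 0.88, "recency": 0.5},
    {"id": "fresh", "similarity": 0.85, "recency": 1.0},
]

post = rank_with_post_decay(candidates, k=2, final_n=1)
inline = rank_with_inline_decay(candidates, final_n=1)
```

Here `post` returns the stale document while `inline` returns the fresh one. The larger the corpus, the more stale near-duplicates can crowd a fixed-size first stage, which is consistent with the degradation the case study reports at scale.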

The evaluation narrowed to four established projects: OpenSearch, Elasticsearch, Weaviate, and Vespa. Vespa was selected based on several factors:

Deployment Considerations and Challenges

The case study acknowledges that Vespa is developer-facing software with a learning curve. The flexibility that enables advanced features comes with inherent complexity in configuration, deployment, and query/indexing options. This makes it less suitable for rapid prototyping but more appropriate for production systems requiring fine-grained control.

Danswer notes they are still in the process of fully understanding Vespa’s multi-node Kubernetes deployments for their self-hosted installations. For their managed cloud offering (Danswer Cloud), they are migrating from self-managed Vespa on AWS to Vespa Cloud, a managed service where infrastructure complexity is handled by the Vespa team. This dual deployment strategy—self-hosted for security-conscious enterprises and managed cloud for others—represents a common pattern in enterprise LLM applications.

LLMOps Implications

This case study highlights several important considerations for production RAG systems:

The migration from a simpler but limited search stack to Vespa represents a common pattern in LLMOps: initial implementations optimize for speed-to-market, but production requirements eventually demand more sophisticated infrastructure. The willingness to “rip out the core” of their previous stack demonstrates the importance of search quality to their product’s success.
