Privacy-Preserving LLM Usage Analysis System for Production AI Safety

Anthropic 2024

Anthropic developed Clio, a privacy-preserving analysis system to understand how their Claude AI models are used in production while maintaining strict user privacy. The system performs automated clustering and analysis of conversations to identify usage patterns, detect potential misuse, and improve safety measures. Initial analysis of 1 million conversations revealed insights into usage patterns across different languages and domains, while helping identify both false positives and negatives in their safety systems.

Overview

Anthropic’s Clio (Claude insights and observations) represents a significant advancement in LLMOps tooling for understanding real-world language model usage at scale while preserving user privacy. The system addresses a fundamental challenge in deploying LLMs: how to gain meaningful insights into actual usage patterns for safety, product development, and governance purposes without compromising user privacy. This case study demonstrates how an LLM provider can use their own model as part of a sophisticated analysis pipeline for production monitoring and safety enforcement.

The Problem

Despite the rapid adoption of large language models, AI providers have historically had limited visibility into how their models are actually being used in production. This knowledge gap creates challenges for safety monitoring, product development, and governance alike.

Anthropic explicitly frames this as a dual responsibility: maintaining system safety while protecting user privacy. Clio is their attempt to demonstrate these goals are not mutually exclusive.

Technical Architecture

Clio employs a multi-stage automated pipeline that is notable for being powered entirely by Claude itself, rather than human analysts. This design choice is intentional and central to the privacy-preserving nature of the system.

Stage 1: Facet Extraction

For each conversation in the dataset, Clio extracts multiple “facets”: specific attributes or metadata such as the conversation topic, the number of back-and-forth turns, and the language used. Claude is instructed to extract this information while omitting private details—a form of automated anonymization at the extraction layer.
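
Anthropic has not published Clio's implementation, but a minimal sketch of this stage might look like the following, built on the public Anthropic Python SDK. The prompt wording, the facet schema, and the model name are illustrative assumptions, not Clio's actual internals.

```python
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative prompt; Clio's real facet schema and wording are not public.
FACET_PROMPT = """Analyze the conversation below and respond with JSON only:
{{"topic": "<short, general description of the user's request>",
  "language": "<primary language of the conversation>",
  "num_turns": <number of back-and-forth turns>}}
Omit names, locations, and any other private or identifying details.

Conversation:
{conversation}"""


def extract_facets(conversation: str) -> dict:
    """Ask the model for privacy-scrubbed facets of a single conversation."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": FACET_PROMPT.format(conversation=conversation),
        }],
    )
    return json.loads(response.content[0].text)
```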

Stage 2: Semantic Clustering

Similar conversations are automatically grouped together by theme or general topic using semantic similarity. This aggregation step is critical for privacy because it moves from individual conversations to patterns across many users, making it impossible to identify specific individuals or conversations.
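
The source does not name an embedding model or clustering algorithm, so the sketch below substitutes off-the-shelf sentence embeddings and k-means to show the shape of the step.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice


def cluster_conversations(topic_summaries: list[str], n_clusters: int = 50):
    """Group per-conversation topic summaries into semantic clusters.

    After this step each conversation is represented only by a cluster id,
    so downstream analysis sees aggregate patterns rather than individuals.
    """
    embeddings = embedder.encode(topic_summaries)
    kmeans = KMeans(n_clusters=n_clusters, n_init="auto", random_state=0)
    return kmeans.fit_predict(embeddings)
```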

Stage 3: Cluster Description

Each resulting cluster receives a descriptive title and summary that captures common themes from the underlying data while explicitly excluding private or identifying information. This abstraction layer ensures that what human analysts eventually see is already sanitized and aggregated.
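
A matching sketch for the description stage, reusing the client pattern from the extraction example; the prompt and the idea of sampling a handful of summaries per cluster are assumptions.

```python
import anthropic

client = anthropic.Anthropic()


def describe_cluster(sample_summaries: list[str]) -> str:
    """Produce a sanitized title and summary for one cluster."""
    bullets = "\n".join(f"- {s}" for s in sample_summaries)
    prompt = (
        "Below are topic summaries from a group of related conversations:\n"
        f"{bullets}\n\n"
        "Write a short title and a two-sentence summary of the common theme. "
        "Exclude any private, personal, or identifying details."
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```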

Stage 4: Hierarchical Organization

Clusters are organized into a multi-level hierarchy that enables easier exploration and navigation. An interactive interface allows authorized Anthropic analysts to explore patterns across different dimensions such as topic, language, and other facets.
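
The source does not say how the hierarchy is built; one plausible sketch is agglomerative clustering over embeddings of the cluster descriptions, where cutting the dendrogram at progressively looser thresholds yields coarser levels.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice


def assign_parent_clusters(cluster_descriptions: list[str], cut: float = 0.5):
    """Group base clusters into higher-level parents by description similarity."""
    embeddings = embedder.encode(cluster_descriptions)
    # Average-linkage agglomerative clustering on cosine distances.
    tree = linkage(embeddings, method="average", metric="cosine")
    # Each base cluster gets a parent id; repeat with larger `cut` values
    # to produce additional, coarser levels of the hierarchy.
    return fcluster(tree, t=cut, criterion="distance")
```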

Privacy Protections: Defense in Depth

The system implements multiple layers of privacy protection, reflecting a “defense in depth” approach: Claude omits private details during facet extraction, clustering replaces individual conversations with aggregate patterns, cluster descriptions are generated to exclude identifying information, and clusters must meet minimum size thresholds before being exposed to analysts.

The research paper accompanying this announcement includes extensive privacy validation testing, suggesting Anthropic has invested significant effort in ensuring these protections work as intended.
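
One concrete layer is the minimum-size requirement noted later in this write-up: a cluster is only exposed to analysts once it aggregates enough conversations and users. A sketch, with field names and threshold values that are purely illustrative (Anthropic does not publish the real numbers):

```python
def exposable_clusters(clusters: list[dict],
                       min_conversations: int = 1000,
                       min_users: int = 25) -> list[dict]:
    """Filter out clusters too small to be safely shown to analysts.

    The field names and thresholds are illustrative assumptions; the source
    states that minimum thresholds exist but not what they are.
    """
    return [
        c for c in clusters
        if c["num_conversations"] >= min_conversations
        and c["num_unique_users"] >= min_users
    ]
```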

Production Insights and Findings

Anthropic analyzed 1 million conversations from claude.ai (both Free and Pro tiers) to understand actual usage patterns. This empirical, bottom-up approach differs significantly from curated public datasets like WildChat or LMSYS-Chat-1M, which capture only specific contexts and user bases.

Usage Distribution

The analysis revealed several key findings about how usage distributes across task types, domains, and languages.

Cross-Language Variation

Usage patterns varied considerably across languages, reflecting different cultural contexts and needs. By calculating base rates for each language, Clio could identify topics where a given language appeared much more frequently than expected, revealing language-specific usage patterns for Spanish, Chinese, Japanese, and other languages.
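
The base-rate comparison can be made concrete with a small sketch; the record format and the over-representation cutoff are assumptions for illustration.

```python
from collections import Counter


def over_represented_topics(records: list[tuple[str, str]],
                            language: str,
                            min_ratio: float = 2.0) -> dict[str, float]:
    """Find topics where `language` appears well above its global base rate.

    `records` is a list of (language, topic) pairs, one per conversation.
    Returns topic -> over-representation ratio for ratios >= min_ratio.
    """
    base_rate = sum(1 for lang, _ in records if lang == language) / len(records)
    if base_rate == 0:
        return {}
    topic_totals = Counter(topic for _, topic in records)
    lang_topics = Counter(topic for lang, topic in records if lang == language)

    ratios = {}
    for topic, total in topic_totals.items():
        ratio = (lang_topics[topic] / total) / base_rate
        if ratio >= min_ratio:
            ratios[topic] = ratio
    return ratios
```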

Trust and Safety Applications

Perhaps the most operationally significant application of Clio is its integration with Anthropic’s Trust and Safety enforcement systems. This represents a practical LLMOps use case where the tool actively improves production safety measures.

Detecting Coordinated Misuse

Clio proved effective at identifying patterns of coordinated, sophisticated misuse that would be invisible when examining individual conversations. In late September 2024, Anthropic identified a network of automated accounts using similar prompt structures to generate spam for search engine optimization. While no individual conversation violated usage policies, the coordinated pattern across accounts revealed prohibited platform abuse, and the account network was removed. Similar techniques identified attempts to resell unauthorized access to Claude.
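
The write-up does not describe the detection mechanics, but the core signal (conversations that are individually benign yet structurally near-identical across many accounts) can be sketched roughly as below. The normalization step and account threshold are simplistic stand-ins for whatever similarity measure a production system would use.

```python
import re
from collections import defaultdict


def find_coordinated_templates(prompts_by_account: dict[str, list[str]],
                               min_accounts: int = 5) -> dict[str, set[str]]:
    """Flag prompt templates shared by suspiciously many distinct accounts."""
    template_accounts: dict[str, set[str]] = defaultdict(set)
    for account, prompts in prompts_by_account.items():
        for prompt in prompts:
            # Crude normalization: lowercase and mask numbers so that
            # lightly templated spam prompts collapse to one key.
            template = re.sub(r"\d+", "<num>", prompt.strip().lower())
            template_accounts[template].add(account)
    return {t: accounts for t, accounts in template_accounts.items()
            if len(accounts) >= min_accounts}
```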

High-Stakes Event Monitoring

Clio enables enhanced monitoring during periods of uncertainty or high-stakes events, such as the run-up to major elections or the launch of new model capabilities.

This capability to detect “unknown unknowns” complements proactive safety measures and enables rapid response to emerging challenges.

Improving Classifier Performance

A particularly valuable LLMOps application is using Clio to identify weaknesses in existing Trust and Safety classifiers:

False Negatives: Clio identified cases where existing classifiers failed to flag violating content, particularly in translation scenarios where harmful content in one language was translated to another.

False Positives: Clio helped identify benign content incorrectly flagged as harmful, including job seekers requesting resume advice (flagged due to personal information), programming questions about security/networking/web scraping (misidentified as hacking), and Dungeons & Dragons combat statistics (triggering harm detection systems).

The overall correlation between Trust and Safety classifiers and Clio classifications was r = 0.71, indicating broad agreement but with meaningful opportunities for improvement in both directions.
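
A small sketch of how such an agreement check might be computed and how disagreements could be triaged for review; the binary-flag framing is an assumption.

```python
import numpy as np


def classifier_vs_clio(ts_flags: np.ndarray, clio_flags: np.ndarray):
    """Measure agreement and surface candidate classifier errors.

    Conversations Clio flags that the classifier missed are candidate false
    negatives; the reverse are candidate false positives.
    """
    r = np.corrcoef(ts_flags, clio_flags)[0, 1]
    candidate_fns = np.where((clio_flags == 1) & (ts_flags == 0))[0]
    candidate_fps = np.where((clio_flags == 0) & (ts_flags == 1))[0]
    return r, candidate_fns, candidate_fps
```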

Ethical Considerations and Mitigations

Anthropic explicitly acknowledges the ethical considerations raised by building this kind of usage-analysis infrastructure and describes their mitigations, including restricting access to authorized analysts and the aggregation and anonymization safeguards outlined above.

Limitations and Balanced Assessment

While Anthropic presents Clio as a significant advancement, several limitations and considerations warrant attention:

The system’s effectiveness depends entirely on Claude’s ability to correctly extract, cluster, and summarize information while filtering private details. As with any automated system, edge cases and failure modes likely exist that haven’t been fully characterized.

The minimum threshold requirements for cluster exposure may create blind spots for detecting rare but serious individual misuse patterns—precisely the cases that might be most harmful.

The claim that privacy and safety are not mutually exclusive is demonstrated for this specific implementation, but the trade-offs may not generalize to all contexts or all desired analyses.

Finally, while the transparency of this publication is commendable, users cannot independently verify the privacy protections are working as described, requiring trust in Anthropic’s implementation and governance.

Implications for LLMOps Practice

Clio demonstrates several important patterns for LLMOps practitioners:

Using LLMs themselves as part of the observability and monitoring stack is a viable approach that can enable analyses not possible with traditional tools. The multi-stage pipeline with privacy checks at each layer provides a template for building similar systems. The integration with Trust and Safety enforcement shows how monitoring insights can drive operational improvements. The hierarchical clustering approach enables exploration at multiple levels of granularity, from high-level categories to specific niches.

The system also demonstrates the value of bottom-up, empirical approaches to understanding LLM usage compared to relying solely on anticipated use cases during development.
