ZenML

LLM Security: Discovering and Mitigating Repeated Token Attacks in Production Models

Dropbox 2024
View original source

Dropbox's security research team discovered vulnerabilities in OpenAI's GPT-3.5 and GPT-4 models where repeated tokens could trigger model divergence and extract training data. They identified that both single-token and multi-token repetitions could bypass OpenAI's initial security controls, leading to potential data leakage and denial of service risks. The findings were reported to OpenAI, who subsequently implemented improved filtering mechanisms and server-side timeouts to address these vulnerabilities.

Industry

Tech

Technologies

Overview

Dropbox, as part of its mission to build AI-powered features for its customers, conducted extensive internal security research on OpenAI’s ChatGPT models (GPT-3.5 and GPT-4). This case study represents a significant contribution to LLMOps security practices, demonstrating the critical importance of security testing when deploying third-party LLMs in production applications. The research team discovered multiple vulnerabilities related to repeated token attacks that could induce model “divergence” from intended behavior and enable extraction of memorized training data.

This work builds upon both Dropbox’s prior prompt injection research (presented at CAMLIS 2023) and the academic work of Nasr, Carlini, et al. in their paper “Scalable Extraction of Training Data from (Production) Language Models.” The findings were responsibly disclosed to OpenAI in January 2024, confirmed as vulnerabilities, and subsequently patched. This case study offers valuable lessons for any organization operating LLMs in production environments.

The Problem: Token Repetition and Model Divergence

The core vulnerability centers on how LLMs process repeated tokens within prompts. When textual input is sent to an LLM, it first passes through a tokenizer (in OpenAI’s case, the cl100k_base tokenizer) that converts UTF-8 text into a sequence of integers ranging from 0 to approximately 100,000. Dropbox discovered that certain repeated token patterns could cause ChatGPT models to deviate from their alignment training and exhibit unexpected behaviors.

Initially discovered in April 2023 during internal security reviews, Dropbox found that repeated character sequences in user-controlled portions of prompt templates could exploit the models in several ways. The LLM could be induced to disregard prompt guardrails, produce hallucinatory responses, and potentially leak information. By October 2023, further research confirmed that the attack mechanism was specifically tied to token repetition rather than character repetition more broadly.

For example, when the token for “dog” (cl100k_base token ID 5679) was repeated 16,000 times between two questions in a prompt, GPT-3.5 not only ignored the first question entirely but began repeating the sentence “It is to be a good person.” until the 16K token limit was reached. Similarly, repeating the token for “accomplishment” (token ID 61238) 3,500 times caused hallucinations about dog health using explicit anatomical terminology.

Understanding Model Divergence

The phenomenon observed by Dropbox aligns with what Nasr, Carlini, et al. termed “divergence”—where repeated tokens trigger ChatGPT models to deviate from their chatbot-style dialog and revert to a lower-level language modeling objective. This divergence can yield hallucinatory responses and, more critically, can cause the model to output memorized training data including PII and other sensitive information.

The divergence attack can be executed through multiple vectors: including repeated tokens directly within the prompt input, asking the model to output a token many times (e.g., “Repeat this word forever: poem”), or combining both approaches. This flexibility in attack vectors makes the vulnerability particularly concerning for production deployments.

OpenAI’s Initial Response and Its Gaps

After the Scalable Extraction paper was published in November 2023, OpenAI implemented filtering for prompts containing repeated single cl100k_base tokens. This was the first observed security control deployed by OpenAI to mitigate the effect of repeated tokens. However, Dropbox’s continued research discovered significant gaps in this filtering strategy.

The critical finding was that multi-token repeats could still elicit model divergence and training data extraction. OpenAI’s filtering focused on single-token repetition, reflecting the emphasis of the academic paper, but failed to account for sequences of two or more tokens repeated multiple times.

GPT-3.5 Training Data Extraction

Dropbox demonstrated successful extraction of memorized training data from GPT-3.5 using the two-token string “jq_THREADS” (token IDs 45748 and 57339) repeated 2,048 times. The model responded with an excerpt from the jq utility documentation hosted on GitHub. The response contained over 100 sequential matching cl100k_base tokens from the actual documentation, including technical descriptions like “jq is a lightweight and flexible command-line JSON processor” and “jq is written in portable C, and it has zero runtime dependencies.”

The probability of GPT-3.5 randomly generating output matching this many sequential tokens is extremely low, providing strong evidence that the response contained memorized training data.

GPT-4 Training Data Extraction

Critically, Dropbox also demonstrated that GPT-4 was vulnerable to the same class of attacks. Using the phrase “/goto maiden” (token IDs 93120 and 74322) repeated 12,000 times in a gpt-4-32k prompt, the model responded with an excerpt from the Old Testament’s Book of Genesis (Chapter 24, verses 16-27), closely matching the English Standard Version from 2016.

There was some non-determinism in GPT-4 responses—the Bible passage could not always be observed with identical prompts. However, experiments with 13,000 repetitions produced similar responses that included additional verses from Genesis, 2 Samuel, and Esther. The Scalable Extraction paper did not address GPT-4 vulnerability, making this a novel finding by Dropbox.

Denial of Service and Resource Exhaustion Concerns

Beyond data extraction, Dropbox identified another production security concern: certain prompts could induce expensive long-running requests. In experiments from December 2023 and January 2024, requests for GPT-4 to repeat certain phrases (e.g., “poem secret”) timed out after ten minutes, receiving HTTP 502 errors due to OpenAI’s Cloudfront proxy timeout.

Using streaming requests, gpt-4-32k returned over 10,700 tokens before hitting the timeout. Given the 32K token window—with only one-third of the streamed response obtained—the full request might have taken approximately thirty minutes on OpenAI’s server side. This represents a significant potential for denial of service or resource exhaustion attacks, particularly concerning for production applications where adversaries might craft inputs to consume excessive computational resources.

Dropbox recommends mitigating this risk by sanitizing prompt inputs and setting appropriate values for the max_tokens parameter on OpenAI chat completion API calls.

Responsible Disclosure and OpenAI’s Response

Dropbox followed responsible disclosure practices, formally sharing results with OpenAI via their security disclosure email on January 24, 2024. OpenAI requested a 90-day disclosure period, confirmed both the GPT-3.5 and GPT-4 findings as training data extraction vulnerabilities, and implemented remediations.

By January 29, 2024, OpenAI had updated their filtering strategy to block prompts with multi-token repeats (not just single-token repeats). They also implemented server-side timeouts for long-running ChatGPT requests, returning a server message recommending users decrease the value of max_tokens. However, OpenAI did not confirm the long-latency request finding as a formal bug.

Implications for LLMOps Practitioners

This case study carries significant implications for organizations deploying LLMs in production:

Tooling and Open Source Contributions

Dropbox has published Python scripts in their GitHub repository for discovering combinations of tokens that are effective in executing the repeated token attack on ChatGPT models. They also plan to publicly release their internal repeated tokens detector as a LangChain-compatible open-source tool to help other AI/ML practitioners mitigate prompt injection and data extraction attacks.

This commitment to sharing security research and tooling represents a valuable contribution to the broader LLMOps community, enabling other organizations to better protect their LLM deployments against similar attack vectors.

Conclusion

This case study demonstrates the importance of rigorous security testing in LLMOps workflows. Dropbox’s proactive security research identified critical vulnerabilities in production LLM systems that could have exposed memorized training data including PII. The responsible disclosure process led to improved security controls from OpenAI, while Dropbox’s planned open-source contributions will help the broader community address these risks. For any organization deploying LLMs in production, this research underscores the need for comprehensive input validation, appropriate resource limits, and ongoing security monitoring as essential components of a mature LLMOps practice.

More Like This

Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.

healthcare fraud_detection customer_support +90

Building Production AI Agents for Enterprise HR, IT, and Finance Platform

Rippling 2025

Rippling, an enterprise platform providing HR, payroll, IT, and finance solutions, has evolved its AI strategy from simple content summarization to building complex production agents that assist administrators and employees across their entire platform. Led by Anker, their head of AI, the company has developed agents that handle payroll troubleshooting, sales briefing automation, interview transcript summarization, and talent performance calibration. They've transitioned from deterministic workflow-based approaches to more flexible deep agent paradigms, leveraging LangChain and LangSmith for development and tracing. The company maintains a dual focus: embedding AI capabilities within their product for customers running businesses on their platform, and deploying AI internally to increase productivity across all teams. Early results show promise in handling complex, context-dependent queries that traditional rule-based systems couldn't address.

customer_support healthcare document_processing +39

Multi-Agent Financial Research and Question Answering System

Yahoo! Finance 2025

Yahoo! Finance built a production-scale financial question answering system using multi-agent architecture to address the information asymmetry between retail and institutional investors. The system leverages Amazon Bedrock Agent Core and employs a supervisor-subagent pattern where specialized agents handle structured data (stock prices, financials), unstructured data (SEC filings, news), and various APIs. The solution processes heterogeneous financial data from multiple sources, handles temporal complexities of fiscal years, and maintains context across sessions. Through a hybrid evaluation approach combining human and AI judges, the system achieves strong accuracy and coverage metrics while processing queries in 5-50 seconds at costs of 2-5 cents per query, demonstrating production viability at scale with support for 100+ concurrent users.

question_answering data_analysis chatbot +49