Membership Privacy Risks in LLMs

Exposing privacy vulnerabilities in Large Language Models, honored with an oral presentation at the 2025 Conference on Empirical Methods in Natural Language Processing.

This post describes work by Hongyan Chang, Ali Shahin Shamsabadi, Kleomenis Katevas, and Hamed Haddadi, from Brave, and Reza Shokri, from NUS.

Summary

Large Language Models (LLMs) can memorize and leak sensitive training data, posing serious privacy risks. To assess such memorization and information leakage, we introduced CAMIA (Context-Aware Membership Inference Attack), the first method tailored to the generative nature of LLMs, which nearly doubles detection accuracy compared to prior approaches and reveals where memorization truly occurs.

Why should we care about privacy in LLMs?

LLMs are increasingly embedded in products we use every day—from chatbots and virtual assistants to search engines and productivity tools. With this integration comes a critical question: can these models inadvertently leak the data they were trained on?

Consider a few real-world scenarios:

  • Healthcare: If clinical notes were part of training, a model might accidentally reveal sensitive patient information.
  • Enterprise: If internal emails or documents were included, an attacker could trick the model into reproducing private communications. For instance, LinkedIn recently announced that they plan to use user data to improve their generative AI models (including LLMs), raising concerns about whether internal communications or private content might appear in generated outputs.
  • Education and media: If exam questions or paywalled content appeared in training, the model might regurgitate them verbatim.

These risks matter not just for individual users, but also for institutions facing regulatory, ethical, and reputational stakes. Privacy lapses could violate data protection laws or copyright rules, and undermine trust in deployed AI systems.

How can we measure privacy risks in LLMs?

Membership inference attacks (MIAs) are designed to assess model memorization by testing whether a given data point was part of the training set. In simple terms, an adversary asks: "did the model see this example during training?" If the answer can be reliably inferred, the model is leaking information about its training data—a direct privacy risk.

The core intuition is that machine learning models often behave differently on training samples versus unseen samples. These behavioral differences can appear in loss values, confidence scores, prediction stability, or other model outputs. MIAs systematically exploit these gaps: if an adversary can distinguish members from non-members based on such signals, it shows that the model is memorizing and leaking training data.

Illustration of a membership inference attack: the adversary tests whether a particular record (e.g., “Bob's” data) was used in training (H₀ = not used, H₁ = used) by probing the model.

One way to think about this is as a hypothesis test:

  • H₀: Bob’s record was not used in training.
  • H₁: Bob’s record was used in training.

By probing the model with Bob’s record and observing its behavior, the adversary tries to decide which hypothesis is more likely. If the model’s responses give away this information, it means training data membership can be inferred, revealing a concrete privacy risk.
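In practice, this hypothesis test is usually reduced to a threshold decision on a scalar membership score. The sketch below is a minimal illustration under assumed inputs, not any paper's implementation: the score is simply the model's negative loss on the record, the non-member reference scores are synthetic, and the threshold is calibrated so that only a target fraction of known non-members is wrongly flagged.

```python
import numpy as np

def calibrate_threshold(nonmember_scores, target_fpr=0.01):
    """Pick a score threshold so that only `target_fpr` of known
    non-members would be (wrongly) flagged as members."""
    return np.quantile(nonmember_scores, 1.0 - target_fpr)

def membership_test(candidate_score, threshold):
    """Decide between H0 (record not used in training) and H1 (used).
    Higher scores (e.g., lower loss, hence higher negative loss) favour H1."""
    return "H1: likely member" if candidate_score > threshold else "H0: likely non-member"

# Hypothetical scores: negative average loss of the model on each record.
nonmember_scores = np.random.normal(loc=-3.2, scale=0.4, size=1000)  # synthetic reference records
bob_score = -2.1                                                     # Bob's record (illustrative)

threshold = calibrate_threshold(nonmember_scores, target_fpr=0.01)
print(membership_test(bob_score, threshold))
```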

What are the challenges in measuring privacy risks in LLMs?

A simple instantiation of membership inference is the loss-threshold attack, where a sample is classified as a member if the model's loss on it falls below a predefined threshold. While this basic method works in many settings, more advanced MIAs probe richer aspects of model behavior, such as output entropy or prediction dynamics, or rely on shadow models, to achieve stronger inference.

Overlapping curves show the model's loss distribution for members (left, lower loss) and non-members (right, higher loss). A dashed line marks the threshold: points to the left of it are predicted as members, points to the right as non-members.

In the context of LLMs, a straightforward adaptation is to compute the model's loss on a target sentence. If this loss is unusually low compared to losses on text that was not in the training data, the sentence may have been memorized. While such naïve adaptations can already reveal leakage, they fall short when applied to LLMs.
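As a concrete illustration of this naïve adaptation, here is a minimal sketch (our own simplification, using the Hugging Face transformers library with GPT-2 as a lightweight stand-in model and an illustrative threshold): it computes the average per-token loss of a causal LM on a target sentence and flags the sentence as a member if the loss falls below the threshold.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM works here; GPT-2 is used purely as a lightweight stand-in.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_loss(text: str) -> float:
    """Average next-token cross-entropy of the model on `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # labels are shifted internally by the model
    return out.loss.item()

target = "The quick brown fox jumps over the lazy dog."
loss = sequence_loss(target)
THRESHOLD = 2.5  # illustrative value; in practice calibrated on known non-member text
print("member" if loss < THRESHOLD else "non-member", f"(loss={loss:.2f})")
```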

This is because most MIAs were originally designed for classification models, which output a single prediction per input. LLMs, however, are generative: they produce text token by token, and each prediction is conditioned on all preceding tokens (the prefix). This sequential structure makes memorization context-dependent, so simply aggregating losses across an entire sequence misses the token-level dynamics that drive leakage.

Comparison of classification models and autoregressive LLMs. The classification model predicts a single label (e.g., “Dog” for a dog image), while the LLM predicts a sequence of tokens (e.g., “Harry Potter is…”). Loss is measured at each prediction step.

Consider the example in the graphic above:

  • When the prefix already contains strong cues—e.g. “Harry Potter is…written by… The world of Harry…”—the model confidently predicts the next token “Potter.” Here, the low loss is due to the prefix providing enough context, not because the model memorized the training instance.
  • By contrast, if the prefix is just “Harry,” predicting “Potter” requires much more reliance on memorized training sequences. In this case, the model’s low loss is a stronger indicator of membership (see the sketch after this list).
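To make the contrast concrete, the sketch below (our own illustration, not CAMIA itself; GPT-2 again serves as a stand-in model) prints the loss of each token given its prefix, so you can see how much cheaper predicting “Potter” becomes once the prefix has already mentioned it.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def per_token_losses(text: str):
    """Negative log-likelihood of each token given all preceding tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # The token at position t is predicted from positions < t.
    nll = F.cross_entropy(logits[0, :-1], ids[0, 1:], reduction="none")
    tokens = tokenizer.convert_ids_to_tokens(ids[0].tolist())[1:]
    return list(zip(tokens, nll.tolist()))

# Loss on "Potter" after a long, informative prefix vs. a minimal one.
for text in ["Harry Potter is written by J. K. Rowling. The world of Harry Potter",
             "Harry Potter"]:
    print(text)
    for tok, loss in per_token_losses(text):
        print(f"  {tok:>12s}  loss={loss:.2f}")
```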

Our method: CAMIA (Context-Aware Membership Inference Attack) 

Our key insight is that memorization in LLMs is context-dependent.

  • When the prefix provides clear guidance—for example, through repetition or strong overlap with the next tokens—the model can generalize without relying on memorization.
  • When the prefix is ambiguous or complex, the model becomes uncertain, and in these cases, it is more likely to fall back on memorized training sequences.

Therefore, an effective membership inference attack should not depend solely on overall sequence loss, but should explicitly capture how context shapes predictive uncertainty at the token level.
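One simple way to quantify this contextual uncertainty (our own illustrative choice, not necessarily the exact signal used in CAMIA) is the entropy of the model's next-token distribution after each prefix: it is low when the prefix already pins the continuation down, and high when the prefix is ambiguous. A minimal sketch, again assuming GPT-2 as a stand-in model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def next_token_entropies(text: str):
    """Entropy (in nats) of the predicted next-token distribution
    after each prefix of `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0]            # [seq_len, vocab]
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)
    tokens = tokenizer.convert_ids_to_tokens(ids[0].tolist())
    return list(zip(tokens, entropy.tolist()))

for tok, h in next_token_entropies("Harry Potter is a novel written by"):
    print(f"after {tok!r:>12}: entropy = {h:.2f} nats")
```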

To address this, we introduced CAMIA (Context-Aware Membership Inference Attack), a new approach that tracks how the model's uncertainty evolves during text generation. CAMIA can:

  • Measure how quickly uncertainty is resolved across prefixes, revealing when the model transitions from “guessing” to “confident recall.”
  • Adjust for cases where uncertainty is artificially reduced by repetition or trivial patterns.
  • Operate at the token level instead of relying on a single static loss threshold.

By focusing on these contextual dynamics, CAMIA uncovers memorization behaviors in LLMs that traditional MIAs fail to detect.
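To give a flavour of what such contextual dynamics could look like in code, the sketch below is a hypothetical illustration rather than the actual features from the paper: it turns a per-token loss trace (such as the one produced by the earlier `per_token_losses` helper) into two simple signals, the slope of the loss across the sequence and a mean loss that ignores tokens already seen in the prefix.

```python
import numpy as np

def context_aware_signals(tokens, losses):
    """Illustrative (hypothetical) context-aware membership signals
    derived from a per-token loss trace.

    tokens: list[str]   tokens t_1..t_n
    losses: list[float] NLL of each token given its prefix
    """
    losses = np.asarray(losses, dtype=float)
    positions = np.arange(len(losses))

    # 1) Slope of the loss trace: rapid drops once enough prefix has been
    #    seen can indicate a transition from guessing to confident recall.
    slope = np.polyfit(positions, losses, deg=1)[0]

    # 2) Repetition-adjusted mean loss: skip tokens that already appeared
    #    in the prefix, whose low loss is "cheap" context reuse.
    novel = [i for i, tok in enumerate(tokens) if tok not in tokens[:i]]
    adjusted_mean = losses[novel].mean() if novel else losses.mean()

    return {"loss_slope": slope, "repetition_adjusted_loss": adjusted_mean}

# Example trace with hypothetical numbers for "... The world of Harry Potter":
tokens = ["The", "world", "of", "Harry", "Potter"]
losses = [3.1, 2.8, 1.9, 1.2, 0.3]
print(context_aware_signals(tokens, losses))
```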

Our results

On the MIMIR benchmark, across six Pythia (70M–12B parameters) and GPT Neo (125M–2.7B parameters) models and domains including Web, Wikipedia, Medical, News, Mathematics, ArXiv, and GitHub, CAMIA consistently outperforms existing attacks.

CAMIA is effective: when applied to Pythia 2.8B on the ArXiv dataset, it increases the true positive rate from 20.11% to 32.00% at a fixed false positive rate of 1% (a higher TPR at a given low FPR indicates a stronger attack).

CAMIA is computationally efficient: it only requires computing and combining membership signals from the target model's outputs. On a single A100 GPU, CAMIA evaluates 1,000 samples from the ArXiv dataset in approximately 38 minutes.

CAMIA is open-sourced

CAMIA was accepted to the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025) as an oral presentation and has been nominated for the Outstanding Paper Award. It will be presented at the conference, held November 4–9, 2025, in Suzhou, China.

CAMIA is also available as an open-source implementation.
