MoPe: Model Perturbation-based Privacy Attacks on Language Models

Published in EMNLP 2023 Main Conference – Large Language Models and the Future of NLP track

We show that small, structured perturbations in parameter space reveal whether a sample was part of a language model’s training set.
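The core idea can be sketched in a few lines: perturb the model's parameters with Gaussian noise and measure how much a sample's loss increases on average. Samples that sit at sharper points of the loss landscape (as training members tend to) degrade more under perturbation. The function names, the toy logistic-regression loss, and the hyperparameters below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def mope_score(loss_fn, params, sample, sigma=0.05, n_perturb=20, seed=None):
    """Perturbation-based membership score (sketch): average increase in a
    sample's loss when parameters are perturbed with Gaussian noise.
    A larger average increase suggests the sample lies at a sharper point
    of the loss landscape, a signal of training-set membership."""
    rng = np.random.default_rng(seed)
    base = loss_fn(params, sample)
    increases = []
    for _ in range(n_perturb):
        noise = rng.normal(0.0, sigma, size=params.shape)
        increases.append(loss_fn(params + noise, sample) - base)
    return float(np.mean(increases))

# Toy stand-in for a language model's per-sample loss: logistic-regression
# negative log-likelihood for a single (features, label) pair.
def nll(w, sample):
    x, y = sample
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

w = np.array([1.5, -0.7])
score = mope_score(nll, w, (np.array([0.8, 0.3]), 1), seed=0)
```

In practice the attack compares such scores across candidate samples (or against a threshold); this sketch only shows how one score is computed.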

Recommended citation: Marvin Li*, Jason Wang*, Jeffrey Wang*, and Seth Neel. (2023). "MoPe: Model Perturbation-based Privacy Attacks on Language Models." Proceedings of EMNLP 2023.
Download Paper