MOPE: Model Perturbation-based Privacy Attacks on Language Models
Published in EMNLP 2023 Main Conference, Large Language Models and the Future of NLP track
We show that small, structured perturbations in parameter space reveal whether a sample was part of a language model’s training set.
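The idea above can be sketched in a few lines: perturb the model's parameters with Gaussian noise and record the average increase in loss on the candidate sample; samples the model was trained on tend to sit at sharper points in parameter space. This is a minimal toy illustration, not the paper's implementation: the function name `mope_statistic`, the noise scale `sigma`, the count `n_perturb`, and the linear-model example are all assumptions made for this sketch.

```python
import numpy as np

def mope_statistic(w, loss_fn, rng, sigma=0.01, n_perturb=20):
    # Sketch of a perturbation-based membership statistic (assumed form):
    # average increase in loss when parameters w are shifted by Gaussian
    # noise of scale sigma. Larger values indicate sharper local curvature
    # around w for this sample.
    base = loss_fn(w)
    deltas = [loss_fn(w + rng.normal(0.0, sigma, size=w.shape)) - base
              for _ in range(n_perturb)]
    return float(np.mean(deltas))

# Toy example: a linear model with squared loss on one candidate sample.
rng = np.random.default_rng(0)
x, y = rng.normal(size=4), 1.0
# Fit w so the sample is (near-)perfectly memorized, i.e. loss ~ 0.
w = np.linalg.lstsq(x[None, :], np.array([y]), rcond=None)[0]
stat = mope_statistic(w, lambda w_: float((x @ w_ - y) ** 2), rng)
```

For a memorized sample like this one, the statistic is positive because any perturbation moves the parameters away from the loss minimum; the attack thresholds this quantity to decide membership.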
Recommended citation: Marvin Li*, Jason Wang*, Jeffrey Wang*, and Seth Neel. (2023). "MOPE: Model Perturbation-based Privacy Attacks on Language Models." Proceedings of EMNLP 2023.
Download Paper