This is the code repository for the ACL Findings paper: DP-MLM: Differentially Private Text Rewriting Using Masked Language Models
In this repository, you will find a requirements.txt file, which contains all necessary Python dependencies.
Beyond that, there are two main files, both of which are easily importable and reusable:

- DPMLM.py: code for running the DP-MLM mechanism. privatize replaces a single token, while dpmlm_rewrite rewrites an entire text.
- LLMDP.py: implementations of both DP-Paraphrase and DP-Prompt. Note that for DP-Prompt, you will need to download the corresponding LMs, e.g., from Hugging Face.
Example usage:

```python
import DPMLM
import LLMDP

M = DPMLM.DPMLM()
M.dpmlm_rewrite("hello world", epsilon=100)

M = LLMDP.DPPrompt()
M.privatize("hello world", epsilon=100)
```
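As a rough illustration of what privatizing a single token can look like, the sketch below samples a replacement from a masked LM's candidate logits using the exponential mechanism over clipped logits. This is an illustrative assumption, not the code in DPMLM.py: the helper names token_distribution and privatize_token are hypothetical, the logits are hard-coded toy values rather than real masked-LM output, and the utility (the clipped logit) and its sensitivity (clip_max - clip_min) are assumptions made for the example.

```python
import math
import random


def token_distribution(logits, epsilon, clip_min, clip_max):
    """Exponential-mechanism probabilities over candidate tokens (hypothetical helper).

    Assumes each candidate's utility is its logit clipped to [clip_min, clip_max],
    so the utility sensitivity is clip_max - clip_min.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if clip_max <= clip_min:
        raise ValueError("clip_max must be greater than clip_min")
    sensitivity = clip_max - clip_min
    clipped = [min(max(x, clip_min), clip_max) for x in logits]
    # Exponential mechanism: P(token) is proportional to exp(eps * u / (2 * sensitivity)).
    scores = [epsilon * u / (2 * sensitivity) for u in clipped]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift by the max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def privatize_token(candidates, logits, epsilon, clip_min, clip_max, rng=random):
    """Sample one replacement token from the candidates (hypothetical helper)."""
    probs = token_distribution(logits, epsilon, clip_min, clip_max)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Toy candidates and logits standing in for a masked LM's output at one masked position.
    candidates = ["world", "there", "everyone", "friend"]
    logits = [6.0, 3.5, 2.0, -1.0]
    rng = random.Random(0)
    print(privatize_token(candidates, logits, epsilon=10, clip_min=-3.0, clip_max=3.0, rng=rng))
```

With a large epsilon the sample concentrates on the highest clipped logits; as epsilon shrinks, the distribution approaches uniform. In the toy example, the first two logits are both clipped to 3.0 and therefore become equally likely, which is one visible effect of the clipping bounds.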
To use LLMDP.DPParaphrase, you must download the fine-tuned model directory, which can be found at the following link: Model
Also, you will need to download the WordNet 2022 corpus (Open English WordNet):

```shell
python -m wn download oewn:2022
```
Finally, each implementation sets specific clipping bounds, chosen to allow comparable evaluation in the paper. These bounds can be freely changed via the parameters and are worth experimenting with, since other values may yield better performance.
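To see why the clipping bounds matter, assume a mechanism that samples with probability proportional to exp(ε·u / (2Δu)), where u is a clipped logit and Δu = clip_max − clip_min. That is the same as a softmax with temperature T = 2(clip_max − clip_min)/ε. The sketch below (hypothetical helper names, toy logits, not the repository's code) compares two bound settings at the same ε. Wider bounds clip fewer logits but raise the temperature, flattening the distribution; narrower bounds do the opposite.

```python
import math


def effective_temperature(epsilon, clip_min, clip_max):
    """Softmax temperature equivalent to the assumed exponential mechanism over clipped logits."""
    return 2 * (clip_max - clip_min) / epsilon


def clipped_softmax(logits, epsilon, clip_min, clip_max):
    """Softmax of clipped logits at the effective temperature (hypothetical helper)."""
    t = effective_temperature(epsilon, clip_min, clip_max)
    clipped = [min(max(x, clip_min), clip_max) for x in logits]
    top = max(clipped)
    weights = [math.exp((x - top) / t) for x in clipped]  # shift by the max for stability
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    logits = [0.5, 0.0, -0.5]  # toy logits that lie inside both clipping ranges
    for bounds in [(-1.0, 1.0), (-5.0, 5.0)]:
        t = effective_temperature(10, *bounds)
        probs = clipped_softmax(logits, 10, *bounds)
        print(bounds, "T =", t, "max prob =", round(max(probs), 3))
```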