FM privacy leakage issues

Mitigate Evaluate

In this session, our readings cover:

Required Readings:

Are Large Pre-Trained Language Models Leaking Your Personal Information?

Privacy Risks of General-Purpose Language Models

More Readings:

Privacy in Large Language Models: Attacks, Defenses and Future Directions

ProPILE: Probing Privacy Leakage in Large Language Models

Blog: FM Privacy Leakage Issues

Section 1 Background and Introduction

Privacy in AI is an emerging field that has seen a rapid increase in relevance as AI technologies have been implemented across more and more industries. Privacy-preserving measures are still relatively new, but improving and adopting them is the key to effectively harnessing the power of Artificial Intelligence.

1. Artificial Intelligence-Generated Content Background and Safety

Wang, T., Zhang, Y., Qi, S., Zhao, R., Xia, Z., & Weng, J. (2023). Security and privacy on generative data in AIGC: A survey. arXiv preprint arXiv:2309.09435.

The process of AIGC:

2. Subclassifications of Security and Privacy on Generative Data

Privacy refers to ensuring that individual sensitive information is protected.

Privacy in AIGC: Generative models may mimic sensitive content, which makes it possible to replicate sensitive training data.

AIGC for privacy: Generative data contains virtual content, replacing the need to use sensitive data for training.

Controllability refers to ensuring effective management and control access of information to restrict unauthorized access.

Access control: Generative data needs to be controlled to prevent negative impacts from adversaries.

Traceability: Generative data needs to support the tracking of the generation process for monitoring any behavior involving security.

Authenticity refers to maintaining the integrity and truthfulness of data.

Generative detection: The ability to detect the difference between generated data and real data.

Generative attribution: Data should be further attributed to generative models to ensure credibility and enable accountability.

Compliance refers to adhering to relevant laws, regulations, and industry standards.

Non-toxicity: generative data is prohibited from containing toxic content.

Factuality: Generative data is strictly factual and should not be illogical or inaccurate.

3. Areas of Concern

While leaking user information is never ideal, some areas are of more concern than others:

4. Defenses: Differential Privacy

Differential privacy safeguards databases and real-time data by perturbing data with noise to ensure observer indistinguishability. This perturbation balances data accuracy and privacy, crucial in sensitive domains like healthcare. Achieving this balance is challenging, particularly in Cyber-Physical Systems (CPSs) where accuracy is paramount. Differential privacy’s efficacy lies in navigating this delicate balance between data accuracy and privacy preservation.

Hassan, M. U., Rehmani, M. H., & Chen, J. (2019). Differential privacy techniques for cyber physical systems: a survey. IEEE Communications Surveys & Tutorials, 22(1), 746-789.

5. Defenses: Distributed Models

By distributing the databases used for a model, risks are much lower for any given attack and many attacks may be outright thwarted. However, analysis on reported data from distributed nodes can still leak information. To combat this, combining with DP allows a federated system that is very private.

Wei, K., Li, J., Ding, M., Ma, C., Yang, H. H., Farokhi, F., … & Poor, H. V. (2020). Federated learning with differential privacy: Algorithms and performance analysis. IEEE Transactions on Information Forensics and Security, 15, 3454-3469.

Section 2 Privacy Risks of General-Purpose Language Models

Despite the utility and performance of general-purpose language models (LMs), they don’t come without privacy risks. The authors of “Privacy Risks of General-Purpose Language Models” (Pan et al., 2020) outline the privacy risks found in earlier general-purpose LMs.

General purpose large language models are becoming increasingly popular and are used for a variety of end purposes due to their flexibility. Despite this, “general-purpose language models tend to capture much sensitive information in the sentence embeddings”. Much of this sensitive information is financial or medical data. In generative AI in the image domain, attacks exist for reconstructing similar source images. These same attacks exist in natural language processing (NLP).

As mentioned previously, model inversion attacks exist for image generators. For example, Fredrickson et al published the following that demonstrates this attack: 

There are also membership inference attacks. For example,  “Membership inference attacks against machine learning models” (Shokri et al. 2017). 

There also exists general ML privacy risks where no specific private data is exposed, rather big data is used to predict unknown private info.

There are several motivations for this study: 

This paper shows how even relatively simple attacks pose a threat in order to better inform the public about the risks of using LLMs with sensitive information. 

The attack the authors use has 3 underlying assumptions: 

  1. The adversary has access to a set of embeddings of plain text, which may contain the sensitive information the adversary is interested in

  2. For simplicity only, we assume the adversary knows which type of pre-trained language models the embeddings come from.

  3. The adversary has access to the pre-trained language model as an oracle, which takes a sentence as input and outputs the corresponding embedding

    1. The format of the plain text is fixed and the adversary knows the generating rules of the plain text.

This image outlines the basics of their attack.

To carry out the attack, 4 steps are taken: 

  1. Create non-sensitive training data approximation (external corpus).

  2. Query model for embeddings using an external corpus.

  3. Using embeddings and labels to train attack model.

  4. Use an attack model to infer sensitive training data.

The authors use this attack methodology to create two case studies that recognize patterns: 

  1. Citizen ID - commonly used, but possibly sensitive

    1. May exist in training data or sensitive data that an organization is using LLMs to process.

    2. Examples include US Social Security numbers, which are considered semi-private.

  2. Genome Sequence - Bert used for splice site predictions

    1. However, DNA can contain indicators for medical conditions, demographic info, etc.

The authors demonstrate high accuracy in recovering the private information of citizens. This is done by generating 1000 citizen IDs that contain private information using a defined schema.  These IDs are used to query the target model to get embeddings for the victims. This method successfully identifies the specific month and day of the victim’s birthday with more than 80% accuracy on the first attempt and determines the complete birth date with over 62% accuracy within the top five attempts.

For the second case study, the authors demonstrate being able to accurately recover genomes on various nucleotide positions.

The authors also conduct two case studies involving keyword inference. The first involves airline reviews providing info on travel plans and the second involves medical descriptions providing sensitive health information. From these, the authors conclude the following: 

From this study, the authors find 4 main defense strategies that can be used: 

In conclusion, the following points made from this study: 

  1. There are serious risks of leaking private data from training/backend inputs for LLMs.

  2. Attacks against even black-box systems are relatively effective without further defensive measures.

  3. Existing defenses against keyword inference and pattern-matching attacks on NLP models are possibly sufficient.

    1. However, awareness and widespread adoption are majorly lacking.

Section 3 Are Large Pre-Trained Language Models Leaking Your Personal Information?

This paper (Huang et al, 2022), explores how pre-trained large language models (PLMs) are prone to leak user information, particularly email addresses, due to PLMs’ capacity to memorize and associate data. 

The authors conduct a 2 part attack task. The first part, given an email address context, examines whether the model can recover the email address. The second part queries PLMs for an associated email address, given an owner’s name. For this, the Enron corpus of email dresses and names is used. 

This study attempts to measure memorization and associations of PLMs. To measure memorization, the prefix of a sequence is inputted to the PLM. To measure association, four prompts (as shown in the figure above) are used to extract the target email address.

From measuring memorization and association, the authors conclude that PLMs can memorize information well, but cannot associate well.

The author’s experiments also show that the more knowledge the PLM gets, the likelihood of the attack being successful increases. The same trend is observed when the PLM is larger.

Despite PLMs being vulnerable to leaking private data, they are still relatively safe when training data is public and private: 

Additionally, if the attacker already finds the context, they can simply get the email address after the context without the help of PLMs.

To mitigate PLM vulnerabilities the authors recommend pre and post-processing: 

The authors conclude that PLMs do leak personal information due to memorization, however, since the models are weak at the association, the risk of specific personal information being extracted by attackers is low.

Section 4 Privacy in Large Language Models: Attacks, Defenses, and Future Directions

“Privacy in Large Language Models: Attacks, Defenses, and Future Directions” (Li et al., 2023) analyzes current privacy attacks on LLMs, discusses defense strategies, highlights emerging concerns, and suggests areas for future research.

There are 3 motivations for this work: 

Taxonomy of attacks this paper covers.

Backdoor attacks involve adversaries activating hidden triggers in models or datasets to manipulate outputs or compromise fine-tuned language models by releasing poisoned pre-trained LLMs.

Prompt injection attacks involve injecting or manipulating malicious content into the prompt to influence the model to output an unwanted output.

Training data extraction attacks involve prompting the LLM to recover data is likely memorized training data.

 Membership inference attacks are are attacks that attempt to determine a if data was used to train the LLM. 

Attacks with extra information use model embeddings to recover an input’s sensitive attributes or to recover the original input of the embedding. Gradient leakage could be used to recover input texts. 

Other types of attacks include prompt extraction attacks, adversarial attacks, side channel attacks, and decoding algorithm stealing. 

In addition to these attacks the authors also outline some privacy defences.

Federated learning can train LLMs in a collaborative manner without sharing private data. 

Additionally, defenses can be specific to a type of attack such as backdoor attacks or data extraction attacks. 

The authors point out two limitations they observe: 

  1. Impracticability of Privacy Attacks.

  2. Limitations of Differential Privacy Based LLMs

They also recommend the following future works: 

  1. Ongoing Studies about Prompt Injection Attacks

  2. Future Improvements on SMPC (Secure Multi-Party Computation)

  3. Privacy Alignment to Human Perception

  4. Empirical Privacy Evaluation

In conclusion this survey lists existing privacy attacks and defenses in LMs and LLMs and critiques the limitations of these approaches and suggests future directions for privacy studies in language models.