Survey AI Risk framework

Mitigate Evaluate

In this session, our readings cover:

Required Readings:

TrustLLM: Trustworthiness in Large Language Models

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

More Readings:

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition

Even More:

ACL 2024 Tutorial: Vulnerabilities of Large Language Models to Adversarial Attacks

Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration

NIST AI RISK MANAGEMENT FRAMEWORK




Blog: AI Risk Framework Blog

Introduction and Background

Exploring Crucial Security Research Questions

The Good, The Bad, and The Ugly of LLMs in Security

The ugly: the papers focusing on the discussion of security vulnerabilities and potential defense mechanisms within LLMs.

Vulnerabilities and Defenses Full Diagram

  1. AI-Inherent Vulnerabilities
    • Stem from the very nature and architecture of LLMs.
    • Adversarial attacks refer to strategies used to intentionally manipulate LLMs.
    • Inference attacks exploit unintended information leakage from responses.
    • Extraction attacks attempt to extract sensitive information from training data.
    • Instruction tuning attacks aim to provide explicit instructions during the fine-tuning process.
  2. Non-AI Inherent Vulnerabilities
    • Non-AI inherent attacks encompass external threats and new vulnerabilities LLMs might encounter.
    • Remote Code execution typically target LLMs to execute code arbitrarily.
    • Side channel attacks aim to leak information from the model.
    • Supply chain vulnerabilities refer to the risks that arise from using vulnerable components or services.

Positive and Negative impacts on Security and Privacy

Continuing to cover the Good, Bad, Ugly paper, we now go further into the risks and benefits offered by AI.

Benefits and Opportunities

LLMs for Code Security**

Code security lifecycle -> coding (C ) -> test case generation (TCG) -> execution and monitoring (RE)

  1. Secure Coding (C)
    • Sandoval et al evaluated code written by student programmers when assisted by LLMs
    • Finding; participants assisted by LLMs did not introduce new security risks
  2. Test Case Generating (TCG)
    • Zhang et al. generated security tests (using ChatGPT-4.0) to assess the impact of vulnerable library dependencies on SW applications.
    • Finding: LLMs could successfully generate tests that demonstrated various supply chain attacks, outperforming existing security test generators.

Fuzzing (and its LLM based variations)

Fuzzing is an industry standard technique: for generating test cases. It works by attempting to crash a system or trigger errors by supplying a large volume of random inputs. By tracking which parts of the code are executed by these inputs, code coverage metrics can be calculated.

An effective fuzzer generates semi-valid inputs that are “valid enough” in that they are not directly rejected by the parser, but do create unexpected behaviors deeper in the program and are “invalid enough” to expose corner cases that have not been properly dealt with.

LLM in Running and Execution

  1. Vulnerability detection
    • Noever et. al. : GPT-4 identified approx. 4x vulnerabilities compared to traditional static code analyzers (e.g., Snyk and Fortify)
    • Moumita et al. applied LLMs for software vulnerability detection
      • Finding: Higher False positive rate of LLM
    • Cheshkov et al. point out that the ChatGPT performed no better than a dummy classifier for both binary and multi-label classification tasks in code vulnerability detection
    • DefectHunter: combining LLMs with advanced models (e.g., Conformer) to identify software vulnerabilities effectively.
  2. Malware Detection
    • Henrik Plate et . al. - LLM-based malware detection can complement human reviews but not replace them
      • Observation: use of simple tricks can also deceive the LLM’s assessments.
    • Apiiro - malicious code analysis tool using LLMs
  3. Code fixing
    • ChatRepair: leverages PLMs for generating patches without dependency on bug-fixing datasets.

Note: Malware is the threat while vulnerabilities are exploitable risks and unsecured entry points that can be leveraged by threat actors

Findings of LLM in Code Security

LLMs for Data Security and Privacy

“Privacy” is characterized by scenarios in which LLMs are utilized to ensure the confidentiality of either code or data.

4 aspects:

Negative Impacts on Security and Privacy

NIST AI Risk Management Framework

The National Institute of Standards and Technology (NIST) released an official AI risk management framework early 2023, acknowledging the growing risks and benefits available from AI based technologies across a wide variety of industries and fields. You can find the paper covered in this section here.

Motivation

NIST Risk Definition

“Risk refers to the composite measure of an event’s probability of occurring and the magnitude or degree of the consequences of the corresponding event”

AI Harms

Challenges

Risk Measurement

Risk Tolerance and Prioritization

AI RMF Lifecycle

Lifecycle diagram for AI Systems development, deployment, and impact

Corresponding Table

AI Risks and Trustworthiness

AI RMF Core

Basic system set forth by NIST for managing AI systems in an organization. Divided into four sections:

  1. Govern: Center-most aspect, applies across all others
  2. Map: Gathers information and organize for others
  3. Measure: Quantify risks and other impacts
  4. Manage: Allocate resources, take actions

For further details, see the next section

More on NIST AI RMF

TRUSTLLM: TRUSTWORTHINESS IN LARGE LANGUAGE MODELS

Guidelines and Principles for Trustworthiness Assessment of LLMs

Curated List of LLMs

Assessment of Truthfulness

Assessment of Safety

Assessment of Fairness

Assessment of Robustness

Assessment of Privacy Preservation

Assessment of Machine Ethics

Discussion of Transparency

Discussion of Accountability

Summary of the TrustLLM (Dimensions vs LLMs)

Future Direction and Concluding Notes