LLM Security Testing and Risks

by William

Large language models (LLMs) are a type of artificial intelligence (AI) that are trained on massive datasets of text and code. They can be used for a variety of tasks, such as generating text, translating languages, and writing different kinds of creative content. They consequently require LLM Security Testing as with any other code/devices.

LLMs are becoming increasingly popular, but they also pose a number of security risks. These risks can be exploited by attackers to gain unauthorised access to data, steal intellectual property, or launch other attacks.

This article discusses the different LLM security risks and how to test for them. It also provides some best practices for securing LLMs.

What is an LLM?

An LLM is a type of statistical language model that is trained on a massive dataset of text and code. This dataset can be anything from books and articles to code repositories and social media posts.

LLMs are able to learn the statistical relationships between words and phrases in the dataset. This allows them to generate text, translate languages, and answer questions in a way that is similar to how a human would.

LLM Security Risks

LLMs are vulnerable to a number of security risks. These risks can be exploited by attackers to gain unauthorized access to data, steal intellectual property, or launch other attacks.

Some of the most common LLM security risks include:

LLM01: Prompt Injection

This manipulates a large language model (LLM) through crafty inputs, causing unintended actions by the LLM. Direct injections overwrite system prompts, while indirect ones manipulate inputs from external sources.

LLM02: Insecure Output Handling

This vulnerability occurs when an LLM output is accepted without scrutiny, exposing backend systems. Misuse may lead to severe consequences like XSS, CSRF. SSRF. privilege escalation, or remote code execution.

LLM03: Training Data Poisoning

This occurs when LLM training data is tampered, introducing vulnerabilities or biases that compromise security, effectiveness, or ethical behavior. Sources include Common Crawl, WebText, OpenWebText, & books.

LLM04: Model Denial of Service

Attackers cause resource-heavy operations on LLMs, leading to service degradation or high costs. The vulnerability is magnified due to the resource-intensive nature of LLMs and unpredictability of user inputs.

LLM05: Supply Chain Vulnerabilities

LLM application lifecycle can be compromised by vulnerable components or services, leading to security attacks. Using third-party datasets, pre-trained models, and plugins can add vulnerabilities.

LLM06: Sensitive Information Disclosure

LLMs may inadvertently reveal confidential data in its responses, leading to unauthorized data access, privacy violations, and security breaches. It’s crucial to implement data sanitization and strict user policies to mitigate this.

LLM07: Insecure Plugin Design

LLM plugins can have insecure inputs and insufficient access control. This lack of application control makes them easier to exploit and can result in consequences like remote code execution.

LLM08: Excessive Agency

LLM-based systems may undertake actions leading to unintended consequences. The issue arises from excessive functionality, permissions, or autonomy granted to the LLM-based systems.

LLM09: Overreliance

Systems or people overly depending on LLMs without oversight may face misinformation, miscommunication, legal issues, and security vulnerabilities due to incorrect or inappropriate content generated by LLMs.

LLM10: Model Theft

This involves unauthorized access, copying, or exfiltration of proprietary LLM models. The impact includes economic losses, compromised competitive advantage, and potential access to sensitive information.

LLM Security Best Practices

There are a number of best practices that can be followed to improve the security of LLMs. These best practices include:

  • Use strong access controls: LLMs should be protected by strong access controls. This means that only authorized users should be able to access them.
  • Monitor LLM activity: LLM activity should be monitored for suspicious behavior. This could include things like generating large amounts of text or code, or accessing sensitive data.
  • Use a sandbox: LLMs should be used in a sandbox environment. This means that they should be isolated from the rest of the system. This will help to prevent them from causing damage if they are compromised.
  • Regularly update LLMs: LLMs should be regularly updated with the latest security patches. This will help to protect them from known vulnerabilities.
  • Train LLMs on sanitised data: LLMs should be trained on sanitized data. This means that the data should be free of any sensitive information. This will help to prevent the LLM from generating text that contains malicious code or data.

LLM Security Testing

LLMs should be regularly tested for security vulnerabilities. LLM security testing can be done manually or using automated tools.

Manual LLM security testing can be done by penetration testers who will try to exploit the LLM for vulnerabilities. Automated tools can be used to scan the LLM for known vulnerabilities.

It is important to test LLMs for both known and unknown vulnerabilities. Known vulnerabilities can be found in security advisories and vulnerability databases. Unknown vulnerabilities can be found by penetration testers who are familiar with the LLM’s architecture and functionality.

LLM Security Compliance

LLMs should be compliant with relevant security standards and regulations. This includes standards such as the OWASP Top 10 for LLM Applications and the GDPR.

Compliance with these standards will help to ensure that the LLM is properly secured and that it is not vulnerable to attack.

LLM Security Research

LLM security is a rapidly evolving field. As LLMs become more powerful and complex, new security risks are being discovered all the time.

Some of the areas of active research in LLM security testing include:

  • Prompt injection attacks: Researchers are working on developing new techniques to prevent prompt injection attacks. This includes developing new input validation techniques and new ways to train LLMs to be more resistant to these attacks.
  • Data leakage: Researchers are working on developing new ways to protect LLMs from leaking sensitive information. This includes developing new encryption techniques and new ways to train LLMs to be more aware of the sensitive information they are generating.
  • Unauthorised code execution: Researchers are working on developing new ways to prevent LLMs from executing unauthorised code. This includes developing new sandboxing techniques and new ways to train LLMs to be more aware of the code they are executing.
  • Insufficient input validation: Researchers are working on developing new input validation techniques that are specifically designed for LLMs. This includes developing techniques that are able to detect and prevent malicious code or data from being injected into the LLM’s prompts.
  • Security misconfigurations: Researchers are working on developing new ways to prevent LLMs from being exposed to the internet without proper security measures in place. This includes developing new tools and techniques to help developers and system administrators secure LLMs.

Additional Security Risks

Another significant security risk associated with LLMs is the vulnerabilities of plugins, as discussed in a recent article by Embrace The Red. The article explains how the first exploitable Cross Plugin Request Forgery was found in the wild and the fix which was applied. It discusses the reality of Indirect Prompt Injections in the ChatGPT ecosystem and provides a real-world example demonstration with the Expedia plugin. The article also explains a Proof of Concept Exploit involving a malicious website, ChatGPT, and the Zapier plugin, and provides a detailed explanation of the injection payload used.

Another critical vulnerability associated with LLMs is the arbitrary code execution vulnerability in the langchain package, as reported in the Snyk Vulnerability Database. The vulnerability, identified as CVE-2023-29374, is due to the usage of insecure methods exec and eval in LLMMathChain, which is part of the langchain package. Langchain is a tool for building applications with LLMs through composability. The vulnerability allows an attacker to execute arbitrary code by exploiting the insecure methods in LLMMathChain. A proof of concept exploit is provided in the vulnerability report, demonstrating how an attacker could use the calculator app in langchain to import the os library and access the OPENAI_API_KEY environment variable. The vulnerability report recommends upgrading to langchain version 0.0.142 or higher to fix the issue.

A new prompt injection attack on the ChatGPT web version has been described in an article on System Weakness. The attack involves a user copying text from an attacker’s website, which secretly injects a malicious ChatGPT prompt into the copied text. When the user pastes and sends the text to ChatGPT, the malicious prompt tricks ChatGPT into appending a single-pixel image (using markdown) to the chatbot’s answer and adding sensitive chat data as an image URL parameter. This sends the sensitive data to the attacker’s remote server along with the GET request for the image. The attack can be optionally extended to affect all future answers and make the injection persistent. The article provides a detailed explanation of the attack, including a proof-of-concept website, and discusses its limitations and possible consequences, such as sensitive data leakage, inserting phishing links into ChatGPT output, and polluting ChatGPT output with garbage images.

An article on Embrace The Red discusses the untrustworthiness of LLM responses and explores specific threats to chatbots. It highlights the risks of data exfiltration via hyperlinks, LLM tags and mentions of other users, application-specific commands returned by LLM, and other features supported by the client that might be invoked by the response. The article suggests several mitigations, including threat modeling and manual penetration testing, test automation and fuzzing, using a supervisor or moderator, human review, and least privilege permissions. It concludes with a reminder that integration context matters and recommends careful threat modeling and testing of AI model integrations to identify and mitigate input and output injection and elevation opportunities.

The article concludes with suggested mitigation strategies, such as Human in the Loop, a security contract and threat model for plugins, and isolation, and highlights the importance of understanding the threats that plugin developers must mitigate and those that ChatGPT mitigates or is vulnerable to. This highlights the importance of raising awareness among developers and users about the limitations and risks of plugins and the need for more proactive measures from OpenAI to address these issues.


LLMs are powerful tools with a wide range of applications, but they also come with several security risks. By enlisting LLM security testing practices outlined in this article, you can help protect your LLMs from attack and ensure that they are used safely and securely. Regularly updating LLMs, using strong access controls, monitoring LLM activity, and implementing output sanitisation are some of the best practices that can help mitigate these risks. Additionally, it is crucial to be aware of the potential risks associated with plugins and to follow the recommended mitigation strategies to address these vulnerabilities. Ultimately, a proactive and comprehensive approach to LLM security is essential to maximise the benefits of LLMs while minimising the associated risks.

Take the first step towards securing your devices and infrastructure by contacting us for a free consultation. We’ll help you understand your risk landscape and suggest the best course of action tailored to your business requirements and objectives. Get in touch with us today for a free quote via the contact form.

You may also like

Leave a Comment