U.S. flag

An official website of the United States government

Dot gov

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Https

Secure .gov websites use HTTPS
A lock () or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

Breadcrumb

  1. Home

AgentDojo-Inspect

AgentDojo-Inspect is a codebase created by the U.S. AI Safety Institute to facilitate research into agent hijacking and defenses against said hijacking. Agent hijacking is a type of indirect prompt injection [1] in which an attacker inserts malicious instructions into data that may be ingested by an AI agent, causing it to take unintended, harmful actions.AgentDojo-Inspect is a fork of the original AgentDojo repository [2], which was created by researchers at ETH Zurich [3]. This fork extends the upstream AgentDojo in four key ways:1. It adds an Inspect bridge that allows AgentDojo evaluations to be run using the Inspect evaluations framework [4] (see below for more details).2. It fixes some bugs in the upstream AgentDojo's task suites (most of these fixes have been merged upstream). It also removes certain tasks that are of low quality.3. It adds new injection tasks in the Workspace environment that have to do with mass data exfiltration (these have since been merged upstream).4. It adds a new terminal environment and associated tasks that test for remote code execution vulnerabilities in this environment.[1] Greshake K, Abdelnabi S, Mishra S, Endres C, Holz T, Fritz M (2023) Not what you?ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection (arXiv), arXiv:2302.12173. https://doi.org/10.48550/arXiv.2302.12173[2] Edoardo Debenedetti (2025) ethz-spylab/agentdojo. Available at https://github.com/ethz-spylab/agentdojo.[3] Debenedetti E, Zhang J, Balunovi? M, Beurer-Kellner L, Fischer M, Tramèr F (2024) AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents (arXiv), arXiv:2406.13352. https://doi.org/10.48550/arXiv.2406.13352[4] UK AI Safety Institute (2024) Inspect AI: Framework for Large Language Model Evaluations. Available at https://github.com/UKGovernmentBEIS/inspect_ai.

About this Dataset

Updated: 2025-04-06
Metadata Last Updated: 2025-02-06 00:00:00
Date Created: N/A
Data Provided by:
Dataset Owner: N/A

Access this data

Contact dataset owner Access URL
Landing Page URL
Table representation of structured data
Title AgentDojo-Inspect
Description AgentDojo-Inspect is a codebase created by the U.S. AI Safety Institute to facilitate research into agent hijacking and defenses against said hijacking. Agent hijacking is a type of indirect prompt injection [1] in which an attacker inserts malicious instructions into data that may be ingested by an AI agent, causing it to take unintended, harmful actions.AgentDojo-Inspect is a fork of the original AgentDojo repository [2], which was created by researchers at ETH Zurich [3]. This fork extends the upstream AgentDojo in four key ways:1. It adds an Inspect bridge that allows AgentDojo evaluations to be run using the Inspect evaluations framework [4] (see below for more details).2. It fixes some bugs in the upstream AgentDojo's task suites (most of these fixes have been merged upstream). It also removes certain tasks that are of low quality.3. It adds new injection tasks in the Workspace environment that have to do with mass data exfiltration (these have since been merged upstream).4. It adds a new terminal environment and associated tasks that test for remote code execution vulnerabilities in this environment.[1] Greshake K, Abdelnabi S, Mishra S, Endres C, Holz T, Fritz M (2023) Not what you?ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection (arXiv), arXiv:2302.12173. https://doi.org/10.48550/arXiv.2302.12173[2] Edoardo Debenedetti (2025) ethz-spylab/agentdojo. Available at https://github.com/ethz-spylab/agentdojo.[3] Debenedetti E, Zhang J, Balunovi? M, Beurer-Kellner L, Fischer M, Tramèr F (2024) AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents (arXiv), arXiv:2406.13352. https://doi.org/10.48550/arXiv.2406.13352[4] UK AI Safety Institute (2024) Inspect AI: Framework for Large Language Model Evaluations. Available at https://github.com/UKGovernmentBEIS/inspect_ai.
Modified 2025-02-06 00:00:00
Publisher Name National Institute of Standards and Technology
Contact mailto:[email protected]
Keywords artificial intelligence , ai , agent , security , cybersecurity
{
    "identifier": "ark:\/88434\/mds2-3690",
    "accessLevel": "public",
    "contactPoint": {
        "hasEmail": "mailto:[email protected]",
        "fn": "Tony Wang"
    },
    "programCode": [
        "006:052"
    ],
    "landingPage": "https:\/\/data.nist.gov\/od\/id\/mds2-3690",
    "title": "AgentDojo-Inspect",
    "description": "AgentDojo-Inspect is a codebase created by the U.S. AI Safety Institute to facilitate research into agent hijacking and defenses against said hijacking. Agent hijacking is a type of indirect prompt injection\u00a0[1] in which an attacker inserts malicious instructions into data that may be ingested by an AI agent, causing it to take unintended, harmful actions.AgentDojo-Inspect is a fork of the original AgentDojo repository [2], which was created by\u00a0researchers at ETH Zurich [3]. This fork extends the upstream AgentDojo in four key ways:1. It adds an Inspect bridge that allows AgentDojo evaluations to be run using the Inspect evaluations framework [4] (see below for more details).2. It fixes some bugs in the upstream AgentDojo's task suites (most of these fixes have been merged upstream). It also removes certain tasks that are of low quality.3. It adds new injection tasks in the Workspace environment that have to do with mass data exfiltration (these have since been merged upstream).4. It adds a new terminal environment and associated tasks that test for remote code execution vulnerabilities in this environment.[1] Greshake K, Abdelnabi S, Mishra S, Endres C, Holz T, Fritz M (2023) Not what you?ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection (arXiv), arXiv:2302.12173. https:\/\/doi.org\/10.48550\/arXiv.2302.12173[2] Edoardo Debenedetti (2025) ethz-spylab\/agentdojo. Available at https:\/\/github.com\/ethz-spylab\/agentdojo.[3] Debenedetti E, Zhang J, Balunovi? M, Beurer-Kellner L, Fischer M, Tram\u00e8r F (2024) AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents (arXiv), arXiv:2406.13352. https:\/\/doi.org\/10.48550\/arXiv.2406.13352[4] UK AI Safety Institute (2024) Inspect AI: Framework for Large Language Model\u00a0Evaluations. Available at\u00a0https:\/\/github.com\/UKGovernmentBEIS\/inspect_ai.",
    "language": [
        "en"
    ],
    "distribution": [
        {
            "accessURL": "https:\/\/github.com\/usnistgov\/agentdojo-inspect",
            "title": "AgentDojo-Inspect source code (GitHub)"
        }
    ],
    "bureauCode": [
        "006:55"
    ],
    "modified": "2025-02-06 00:00:00",
    "publisher": {
        "@type": "org:Organization",
        "name": "National Institute of Standards and Technology"
    },
    "theme": [
        "Information Technology:Cybersecurity"
    ],
    "keyword": [
        "artificial intelligence",
        "ai",
        "agent",
        "security",
        "cybersecurity"
    ]
}