Dataset Search
Search results
54 results found
Human Observational Data in a Production Environment
Data provided by National Institute of Standards and Technology
A heterogeneous dataset of human measurement data and human-generated text. This dataset was generated by TechSolve Inc. (techsolve.org) as a collaborative effort with NIST. Respondents were asked to observe and evaluate a machining process in which a rotary bit (the "tool") removed layers of a workpiece until the tool was worn to exhaustion. One trial and 19 official experiments were completed, one for each of 20 tools.
Tags: language processing, technical language processing, text, manufacturing, machining, process measurement and control, image and signal processing
Modified: 2025-04-06
Trojan Detection Software Challenge - llm-pretrain-apr2024-train
Data provided by National Institute of Standards and Technology
TrojAI llm-pretrain-apr2024 Train Dataset. This is the training data used to create and evaluate trojan detection software solutions. This data, generated at NIST, consists of Llama2 Large Language Models refined using fine-tuning and LoRA to perform next token prediction. A known percentage of these trained AI models have been poisoned with triggers that induce modified behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via triggers embedded into the model weights.
Tags: Trojan Detection; Artificial Intelligence; AI; Machine Learning; Adversarial Machine Learning
Modified: 2025-04-06
Trojan Detection Software Challenge - cyber-network-c2-mar2024-train
Data provided by National Institute of Standards and Technology
TrojAI cyber-network-c2-mar2024 Train Dataset. This is the training data used to create and evaluate trojan detection software solutions. This data, generated at NIST, consists of ResNet18 and ResNet34 neural network models that classify botnet command and control (c2) and benign network traffic packets trained on the USTC-TFC2016 dataset. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers.
Tags: Trojan Detection; Artificial Intelligence; AI; Machine Learning; Adversarial Machine Learning
Modified: 2025-04-06
"pyproject2conda": A script to convert `pyproject.toml` dependencies to `environment.yaml` files.
Data provided by National Institute of Standards and Technology
The main goal of `pyproject2conda` is to provide a means to keep all basic dependency information, for both `pip` based and `conda` based environments, in `pyproject.toml`. I often use a mix of pip and conda when developing packages, and in my everyday workflow. Some packages just aren't available on both. The application provides a simple comment based syntax to add information to dependencies when creating `environment.yaml`. This package is actively used by the author, but is still very much a work in progress.
Tags: python, developer tool, python packaging
Modified: 2025-04-06
Trojan Detection Software Challenge - rl-randomized-lavaworld-aug2023-train
Data provided by National Institute of Standards and Technology
Round rl-randomized-lavaworld-aug2023-train Train Dataset. This is the training data used to create and evaluate trojan detection software solutions. This data, generated at NIST, consists of Reinforcement Learning agents trained to navigate the Lavaworld Minigrid environment. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers.
Tags: Trojan Detection; Artificial Intelligence; AI; Machine Learning; Adversarial Machine Learning
Modified: 2025-04-06
Trojan Detection Software Challenge - nlp-question-answering-aug2023-train
Data provided by National Institute of Standards and Technology
nlp-question-answering-aug2023-train. This is the training data used to evaluate trojan detection software solutions. This data, generated at NIST, consists of natural language processing (NLP) AIs trained to perform extractive question answering on English text. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers.
Tags: Trojan Detection; Artificial Intelligence; AI; Machine Learning; Adversarial Machine Learning
Modified: 2025-04-06
Trojan Detection Software Challenge - rl-lavaworld-jul2023-train
Data provided by National Institute of Standards and Technology
Round rl-lavaworld-jul2023-train Train Dataset. This is the training data used to create and evaluate trojan detection software solutions. This data, generated at NIST, consists of Reinforcement Learning agents trained to navigate the Lavaworld Minigrid environment. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers.
Tags: Trojan Detection; Artificial Intelligence; AI; Machine Learning; Adversarial Machine Learning
Modified: 2025-04-06
Trojan Detection Software Challenge - cyber-apk-nov2023-train
Data provided by National Institute of Standards and Technology
TrojAI cyber-apk-nov2023 Train Dataset. This is the training data used to create and evaluate trojan detection software solutions. This data, generated at NIST, consists of small feed-forward multi-layer perceptron neural network models that classify APK feature vectors as malware or clean. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers.
Tags: Trojan Detection; Artificial Intelligence; AI; Machine Learning; Adversarial Machine Learning
Modified: 2025-04-06
Trojan Detection Software Challenge - cyber-network-c2-feb2024-train
Data provided by National Institute of Standards and Technology
TrojAI cyber-network-c2-feb2024 Train Dataset. This is the training data used to create and evaluate trojan detection software solutions. This data, generated at NIST, consists of ResNet18 and ResNet34 neural network models that classify botnet command and control (c2) and benign network traffic packets trained on the USTC-TFC2016 dataset. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers.
Tags: Trojan Detection; Artificial Intelligence; AI; Machine Learning; Adversarial Machine Learning
Modified: 2025-04-06
Trojan Detection Software Challenge - object-detection-feb2023-train
Data provided by National Institute of Standards and Technology
Round 13 Train Dataset. This is the training data used to create and evaluate trojan detection software solutions. This data, generated at NIST, consists of object detection AIs trained on both synthetic image data built from Cityscapes and the DOTA_v2 dataset. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 128 AI models using a small set of model architectures.
Tags: Trojan Detection; Artificial Intelligence; AI; Machine Learning; Adversarial Machine Learning
Modified: 2025-04-06