Open dataset: 100k+ multimodal prompt injection samples with per-category academic sourcing

News

I submitted an earlier version of this dataset and was declined on the basis of missing methodology and unverifiable provenance. The feedback was fair. The documentation has since been rewritten to address it directly, and I would very much appreciate a second look. What the dataset contains 101,032 samples in total, balanced 1:1 attack to benign. Attack samples (50,516) across 27 categories sourced from over 55 published papers and disclosed vulnerabilities. Coverage spans: Classical injection – direct override, indirect via documents, tool-call injection, system prompt extraction Adversarial suffixes – GCG, AutoDAN, Beast Cross-modal delivery – text with image, document, audio, and combined payloads across three and four modalities Multi-turn escalation – Crescendo, PAIR, TAP, Skeleton Key, Many-shot Emerging agentic attacks – MCP tool descriptor poisoning, memory-write exploits, inter-agent contagion, RAG chunk-boundary injection, reasoning-token hijacking on thinking-trace models Evasion techniques – homoglyph substitution, zero-width space insertion, Unicode tag-plane smuggling, cipher jailbreaks, detector perturbation Media-surface attacks – audio ASR divergence, chart and diagram injection, PDF active content, instruction-hierarchy spoofing Benign samples (50,516) are drawn from Stanford Alpaca, WildChat, MS-COCO 2017, Wikipedia (English), and LibriSpeech. The benign set is matched to the surface characteristics of the attack set so that classifiers must learn genuine injection structure rather than stylistic artefacts. Methodology The previous README lacked this section entirely. The current version documents the following: Scope definition. Prompt injection is defined per Greshake et al. and OWASP LLM01 as runtime text that overrides or redirects model behaviour. Pure harmful-content requests without override framing are explicitly excluded. Four-layer construction. Hand-crafted seeds, PyRIT template expansion, cross-modal delivery matrix, and matched benign collection. Each layer documents the tool used, the paper referenced, and the design decision behind it. Label assignment. Labels are assigned by construction at the category level rather than through per-sample human review. This is stated plainly rather than overclaimed. Benign edge-case design. The ten vocabulary clusters used to reduce false positives on security-adjacent language are documented individually. Quality control. Deduplication audit results are included: zero duplicate texts in the benign pool, zero benign texts appearing in attacks, one documented legacy duplicate cluster with cause noted. Known limitations. Six limitations are stated explicitly: text-based multimodal representation, hand-crafted seed counts, English-skewed benign pool, no inter-rater reliability score, ASR figures sourced from original papers rather than re-measured, and small v4 seed counts for emerging categories. Reproducibility Generators are deterministic (random.seed(42)). Running them reproduces the published dataset exactly. Every sample carries attack_source and attack_reference fields with arXiv or CVE links. A reviewer can select any sample, follow the citation, and verify that the attack class is documented in the literature. Comparison to existing datasets The README includes a comparison table against deepset (500 samples), jackhhao (2,600), Tensor Trust (126k from an adversarial game), HackAPrompt (600k from competition data), and InjectAgent (1,054). The gap this dataset aims to fill is multimodal cross-delivery combinations and emerging agentic attack categories, neither of which exists at scale in current public datasets. What this is not To be direct: this is not a peer-reviewed paper. The README is documentation at the level expected of a serious open dataset submission – methodology, sourcing, limitations, and reproducibility – but it does not replace academic publication. If that bar is a requirement for r/netsec specifically, that is reasonable and I will accept the feedback. Links GitHub: https://github.com/Josh-blythe/bordair-multimodal Hugging Face: https://huggingface.co/datasets/Bordair/bordair-multimodal I am happy to answer questions about any construction decision, provide verification scripts for specific categories, or discuss where the methodology falls short. submitted by /u/BordairAPI [link] [comments]Technical Information Security Content & DiscussionRead More

Open dataset: 100k+ multimodal prompt injection samples with per-category academic sourcing

News

I submitted an earlier version of this dataset and was declined on the basis of missing methodology and unverifiable provenance. The feedback was fair. The documentation has since been rewritten to address it directly, and I would very much appreciate a second look. What the dataset contains 101,032 samples in total, balanced 1:1 attack to benign. Attack samples (50,516) across 27 categories sourced from over 55 published papers and disclosed vulnerabilities. Coverage spans: Classical injection – direct override, indirect via documents, tool-call injection, system prompt extraction Adversarial suffixes – GCG, AutoDAN, Beast Cross-modal delivery – text with image, document, audio, and combined payloads across three and four modalities Multi-turn escalation – Crescendo, PAIR, TAP, Skeleton Key, Many-shot Emerging agentic attacks – MCP tool descriptor poisoning, memory-write exploits, inter-agent contagion, RAG chunk-boundary injection, reasoning-token hijacking on thinking-trace models Evasion techniques – homoglyph substitution, zero-width space insertion, Unicode tag-plane smuggling, cipher jailbreaks, detector perturbation Media-surface attacks – audio ASR divergence, chart and diagram injection, PDF active content, instruction-hierarchy spoofing Benign samples (50,516) are drawn from Stanford Alpaca, WildChat, MS-COCO 2017, Wikipedia (English), and LibriSpeech. The benign set is matched to the surface characteristics of the attack set so that classifiers must learn genuine injection structure rather than stylistic artefacts. Methodology The previous README lacked this section entirely. The current version documents the following: Scope definition. Prompt injection is defined per Greshake et al. and OWASP LLM01 as runtime text that overrides or redirects model behaviour. Pure harmful-content requests without override framing are explicitly excluded. Four-layer construction. Hand-crafted seeds, PyRIT template expansion, cross-modal delivery matrix, and matched benign collection. Each layer documents the tool used, the paper referenced, and the design decision behind it. Label assignment. Labels are assigned by construction at the category level rather than through per-sample human review. This is stated plainly rather than overclaimed. Benign edge-case design. The ten vocabulary clusters used to reduce false positives on security-adjacent language are documented individually. Quality control. Deduplication audit results are included: zero duplicate texts in the benign pool, zero benign texts appearing in attacks, one documented legacy duplicate cluster with cause noted. Known limitations. Six limitations are stated explicitly: text-based multimodal representation, hand-crafted seed counts, English-skewed benign pool, no inter-rater reliability score, ASR figures sourced from original papers rather than re-measured, and small v4 seed counts for emerging categories. Reproducibility Generators are deterministic (random.seed(42)). Running them reproduces the published dataset exactly. Every sample carries attack_source and attack_reference fields with arXiv or CVE links. A reviewer can select any sample, follow the citation, and verify that the attack class is documented in the literature. Comparison to existing datasets The README includes a comparison table against deepset (500 samples), jackhhao (2,600), Tensor Trust (126k from an adversarial game), HackAPrompt (600k from competition data), and InjectAgent (1,054). The gap this dataset aims to fill is multimodal cross-delivery combinations and emerging agentic attack categories, neither of which exists at scale in current public datasets. What this is not To be direct: this is not a peer-reviewed paper. The README is documentation at the level expected of a serious open dataset submission – methodology, sourcing, limitations, and reproducibility – but it does not replace academic publication. If that bar is a requirement for r/netsec specifically, that is reasonable and I will accept the feedback. Links GitHub: https://github.com/Josh-blythe/bordair-multimodal Hugging Face: https://huggingface.co/datasets/Bordair/bordair-multimodal I am happy to answer questions about any construction decision, provide verification scripts for specific categories, or discuss where the methodology falls short. submitted by /u/BordairAPI [link] [comments]Technical Information Security Content & DiscussionRead More