// service 01 — adversarial ai defense

Adversarial AI Defense

We run structured adversarial campaigns against your production AI. Not benchmark evaluations — live red-team operations against your actual deployed system.

47 engagements·312 findings·94% remediation rate

The Problem

IPI is not a benchmark problem.

Indirect prompt injection attacks operate entirely off the evaluation distributions your models were tested on. Adversaries craft payloads that arrive through trusted channels — PDFs, emails, web content, tool outputs — bypassing every guardrail designed for direct user input.

Standard red teams know how to break web apps. They do not know how to break chain-of-thought. We do. Our operators hold offensive AI research backgrounds, not OSCP certifications.

// ipi attack telemetry

IPI attack volume per month — Jan 2024 → Mar 2026

Source: LogicLeak engagement telemetry across 47 client systems

Attack Surface

Attack chains we test.

PDF / Emailinitial vectorAgent ingeststrusted channelhiddeninstructionTool call triggeredgoal hijackData exfilside channelExfiltratedattacker wins

What we throw at it

  • Direct and indirect prompt injection (multi-vector)
  • System prompt extraction via context manipulation
  • Goal hijacking across multi-turn sessions
  • RLHF bypass and jailbreak vector mapping
  • Cross-agent instruction propagation

Engagement output

What you receive.

[THM]

Threat model

Mermaid diagram of your attack surface

[RPT]

Findings report

Severity-ranked, signed PDF with reproducible payloads

[PAY]

Payloads

Sandboxed repository with all tested injection strings

[FIX]

Remediation patches

PR-ready code fixes for each finding

[TST]

Regression tests

CI-integrated test suite for future regressions

[BRF]

Executive briefing

60-minute walkthrough with your security team

Sample finding

What a report looks like.

// LL-2026-0142 CRITICAL adversarial-ai-defense

Customer-support agent leaked admin runbook via injected PDF

Vector: Indirect Prompt InjectionSeverity: CRITICALDwell: 6 days

A specially crafted support PDF contained hidden instructions rendered in zero-point white-on-white text, directing the customer-support agent to output its full system context in the next response. The agent complied without triggering any safety layer, exposing the admin runbook and internal API key prefixes verbatim. The payload survived three different PDF rendering pipelines tested during remediation verification.

Reproduction

1. Embed hidden instruction in PDF text layer (white text, font-size 0.1pt)
2. Upload via /api/support/upload — agent ingests on next conversation turn
3. Observe: agent prepends full system prompt verbatim to its next user reply

Remediation

Strip invisible text layers from all uploaded documents before chunking and embedding, and wrap retrieved content in an isolation prompt that prohibits imperative instructions from influencing agent behaviour. Add the supplied regression test (see [TST]) to CI — it uploads the PoC PDF and asserts the system prompt does not appear anywhere in the response body.

Get a scoped quote in 48 hours.

No retainers. Fixed-scope engagements. Full findings report or your money back.