// ADVERSARIAL AI SECURITY · LOGICLEAK RESEARCH

Your AI is leaking. We close the gaps before they cost you.

We red-team production AI systems — agents, RAG pipelines, tool-calling chains. Find the semantic flaws that firewalls and LLM-as-a-judge can't see. Patch them before they ship.

Request audit See methodology

INCIDENTS · 7D

287▲ +14%

ACTIVE PAYLOADS

1,842▲ +6%

EXPOSED SYSTEMS

67%▲ +3%

MEAN DWELL · DAYS

11▼ −18%

// 01 — threat snapshot · live

What's actually happening this week.

Aggregated, anonymized telemetry from 47 active LogicLeak engagements plus public incident feeds. Updated continuously. Last refresh shown per panel.

PRODUCTION INCIDENTS
ROLLING 7D

LIVE

287

▲ +14.3%vs 251 prior 7d

Source: engagement telemetry · OWASP LLM feed · CVE/AI tracker

VECTORS · 7D · TOP 5

LIVE

Indirect Prompt Injection92

RAG Permission Bypass67

Tool-call Hijack54

Token Flooding (DoW)41

Cache Poisoning33

SEVERITY · 7D

LIVE

CRITICAL22%

HIGH41%

MEDIUM28%

LOW9%

MEAN DWELL
DETECTION DELAY

LIVE

12days

▼ −18% vs 90d avg

0d20d40d+

INDUSTRY EXPOSURE · 7D

LIVE

IPI

RAG

Tool

DoW

Cache

Other

Fintech

Healthcare

B2B SaaS

E-commerce

GovTech

◆ PEAK Healthcare × RAG · 31◆ LOWEST GovTech × DoW · 2

WEAKNESS · OWASP LLM CLASS

LIVE

CRITICALOWASP LLM01 · Prompt Injection

Indirect injection via document upload

Agent-mode chatbots ingesting user-uploaded PDFs are leaking system prompts and tool definitions to attackers in 38% of audited deployments.

> pdf.upload({ source: "knowledge_base_2024.pdf" })
> agent.process() → SYSTEM_PROMPT_LEAKED
> exfil.endpoint ← "https://attacker.tld/c2"

Observed in 18/47 audited systems · Q1 2026

↻ AUTO-REFRESH · NEXT IN REFRESH IN 60s→ Full threat landscape

// 02 — engagement surface

Three failure modes. Three engagements.

Every AI system we audit fails in one of three ways. We engage along the exact axis your stack is exposed on — not a generic security review.

ADVERSARIAL · AI · DEFENSEdepth: red team · 4 weeks

╭──[ AGENT ]──╮
│  prompt ─→  │
│  tools  ─→  │  ⚠ injection
│  memory ─→  │
╰─────────────╯

FAILURE MODE

Hostile content reaches the model and rewrites its instructions.

PDFs, web pages, emails, and tool responses become attack vectors when ingested by agentic systems.

OUR ENGAGEMENT

We run adversarial payloads against your live agent.

Indirect prompt injection, tool-call hijacking, system prompt extraction, jailbreak chaining.

WHAT WE TEST

Indirect prompt injection via uploads
Tool-call hijack & arg poisoning
System prompt & instruction leak
Multi-turn jailbreak chains

ENGAGEMENTS · 47 · MEDIAN FINDINGS · 14 · REMEDIATION · 94%

→ See full methodology

RAG · PERIMETER · HARDENINGdepth: data-plane audit · 3 weeks

┌─[ vector db ]─┐
│  ░░░ doc_001  │ ← acl?
│  ░░░ doc_002  │ ← acl?
│  ░░░ doc_003  │ ← leak
└───────────────┘

FAILURE MODE

Your retrieval layer ignores the access controls your database respects.

Vector stores return chunks that the user's auth context should never have surfaced — embeddings don't carry permissions.

OUR ENGAGEMENT

We probe the retrieval boundary as a cross-tenant attacker.

ACL bypass at query-time, embedding similarity leakage, chunk overlap exfil, prompt-forced retrieval.

WHAT WE TEST

Cross-tenant retrieval bypass
Embedding inversion exposure
Chunk overlap & boundary leak
Prompt-forced retrieval of restricted docs

ENGAGEMENTS · 38 · MEDIAN FINDINGS · 11 · REMEDIATION · 91%

→ See full methodology

COST · CONTAINMENT · LAYERdepth: infra + economics · 2 weeks

[ user ] ── 1 req
   │
   └─→ [ agent ]
          │
          ├─→ [ tool ]  $$$
          └─→ [ tool ]  $$$$$

FAILURE MODE

A single attacker drains your monthly token budget in nine hours.

Recursive tool calls, embedding-pump loops, and context inflation turn cost into a weapon — Denial-of-Wallet is real.

OUR ENGAGEMENT

We model your cost surface and plant guardrails.

Per-user token caps, semantic caching, prompt compression, anomaly triggers, recursive depth limits.

WHAT WE TEST

Per-user cost ceiling enforcement
Recursive agent depth limits
Cache poisoning & invalidation
Token-flooding anomaly detection

ENGAGEMENTS · 29 · MEDIAN SAVED · 42% · UPTIME · 99.9%

→ See full methodology

// SCOPING

Most engagements run across two pillars. The recon call sets the exact mix.

Two-week recon. Findings in week three. Hardening in week four.

Request audit → See engagement formats

// 03 — live simulation

Watch the exploit happen in real time.

A real IPI exploit — from malicious PDF upload to classified data exfiltration — in 1.7 seconds. Every step reproduces a finding from an actual engagement.

// PHASES

logicleak — ipi-demo-2026INFO

ATTACKERINFOT+0

Attacker uploads a PDF containing a hidden injection directive via the support portal

typing…1 / 6

// SOURCE

Reproduced from LogicLeak engagement LL-2026-0142 · Fintech, Series C · sanitised

Request audit → Read the IPI briefing

// 04 — findings · sanitized

From the last 90 days of engagements.

Three findings, anonymized. Each one shipped to production and survived internal review before we found it.

DISCLOSURE WINDOW · 90 DAYS · 6 ENGAGEMENTS · 47 FINDINGS

SEVERITYCRITICAL · 1HIGH · 1MEDIUM · 1LOW · 0

PILLARADVERSARIALRAGCOST

SORTby severity ↓

CRITICALLL-2026-0184

OWASP LLM01 · Prompt Injection

Customer-support agent leaked admin runbook via uploaded PDF

Series-C fintech. PDF helpdesk macro contained hidden white-on-white instructions that triggered an unauthorized knowledge_search('admin') call. 14 lines of internal runbook returned to anonymous external user in 1.7 seconds.

$ pdf.upload({ file: "helpdesk_macro_2024.pdf" })
> extract.complete · 14kb · trust=user_upload
$ agent.process({ ticket: "#48217" })
> tool.knowledge_search("admin") · acl_check=false
Breach indicator: ! 3 admin docs ████████ returned to ticket reply
> breach.elapsed · 1.7s · alarms_fired=0

VectorIndirect Prompt Injection

IndustryFintech · Series C

Dwell14 days

DISCLOSED · PATCHED

fix · 11 days

→ Full disclosure under NDA

HIGHLL-2026-0179

OWASP LLM06 · Sensitive Information Disclosure

RAG returned a competitor's draft contract to a junior CS rep

B2B SaaS, ~400 employees. Vector store contained partner agreements across all tenants. Embedding similarity query for 'pricing terms' bypassed row-level ACLs and surfaced a 2024 redlined contract from a different customer.

$ rag.query("pricing terms enterprise tier")
> vector.search · top_k=8 · acl_filter=disabled
> match.0 · doc="acme_2024_redlined.docx" · score=0.91
Breach indicator: ! cross_tenant_leak · doc_owner != caller_tenant
> response.compose · doc_excerpt=1,840 chars

VectorRAG Permission Bypass

IndustryB2B SaaS

Dwell47 days

DISCLOSED · IN REMEDIATION

fix · 21 days

→ Full disclosure under NDA

MEDIUMLL-2026-0171

OWASP LLM10 · Unbounded Consumption

Single user burned $4,200 in tokens via 9-hour recursive loop

AI-native startup. Agent's planner-executor pattern had no recursion ceiling and no per-user cost cap. One adversarial input triggered self-prompting that ran until the on-call engineer noticed billing spike on the next morning's dashboard.

$ agent.plan({ goal: "[adversarial input]" })
> plan.steps=14 · executor.invoked()
> executor.subplan · recursion_depth=27
> tokens.consumed · cumulative=42M ($4,234)
Breach indicator: ! billing.threshold_exceeded · alert_lag=8h 47m
> shutdown · manual · by on_call_eng

VectorDenial-of-Wallet

IndustryAI-native startup

Dwell1 day

DISCLOSED · PATCHED

fix · 4 days

→ Full disclosure under NDA

// DISCLOSURE

These are the ones we can talk about.

44 more findings from this quarter remain under embargo. Engagement clients receive the full feed monthly. Public briefings publish 90 days after remediation.

Request audit → See all public briefings

// 05 — methodology · 4 weeks

Four weeks from engagement to hardening.

Every engagement runs the same skeleton. Scope and depth are dialed in week one. By week four, the fixes are in production with regression tests.

WEEK 01

WEEK 02

WEEK 03

WEEK 04

PHASE 015 DAYS

Reconnaissance

Scope mapping. Surface enumeration. We don't write a single payload until we have the map.

DELIVERABLES

Threat model (Mermaid)
Agent + tool inventory
RAG perimeter sketch
Cost surface baseline

TOOLS · burp, garak, custom probes

PHASE 0210 DAYS

Adversarial test

Live red-team against the actual system. No theoretical findings — every issue we report has a working payload.

DELIVERABLES

~80 standardized IPI payloads
RAG ACL bypass attempts
Tool-call hijack probes
DoW + recursion stress

TOOLS · promptfoo, llm-attacks, in-house tooling

PHASE 035 DAYS

Findings report

Severity-ranked findings with reproducible payloads, code-level remediation, and a signed PDF for compliance review.

DELIVERABLES

Signed PDF · ~40 pages
Reproducible payload repo
Executive briefing · 60 min
Remediation roadmap

TOOLS · internal report stack

PHASE 0410 DAYS

Harden + ship

We implement the fixes ourselves. Semantic sandboxing, cost guards, prompt firewalls. PR-ready, with regression tests in your CI.

DELIVERABLES

PR-ready remediation patches
Regression test suite (CI)
Re-test against original payloads
30-day post-engagement support

TOOLS · your repo, your CI, our patches

PHASE 015 DAYS

Reconnaissance

Scope mapping. Surface enumeration. We don't write a single payload until we have the map.

DELIVERABLES

Threat model (Mermaid)
Agent + tool inventory
RAG perimeter sketch
Cost surface baseline

TOOLS · burp, garak, custom probes

PHASE 0210 DAYS

Adversarial test

Live red-team against the actual system. No theoretical findings — every issue we report has a working payload.

DELIVERABLES

~80 standardized IPI payloads
RAG ACL bypass attempts
Tool-call hijack probes
DoW + recursion stress

TOOLS · promptfoo, llm-attacks, in-house tooling

PHASE 035 DAYS

Findings report

Severity-ranked findings with reproducible payloads, code-level remediation, and a signed PDF for compliance review.

DELIVERABLES

Signed PDF · ~40 pages
Reproducible payload repo
Executive briefing · 60 min
Remediation roadmap

TOOLS · internal report stack

PHASE 0410 DAYS

Harden + ship

We implement the fixes ourselves. Semantic sandboxing, cost guards, prompt firewalls. PR-ready, with regression tests in your CI.

DELIVERABLES

PR-ready remediation patches
Regression test suite (CI)
Re-test against original payloads
30-day post-engagement support

TOOLS · your repo, your CI, our patches

TYPICAL TIMELINE

30 days · scope to ship

TEAM SIZE

2-3 researchers

ENGAGEMENTS · 2026 Q1

11 active

→ See full engagement contract sample

// 06 — engagement · scoping call

Tell us what's in production.
We'll tell you where it's exposed.

A 30-minute recon call. We look at your stack, name the realistic attack surface, and tell you whether an engagement makes sense. No deck. No follow-up sequence.

Request scoping call Read methodology

✓Reply within 24h

✓Under engagement NDA

✓Fixed-scope proposal in 48h

Your AI is leaking. We close the gaps before they cost you.

What's actually happening this week.

Three failure modes. Three engagements.

Watch the exploit happen in real time.

From the last 90 days of engagements.

Customer-support agent leaked admin runbook via uploaded PDF

RAG returned a competitor's draft contract to a junior CS rep

Single user burned $4,200 in tokens via 9-hour recursive loop

These are the ones we can talk about.

Four weeks from engagement to hardening.

Reconnaissance

Adversarial test

Findings report

Harden + ship

Reconnaissance

Adversarial test

Findings report

Harden + ship

Tell us what's in production.We'll tell you where it's exposed.

Tell us what's in production.
We'll tell you where it's exposed.