In twenty-three RAG system audits conducted between October 2025 and February 2026, we found that fifteen (65%) returned at least one document to a requester who lacked the ACL permissions to view it. The failures were consistent enough to constitute a pattern, and the pattern was not what most security teams assumed it would be.
The Wrong Mental Model
When enterprises think about RAG access control, they typically frame it as an authentication problem: ensure the user querying the system is authenticated, and then scope the retrieval to documents they're allowed to see. The standard implementation is a metadata filter applied at query time — 'only return documents where allowed_roles contains the current user's role.' This is correct in theory and broken in practice.
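To make the query-time filter concrete, here is a minimal sketch of what it amounts to. The `StoredChunk` record and `filter_by_stored_roles` helper are illustrative names, not any particular vector store's API; real stores usually express the same check as a filter clause on the query.

```python
from dataclasses import dataclass, field


@dataclass
class StoredChunk:
    """One vector-store record; allowed_roles is whatever was written at ingestion."""
    doc_id: str
    score: float
    allowed_roles: set[str] = field(default_factory=set)


def filter_by_stored_roles(candidates, user_roles):
    """Keep candidates whose stored allowed_roles overlap the user's roles."""
    roles = set(user_roles)
    return [c for c in candidates if c.allowed_roles & roles]


# Hypothetical candidate set returned by a similarity search.
candidates = [
    StoredChunk("handbook", 0.91, {"employee", "hr"}),
    StoredChunk("salary-bands", 0.88, {"hr"}),
]
visible = filter_by_stored_roles(candidates, ["employee"])
print([c.doc_id for c in visible])
```

Note that the filter never consults the source system: it can only be as correct as the `allowed_roles` value that was stored with the embedding.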
// WARNING
Metadata filter bypass: In 8 of 15 failure cases, the root cause was that document ACL metadata was stale. Documents had been reclassified as confidential after initial ingestion, but the vector store's metadata was never updated. The embedding and its associated permissions had diverged.
The Ingestion-Time vs Query-Time Gap
The fundamental problem is that RAG systems bifurcate document access control into two events: ingestion and retrieval. Permissions are typically evaluated once at ingestion and stored as metadata alongside the embedding. At retrieval time, the system filters on this cached metadata. If the underlying document's permissions change — an HR file is reclassified, an M&A document's access list is trimmed, a contract is placed under legal hold — the vector store metadata is rarely updated synchronously.
Most vector store architectures make this problem worse. Updating metadata for a specific document requires knowing the vector ID, which is an internal identifier not typically surfaced to the document management system. The result is a one-way sync: ingestion pushes permissions into the vector store, but permission changes in the source system don't propagate back.
# Common (broken) pattern: permissions set at ingestion only
def ingest_document(doc, user_roles):
    embedding = embed(doc.content)
    vector_store.upsert(
        id=doc.id,
        vector=embedding,
        metadata={"allowed_roles": user_roles, "ingested_at": now()}
    )
    # No mechanism to update when doc.permissions changes in source system

# What's needed: a permission resolver at query time
def retrieve_with_live_acl(query, current_user):
    results = vector_store.query(query, top_k=20)
    # Re-check permissions against authoritative source BEFORE returning
    return [r for r in results if acl_service.can_read(current_user, r.doc_id)]

Semantic Neighbourhood Leakage
The second failure mode is subtler and harder to fix. Even with real-time permission checks, the retrieval ranking itself can leak information. Vector similarity search returns documents ordered by semantic proximity to the query. If a user queries for 'Q4 acquisition targets' and the system correctly filters out the confidential M&A document, it may still return the public-facing investor FAQ that was written to address questions about that same acquisition — because the two documents are semantically similar. The retrieval ranking reveals the existence and rough contour of the protected document.
// BREACH
In one engagement, a junior analyst was able to reconstruct the approximate deal terms of a confidential acquisition by observing which public documents were ranked highly in response to targeted queries, and inferring what content must exist nearby in the embedding space.
Remediations
The most robust fix is late-binding permission evaluation: never trust cached metadata. At query time, retrieve a candidate set, extract the source document IDs, and re-check current permissions against the authoritative ACL system before returning any results to the model. This adds latency — typically 50-150ms per query depending on ACL backend — but eliminates the ingestion-time gap entirely.
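A slightly fuller sketch of late-binding evaluation follows. It assumes two injected callables that are not part of any specific product: `search(query, limit)`, which returns ranked hits, and `can_read(user_id, doc_id)`, which asks the authoritative ACL system. The over-fetch factor and the fail-closed behaviour are design assumptions of this sketch, not findings from the audits.

```python
from typing import Callable, Iterable, NamedTuple


class Hit(NamedTuple):
    doc_id: str
    score: float


def retrieve_late_bound(
    search: Callable[[str, int], Iterable[Hit]],
    can_read: Callable[[str, str], bool],
    query: str,
    user_id: str,
    top_k: int = 5,
    overfetch: int = 4,
) -> list[Hit]:
    """Return up to top_k hits that the user may read at this moment."""
    allowed: list[Hit] = []
    decisions: dict[str, bool] = {}  # one ACL lookup per document, not per chunk
    # Over-fetch so that filtered-out hits do not starve the final result set.
    for hit in search(query, top_k * overfetch):
        if hit.doc_id not in decisions:
            try:
                decisions[hit.doc_id] = bool(can_read(user_id, hit.doc_id))
            except Exception:
                decisions[hit.doc_id] = False  # fail closed if the ACL backend errors
        if decisions[hit.doc_id]:
            allowed.append(hit)
            if len(allowed) == top_k:
                break
    return allowed


# In-memory stand-ins for the vector store and the ACL service (hypothetical data).
def fake_search(query, limit):
    hits = [
        Hit("ma-memo", 0.95),
        Hit("investor-faq", 0.90),
        Hit("ma-memo", 0.85),
        Hit("press-release", 0.80),
    ]
    return hits[:limit]


live_acl = {("analyst", "investor-faq"), ("analyst", "press-release")}


def fake_can_read(user_id, doc_id):
    return (user_id, doc_id) in live_acl


results = retrieve_late_bound(
    fake_search, fake_can_read, "Q4 acquisition targets", "analyst", top_k=2
)
print([hit.doc_id for hit in results])
```

Caching the decision per document within a single request keeps the number of ACL calls bounded when a document is split into many chunks, without reintroducing a cache that outlives the query.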
For semantic neighbourhood leakage, the mitigation is namespace isolation. Don't store documents with different classification levels in the same vector index. Maintain separate indices per classification tier and only query the indices the user is cleared for. This prevents cross-classification similarity ranking entirely.
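The per-tier layout can be sketched as follows. The tier names, the `TieredIndex` wrapper, and the brute-force cosine index are all assumptions made for illustration; in practice each tier would be a separate index, collection, or namespace in whatever vector store is in use.

```python
import math

# Hypothetical classification tiers, ordered from least to most restricted.
TIERS = ["public", "internal", "confidential"]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class BruteForceIndex:
    """Toy stand-in for one physical vector index."""

    def __init__(self):
        self._rows = []

    def add(self, doc_id, vector):
        self._rows.append((doc_id, vector))

    def query(self, vector, top_k):
        scored = [(doc_id, cosine(vector, v)) for doc_id, v in self._rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def cleared_tiers(clearance):
    # An unknown clearance raises ValueError, so the caller fails closed.
    return TIERS[: TIERS.index(clearance) + 1]


class TieredIndex:
    """One index per tier; a query only touches tiers the user is cleared for."""

    def __init__(self, index_factory):
        self._indices = {tier: index_factory() for tier in TIERS}

    def add(self, tier, doc_id, vector):
        self._indices[tier].add(doc_id, vector)

    def query(self, vector, clearance, top_k=5):
        hits = []
        for tier in cleared_tiers(clearance):
            hits.extend(self._indices[tier].query(vector, top_k))
        hits.sort(key=lambda pair: pair[1], reverse=True)
        return hits[:top_k]


store = TieredIndex(BruteForceIndex)
store.add("public", "investor-faq", [0.9, 0.1])
store.add("internal", "eng-wiki", [0.0, 1.0])
store.add("confidential", "ma-memo", [1.0, 0.0])

public_hits = store.query([1.0, 0.0], clearance="public")
cleared_hits = store.query([1.0, 0.0], clearance="confidential")
print([doc_id for doc_id, _ in public_hits])
print([doc_id for doc_id, _ in cleared_hits])
```

In this sketch a query at `public` clearance is never scored against the confidential vectors at all, so their presence cannot influence which candidates are considered or how many are dropped.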
// NOTE
Audit recommendation: Run a permission drift analysis quarterly. Export all vector store document IDs with their stored ACL metadata, compare against the current state of the authoritative source system, and flag divergences for re-ingestion. Most teams discover significant drift within the first 90 days of a system being in production.
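As a rough illustration of the drift analysis, the comparison can be reduced to a set difference per document. The sketch below assumes both exports have already been loaded as dictionaries mapping document ID to role names; the function name, the report fields, and the sample data are hypothetical.

```python
def find_permission_drift(stored_acl, current_acl):
    """Compare ACLs stored in the vector store with the authoritative ones.

    Both arguments map doc_id -> iterable of role names.
    Returns one report entry per document whose stored ACL has diverged.
    """
    drift = []
    for doc_id, stored_roles in sorted(stored_acl.items()):
        stored = set(stored_roles)
        if doc_id not in current_acl:
            # Document deleted or moved in the source system but still retrievable.
            drift.append({
                "doc_id": doc_id,
                "issue": "missing_in_source",
                "over_granted": sorted(stored),
                "under_granted": [],
            })
            continue
        current = set(current_acl[doc_id])
        if stored != current:
            drift.append({
                "doc_id": doc_id,
                "issue": "acl_mismatch",
                "over_granted": sorted(stored - current),   # roles that should no longer see it
                "under_granted": sorted(current - stored),  # roles that should now see it
            })
    return drift


# Hypothetical exports from the vector store and the source system.
stored = {
    "handbook": ["employee", "hr"],
    "hr-file-17": ["employee", "hr"],
    "old-contract": ["legal"],
}
current = {
    "handbook": ["employee", "hr"],
    "hr-file-17": ["hr"],
}

report = find_permission_drift(stored, current)
for entry in report:
    print(entry)
```

Entries with a non-empty `over_granted` list are the ones that correspond to potential leaks and are the natural candidates to re-ingest first.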