Pentesting Across the Stack: Infrastructure, LLM Guidance, Autonomous Loops

Why reaper, pentestgpt, and pentagi together deliver complete attack-surface coverage across traditional infrastructure and frontier model layers.

Frontier model deployments mix traditional infrastructure (networks, containers, APIs, databases) with LLM-based services. No single pentest tool covers both layers.

Reaper brings infrastructure expertise. Pentestgpt applies LLM reasoning to guide the workflow. Pentagi automates full-cycle exploitation. Together in ARX, they cover the entire stack and correlate findings across the infrastructure and frontier-model layers in ways no single tool can.

[Figure: Full-stack pentest. The target (infra + models) is assessed by reaper (infrastructure scanning), pentestgpt (LLM-guided reasoning), and pentagi (autonomous recon → exploit). Findings are correlated in ARX: network vulns, model reasoning, autonomous chains.]

Reaper's Infrastructure Pentest

Reaper (Apache 2.0) is the traditional pentest layer. It scans networks, hosts, APIs, and services for CVEs, misconfigurations, credential exposure, and exploitation chains. Reaper finds the kernel vulnerability, the unpatched database, the exposed API endpoint.

In ARX's stack, reaper establishes what is vulnerable on the infrastructure side, for example where an attacker could gain code execution on your frontier model's serving infrastructure. It is a familiar tool category, but the findings matter most when they are correlated with what is running on top of that infrastructure.

Pentestgpt's LLM-Guided Reasoning

Pentestgpt applies language model reasoning to guide pentest workflows. Given reaper's findings (e.g., "SQL injection in user endpoint"), pentestgpt reasons about implications: How does this SQL injection affect frontier model inference? Can an attacker exfiltrate training data? Can they poison model inputs?

Pentestgpt doesn't itself exploit; it interprets. It bridges reaper's infrastructure findings with frontier model risk semantics. Without pentestgpt, you have a list of CVEs. With pentestgpt, you have prioritized exploitation paths that matter for your specific frontier model deployment.

Pentagi's Autonomous Pentest Agent

Pentagi (shipping in ARX today) is the autonomous red-team agent. It runs recon, vulnerability detection, exploitation, and post-exploitation in one loop, and it operates autonomously (within ARX's governance gates) to discover and validate attack chains.

Unlike reaper (linear scan-identify-exploit) or pentestgpt (reasoning), pentagi loops automatically: it finds a vulnerability, exploits it, uses the resulting access to find the next vulnerability, and chains the exploits together. In doing so it validates that an exploit chain is real, not just theoretically possible.
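The loop described above can be sketched in a few lines. This is an illustrative toy, not pentagi's implementation: `run_loop`, `Step`, `find_vuln`, `exploit`, `authorized`, and the `TOY_ENV` map are all hypothetical names invented for this example, and the "exploits" are simulated lookups. The point it shows is that a step joins the chain only after the exploit actually succeeds, and that every step passes a governance check first.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Step:
    target: str  # foothold the step started from
    vuln: str    # vulnerability that was exploited there


def run_loop(
    start: str,
    find_vuln: Callable[[str], Optional[str]],
    exploit: Callable[[str, str], Optional[str]],
    authorized: Callable[[str, str], bool],
    max_depth: int = 5,
) -> list:
    """Find a vuln, exploit it, pivot to the new foothold, repeat.

    Returns only the steps whose exploit succeeded, i.e. the validated chain.
    """
    chain = []
    foothold = start
    for _ in range(max_depth):
        vuln = find_vuln(foothold)
        if vuln is None:
            break  # nothing left to find from this foothold
        if not authorized(foothold, vuln):
            break  # governance gate (hypothetical): stop without exploiting
        next_foothold = exploit(foothold, vuln)
        if next_foothold is None:
            break  # exploit failed: theoretical only, not added to the chain
        chain.append(Step(foothold, vuln))
        foothold = next_foothold
    return chain


# Simulated environment (assumption): host -> (vuln, host reached by exploiting it)
TOY_ENV = {
    "web-api": ("sql-injection", "db-host"),
    "db-host": ("stored-credentials", "model-server"),
}


def find_vuln(foothold: str) -> Optional[str]:
    entry = TOY_ENV.get(foothold)
    return entry[0] if entry else None


def exploit(foothold: str, vuln: str) -> Optional[str]:
    return TOY_ENV[foothold][1]


chain = run_loop("web-api", find_vuln, exploit, authorized=lambda t, v: True)
blocked = run_loop("web-api", find_vuln, exploit, authorized=lambda t, v: False)
print([(s.target, s.vuln) for s in chain])
# [('web-api', 'sql-injection'), ('db-host', 'stored-credentials')]
```

With the gate denying every step, `blocked` is an empty list: nothing is exploited, so nothing is reported as a validated chain.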

Reaper finds infra vulns. Pentestgpt reasons about model implications. Pentagi validates exploitation chains across both layers. Together they cover the whole stack, not just infrastructure and not just models.

How They Layer Together

In ARX, reaper, pentestgpt, and pentagi run in parallel, with findings correlated:

Reaper scans: Infrastructure vulnerabilities (CVEs, misconfigs, exposed endpoints).

Pentestgpt reasons: What do those infrastructure vulns mean for frontier models? What exploitation paths matter most?

Pentagi auto-loops: Discovers vulnerabilities at all layers, chains exploits, confirms real-world impact.

When all three surface the same vulnerability (reaper finds the SQL injection, pentestgpt flags it as high-impact for models, pentagi chains it with credential theft), ARX escalates with high confidence. When they diverge (reaper finds it, pentestgpt flags it, but pentagi can't chain it), ARX downgrades severity, because the finding has not been validated as a real exploit chain.
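The escalate/downgrade rule above can be written as a small decision function. This is a minimal sketch under stated assumptions, not ARX's actual logic or API: `Signals`, `Verdict`, and `correlate` are hypothetical names, and the `REVIEW` fallback for combinations the text does not describe is an assumption added so the function is total.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ESCALATE = "escalate"    # all three tools agree
    DOWNGRADE = "downgrade"  # found and flagged, but no validated chain
    REVIEW = "review"        # assumption: any other combination goes to a human


@dataclass(frozen=True)
class Signals:
    reaper_found: bool            # reaper reported the vulnerability
    pentestgpt_high_impact: bool  # pentestgpt flagged it as high-impact for models
    pentagi_chained: bool         # pentagi built a working exploit chain through it


def correlate(s: Signals) -> Verdict:
    if s.reaper_found and s.pentestgpt_high_impact:
        # The two cases described in the text differ only in whether
        # pentagi could validate the chain.
        return Verdict.ESCALATE if s.pentagi_chained else Verdict.DOWNGRADE
    return Verdict.REVIEW


print(correlate(Signals(True, True, True)))   # Verdict.ESCALATE
print(correlate(Signals(True, True, False)))  # Verdict.DOWNGRADE
```

Keeping the rule as a pure function over three booleans makes it easy to audit and to unit-test each combination.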

Why All Three Matter

Reaper alone finds infrastructure vulns but has no semantic understanding of frontier model implications. Pentestgpt alone reasons but needs infrastructure telemetry to ground its reasoning. Pentagi alone auto-loops but, being agent-focused, might miss traditional infra angles. Each solves a distinct problem:

Reaper: What's vulnerable on the infrastructure layer?

Pentestgpt: Why does that matter for frontier models?

Pentagi: Can we exploit it end-to-end?

Together, they answer all three questions. ARX governs every phase: no reaper scan runs unattended, no pentestgpt reasoning escapes audit, no pentagi exploit executes without authorization.
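One way to picture the "no exploit executes without authorization" rule is a gate that refuses to run an action unless a signature over its scope verifies. The sketch below is an assumption-laden illustration, not ARX's mechanism: the function names and the scope fields are hypothetical, and it uses a shared-key HMAC from Python's standard library for brevity, where a production system would more plausibly use asymmetric signatures.

```python
import hashlib
import hmac
import json
from typing import Callable


def sign_authorization(key: bytes, scope: dict) -> str:
    """Sign a canonical JSON encoding of the scope (hypothetical artifact format)."""
    payload = json.dumps(scope, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_authorized(key: bytes, scope: dict, signature: str) -> bool:
    expected = sign_authorization(key, scope)
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected, signature)


def run_gated(key: bytes, scope: dict, signature: str, action: Callable[[], str]) -> str:
    """Run the action only if the signature matches this exact scope."""
    if not is_authorized(key, scope, signature):
        raise PermissionError("missing or invalid authorization artifact")
    return action()


key = b"demo-key-not-for-production"
scope = {"tool": "pentagi", "target": "staging-cluster"}  # illustrative fields
signature = sign_authorization(key, scope)

print(run_gated(key, scope, signature, lambda: "campaign started"))

# Changing the scope after signing invalidates the artifact.
widened = {"tool": "pentagi", "target": "production-cluster"}
try:
    run_gated(key, widened, signature, lambda: "campaign started")
except PermissionError as err:
    print("refused:", err)
```

Because the signature covers the whole scope, an artifact issued for one target cannot be reused against another.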

Status Today

Pentagi ships in ARX today. Reaper and pentestgpt are currently deferred and are provisioned on customer request. Email mershard@arxsec.io to discuss your pentest requirements across infrastructure and frontier models. We'll provision all three under ARX governance within 24 hours.

Getting Started

Organizations running reaper, pentestgpt, or pentagi get a free ARX seat. ARX wraps all three in hard governance gates: signed authorization artifacts (no unattended campaigns), immutable audit trails, and unified policies. Email mershard@arxsec.io to scope your full-stack pentest requirements.
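For readers unfamiliar with how an audit trail can be made tamper-evident, a common technique is a hash chain, where each entry commits to the hash of the one before it. The sketch below illustrates that general technique only. It is not a description of how ARX stores its audit trail, and `append_entry`, `verify`, and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event: dict, prev_hash: str) -> str:
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: list, event: dict) -> None:
    """Append an event whose hash covers both the event and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)})


def verify(log: list) -> bool:
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"tool": "reaper", "action": "scan"})
append_entry(log, {"tool": "pentagi", "action": "exploit"})
print(verify(log))  # True

log[0]["event"]["action"] = "nothing happened"  # tamper with history
print(verify(log))  # False
```

A hash chain detects edits after the fact. It does not by itself prevent them, so real deployments typically also anchor the latest hash somewhere the log's writer cannot modify.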

— Mershard J.B. Frierson, Founder · ARX · mershard@arxsec.io · 945-372-8711
