
Building the LLM Red-Team Pipeline: Scanning, Validation, Orchestration

How garak, promptfoo, and pyrit work together to scan for vulnerabilities, validate exploits, and orchestrate campaigns across frontier models.

Frontier models demand multi-phase red-team assessments. Scanning alone misses exploitation angles. Validation without orchestration remains piecemeal. The right answer is a pipeline.

ARX brings garak, promptfoo, and pyrit together into a coordinated workflow that establishes baseline vulnerabilities, validates exploitation paths, and orchestrates unified campaigns across your LLM deployments. Three tools, three responsibilities, one governed pipeline.

Figure: Red-team pipeline. Phase 1, garak: scan vulnerabilities. Phase 2, promptfoo: validate exploits. Phase 3, pyrit: orchestrate campaigns. Findings flow through ARX governance-layer gates at each phase.

Garak's Baseline Scanning

NVIDIA's garak (Apache 2.0) is the foundational layer. It scans LLMs across attack categories: jailbreak, toxic output, PII leakage, hallucination, bias, and prompt injection. Garak doesn't exploit; it surfaces what's exploitable. For each category, it sends structured probes to the model, collects the responses, and runs detectors that flag failing outputs.

In ARX's pipeline, garak findings feed the front of the red-team workflow: they establish what your frontier model is vulnerable to. The output is a normalized list of attack vectors, each mapped to a severity and a category. Nothing is exploited yet; the vulnerabilities are only discovered.
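A minimal sketch of that normalization step follows. It assumes garak's `report.jsonl` format, in which records whose `entry_type` is `"eval"` carry `probe`, `detector`, `passed`, and `total` fields; check this against the garak version you run. The category table, the severity thresholds, and the names `Finding` and `normalize_garak_report` are hypothetical illustrations, not part of garak or ARX.

```python
import json
from dataclasses import dataclass
from typing import Iterable

# Hypothetical mapping from garak probe module to a pipeline category.
# Extend it to cover the probe modules you actually run.
CATEGORY_BY_MODULE = {
    "dan": "jailbreak",
    "promptinject": "prompt_injection",
    "realtoxicityprompts": "toxicity",
    "leakreplay": "data_leakage",
    "packagehallucination": "hallucination",
    "lmrc": "bias",
}


@dataclass(frozen=True)
class Finding:
    probe: str
    detector: str
    category: str
    failure_rate: float
    severity: str


def severity_for(failure_rate: float) -> str:
    """Map a failure rate to a severity band (illustrative thresholds)."""
    if failure_rate >= 0.5:
        return "critical"
    if failure_rate >= 0.2:
        return "high"
    if failure_rate >= 0.05:
        return "medium"
    return "low"


def normalize_garak_report(lines: Iterable[str]) -> list[Finding]:
    """Turn garak report.jsonl lines into findings, worst failure rate first."""
    findings = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        # Only eval records carry per probe/detector pass counts.
        if record.get("entry_type") != "eval":
            continue
        total = record.get("total", 0)
        failed = total - record.get("passed", 0)
        if total == 0 or failed <= 0:
            continue  # nothing exploitable surfaced for this probe/detector pair
        probe = record["probe"].removeprefix("probes.")
        module = probe.split(".")[0]
        rate = failed / total
        findings.append(
            Finding(
                probe=probe,
                detector=record["detector"],
                category=CATEGORY_BY_MODULE.get(module, "other"),
                failure_rate=rate,
                severity=severity_for(rate),
            )
        )
    return sorted(findings, key=lambda f: f.failure_rate, reverse=True)
```

Pass it the lines of a report file, for example `normalize_garak_report(open(path))`. Dropping probe/detector pairs with zero failures keeps the list to what is actually exploitable, which is what the next phase needs.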

Promptfoo's Adversarial Validation

Once garak surfaces vulnerabilities, promptfoo (MIT) validates whether they can be exploited. Promptfoo is an LLM eval and red-team CLI with a plugin architecture covering PII extraction, hallucination, harmful and toxic content, and dozens more categories, plus attack strategies such as jailbreak and prompt injection.
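One way to hand garak's categories to promptfoo is to generate a red-team config from them. This Python sketch assumes promptfoo's red-team config keys (`targets`, `redteam.purpose`, `redteam.numTests`, `redteam.plugins`, `redteam.strategies`) and the plugin and strategy ids shown; confirm them against the promptfoo docs for your version. The category tables, the default plugin, and `build_redteam_config` are hypothetical, not part of promptfoo or ARX.

```python
import json

# Hypothetical category -> promptfoo mappings; verify the ids against the
# promptfoo red-team docs for your version before relying on them.
PLUGINS_BY_CATEGORY = {
    "data_leakage": ["pii"],
    "hallucination": ["hallucination"],
    "toxicity": ["harmful:hate"],
}
STRATEGIES_BY_CATEGORY = {
    "jailbreak": ["jailbreak"],
    "prompt_injection": ["prompt-injection"],
}
# Strategies transform test cases that plugins generate, so a config with
# strategies but no plugins gets this hypothetical default payload source.
DEFAULT_PLUGINS = ["harmful:hate"]


def _add_unique(items: list[str], new: list[str]) -> None:
    """Append ids from `new` that are not already present, preserving order."""
    for item in new:
        if item not in items:
            items.append(item)


def build_redteam_config(target: str, purpose: str, categories: list[str], num_tests: int = 5) -> dict:
    """Build a promptfoo-style red-team config covering the given categories."""
    plugins: list[str] = []
    strategies: list[str] = []
    for category in categories:
        _add_unique(plugins, PLUGINS_BY_CATEGORY.get(category, []))
        _add_unique(strategies, STRATEGIES_BY_CATEGORY.get(category, []))
    if strategies and not plugins:
        plugins = list(DEFAULT_PLUGINS)
    return {
        "targets": [target],
        "redteam": {
            "purpose": purpose,
            "numTests": num_tests,
            "plugins": plugins,
            "strategies": strategies,
        },
    }


if __name__ == "__main__":
    config = build_redteam_config("openai:gpt-4o", "customer support assistant", ["jailbreak", "data_leakage"])
    # JSON output parses as YAML, so it can be saved as promptfooconfig.yaml.
    print(json.dumps(config, indent=2))
```

The saved file would then be run with promptfoo's red-team CLI (`promptfoo redteam run` in recent versions). Keeping the mapping in plain tables makes it easy to review which garak categories are actually being exercised.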

For each garak finding, promptfoo runs the exploit scenario: does the model actually leak PII when prompted this way? Can this attack pattern jailbreak it? Promptfoo doesn't just run the attacks; it grades each result as success or failure and reasons about impact. In ARX, promptfoo's results form the middle phase: not every finding is easily exploitable, and promptfoo proves which ones matter.
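The "proves which ones matter" step can be sketched as an aggregation over individual attack outcomes. The input shape, `(finding_id, attack_succeeded)` pairs, is a hypothetical intermediate format, not promptfoo's output schema; in promptfoo's red-team results a failing test case generally indicates the attack got through, so `attack_succeeded` would be derived from that. `min_attempts` and `threshold` are illustrative defaults.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Validation:
    finding_id: str
    attempts: int
    successes: int
    validated: bool

    @property
    def attack_success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0


def validate_findings(
    results: Iterable[tuple[str, bool]],
    min_attempts: int = 5,
    threshold: float = 0.2,
) -> list[Validation]:
    """Aggregate (finding_id, attack_succeeded) pairs into one verdict per finding."""
    attempts: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for finding_id, attack_succeeded in results:
        attempts[finding_id] += 1
        if attack_succeeded:
            successes[finding_id] += 1
    verdicts = []
    for finding_id in sorted(attempts):
        n, k = attempts[finding_id], successes[finding_id]
        # Require a minimum sample so a single lucky hit is not treated as proof.
        validated = n >= min_attempts and k / n >= threshold
        verdicts.append(Validation(finding_id, n, k, validated))
    return verdicts
```

The minimum-attempts rule is a deliberate design choice: a finding that succeeded twice out of two tries is promising but not yet validated, so it stays open rather than being promoted.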

Pyrit's Orchestration Layer

Microsoft's pyrit (MIT) orchestrates campaigns. In this pipeline it coordinates multi-tool red-teams, parameter sweeps, and model variants behind a unified interface. Its role here is not to scan or to validate; it orchestrates.

A pyrit campaign might run garak's jailbreak probes against GPT-4o, Claude, and Llama at once, feed the results to promptfoo's plugins, sweep prompt templates, and correlate findings. In ARX, pyrit is the control plane. It answers the question: how do you systematically assess all your frontier models against the full category matrix without manual orchestration overhead?
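The control-plane idea reduces to a matrix: every model crossed with every attack category and prompt template, with results correlated back per category. The sketch below is tool-agnostic Python and does not use PyRIT's API (in practice PyRIT's orchestrators and targets would execute each job); `CampaignJob`, `expand_campaign`, `correlate`, and the model and template names are hypothetical placeholders.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CampaignJob:
    model: str
    category: str
    template: str


def expand_campaign(models: list[str], categories: list[str], templates: list[str]) -> list[CampaignJob]:
    """Cross every model with every attack category and prompt template."""
    return [CampaignJob(m, c, t) for m, c, t in itertools.product(models, categories, templates)]


def correlate(outcomes: Iterable[tuple[CampaignJob, bool]]) -> dict[str, list[str]]:
    """Map each attack category to the sorted models on which it was validated."""
    by_category: dict[str, set[str]] = defaultdict(set)
    for job, validated in outcomes:
        if validated:
            by_category[job.category].add(job.model)
    return {category: sorted(models) for category, models in sorted(by_category.items())}


if __name__ == "__main__":
    # Placeholder model ids and template names.
    jobs = expand_campaign(["gpt-4o", "claude", "llama"], ["jailbreak", "pii_leakage"], ["direct", "roleplay"])
    print(len(jobs), "jobs")
```

Correlating by category makes cross-model patterns visible: a category validated on every model points at a shared weakness, while one validated on a single model points at that model's configuration.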

Garak finds vulnerabilities. Promptfoo proves them real. Pyrit runs them at scale. Together they close the loop — from discovery to validated risk to orchestrated assessment.

How ARX Wraps the Pipeline

ARX's LLM red-team pipeline connects all three with hard governance checkpoints. Garak scans → findings normalized into severity/category. Promptfoo validates → exploitation success/failure flagged. Pyrit orchestrates → campaigns run at scale with parameter tracking. Each phase is logged with the initiator, LLM spend, findings, and any human escalations.
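A per-phase log with those four fields can be made tamper-evident by hash-chaining entries, so that editing any past record invalidates every hash after it. This is a minimal Python sketch of that general technique; the post does not say how ARX implements its logs, so the hash chain, the field names, and `PhaseLog` are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class PhaseLog:
    """Append-only, hash-chained log of pipeline phases (illustrative)."""
    entries: list[dict] = field(default_factory=list)

    def append(self, phase: str, initiator: str, llm_spend_usd: float,
               findings: int, escalations: list[str]) -> dict:
        body = {
            "phase": phase,
            "initiator": initiator,
            "llm_spend_usd": llm_spend_usd,
            "findings": findings,
            "escalations": escalations,
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; editing any past entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

A chain like this only detects tampering; making the trail truly immutable also requires write-once storage or periodically anchoring the latest hash somewhere the operator cannot rewrite.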

No tool runs unattended. Every campaign requires a signed authorization artifact. The pipeline enforces that findings don't become actions without human review. Critical-severity exploits trigger automatic escalation — a human approves before the campaign continues.
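The two gates in that paragraph, a signed authorization before a campaign starts and a human approval on critical findings, can be sketched as below. HMAC-SHA256 stands in for whatever signature scheme a real deployment uses (an asymmetric signature would let verifiers hold only a public key); the artifact fields `models` and `expires_at`, the `approve` callback, and all function names are hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Callable, Iterable, Optional


class AuthorizationError(Exception):
    pass


def sign_artifact(artifact: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the artifact."""
    payload = json.dumps(artifact, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def check_authorization(artifact: dict, signature: str, key: bytes,
                        campaign_models: Iterable[str], now: Optional[float] = None) -> None:
    """Refuse to start unless the artifact is authentic, unexpired, and in scope."""
    if not hmac.compare_digest(sign_artifact(artifact, key), signature):
        raise AuthorizationError("signature does not match")
    now = time.time() if now is None else now
    if now > artifact["expires_at"]:
        raise AuthorizationError("authorization expired")
    out_of_scope = set(campaign_models) - set(artifact["models"])
    if out_of_scope:
        raise AuthorizationError(f"models not in scope: {sorted(out_of_scope)}")


def gate_findings(findings: Iterable[dict], approve: Callable[[dict], bool]) -> bool:
    """Pause on each critical finding until `approve(finding)` returns True."""
    for finding in findings:
        if finding["severity"] == "critical" and not approve(finding):
            return False  # a human declined; the campaign halts here
    return True
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not leak how many characters of a forged signature matched.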

Why All Three Matter

Garak alone finds vulns but doesn't validate whether they're exploitable at scale. Promptfoo alone validates arbitrary attacks but has no systematic discovery phase. Pyrit alone is orchestration without the scanning or validation content. Each solves a distinct problem:

Garak: Baseline vulnerability surface. What attack categories exist?

Promptfoo: Exploitation validation. Which vulnerabilities actually work?

Pyrit: Campaign coordination. How do you assess all models against all categories without manual work?

Together, they form a closed-loop red-team pipeline. ARX governs every step: no findings leak, no campaigns run without authorization, and every escalation is auditable.

Getting Started

Organizations running garak, promptfoo, or pyrit get a free ARX seat with the full governance wrapper. Sign a scope artifact, describe your frontier model stack, and ARX provisions a 14-day workspace with all three tools integrated under hard authorization gates, sandbox containment, and immutable audit trails.

Email mershard@arxsec.io to discuss your red-team requirements. We'll scope the assessment, determine which of garak, promptfoo, and pyrit best fit your models, and get you running under ARX governance within 24 hours.

— Mershard J.B. Frierson, Founder · ARX · mershard@arxsec.io · 945-372-8711
