v0.2 · active research · Pentest Framework

Pentest Agent Framework — LLM-Orchestrated Engagements

A framework that orchestrates multiple LLM-driven agents through the standard phases of a pentest engagement: reconnaissance, vulnerability identification, exploitation, post-exploitation and reporting. Each phase is handled by an autonomous agent with a constrained toolset and a verification step; the orchestrator routes findings between phases and decides when human review is needed.
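The routing described above can be sketched as a sequential pipeline. Each phase receives the previous phase's findings and may flag its output for human review, which halts automatic routing. This is a minimal illustration under stated assumptions: the names `PhaseResult`, `Orchestrator` and `needs_human_review` are hypothetical, and real phases would be LLM agents rather than plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PhaseResult:
    findings: list[dict]
    needs_human_review: bool = False  # hypothetical flag a phase sets to pause routing


# Hypothetical: a phase is modelled as a function from upstream findings to a result.
Phase = Callable[[list[dict]], PhaseResult]


@dataclass
class Orchestrator:
    phases: list[tuple[str, Phase]]
    review_queue: list[tuple[str, list[dict]]] = field(default_factory=list)

    def run(self) -> list[dict]:
        findings: list[dict] = []
        for name, phase in self.phases:
            result = phase(findings)
            if result.needs_human_review:
                # Stop routing to later phases and park the findings for a human.
                self.review_queue.append((name, result.findings))
                return result.findings
            findings = result.findings  # route this phase's output to the next phase
        return findings
```

Halting the whole pipeline on a review flag is the simplest policy. A real orchestrator might instead park only the flagged finding and keep routing the rest.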

LLM agents · recon → exploitation · verified findings · CERT.pl disclosure
Engagements: 5+ authorised pentests run through the framework
Disclosed CVEs: 4+, via CERT.pl, more pending
Agents per engagement: 6 (recon, vuln-id, exploit, post-ex, verify, report)
Tool budget: bounded, with per-phase rate limits and cost caps
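One way to enforce a bounded per-phase tool budget is to combine a token bucket for the rate limit with a running cost total that is checked against a cap. The sketch below assumes that design. `PhaseBudget`, `charge` and `BudgetExceeded` are hypothetical names, and the cost unit (for example USD per tool or model call) is left to the caller.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a tool call would break the phase's rate limit or cost cap."""


@dataclass
class PhaseBudget:
    rate_per_sec: float  # sustained tool calls per second (token refill rate)
    burst: int           # bucket capacity: calls allowed back-to-back
    cost_cap: float      # maximum cumulative spend for this phase
    spent: float = 0.0
    _tokens: float = field(init=False, default=0.0)
    _last: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self._tokens = float(self.burst)  # start with a full bucket
        self._last = time.monotonic()

    def charge(self, cost: float) -> None:
        """Admit one tool call costing `cost`, or raise BudgetExceeded."""
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate_per_sec)
        self._last = now
        if self.spent + cost > self.cost_cap:
            raise BudgetExceeded(f"cost cap {self.cost_cap} would be exceeded")
        if self._tokens < 1:
            raise BudgetExceeded("rate limit: no tokens available")
        self._tokens -= 1
        self.spent += cost
```

Rejecting an over-budget call, rather than sleeping until tokens refill, leaves the orchestrator free to wait, reprioritise or escalate.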

Phase Pipeline

  1. Reconnaissance. Target profiling: open ports, services, software versions, TLS fingerprints and web-stack identification. Wraps nmap, nuclei, httpx and whatweb behind a uniform LLM-callable interface. Output: structured asset inventory with confidence scores.
  2. Vulnerability identification. Cross-references the inventory against CVE databases (NVD, CISA KEV, CIRCL vulnerability-lookup) and checks for misconfigurations the LLM has been trained to recognise (default credentials, exposed admin interfaces, known weak crypto). Output: ranked candidate findings.
  3. Exploitation. For each candidate, the LLM chooses between (a) using a public PoC after sandbox verification, (b) writing custom exploit code, or (c) flagging the candidate for human approval before any active action. Defaults to (c) for any candidate with a CVSS score ≥ 8.0.
  4. Post-exploitation. Lateral movement is gated by an explicit human opt-in per engagement. The default mode is “proof-of-shell only”: confirm code execution, collect evidence, then stop.
  5. Verification. A separate agent, running a different model from the one that performed the exploitation, replays the finding against a clean target to confirm it is reproducible. This catches the failure mode in which the LLM hallucinates a vulnerability.
  6. Reporting. Markdown writeup with reproducer scripts, screenshots, CVSS scoring, remediation guidance. CERT.pl disclosure ticket drafted automatically for any unpatched CVE-class finding.
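Normalising scanner output into the structured asset inventory might look like the following for nmap. Its `-oX` XML output records each open port's service name, product and version, plus a `conf` attribute (0 to 10) for service-detection confidence. Mapping that `conf` value onto a 0.0 to 1.0 confidence score is an assumption made for illustration, as are the `Asset` and `parse_nmap_xml` names. The source does not show the framework's actual wrappers for nmap, nuclei, httpx and whatweb.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    host: str
    port: int
    protocol: str
    service: str
    product: str
    version: str
    confidence: float  # hypothetical 0.0-1.0 score derived from nmap's conf


def parse_nmap_xml(xml_text: str) -> list[Asset]:
    """Turn nmap -oX output into one Asset per open port."""
    root = ET.fromstring(xml_text)
    assets: list[Asset] = []
    for host in root.iter("host"):
        # Prefer the IP address; nmap may also list a MAC address.
        addr = next(
            (a.get("addr") for a in host.findall("address")
             if a.get("addrtype") in ("ipv4", "ipv6")),
            None,
        )
        if addr is None:
            continue
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue
            svc = port.find("service")
            attrs = svc.attrib if svc is not None else {}
            assets.append(Asset(
                host=addr,
                port=int(port.get("portid", "0")),
                protocol=port.get("protocol", ""),
                service=attrs.get("name", "unknown"),
                product=attrs.get("product", ""),
                version=attrs.get("version", ""),
                # nmap reports service-detection confidence as conf="0".."10".
                confidence=int(attrs.get("conf", "0")) / 10,
            ))
    return assets
```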
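Candidate findings could be ranked with a simple heuristic: anything listed in CISA's Known Exploited Vulnerabilities catalogue goes first, and the rest are ordered by CVSS base score. The KEV JSON feed lists entries under `vulnerabilities`, each with a `cveID` field. The ranking rule itself and the `Candidate`, `load_kev_ids` and `rank_candidates` names are assumptions, not the framework's documented logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    cve_id: str
    asset: str
    cvss: float


def load_kev_ids(kev_feed: dict) -> set[str]:
    """Extract CVE IDs from a parsed CISA KEV JSON feed."""
    return {entry["cveID"] for entry in kev_feed.get("vulnerabilities", [])}


def rank_candidates(cands: list[Candidate], kev_ids: set[str]) -> list[Candidate]:
    # Sort key: KEV-listed first (False sorts before True), then CVSS descending.
    return sorted(cands, key=lambda c: (c.cve_id not in kev_ids, -c.cvss))
```

Putting KEV entries ahead of higher-scoring findings reflects evidence of in-the-wild exploitation. Whether that should outrank raw severity is a policy choice.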
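The exploitation policy can be expressed as a gate that overrides whatever action the LLM proposes. The CVSS ≥ 8.0 threshold comes from the phase description. Falling back to human approval when a public PoC has not passed sandbox verification is an added assumption, and `Action` and `gate` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    PUBLIC_POC = "a"      # run a public PoC that passed sandbox verification
    CUSTOM_EXPLOIT = "b"  # LLM-written exploit code
    HUMAN_APPROVAL = "c"  # stop and ask a human before any active action


HIGH_SEVERITY = 8.0  # candidates at or above this CVSS score default to (c)


def gate(proposed: Action, cvss: float, poc_sandbox_verified: bool) -> Action:
    """Return the action actually allowed, overriding the LLM's proposal where policy requires."""
    if cvss >= HIGH_SEVERITY:
        return Action.HUMAN_APPROVAL
    # Assumption: an unverified public PoC is not eligible for option (a).
    if proposed is Action.PUBLIC_POC and not poc_sandbox_verified:
        return Action.HUMAN_APPROVAL
    return proposed
```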
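A default-deny allowlist is one way to enforce the proof-of-shell default, with lateral-movement actions unlocked only by an explicit per-engagement flag. The action names and the `Engagement.lateral_movement_opt_in` field are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical action names allowed in the default "proof-of-shell only" mode.
PROOF_OF_SHELL_ACTIONS = frozenset({"confirm_exec", "collect_evidence", "stop"})
# Hypothetical action names that count as lateral movement.
LATERAL_ACTIONS = frozenset({"enumerate_network", "pivot", "reuse_credentials"})


@dataclass(frozen=True)
class Engagement:
    name: str
    lateral_movement_opt_in: bool = False  # must be set explicitly per engagement


def is_permitted(engagement: Engagement, action: str) -> bool:
    if action in PROOF_OF_SHELL_ACTIONS:
        return True
    if action in LATERAL_ACTIONS:
        return engagement.lateral_movement_opt_in
    return False  # anything unrecognised is denied
```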
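The verification step can be sketched as a check that refuses to run with the exploiting model and then counts successful replays. `replay` is a hypothetical callable, assumed to provision a clean target, run the reproducer and return whether the finding reproduced. The default of three successes out of three attempts is an illustrative assumption, not a documented threshold.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    finding_id: str
    exploit_model: str  # model used by the exploitation agent


def verify(
    finding: Finding,
    verifier_model: str,
    replay: Callable[[], bool],  # hypothetical: clean target + reproducer run
    attempts: int = 3,
    required: int = 3,
) -> bool:
    if verifier_model == finding.exploit_model:
        raise ValueError("verifier must use a different model from the exploiting agent")
    # Call replay() once per attempt and count the runs that reproduced the finding.
    successes = sum(1 for _ in range(attempts) if replay())
    return successes >= required
```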
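For the CVSS scoring in reports, the base score can be computed directly from a vector string. This sketch follows the published CVSS v3.1 base-score equations, including the specification's float-safe `Roundup`. It covers base metrics only (no temporal or environmental scores), and the source does not say whether the framework computes scores this way or takes them from NVD.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on whether Scope is Unchanged or Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, robust to float error."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute a CVSS v3.1 base score from e.g. 'CVSS:3.1/AV:N/AC:L/...'."""
    metrics = dict(p.split(":") for p in vector.split("/") if not p.startswith("CVSS"))
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    # Impact Sub-Score from Confidentiality, Integrity, Availability.
    iss = 1 - math.prod(1 - WEIGHTS["CIA"][metrics[m]] for m in ("C", "I", "A"))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (8.22 * WEIGHTS["AV"][metrics["AV"]] * WEIGHTS["AC"][metrics["AC"]]
                      * pr * WEIGHTS["UI"][metrics["UI"]])
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))
```

For example, `base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")` returns 9.8, which the exploitation phase's ≥ 8.0 rule would route to human approval.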

Notable Engagements

What’s Next