v0.2 · active research
Pentest Agent Framework — LLM-Orchestrated Engagements
A framework that orchestrates multiple LLM-driven agents through the standard phases of a pentest engagement — reconnaissance, vulnerability identification, exploitation, post-exploitation, reporting. Each phase is an autonomous agent with a constrained toolset and a verification step; the orchestrator routes findings between phases and decides when human review is needed.
- Engagements: 5+ authorised pentests run through the framework
- Disclosed CVEs: 4+, via CERT.pl, more pending
- Agents per engagement: 6 (recon, vuln-id, exploit, post-ex, verify, report)
- Tool budget: bounded, with per-phase rate limits and cost caps
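A "bounded" tool budget could be enforced per phase with a hard call cap, a hard spend cap, and a token-bucket rate limit. This is a minimal sketch under assumptions, not the framework's implementation: `PhaseBudget`, `BudgetExceeded`, and every parameter are hypothetical, and the current time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a tool call would cross a phase's hard cap."""


@dataclass
class PhaseBudget:
    # Hypothetical per-phase limits: hard caps plus a token-bucket rate limit.
    max_calls: int        # hard cap on tool calls for the phase
    max_cost_usd: float   # hard cap on spend (LLM tokens, paid APIs, ...)
    rate_per_s: float     # bucket refill rate, in calls per second
    burst: int            # bucket capacity
    calls: int = 0
    spent_usd: float = 0.0
    _tokens: float = field(init=False, default=0.0)
    _last: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self._tokens = float(self.burst)  # start with a full bucket

    def try_spend(self, cost_usd: float, now: float) -> bool:
        """Charge one tool call at time `now` (seconds).

        Returns False if the rate limit says "wait" (nothing is charged);
        raises BudgetExceeded if a hard cap would be crossed.
        """
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded("call cap reached")
        if self.spent_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost cap reached")
        # Refill the bucket for the elapsed time, never above `burst`.
        elapsed = max(0.0, now - self._last)
        self._tokens = min(float(self.burst), self._tokens + elapsed * self.rate_per_s)
        self._last = now
        if self._tokens < 1.0:
            return False
        self._tokens -= 1.0
        self.calls += 1
        self.spent_usd += cost_usd
        return True
```

A phase agent would call `try_spend` before each tool invocation, back off and retry on `False`, and hand control back to the orchestrator on `BudgetExceeded`.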
Phase Pipeline
- Reconnaissance. Target profiling: open ports, services, software versions, TLS fingerprints, web-stack identification. Wraps nmap, nuclei, httpx, and whatweb behind a uniform LLM-callable interface. Output: structured asset inventory with confidence scores.
- Vulnerability identification. Cross-reference the inventory against CVE databases (NVD, CISA KEV, CIRCL vulnerability-lookup) and check for misconfigurations the LLM has been trained to recognise (default credentials, exposed admin interfaces, known weak crypto). Output: ranked candidate findings.
- Exploitation. For each candidate, the LLM decides between (a) using a public PoC after sandbox verification, (b) writing custom exploit code, or (c) flagging for human approval before any active action. Defaults to (c) for any candidate with CVSS ≥ 8.0.
- Post-exploitation. Lateral movement is gated by explicit human opt-in per engagement. Default mode is “proof-of-shell only”: confirm code execution, collect evidence, then stop.
- Verification. A separate agent, running a different model from the one that performed the exploit, replays the finding against a clean target to confirm reproducibility. This catches the failure mode in which an LLM hallucinates a vulnerability.
- Reporting. Markdown writeup with reproducer scripts, screenshots, CVSS scoring, remediation guidance. CERT.pl disclosure ticket drafted automatically for any unpatched CVE-class finding.
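The reconnaissance phase's uniform LLM-callable interface could take the shape of one parser per wrapped tool, each emitting the same inventory row type. Here is a minimal sketch for nmap only, assuming its XML output (for example `nmap -sV -oX - <target>`), where each `<service>` element carries a 0 to 10 `conf` attribute that maps naturally onto an inventory confidence score. `Asset`, `parse_nmap_xml`, and the `PARSERS` registry are hypothetical names; adapters for nuclei, httpx, and whatweb are not shown.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    # One row of a (hypothetical) structured asset inventory.
    host: str
    port: int
    protocol: str
    service: str
    product: str | None
    version: str | None
    confidence: float  # normalised to 0.0..1.0
    source: str        # which wrapped tool produced the row


def parse_nmap_xml(xml_text: str) -> list[Asset]:
    """Turn nmap XML output into inventory rows (open ports only).

    nmap reports service-detection confidence as an integer 0..10 in the
    `conf` attribute of <service>; it is rescaled here to 0..1.
    """
    root = ET.fromstring(xml_text)
    assets: list[Asset] = []
    for host in root.iter("host"):
        # Prefer the IP address over e.g. a MAC <address> element.
        addr = next((a.get("addr") for a in host.findall("address")
                     if a.get("addrtype") in ("ipv4", "ipv6")), None)
        if addr is None:
            continue
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue
            svc = port.find("service")
            attrs = svc.attrib if svc is not None else {}
            assets.append(Asset(
                host=addr,
                port=int(port.get("portid", "0")),
                protocol=port.get("protocol", "tcp"),
                service=attrs.get("name", "unknown"),
                product=attrs.get("product"),
                version=attrs.get("version"),
                confidence=int(attrs.get("conf", "0")) / 10,
                source="nmap",
            ))
    return assets


# Hypothetical registry: nuclei/httpx/whatweb adapters would be added here.
PARSERS = {"nmap": parse_nmap_xml}
```

Because every adapter returns the same `Asset` type, the LLM would see one schema regardless of which tool ran, and rows from different tools could be merged on `(host, port)`.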
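Ranking candidate findings could start as a join between inventory rows and advisory records, followed by a sort. This sketch assumes a hypothetical heuristic (entries in CISA KEV first, then CVSS weighted by recon confidence) and exact product/version matching; `Service`, `Advisory`, `Candidate`, and `rank_candidates` are illustrative names, and real NVD matching works on CPE criteria and version ranges rather than exact strings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    # Minimal stand-in for one recon inventory row.
    host: str
    port: int
    product: str
    version: str
    confidence: float  # 0.0..1.0, from the recon phase


@dataclass(frozen=True)
class Advisory:
    # Simplified advisory record; exact version matching is an assumption.
    cve_id: str
    product: str
    affected_versions: frozenset[str]
    cvss: float
    in_kev: bool  # listed in CISA's Known Exploited Vulnerabilities catalog


@dataclass(frozen=True)
class Candidate:
    host: str
    port: int
    cve_id: str
    cvss: float
    in_kev: bool
    score: float  # hypothetical weighting: CVSS x recon confidence


def rank_candidates(services: list[Service], advisories: list[Advisory]) -> list[Candidate]:
    """Match services to advisories, then rank KEV-listed first, then by score."""
    candidates = []
    for svc in services:
        for adv in advisories:
            if (adv.product.lower() == svc.product.lower()
                    and svc.version in adv.affected_versions):
                candidates.append(Candidate(
                    host=svc.host, port=svc.port, cve_id=adv.cve_id,
                    cvss=adv.cvss, in_kev=adv.in_kev,
                    score=adv.cvss * svc.confidence,
                ))
    # Tuples compare element-wise: in_kev=True sorts above False, then score.
    return sorted(candidates, key=lambda c: (c.in_kev, c.score), reverse=True)
```

Weighting by recon confidence means a critical CVE matched against a shaky version guess does not automatically outrank a solid match; whether that trade-off is right is an open tuning question.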
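The exploitation and post-exploitation gates are the kind of rule best enforced in orchestrator code rather than in the prompt, so a model cannot talk its way past them. A sketch of such a guard follows. The CVSS ≥ 8.0 threshold, the per-engagement lateral-movement opt-in, and the proof-of-shell default come from the text; `Route`, `enforce_route`, `next_post_ex_step`, the step names, and the rule that an unverified public PoC is also escalated to a human are assumptions.

```python
from enum import Enum


class Route(Enum):
    PUBLIC_POC = "a"      # use a public PoC after sandbox verification
    CUSTOM_EXPLOIT = "b"  # LLM writes custom exploit code
    HUMAN_APPROVAL = "c"  # flag for human approval before any active action


HUMAN_GATE_CVSS = 8.0  # from the text: default to (c) at CVSS >= 8.0


def enforce_route(llm_choice: Route, cvss: float, poc_sandbox_verified: bool) -> Route:
    """Apply hard guardrails on top of the route the LLM proposed."""
    if cvss >= HUMAN_GATE_CVSS:
        return Route.HUMAN_APPROVAL
    if llm_choice is Route.PUBLIC_POC and not poc_sandbox_verified:
        # Assumption: an unverified public PoC is escalated rather than run.
        return Route.HUMAN_APPROVAL
    return llm_choice


def next_post_ex_step(shell_confirmed: bool, evidence_collected: bool,
                      lateral_opt_in: bool) -> str:
    """Proof-of-shell only by default: confirm, collect evidence, then stop."""
    if not shell_confirmed:
        return "confirm_shell"
    if not evidence_collected:
        return "collect_evidence"
    # Lateral movement only with an explicit per-engagement human opt-in.
    return "lateral_movement" if lateral_opt_in else "stop"
```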
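The verification step can be expressed as a check that refuses self-verification and requires repeated clean replays. `Finding`, `verify_finding`, the `replay_on_clean_target` callback (standing in for whatever resets the target and runs the reproducer), and the default of three attempts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    finding_id: str
    exploit_model: str  # model that produced the exploit
    reproducer: str     # reproducer script shipped with the report


def verify_finding(
    finding: Finding,
    verifier_model: str,
    replay_on_clean_target: Callable[[str], bool],
    attempts: int = 3,
) -> bool:
    """Replay a finding; refuse verification by the model that exploited it.

    `replay_on_clean_target` is expected to restore a clean target, run the
    reproducer, and report whether the claimed effect was observed.
    """
    if verifier_model == finding.exploit_model:
        raise ValueError("verifier must use a different model than the exploiter")
    # Every replay must succeed; all() stops at the first failure.
    return all(replay_on_clean_target(finding.reproducer) for _ in range(attempts))
```

Requiring every replay to succeed treats a flaky reproducer as unconfirmed, which errs towards under-reporting rather than shipping a hallucinated finding.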
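On the reporting side, severity labels can be derived mechanically from the score using the CVSS v3.x qualitative rating scale (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). The Markdown layout produced by `render_finding_md` and all helper names are hypothetical; `needs_disclosure_ticket` simply encodes the stated rule that unpatched CVE-class findings get a drafted CERT.pl ticket.

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def render_finding_md(title: str, cvss: float, reproducer_path: str, remediation: str) -> str:
    """Render one finding as a Markdown section (hypothetical layout)."""
    lines = [
        f"## {title}",
        "",
        f"- Severity: CVSS {cvss:.1f} ({cvss_severity(cvss)})",
        f"- Reproducer: `{reproducer_path}`",
        "",
        "### Remediation",
        "",
        remediation,
    ]
    return "\n".join(lines) + "\n"


def needs_disclosure_ticket(cve_class: bool, patched: bool) -> bool:
    # Rule from the text: draft a CERT.pl ticket for unpatched CVE-class findings.
    return cve_class and not patched
```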
Notable Engagements
- LibreNMS white-box pentest. Five-phase plan executed against a local LibreNMS deployment. Four vulnerabilities found and disclosed via CERT.pl case #5598022: three remote-code-execution bugs and one stored XSS. Cross-checked using the 8-model triage benchmark from the AFA project.
- TP-Link router pentest. Eight distinct vulnerabilities identified across the firmware. Disclosure in progress under the standard 90-day window.
- CUW Tychy quarterly pentests. Recurring authorised engagements against a partner organisation's infrastructure. The framework's recon and verification phases automate the tedious parts; human review stays on exploitation decisions.
- Infolab / TachoSpeed IR. Incident-response support for a partner; the framework's recon phase runs internally on the affected network to characterise blast radius.
What’s Next
- v0.3 phases: gated post-exploitation, a continuous mode for long-running engagements, and integration with the IRIS agent mesh for multi-host coordination.
- Methodology writeup: how the orchestrator decides when to start a fresh agent versus continue the existing conversation, and how the cost of bounded LLM use compares with unbounded human-time pentest spend.
- Cross-link with AFA: vulnerabilities surfaced by fuzzing flow into the pentest framework's verification phase to confirm exploitability before disclosure.