v0.3.0 · active research
AFA / Fuzzer
Autonomous Fuzzing Agent — LLM-Augmented AFL++ Campaigns
An LLM-augmented fuzzing platform that targets open-source software with AFL++ campaigns guided by AI seed generation, harness construction, and cross-model triage of crashes. Built to scale: 13 target projects in rotation, ~600 evolved seeds across the corpus, an 8-model benchmark for triage accuracy, and an automated CVE-disclosure path through CERT.pl.
- Targets in rotation: 13 (OSS libraries + network daemons)
- Evolved seeds: 598 (corpus across all targets)
- Triage models: 8 (cross-benchmarked on real crashes)
- Disclosure path: CERT.pl (automated triage → ticket)
What It Does
- Seed generation. An LLM reads each target's protocol grammar or file format and produces a starter seed set. The seeds are fed to AFL++, which evolves them via coverage-guided mutation. The LLM is re-invoked periodically with the current coverage gaps to bias future seed proposals toward unexplored code.
- Harness construction. For library targets, the LLM writes a fuzzing harness (the LLVMFuzzerTestOneInput entry point and its argument decoders) given the target's headers. The harness is then compiled with sanitiser instrumentation (ASAN + UBSAN) and integrated into AFL++ with CMPLOG and persistent mode.
- Crash triage. Every reproducible crash is analysed by an 8-model ensemble (a mix of local and cloud LLMs). Each model independently classifies severity, root-cause family (UAF, OOB, integer overflow, type confusion, etc.), and likely CVSS vector. Disagreement between models flags the case for manual review; consensus crashes auto-progress to a CERT.pl draft.
- Disclosure. Consensus high-severity crashes get a CERT.pl ticket drafted automatically with proof-of-concept, affected versions, and a suggested CVSS score. Disclosure runs on the standard 90-day clock from upstream notification.
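The routing rule described under Crash triage and Disclosure can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `Verdict` type, the `route_crash` function, the 75% agreement threshold, and the `"log_only"` outcome for consensus crashes below high severity are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One model's independent classification of a crash (hypothetical shape)."""
    model: str       # label for the triage model
    severity: str    # e.g. "high", "medium", "low"
    root_cause: str  # e.g. "UAF", "OOB", "integer overflow"


def route_crash(verdicts, min_agreement=0.75):
    """Decide what happens to a crash based on ensemble agreement.

    Returns (route, majority_label), where route is one of:
      "auto_draft"    - consensus and high severity: draft a disclosure ticket
      "log_only"      - consensus but below high severity (assumed outcome)
      "manual_review" - the models disagree too much
    The 0.75 threshold is an assumed value, not taken from the source.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    # Each model votes for a (severity, root cause) pair.
    votes = Counter((v.severity, v.root_cause) for v in verdicts)
    label, count = votes.most_common(1)[0]
    if count / len(verdicts) < min_agreement:
        return "manual_review", label
    if label[0] == "high":
        return "auto_draft", label
    return "log_only", label


if __name__ == "__main__":
    ensemble = [Verdict(f"model-{i}", "high", "UAF") for i in range(7)]
    ensemble.append(Verdict("model-7", "medium", "OOB"))
    print(route_crash(ensemble))
```

Voting on the combined (severity, root cause) pair is deliberately strict: two models that agree on severity but not on the bug class count as disagreeing. A looser scheme could vote on each field separately.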
Notable Results
- wolfSSL 5.9.0. Three AFL++/ASAN harnesses, multi-hour campaigns running on a dedicated pentest host. Coverage-guided exploration of the TLS handshake state machine, X.509 parser, and cipher-suite negotiation.
- LibreNMS pentest. Four vulnerabilities disclosed via CERT.pl (case #5598022): three RCE and one stored XSS. The findings were also run through the 8-model triage benchmark to compare LLM accuracy on real, novel bugs.
- Cross-architecture coverage. Campaigns run on x86_64 hosts, using QEMU user-mode emulation for ARM and MIPS targets where source builds aren't available.
What’s Next
- Phase 4: continuous-integration mode — new upstream releases automatically trigger a fresh fuzzing window with the carried-over seed corpus.
- Crash grouping by faulting-stack signature, so duplicate reports collapse into one before any triage budget is spent on them.
- Public methodology writeup with the 8-model triage benchmark numbers, on this blog under /research.
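As an illustration of the planned crash grouping, a faulting-stack signature could be derived along these lines. This is a sketch under stated assumptions: it expects symbolized ASAN-style frames of the form `#0 0x... in func file:line`, hashes the top three function names of the first stack in the report, and uses hypothetical helper names (`faulting_frames`, `stack_signature`, `group_crashes`). The depth of three frames is an arbitrary choice for the example.

```python
import hashlib
import re
from collections import defaultdict

# Matches symbolized sanitizer frames such as:
#     #0 0x4f3b2a in parse_hdr /src/p.c:10:3
# Captures the frame index and the function name (up to the first space).
FRAME_RE = re.compile(r"^\s*#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)", re.MULTILINE)


def faulting_frames(report, top_frames=3):
    """Return the top function names of the first stack in the report."""
    funcs = []
    for idx, func in FRAME_RE.findall(report):
        if idx == "0" and funcs:
            break  # a later stack (e.g. "freed by") starts here; ignore it
        funcs.append(func)
    return funcs[:top_frames]


def stack_signature(report, top_frames=3):
    """Hash the faulting function names; addresses and line numbers are ignored."""
    joined = "|".join(faulting_frames(report, top_frames))
    return hashlib.sha256(joined.encode()).hexdigest()[:16]


def group_crashes(reports, top_frames=3):
    """Map signature -> list of crash ids sharing that faulting stack."""
    groups = defaultdict(list)
    for crash_id, report in reports.items():
        groups[stack_signature(report, top_frames)].append(crash_id)
    return dict(groups)
```

Only one representative per group would then need to go through the model ensemble. Hashing function names rather than addresses keeps the signature stable across ASLR and rebuilds, at the cost of merging distinct bugs that happen to share the same top frames.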