AFA — Autonomous Fuzzing Agent

v0.3.0 · active research

Autonomous Fuzzing Agent — LLM-Augmented AFL++ Campaigns

An LLM-augmented fuzzing platform that targets open-source software with AFL++ campaigns guided by AI seed generation, harness construction, and cross-model triage of crashes. Built to scale: 13 target projects in rotation, ~600 evolved seeds across the corpus, an 8-model benchmark for triage accuracy, and an automated CVE-disclosure path through CERT.pl.

AFL++ · CMPLOG · ASAN · LLM harness generation · multi-model triage

Targets in rotation

13

OSS libraries + network daemons

Evolved seeds

598

corpus across all targets

Triage models

8

cross-benchmarked on real crashes

Disclosure path

CERT.pl

automated triage → ticket

What It Does

Seed generation. An LLM reads each target's protocol grammar or file format and produces a starter seed set. The seeds are fed to AFL++, which evolves them through coverage-guided mutation. The LLM is re-invoked periodically with the current coverage gaps to bias future seed proposals.
Harness construction. For library targets, the LLM writes a fuzzing harness (the LLVMFuzzerTestOneInput entry point and its argument decoders) from the target's headers. The harness is then compiled with sanitiser instrumentation (ASAN + UBSAN) and integrated into AFL++ with CMPLOG and persistent mode.
Crash triage. Every reproducible crash is analysed by an 8-model ensemble (a mix of local and cloud LLMs). Each model independently classifies severity, root-cause family (UAF, OOB, integer overflow, type confusion, etc.), and likely CVSS vector. Disagreement between models flags the crash for manual review; crashes with consensus auto-progress to a CERT.pl draft.
Disclosure. Consensus high-severity crashes get a CERT.pl ticket drafted automatically with proof-of-concept, affected versions, and a suggested CVSS score. Disclosure runs on the standard 90-day clock from upstream notification.
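
The consensus-or-review routing described under crash triage could be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the Verdict schema, the route_crash function, and the 0.75 agreement threshold are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One model's independent classification of a crash (hypothetical schema)."""
    model: str
    severity: str    # e.g. "low", "medium", "high", "critical"
    root_cause: str  # e.g. "UAF", "OOB", "integer-overflow"


def route_crash(verdicts, min_agreement=0.75):
    """Return ("draft", severity, root_cause) on consensus,
    otherwise ("manual_review", None, None).

    min_agreement is an assumed threshold, not a figure from the project.
    """
    if not verdicts:
        return ("manual_review", None, None)
    # Count identical (severity, root_cause) pairs across all models.
    votes = Counter((v.severity, v.root_cause) for v in verdicts)
    (severity, root_cause), count = votes.most_common(1)[0]
    if count / len(verdicts) >= min_agreement:
        return ("draft", severity, root_cause)
    return ("manual_review", None, None)


# Seven of eight hypothetical models agree, so the crash moves to a draft.
verdicts = [Verdict(f"model-{i}", "high", "UAF") for i in range(7)]
verdicts.append(Verdict("model-7", "medium", "OOB"))
print(route_crash(verdicts))  # ('draft', 'high', 'UAF')
```

Voting on the (severity, root cause) pair rather than on each field separately is a deliberately strict choice here: a crash only auto-progresses when the models agree on both what broke and how bad it is.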

Notable Results

wolfSSL 5.9.0. Three AFL++/ASAN harnesses running multi-hour campaigns on a dedicated pentest host, with coverage-guided exploration of the TLS handshake state machine, X.509 parser, and cipher-suite negotiation.
LibreNMS pentest. Four vulnerabilities disclosed via CERT.pl (case #5598022): three RCE and one stored XSS. The findings were also run through the 8-model triage benchmark to compare LLM accuracy on real, novel bugs.
Cross-architecture coverage. Campaigns run on x86_64 hosts; ARM and MIPS targets whose source builds aren't available are fuzzed under QEMU user-mode emulation.

What’s Next

Phase 4: continuous-integration mode — new upstream releases automatically trigger a fresh fuzzing window with the carried-over seed corpus.
Crash grouping by faulting-stack signature, so duplicate reports collapse into one before any triage budget is spent on them.
Public methodology writeup with the 8-model triage benchmark numbers, on this blog under /research.
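
As a rough sketch of what grouping by faulting-stack signature might look like, the snippet below hashes the top few function names from the first stack in a symbolized ASAN report. The function names, the depth of three frames, and the filter for sanitizer runtime frames are assumptions for illustration; the planned implementation may differ.

```python
import hashlib
import re
from collections import defaultdict

# Matches symbolized ASAN frames such as
# "    #0 0x4f5a3c in parse_cert /src/x509.c:214:9".
FRAME_RE = re.compile(r"^\s*#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)", re.MULTILINE)


def faulting_frames(report, depth=3):
    """Return up to `depth` function names from the first stack in a report."""
    frames, expected = [], 0
    for match in FRAME_RE.finditer(report):
        index, func = int(match.group(1)), match.group(2)
        if index != expected:
            break  # frame numbering restarted: a second stack (alloc/free) began
        expected += 1
        # Assumption: skip sanitizer runtime frames so they don't mask the real site.
        if func.startswith(("__asan", "__interceptor")):
            continue
        frames.append(func)
        if len(frames) == depth:
            break
    return frames


def stack_signature(report, depth=3):
    """Short, stable hash of the faulting frames; addresses are ignored."""
    joined = "|".join(faulting_frames(report, depth))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]


def group_crashes(reports, depth=3):
    """Map signature -> list of crash ids, given a dict of crash id -> report text."""
    groups = defaultdict(list)
    for crash_id, report in reports.items():
        groups[stack_signature(report, depth)].append(crash_id)
    return dict(groups)
```

Calling group_crashes on a dict of crash ids to report text yields one bucket per distinct signature, so only one representative per bucket needs to go to the triage ensemble. Hashing function names rather than addresses keeps signatures stable across rebuilds and ASLR.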