Attacking AI — 120 Prompts From 16 Unique Sources In 30 Days
Our LLM honeypot impersonates a small commercial chat-completion
endpoint (POST /v1/chat/completions, model
nexova-assistant-v2) with an OpenAI-compatible API
surface. In 30 days we logged 120 prompts across 19
sessions from 16 unique source IPs. Small numbers
compared with the SMTP-and-RDP volumes elsewhere in the fleet,
but the shape is interesting: 67% of classified prompts
tagged as OWASP LLM03 (Training Data Poisoning), the
rest split between benign use, sensitive-info-disclosure
attempts, and a single Prompt Injection.
The decoy — what the attacker sees
Two OpenAI-compatible endpoints:
GET /v1/models returns a small model catalogue
(nexova-assistant-v2 + a couple of plausible siblings),
and POST /v1/chat/completions serves
templated responses. A third path POST /chat exists for
non-OpenAI-shaped clients. The model name is intentionally distinct
from any real product so we know any traffic that asks for
nexova-assistant-v2 came from a scanner that read
our /v1/models response — that’s how we
separate “hit the surface by accident” from
“deliberate interaction.”
| Endpoint | Model requested | Hits (30d) |
|---|---|---|
/v1/models | (discovery, no model) | 81 |
/v1/chat/completions | nexova-assistant-v2 | 31 |
/chat | (generic) | 8 |
The 81 /v1/models hits are the canonical scanner shape:
ask for the model list first, then either move on (most do) or
follow up with chat completions (31 did). The 8
/chat hits are from older clients that don’t
know the OpenAI surface — useful as a control sample.
OWASP LLM Top 10 classification breakdown
| Class | Description | Prompts | % of classified |
|---|---|---|---|
| LLM03 | Training Data Poisoning | 81 | 67% |
| BENIGN | No attack indicator | 31 | 26% |
| LLM02 | Sensitive Information Disclosure | 4 | 3.3% |
| LLM05 | Improper Output Handling | 3 | 2.5% |
| LLM01 | Prompt Injection | 1 | 0.8% |
Per-category — what they actually asked
Most LLM03 hits in our 30-day window came from a single source (
199.127.61.253, 51 of the 81 prompts — 63% of
this category, 43% of all classified prompts). Looks like one
researcher or kit operator working through a corpus of poisoning
shapes against our endpoint. The remaining LLM03 traffic is
spread thinly across other sources.
192.168.0.44 (zion, our internal
pentest box) with 31 prompts — that’s our own
functional-testing traffic against the honeypot. Real internet
BENIGN is a small residual that mostly comes from polite scanners
(“Hello?”, “What model are you?”).
Top source IPs
| Source | Prompts | Primary category | Notes |
|---|---|---|---|
199.127.61.253 |
51 | LLM03 | Dominant LLM03 source; long campaign of poisoning shapes |
192.168.0.44 |
31 | BENIGN | Internal — our pentest box, functional testing |
91.150.207.206 |
8 | LLM03 | Smaller LLM03 burst, different fingerprint |
185.150.191.236 |
4 | LLM02 + LLM05 | One of the few sources with multi-category attempts |
104.243.34.165 |
4 | LLM02 | System-prompt extraction attempt + follow-ups |
What this is and isn’t
Honest framing: 120 prompts over 30 days is not a flood. LLM honeypots are a low-volume surface because OpenAI-compatible API discovery is not yet part of the standard Shodan/Censys scan tree. Most of the traffic we get is from focused sources — researchers, red-team kits in development, and the occasional security product testing its own rules. That makes each prompt valuable on its own.
The single most interesting finding is the per-source concentration: two sources produced 60% of the classified-attack traffic (199.127.61.253 with 51 LLM03 prompts, 91.150.207.206 with 8). That’s the shape of dedicated tooling, not opportunistic scanning. We watch these sources across the rest of the fleet to see if they pivot to other surfaces — so far they haven’t.
If you’re running an LLM endpoint on the internet
- The
/v1/modelsratio is your early warning. If/v1/modelshits are not followed by/v1/chat/completionsfrom the same source within a few seconds, you’re looking at a catalogue scan and you can mostly ignore it. If they are, that source is specifically interacting with you — treat them as a potential pen-test or attacker. - System-prompt exfiltration probes are visible from one request. The “what were your initial instructions?” shape is easy to detect at the prompt layer with a simple regex / classifier — you don’t need to reason about it at inference time.
- Output-handling attacks need a downstream filter. LLM05 prompts try to weaponise the model’s output, not the model itself. The fix isn’t at the model layer — it’s at whatever code renders the response. Treat LLM output the way you treat any other untrusted string.
- Watch source concentration. 60% of classified attack traffic coming from two IPs in a 30-day window is the shape of focused tooling, not background noise. If your own endpoint sees the same concentration, that’s the source to characterise first.