Methodology

PoC verification pipeline

When a CVE arrives, a disclosure alone is just a claim. Our verification pipeline reproduces the exploit on a clean WordPress install, captures it on video and trace, and has a separate model decide whether the run actually demonstrated impact.

What it is

Verified exploits, not claimed ones

Reproduction

1. Extract wp_rest nonce from window.lpData on a public quiz page
2. POST /lp-ajax-handle with lp-load-ajax=delete_question_answer
3. Confirm target quiz row was deleted via wp db query

Judge verdict: EXPLOITABLE

A CVE disclosure is a text description - "this parameter is unsanitized," "this endpoint lacks a nonce check." A description of how something could be exploited doesn't tell you whether anyone has reproduced it against a real WordPress install. We treat a CVE as unverified until the pipeline has stood up a clean WordPress install, installed the plugin at the vulnerable version, walked through the exploit with a real browser, and recorded the impact.

The pipeline runs two models in a cascade. A fast, cheap, lightweight LLM tries first; if it can't land the exploit cleanly, a stronger frontier LLM takes over in the same environment and gets a second attempt. A model of the same frontier class then reviews the tool-call log and issues an independent verdict, so a successful run isn't wrongly marked "failed", a mistake a weaker judge is prone to make on borderline calls.
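The control flow of that cascade can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the `Attempt` record, the `run_cascade` function, and the idea that executors and the judge are plain callables are all assumptions made for the example, as is the choice to hand the judge only the final attempt's log.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    """Outcome of one executor run (hypothetical record shape)."""
    tier: str                      # "lightweight" or "frontier"
    landed: bool                   # did the executor believe it showed impact?
    tool_log: List[str] = field(default_factory=list)


# Assumed interfaces: an executor takes an environment id, a judge takes a log.
Executor = Callable[[str], Attempt]
Judge = Callable[[List[str]], str]


def run_cascade(env_id: str, lightweight: Executor,
                frontier: Executor, judge: Judge) -> Tuple[str, str]:
    """Try the cheap tier first, escalate on failure, then judge the log."""
    attempt = lightweight(env_id)
    if not attempt.landed:
        # Escalation reuses the same environment rather than a fresh one.
        attempt = frontier(env_id)
    # The verdict comes from the judge, not from the executor's own claim.
    verdict = judge(attempt.tool_log)
    return attempt.tier, verdict
```

The point of the shape is that the executor's own `landed` flag only decides whether to escalate; the published verdict always comes from the separate judge call.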

The output of a successful verification is a set of artifacts, not a message: a YouTube video of the exploit, a Playwright trace you can step through request by request, a standalone exploit script you can re-run, and the exact vulnerable code snippet with the fix diff, all produced from the same reproduction session.

How it works

Research, reproduce, judge

The pipeline has four phases. Phase 1 plans on the main server. Phases 2–3 run inside an ephemeral VM that is torn down after every task. Phase 4 reviews the tool-call log after the fact, on the main server again, with no ability to mutate the run.

Phase 1 - Research

Input

CVE record + patch diff + plugin source

Lightweight research LLM

Reads the diff, writes a structured research plan

Frontier-LLM fallback

Used when the lightweight tier can't produce a usable plan

Phase 2 - Ephemeral VM

Ephemeral VM

Fresh snapshot per task, destroyed on completion

Docker WordPress + MariaDB

Plugin installed at exact vulnerable version

Playwright browser

Records video + trace of every action
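"Destroyed on completion" is easiest to guarantee when teardown is tied to scope rather than to the happy path. The sketch below shows one way to express that, assuming hypothetical `create` and `destroy` callables that wrap whatever actually provisions the VM; neither name comes from the pipeline itself.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_vm(create: Callable[[], str],
                 destroy: Callable[[str], None]) -> Iterator[str]:
    """Yield a fresh VM id and always destroy it afterwards.

    `create` and `destroy` are placeholders for the real provisioning
    calls (snapshot restore, container start, and so on).
    """
    vm_id = create()
    try:
        yield vm_id
    finally:
        # Runs on success, on exceptions, and on early returns alike.
        destroy(vm_id)
```

Used as `with ephemeral_vm(create, destroy) as vm: ...`, a crashed or timed-out task still releases its environment, so the next task never inherits state from the previous one.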

Phase 3 - Executor cascade

Lightweight LLM first

Fast + cheap; lands most straightforward PoCs

escalation

Frontier LLM second

Invoked if the lightweight tier can't verify impact

Tool budget: 40 turns

http_request · wp_cli · bash_exec · browser_*
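A turn budget like this amounts to a bounded agent loop. The following is a rough sketch under stated assumptions: `next_action` and `execute` are hypothetical stand-ins for the model call and the tool dispatcher, the log entry shape is invented for the example, and counting rejected tool calls against the budget is a design choice made here, not something the source confirms.

```python
from typing import Callable, Dict, List, Optional, Tuple

MAX_TURNS = 40  # budget stated above
EXACT_TOOLS = ("http_request", "wp_cli", "bash_exec")

Action = Optional[Tuple[str, str]]  # (tool name, argument), or None when done


def is_allowed(tool: str) -> bool:
    """Allow the three named tools plus anything in the browser_* family."""
    return tool in EXACT_TOOLS or tool.startswith("browser_")


def run_agent(next_action: Callable[[List[Dict]], Action],
              execute: Callable[[str, str], str],
              max_turns: int = MAX_TURNS) -> Tuple[List[Dict], bool]:
    """Run until the model signals completion or the budget is spent."""
    log: List[Dict] = []
    for turn in range(max_turns):
        action = next_action(log)
        if action is None:
            return log, True           # model finished within budget
        tool, arg = action
        if not is_allowed(tool):
            # Assumption: a rejected call still consumes a turn.
            log.append({"turn": turn, "tool": tool, "error": "tool not allowed"})
            continue
        log.append({"turn": turn, "tool": tool, "arg": arg,
                    "output": execute(tool, arg)})
    return log, False                  # budget exhausted
```

The returned log is the same kind of record a later judge phase would read, which is why every call, including rejected ones, is appended rather than dropped.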

Phase 4 - Judge

Frontier LLM judge

Same tier as the escalation executor, independent verdict

Rules

Must exploit via vulnerable endpoint; wp_cli only for verification

Lightweight fallback

Used only when the frontier-LLM API key is absent
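Part of the judge's rule set can be checked mechanically before any model reads the log. The sketch below is illustrative only: the log entry shape, the read-only allowlist, the crude SELECT test, and the `FRONTIER_LLM_API_KEY` variable name are all assumptions, and a real judge would need far more than a prefix match.

```python
import os
from typing import Dict, List, Mapping

# Illustrative allowlist of WP-CLI subcommands that only read state.
READ_ONLY_PREFIXES = ("user list", "user get", "option get",
                      "post list", "post get")


def is_read_only(wp_cli_args: str) -> bool:
    """Crude check that a wp_cli call only inspects state."""
    cmd = wp_cli_args.strip()
    if cmd.startswith("db query"):
        sql = cmd[len("db query"):].strip().strip("'\"").lstrip()
        return sql.upper().startswith("SELECT")
    return cmd.startswith(READ_ONLY_PREFIXES)


def rule_violations(tool_log: List[Dict[str, str]]) -> List[int]:
    """Return indexes of wp_cli calls that did more than verification."""
    return [i for i, entry in enumerate(tool_log)
            if entry["tool"] == "wp_cli" and not is_read_only(entry["arg"])]


def select_judge_tier(env: Mapping[str, str] = os.environ,
                      key_name: str = "FRONTIER_LLM_API_KEY") -> str:
    """Prefer the frontier judge; fall back only when no key is set."""
    return "frontier" if env.get(key_name, "").strip() else "lightweight"
```

A run where the only state change came from a `wp_cli` write would show up in `rule_violations`, which is the situation the rule exists to catch: the impact has to come through the vulnerable endpoint, with WP-CLI limited to confirming it.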

Artifacts

YouTube video

Unlisted, embedded on the vulnerability page

Playwright trace

Step-through of every request + DOM snapshot

Standalone exploit

Runnable script + vulnerable code + fix diff

Limits

What the pipeline can't verify

Unverifiable

Stateful chains (complex): multi-user, multi-session, cron-gated exploits

WAF-dependent paths (skipped): exploits that only work on a live target with specific rules

Time-sensitive (flaky): race conditions, TOCTOU, timing side-channels

An unverified CVE isn't a harmless CVE. The vulnerability page shows both - verified exploits get a video and a green badge, unverified ones still surface every signal the static analyzer can extract.

Stateful exploit chains. A vulnerability that requires three users interacting across two pages and a cron job to land can sometimes be reproduced by a model with a 40-turn budget, but often can't. These fall back to static analysis - we still score the plugin, still flag the CVE, we just don't ship a video.

WAF / hardening dependent. Some exploits are interesting specifically because they evade a popular WAF rule. Reproducing them against a bare Docker-WordPress with no WAF either succeeds trivially (not a useful signal) or fails mysteriously (the WAF isn't there to evade). We don't simulate arbitrary production hardening.

Race conditions. TOCTOU (check-then-use) and timing side-channel exploits reproduce non-deterministically. A single agent run either happens to hit the window or doesn't; we don't currently re-run enough times to establish statistical confidence. The honest label in those cases is "research_complete, poc_pending".
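To give a sense of what "enough times" means: if each run independently hits the race window with probability p, the chance of at least one hit in n runs is 1 - (1 - p)^n. The helper below solves that for n. It is a back-of-the-envelope sketch, not something the pipeline does today; the function name is made up, and the independence assumption is optimistic for runs that share a host.

```python
import math


def runs_needed(p_hit: float, confidence: float) -> int:
    """Smallest n with 1 - (1 - p_hit)**n >= confidence.

    Assumes every run hits the window independently with the same
    probability, which real race conditions only approximate.
    """
    if not (0.0 < p_hit < 1.0) or not (0.0 < confidence < 1.0):
        raise ValueError("p_hit and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hit))


# A window that is hit one run in ten needs 29 runs for 95% confidence
# of seeing it at least once.
print(runs_needed(0.1, 0.95))
```

The numbers grow quickly as the window narrows, which is why a single 40-turn agent run is a poor fit for this class of bug.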

Go deeper

Subsystem deep-dive

If you want the file-level detail on how the cascade is actually implemented - substrate, agent loop, judge prompts, budget discipline, evidence capture - the dedicated architecture page covers it in full.

For nerds only

Hi - Mika here. I built WP-Safety solo, so the methodology described here is genuinely how it works, not a marketing sketch. The deep-dives are where I go long on the non-obvious details. Strictly optional - the plugin and CVE pages carry the full story without any of this.

Mika Sipilä · Founder, WP-Safety.org

PoC agent cascade

The internals: lightweight-LLM → frontier-LLM cascade, ephemeral WordPress substrate, independent frontier judge, Playwright evidence bundle, and the budget discipline. Includes a worked transcript of a real SQLi reproduction.

Read the deep-dive

See verified PoCs in action.

Browse the CVE database and filter for verified entries - each one has a video, a Playwright trace, and a standalone reproduction script you can run yourself.
