Methodology

Security score methodology

Every plugin and theme in the database carries a 0–100 score. Here's exactly what goes into it, how the weights are assigned, and what the number does and doesn't tell you.

What it is

A single number, multiple signals

87 / 100

Example deductions:
  • Patched high CVE (2024): −6
  • Raw SQL in 2 files: −4
  • Slow patch velocity: −3

Illustrative - real plugin pages show every deduction with a link back to the source evidence.

Security is a distribution, not a boolean - but users have to make a yes/no install decision. The score collapses the distribution into one reviewable number so the decision can happen quickly, while the underlying signals stay visible on the plugin's page for anyone who wants to audit how the number was produced.

Higher is safer. A fresh plugin with zero CVEs, strong escaping patterns in its code, and an active maintainer scores in the 90s. A plugin with multiple unpatched critical CVEs, weak sanitization at dozens of sinks, and an absent maintainer scores in the teens.

The score is recomputed deterministically from inputs that are themselves publicly auditable: the CVE feed, the plugin's own source code from WordPress.org's SVN, our AST analyzer, and our CommonCrawl-derived deployment observations. Given the same inputs, the pipeline produces the same score - no manual overrides, no human fingers on the scale.
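One way to make that auditability concrete is to publish a canonical hash of the input snapshot next to each score, so anyone can re-run the pipeline on the same inputs and verify the result. This is an illustrative sketch, not the site's actual scheme - the field names and the use of SHA-256 over canonical JSON are assumptions.

```python
import hashlib
import json

def snapshot_digest(inputs: dict) -> str:
    """Canonical hash of the input bundle (CVE records, AST findings,
    deployment counts). Same inputs always produce the same digest, so a
    published digest pins down exactly what a score was computed from.
    Hypothetical scheme - field names below are illustrative."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

inputs = {
    "cves": [{"id": "CVE-2024-0001", "cvss": 9.1, "patched": False}],
    "active_installs": 40000,
}
# Re-serializing the same data yields the same digest - determinism is checkable.
assert snapshot_digest(inputs) == snapshot_digest(dict(inputs))
```

Sorting keys and fixing separators matters: two semantically identical JSON dumps with different key order would otherwise hash differently.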

How it's computed

From raw signals to a single number

Four signal families flow into a deterministic aggregator, which an LLM annotates with human-readable deductions. The score itself is always reproducible from the signals.
Inputs - four signal families

  • CVE feed (Wordfence Intelligence, NVD, PatchStack Core)
      • Severity mix: counts per Critical / High / Medium / Low (up to −20)
      • Patch status: patched vs unpatched, days-to-patch (up to −15)
  • Code analysis
      • AST taint flows: sources → sinks, with sanitizer coverage
      • Code signals: dangerous functions, raw SQL, output escaping
      • Attack surface: AJAX / REST / hooks, nonce + cap checks
  • Developer track record
      • Patch velocity: median time-to-patch across this dev's plugins (up to −10)
      • Maintenance: recency of last release, abandonment heuristics
      • Historical CVE rate: CVEs per year across their portfolio
  • Deployment exposure
      • Install count: WP.org active_installs
      • Version distribution: % of live installs on vulnerable vs fixed versions
      • Hosting mix: from our CommonCrawl corpus

Aggregator - deterministic

  • Normalize: each signal → 0–1 scalar; missing data fails closed
  • Weight: severity × recency decay × exposure
  • Sum & clamp: subtract weighted deductions from 100, floor at 0

Explainer - small LLM

  • Reads the numeric deductions + source evidence
  • Generates prose: a human-readable "why" for each deduction
  • Never alters the number: score and deductions are computed before the LLM is called

Outputs

  • Score (0–100): displayed on the plugin / theme page
  • Deductions list: every point lost, with a link back to the evidence
  • Reproducible: same inputs → same score, always
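The normalize → weight → sum-and-clamp stages can be sketched as a pure function. This is a minimal illustration of the shape of the aggregator, not the production code - the `Signal` record, its field names, and the multiplicative weighting are assumptions drawn from the stage descriptions above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Signal:
    """Hypothetical per-signal record; field names are illustrative."""
    name: str
    raw: Optional[float]   # None means the signal could not be measured
    max_deduction: float   # cap for this signal, e.g. 20.0 for severity mix
    severity: float        # 0-1
    recency: float         # 0-1 decay factor (recent issues count more)
    exposure: float        # 0-1, e.g. scaled from install count

def normalize(sig: Signal) -> float:
    """Map a raw signal to [0, 1]; missing data fails closed (worst case)."""
    if sig.raw is None:
        return 1.0  # fail closed: an unmeasurable signal is treated as fully bad
    return min(max(sig.raw, 0.0), 1.0)

def deduction(sig: Signal) -> float:
    """Weight = severity x recency decay x exposure, scaled to the signal's cap."""
    weight = sig.severity * sig.recency * sig.exposure
    return normalize(sig) * weight * sig.max_deduction

def score(signals: list) -> float:
    """Start at 100, subtract weighted deductions, floor at 0."""
    return max(0.0, 100.0 - sum(deduction(s) for s in signals))

# An unpatched critical CVE at full weight costs its full cap of 20 points:
cve = Signal("unpatched_critical_cve", 1.0, 20.0, 1.0, 1.0, 1.0)
print(score([cve]))  # 80.0
```

Because `score` is a pure function of its inputs, the reproducibility claim above falls out for free: no randomness, no clock reads, no manual overrides.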
The actual weights

What each deduction costs

The aggregator starts at 100 and subtracts these ranges based on the evidence. Ranges (rather than fixed points) let the aggregator distinguish a single moderate issue from a pattern of them.

Signal                      | Deduction range | When it applies
Unpatched critical CVE      | −15 to −20      | Any CVE with CVSS ≥ 9.0 that is not yet fixed upstream.
Patched critical CVE        | −4 to −8        | Decays with time since patch; recent fixes count more.
Unpatched high / medium CVE | −8 to −12       | CVSS 4.0–8.9 without a fix available.
Critical taint flow         | −10 to −15      | AST-derived sink reachable from an unauthenticated source.
Raw SQL queries             | −5 to −10       | String-concatenated queries bypassing $wpdb->prepare().
Missing nonce / cap checks  | −5 to −10       | AJAX or REST handlers with no wp_verify_nonce / current_user_can.
Unescaped output            | −3 to −8        | echo $var without an esc_* wrapper at >N locations.
Abandoned maintenance       | −5 to −10       | No release in 18+ months, no response to disclosures.
Developer trust drag        | −3 to −8        | Same author has slow patch velocity on other plugins.
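Two mechanics in the table can be made concrete: picking a point inside a range based on how pervasive the evidence is, and decaying a patched CVE's cost over time. The linear interpolation and the exponential half-life below are assumed curves for illustration - the source only says ranges exist and that patched criticals decay.

```python
def range_deduction(lo: float, hi: float, extent: float) -> float:
    """Interpolate within a deduction range: extent 0.0 is a single mild
    instance (costs lo), extent 1.0 is a pervasive pattern (costs hi)."""
    extent = min(max(extent, 0.0), 1.0)
    return lo + (hi - lo) * extent

def patched_cve_deduction(days_since_patch: float,
                          lo: float = 4.0, hi: float = 8.0,
                          half_life_days: float = 180.0) -> float:
    """Patched criticals decay with time: a fix from last week costs close
    to hi, a fix from years ago approaches lo. The 180-day half-life is an
    assumption, not the published constant."""
    decay = 0.5 ** (days_since_patch / half_life_days)
    return lo + (hi - lo) * decay

# e.g. 12 unescaped-output locations against an assumed threshold of 20
# lands 60% of the way into the −3 to −8 range:
print(range_deduction(3.0, 8.0, 12 / 20))   # 6.0
print(patched_cve_deduction(0))             # 8.0 - patched today, full cost
print(patched_cve_deduction(180))           # 6.0 - one half-life later
```

Ranges plus interpolation are what let the aggregator distinguish "one sloppy echo" from "unescaped output everywhere" without inventing new signal categories.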
Limitations

What the score can't see

  • Zero-days (no signal): undisclosed CVEs can't be counted.
  • Your configuration (varies): plugin × your WP setup interactions.
  • Supply chain (work in progress): bundled libs surfaced, not yet scored.

The score is a prior, not a verdict. Always cross-check the plugin page's attack surface and bundled-library list before installing on critical infrastructure.

Zero-days by definition. Vulnerabilities that haven't been disclosed don't move the score. A plugin can score 95 today and 40 tomorrow when a critical CVE lands - the score reflects what's known, not what's hidden.

Your specific configuration. A plugin with an AJAX handler that's dangerous only when combined with a rare WordPress setting might score fine in aggregate and still be catastrophic on your site. The plugin page's attack surface map shows the raw entry points; use the score as a prior, not a verdict.

Supply-chain risk. Bundled libraries are surfaced on the plugin page but not directly deducted in the score. A plugin shipping an outdated copy of a common library can be a real risk even when the plugin's own code is clean. I'm working on folding this into the score.

Go deeper

How the signals actually get produced

The scoring engine is the end of the pipeline; the interesting work happens in the stages that produce the signals feeding it. The deep-dive below unpacks the AST taint analyzer that provides the deterministic code-security signals.
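As a taste of what that deep-dive covers, here is a deliberately toy version of source-to-sink matching: flag superglobal input reaching `echo` without an `esc_*` wrapper. The real analyzer walks an AST and tracks assignments across statements; this regex sketch only illustrates the source → sink → sanitizer idea and would miss anything indirect.

```python
import re

# Toy taint check over PHP text. Assumptions: sources are superglobals,
# the only sink is echo, and any esc_* call counts as a sanitizer.
SOURCE    = re.compile(r'\$_(GET|POST|REQUEST)\[')
SINK      = re.compile(r'echo\s+(.+?);')
SANITIZER = re.compile(r'\besc_\w+\s*\(')

def unsanitized_flows(php: str) -> list:
    """Return lines where a tainted source reaches echo unescaped."""
    flows = []
    for line in php.splitlines():
        sink = SINK.search(line)
        if not sink:
            continue
        expr = sink.group(1)  # the expression being echoed
        if SOURCE.search(expr) and not SANITIZER.search(expr):
            flows.append(line.strip())
    return flows

php = '''
echo esc_html( $_GET["name"] );   // sanitized: no finding
echo $_GET["name"];               // tainted source reaches echo unescaped
'''
print(unsanitized_flows(php))  # flags only the unescaped line
```

The gap between this sketch and a real analyzer - assignments, function boundaries, conditional sanitization - is exactly what the deep-dive is about.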

For nerds only

Hi - Mika here. I built WP-Safety solo, so the methodology on this page is genuinely how it works, not a marketing sketch. The deep-dives are where I go long on the non-obvious details. Strictly optional - the plugin and CVE pages carry the full story without any of this.

Mika Sipilä · Founder, WP-Safety.org

See the score in action.

Browse any plugin's page to see its score, every deduction that went into it, and the raw evidence behind each one.