Architecture

System architecture

The ingestion, analysis, detection, and delivery layers that make up WP-Safety, end-to-end. One diagram per layer, with the trade-offs that shaped each one called out explicitly.

What it is

Four layers, one pipeline

System at a glance
  • 55k+
    Plugins tracked
  • 8k+
    Themes tracked
  • 34k+
    CVEs indexed
  • 4.9B
    URLs per CC crawl
Core stack
  • Nuxt 3 · App + SSR
  • SQLite · WAL · Main DB
  • DuckDB · CC observations
  • LLM cascade · Research + PoC

The system is organized as four layers that run concurrently but are loosely coupled - each layer produces artifacts that the next one consumes, and each can be restarted independently when its inputs change. A failed LLM pass doesn't corrupt the CVE feed; a broken audit worker doesn't stop the CommonCrawl scanner.

Ingestion syncs from WordPress.org, the Wordfence Intelligence webhook, NVD, and the plugin SVN mirror. Analysis turns plugin source into AST taint flows, code signals, and LLM-annotated risk assessments. Detection scans the CommonCrawl corpus (and the occasional live site audit) to identify installed software. Delivery is the Nuxt app, the public API, and the Safety Radar WordPress plugin that ships the results to humans.

Data lives in three places that don't try to replicate each other: SQLite for the main product database (fast, simple, zero-ops), DuckDB for analytical queries over the CommonCrawl-derived observation dataset, and the filesystem for plugin source mirrors and generated artifacts. Each store owns the queries it's good at; nothing is kept in sync by hand.
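The "fast, simple, zero-ops" claim for the main store hinges on WAL mode: readers see a consistent snapshot while the single writer commits, which is what lets each ingestor write on its own schedule against a live read path. A minimal sketch of that setup, using Python's stdlib sqlite3 for illustration (the table shape here is an assumption, not the real schema):

```python
import sqlite3

# Open the main product DB in WAL mode: readers never block the
# single writer, so ingestors can commit while the app serves reads
# from the same file.
conn = sqlite3.connect("wp-safety.db")
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL

conn.execute("""CREATE TABLE IF NOT EXISTS plugins (
    slug            TEXT PRIMARY KEY,
    active_installs INTEGER,
    fetched_at      TEXT
)""")

# An ingestor upserts a row; a concurrent reader sees a consistent
# snapshot without waiting for this write to land.
conn.execute(
    "INSERT INTO plugins VALUES (?, ?, datetime('now')) "
    "ON CONFLICT(slug) DO UPDATE SET active_installs=excluded.active_installs",
    ("akismet", 5_000_000),
)
conn.commit()
```

The upsert-per-ingestor pattern is what makes "writes directly into the main SQLite database on its own schedule" safe: last writer wins per row, and no cross-store synchronization is needed.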

Layer 1

Ingestion

Sources of truth for plugin metadata, CVEs, and source code. Each ingestor writes directly into the main SQLite database on its own schedule.

Ingestion layer
External sources
Plugin API
api.wordpress.org/plugins/info/1.2/
Theme API
api.wordpress.org/themes/info/1.2/
source-of-truth
SVN mirror
Every version of every plugin, ever released
Wordfence webhook
Push-based; new CVE arrives within minutes
NVD sync
Nightly pull for official CVE records
PatchStack Core
Community disclosure feed
plugins
~55,000 rows · slug + metadata
themes
~8,000 rows
vulnerabilities
~34,000 rows · CVE + severity + patch diff
Layer 2

Analysis

Deterministic static analysis feeds LLM synthesis feeds verified reproduction. Each stage's output is the next stage's ground truth - the LLM only ever sees signals, never raw plugin source it might hallucinate about.
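To make the "LLM only ever sees signals" boundary concrete, here is a toy version of the deterministic code-signals pass: it reduces plugin source to counts the LLM can reason about without ever receiving the source itself. The signal names and regexes are illustrative simplifications, not the real analyzer:

```python
import re

def code_signals(php: str) -> dict:
    """Reduce PHP source to a handful of deterministic counts.
    Only this dict - never the raw source - would reach the LLM."""
    return {
        "superglobal_reads": len(re.findall(r"\$_(GET|POST|REQUEST)\b", php)),
        "raw_sql_calls":     len(re.findall(r"\$wpdb->(query|get_results)\s*\(", php)),
        "escaping_calls":    len(re.findall(r"\besc_(html|attr|url)\s*\(", php)),
        "nonce_checks":      len(re.findall(r"\b(check_admin_referer|wp_verify_nonce)\s*\(", php)),
    }

snippet = '''<?php
$id = $_GET["id"];
$wpdb->query("SELECT * FROM posts WHERE id = $id");
echo esc_html($id);
'''
print(code_signals(snippet))
```

A real taint tracker works on the AST inter-procedurally rather than with regexes, but the contract is the same: stage one emits facts, stage two writes prose about facts it cannot invent.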

Analysis layer
PHP AST parser
php-parser, runs per plugin version
deterministic
Taint tracker
Sources → sinks, inter-procedural
Code signals
Escaping coverage, raw SQL, nonce checks
Lightweight research LLM
Research-plan generation from patch diffs
Small fingerprint LLM
Risk-assessment prose + deduction explanations
Frontier-LLM fallback
Used when the lightweight tier can't produce usable output
Ephemeral VM
Provisioned from a snapshot per task, destroyed on completion
escalation
Agent cascade
Lightweight LLM → frontier LLM
Frontier judge
Independent verdict on every attempt
plugin_taint_flows
Graph per plugin version
plugin_risk_assessments
Score + deductions
vulnerabilities (enriched)
poc_*, research_*, verification artifacts
Layer 3

Detection

Two ingress points with shared fingerprint infrastructure. The CC pipeline runs in batch (monthly, at scale); the live audit runs on-demand (seconds, single site). Both resolve through the same matcher against the same fingerprint corpus.
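The shared-matcher idea can be sketched in a few lines: both ingress points hand the same resolver a list of asset URLs, and it maps them to plugin slugs, with confidence rising as independent asset hits agree. This is a minimal sketch of the asset-path signal only; the real scorer combines it with DOM signals (JS globals, CSS classes, REST routes):

```python
import re
from collections import Counter

# /wp-content/plugins/<slug>/... is the canonical asset-path signal.
ASSET_RE = re.compile(r"/wp-content/plugins/([a-z0-9-]+)/")

def match_assets(urls: list[str]) -> Counter:
    """Map asset URLs to plugin slugs; the count per slug is a crude
    stand-in for multi-signal confidence."""
    hits = Counter()
    for url in urls:
        m = ASSET_RE.search(url)
        if m:
            hits[m.group(1)] += 1
    return hits

page_assets = [
    "https://example.com/wp-content/plugins/contact-form-7/includes/js/index.js",
    "https://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css",
    "https://example.com/wp-content/plugins/akismet/_inc/akismet.js",
    "https://example.com/wp-content/themes/twentytwentyfour/style.css",  # theme, not a plugin hit
]
print(match_assets(page_assets))
```

Because the matcher is a pure function over URLs, it genuinely doesn't care whether those URLs came from a WAT record in a monthly batch or from a live fetch seconds ago - which is the whole point of the shared infrastructure.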

Detection layer
WAT scan
Dedicated scanner VM, N-way parallel across CC shards
bandwidth-heavy
WARC range fetch
Targeted HTTP-Range downloads of matched URLs
DuckDB observations
Columnar store for analytical queries
Public audit form
Any visitor, any URL, rate-limited
Residential proxy
Real-IP rotation, redirect-safe following
Worker process
Detached audit-worker.mjs, per job
Asset-path signals
/wp-content/plugins/slug/*
Direct DOM signals
JS globals, CSS classes, REST routes
Confidence scorer
Multi-signal agreement
Layer 4

Delivery

The three surfaces through which the data reaches humans. Everything reads from the same main database - the app, the API, and the WP plugin never disagree because there's only one source.
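One delivery detail worth a sketch is the API's rate limiting, which keeps its counters in SQLite rather than Redis - consistent with the single-store philosophy. Below is a hedged fixed-window sketch of that idea; the table, column names, and limit are assumptions, not the real schema:

```python
import sqlite3
import time

# Fixed-window rate counter kept in SQLite, in the spirit of
# "rate-limited, SQLite-backed counters". One row per (token, minute).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE rate_counters (
    token  TEXT,
    window INTEGER,          -- epoch minute
    hits   INTEGER NOT NULL,
    PRIMARY KEY (token, window)
)""")

LIMIT_PER_MINUTE = 60  # illustrative, not the real quota

def allow(token: str, now=None) -> bool:
    """Record one request for this bearer token; True if under quota."""
    window = int((time.time() if now is None else now) // 60)
    db.execute(
        "INSERT INTO rate_counters (token, window, hits) VALUES (?, ?, 1) "
        "ON CONFLICT(token, window) DO UPDATE SET hits = hits + 1",
        (token, window),
    )
    hits = db.execute(
        "SELECT hits FROM rate_counters WHERE token=? AND window=?",
        (token, window),
    ).fetchone()[0]
    return hits <= LIMIT_PER_MINUTE
```

The upsert makes the increment atomic per statement, so a single-host API process needs no extra locking, and old windows can be pruned lazily.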

Delivery layer
data/wp-safety.db
SQLite · WAL mode · every read path goes here
Nuxt 3 SSR
Plugin / theme / CVE / provider pages
Edge cache
Nitro SWR, per-route TTL
Dashboard
Monitored sites, alerts, saved scans
/api/v1/batch-lookup
Bulk plugin security scores
/api/v1/plugin-score/{slug}
Single plugin lookup
Bearer token auth
Rate-limited, SQLite-backed counters
Installed on WP
PHP + WP-CLI compatible
Reports site inventory
Plugin versions reported to /api/v1/site-plugins
Shows risk badges
Inline in the WP admin plugin list
Data layer

Three stores, zero hand-synced replication

The pipeline's data plane is split across three stores, each owning the queries it's good at. A write to one cannot corrupt the others, and a bug in any single store's computation can be fixed by re-running just that computation - nothing gets out of sync.

SQLite · WAL
main DB
Authoritative product database. Every page on the site reads from here; every ingestor and enrichment stage writes directly into it. Single-region, single-host, fast and simple.
Representative tables
  • plugins · ~55k rows
  • themes · ~8k rows
  • vulnerabilities · ~34k rows
  • plugin_taint_flows
  • plugin_risk_assessments
  • plugin_fingerprints_wat
  • plugin_fingerprints_direct
Not for
Wide analytical scans over tens of millions of observations. Those live next door in DuckDB.
DuckDB
analytical
Columnar analytical store for the CommonCrawl observation dataset. Produced by merging per-worker SQLite shards at the end of each crawl-processing run. Read-only to the application; new crawls rewrite it.
Good at
  • "Sites running plugin X, by provider"
  • "Version distribution of plugin Y across crawl"
  • "Plugins co-installed with Z, ranked"
  • "Adoption curve across historical crawls"
Not for
Row-level transactional writes. No OLTP semantics, no per-request mutation.
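The "merge per-worker shards at the end of each run" step is simple enough to sketch. The real pipeline merges into DuckDB; this sketch uses stdlib sqlite3 with ATTACH so it stays self-contained, and the observation schema is illustrative only:

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()

def make_shard(path: str, rows: list[tuple]) -> None:
    """Each CC worker writes its own shard; no cross-worker locking."""
    s = sqlite3.connect(path)
    s.execute("CREATE TABLE observations (host TEXT, slug TEXT, version TEXT)")
    s.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)
    s.commit()
    s.close()

make_shard(os.path.join(tmp, "worker-0.db"),
           [("a.com", "akismet", "5.3"), ("b.com", "wpforms-lite", "1.8")])
make_shard(os.path.join(tmp, "worker-1.db"),
           [("c.com", "akismet", "5.2")])

# End-of-run merge: the analytical store is rebuilt, never mutated.
merged = sqlite3.connect(os.path.join(tmp, "merged.db"))
merged.execute("CREATE TABLE observations (host TEXT, slug TEXT, version TEXT)")
for i in range(2):
    merged.execute("ATTACH DATABASE ? AS shard",
                   (os.path.join(tmp, f"worker-{i}.db"),))
    merged.execute("INSERT INTO observations SELECT * FROM shard.observations")
    merged.commit()
    merged.execute("DETACH DATABASE shard")

# The kind of aggregate the merged store exists to answer:
print(merged.execute(
    "SELECT slug, COUNT(*) FROM observations GROUP BY slug ORDER BY 2 DESC"
).fetchall())
```

Rebuilding the whole store per crawl is what makes it safely read-only to the application: a bad merge is fixed by re-running the merge, never by patching rows in place.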
Filesystem
blobs
Plugin source mirrors (on-demand SVN exports today, full SVN mirror in progress), generated artifacts (Playwright traces, video bundles, fingerprint JSON), and anything else too big for a DB column and too static for a cache.
What lives here
  • Plugin source trees per slug/version
  • PoC Playwright trace ZIPs
  • PoC video MP4s + YouTube mirrors
  • LLM research-prompt exports
Not for
Queryable structured data. Files get referenced by path from the main DB; they never store the thing that gets queried.
Three stores, each owning the queries it's good at. Nothing is kept in sync by hand: SQLite is written by ingestors and the analysis stages, DuckDB is recomputed per CC crawl from a merge of per-worker shards, and the filesystem is referenced by path from rows that need it. A single bug in one store cannot corrupt the others.
Operational boundaries

What the architecture doesn't handle

by design
Real-time latency
CVE → scored plugin: minutes to hours, not seconds
intentional
Cross-region replication
Single-region SQLite - fine for current scale, not HA
standard SaaS
Multi-tenant isolation
Shared DB · user data separated by row, not by process

These are honest constraints. Each one is the result of a trade-off I made to keep the system small, auditable, and cheap to run at current scale. If you need microsecond CVE propagation or regional data residency, I'm not the right vendor today.

Ingest cadence. The Wordfence webhook is near-real-time; NVD sync and WP.org re-syncs run hourly to daily. A plugin's score can therefore lag a disclosure by up to a few hours if the ingestor queue is backed up or the risk-assessment pass is waiting on LLM quota. I publish the fetched_at timestamp on every plugin page so you can see exactly how stale the data is.

Single-region. The main SQLite database runs on one host. better-sqlite3 in WAL mode handles the current read load comfortably, and I haven't needed the complexity of a Postgres cluster or multi-region replication. That's the right call at current scale; it's the wrong call if I ever need HA for enterprise SLAs, and I'd migrate if that changes.

Recrawl latency. CommonCrawl publishes new archives monthly. The hosting-mix and version-distribution statistics on plugin pages therefore update on that cadence - between crawls, the ecosystem view is frozen. Individual site audits through the public form are live; only the aggregate statistics ride the CC clock.

Go deeper

Subsystem deep-dives

One page per non-obvious subsystem, with the design decisions and trade-offs spelled out in enough detail to reproduce the thing from scratch.

Mika Sipilä
For nerds only

Hi - Mika here. I built WP-Safety solo, so the methodology below is genuinely how it works, not a marketing sketch. The deep-dives are where I go long on the non-obvious details. Strictly optional - the plugin and CVE pages carry the full story without any of this.

Mika Sipilä · Founder, WP-Safety.org

Open data, auditable pipeline.

The methodology pages go deeper on each layer. Start with security scoring, detection, or PoC verification - whichever is most useful for what you're evaluating.