System architecture
The ingestion, analysis, detection, and delivery layers that make up WP-Safety, end-to-end. One diagram per layer, with the trade-offs that shaped each one called out explicitly.
Four layers, one pipeline
- 55k+ · plugins tracked
- 8k+ · themes tracked
- 34k+ · CVEs indexed
- 4.9B · URLs per CC crawl
- Nuxt 3 · app + SSR
- SQLite · WAL · main DB
- DuckDB · CC observations
- LLM cascade · research + PoC
The system is organized as four layers that run concurrently but are only loosely coupled - each layer produces artifacts that the next one consumes, and each can be restarted independently when its inputs change. A failed LLM pass doesn't corrupt the CVE feed; a broken audit worker doesn't stop the CommonCrawl scanner.
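The failure isolation described above falls out of how artifacts are handed between layers. As a minimal sketch (the helper names and JSON-file format are assumptions, not the real implementation), a producer that writes atomically can crash mid-run without ever leaving a half-written artifact for the next layer:

```python
import json
import os
import tempfile

def write_artifact(path: str, payload: dict) -> None:
    # Write to a temp file, then rename. os.replace is atomic on POSIX,
    # so a consumer sees either the old artifact or the new one - never
    # a partial write from a crashed producer.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    os.replace(tmp, path)

def read_artifact(path: str):
    # Consumers tolerate a missing upstream artifact and simply skip work,
    # which is what lets each layer restart independently.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

With this contract, restarting any one layer just means re-running its producer; downstream consumers pick up the new artifact on their next pass.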
Ingestion syncs from WordPress.org, the Wordfence Intelligence webhook, NVD, and the plugin SVN mirror. Analysis turns plugin source into AST taint flows, code signals, and LLM-annotated risk assessments. Detection scans the CommonCrawl corpus (and the occasional live site audit) to identify installed software. Delivery is the Nuxt app, the public API, and the Safety Radar WordPress plugin that ships the results to humans.
Data lives in three places that don't try to replicate each other: SQLite for the main product database (fast, simple, zero-ops), DuckDB for analytical queries over the CommonCrawl-derived observation dataset, and the filesystem for plugin source mirrors and generated artifacts. Each store owns the queries it's good at; nothing is kept in sync by hand.
Ingestion
Sources of truth for plugin metadata, CVEs, and source code. Each ingestor writes directly into the main SQLite database on its own schedule.
Analysis
Deterministic static analysis feeds LLM synthesis, which feeds verified reproduction. Each stage's output is the next stage's ground truth - the LLM only ever sees signals, never raw plugin source it might hallucinate about.
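The "signals, never raw source" contract can be enforced mechanically at the boundary where model input is assembled. A hedged sketch (the field names and allow-list are assumptions, not the real schema):

```python
import json

def build_risk_prompt(signals: dict) -> str:
    # Only structured analysis output may reach the model. An explicit
    # allow-list means raw plugin source can never leak into the prompt,
    # even if an upstream stage accidentally attaches it.
    allowed = {"taint_flows", "code_signals", "cve_matches"}
    filtered = {k: v for k, v in signals.items() if k in allowed}
    # Serialize deterministically so identical signals yield identical prompts.
    return json.dumps(filtered, sort_keys=True)
```

The allow-list approach is what makes the property auditable: you can verify the boundary in one place instead of trusting every producer upstream.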
Detection
Two ingress points with shared fingerprint infrastructure. The CC pipeline runs in batch (monthly, at scale); the live audit runs on-demand (seconds, single site). Both resolve through the same matcher against the same fingerprint corpus.
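Because both ingress points reduce a site to the same intermediate representation before matching, the batch and live paths cannot drift apart. A simplified sketch of that shared-matcher shape (the token format and function names are illustrative assumptions):

```python
def match_fingerprints(observed: set[str], corpus: dict[str, set[str]]) -> list[str]:
    # One matcher for both ingress paths: a site is just a set of observed
    # fingerprint tokens, and a plugin matches if any of its fingerprints
    # appear in that set.
    return sorted(slug for slug, fps in corpus.items() if fps & observed)

# The CC batch path and the live audit path differ only in how `observed`
# is produced, never in how it is matched.
def audit_live_site(tokens: list[str], corpus: dict[str, set[str]]) -> list[str]:
    return match_fingerprints(set(tokens), corpus)
```

Keeping the matcher as a single function against a single corpus is what guarantees the two ingress points agree on any given site.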
Delivery
The three surfaces through which the human-facing data is delivered. Everything reads from the same main database - the app, the API, and the WP plugin never disagree because there's only one source.
Three stores, zero hand-synced replication
The pipeline's data plane is split across three stores, each owning the queries it's good at. A write to one cannot corrupt the others, and a bug in any single store's computation can be fixed by re-running just that computation - nothing gets out of sync.
- plugins · ~55k rows
- themes · ~8k rows
- vulnerabilities · ~34k rows
- plugin_taint_flows
- plugin_risk_assessments
- plugin_fingerprints_wat
- plugin_fingerprints_direct
- "Sites running plugin X, by provider"
- "Version distribution of plugin Y across crawl"
- "Plugins co-installed with Z, ranked"
- "Adoption curve across historical crawls"
- Plugin source trees per slug/version
- PoC Playwright trace ZIPs
- PoC video MP4s + YouTube mirrors
- LLM research-prompt exports
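The analytical queries listed above are the kind DuckDB is suited for: full scans and group-bys over the CC observation set. To keep this sketch self-contained it uses the stdlib sqlite3 module with an assumed, simplified schema; the production queries run in DuckDB, where the SQL shape is essentially the same:

```python
import sqlite3

# Assumed toy schema: one row per (site, plugin, version) observation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (site TEXT, plugin TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [("a.com", "pluginY", "1.2"), ("b.com", "pluginY", "1.2"),
     ("c.com", "pluginY", "2.0"), ("d.com", "pluginX", "3.1")],
)

# "Version distribution of plugin Y across crawl" as a plain aggregate.
rows = conn.execute(
    """SELECT version, COUNT(*) AS sites
       FROM observations
       WHERE plugin = 'pluginY'
       GROUP BY version
       ORDER BY sites DESC"""
).fetchall()
# rows -> [('1.2', 2), ('2.0', 1)]
```

The point of the split is that this scan-heavy workload never touches the product SQLite database, and a bug here is fixed by re-running the aggregation, not by repairing shared state.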
What the architecture doesn't handle
These are honest constraints. Each one is the result of a trade-off I made to keep the system small, auditable, and cheap to run at current scale. If you need microsecond CVE propagation or regional data residency, I'm not the right vendor today.
Ingest cadence. The Wordfence webhook is near-real-time; NVD sync and WP.org reruns are hourly-to-daily. A plugin's score can therefore lag a disclosure by up to a few hours if the ingestor queue is backed up or the risk-assessment pass is waiting on LLM quota. I publish the fetched_at timestamp on every plugin page so you can see exactly how stale the data is.
Single-region. The main SQLite database runs on one host. The better-sqlite3 driver in WAL mode handles our current read load comfortably, and we haven't needed the complexity of a Postgres cluster or multi-region replication. That's the right call at current scale; it's the wrong call if we ever need HA for enterprise SLAs, and I'd migrate if that changes.
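The WAL trade-off is worth making concrete. The production driver is better-sqlite3 (Node), but the pragma is the same in any SQLite binding; the stdlib sqlite3 module and the temp-file path here are just to keep the sketch runnable:

```python
import os
import sqlite3
import tempfile

# WAL only applies to file-backed databases, so use a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "main.db")
conn = sqlite3.connect(path)

# In WAL mode, readers keep reading while a single writer appends to the
# write-ahead log - which is why a read-heavy workload on one host stays
# comfortable without a client/server database.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.close()
```

The single-writer constraint is the cost: WAL buys concurrent readers, not concurrent writers, which fits an ingest-then-serve workload and would not fit a write-heavy multi-tenant one.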
Recrawl latency. CommonCrawl publishes new archives monthly. The hosting-mix and version-distribution statistics on plugin pages therefore update on that cadence - between crawls, the ecosystem view is frozen. Individual site audits through the public form are live; only the aggregate statistics ride the CC clock.
Subsystem deep-dives
One page per non-obvious subsystem, with the design decisions and trade-offs spelled out in enough detail to reproduce the thing from scratch.

Hi - Mika here. I built WP-Safety solo, so the methodology below is genuinely how it works, not a marketing sketch. The deep-dives are where I go long on the non-obvious details. Strictly optional - the plugin and CVE pages carry the full story without any of this.
PoC agent cascade
Autonomous CVE reproduction: lightweight-LLM → frontier-LLM cascade, ephemeral WordPress per task, independent frontier judge, Playwright evidence bundle, and the budget discipline that keeps the whole thing honest.
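The budget discipline in the cascade can be sketched as a simple escalation policy. The per-attempt costs, model interfaces, and return convention below are all assumptions for illustration, not the real agent loop:

```python
def run_cascade(task, light_model, frontier_model, budget_usd: float):
    # Try the cheap model first; escalate to the frontier model only if the
    # attempt fails and the budget still covers it. (Costs are made up.)
    spent = 0.0
    for model, cost in ((light_model, 0.05), (frontier_model, 1.00)):
        if spent + cost > budget_usd:
            break                      # never spend past the cap
        spent += cost
        result = model(task)
        if result is not None:         # model claims a reproduction
            return result, spent
    return None, spent                 # honest failure, not an unverified claim
```

The key property is the last line: when the budget runs out, the system reports "not reproduced" rather than guessing, which is what keeps the PoC corpus trustworthy.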
Taint analysis
AST-level inter-procedural data-flow tracking across 7 superglobal sources, 31 sinks, and 47 sanitizers; the two-phase algorithm, the WordPress-specific special cases, and the edge cases it intentionally doesn't chase.
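To make the source/sanitizer/sink vocabulary concrete, here is a deliberately tiny single-pass toy over a flat statement list. It is not the real engine - that one is two-phase, inter-procedural, and AST-level - and the three-element source/sink/sanitizer sets below are small stand-ins for the real 7/31/47:

```python
def taint_analyze(statements):
    # Toy semantics: taint enters from superglobal sources, propagates through
    # assignments, is cleared by sanitizers, and is reported at sinks.
    SOURCES = {"$_GET", "$_POST", "$_REQUEST"}   # 3 of the 7 superglobals
    SANITIZERS = {"esc_html", "absint"}          # 2 of the 47 sanitizers
    SINKS = {"echo", "eval"}                     # 2 of the 31 sinks

    tainted: set[str] = set()
    findings = []
    for op, target, value in statements:         # (op, lhs_or_callee, rhs_or_arg)
        if op == "assign":
            tainted.discard(target)              # assignment kills old taint
            if value in SOURCES or value in tainted:
                tainted.add(target)              # taint flows through assignment
        elif op == "call" and target in SANITIZERS:
            tainted.discard(value)               # sanitizer clears the argument
        elif op == "sink" and target in SINKS and value in tainted:
            findings.append((target, value))     # tainted value reached a sink
    return findings
```

Even this toy shows the shape of the deterministic layer's output: a list of source-to-sink flows, which is exactly the kind of signal the LLM stage consumes downstream.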
Open data, auditable pipeline.
The methodology pages go deeper on each layer. Start with security scoring, detection, or PoC verification - whichever is most useful for what you're evaluating.