A 16.5x precision gain at 90% recall sounds like a benchmark result from an ML paper, but it is what you can measure once a literature corpus is frozen with a known denominator. TopVenues, an open-source system from Sidnei Barbieri, does exactly that for cybersecurity reviews.
The Corpus as an Executable Artifact
TopVenues declares a venue and year scope, pulls metadata from the DBLP Computer Science Bibliography as its spine, then enriches each record with an abstract and BibTeX entry via open APIs and publisher-specific extractors. The May 2026 snapshot holds 9,925 papers from 11 cybersecurity sources covering 2017 to 2026. Abstract coverage is 99.86% and BibTeX coverage is 99.99%. A keyword search over the entire corpus finishes in under 31 milliseconds, and a 250-test suite validates data-integrity invariants.
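To make the coverage figures and integrity invariants concrete, here is a minimal sketch of how such checks could be computed over a SQLite table. The `papers` schema, the column names, and the helpers `coverage` and `out_of_scope` are assumptions made for illustration; the actual TopVenues schema and test suite may look quite different.

```python
import sqlite3

# Hypothetical schema: the real TopVenues table and column names may differ.
SCHEMA = """
CREATE TABLE papers (
    dblp_key TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    venue    TEXT NOT NULL,
    year     INTEGER NOT NULL,
    abstract TEXT,
    bibtex   TEXT
)
"""


def coverage(conn, column):
    """Return the percentage of rows whose `column` is non-empty."""
    # The column name is interpolated into SQL, so restrict it to a whitelist.
    if column not in {"abstract", "bibtex"}:
        raise ValueError(f"unsupported column: {column}")
    total, filled = conn.execute(
        f"SELECT COUNT(*), COUNT(NULLIF(TRIM({column}), '')) FROM papers"
    ).fetchone()
    return 100.0 * filled / total if total else 0.0


def out_of_scope(conn, first_year, last_year):
    """Return the keys of rows whose year falls outside the declared scope."""
    rows = conn.execute(
        "SELECT dblp_key FROM papers WHERE year < ? OR year > ? ORDER BY dblp_key",
        (first_year, last_year),
    ).fetchall()
    return [key for (key,) in rows]


# Toy data standing in for a snapshot; none of these rows are real papers.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO papers VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("conf/a/1", "Paper one", "VenueA", 2017, "An abstract.", "@inproceedings{a1}"),
        ("conf/a/2", "Paper two", "VenueA", 2021, None, "@inproceedings{a2}"),
        ("conf/b/3", "Paper three", "VenueB", 2024, "Another abstract.", "@inproceedings{b3}"),
        ("conf/b/4", "Paper four", "VenueB", 2026, "  ", "@inproceedings{b4}"),
    ],
)

abstract_pct = coverage(conn, "abstract")
bibtex_pct = coverage(conn, "bibtex")
violations = out_of_scope(conn, 2017, 2026)
print(f"abstract coverage: {abstract_pct:.2f}%")
print(f"bibtex coverage: {bibtex_pct:.2f}%")
print(f"out-of-scope rows: {violations}")
```

Treating whitespace-only fields as missing is a design choice in this sketch; a stricter or looser definition of "covered" would shift the percentage.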
The output is a monotonic SQLite snapshot, accessible over a CLI, a web interface, or exports for review workflows. No more reconstructing the denominator from publisher portals whose query semantics keep changing.
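As a rough illustration of what working against a frozen SQLite snapshot can look like, the sketch below runs a keyword search and exports the hits as CSV. The table layout, the `keyword_search` and `to_csv` helpers, and the sample rows are all hypothetical; this is not the TopVenues CLI or its export format.

```python
import csv
import io
import sqlite3


def keyword_search(conn, keyword):
    """Substring match over title and abstract (case-insensitive for ASCII).

    Note: '%' and '_' inside `keyword` act as LIKE wildcards in this sketch.
    """
    pattern = f"%{keyword}%"
    return conn.execute(
        "SELECT dblp_key, title, venue, year FROM papers "
        "WHERE title LIKE ? OR abstract LIKE ? "
        "ORDER BY year, dblp_key",
        (pattern, pattern),
    ).fetchall()


def to_csv(rows):
    """Serialize search hits to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["dblp_key", "title", "venue", "year"])
    writer.writerows(rows)
    return buf.getvalue()


# A real snapshot file could be opened read-only, for example:
#   sqlite3.connect("file:snapshot.db?mode=ro", uri=True)  # hypothetical file name
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE papers ("
    "dblp_key TEXT PRIMARY KEY, title TEXT, venue TEXT, year INTEGER, abstract TEXT)"
)
conn.executemany(
    "INSERT INTO papers VALUES (?, ?, ?, ?, ?)",
    [
        ("conf/a/1", "Greybox Fuzzing at Scale", "VenueA", 2019, "We study coverage feedback."),
        ("conf/a/2", "Kernel Hardening", "VenueA", 2022, "We fuzz kernels to find bugs."),
        ("conf/b/3", "Phishing Detection", "VenueB", 2024, "A study of email lures."),
    ],
)

hits = keyword_search(conn, "fuzz")
csv_text = to_csv(hits)
print(csv_text)
```

Because the snapshot does not change between runs, the same query returns the same rows every time, which is the property a review protocol needs from its denominator.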
What the Fixed Denominator Reveals
Fixed denominators enable repeatable measurement. TopVenues shows that 29.2% of 2024-2025 papers from the four top-ranked security conferences in its scope appear as arXiv preprints, with a median of five months before formal publication. That's useful for anyone tracking the gap between preprint and peer review.
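The two statistics above are simple to compute once every paper in scope is a row with known dates. The following sketch assumes each record carries an optional arXiv submission date and a formal publication date, and measures the gap in whole calendar months. How TopVenues actually matches preprints to papers and measures the lag is not described here, so the record layout and both helper functions are illustrative assumptions.

```python
from datetime import date
from statistics import median

# Hypothetical records: (paper id, arXiv submission date or None, publication date).
records = [
    ("p1", date(2024, 1, 15), date(2024, 6, 15)),
    ("p2", None, date(2024, 8, 1)),
    ("p3", date(2024, 11, 1), date(2025, 5, 1)),
    ("p4", None, date(2025, 2, 20)),
    ("p5", date(2025, 3, 10), date(2025, 5, 10)),
]


def months_between(start, end):
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def preprint_stats(rows):
    """Return (share of papers with an earlier preprint in %, median lead in months)."""
    leads = [
        months_between(arxiv, published)
        for _, arxiv, published in rows
        if arxiv is not None and arxiv <= published
    ]
    share = 100.0 * len(leads) / len(rows) if rows else 0.0
    # median() raises on an empty list, so report None when no preprints exist.
    return share, (median(leads) if leads else None)


share_pct, median_lead = preprint_stats(records)
print(f"preprint share: {share_pct:.1f}%")
print(f"median lead time: {median_lead} months")
```

The share is only meaningful because `len(rows)` is the full, fixed set of in-scope papers rather than whatever a search portal happened to return.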
More impressive: a prior-author-track-record filter yields a 16.5x precision gain at 90% recall for triaging preprints that later appear in the same venue set. If you're curating a reading list or a systematic review, a filter like that could cut a large share of the manual screening.
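One common way to read a figure like "precision gain at 90% recall" is as the precision of the filtered set divided by the base rate of positives, measured at the loosest score threshold that still recovers at least 90% of them. That definition is an assumption here, as are the function name and the toy scores below; the paper may define and compute the metric differently.

```python
import math


def precision_gain_at_recall(scores, labels, target_recall=0.9):
    """Return (gain, precision, recall) at the smallest top-k reaching target recall.

    gain is precision divided by the base rate of positives (assumed definition).
    Ties between scores are broken arbitrarily in this sketch.
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label")
    base_rate = positives / len(labels)
    # Number of true positives required; the epsilon guards against float round-up.
    needed = math.ceil(target_recall * positives - 1e-9)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for kept, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos >= needed:
            precision = true_pos / kept
            return precision / base_rate, precision, true_pos / positives
    raise RuntimeError("unreachable: needed never exceeds the number of positives")


# Toy example: a hypothetical track-record score per preprint, and whether the
# preprint later appeared in the venue set (1) or not (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

gain, precision, recall = precision_gain_at_recall(scores, labels)
print(f"gain={gain:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In the toy data the base rate is 3 in 10, and keeping the top four candidates captures all three positives at 75% precision, so the gain is 2.5x. The reported 16.5x would correspond to a much sparser positive class and a much sharper filter.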
Why Reproducibility Matters Here
Cybersecurity literature reviews suffer from a moving baseline. Every API change, every publisher portal redesign breaks the process of reconstructing the initial paper pool. TopVenues bakes the corpus construction into an executable, inspectable, citable artifact. Anyone can clone the repo, run the tests, and verify the exact set of papers used in a given review.
The artifact lives at https://github.com/sidneibarbieri/topVenues. If the field adopts this approach, future meta-studies won't need to apologize for an irreproducible denominator.
Source: TopVenues: A Reproducible Corpus and Tooling Substrate for Cybersecurity Literature Reviews
Domain: arxiv.org