Research & Articles
Sharing what the data shows us.
Watch our co-founder speak at VulnCon 2025
Watch our co-founder Jay Jacobs present research and facilitate industry discussions at CVE Program & FIRST VulnCon 2025 in Raleigh, NC.
Only Your Data Can Truly Anticipate Threats
In cybersecurity, understanding exploitation threats hinges on the quality and source of the data analyzed. Traditionally, vulnerability management has relied heavily on secondary source data, such as Known Exploited Vulnerabilities (KEV) lists, which compile vulnerabilities based on reported incidents. While these lists provide valuable references, relying solely on them leaves significant blind spots. Secondary sources, by nature, reflect past events, often with delays and incomplete context, leading organizations to respond reactively rather than proactively.
Explore Model Thresholds
Thresholds allow security teams to filter and assess which vulnerabilities are most critical to remediate. Organizations have to make tough calls when choosing which vulnerabilities to prioritize, and thresholds allow teams to make educated decisions based on global model data.
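The idea above can be sketched in a few lines. This is a hypothetical illustration, assuming model output in the form of per-CVE probability scores; the CVE IDs, scores, and the 0.5 threshold are invented for the example, not real model data.

```python
# Hypothetical model scores (probability of exploitation) per CVE.
# These IDs and values are illustrative only.
vulns = {
    "CVE-2024-0001": 0.92,
    "CVE-2024-0002": 0.04,
    "CVE-2024-0003": 0.37,
    "CVE-2024-0004": 0.71,
}

THRESHOLD = 0.5  # example cutoff: remediate anything scored at or above 50%

# Keep only vulnerabilities over the threshold, highest-scoring first.
to_remediate = sorted(
    (cve for cve, score in vulns.items() if score >= THRESHOLD),
    key=lambda cve: vulns[cve],
    reverse=True,
)
print(to_remediate)
```

Raising or lowering `THRESHOLD` is exactly the trade-off the article describes: a lower cutoff catches more threats at the cost of more remediation work.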
Explore Our New API Docs
Today we publicly launched our API documentation for Global and EPSS models. Our API is the primary way users interact with our data. Creating docs from scratch is a team effort. Here’s how we managed to draft, edit, and release our docs.
Supporting EPSS: Our Vision for a More Data-Driven Future
At Empirical Security, we have known for some time now that EPSS serves as essential infrastructure within cybersecurity operations (over 100 vendors incorporate it into their products today). Our support for EPSS aligns closely with our broader vision of evolving cybersecurity tools into a more rigorous and data-driven framework. Our longstanding position has been clear: all cybersecurity tools need to become significantly more data-driven to effectively handle the complexity of current threats.
Announcing The Empirical Security Global Model
We just launched a product that I believe is fundamentally different from anything in the market today: a solution that combines the largest collection of real-time exploitation activity with years of experience in advanced vulnerability modeling. If you’ve used EPSS before and had a thought that started with, “I wish that EPSS…” then hopefully this announcement is going to make you excited as well.
EPSS: Effort vs Coverage
One common misconception is that the models we use are hand-crafted. I get it: it's a natural leap, since many approaches to measuring risk in cybersecurity start by picking elements that feel important, assigning each a weight, and combining them with some basic arithmetic. This couldn't be further from reality: EPSS and our other models are trained on real-world data. Using machine learning, statistics, and perhaps even a dash of "AI", we let the mathematics tell us what's important and how important it is. So it shouldn't be surprising that things shift around when we go from EPSS version 3 to version 4, where we've now added thousands of vulnerabilities being used in ransomware and malware. Let's explore what that looks like in the data...
Introducing EPSS v4
The fourth iteration of the Exploit Prediction Scoring System (EPSS) is being released today. I have been working on EPSS for just over six years now. While I’d love to take you on a long meandering walk down memory lane, go into detail about all of the lessons we’ve learned along the way and introduce you to all of the wonderful people who’ve helped make EPSS better with each iteration, I’ll spare you the details and just offer a set of bullet points…
Probability and Prediction
Luckily, EPSS v4 scores well over 250,000 vulnerabilities, and each score is a probabilistic statement. We can measure the accuracy of EPSS predictions by looking at all of those statements as a whole. One method of doing that is a calibration plot, which plots predictions against reality: the horizontal axis bins the predicted probabilities, and the vertical axis shows the proportion of each bin that was actually exploited in the wild.
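The binning step behind a calibration plot can be sketched as follows. This is a minimal illustration, assuming equal-width probability bins; the scores and outcomes here are synthetic, not real EPSS data.

```python
# Sketch of a calibration check: bin predicted probabilities, then compare
# each bin's mean prediction with the observed exploitation rate.
def calibration_bins(predictions, outcomes, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predictions, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 to last bin
        bins[idx].append((p, y))
    result = []
    for bucket in bins:
        if not bucket:
            continue  # skip empty bins
        mean_pred = sum(p for p, _ in bucket) / len(bucket)
        observed = sum(y for _, y in bucket) / len(bucket)
        result.append((mean_pred, observed))
    return result  # well-calibrated scores lie near the diagonal

# Synthetic scores and exploitation outcomes (1 = exploited in the wild).
preds = [0.05, 0.12, 0.15, 0.55, 0.61, 0.92]
actual = [0, 0, 1, 1, 0, 1]
for mean_pred, observed in calibration_bins(preds, actual, n_bins=5):
    print(f"predicted {mean_pred:.2f} -> observed {observed:.2f}")
```

Plotting each `(mean_pred, observed)` pair against the diagonal gives the calibration plot the article describes.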
AI In Cybersecurity: Looking Beyond The SOC
Artificial intelligence (AI) has become more than a buzzword in cybersecurity. Every new vendor, startup, and product marketing pitch touts advanced algorithms, machine learning, and real-time threat detection. While there’s no denying that AI can streamline operations in the Security Operations Center (SOC), an overemphasis on SOC-centric use cases creates unintended blind spots. The bulk of AI-driven innovation today aims to automate incident detection, speed up alert triage, and drive machine-scale resolutions for immediate threats. Although these capabilities are critical, focusing too heavily on the detect-and-respond phase overlooks the foundational predict-and-prevent side of cybersecurity. We’ll explore why the SOC is such a magnet for AI, what we risk by confining new technologies to that arena, and how the industry can shift toward a more proactive and holistic security strategy.
Cybersecurity is Ready for Local Models
To gain a real advantage in the age of generally available machine learning, defenders will have to move to the best possible models—and that means local models, using all the data at hand.
The Future of AI and ML In Cybersecurity
The future of AI/ML is a new way to interact with knowledge, one that does not require the skills that are already causing a workforce shortage in security. Vendors have access to troves of security data that the average customer or enterprise using their technology does not.
Exploit Prediction Scoring System
Despite the large investments in information security technologies and research over the past decades, the information security industry is still immature when it comes to vulnerability management. In particular, the prioritization of remediation efforts within vulnerability management programs predominantly relies on a mixture of subjective expert opinion and severity scores. Compounding the need for prioritization is the increase in the number of vulnerabilities the average enterprise has to remediate. This article describes the first open, data-driven framework for assessing vulnerability threat, that is, the probability that a vulnerability will be exploited in the wild within the first 12 months after public disclosure. This scoring system has been designed to be simple enough to be implemented by practitioners without specialized tools or software, yet provides accurate estimates (ROC AUC = 0.838) of exploitation. Moreover, the implementation is flexible enough that it can be updated as more, and better, data becomes available. We call this system the Exploit Prediction Scoring System (EPSS).
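The ROC AUC the abstract cites can be computed from scratch, which makes its meaning concrete: it is the probability that a randomly chosen exploited vulnerability is scored above a randomly chosen non-exploited one (ties counting half). The scores and labels below are synthetic, purely for illustration.

```python
# ROC AUC via its rank-ordering interpretation: the fraction of
# (exploited, non-exploited) pairs where the exploited vuln scores higher.
def roc_auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]  # exploited
    neg = [s for s, y in zip(scores, labels) if y == 0]  # not exploited
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Synthetic example: two exploited vulns among five.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(roc_auc(scores, labels))
```

An AUC of 0.5 means the scores carry no signal; the 0.838 reported in the paper means an exploited vulnerability outranks a non-exploited one about 84% of the time.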
Improving vulnerability remediation through better exploit prediction
We construct a series of vulnerability remediation strategies and compare how each performs with regard to trading off coverage and efficiency. We expand and improve upon the small body of literature that uses predictions of ‘published exploits’, by instead using ‘exploits in the wild’ as our outcome variable.
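The coverage/efficiency trade-off mentioned above can be sketched numerically. This is a minimal illustration assuming coverage means the share of exploited vulnerabilities a strategy remediates (recall) and efficiency means the share of remediated vulnerabilities that were actually exploited (precision); the CVE sets are invented for the example.

```python
# Coverage and efficiency of a remediation strategy, given which
# vulnerabilities it remediated and which were exploited in the wild.
def coverage_efficiency(remediated, exploited):
    hit = remediated & exploited  # remediated AND exploited
    coverage = len(hit) / len(exploited) if exploited else 0.0
    efficiency = len(hit) / len(remediated) if remediated else 0.0
    return coverage, efficiency

# Illustrative sets: the strategy patched 4 vulns; 3 were exploited overall.
remediated = {"CVE-1", "CVE-2", "CVE-3", "CVE-4"}
exploited = {"CVE-2", "CVE-4", "CVE-9"}
cov, eff = coverage_efficiency(remediated, exploited)
print(f"coverage={cov:.2f} efficiency={eff:.2f}")
```

A strategy that remediates everything achieves perfect coverage but poor efficiency; comparing strategies on both axes is the point of the paper's analysis.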
The Complexity of Prioritising Patching
As American journalist and essayist HL Mencken once wrote: “For every complex problem there is a solution that is concise, clear, simple, and wrong.” Anyone working in or around vulnerability remediation knows that the apparently ‘simple’ task of applying a patch is anything but. The vulnerability lifecycle is filled with pitfalls and deceptively complex tasks. The time and effort needed to remediate any single vulnerability across an entire enterprise are often underestimated. This creates an obvious and urgent demand for prioritisation, which requires that we understand more about the world of vulnerabilities. Michael Roytman of Kenna Security and Jay Jacobs at the Cyentia Institute explore what the open vulnerability landscape looks like and investigate multiple factors contributing to remediation efforts.
What I've Learned While Training Computers To Predict Cyber Risk
Organizations simply cannot reduce their risk and improve their security posture without having some way to predict, ahead of time, which threats and vulnerabilities will actually lead to an attack.
For Good Measure: Remember the Recall
We exist in a dual-stage testing regime. We are subject to a low-prevalence (rare event) environment. To act rationally in this scenario, the first test must remove as many false negatives as it can.
Exploring with a Purpose
We have the better, if harder, problem of the meta-analysis (“research about research”) of many observations, always remembering that the purpose of security metrics is decision support.
Measuring vs. Modelling
Using CVSS to steer remediation is nuts, ineffective, deeply diseconomic, and knee jerk; given the availability of data it is also passé, which we will now demonstrate.