It’s Not About Making a Scoring System
While talking about our modeling and predictive analytics, I was recently asked: “Why do we need another scoring system?” In a space where seemingly anyone can create a vulnerability score and say “trust me” (but with a lot more words), this may seem like a reasonable question, but it is not the question we should be asking when someone presents a score to support our decisions.
Most people think of modeling as the task of figuring out how to convert data into a score. This view is perhaps best captured by Ben Franklin and his “weighted analysis” approach, which starts by identifying a list of things that seem important (“pros and cons,” as ol’ Ben called them) and then “endeavour to estimate their respective weights.” So figure out what is important, come up with some weights, combine them, and voilà! We have a score that feels pretty good to most people, but we have no idea whether it’s actually helpful.
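To make that concrete, here is a minimal sketch of the weighted-analysis approach in Python. The attributes and weights are invented for illustration, not taken from any real scoring system; the point is that nothing in this process tells us whether the resulting score predicts anything.

```python
# A minimal sketch of the Franklin-style "weighted analysis" approach.
# The attributes and weights below are made up for illustration; the point
# is that nothing in this process tells us whether the score is useful.

# Hypothetical vulnerability attributes someone decided "seem important"
vuln = {
    "remote": 1,           # remotely exploitable?
    "public_poc": 0,       # proof-of-concept code published?
    "popular_product": 1,  # affects a widely deployed product?
}

# Weights chosen by intuition, not learned from observed outcomes
weights = {
    "remote": 0.5,
    "public_poc": 0.3,
    "popular_product": 0.2,
}

score = sum(weights[k] * vuln[k] for k in weights)
print(f"vulnerability score: {score:.2f}")  # feels reasonable, but is it predictive?
```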
Let’s come forward in time a few hundred years and try to benefit from all the advancements in science, statistics, and machine learning. We now have many very fancy methods that are all designed to learn from observations. They “study” the various inputs and attributes of some system along with its observed outcomes (like vulnerabilities and their exploitation activity), and with enough observations they develop an approach that connects those inputs to the observed outcomes. While there are some amazingly interesting approaches, the approaches themselves are not the real advancement; the real benefit came from advancing our ability to measure performance. Anyone can create a method that feels pretty good, but now we can know just how much better one approach is than another. Being able to measure the strengths, limitations, and general performance is what separates science from pseudoscience.
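As a rough illustration of what “measuring performance” looks like in practice, here is a small Python sketch that trains two candidate models on the same observations and compares them on held-out data with a single metric. The data, features, models, and metric are stand-ins (synthetic data via scikit-learn), not our actual model or results; the point is that the comparison ends in a number rather than an opinion.

```python
# A minimal sketch of measuring model performance instead of debating it.
# Assumes historical observations: feature vectors for vulnerabilities and a
# label for whether exploitation was later observed. Everything here is a
# placeholder for illustration, not the model described in this post.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Stand-in for real vulnerability features and observed exploitation outcomes
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Two candidate "scoring systems" trained on the same observations
candidates = [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest", RandomForestClassifier(random_state=0)),
]

for name, model in candidates:
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    # Held-out performance: now we can say how much better one approach is
    print(f"{name}: AUC = {roc_auc_score(y_test, scores):.3f}")
```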
“Why do we need another scoring system?” is the wrong question. We should be asking about the performance. Real solutions and real predictive analytics focus on understanding the relationships and improving knowledge, and in the end, it all comes down to measuring performance.
How does the model perform?
The original question is totally valid because we don’t need multiple scores, we just need one good score. The problem is that, for whatever weird reason, we have tried to find the “good” one through discussion, debate, and, worse, belief. We need measurement, we need comparisons and benchmarks. The time of “trust me” in cybersecurity needs to wind down now. Our next post will walk through an example of model performance measurement (detecting exploits across GitHub).
We’ve had a tendency (to put it nicely) in cybersecurity to justify our scores with stories and anecdotes, which leads to questions like “Why do we need another scoring system?” Instead, we should be asking model producers how they know their approach is better than the alternatives. What metrics did they use? We need to start asking these questions and demanding more if we want to move forward together as an industry!