Dec 28, 2022

The quest for the truth in cybersecurity data



(Photo by Chris Liverani on Unsplash)

As the saying goes, "if you torture the data long enough, it will confess." Interpreting cybersecurity statistics can be challenging, especially the ones that make headlines. It is important to approach these statistics with a critical eye and consider the context in which they were collected, the potential biases of the data sources, and other factors that could affect their accuracy and relevance.

For example, it was recently reported that ransomware attacks in Finland increased significantly in 2022. Digging further, I found that there were 3 ransomware attacks on essential service providers in 2021 and 11 such attacks in 2022. That is a whopping 267% increase!

To judge whether the increase is really significant, let's consider the total number of essential service providers in Finland, which is estimated at between 1000 and 2000. Using the conservative figure of 1000 (the lower count gives the higher percentages), ransomware attacks targeted 0.3% of essential service providers in 2021 and 1.1% in 2022. Put another way, the increase was 0.8 percentage points.

Almost four times as many ransomware attacks this year, a 267% increase, a 0.8 percentage point increase, or simply 8 more attacks than last year? Take your pick, depending on the message you want to deliver.
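To make the framings concrete, here is a small Python sketch that computes each of them from the reported counts. The helper name `framings` is purely illustrative, and because the number of essential service providers is only an estimate, the sketch tries both ends of the 1000 to 2000 range.

```python
# Counts from the Finnish ransomware example; the provider count is only
# an estimate (1000-2000), so both ends of the range are tried.
attacks_2021 = 3
attacks_2022 = 11


def framings(before: int, after: int, population: int) -> dict:
    """Hypothetical helper: several ways of describing the same change."""
    share_before = 100 * before / population  # % of providers hit
    share_after = 100 * after / population
    return {
        "absolute_increase": after - before,  # extra attacks
        "ratio": after / before,  # "x times as many"
        "relative_increase_pct": 100 * (after - before) / before,
        "share_before_pct": share_before,
        "share_after_pct": share_after,
        "pp_increase": share_after - share_before,  # percentage points
    }


for providers in (1000, 2000):
    f = framings(attacks_2021, attacks_2022, providers)
    print(f"Assuming {providers} essential service providers:")
    print(f"  {f['absolute_increase']} more attacks")
    print(f"  {f['ratio']:.1f} times as many attacks")
    print(f"  {f['relative_increase_pct']:.0f}% increase")
    print(f"  {f['share_before_pct']:.2f}% -> {f['share_after_pct']:.2f}% of providers "
          f"({f['pp_increase']:.2f} percentage points)")
```

With 2000 providers, the same 8 extra attacks shrink to a 0.4 percentage point increase, while the ratio and the relative increase do not change at all. That is one reason relative figures tend to make for more dramatic headlines.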

Analysing the trustworthiness of cybersecurity statistics or survey results can be hard work. My tips for a quick and dirty analysis are:
  • Do you believe that the source of the information is objective?
  • Is the tone of the message matter-of-fact rather than attention-seeking?
  • Is the method of data collection and analysis described?
  • Do the conclusions make sense based on your own view of the situation?
I would be much more inclined to believe the results if the answer to all four questions were yes.

If you want to dig deeper, you may consider the following factors:
  • The context in which the statistics were collected and reported
  • Any potential biases of the data sources
  • Whether the study covers only successful breaches or also blocked attacks
  • The possibility of cherry-picking or random variation in the results
  • The source and size of the data and how it was sampled, as well as any explanation of uncertainty levels
  • How clearly terms such as "breach," "incident," and "hack" are defined
  • Whether correlation is presented as causation
  • Whether absolute risk is reported, not just relative risk
  • Whether other studies support the results
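One way to check the "random variation" point against the Finnish example is a conditional binomial test, a standard exact test for comparing two Poisson counts. It rests on assumptions the data itself does not confirm: that attacks occur independently and that the same population of providers was exposed in both years. Under the hypothesis of an unchanged attack rate, each of the 14 attacks would be equally likely to fall in either year. The helper `one_sided_binomial_p` is illustrative, not from any particular library.

```python
from math import comb


def one_sided_binomial_p(k: int, n: int, p: float = 0.5) -> float:
    """Hypothetical helper: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


attacks_2021, attacks_2022 = 3, 11
total = attacks_2021 + attacks_2022

# Assumption: if both years had the same underlying attack rate, each of the
# 14 attacks is equally likely to have happened in either year (p = 0.5).
p_one_sided = one_sided_binomial_p(attacks_2022, total)
# Binomial(n, 0.5) is symmetric, so the two-sided p-value is exactly double.
p_two_sided = min(1.0, 2 * p_one_sided)
print(f"one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
```

This prints a one-sided p-value of 0.029 and a two-sided p-value of 0.057. So the jump from 3 to 11 is borderline: hard to write off as pure chance, but with counts this small the evidence is far from overwhelming, and the test says nothing about why the increase happened.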
Going back to the ransomware example: it's one thing to understand what happened and another to understand why. My example only showed that the same numbers can be presented differently depending on the agenda. The increase in ransomware attacks could be driven by, for example, activity related to the Russia-Ukraine war, other criminal activity, an increase in zero-day vulnerabilities, changes to organizations' infrastructure due to remote work, or a combination of these. Knowing the why is important for understanding the risk and deciding on possible actions.

Surveys and statistics can be useful in understanding the state of cybersecurity and trends in the field. However, it is important to approach these statistics with caution and consider all of the factors that could impact their accuracy and relevance.

The usual rule also applies to cybersecurity statistics and surveys: if the results sound too good or too bad, they are probably not true.

