Cybersecurity risk assessment is a nebulous process that requires a delicate balance between art and science. Typically, a risk assessor begins by collecting relevant information for all identified risk factors, drawing on logs, architecture diagrams, network topology, compliance assessments, incident records, vulnerability assessments, threat models and control assessments. Based on the collected evidence and an approved methodology, the assessor analyzes the risk factors and determines risk severity using approved impact and likelihood tables. The accuracy of the risk assessment depends on the assessor’s ability to draw inferences from the large volume of collected information and to make appropriate corrections to the computed risk severity. To contextualize the risk, the assessor relies on experience, knowledge and observation of the system.
Cybersecurity risk assessments are largely qualitative, because it is difficult and sometimes impossible to obtain accurate and reliable quantitative measurements for cybersecurity risk factors. Quantitative estimates are labor-intensive and time-consuming; therefore, they do not scale well for cybersecurity risk assessment. Both qualitative and quantitative risk assessments are hard to scale because of a shortage of skilled personnel, the number of staff required, assessment duration and business priorities.
Machine learning (ML) could offer a solution to this scaling problem by predicting the severity of a new risk from existing risk assessments. Because the assessors’ intuition, insight and expertise are inherently embedded in the current risk assessments, an ML algorithm trained on those assessments can draw on them to predict risk severity. ML can serve as a first line of action, and a risk can then be analyzed further if it meets a set threshold. Each manual analysis adds to the data from which the algorithm learns, significantly reducing the need for manual intervention over time.
Problem Statement
The question is whether, given previous observations in a cyberrisk register, an ML algorithm can predict the risk severity of a new scenario without a complete risk assessment being conducted. If there are three possible risk severity classifications (high, medium and low), can an ML algorithm learn from the risks in an existing risk register and predict the severity classification of a risk it has not seen before (figure 1)?
Methodology
There are many ML algorithms in use, and each has strengths for a specific type of application. In this case, Bayesian probability theory is well suited because risk is inherently probabilistic. The Bayesian algorithm relies on conditional probability, written P(A|B) and read as the probability of A occurring given that B has already occurred (figure 2).1, 2
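For reference, Bayes’ theorem expresses this conditional probability in terms of its inverse. In the standard notation below, which may differ from the presentation in figure 2, P(A) is the prior, P(B|A) the likelihood, P(B) the evidence and P(A|B) the posterior:

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
```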
In other words, this algorithm can be used to predict the probability that A will occur if a set of observations B has already occurred. Figure 3 shows an example3 of how Bayes’ theorem can be applied.
In the example shown in figure 3, the question being addressed is, “What is the probability that an unclassified datum is a movie or a book, given that it is a Western?” which makes it a classification problem. Figure 4 provides an explanation of the various components needed to answer the addressed question.
The solution is a simple extension: calculate the two posteriors, P(movie|Western) and P(book|Western), and classify the datum as whichever has the higher probability, or conviction. The same technique can be applied to risk assessment problems.
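To make the arithmetic concrete, the following Python sketch computes both posteriors for a small, hypothetical collection (6 movies, 3 of them Westerns; 4 books, 1 of them a Western). These counts are invented for illustration and are not the data in figure 3.

```python
from fractions import Fraction

# Hypothetical counts, invented for illustration (not the data in figure 3).
COUNTS = {
    "movie": {"total": 6, "western": 3},
    "book": {"total": 4, "western": 1},
}

def posteriors(counts):
    """Return P(category | Western) for each category using Bayes' theorem."""
    n = sum(c["total"] for c in counts.values())
    # Evidence: the overall share of Westerns in the collection, P(Western).
    evidence = Fraction(sum(c["western"] for c in counts.values()), n)
    result = {}
    for category, c in counts.items():
        prior = Fraction(c["total"], n)                  # P(category)
        likelihood = Fraction(c["western"], c["total"])  # P(Western | category)
        result[category] = likelihood * prior / evidence
    return result

post = posteriors(COUNTS)
label = max(post, key=post.get)  # classify as the category with the higher posterior
for category, p in post.items():
    print(f"P({category}|Western) = {p}")
print("classified as:", label)
```

With these counts the sketch prints P(movie|Western) = 3/4 and P(book|Western) = 1/4, so the datum is classified as a movie. Because P(Western) is the same for both categories, comparing the numerators P(Western|category)P(category) alone would give the same decision; the division only matters when the posteriors themselves are reported.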
Problem Statement Revisited
The problem is to classify a new risk into one of the three severity categories, given the new risk (evidence) as a vector of risk attributes and their values. The existing data in the risk register can be used to compute the required prior and conditional probabilities. The intent is to classify a new risk based on previous observations in the risk register.
Solution
The following steps, which figure 6 presents as pseudocode for the solution in figure 5, classify the risk evidence:
- For each risk attribute in the cyberrisk register and for each possible value of that attribute, compute the attribute_value/classification probability, i.e., the probability of that value given each classification. Each attribute_value therefore has three probabilities: one each for low, medium and high.
- To predict the severity of the new risk, look up the attribute_value/classification probabilities computed in the previous step for each of the new risk’s attribute values, for each of the three classifications.
- For each classification, compute the overall probability (conviction) as the product of the looked-up attribute_value/classification probabilities, multiplied by the prior probability of that classification.
- Choose the classification with the highest value.
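Written as a formula, the steps above correspond to the standard Naive Bayes decision rule. With r_1 to r_n denoting the new risk’s attribute values and c a severity class (notation introduced here), the predicted severity is the class that maximizes the prior times the product of the attribute_value/classification probabilities; the evidence term P(r) is identical for every class and is therefore dropped:

```latex
\hat{c} = \arg\max_{c \in \{\text{low},\,\text{medium},\,\text{high}\}} \; P(c) \prod_{i=1}^{n} P(r_i \mid c)
```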
The classifier described here is known as Naive Bayes. The term naive comes from its assumption that the attributes in (r) are independent of each other given the classification: a change in one attribute is assumed not to change the value of any other attribute in the vector, so there is no correlation (direct or inverse) between attributes. In reality, the attributes are often correlated and change together.4 This violation of the Naive Bayes assumption can be mitigated by pre-processing the evidence (r) in the following ways:
- Eliminate redundant variables (e.g., risk likelihood and risk impact are already embedded in risk severity and are therefore not required).
- Reduce the number of attributes by combining two or more of them into artificial attributes (e.g., threat attributes such as actor, intent and origin can be combined into a threat vector, represented as threat vector [TV_ID] in the risk attributes).
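The two pre-processing steps can be sketched as follows. All field names and values here are hypothetical (the real attributes are those in figure 7); the function drops the redundant likelihood and impact fields and maps each distinct (actor, intent, origin) combination to a TV_ID, numbered in order of first appearance.

```python
# Hypothetical field names; the actual attributes are those defined in figure 7.
REDUNDANT = {"Likelihood", "Impact"}           # already embedded in Severity
THREAT_FIELDS = ("Actor", "Intent", "Origin")  # combined into one threat vector

def preprocess(rows):
    """Drop redundant fields and replace the threat fields with a TV_ID."""
    threat_vectors = {}  # (actor, intent, origin) -> TV_ID
    cleaned = []
    for row in rows:
        key = tuple(row[field] for field in THREAT_FIELDS)
        # Assign IDs 1, 2, ... in order of first appearance; reuse existing IDs.
        tv_id = threat_vectors.setdefault(key, len(threat_vectors) + 1)
        kept = {k: v for k, v in row.items()
                if k not in REDUNDANT and k not in THREAT_FIELDS}
        cleaned.append({"TV_ID": tv_id, **kept})
    return cleaned, threat_vectors

# Hypothetical register rows, invented for illustration.
rows = [
    {"Actor": "insider", "Intent": "sabotage", "Origin": "internal",
     "Asset": "ESD", "Likelihood": "high", "Impact": "high", "Severity": "high"},
    {"Actor": "hacktivist", "Intent": "disruption", "Origin": "external",
     "Asset": "DCS", "Likelihood": "low", "Impact": "medium", "Severity": "low"},
    {"Actor": "insider", "Intent": "sabotage", "Origin": "internal",
     "Asset": "DCS", "Likelihood": "medium", "Impact": "high", "Severity": "medium"},
]
cleaned, threat_vectors = preprocess(rows)
for row in cleaned:
    print(row)
```

The first and third rows share the same threat combination, so both receive TV_ID 1, and the classifier then sees a single threat attribute instead of three correlated ones.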
To use the Bayesian ML algorithm, the first step is to define the threat vectors and risk attributes in the cyberrisk register, along with the possible values each can take. The threat vector and cyberrisk register attributes and their available values are shown in figure 7.
Figure 8 shows an example of a cyberrisk register that contains risk assessments based on the attributes and available values described previously. (A real risk register will have many more attributes and possible values.)
Step 1: Calculate the Conditional Probability
For each risk attribute and for each possible value in the cyberrisk register, calculate the conditional probability for attribute_value/classification (these values are used to predict severity in the following step).
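Step 1 can be sketched in Python as a counting exercise. The register below is hypothetical (it is not figure 8), the control value NONE is an invented placeholder, and, following standard Naive Bayes, the sketch also records each class’s prior probability, i.e., its share of the register.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ("TV_ID", "Asset", "Vulnerability", "Current_Control")

# Hypothetical register (NOT figure 8): TV_ID, Asset, Vulnerability, Current_Control, Severity.
# "NONE" is a placeholder control value invented for illustration.
REGISTER = [
    (2, "ESD", "RDF", "NONE", "high"),
    (2, "ESD", "SI",  "MCP",  "high"),
    (2, "DCS", "RDF", "MCP",  "high"),
    (1, "DCS", "SI",  "NONE", "medium"),
    (1, "ESD", "SI",  "MCP",  "medium"),
    (2, "DCS", "SI",  "MCP",  "medium"),
    (1, "DCS", "RDF", "MCP",  "low"),
    (1, "ESD", "RDF", "MCP",  "low"),
    (1, "DCS", "RDF", "NONE", "low"),
]

def conditional_probabilities(register):
    """Return priors P(severity) and P(attribute = value | severity) for every value seen."""
    class_counts = Counter(row[-1] for row in register)
    joint = defaultdict(Counter)  # (attribute, value) -> counts per severity
    for *values, severity in register:
        for attribute, value in zip(ATTRIBUTES, values):
            joint[(attribute, value)][severity] += 1
    total = len(register)
    priors = {c: n / total for c, n in class_counts.items()}
    # Counter returns 0 for a severity never seen with this value.
    table = {
        key: {c: counts[c] / class_counts[c] for c in class_counts}
        for key, counts in joint.items()
    }
    return priors, table

priors, table = conditional_probabilities(REGISTER)
for (attribute, value), probs in table.items():
    print(f"{attribute}={value}: " + ", ".join(f"{c}={p:.2f}" for c, p in probs.items()))
```

Any attribute value that never co-occurs with a class gets a probability of zero here, which eliminates that class outright in the later product; Laplace (add-one) smoothing is a common remedy in Naive Bayes implementations, although the source does not describe it.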
Step 2: Apply the Calculated Conditional Probability to Predict Severity of New Risk
One can predict the risk severity for the following new risk:
{TV_ID=2, Asset=ESD, Vulnerability=RDF, Current_Control=MCP}
The conditional probability of each attribute value of the new risk is read from figure 9 for each severity (classification) type. The conviction that the risk belongs to each severity type is then determined by calculating the overall conditional probability (figure 10 and figure 11).
Similarly, the severity of another risk can be predicted using the conditional probabilities from figure 9, again finding the individual probabilities for each attribute of the new risk for each severity type and combining them to find the conviction of belonging. Consider the following new risk:
{TV_ID=1, Asset=DCS, Vulnerability=SI, Current_Control=MCP}
Step 3: Calculate the Overall Probability (Conviction) for Each Severity Type by Combining Attribute_Value/Classification of the New Risk
The last leaf in each branch in figure 10 and figure 11 shows the combination of attribute_value/classification to derive the conviction.
Step 4: Choose the Class With the Highest Conviction
For the first new risk (figure 10), the conviction that this risk belongs to high severity is the highest; hence, it is rated as high. Similarly, for the second new risk (figure 11), the conviction is strongest for medium severity, so it is rated as medium.
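Steps 2 to 4 can be sketched together for the two new risks in the text. The attribute_value/classification probabilities and the equal priors below are hypothetical, chosen so that the outcome matches the ratings reported above; they are not the values in figure 9. Fractions keep the convictions exact.

```python
from fractions import Fraction as F
from math import prod

SEVERITIES = ("low", "medium", "high")
# Hypothetical equal priors P(severity); in practice, each class's share of the register.
PRIORS = {s: F(1, 3) for s in SEVERITIES}

# Hypothetical attribute_value/classification probabilities P(value | severity),
# listed in SEVERITIES order (low, medium, high). Invented for illustration.
TABLE = {
    ("TV_ID", 1):               (F(1),    F(2, 3), F(0)),
    ("TV_ID", 2):               (F(0),    F(1, 3), F(1)),
    ("Asset", "ESD"):           (F(1, 3), F(1, 3), F(2, 3)),
    ("Asset", "DCS"):           (F(2, 3), F(2, 3), F(1, 3)),
    ("Vulnerability", "RDF"):   (F(1),    F(0),    F(2, 3)),
    ("Vulnerability", "SI"):    (F(0),    F(1),    F(1, 3)),
    ("Current_Control", "MCP"): (F(2, 3), F(2, 3), F(2, 3)),
}

def convictions(new_risk):
    # Step 2: look up P(value | severity) for each attribute of the new risk.
    looked_up = [dict(zip(SEVERITIES, TABLE[(a, v)])) for a, v in new_risk.items()]
    # Step 3: conviction = prior x product of the looked-up probabilities.
    return {s: PRIORS[s] * prod(p[s] for p in looked_up) for s in SEVERITIES}

def classify(new_risk):
    scores = convictions(new_risk)
    # Step 4: choose the highest conviction (ties go to the earlier entry in SEVERITIES).
    return max(scores, key=scores.get), scores

risk_1 = {"TV_ID": 2, "Asset": "ESD", "Vulnerability": "RDF", "Current_Control": "MCP"}
risk_2 = {"TV_ID": 1, "Asset": "DCS", "Vulnerability": "SI", "Current_Control": "MCP"}
label_1, scores_1 = classify(risk_1)
label_2, scores_2 = classify(risk_2)
print(label_1, scores_1["high"])    # prints: high 8/81
print(label_2, scores_2["medium"])  # prints: medium 8/81
```

In this toy table, a single zero probability is enough to rule a class out entirely, which is why the losing classes score exactly 0; with a larger register, the convictions are usually all nonzero and the comparison is less clear-cut.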
Conclusion
Bayesian-based ML algorithms give us the ability to classify based on conviction and not just on frequency. The probability of the evidence occurring was calculated for each classification type, and the one with the highest conviction was chosen. As the number of observations increases, classification error tends to decrease because the probability estimates become more accurate. Importantly, the observations carry the intrinsic intuition of the assessor, and the algorithm models this intuition.
This algorithm can be used as the first line of action for initial classification, and the assessor can spend more time vetting risks that meet a certain threshold, thus optimizing resource use and gradually improving the algorithm. There are many other use cases in risk management to which this technique can be applied to further improve prediction. For instance, it can be used to predict control efficacy and vulnerability ratings, which can then be incorporated into predicting risk severity. Furthermore, all of these calculations can be automated, either simply in Excel or with a more sophisticated combination of data science tools (data frames) and packages such as TensorFlow.
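One possible way to implement the first-line-of-action idea is sketched below under a hypothetical policy: normalize the convictions so that they sum to 1, and route a risk to an assessor when the top normalized conviction falls below a confidence threshold or the predicted severity is high. The function name triage, the 0.7 threshold and the review rule are illustrative assumptions, not part of the source.

```python
def triage(scores, min_confidence=0.7, review_severities=("high",)):
    """Hypothetical triage rule: return (severity, normalized conviction, needs_review).

    scores maps each severity to its (unnormalized) conviction.
    """
    total = sum(scores.values())
    if total == 0:
        # No class is supported by the register: send straight to an assessor.
        return None, 0.0, True
    label = max(scores, key=scores.get)
    confidence = scores[label] / total  # share of the total conviction
    needs_review = confidence < min_confidence or label in review_severities
    return label, confidence, needs_review

print(triage({"low": 1.0, "medium": 8.0, "high": 1.0}))  # confident medium: no review
print(triage({"low": 3.0, "medium": 2.0, "high": 0.0}))  # low, but only 0.6 confident
print(triage({"low": 0.0, "medium": 0.0, "high": 2.0}))  # high is always reviewed
```

Each manual review then produces a vetted entry that can be added back to the register, which is how the classifier would improve gradually under this scheme.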
Endnotes
1 Kubat, M.; An Introduction to Machine Learning, 2nd Edition, Springer International Publishing, USA, 2017
2 Brownlee, J.; “A Gentle Introduction to Bayes Theorem for Machine Learning,” Machine Learning Mastery, 4 October 2019, http://machinelearningmastery.com/bayes-theorem-for-machine-learning/
3 Chun, L.; “Understand Bayes Theorem (Prior/Likelihood/Posterior/Evidence),” 11 July 2013, http://www.lichun.cc/blog/2013/07/understand-bayes-theorem-prior-likelihood-posterior-evidence/
4 Op cit Kubat
Srinidhi Mallur, CRISC, CISM
Works in the industrial control systems security department at Saudi Aramco. He can be reached at srinidhi.mallur@aramco.com.