There are many ways to characterize the problems that false positives and false negatives create in science, health, statistics, forensics, and philosophy. From a security perspective, both have practical ramifications. A security tool either generates an alert when nothing bad happened (a false positive) or misses the signs that something bad did happen (a false negative).

False positives can leave a team so overwhelmed by irrelevant alerts that it starts ignoring them, rather like the boy who cried wolf. Each one requires chasing down clues only to find that the issue is not really a problem, or teasing out signals that indicate something did occur but does not matter because protections against it already exist. False negatives are far more dangerous. They are the ones that trigger an angry email from your boss demanding to know why your customer data is being sold online. Both false positives and false negatives are an inherent part of dealing with security. Everyone wants to avoid the angry boss, so they push for stricter security rules. But stricter rules produce more false positives, which can breed complacency.

Better tools for creating the context for more sophisticated rules and automated workflows can help. We will get to that in a minute.

The error of type errors

Most deep dives into how false positives and false negatives arise invariably delve into Type I errors (false positives) and Type II errors (false negatives). For instance, is a stork flying through the neighborhood as a baby is being delivered proof that storks deliver babies, or just a coincidence?

The same logic applies to any investigation of a seemingly random pattern of behavior, such as whether a rare API access event signals a critical security incident or is just noise. Statisticians realized it is much easier to show that the opposite of your hypothesis contradicts the evidence than to prove your hypothesis is true. This is called rejecting the null hypothesis. If a baby was born on a day when no storks were around, then you know the stork hypothesis is false.

The statisticians settled on the idea of numbered types of errors because they were not poets or engineers. No one has ever clarified what exactly a "type" is, except perhaps that those pioneers struggled to find a more helpful concept. Later, others came along with more elaborate Type III and Type IV errors characterizing subtler cases. For example, a conclusion might be accurate, such as that a security breach occurred, but for the wrong reason. Or the conclusion might be correct but not crucial because other security measures are in place. Fortunately, these more complex ideas never caught on.

But sometimes you might stumble upon a friend at a cocktail party who feels compelled to talk about "type errors." Perhaps you should ask them about the implications of quantum computers capable of executing statements that are simultaneously true and false. They might have an interesting perspective on whether the improbability of Schrödinger's cat will end up trumping Gödel's incompleteness theorem. Gödel was confident that no program could ever account for statements such as "this statement is unprovable."

More useful terms

World War II radar engineers came up with a better way of framing the problem, one that is also useful for security teams: sensitivity and specificity.

Sensitivity is the true positive rate: the proportion of actual events, such as a security incident or a disease, that the test correctly detects.

Specificity is the true negative rate: the proportion of cases in which nothing occurred and the test correctly returns a negative result.
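Both rates fall directly out of a confusion matrix. Here is a minimal sketch in Python, using invented counts for a hypothetical detection rule:

```python
# Sensitivity and specificity from raw confusion-matrix counts.
# All numbers below are made up for illustration.

def sensitivity(true_pos: int, false_neg: int) -> float:
    """True positive rate: share of actual events the test catches."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """True negative rate: share of non-events the test correctly clears."""
    return true_neg / (true_neg + false_pos)

# Suppose a rule fired on 90 of 100 real incidents (10 missed),
# and stayed quiet on 9,500 of 10,000 benign events (500 false alarms).
print(sensitivity(true_pos=90, false_neg=10))     # 0.9
print(specificity(true_neg=9500, false_pos=500))  # 0.95
```

Note that neither number alone tells you how trustworthy an individual alert is; that depends on how rare the underlying event is, as discussed below.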

Every test involves trade-offs between sensitivity and specificity, and it is often helpful to combine tests to mitigate the limitations of any single one. For example, you might run a fast, highly sensitive test to flag candidates for further investigation. Those results can then be whittled down by a slower, highly specific test to identify the cases that need human attention.
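A quick simulation shows why the two-stage approach helps. The probabilities here are invented purely for illustration, not drawn from any real tool:

```python
# Two-stage screening: a cheap, highly sensitive first pass followed by
# a slower, highly specific second pass. Illustrative probabilities only.
import random

random.seed(7)  # deterministic run for reproducibility

def stage_one(is_event: bool) -> bool:
    """Sensitive but noisy: flags 99% of events, 20% false-alarm rate."""
    return random.random() < (0.99 if is_event else 0.20)

def stage_two(is_event: bool) -> bool:
    """Specific but slower: 90% sensitivity, only 1% false-alarm rate."""
    return random.random() < (0.90 if is_event else 0.01)

population = [random.random() < 0.01 for _ in range(100_000)]  # 1% base rate
flagged = [e for e in population if stage_one(e)]
escalated = [e for e in flagged if stage_two(e)]

precision_stage1 = sum(flagged) / len(flagged)
precision_stage2 = sum(escalated) / len(escalated)
print(f"precision after stage 1: {precision_stage1:.2f}")
print(f"precision after stage 2: {precision_stage2:.2f}")
```

With these numbers, only about 5% of stage-one alerts are real, but roughly 80% of the alerts that survive stage two are, so the expensive human attention is spent far more efficiently.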

One of the most common debates in the medical community surrounds how different countries analyze mammogram test data. In the US, about 15% of women falsely test positive for cancer. In the Netherlands, even though they start with a similar test, these initial positives are further vetted by a second analysis of the same data resulting in only 1% of women falsely testing positive.

The paradox of false positives

One common misconception about false positives arises when trying to work out how a low false-positive rate can still produce an overwhelmingly high percentage of bad alerts. At first glance, a 15% false-positive rate may seem to imply that only 15% of the women who receive a notification are false alarms.

But it can actually be much worse than that, depending on the rarity of the event. If only 1% of the population actually has cancer, then roughly 15 times as many people would be notified that they have the disease for each one who actually has it. And this ratio grows for even rarer phenomena.

For example, if a particular type of security event occurs 0.01% of the time and the rule has a 1% false-positive rate, then the team would see roughly 100 false alerts for each actual incident.
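That ratio follows directly from the base rate and the false-positive rate. A quick sanity check of the arithmetic, assuming for simplicity that the rule catches every real incident:

```python
# Expected false alerts per true incident, given a base rate and a
# false-positive rate. Assumes perfect sensitivity for simplicity.

def false_alerts_per_incident(base_rate: float, fp_rate: float,
                              sensitivity: float = 1.0) -> float:
    """How many false positives fire for each real incident detected."""
    true_alerts = base_rate * sensitivity
    false_alerts = (1 - base_rate) * fp_rate
    return false_alerts / true_alerts

# An event occurring 0.01% of the time, with a 1% false-positive rate:
ratio = false_alerts_per_incident(base_rate=0.0001, fp_rate=0.01)
print(round(ratio))  # 100
```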

There are many mathematical approaches for arriving at the same conclusion in more complex cases, such as fault tree analysis and Bayes' Theorem. The Math is Fun site has a simple walkthrough of how this works out in practice. But the upshot is that teams often end up with far more false-positive security alerts than the false-positive rate of a particular rule might suggest.
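Bayes' Theorem reaches the same conclusion from the other direction, by computing the probability that a given alert corresponds to a real incident. A short sketch, reusing the numbers from the example above:

```python
# P(incident | alert) via Bayes' Theorem:
#   P(incident | alert) = P(alert | incident) * P(incident) / P(alert)

def precision_from_bayes(base_rate: float, sensitivity: float,
                         fp_rate: float) -> float:
    """Probability that an alert reflects a real incident."""
    p_alert = sensitivity * base_rate + fp_rate * (1 - base_rate)
    return sensitivity * base_rate / p_alert

# 0.01% base rate, perfect detection, 1% false-positive rate:
p = precision_from_bayes(base_rate=0.0001, sensitivity=1.0, fp_rate=0.01)
print(f"{p:.4f}")  # 0.0099, i.e. roughly 1 real incident per 100 alerts
```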

Dealing with them effectively

After enough investigations of false positives go nowhere, your team may grow complacent. Eventually, it may be tempting to ignore whole classes of security events, particularly when false positives significantly outnumber true positives. At the same time, it is worth considering that particularly loud alerts, such as the flood of signals from a distributed denial-of-service attack, might be distracting your team from subtler, more sophisticated efforts running in the background to drain bank accounts.

Koen Van Impe, incident response and threat intelligence manager at NVISO security, has suggested that teams take a measured approach to limiting the frequency and impact of false positives on their incident response plan. You want to keep false positives off your threat intel reports. You also want to find ways to use this data to improve the context for eliminating future false positives.

Dealing with false negatives requires finding ways to craft better detection algorithms. It is far better to learn from other people's experiences than to discover after the fact that you have been breached. Ideally, this is all automated for a particular domain by your security tools provider. For example, the Traceable AI API security and observability platform automatically updates rules for detecting API security vulnerabilities as new attack patterns are discovered.

Automate the policy lifecycle

The more automated your process for streamlining the security policy lifecycle, the more likely it is to improve over time. If there are manual steps involved, teams will be vulnerable to alert fatigue and miss essential indicators.

This process needs to treat policy updates the way DevOps treats software updates. That means automating the processes for characterizing false positives, identifying the context needed to reduce them, and then updating your security policies.

Gartner has described these capabilities as security orchestration, automation, and response (SOAR) tools. Jon Oltsik, principal analyst at Enterprise Strategy Group, suggests that teams take this a step further with an architectural approach he calls security operations and analytics platform architecture (SOAPA).

It's also essential to integrate data from as many different sources as possible to make it easier to winnow down false positives. Oltsik suggests teams use a layered approach to create a standard data pipeline for correlating security data across all kinds of analytics. This allows teams to think about how capabilities like API security and security information and event management (SIEM) can be integrated.

Keep in mind that there are always trade-offs between tools. For example, a tool focused on preventing API business logic attacks, like Traceable API security, can analyze security data across APIs more efficiently than is possible when that data is handed off among third-party tools. It can also take advantage of automated updates derived from the latest API-specific security research to ease the pain of rule updates.

Another advantage of Traceable is that it reduces the need for manual rule updates. As Traceable's artificial intelligence (AI) catches anomalies, it checks them against common rule sets. This enables the system to protect against not only known attacks but also unknown ones, without anyone needing to update rules. The AI updates itself.

With traditional web application firewalls, once the team discovers a new way to differentiate false positives from true positives, someone has to go in and manually update a rule. In other cases, someone may have to manually catalog the characteristics of false positives and submit them back to the vendor. With Traceable's design, the team just notes which alerts were false positives, and the system automatically folds that feedback into its AI model updates. This both reduces the burden of reporting issues and shrinks the number of false positives teams need to investigate in the future.

Storks, Schrödinger, and API security

For all their power, modern cybersecurity methods are still full of uncertainty and paradox. Just as Schrödinger's cat could in theory be dead or alive until someone observes it, a network security alert could signal imminent danger, or be nothing more than an annoying false positive. It may even be a distraction from the main attack going on in the background.

No security tool will ever catch every incident, and even if one did, hackers would find a way around it. However, teams can improve their process for refining their security policies. This means simplifying the workflow for investigating alerts and flagging when they are false, no longer relevant, or connected to other alerts as part of a single problem. The kinds of AI workflows built into Traceable's observability tools can then use this data to train better rules so people don't have to.

About the Author

George Lawton is a technology writer and regular contributor to The Inside Trace.

View a recorded demo of Traceable Defense AI.