Last month, in preparation for a panel, I was asked to put together a list of pros and cons of using Big Data techniques in information security technologies. Big Data has benefits that span many security disciplines, but here I want to look at it from the perspective of what we call Advanced Malware Protection, which is the ability to discover, analyze and block advanced malware.
With that context in mind, let’s look at the "pros" of Big Data.
Pro #1: Security is often about detecting anomalies, and to do so, you need a full-spectrum view, which you typically get only if you have enough data to know what constitutes "normal" versus "abnormal".
Nowadays, you rarely see the same piece of malware on a large number of machines. Furthermore, malware is highly ephemeral. In fact, about three quarters of the malware we see has a lifetime of zero -- the first time we see it in the wild is the same as the last time we see it. Measured by prevalence and lifetime, then, malware behaves aberrantly. With enough data to establish what normal prevalence and lifetime look like, a Big Data approach helps to pinpoint malware that other solutions miss.
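To make the two metrics above concrete, here is a minimal sketch of how lifetime and prevalence could be computed from raw sightings. The record layout, the `SIGHTINGS` sample, and the function names are all hypothetical and invented for illustration; this is not a description of any vendor's actual pipeline, and the toy numbers say nothing about real malware.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sighting records: (file_hash, machine_id, day_seen).
# Toy data only; the values are made up for illustration.
SIGHTINGS = [
    ("aaa", "m1", date(2013, 5, 1)),
    ("aaa", "m2", date(2013, 5, 3)),
    ("bbb", "m3", date(2013, 5, 2)),
    ("ccc", "m4", date(2013, 5, 4)),
    ("ddd", "m1", date(2013, 5, 1)),
    ("ddd", "m1", date(2013, 5, 2)),
]


def summarize(sightings):
    """Return {file_hash: (lifetime_days, machine_count)}.

    Lifetime is the gap between the first and last sighting, so a file
    seen on only one day has a lifetime of zero.
    """
    days = defaultdict(list)
    machines = defaultdict(set)
    for file_hash, machine_id, day in sightings:
        days[file_hash].append(day)
        machines[file_hash].add(machine_id)
    return {
        h: ((max(d) - min(d)).days, len(machines[h]))
        for h, d in days.items()
    }


def zero_lifetime_fraction(summary):
    """Fraction of distinct files whose first and last sighting coincide."""
    if not summary:
        return 0.0
    zero = sum(1 for lifetime, _ in summary.values() if lifetime == 0)
    return zero / len(summary)


summary = summarize(SIGHTINGS)
for file_hash, (lifetime, prevalence) in sorted(summary.items()):
    print(file_hash, "lifetime_days =", lifetime, "machines =", prevalence)
print("zero-lifetime fraction:", zero_lifetime_fraction(summary))
```

The point of the sketch is that both numbers only become meaningful at scale: with a handful of sightings, every file looks rare and short-lived, so the baseline for "normal" has to come from a very large population.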
Pro #2: The goal with many information security solutions is to translate "back-office intelligence" into "customer-facing protection". In recent years, the amount of back-office intelligence security firms are dealing with has grown tremendously (e.g., growth of malware samples, large volumes of sensor data, etc.). Big Data techniques lend themselves nicely to this domain.
When people talk about using Big Data in the context of information security, the emphasis is typically on correlating a plethora of information sources to provide more intelligent decision-making capabilities. However, speeding up existing operations is just as critical, and it is a place where Big Data techniques can also help. Given the highly ephemeral nature of modern malware, eliminating data flow bottlenecks is a necessity rather than a nicety.
Pro #3: To make the most accurate (security) decisions, we need to take advantage of all the intelligence available to us -- from sensors, logs, user activity, etc. Big Data techniques can be used to extract the most value from this wealth of information.
The right kind of solution doesn’t just concern itself with processing samples, as a typical security vendor does. Rather, it looks at numerous data sources and provides useful intelligence back to customers. It should look at which systems a given file has touched, what we know about those systems, when those systems encountered the file, how the file got onto those systems and so on.
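The questions in the previous paragraph (which systems, when, and how) can be sketched as a simple per-file timeline. Everything here is an assumption made for illustration: the `FileEvent` fields, the `arrived_via` labels, and the sample events are hypothetical, and a real system would query a data store rather than filter an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileEvent:
    """One hypothetical observation of a file on a system."""
    file_hash: str
    system: str
    seen_at: datetime
    arrived_via: str  # illustrative labels such as "email", "share", "usb"


# Toy events, deliberately out of chronological order.
EVENTS = [
    FileEvent("abc", "fileserver-1", datetime(2013, 6, 2, 9, 30), "share"),
    FileEvent("abc", "laptop-7", datetime(2013, 6, 1, 14, 5), "email"),
    FileEvent("xyz", "desktop-3", datetime(2013, 6, 2, 8, 0), "usb"),
    FileEvent("abc", "desktop-3", datetime(2013, 6, 3, 11, 45), "share"),
]


def trajectory(events, file_hash):
    """Return every event for one file, oldest first."""
    matching = (e for e in events if e.file_hash == file_hash)
    return sorted(matching, key=lambda e: e.seen_at)


def first_seen(events, file_hash):
    """Return the earliest event for a file, or None if it was never seen."""
    path = trajectory(events, file_hash)
    return path[0] if path else None


for event in trajectory(EVENTS, "abc"):
    print(event.seen_at.isoformat(), event.system, "via", event.arrived_via)
```

Ordering the events by time is what turns a pile of sensor records into an answer about where a file entered and how it spread; the earliest event is a natural starting point for an investigation.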
Pro #4: Big Data techniques are also useful for broader visualization of security-related metrics. Having a big-picture understanding can help identify the root causes of problems. In contrast, many "traditional" approaches only address symptoms.
Data visualization is an important part of data analytics. With visuals, it’s easy to spot larger trends. For example, in the case of an enterprise, just knowing that 1,000 systems have been infected is useful. But this type of knowledge alone may not be sufficient for identifying what actions you need to take. With a visual, you may be able to spot underlying patterns. For example, suppose the majority of these infections come from a single remote office. Or perhaps a handful of users are responsible for the lion’s share. These patterns can indicate root causes that, if remedied, can mitigate the risks of future outbreaks.
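As a rough illustration of the remote-office and handful-of-users examples, the sketch below groups infection records by office and by user and prints a crude text bar chart. The `INFECTIONS` records, the field positions, and the helper names are hypothetical; a real deployment would use a proper charting tool, and the counts are toy values.

```python
from collections import Counter

# Hypothetical infection records: (host, office, user). Toy data only.
INFECTIONS = [
    ("host-01", "branch-east", "alice"),
    ("host-02", "branch-east", "alice"),
    ("host-03", "branch-east", "bob"),
    ("host-04", "branch-east", "alice"),
    ("host-05", "branch-east", "carol"),
    ("host-06", "hq", "dave"),
    ("host-07", "hq", "alice"),
    ("host-08", "branch-west", "erin"),
]

OFFICE, USER = 1, 2  # positions of the grouping fields in each record


def breakdown(records, field_index):
    """Return (value, count, share) rows, largest count first."""
    counts = Counter(record[field_index] for record in records)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


def text_bars(rows, width=16):
    """Render each row as a bar whose length is proportional to its share."""
    lines = []
    for value, n, share in rows:
        bar = "#" * round(share * width)
        lines.append(f"{value:<12} {bar} {n}")
    return "\n".join(lines)


print(text_bars(breakdown(INFECTIONS, OFFICE)))
print()
print(text_bars(breakdown(INFECTIONS, USER)))
```

Even in this toy form, the grouped view shifts the question from "how many systems are infected?" to "where are they concentrated?", which is the step that points toward a root cause.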
Pro #5: Big Data techniques can lead to entirely new sets of security capabilities.
Ultimately, Big Data techniques are all about trying to extract the most useful knowledge from a large data set. The premise is that correlating data coming from different sources yields a more accurate view of the underlying threat landscape. With such a view in place, we may be able to act on these new insights automatically. The result is a novel set of security capabilities that can significantly reduce the risks enterprises face.
With this marriage of Big Data and security, I believe we are only scratching the surface in terms of the benefits.
Photo Credit: almagami/Shutterstock
Zulfikar Ramzan is a chief scientist for Sourcefire, a world leader in intelligent cybersecurity solutions, focused on transforming the way global mid- to large-size organizations and government agencies manage and minimize network security risks.