Author: Nilesh Dherange, Senior Expert Consultant & Reviewer, CTO, Gurucul
In the 1990s, the data-driven approach to machine learning in business applications delivered widespread value in verticals such as finance, manufacturing, marketing and more. The benefits were:
- Accelerated model development, model testing and delivery of actionable insights
- Delivery of an optimal balance between predictive accuracy, performance and cost
- Utilization of streaming data to deliver real-time analytics
- Reduction of risk with enterprise-grade machine learning
- Better insight into model performance and outcomes
Machine learning (ML), however, was a comparatively late arrival to predictive security analytics due to one critical factor: accuracy. The introduction to the machine learning chapter of “Borderless Behavior Analytics” by CISO at Large Leslie K. Lambert makes the following observation: “Predictive security analytics had a higher bar for proving value.”
Failures in security analytics could hit an organization’s bottom line through data exfiltration, espionage or regulatory fines. These are serious consequences if the security solutions don’t work as promised by the vendor. As a result, security teams needed to see unassailable proof points for ML, and they were hesitant to adopt it unless relevant, solid results could be shown. As late as 2010, some analysts thought the challenges were so great that they didn’t believe ML in predictive security analytics was a viable solution framework for the foreseeable future.
Yet 2014 represented a breakout year for decisive strides in advanced development and crucial achievements for the next generation of user and entity behavior analytics (UEBA) and identity analytics (IdA), which are the foundation of predictive security analytics. The areas analysts described in 2010 as representing unresolved gaps were where the leading solutions vendors made impressive improvements. They included:
- Outlier detection – Reliable models of normality have become well established through self-learning and self-training ML models of users and entities with much faster ingestion of existing user histories. Linking their behavior with static and dynamic peer groups further refined the modeling, eliminating the issue of bad behavior in baselines. ML in identity analytics has also drastically reduced the number of unaccounted-for access risks.
- High cost of errors – The comprehensive ability to classify users, entities and their behavior across the broad spectrum of hybrid environments, applications and devices has decisively addressed misclassification issues. False positives and negatives were demonstrably reduced through critical context from big data and an increased number of models for specific use cases.
- Semantic gap – Numerous innovations have closed the gap between UI (user interface) and API (application programming interface). UIs were enhanced with color-coded graphics. Business- and user-friendly descriptions of anomalies facilitated speed and ease of use. Accurate and normalized risk scores with thresholds for specific alerts were provided in numerous UEBA solution categories. The solutions now provide precise actionable intelligence and clearly define the difference between abnormal activity and attacks in a range of relevant categories. Analytic response codes (ARC) with risk scores instantly categorize the type of anomaly and its severity. This communicates directly to the solution in the API relationship, and facilitates accuracy and optimal automated risk response times.
- Diversity of network traffic – Mature solutions have migrated to big data, supporting both structured and unstructured data. They include predefined data connectors for ease of data ingestion. They enable customization and addition of new attributes with flexible metadata. In essence, a mature UEBA solution can ingest any data for any desired attributes.
- Difficulty with evaluation – Advances in the normalization and correlation of data have enabled shorter proof of concept cycles. Using historical data to speed-train models has delivered faster time to value for the UEBA and IdA use cases. Experience with more mature UEBA deployments in hybrid environments has shown that on-premises data is often more varied, as well as ‘dirty’, while cloud data is less varied and more consistent.
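To make the outlier detection and risk scoring ideas above more concrete, here is a minimal sketch in Python of scoring each user against a peer-group baseline and alerting on a threshold. It is an illustration under stated assumptions, not how any particular UEBA product works: the function name, the sample users and their daily file-download counts, the 3.5 cap and the 70-point alert threshold are all hypothetical. The baseline uses the median and the median absolute deviation (MAD) so that one badly behaving user does not distort the group's notion of normal.

```python
from statistics import median

CAP = 3.5  # hypothetical cap on the modified z-score; maps to a risk score of 100


def peer_group_risk_scores(activity, peer_groups, cap=CAP):
    """Return a 0-100 risk score per user, relative to that user's peer group."""
    scores = {}
    for members in peer_groups.values():
        values = [activity[user] for user in members]
        center = median(values)
        # Median absolute deviation: a robust measure of spread within the group.
        mad = median([abs(v - center) for v in values])
        for user in members:
            deviation = abs(activity[user] - center)
            if mad > 0:
                z = 0.6745 * deviation / mad  # modified z-score
            else:
                # Everyone behaves identically: any deviation at all is maximal.
                z = 0.0 if deviation == 0 else cap
            # Normalize to 0-100 so one threshold applies across all groups.
            scores[user] = round(min(z, cap) / cap * 100, 1)
    return scores


# Hypothetical data: files downloaded per day, and static peer groups.
activity = {
    "alice": 20, "bob": 22, "carol": 19, "dave": 21, "eve": 95,
    "frank": 5, "grace": 6, "heidi": 5, "ivan": 7,
}
peer_groups = {
    "engineering": ["alice", "bob", "carol", "dave", "eve"],
    "finance": ["frank", "grace", "heidi", "ivan"],
}

scores = peer_group_risk_scores(activity, peer_groups)
ALERT_THRESHOLD = 70.0  # hypothetical alert threshold
flagged = sorted(user for user, score in scores.items() if score >= ALERT_THRESHOLD)
print(flagged)
```

Comparing each user only with their own peer group is what keeps a busy engineering team from making a quiet finance user look normal, or the reverse. A production system would learn baselines from much longer histories and many more attributes than a single daily count.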
The core value of machine learning within predictive security analytics is its ability to extract critical context from big data, to provide comprehensive visibility of an enterprise’s hybrid environment, and to deliver risk-based actionable results in the fastest time possible. Machine learning is still comparatively early in the adoption phase for predictive security analytics solutions. Yet those wondering about its long-term value potential should consider the observation by Tim Armstrong, CEO of AOL, about data and security: “The world has changed. Security and data will be something that goes on for hundreds of years in the future. We are at the beginning of that stage.” When one considers that over 90% of the world’s data was created in the last few years, with data doubling during that time period, it’s clear that the age of human resources manually managing this growing scale of big data has long passed. Machine learning in predictive security analytics will continue to deliver increasing value for the foreseeable future.