June 17, 2015

Finding the ‘Needle in a Haystack’ with Helix Threat Detection

IP Protection

Software development projects at large companies typically involve big teams collaborating across multiple locations. A large corporation may employ tens of thousands of developers working on thousands of projects over many years.

For many companies, developer access to older software projects and files continues long after a project has been completed, often because of lax processes and stagnant access control policies. Yet these projects can represent valuable IP worth tens of millions of dollars. Given the ramifications of a competitor getting hold of these files, what can companies do to better protect their crown jewels from theft?

The answer might be found in the source code management (SCM) or version control tools companies use to drive their development workflows. SCM tools typically track access to key projects and files via audit logs. However, the sheer volume of these logs can overwhelm security teams. A month of log data might yield millions of different interactions with files and projects, making it virtually impossible to find important clues.

Analyzed the right way, however, these logs can bring the real threats to the surface. A recent Fortune article titled “Using Log Data and Machine Learning to Weed out the Bad Guys” shares how a large company applied our Helix Threat Detection capabilities to quickly identify data theft. Likening this approach to ‘finding a needle in a haystack,’ the article describes how effective it can be to apply behavioral analytics to the audit logs in our Helix Versioning Engine.

Leveraging Machine Learning to Establish a Baseline

Conventional security tools (e.g., SIEMs) are often rule-based and require time-consuming manual setting of thresholds and iterative tuning of multiple parameters to identify anomalous behavior. Yet manually setting alerts to trigger when developers access more than some arbitrary number of files can be problematic for large projects and can inundate security teams with false positives.

A better approach is to apply machine-learning algorithms and risk-based behavior analytics models to the audit logs to first establish a baseline understanding of normal behavior. It’s possible to create cluster models that group similar users based on their past activities. Continuous self-learning then identifies high-risk events more accurately, such as someone accessing a project he or she doesn’t normally work on, putting a spotlight on threats to an organization’s most sensitive assets.
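
To make the idea of a baseline concrete, here is a minimal sketch in Python. It is not how Helix Threat Detection works internally; it swaps the machine-learning cluster models for a plain per-user lookup, and the (user, project) record format, the function names, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict

def build_baseline(history):
    """Map each user to the set of projects seen in their past activity.

    `history` is a hypothetical, simplified audit log: (user, project) pairs.
    """
    baseline = defaultdict(set)
    for user, project in history:
        baseline[user].add(project)
    return baseline

def unusual_accesses(baseline, events):
    """Return the events in which a user touches a project outside their baseline."""
    return [(user, project) for user, project in events
            if project not in baseline.get(user, set())]

# Illustrative data only.
history = [
    ("alice", "billing"),
    ("alice", "billing"),
    ("bob", "search"),
    ("bob", "billing"),
]
baseline = build_baseline(history)

new_events = [("alice", "billing"), ("alice", "search"), ("bob", "search")]
flagged = unusual_accesses(baseline, new_events)  # alice has never touched "search"
```

A real system would also have to age out old activity and group users with similar histories, which is where clustering and continuous learning come in.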

Identifying High-Risk Behaviors

Once you've established what normal behavior looks like, the next step is to apply advanced mathematical models that generate a behavioral risk score. This score reflects multiple factors, including the importance of an asset or file, the method of access, the activity (e.g., volume or type), and the user. These behavioral analytics models can then be used to find anomalies by:

  • Comparing access patterns, data usage patterns and data movement patterns against historic behavior
  • Determining similar user patterns across the environment and comparing behavioral patterns between users and groups of users
  • Detecting dissimilar patterns among members of the same project group or job role
  • Comparing individuals against the entire user group
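
As a rough illustration of how such comparisons might feed a single score, the sketch below combines two of the factors named above: how far today's access volume sits from the user's own history and from a peer group, scaled by asset importance. The z-score formula, the equal weighting, and every name in it are assumptions chosen for illustration, not the models used in Helix Threat Detection.

```python
from statistics import mean, pstdev

def volume_zscore(reference, today):
    """How many standard deviations `today` sits from the mean of `reference`."""
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        # No variation in the reference data: any change at all is maximally unusual.
        return 0.0 if today == mu else float("inf")
    return (today - mu) / sigma

def risk_score(own_history, peer_counts, today, asset_weight):
    """Toy score: average of the above-normal deviations, scaled by asset importance.

    own_history  -- the user's past daily file-access counts (hypothetical data)
    peer_counts  -- typical daily counts for users in the same group
    asset_weight -- assumed importance of the asset, e.g. 1.0 (routine) to 5.0 (critical)
    """
    own = max(volume_zscore(own_history, today), 0.0)    # only above-normal volume adds risk
    peer = max(volume_zscore(peer_counts, today), 0.0)
    return asset_weight * (own + peer) / 2

# Illustrative data only.
own_history = [10, 12, 8, 10]
peer_counts = [9, 11, 10, 10]

normal_day = risk_score(own_history, peer_counts, today=10, asset_weight=1.0)
busy_day = risk_score(own_history, peer_counts, today=20, asset_weight=1.0)
bulk_download = risk_score(own_history, peer_counts, today=200, asset_weight=1.0)
```

With this data, a day at the usual volume scores zero, while progressively larger volumes score progressively higher. A fixed "more than N files" rule cannot adapt to each user and group in this way.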

To learn more about the behavioral analytics models used in Helix Threat Detection, download the white paper Helix Threat Detection: IP Security and Risk Analytics.

To learn more, download our white paper:
A Unified Approach to Securing and Protecting IP.