February 6, 2019
Did you know that automated data discovery is your secret weapon for identifying sensitive information quickly and accurately?
What Is Automated Data Discovery?
Automated data discovery uses profiling techniques, such as pattern matching on column names and on samples of the data itself, to locate and identify sensitive data across environments, from databases and cloud storage to applications.
By automating this process, businesses can eliminate inefficiencies, reduce risks, and enhance data management operations without compromising speed or accuracy.
Why Is Automating Data Discovery Important?
For enterprises, the first step in protecting sensitive data is understanding where it resides. Given the rapid growth in data volume, variety, and complexity, manual methods are no longer adequate. Automated data discovery enables a faster, more reliable way to safeguard sensitive information, paving the way for secure operations and compliance.
Key Benefits of Automating Data Discovery
- Time Efficiency: Quickly scans large environments for sensitive or regulated data.
- Improved Accuracy: Reduces manual errors by identifying sensitive data consistently.
- Cost Savings: Frees up valuable resources, allowing teams to focus on higher-value initiatives.
- Compliance Simplification: Supports compliance with data privacy regulations such as GDPR, HIPAA, and PCI DSS.
How Perforce Delphix Automated Data Discovery Works
When it comes to identifying and managing sensitive data, manual processes are slow, inconsistent, and require significant involvement from multiple teams. Perforce Delphix transforms this approach by offering an automated, out-of-the-box solution for profiling data across complex environments.
Profile sets are made up of profile expressions (regular expressions) that scan both the column names (metadata) and a sample of the actual data for patterns such as credit card numbers, Social Security numbers, and telephone numbers, among many others. These expressions have been tested and validated across many engagements with Fortune 500 companies that have large, complex database portfolios.
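To make the idea concrete, here is a minimal Python sketch of how a profile set might scan both a column name and a sample of its values. The expressions, the `ProfileExpression` and `profile_column` names, and the 50% match threshold are all illustrative assumptions, not Delphix's shipped profile expressions or its actual matching logic.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class ProfileExpression:
    name: str
    column_pattern: re.Pattern  # matched against the column name (metadata)
    data_pattern: re.Pattern    # matched against sampled data values

# Hypothetical profile set; real Delphix expressions are more extensive.
PROFILE_SET = [
    ProfileExpression("SSN",
                      re.compile(r"(^|_)ssn(_|$)|social_?sec", re.I),
                      re.compile(r"\d{3}-\d{2}-\d{4}")),
    ProfileExpression("CREDIT_CARD",
                      re.compile(r"card_?(no|num)|cc_?num", re.I),
                      re.compile(r"\d{4}([ -]?\d{4}){3}")),
    ProfileExpression("PHONE",
                      re.compile(r"phone|(^|_)tel(_|$)", re.I),
                      re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]\d{4}")),
]

def profile_column(column_name, sample_values, threshold=0.5):
    """Return the names of the expressions that flag this column.

    A column is flagged when its name matches, or when at least
    `threshold` of its non-empty sampled values fully match the data pattern.
    """
    values = [v.strip() for v in sample_values if v and v.strip()]
    matches = []
    for expr in PROFILE_SET:
        if expr.column_pattern.search(column_name):
            matches.append(expr.name)
            continue
        if values:
            hits = sum(1 for v in values if expr.data_pattern.fullmatch(v))
            if hits / len(values) >= threshold:
                matches.append(expr.name)
    return matches

print(profile_column("CUST_SSN", []))
print(profile_column("NOTES", ["555-123-4567", "(555) 987-6543", "n/a"]))
```

Checking metadata first and data second mirrors the two-pronged scan described above: a well-named column is caught cheaply, while a generically named column (such as NOTES) can still be flagged by its contents.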
A key feature of Delphix's data discovery is that each profile expression in a profile set maps directly to a masking algorithm. As a result, the discovery results can be used immediately to mask the identified data.
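The mapping can be pictured as a lookup from each profile expression to a masking domain and algorithm, so that a discovery hit immediately yields an inventory entry that is ready for masking. This is a hypothetical sketch: the domain and algorithm names in `EXPRESSION_TO_MASKING` are placeholders, not Delphix's shipped mappings.

```python
from dataclasses import dataclass

# Hypothetical expression -> (domainName, algorithmName) mapping.
# In Delphix, the real mapping comes from the profile set configuration.
EXPRESSION_TO_MASKING = {
    "SSN": ("SSN", "ssn_lookup"),
    "CREDIT_CARD": ("CREDIT_CARD", "credit_card_lookup"),
    "PHONE": ("TELEPHONE_NO", "phone_lookup"),
}

@dataclass
class InventoryEntry:
    table: str
    column: str
    domain_name: str
    algorithm_name: str
    is_masked: bool = True  # flagged columns are marked for masking

def to_inventory(table, column, expression_hits):
    """Turn the first mapped expression hit into an inventory entry."""
    for hit in expression_hits:
        if hit in EXPRESSION_TO_MASKING:
            domain, algorithm = EXPRESSION_TO_MASKING[hit]
            return InventoryEntry(table, column, domain, algorithm)
    return None  # nothing sensitive found; column stays unmasked

print(to_inventory("CUSTOMER", "CUST_PHONE", ["PHONE"]))
```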
Our out-of-the-box profile sets align with specific applications, including SAP and PeopleSoft, as well as specific regulations, such as HIPAA, PCI, and more. Our team designed them to locate sensitive data within complex tables and flag the specific fields that contain it. In short, this saves time and effort, so you can speed up implementation and feel confident about complying with regulations. Customers can modify profile sets or create their own as required, and then use them to discover sensitive data.
Profile sets are also critical for identifying the data patterns that determine security exposure and the masking algorithms required. This is an area that calls for an investment of time and resources to ensure that the data structures representing the business's sensitive data are properly defined in the chosen profile set.
Profiling One Environment
Let’s jump right into profiling one database through the GUI using the steps below.
A rule set is a group of flat files (or tables, for databases) within a particular data source (which you connect to by creating a connector) that a user may choose to run profile, masking, or tokenization jobs on. After you create the required environment objects, you create a profile job that ties the rule set and the selected profile set together to perform the profiling.
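The relationships between these objects can be summarized with a small data model. This is a hypothetical Python sketch of how a connector, rule set, profile set, and profile job relate to one another; the class and field names are assumptions for illustration and do not mirror the masking engine's internal schema.

```python
from dataclasses import dataclass, field

@dataclass
class Connector:            # how the engine reaches a data source
    name: str
    host: str
    port: int
    database: str

@dataclass
class RuleSet:              # the tables (or flat files) to process
    name: str
    connector: Connector
    tables: list = field(default_factory=list)

@dataclass
class ProfileSet:           # the profile expressions to apply
    name: str
    expressions: list

@dataclass
class ProfileJob:           # ties one rule set to one profile set
    name: str
    rule_set: RuleSet
    profile_set: ProfileSet

    def describe(self):
        return (f"{self.name}: profile {len(self.rule_set.tables)} table(s) "
                f"in {self.rule_set.connector.database} "
                f"using profile set '{self.profile_set.name}'")

conn = Connector("hr_db", "db.example.com", 1521, "HR")
job = ProfileJob("profile_hr",
                 RuleSet("hr_rules", conn, ["EMPLOYEES", "PAYROLL"]),
                 ProfileSet("HIPAA", ["SSN", "PHONE"]))
print(job.describe())
```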
Once the profile job completes, the results for the rule set are shown on the inventory page.
The discovered sensitive data and respective domain/algorithm can be exported to a spreadsheet as required for documentation or further analysis.
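For scripted workflows, the same export can be approximated with Python's standard `csv` module. This sketch assumes each discovered column is represented as a dict; the `table` and `column` keys are hypothetical, while `domainName` and `algorithmName` follow the field names used in the results described later in this post.

```python
import csv
import io

FIELDNAMES = ["table", "column", "domainName", "algorithmName"]

def export_inventory(rows, stream):
    """Write discovered sensitive columns to `stream` as CSV."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        # Keep only the expected keys so extra fields don't raise errors.
        writer.writerow({k: row[k] for k in FIELDNAMES})

buf = io.StringIO()
export_inventory([{"table": "CUSTOMER", "column": "CUST_SSN",
                   "domainName": "SSN", "algorithmName": "ssn_lookup"}], buf)
print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` so the `csv` module controls line endings.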
The Bigger Challenge: "That's great for one environment, but we have thousands!"
While profiling one data source or environment is straightforward, what if you have hundreds, if not thousands, of environments? That's where the Delphix masking REST APIs come in: they can be used to fully automate profiling across thousands of environments.
Delphix provides an API Utility UI, an interactive way to learn each API along with its URL and its request and response JSON bodies. Working with the APIs means writing code: the same logic executed through the web application user interface can now be programmed.
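As a rough outline, the GUI steps translate into an ordered sequence of REST calls. The sketch below only builds that plan (method, path, JSON body) and sends nothing. The base path, endpoint paths, and field names are assumptions modeled on the Masking API's resource style; verify each one in the API Utility UI for your engine version before relying on it.

```python
import json

BASE = "/masking/api"  # assumed base path; confirm in the API Utility UI

def build_profile_plan(env_name, connector, ruleset_name, profile_set_id):
    """Return the ordered (method, path, body) calls for one profiling run.

    IDs returned by earlier calls appear as placeholders such as
    "<connectorId>"; a real client would fill them in from each response.
    All paths and field names here are assumptions, not a verified API.
    """
    return [
        ("POST", f"{BASE}/login", {"username": "<user>", "password": "<password>"}),
        ("POST", f"{BASE}/environments",
         {"environmentName": env_name, "applicationId": "<applicationId>",
          "purpose": "MASK"}),
        ("POST", f"{BASE}/database-connectors",
         dict(connector, environmentId="<environmentId>")),
        ("POST", f"{BASE}/database-rulesets",
         {"rulesetName": ruleset_name, "databaseConnectorId": "<connectorId>"}),
        ("POST", f"{BASE}/profile-jobs",
         {"jobName": f"profile_{ruleset_name}", "profileSetId": profile_set_id,
          "rulesetId": "<rulesetId>"}),
        ("POST", f"{BASE}/executions", {"jobId": "<jobId>"}),
        ("GET", f"{BASE}/executions/<executionId>", None),  # poll until done
    ]

plan = build_profile_plan("dev", {"connectorName": "hr"}, "hr_rules", 3)
for method, path, body in plan:
    print(method, path, json.dumps(body) if body else "")
```

Separating the plan from the HTTP transport makes it easy to review the sequence, and to swap in whichever HTTP client your environment already uses.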
To automate profiling, Delphix provides several open source repositories for working with the Masking APIs, including dmx-toolkit and dxapikit. They include basic shell script examples to help customers learn, understand, and get up to speed quickly with the Delphix APIs. A set of example profiling scripts is available in the "dxapikit" repository, which you can also download.
Let’s review the logical flow of the profile.sh script in dxapikit and the calls to the other scripts.
The script takes the connection information for the databases to be profiled, performs all the manual steps shown earlier through the Masking APIs, and exports the results to static HTML files. The code supports different options for supplying the connection information and the profile set to use. Here's a sample connection information (CSV) file.
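Since the sample file itself isn't reproduced here, the following sketch assumes a simple, hypothetical layout with one connection per line (name, database type, host, port, database, user, password) and shows how such a file might be read and validated before profiling. The column order and names are illustrative; the actual dxapikit format may differ.

```python
import csv
import io

# Hypothetical column layout; check the dxapikit sample for the real one.
FIELDS = ["name", "db_type", "host", "port", "database", "user", "password"]

def read_connections(stream):
    """Parse connection rows, skipping blank lines and '#' comment lines."""
    lines = (ln for ln in stream if ln.strip() and not ln.lstrip().startswith("#"))
    connections = []
    for row in csv.reader(lines):
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}: {row}")
        conn = dict(zip(FIELDS, (v.strip() for v in row)))
        conn["port"] = int(conn["port"])  # fail early on a bad port value
        connections.append(conn)
    return connections

sample = io.StringIO(
    "# name,db_type,host,port,database,user,password\n"
    "hr,oracle,db1.example.com,1521,HR,scott,tiger\n"
    "sales,postgres,db2.example.com,5432,sales,app,secret\n"
)
conns = read_connections(sample)
print([c["name"] for c in conns])
```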
The file also includes a parallel option. To improve performance, this option splits the connections into groups of equal size, one group per parallel job. The script then launches a batch.sh script for each group and waits for all of them to complete before writing the final HTML report page (report.html).
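The parallel option can be modeled as splitting the connection list into N roughly equal batches, running the batches concurrently, and writing the report only after every batch finishes. This Python sketch uses a `concurrent.futures` thread pool in place of background batch.sh processes; `profile_batch` is a hypothetical stand-in for the per-batch API work.

```python
from concurrent.futures import ThreadPoolExecutor

def split_evenly(items, n):
    """Split items into n batches whose sizes differ by at most one."""
    n = max(1, min(n, len(items))) if items else 1
    size, extra = divmod(len(items), n)
    batches, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches

def profile_batch(batch):
    # Hypothetical placeholder: a real batch would call the Masking APIs
    # for each connection and return that connection's results.
    return [f"profiled {conn}" for conn in batch]

def run_parallel(connections, parallel):
    batches = split_evenly(connections, parallel)
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        # list() blocks until every batch is done; map() keeps batch order.
        results = list(pool.map(profile_batch, batches))
    return [r for batch_result in results for r in batch_result]

print(run_parallel(["hr", "sales", "finance"], 2))
```

Threads are a reasonable fit here because profiling work is dominated by waiting on the masking engine rather than by local computation.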
Sample Results
Here's the summary page, with links to each source's profiling results.
This individual results page provides a detailed view of the results, along with the ability to download them as a CSV file.
Each page shows which database columns were identified as containing sensitive data, the isMasked value, and the respective profile set mapping to the Delphix Masking domainName and algorithmName values. To demonstrate potential future functionality, an alternate report is included only for the first two results.
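A summary page like this can be generated with a few lines of code. This sketch builds a report.html-style index that links to one page per source; the per-source file naming (`<source>.html`), the column headings, and the `build_summary` function are assumptions, not the dxapikit output format.

```python
import html
from urllib.parse import quote

def build_summary(sources):
    """Return an HTML summary page.

    sources: list of (source_name, sensitive_column_count) tuples.
    Each row links to a hypothetical per-source page named <source>.html.
    """
    rows = []
    for name, count in sources:
        href = quote(f"{name}.html")  # URL-encode spaces and special characters
        rows.append(
            f'<tr><td><a href="{href}">{html.escape(name)}</a></td>'
            f"<td>{count}</td></tr>"
        )
    return (
        "<html><body><h1>Profiling summary</h1>\n"
        "<table><tr><th>Source</th><th>Sensitive columns</th></tr>\n"
        + "\n".join(rows)
        + "\n</table></body></html>"
    )

page = build_summary([("hr_db", 4), ("sales db", 2)])
print(page)
```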
Get Started with Automated Data Discovery and Masking
Delphix delivers automated data discovery and masking capabilities to help organizations securely manage sensitive data with speed and precision. With Delphix, you can identify sensitive data values — such as names, email addresses, and payment information — across extensive environments automatically. Then, leverage powerful masking algorithms to transform those values into realistic yet fictitious ones, all while maintaining referential integrity in your datasets.
Automate Discovery & Masking
Centrally define discovery and masking policies to achieve compliance with key privacy regulations like GDPR, CCPA, HIPAA, and PCI DSS. By automating both the discovery and masking of sensitive data, Delphix helps reduce the risk of a breach, especially in non-production environments that host vast amounts of vulnerable data.
Streamline Data Delivery
Delphix integrates cutting-edge automated data discovery with data delivery. The Delphix DevOps Data Platform ensures sensitive data is identified, masked, and virtually delivered to downstream environments for development, testing, analytics, and AI processes. Get the speed, efficiency, and compliance businesses need without compromising security.
Get Your Demo
Discover how automated data discovery can revolutionize your approach to compliance and security. Request a no-pressure demo of Delphix today and experience how industry leaders leverage automated discovery to mitigate risks, accelerate innovation, and maintain regulatory compliance.