Video
Overview: Perforce Delphix for Snowflake
Delivering compliant data for analytics and AI is mission-critical, but manual efforts often introduce risk and delay. Perforce Delphix empowers organizations to automate sensitive data discovery and masking in Snowflake, ensuring you meet stringent compliance requirements without sacrificing development velocity or data quality.
With Delphix masking for Snowflake, enterprises can:
- Automate Discovery of Sensitive Data: Effortlessly scan Snowflake databases to identify PII and confidential information — such as names, birth dates, and email addresses — across diverse sources and schemas. This allows you to gain full visibility of sensitive data and control risks in non-production environments.
- Enforce Enterprise-Grade Compliance: Irreversibly replace sensitive data in Snowflake with realistic, usable values while preserving referential integrity across complex systems like Salesforce and SAP. This ensures you remain in full compliance with GDPR, CCPA, HIPAA, and other global regulations.
- Accelerate AI & Analytics Workflows: Deliver secure, compliant, and high-fidelity data to data scientists and developers with speed. By eliminating data bottlenecks, you can accelerate AI model training and analytics projects, enabling your teams to innovate faster without compromising data privacy.
- Protect Data at Scale: Whether you're dealing with massive data volumes or complex hybrid architectures, Delphix offers scalable solutions, including Hyperscale and API-driven masking, to maintain consistent, compliant data.
See What Delphix Can Do For Snowflake
Ready to see how Delphix can transform your masking strategy for analytics and AI? Request a personalized demo of Delphix for Snowflake today.
Full Transcript
Hello, everybody. I'm Jatinder Luthra, Advisor to the Solutions Engineering Team at Perforce Delphix.
In today's session, we will discuss protecting sensitive data in Snowflake.
Let's start with why data masking is important in Snowflake.
In our data-driven world, protecting sensitive information is more critical than ever. That's where data masking in Snowflake comes into play. It allows organizations to obscure or anonymize sensitive data, like personally identifiable information (PII), while still keeping it usable for analytics and AI model training.
This means you can harness the power of your data without compromising privacy.
Snowflake's dynamic data masking feature automatically adjusts what users see based on their roles, ensuring that sensitive information is only accessible to those who need it. For instance, a data analyst might see a masked version of the data, while a data administrator has full access.
This flexibility helps maintain security without sacrificing data integrity.
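To make this concrete, a role-based masking policy in Snowflake is defined once and attached to a sensitive column; Snowflake then adjusts query results based on the caller's role. Below is a minimal sketch using the snowflake-connector-python library; the connection parameters, role, table, and column names are placeholders, not values from this session.

```python
# Minimal sketch: a role-based dynamic masking policy in Snowflake.
# Connection parameters, role, table, and column names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",    # placeholder
    user="your_user",          # placeholder
    password="your_password",  # placeholder
    database="your_db",
    schema="your_schema",
)
cur = conn.cursor()

# Privileged roles see the real value; everyone else sees a masked placeholder.
cur.execute("""
    CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING)
    RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() IN ('DATA_ADMIN') THEN val
        ELSE '***MASKED***'
      END
""")

# Attach the policy to a sensitive column.
cur.execute(
    "ALTER TABLE customers MODIFY COLUMN email "
    "SET MASKING POLICY email_mask"
)
```

With this policy in place, the same `SELECT email FROM customers` returns real addresses for DATA_ADMIN and '***MASKED***' for every other role.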
Moreover, as organizations create copies of data for development and testing—whether for analytics, reporting, or AI initiatives—data masking ensures that sensitive information remains protected.
This is essential for compliance and safeguarding users' privacy during the development process.
In short, data masking in Snowflake empowers organizations to leverage their data safely and effectively, enabling insightful analytics and robust AI solutions while keeping sensitive information secure.
Now, let's move on to understanding the challenges of protecting large-scale data for analytics and AI. What are the primary use cases in Snowflake?
First, there are sensitive data risks. Most organizations use sensitive data in their analytics and AI environments, where it is exposed in AI model training and analytics workflows. It can leak, be stolen, or cause audits to fail, among other risks.
When organizations use reversible masking techniques like dynamic masking, sensitive data can be re-identified.
Many modern analytics stores, like Snowflake or Databricks, operate in the cloud. The cloud has become a primary target for cyberattacks and presents significant compliance challenges.
Moving to the next point, analytics stores are large-scale, often unstructured, and contain many file types, which increases the challenges of discovering and protecting sensitive data.
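As a simple illustration of what automated discovery involves, a profiler might sample column values and match them against known PII patterns. The sketch below is a hypothetical, simplified version of that idea; the regex patterns, threshold, and classify_column helper are assumptions for demonstration, not Delphix's actual profiling engine.

```python
# Hypothetical sketch: pattern-based PII discovery over sampled column values.
# Real discovery tooling is far more extensive; patterns here are illustrative.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "birth_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def classify_column(sample_values, threshold=0.5):
    """Flag a column as sensitive if enough sampled values match a PII pattern."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matched = sum(1 for v in sample_values if v and pattern.search(str(v)))
        if sample_values and matched / len(sample_values) >= threshold:
            hits[label] = round(matched / len(sample_values), 2)
    return hits

# Usage with a sampled column:
print(classify_column(["alice@example.com", "bob@example.org", "n/a"]))
# -> {'email': 0.67}
```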
Given all these risks, there is a rising need to protect this data.
However, misguided compliance efforts, like manual or native masking, can be time-consuming, bottleneck your analytics efforts, and put your SLAs at risk. At petabyte scale, these approaches simply cannot keep up.
Finally, there is another problem: data quality. In the analytics world, quality is key. Suboptimal approaches to protecting sensitive data may distort the data and strip it of analytical meaning, rendering it useless.
Additionally, large enterprises need to protect data consistently across solutions, from on-premises to the cloud. If they try to protect Snowflake using a one-off solution that differs from how they protect their Oracle databases on-premises, this can result in quality issues.
All these challenges can lead to quality concerns and data trust issues for analysts and data scientists, who are the ultimate consumers of data. They can also lead to compliance exceptions due to data quality issues, as well as introduce security risks.
Let's talk about referential integrity.
Once we have automatically identified sensitive data, we apply regulation-specific algorithms to mask it, changing sensitive fields to synthetic values that eliminate the risk of data breaches. We do this consistently across apps to maintain referential integrity—a point I want to emphasize because it's really important.
Masking consistently while retaining referential integrity means preserving data relationships so that the data still works for testing and analytics.
As you can see, there are three different systems with dependencies between Snowflake, Salesforce, and SAP. For example, "Lee" turns into "Yang" in both the CRM and ERP environments.
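A common way to achieve this consistency is deterministic masking: the same input always yields the same masked output, no matter which system the value came from. The sketch below is a simplified illustration of that idea; the keyed-HMAC lookup and the replacement name list are assumptions for demonstration, not Delphix's actual algorithms.

```python
# Sketch: deterministic pseudonymization via a keyed hash.
# Same key + same input -> same replacement everywhere, so "Lee"
# masks to the same surname in Snowflake, Salesforce, and SAP.
import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # placeholder; keep in a secrets store in practice
REPLACEMENTS = ["Yang", "Okafor", "Silva", "Novak", "Haddad"]  # illustrative

def mask_surname(value: str) -> str:
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).digest()
    return REPLACEMENTS[int.from_bytes(digest[:8], "big") % len(REPLACEMENTS)]

# The mapping depends only on the input, so joins across systems still line up:
assert mask_surname("Lee") == mask_surname("Lee")
print(mask_surname("Lee"))  # identical whether the row came from CRM or ERP
```

Because the masked value cannot be reversed without the key, the output is safe to share, yet every system that masks "Lee" independently arrives at the same replacement, so cross-system joins and lookups keep working.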
Now you have secure, realistic test data automated in your test, AI/ML, or analytics processes.
So, we've discussed referential integrity. Let's talk about how we can help you and what products can help you achieve these use cases and overcome the challenges.
Managing data at scale isn't just a technical challenge—it's a climb. Depending on your environment, you might be navigating anything from well-worn trails to 100-terabyte alpine ascents.
Down in the foothills, you have the most common enterprise cases: finance systems, HR platforms, ERPs, and CRMs. They're typically structured, JDBC-based, and under 10 terabytes in scale. For these, core data compliance is your go-to. It's fast, supporting up to 60 million rows per hour. It's self-hosted and designed to fit naturally into your dev/test workflows. Think of it like your daypack and hiking boots—reliable, efficient, and just right for the elevation.
But once you cross the 10-terabyte threshold, things get steeper.
Suddenly, you're dealing with big data ingestion, large-scale test data management, and masking that has to scale into tens or hundreds of terabytes.
This is where Hyperscale and Delphix Compliance Services come in. Hyperscale is built for bulk throughput—2 to 4 billion rows per hour, depending on the source. It uses scale-out architectures and runs on your infrastructure. Think of it as your mountaineering gear—purpose-built for extreme altitude and designed to let you tackle big data compliance on your own terms with critical SLA support.
And what if you want to skip the gear altogether? We've got a jetpack for you.
Delphix Compliance Services is a hosted solution embedded in the Azure ecosystem. It's cloud-native, API-driven, and optimized for workloads like AI/ML pipelines, data science platforms, and SaaS compliance.
Performance is elastic, with 4+ billion rows per hour, and it is fully managed. You get velocity without operational overhead. So, wherever you are in your journey—from managing your first structured workloads to operating lakes and warehouses at massive scale—we've got the right tools for the terrain.
Let's take a closer look at Hyperscale and Delphix Compliance Services and what makes them such powerful engines for compliance at altitude.
With Hyperscale, the key benefit is that you get multiple compliance engines, which distribute workloads. We unload the data from your different Snowflake systems into files, mask it by distributing the masking workload across a pool of engines, and then load the masked data back into Snowflake. This gives you the efficiency to mask data at scale based on your SLAs. You can provision the masking engines as needed.
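Conceptually, that unload, distributed-mask, reload flow looks like the sketch below. This is a simplified illustration of the pattern, not the Hyperscale product's API; the stage and table names are placeholders, and mask_file is a stand-in for one masking engine in the pool.

```python
# Simplified sketch of the unload -> distributed mask -> reload pattern.
# Stage/table names are placeholders; mask_file() stands in for one engine.
from concurrent.futures import ThreadPoolExecutor

def unload(cur):
    # Unload source rows into file chunks on an internal stage.
    cur.execute("COPY INTO @mask_stage/raw/ FROM customers OVERWRITE = TRUE")

def mask_file(path: str) -> str:
    # Placeholder for one engine in the pool masking one file chunk.
    masked_path = path.replace("/raw/", "/masked/")
    ...  # download the chunk, apply masking algorithms, upload the masked chunk
    return masked_path

def mask_all(paths, engines=8):
    # Fan file chunks out across the engine pool; size the pool to your SLA.
    with ThreadPoolExecutor(max_workers=engines) as pool:
        return list(pool.map(mask_file, paths))

def reload(cur):
    # Load the masked files back into the target table.
    cur.execute("COPY INTO customers_masked FROM @mask_stage/masked/")
```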
Delphix Compliance Services, on the other hand, provides discovery and masking APIs as a service hosted by Delphix.
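In that API-driven model, a pipeline simply calls hosted endpoints instead of operating masking engines itself. The sketch below is purely hypothetical; the endpoint URL, payload shape, and token are illustrative assumptions, not the actual Delphix Compliance Services API.

```python
# Hypothetical illustration of API-driven masking from a data pipeline.
# Endpoint, payload shape, and auth are placeholders, not the real service API.
import requests

API_URL = "https://compliance.example.com/v1/mask"  # placeholder endpoint
TOKEN = "your-api-token"                            # placeholder credential

def mask_records(records):
    """Send a batch of rows to a hosted masking service; return masked rows."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"records": records},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["records"]

masked = mask_records([{"name": "Lee", "email": "lee@example.com"}])
```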