July 1, 2026
Non-Negotiables in an Enterprise Synthetic Solution: #1, Referential Integrity
Data Management, DevOps
At Perforce Delphix, we have found that referential integrity is often a deciding factor for enterprises evaluating data masking and test data management solutions. That same requirement is now emerging in conversations about synthetic data as well.
Mayank Ahluwalia, Senior Product Manager at Perforce Delphix, has seen this need firsthand in his conversations with enterprise leaders. I recently spoke with him about why consistent data relationships matter so much when generating test data and where teams run into challenges.
Q: Why does referential integrity become a critical requirement for synthetic data in enterprise environments?
Mayank:
It always starts the same way. A team runs a successful proof of concept with synthetic data on a single database, gets excited, and then tries to apply it to their actual application environment. That is when things fall apart.
Enterprise applications often span multiple databases. A customer record lives in one system, their orders in another, their payments in a third. These systems are connected, and test data has to reflect those connections.
What I kept hearing was some version of: "We can generate data. We just cannot generate data that works together across our systems."
That gap between data that exists and data that means something in context is where referential integrity becomes the conversation.
Q: What changes when teams who use masked production data add on synthetic data?
Mayank:
Masked production data carries its relationships automatically because those relationships were real to begin with. You change the sensitive values but the underlying structure, the connections between systems, the way entities relate to each other, all of that is already there.
Synthetic data starts from nothing. You have to rebuild those relationships intentionally. And that is harder than it sounds, especially when data spans multiple systems. A customer ID that exists in your CRM needs to mean something to your billing system and your fulfilment system too.
If each system generates data independently, without awareness of the others, you end up with data that looks right in isolation but breaks the moment your application tries to use it across systems.
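To make that failure concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the two "systems" are plain lists, and the function names and ID ranges (`generate_customers`, `generate_orders_independently`, `generate_orders_coordinated`, `orphaned_orders`, IDs starting at 1000, a billing range of 1 to 500) are all assumptions made up for the example.

```python
import random

def generate_customers(count, start_id=1000):
    """CRM side: synthetic customers with IDs 1000, 1001, ..."""
    return [{"customer_id": start_id + i, "name": f"Customer {i + 1}"}
            for i in range(count)]

def generate_orders_independently(count, rng):
    """Billing side, unaware of the CRM: it invents customer IDs in its own range."""
    return [{"order_id": i + 1, "customer_id": rng.randint(1, 500)}
            for i in range(count)]

def generate_orders_coordinated(count, customers, rng):
    """Billing side, coordinated: every order references a customer that exists."""
    known_ids = [c["customer_id"] for c in customers]
    return [{"order_id": i + 1, "customer_id": rng.choice(known_ids)}
            for i in range(count)]

def orphaned_orders(orders, customers):
    """Orders whose customer_id matches no generated customer."""
    known_ids = {c["customer_id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in known_ids]

rng = random.Random(42)
customers = generate_customers(5)
independent = generate_orders_independently(20, rng)
coordinated = generate_orders_coordinated(20, customers, rng)

print(len(orphaned_orders(independent, customers)))  # 20
print(len(orphaned_orders(coordinated, customers)))  # 0
```

Each list looks valid on its own. The problem only appears when the orders are checked against the customers they are supposed to belong to.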
Q: Is that problem of referentially inconsistent data hard to solve with separate tools?
Mayank:
Yes, it is.
When teams use different tools for generating and masking data, they are left reconciling everything after the fact. They have to manually align keys, fix broken relationships, and troubleshoot inconsistencies across environments.
It quickly becomes difficult to scale.
Q: How does Delphix’s unified approach change the outcome?
Mayank:
With Delphix, masking and synthetic data aren’t treated as separate workflows. They are part of the same platform, using the same understanding of data relationships across systems.
That means you can start with masked production data for known workflows, generate synthetic data for new scenarios, and bring them together into a single, consistent environment.
Because those relationships are preserved across both masked and synthetic data, the data behaves like one system. It makes it possible to scale testing across complex enterprise environments without constantly fixing data issues.
Enterprises Are Scaling Synthetic & Masked Data Together
Most organizations aren’t choosing between masking and synthetic data; they’re using both: 51% have already adopted synthetic data, while 86% use static masking. At the same time, 57% report growing volumes of sensitive data in non-production environments, making it harder to maintain quality and consistency across environments.
Find more insights from 500+ enterprise leaders in The 2026 Test Data Management Report for AI-Ready Enterprises.
Q: When does referential integrity typically surface as a requirement, and how critical is it?
Mayank:
It comes up at one of two moments. The first is right at the start, with teams that have already been through a failed synthetic data rollout and know exactly what went wrong. For them, it is a gating requirement before anything else gets discussed.
The second is during a proof of concept, when a team moves beyond a simple demo and tries to generate data for their real environment.
Q: What happens when referential integrity isn’t supported?
Mayank:
At that point it frequently does become a deal-breaker, because it is not something you can work around. It is either built into the tool or it is not.
Q: Which synthetic data use cases break down without strong referential integrity?
Mayank:
Integration testing is the clearest one. If the data in one system does not relate correctly to the data in another, you are not really testing integration at all. You are testing components in isolation with made-up inputs, which means entire categories of real-world bugs go undetected.
End-to-end testing has the same problem at a larger scale. A user journey that touches multiple systems needs consistent data at every step. Break one connection and the test fails for the wrong reason, which erodes trust in the testing process itself.
Q: Why does this matter so much for new application development?
Mayank:
New application development is an underappreciated case. When a team is building something new that needs to connect to existing systems, they need synthetic data that looks like it came from those systems working together. Without that, developers are working blind on integration until very late in the cycle.
Q: Where do most synthetic data tools typically fail to preserve referential integrity?
Mayank:
The most common failure is simply scope. Most synthetic data tools are designed around a single database. They handle relationships within that database reasonably well, but have no way to coordinate across database boundaries. Each generation run is independent, and keeping things consistent across systems becomes a manual problem for the user.
The second failure is around existing data. Many tools assume they are populating a clean, empty environment. That works fine for greenfield scenarios but breaks down in practice, where test environments already have data in them.
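The existing-data problem can be sketched with Python's built-in `sqlite3` module. This is a simplified illustration under stated assumptions: the `customers` table, its rows, and the helper names `next_free_id` and `add_synthetic_customers` are hypothetical, and "start above the highest existing key" is just one simple strategy among several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
# Rows that are already present, as in a test environment that is not empty.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Existing A"), (2, "Existing B"), (7, "Existing C")],
)

def next_free_id(conn):
    """First ID above every key already in the table (1 if the table is empty)."""
    (max_id,) = conn.execute(
        "SELECT COALESCE(MAX(customer_id), 0) FROM customers"
    ).fetchone()
    return max_id + 1

def add_synthetic_customers(conn, count):
    """Insert synthetic rows whose keys cannot collide with existing ones."""
    start = next_free_id(conn)
    rows = [(start + i, f"Synthetic {i + 1}") for i in range(count)]
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return [row[0] for row in rows]

# A generator that assumes an empty table and starts at 1 fails on insert.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Synthetic naive')")
except sqlite3.IntegrityError as exc:
    print("naive insert failed:", exc)

new_ids = add_synthetic_customers(conn, 3)
print(new_ids)  # [8, 9, 10]
```

The naive insert is the visible kind of failure. The harder cases are child tables, where new synthetic rows may also need to reference parents that were already in the environment.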
Q: What impact does this have on testing and delivery?
Mayank:
The impact in both cases is the same. Either you get hard failures at insert time, which are at least visible and fixable. Or worse, the data loads successfully but the relationships are quietly wrong. You discover that later when your tests fail for reasons that have nothing to do with your application.
That second outcome is more costly because it is harder to diagnose and erodes confidence in the whole testing process.
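The quiet failure is easy to reproduce. In the sketch below, two separate in-memory SQLite databases stand in for a CRM and a billing system; the table names, the sample rows, and the `orphaned_invoices` helper are assumptions invented for the example. Because the two databases know nothing about each other, the bad row loads without any error, and only an explicit cross-system check reveals it.

```python
import sqlite3

# Two independent databases standing in for two systems.
crm = sqlite3.connect(":memory:")
billing = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY)")
billing.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER)"
)

crm.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
# All three inserts succeed: billing cannot enforce a key that lives in the CRM.
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)", [(10, 1), (11, 2), (12, 99)]
)

def orphaned_invoices(crm, billing):
    """Invoice IDs whose customer_id does not exist in the CRM database."""
    known = {row[0] for row in crm.execute("SELECT customer_id FROM customers")}
    query = "SELECT invoice_id, customer_id FROM invoices ORDER BY invoice_id"
    return [inv for inv, cust in billing.execute(query) if cust not in known]

print(orphaned_invoices(crm, billing))  # [12]
```

A check like this is a reasonable smoke test to run after any data load, whatever tool produced the data.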
Q: How does Delphix address these referential integrity challenges?
Mayank:
These challenges come down to fragmentation. When generation, masking, and data management happen in separate tools, maintaining consistency becomes a manual effort.
With Delphix, those capabilities are unified. The platform maintains a single understanding of data relationships across systems and applies that consistently whether you are masking production data or generating synthetic data.
That means you can work with existing data, add new synthetic data where needed, and still preserve referential integrity across the full environment, without stitching things together afterward.
Q: How do different synthetic data approaches affect referential integrity?
Mayank:
The generation approach shapes what is even possible. Tools that require users to manually configure every relationship put the burden of maintaining integrity entirely on the user. That works in controlled environments but does not scale.
AI-driven tools reduce the setup burden by inferring relationships from schema and context. But inference is only as good as what the tool can see, and most AI-driven tools are still bounded by a single database at a time.
Metadata-driven approaches (like Delphix’s), where the tool reads schema constraints and user-defined relationships as its source of truth, tend to be the most reliable foundation, especially when that understanding spans multiple databases and the full dependency graph.
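As a rough illustration of what "reading schema constraints as the source of truth" can look like, the sketch below pulls declared foreign keys out of a SQLite schema with `PRAGMA foreign_key_list` and merges them with a user-defined relationship. This is not Delphix's implementation. The schema, the `declared_relationships` helper, and the `invoices` entry (standing in for a table in another database that the schema cannot declare) are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id)
);
CREATE TABLE payments (
    payment_id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id)
);
""")

def declared_relationships(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )]
    relationships = []
    for table in tables:
        # Result columns: id, seq, table, from, to, on_update, on_delete, match.
        # Table names come from the schema itself, so plain interpolation is fine here.
        for _id, _seq, parent, child_col, parent_col, *_rest in conn.execute(
            f"PRAGMA foreign_key_list({table})"
        ):
            relationships.append((table, child_col, parent, parent_col))
    return relationships

# Hypothetical relationship into another database, supplied by the user
# because no schema constraint can express it.
user_defined = [("invoices", "customer_id", "customers", "customer_id")]

relationships = declared_relationships(conn) + user_defined
for rel in relationships:
    print(rel)
```

The declared constraints cover what a single database knows about itself. The user-defined entries are what extend that picture across database boundaries.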
Q: What capabilities should enterprises look for to ensure referential integrity?
Mayank:
Here are a few things I would treat as non-negotiable:
- The tool needs to support relationships that cross database boundaries, not just relationships within a single schema — for both masked and synthetic data. If the demo only shows you single-database scenarios, push on what happens when you have two databases that need to share keys.
- It should enable masked and synthetic data to coexist in one unified dataset without breaking relationships. In real enterprise environments, teams often use masked production data for known workflows and synthetic data for new scenarios. These data sets must work together as one system, not as separate, incompatible sources.
- It needs to generate in dependency order across all connected systems simultaneously, not one database at a time. Parent entities before child entities, across the full graph, in a single coordinated pass.
- It needs to work with tables that already have data. Requiring a clean environment is a significant constraint in real enterprise deployments.
And finally, ask how the tool handles schema changes. If adding a column requires manual reconfiguration, that becomes a maintenance burden that introduces failures over time.
In practice, very few tools meet all of these requirements consistently. This is where Delphix stands out. It is designed to maintain referential integrity across systems, across existing data, and across combined masked and synthetic datasets, without requiring teams to rebuild relationships manually.
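The "parents before children, across the full graph" requirement from the list above is essentially a topological sort. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) over a small dependency graph that spans three databases. The graph, the `database.table` naming, and the row format are assumptions for illustration, not a description of any product's internals.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key is "database.table" and each value
# is the set of parent tables it references, possibly in another database.
dependencies = {
    "crm.customers": set(),
    "orders_db.orders": {"crm.customers"},
    "orders_db.order_items": {"orders_db.orders"},
    "billing.payments": {"orders_db.orders", "crm.customers"},
}

def generation_order(deps):
    """Order tables so that every parent comes before its children."""
    # Raises graphlib.CycleError if the relationships contain a cycle.
    return list(TopologicalSorter(deps).static_order())

order = generation_order(dependencies)

# One coordinated pass: by the time a table is generated, all of its parents
# already have rows, so every reference can point at a real key.
rng = random.Random(0)
generated = {}
for table in order:
    rows = []
    for i in range(3):
        row = {"id": f"{table}:{i}"}
        for parent in sorted(dependencies[table]):
            row[parent] = rng.choice(generated[parent])["id"]
        rows.append(row)
    generated[table] = rows

print(order)
```

Generating one database at a time breaks this ordering whenever a child table in one database depends on a parent in another, which is why the pass has to cover the whole graph.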
Final Thoughts: Referential Integrity Is a Must in Synthetic Data
As teams push synthetic data beyond isolated use cases into full enterprise environments, referential integrity quickly moves from a “nice to have” to a foundational requirement.
Without it, test data may look realistic, but it won’t behave realistically in the systems that matter most. And as Mayank highlighted, that gap can undermine everything from integration testing to release confidence.
In the next post in this series, we’ll explore another non-negotiable capability for enterprise synthetic data — determinism and realism — and how it impacts the scalability of modern DevOps and AI-driven development workflows.
Power Enterprise Synthetic Data with Perforce Delphix
Perforce Delphix is the intelligent data automation platform that delivers fast, trusted, AI-ready data environments — combining synthetic data, masked production data, and automated delivery to support enterprise-scale testing and development.
See how it works:
Accelerate AI-Driven Delivery
Move faster with on-demand, production-like synthetic data for new features, edge cases, and integration scenarios. API-driven data access and DevOps integration help teams validate continuously at the pace of AI-generated change.
Delphix has a long history of helping teams develop faster with reliable test data. According to an IDC study, Delphix users developed applications 58% faster and experienced a 408% ROI.*
Govern Data Trust and Compliance
Protect sensitive data while maintaining realistic testing with high-quality data, both masked production data and synthetic. Delphix enforces policy-based governance, secure data generation, and referential integrity across systems to reduce risk and support compliance.
The same IDC study found that organizations protect and mask 77.2% more data and environments with Delphix.*
Scale Efficient Data Environments
Deliver and manage data across complex, multi-system environments with less overhead. Virtualized, self-service data environments reduce manual effort, storage costs, and infrastructure sprawl.
Experience Delphix Synthetic Data Firsthand
Discover how Delphix helps you generate high-quality, referentially accurate synthetic data at enterprise scale.
*IDC Business Value White Paper, sponsored by Delphix, by Perforce, The Business Value of Delphix, #US52560824, December 2024