3 Hidden Benefits of Replication for Developers
The textbook definition of replication is “frequent copying from one computer or server to another. This creates a distributed system in which users can access data that’s relevant to their tasks without interfering with the work of others.”
The word “frequent” distinguishes replication from basic copying. Frequent means often, but it also implies speed: for replication to work, it must happen constantly and quickly. Replication also depends on automation, because much of its value lies in being unattended, which is what makes it seamless for users.
There are three areas where replication delivers benefits to large and/or global development teams. They are:
- Backups and Disaster Recovery
- High Availability and Performance
- Continuous Integration
Most administrators understand backups. The need to replicate data for backups is well understood, especially when there’s a large volume of data that’s mission critical. Backups provide insurance for disaster recovery.
Database replication and clustering – which many enterprise IT professionals are familiar with – bring high availability. If configured properly, these two deployment types can also support disaster recovery. In addition to supporting high availability and disaster recovery, these techniques are used to achieve better performance for global teams and continuous integration.
In the world of version control, these areas are less clear to many administrators because only a few version control tools offer such capabilities today. Regardless, this need should be considered as organizations’ dependence on the IP stored in their VCS grows.
Benefit 1: Backups and Disaster Recovery
Many version control systems only use replication for backups. For example, to back up Git, scripts are used to clone each repo. There is plenty of online discussion about how to do this. Generally, the next step is to create an archive of the repo data. Then, it’s just a matter of following established best practices for backing up any kind of data. For example, you should put the archives somewhere safe offsite and rotate them to maximize your redundancy while economizing storage usage.
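The clone-and-archive approach described above can be sketched in a few lines of shell. This is a minimal, hedged example, not a complete backup solution: `BACKUP_DIR` and the repo path in the usage comment are illustrative placeholders, and a real script would loop over a repo list and handle errors and offsite rotation.

```shell
#!/bin/sh
# Minimal sketch of a Git backup: mirror-clone a repo, then archive the clone.
# BACKUP_DIR and the repo paths are illustrative placeholders.
set -eu

BACKUP_DIR="${BACKUP_DIR:-./git-backups}"
STAMP=$(date +%Y%m%d)

backup_repo() {
    url=$1
    name=$(basename "$url" .git)
    workdir=$(mktemp -d)
    # --mirror copies every ref (branches, tags, notes), not just HEAD
    git clone --mirror --quiet "$url" "$workdir/$name.git"
    mkdir -p "$BACKUP_DIR"
    # Archive the bare mirror; rotate/ship these archives offsite per policy
    tar -czf "$BACKUP_DIR/$name-$STAMP.tar.gz" -C "$workdir" "$name.git"
    rm -rf "$workdir"
}

# Usage (hypothetical path): backup_repo /srv/git/project.git
```

Note that the `git clone` step is exactly the “plain old copying” the article describes: it reads the full repo from the production server every time it runs.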
Large VCS backups are often done overnight, because the copying slows down the production server and storage systems. (Yes, the above example involves plain old copying.) As repos grow in number and size, they must be partitioned and scheduled so that copying can still finish overnight.
Remember that the definition of replication included “…without interfering with the work of others”? In this scenario, as the VCS workload and the size and number of assets grow, it’s easy to violate that rule by slowing down the work of others.
Complexity grows when, for example, a company has three offices on three continents, with 20 developers in each one. This makes strategies for managing the backups, disk space, and cloud storage more complex. And this is especially true when global development is “following the sun,” and there is no longer an “overnight” because developers need access to code at all times.
Using replication allows teams to back up their files without interfering with the work of others.
Benefit 2: High Availability and Performance
For companies whose products depend on the software they’re building, providing global access to the version control server is mission critical. That means a VCS server with high availability and high performance.
Some Git solutions offer a “high availability” configuration. They put two Git servers side by side, so a secondary server can take over when the primary server fails. This involves basic replication, which keeps the data up to date between the two machines.
In the event of a failure, an administrator issues commands to cause the secondary server to take over serving users. This can also be used to offload some of the backup processing to the second server.
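The “basic replication” between primary and secondary Git servers can be approximated with a periodically run mirror fetch, which keeps a warm standby ready to take over. This is a hedged sketch under assumed conventions: the primary and standby paths are hypothetical, and a real deployment would run this from cron or a hook and repoint clients (via DNS or a load balancer) on failover.

```shell
#!/bin/sh
# Sketch: keep a warm-standby Git mirror in sync with a primary.
# The primary URL and standby path below are hypothetical placeholders.
set -eu

sync_standby() {
    primary=$1   # e.g. ssh://git@primary/srv/git/project.git
    standby=$2   # mirror path on the secondary server

    if [ ! -d "$standby" ]; then
        # First run: create a full mirror (all refs)
        git clone --mirror --quiet "$primary" "$standby"
    else
        # Subsequent runs (e.g. from cron): fetch new refs, prune deleted ones
        git -C "$standby" fetch --quiet --prune origin
    fi
}

# On failover, an administrator repoints clients at the standby,
# which also makes it a convenient target for backup processing.
```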
High Performance With Federated Architecture
Perforce takes a slightly different approach to replication in order to help organizations achieve high performance. It’s called Federated Architecture.
In a federated architecture, also known as a “Commit/Edge” environment, each development location has its own server.
The server can run on Linux, macOS, or Windows, and assets are replicated automatically and continuously. This makes them available where and when designers and developers need them. This lightweight, intelligent replication offers a sharp contrast to “copying.”
A commit server stores the canonical archives and permanent metadata. This goes in a data center in your corporate headquarters or in your private cloud. Then, an edge server contains replicated copies of the commit server data and a unique, local copy of some workspace and work-in-progress information. To achieve high performance, edge servers process read-only operations and operations that only write to the local data. You can connect multiple edge servers to a commit server.
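In Perforce terms, commit and edge roles are declared through server specs. The fragment below is only an orientation sketch, not a complete setup: the server IDs, addresses, and descriptions are hypothetical, and a real deployment also involves service users, replication configurables, and checkpoint strategy.

```
# Hypothetical server spec for the commit server (HQ data center):
ServerID:    commit-1
Type:        server
Services:    commit-server
Description: Canonical archives and permanent metadata

# Hypothetical server spec for one edge server (regional office):
ServerID:    edge-tokyo
Type:        server
Services:    edge-server
Description: Replicated commit data plus local workspace state
```

Multiple edge specs like the second one can be attached to a single commit server, one per location.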
The beauty of this model? The edge server offloads a significant amount of processing work from the commit server. It also reduces data transmission between commit and edge servers. As workloads grow, additional CPUs and memory can be added, and performance continues to improve in a linear fashion. There’s virtually no ceiling to performance improvements.
From a developer perspective, most typical operations (until the point of submit) are handled by the edge server. Read operations, such as obtaining a list of files or viewing file history, are local. In addition, with an edge server, syncing, checking out, merging, resolving, and reverting files are also local operations. Developers don’t even know there are multiple servers. To them, it’s all transparent so they can focus on creating great code.
It’s easy to understand this federated architecture when you consider the use case of multiple offices, but it is also used inside a single office. There’s often one system for developers to work against, checking in/checking out, and another for automation. This automation can include the backup process.
Combining backups and replication gives you enterprise-class disaster recovery capabilities. Plus, using replication reduces the amount of processing work that the main server needs to do. This helps ensure there’s no limit to the availability or performance of the server.
Benefit 3: Enabling Continuous Integration
One of the biggest challenges we’ve heard from Git users is that the server can’t keep up with the demands of large numbers of developers working simultaneously in a CI workflow.
In Git, the automation can be as much as 90 percent of the traffic on the server. This slows the response time for developer requests and detracts from their productivity.
And since a standard Git server has no built-in replication, the slow response time can’t be solved by throwing more CPUs and memory at it.
Helix Core executes CI faster, as much as 80 percent faster. It does this with multi-threading and parallel sync between the repo and the workspace used by the build runner. It also handles artifacts and other binaries, which speeds performance further. Often, a separate edge server is configured and dedicated to builds, resulting in a dramatic performance improvement.
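A build step against a dedicated build edge server might look like the fragment below. Treat it as a sketch: the server address, workspace name, thread counts, and the `$CHANGE` variable are all illustrative assumptions, and the right `--parallel` tuning depends on your network and file sizes.

```
# Hypothetical CI build step, pointed at a dedicated build edge server.
export P4PORT=edge-build:1666      # illustrative edge server address
export P4CLIENT=ci-build-ws       # illustrative build workspace

# Sync the workspace using multiple parallel transfer threads
p4 sync --parallel=threads=8 "//depot/project/...@${CHANGE}"
```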
Using separate servers for builds helps ensure technology is not the factor that prevents teams from achieving Continuous Integration.
Is It the Right Time to Implement Replication?
Everyone agrees that replicating data is important – especially for a large volume of data that’s mission critical. Replication offers the biggest benefits for large or global teams. It allows teams to back up their files without interfering with other people’s work. It reduces the amount of processing work that the main server needs to do – which helps ensure high performance. And it ensures servers are not the limiting factor that prevents teams from achieving Continuous Integration.