Git began as a tool to manage source control for the Linux kernel. Today, it’s the de facto standard among software developers. In fact, its ubiquity is the reason developers have grown to depend on it for code collaboration. They know Git offers easy access to all their projects from virtually any platform.
Git version control was never intended to scale so massively. But that’s exactly what most organizations need it to do. Enterprises now face the daunting task of balancing developer needs against the implementation of Continuous Integration/Continuous Delivery (CI/CD) for better, faster, higher-quality releases.
If you’re currently responsible for driving your DevOps pipeline, you probably don’t care about any of that. You just want things to work and meet your needs. You need to:
- Simplify and unify change management across all DevOps processes.
- Version all your build artifacts, code, and binaries — preferably in the same place.
- Document, trace, and audit from end to end, to identify vulnerabilities and defects.
But it’s not enough just to do all that. You need to do it all quickly. You need to be faster than the competition at a bare minimum, and you need to keep your developers happy along the way.
Accomplishing this with Git, especially Git alone, presents challenges that threaten the entire DevOps pipeline. So, how do you overcome the challenges Git poses to enterprise DevOps initiatives?
Custom large-scale tooling is only an option for the tech giants of the world, those with the resources to dedicate entire teams to building and maintaining their custom solutions. For most organizations, however, it’s more cost-effective to satisfy developer needs with existing tools that allow the organization to overcome Git’s challenges at scale.
In short, you need an integrated environment for design, code, and build artifacts. Helix4Git is an out-of-the-box solution for scaling Git, and we’ll use it as an example throughout this eBook. You can also add Helix TeamHub for code management.
It’s possible to make Git work for even the largest of enterprises. You can make Git scale to feed giant projects — GBs of assets across millions of files — into the DevOps pipeline.
This eBook aims to uncover the details of these challenges. And we’ll introduce five surefire ways to set your organization up for success with Git at scale.
1. Delight All of Your Contributors
It all starts with contributors — artists, designers, developers, and others. Today’s multi-disciplinary product development environment involves a lot of input and iteration, both of which lead to explosions of files and collaboration. You need a simple, elegant way to manage multiple projects and multiple teams. You also need something for the team members who don’t have the same technical skill or familiarity with Git.
Git was built by a developer for developers. As a result, it can be extremely daunting, and it has a steep technical learning curve. A true collaboration solution unites all your contributors in one place. A collaboration solution should have:
- A user experience that’s simple to use.
- Support for projects with multiple repositories and built-in support for assets that Git doesn’t handle well, such as artifacts, graphics, audio, and video.
- A code review workflow with CI builds to keep developers productive.
- The appropriate integration power under the hood to feed your DevOps pipeline.
Helix4Git provides all of the above. It takes what developers like about Git and extends it in significant ways to serve enterprise DevOps needs.
Helix4Git lets distributed teams of developers around the world enjoy LAN-speed performance for their daily work. At the same time, all their commits are synchronized automatically over WAN links. That’s a huge performance boost.
Speaking of performance, developers need feedback quickly. The tighter the iteration cycle, the greater the productivity. When developers believe a defect has been fixed, they test it, submit it for code review, and wait for a positive response. The sooner it’s released into production, the sooner they can move on to new work.
But developers don’t blame Git when the build is slow and it negatively impacts their productivity. They blame the admin, who is tasked with addressing Git’s shortcomings.
This isn’t exactly a secret; it’s a major topic in online publications and forums. The known reality is that Git can be slow when you're moving files. And over WANs, because of distance, latency, and single threading, this performance hit is often magnified for remote developers.
One way to tackle these problems is to introduce a local proxy/cache at each site. This improves performance by letting developers clone and fetch from a local server. Writes, by contrast, still have to go back to the master server, so each remote site must be configured to push there. With most Git implementations, this is a do-it-yourself solution, albeit one that is relatively well documented online.
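As a rough illustration of the do-it-yourself setup, the sketch below builds the two Git commands a developer at a remote site might run so that fetches come from a local mirror while pushes go to the master server. The function name and both URLs are hypothetical, and the sketch assumes someone else keeps the mirror up to date; it only builds the commands and does not run them.

```python
from typing import List


def remote_site_commands(local_mirror_url: str, master_url: str,
                         remote: str = "origin") -> List[List[str]]:
    """Return git commands that fetch from a site-local mirror but push to the master server."""
    return [
        # Fetches and pulls read from the nearby mirror.
        ["git", "remote", "set-url", remote, local_mirror_url],
        # Pushes bypass the mirror and write back to the master server.
        ["git", "remote", "set-url", "--push", remote, master_url],
    ]


if __name__ == "__main__":
    for cmd in remote_site_commands(
        "https://git-mirror.site-a.example.com/app.git",  # hypothetical mirror URL
        "https://git.example.com/app.git",                # hypothetical master URL
    ):
        print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps the commands safe to hand to a process runner without quoting problems.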
A better solution is to leverage Helix4Git to enable reliable and fast content replication around the globe.
2. Examine Your Branching Strategies for CD
The correct branching strategy is essential to automating and improving the quality of your software. Even though changing branching strategies can be difficult, your current strategy may not be compatible with (or optimized for) automation.
If you routinely spend time trying to figure out what went wrong when you merged, you’re not ready for automation. Smaller, short-lived branches can minimize risk and ensure fewer delays. Let’s look at two options.
A Successful Git Branching Model
Many Git teams use variations of Vincent Driessen’s “A Successful Git Branching Model.” Its greatest benefit is that it’s widely used, and there are numerous variations on the general theme. This is a very good model, because it creates a central repo that is the single source of truth for your project. Driessen is quick to point out that this repo is central only by convention: because Git is a DVCS, all repos are created equal at the technical level.
Key Components of Driessen’s Model
- One centralized Git repo called “origin.”
- One production-ready branch called “master.”
- One integration branch called “develop.”
- Developers work locally, pulling and pushing from/to “develop.”
- Collaborators set up Git remotes so peers can pull changes as needed.
- The CI/CD pipeline frequently merges changes from “develop” to “master,” with as much automation as possible, then releases a new version.
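The last step in the list above, promoting “develop” to “master” and releasing, could be automated along these lines. This is a minimal sketch of the simplified flow described here, not a definitive implementation: the function name, the version string, and the choice of a non-fast-forward merge are assumptions, and the sketch only builds the Git commands without running them.

```python
from typing import List


def promote_commands(version: str, remote: str = "origin") -> List[List[str]]:
    """Build git commands that merge "develop" into "master" and tag the release."""
    message = f"Release {version}"
    return [
        ["git", "fetch", remote],
        ["git", "checkout", "master"],
        # Bring the local master up to date; fail rather than create a surprise merge.
        ["git", "merge", "--ff-only", f"{remote}/master"],
        # Record the promotion as an explicit merge commit (an assumed convention).
        ["git", "merge", "--no-ff", f"{remote}/develop", "-m", message],
        ["git", "tag", "-a", version, "-m", message],
        # Publish both the branch and the new tag.
        ["git", "push", remote, "master", version],
    ]


if __name__ == "__main__":
    for cmd in promote_commands("v1.2.0"):  # hypothetical version number
        print(" ".join(cmd))
```

A pipeline would typically run these only after the build and tests on “develop” have passed.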
Trunk-Based Development
Trunk-based development (or TBD) is well-regarded in the DevOps community. You may benefit from trunk-based development if you’re a large organization trying to achieve better quality and faster releases, and/or you operate in an environment where compliance, governance, and security are highly valued.
Key Components of Trunk-Based Development
- A single, shared branch called “trunk.”
- Short-lived feature branches.
- A mono-repo strategy.
- Developers check out very small portions of code, which simplifies security and traceability.
- Developers collaborate in “trunk,” either committing and pushing directly to it (small teams) or using a pull-request workflow (large teams).
Note: Teams that produce a high commit rate or have many members favor short-lived feature branches for code review and build checking (i.e., CI) before committing work to “trunk.” These branches accelerate code reviews, gating what gets added into “trunk.” Small, short-lived branches minimize merge conflicts.
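One way to keep feature branches short-lived is to flag any branch that has outlived an agreed limit. The sketch below assumes you already have each branch’s creation time (for example, from your Git hosting tool); the function name and the two-day default are illustrative assumptions, not a recommendation from this eBook.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_branches(branch_created: Dict[str, datetime], now: datetime,
                   max_age: timedelta = timedelta(days=2)) -> List[str]:
    """Return feature branches older than max_age, oldest first. "trunk" is never flagged."""
    stale = [
        (created, name)
        for name, created in branch_created.items()
        if name != "trunk" and now - created > max_age
    ]
    return [name for _, name in sorted(stale)]


if __name__ == "__main__":
    branches = {  # hypothetical branch names and dates
        "trunk": datetime(2020, 1, 1),
        "feature/login": datetime(2024, 1, 9),
        "feature/search": datetime(2024, 1, 5),
    }
    print(stale_branches(branches, now=datetime(2024, 1, 10)))
```

A CI job could run a check like this daily and nudge owners to merge or delete what it reports.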
DevOps pipeline goals should revolve around making it easier to introduce new features and fixes and to improve overall code quality. Addressing tooling challenges and adjusting workflow to support these goals go hand in hand.
3. Scale Git in the Build Process
Bringing together multiple Git repos has emerged as the biggest performance challenge in CI/CD pipelines. This is because Git processes each file individually (i.e., slowly) during such an operation. This happens whether developers are working in the same room or remotely. The entire repo and all of its history come down.
Because of these facts, Git becomes slow with repos larger than roughly 1.8 GB of content. Splitting a large repo into many small repos can help, but only at the expense of bringing everything back together in the DevOps pipeline.
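A common mitigation in build jobs, when a project is split across many repos, is to fetch only the latest snapshot of each one instead of its full history. The sketch below builds one shallow, single-branch clone command per repo. The function name, repo names, and URLs are hypothetical, and the sketch only constructs the commands; it reduces what comes down the wire but does not change how Git itself handles files.

```python
from typing import Dict, List


def shallow_clone_commands(repos: Dict[str, str],
                           branch: str = "master") -> List[List[str]]:
    """Build one shallow, single-branch clone command per repo (directory name -> URL)."""
    return [
        # --depth 1 skips history; --single-branch skips every other branch.
        ["git", "clone", "--depth", "1", "--single-branch",
         "--branch", branch, url, directory]
        for directory, url in sorted(repos.items())
    ]


if __name__ == "__main__":
    repos = {  # hypothetical component repos
        "engine": "https://git.example.com/engine.git",
        "assets": "https://git.example.com/assets.git",
    }
    for cmd in shallow_clone_commands(repos):
        print(" ".join(cmd))
```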
Google recognized this problem with Android and allocated significant resources to create their own repository management tool, Repo. Repo sits on top of Git to handle the very large number of Git repositories associated with Android. However, it only addresses the Android use case, and it adds significant complexity at every stage of the pipeline. And it doesn't address the performance of Git itself at all.
This custom, large-scale tooling works in some cases. But most companies find that it’s a costly, ongoing expense, and it takes focus away from building and shipping revenue-generating products.
By contrast, Helix4Git implements a more efficient way of storing and moving Git data. Helix4Git supports multiple parallel threads for faster file transfers. This optimizes the DevOps pipeline.
Helix4Git offers a cost-effective, out-of-the-box solution that overcomes the performance challenges associated with having many and very large Git repos. At the same time, it simplifies CI/CD pipelines by keeping everything together in that all-important single source of truth. (This is illustrated in Figure 3.)
All tests were performed with a shallow clone of the Linux kernel on a 1 Gbps link, using four parallel threads of p4 sync. For the WAN test, a 200 ms round-trip latency was added between client and server.
4. Manage Change Across the Entire DevOps Pipeline
In today’s market, organizations must continually streamline and automate their DevOps pipelines to remain competitive. This can be especially challenging for complex projects that have developers, non-developers, and digital content beyond source code — such as graphics, video, audio, and other binary files.
Managing build and release artifacts is just as important as handling source code and other assets. It’s crucial to have a single source of truth where everything can be versioned and audited to drive good decisions.
Git’s design did not anticipate storing and handling large objects. Even the tools that attempt to address this gap, such as Git LFS, are almost impossible to manage at scale. Plus, they require designers and less-technical users to learn Git, which doesn’t usually interest them.
You need an actual solution. Something that easily accommodates all types of assets and makes the DevOps world transparent to designers.
Of course, you could rely on tools like Git LFS or build your own integration with file-sharing technology like Dropbox. But you would still have disparate workflows and steep learning curves for non-developers. And in the end, you’d still have to figure out how to move those large assets around the internet, through your own network, and into your DevOps pipeline after unifying it all.
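For readers unfamiliar with how Git LFS is wired in, it works through entries in a repo’s .gitattributes file, one per file pattern. The sketch below generates those entries in the form that the `git lfs track` command writes. The function name and the example patterns are assumptions chosen for illustration.

```python
from typing import Iterable, List


def lfs_attribute_lines(patterns: Iterable[str]) -> List[str]:
    """Return .gitattributes lines that route matching files through Git LFS."""
    return [f"{pattern} filter=lfs diff=lfs merge=lfs -text" for pattern in patterns]


if __name__ == "__main__":
    # Hypothetical asset types for a design-heavy project.
    for line in lfs_attribute_lines(["*.psd", "*.mp4", "*.wav"]):
        print(line)
```

Every contributor, including designers, still needs the Git LFS client installed for these entries to take effect, which is part of the learning-curve problem described above.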
Thankfully, Helix4Git lets you manage large binary files alongside Git source code. This dramatically accelerates large file transfers and improves pipeline performance. (This is illustrated in Figure 4.)
For teams that want to use Git LFS, Helix Core can support that too. You can store or mirror external Git LFS repos in Helix Core, helping you to increase the speed of large file transfers and other server requests. Helix4Git supports parallel threads, which means it can handle both large teams and large files faster than Git LFS with open source Git servers.
5. Make the Leap From CI to Enterprise CD
Many organizations already enjoy the benefits of Continuous Integration (CI), but the largest benefits for the enterprise are reaped from Continuous Delivery (CD). This kind of automation decreases risk and improves flexibility. Once it’s in place, it also lowers cost and exposes process inefficiencies in real time.
But CD is complicated if you need to support Git teams. CD requires teams to automate everything — integrating code (at least daily), building, storing binaries back into version control, deploying those binaries to QA, testing, and ultimately pushing to production.
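The chain of automated steps can be pictured as an ordered list of stages where any failure stops everything after it. The following is a minimal sketch under that assumption. The stage names mirror the steps listed above, while the `run_pipeline` function and the stand-in lambdas are hypothetical placeholders for real build, deploy, and test tooling.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure. Returns the stages that passed."""
    passed: List[str] = []
    for name, step in stages:
        if not step():
            break  # Later stages never run once one fails.
        passed.append(name)
    return passed


if __name__ == "__main__":
    stages: List[Stage] = [
        ("integrate", lambda: True),
        ("build", lambda: True),
        ("store binaries", lambda: True),
        ("deploy to QA", lambda: True),
        ("test", lambda: False),  # Simulated test failure.
        ("push to production", lambda: True),
    ]
    print(run_pipeline(stages))
```

In this run, “push to production” never executes because “test” fails, which is the kind of gating that addresses stakeholder fears about a false move reaching production.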
Business stakeholders want the benefits of CD, but they also fear that one false move could halt their mission-critical production systems. DevOps professionals know that rewards outweigh risks, especially after taking steps to mitigate those risks. With a solid CI foundation and some hard work, you too can join the ranks of the CD elite.
Version control is the bedrock technology for CI/CD because it’s the conductor by which the entire pipeline is orchestrated. From commits to successful reviews or failed tests, there are so many events that dictate which actions your systems should take.
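To make the idea of events dictating actions concrete, here is a small sketch of a lookup table that maps version control events to the actions a pipeline should take. The event and action names are invented for illustration and do not correspond to any particular product’s API.

```python
from typing import Dict, List

# Hypothetical mapping from version control events to pipeline actions.
ACTIONS: Dict[str, List[str]] = {
    "commit_pushed": ["run_ci_build"],
    "review_approved": ["merge_change", "deploy_to_qa"],
    "tests_failed": ["notify_author", "block_release"],
}


def actions_for(event: str) -> List[str]:
    """Return the actions for an event, or an empty list if the event is unknown."""
    return list(ACTIONS.get(event, []))


if __name__ == "__main__":
    for event in ("commit_pushed", "tests_failed", "branch_deleted"):
        print(event, "->", actions_for(event))
```

Keeping the mapping in one table makes it easy to audit which events can trigger a deployment.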
The overarching goal of CI/CD is to deliver on the promise of DevOps. Version control drives the delivery of that promise by providing the performance and insight needed to improve software release cycles, software quality, security, and the ability to get rapid feedback on product development.
Versioning 101 for Enterprise CD
Version control enables CD. You need a version control system robust enough to handle:
- Design assets
- Database scripts
- Build tools
- Build artifacts
In a perfect world, your DevOps pipeline would be an out-of-the-box, all-in-one solution. But that tool doesn’t exist. That’s why the Git world has such a large community of active contributors, whose creative work allows other users to integrate virtually any tool into their DevOps pipeline.
Similarly, you’ll find many open source software solutions to improve automation and efficiency. And, of course, add-ons and other systems (Maven, Repo, Artifactory, or others) can help support many large repos, design assets, and build artifacts.
Thankfully, there’s an approach that’s more streamlined. Bring Git users and their assets into your existing Helix Core infrastructure and build pipeline. You can give developers whatever solution they prefer. Perforce fans can continue using Helix Core with its file-locking and exclusive-checkout benefits. And Git loyalists can enjoy faster operations (clone, pull, and fetch) at remote sites. Win-win.
Helix Core’s high-performance server merges with Helix4Git to support all of your development needs. Now you can manage all of your code, including large graphics and binaries, better than Git LFS. Plus you have access to all your build artifacts.
Making Git work in the enterprise is a balancing act between developer satisfaction and overall productivity. But with Perforce, you have many options, including tools that were built to withstand petabytes of data worked by tens of thousands of concurrent users in even the most complex and secure DevOps environments.
Helix Core has been solving customers’ performance and scale challenges from the beginning. Our solutions were built from the ground up to support distributed teams that need to move large data sets over long distances and accelerate multi-file, multi-repo product builds.
Helix4Git makes it easy to scale, automate, and gain visibility into your DevOps pipeline while letting developers choose their front-end tools. We also offer Helix TeamHub for code management. These tools support large-scale, Git-based projects with continuous build, integration, and test processes.
By providing a more efficient way to handle Git data, Helix4Git lets you close the feedback loop to your developers faster and achieve 40-80% faster builds.
Want to learn more? Explore Git best practices.