September 8, 2015

Cover Your Assets for Success with Git in the Enterprise

Git at Scale

(Part 2 of a 7-part series)

Throughout this series of posts, we are examining the challenges of adopting Git in the enterprise and presenting tips and best practices for meeting them. In the first entry in the series, we saw the importance of workflow and branching strategy when adopting Git in the enterprise. Today’s second installment deals with content management and best practices for storing your vital assets.

Content Management

It is a popular dictum of Agile development that you should keep all of your intellectual property (IP) in one place: a “single source of truth.” Yet increasingly, multi-disciplinary teams and requirements have exploded the number and size of assets in a typical project. Source code is often a tiny drop in the binary bucket compared to documents, images, models, audio, video, and even entire virtual-machine (VM) environments used for testing and deployment.

This expansion poses a serious challenge for enterprise Git adoption because the design of Git’s object storage imposes a practical maximum repository size of a gigabyte or two. Even repositories far short of that limit can perform poorly, depending on the type and size of the assets and the operation at hand; running a simple git blame against a repository full of large digital assets is a painful demonstration of the point.
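
As a rough, hedged illustration (the file path below is hypothetical), a few standard Git commands can gauge how close a repository is to these practical limits and how painful an expensive operation has become:

    git count-objects -vH                     # report object count and on-disk size of the repository
    git rev-list --objects --all | wc -l      # count every object reachable from any ref
    time git blame path/to/large-asset.psd    # time a blame against a large binary file

If the size-pack figure reported by the first command is creeping toward a gigabyte, or the blame takes noticeably long, the repository is a candidate for the mitigation strategies discussed below.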

In addition, these binary assets are often produced by product designers or artists who may lack deep technical skills and are unlikely to use the Git command line (even if it could handle their work). Your Git management plan should cover how these non-technical contributors will store their work, how those files will be managed alongside the code in Git, and how the correct revisions of every file will be brought together in your build and release systems.

The most popular ways of handling large binary assets are to move large files outside the repository or to divide the content among multiple repositories, which must then be unified through DevOps magic for builds, testing, releases, and other tasks. Tools such as git-annex and Git LFS can be a significant help in moving digital assets outside the repository, while Git submodules and a variety of often home-brewed scripting techniques can be useful in taming “Git sprawl”[1]. However, these approaches introduce new challenges of their own around backup processes and distributing content across different locations. Where possible, companies should look for a version control system that provides the benefits of a distributed version control system (DVCS) but can keep all the assets in a single store.
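
As a concrete sketch of the Git LFS approach, the setup below tells Git to store certain binary formats as lightweight pointers; the file patterns are illustrative assumptions, not a prescription:

    git lfs install                          # install the LFS hooks for this user
    git lfs track "*.psd" "*.mp4"            # store matching files outside the repository as pointers
    git add .gitattributes                   # the tracking rules live in .gitattributes
    git commit -m "Track large binary assets with Git LFS"

From then on, pushes upload matching files to an LFS store, and the repository itself holds only small pointer files, which keeps operations like clone and blame responsive.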

Content Best Practices

  • Keep all of your content in one place, ideally in a single version management tool. Failing that, if you choose to use Git, consider adopting a “monorepo”[2] rather than settling for tools that fragment your IP.
  • Use a Git management suite that can handle all of your files, not just source code but also digital assets.
  • Have a strategy for archiving older content that dovetails with your branching strategy, keeping working repositories clean and easier to use; a sketch of one such approach follows this list.
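
As a sketch of the archiving bullet above (the branch name is hypothetical), one common approach is to preserve a stale branch as a tag in an archive namespace and then delete the branch, keeping the work recoverable without cluttering everyday branch listings:

    git tag archive/payments-rework payments-rework   # preserve the branch tip under an archive namespace
    git branch -D payments-rework                     # remove the local branch
    git push origin archive/payments-rework           # publish the archive tag
    git push origin --delete payments-rework          # remove the remote branch

The work disappears from day-to-day branch listings but can be restored at any time with git checkout -b payments-rework archive/payments-rework.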

Keep Reading

Part one of this series, on workflow and branching strategy, is available here. In the next installment, we’ll tackle testing and continuous delivery, along with best practices for managing server loads.

For the complete set of Git in the Enterprise tips and best practices, download our free eBook here.


[1] Git sprawl refers to content divided into multiple repositories to keep size down and performance up.

[2] A “monorepo” is a single large repository. It can make management easier in some respects but also risks running into the limitations of large repositories in Git.