by Brigid Kilcoin
Today's guest post comes from developer Sergey Mikhtonyuk. He discusses large-scale development at his blog Core Architecture. Check it out!
I recently wondered how our infrastructure might look if we used Git instead of Perforce as our version control system. So here's the summary:
Version Control for the Real World
We store all of our assets in Perforce. Can Git handle that? No. Many say Git was created for a different purpose, and indeed its distributed nature, and a conceptual model that rehashes local files for most operations, do not work well with big files at all. But I do not assess Git and Perforce by "the things they are meant to be used for"; I am looking at how they satisfy my needs and how they can be used in real projects. If you were "making it for another purpose" but it does not satisfy common needs, maybe you were making the wrong tool.
I think of a VCS as the heart of a company. It is not just code or just assets: everything should be there. If an artifact is not in version control, it does not exist. Every tool, every script you write, every configuration file: everything should be backed up and (preferably) automatically synced with the VCS. It's convenient; it protects you when your machine fails and important scripts would otherwise be lost; and it simplifies understanding of tools and processes, because you do not need to access a specific machine and reverse-engineer all the file dependencies to find out which configs, scripts, etc. are actually used.
Back to the topic. Git can't handle big files, so what can we do about it? One approach that is often mentioned is using one VCS for code and another for assets. But one of the important things a VCS gives us is project consistency. (By consistency, I mean the ability to sync Main to any changelist number, build it, and start the application.) All changes should ideally be atomic and leave the project in a working state. This also means that when we submit a new feature to Main, it has to be done in a single changelist covering both code and data. How do you achieve consistency across multiple VCSs when there is always a state where either data or code is out of sync? It would be interesting to hear some success stories about setups like this.
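To make the consistency point concrete, here is a sketch of what an atomic sync looks like in Perforce (the depot path, changelist number, and label are made up for illustration):

```
# Sync code *and* assets to one consistent changelist
p4 sync //depot/Main/...@123456

# Or pin everything to a label created by the build system
p4 sync //depot/Main/...@build-1.4.2
```

With two separate systems there is no single number like this that identifies a matching code-plus-data state.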
We use a lot of Perforce clientspec magic in our infrastructure: automated builds, resource compilers, stress testing, submit checkers, etc. All those tools are designed to operate on some subset of the depot: sync the data, perform some operation, and submit the result. Clientspecs allow us to specify which portions of the depot to get and where to place them.
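As an illustration (the client name and depot paths are hypothetical, not our actual layout), a clientspec for such a tool might map just the subset of the depot it needs:

```
Client: build-server-01

View:
    //depot/Main/code/...          //build-server-01/code/...
    //depot/Main/tools/build/...   //build-server-01/tools/...
    -//depot/Main/assets/raw/...   //build-server-01/assets/raw/...
```

The '-' line excludes a subtree, so the server never syncs raw assets it has no use for.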
How do you do something like that in Git? You surely don't want to clone the whole repository onto every server. Even if it is a vanilla source-code-only repository (which is relatively small), I don't want to fetch things I do not need! Cloning with shallow history will save some space, but it is not a full solution.
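The closest Git equivalents are shallow clones (`--depth`) and, in newer Git versions, sparse checkouts. A minimal, self-contained sketch, using a throwaway local repository as a stand-in for a real origin:

```shell
# Sketch: a build server fetching only recent history.
# The "origin" here is a throwaway local repo so the example is runnable as-is.
set -e
tmp=$(mktemp -d)

git init -q "$tmp/origin.git"
git -C "$tmp/origin.git" -c user.name=ci -c user.email=ci@example.com \
    commit -q --allow-empty -m "first"
git -C "$tmp/origin.git" -c user.name=ci -c user.email=ci@example.com \
    commit -q --allow-empty -m "second"

# Shallow clone: only the tip commit's history is fetched.
git clone -q --depth 1 "file://$tmp/origin.git" "$tmp/build"
git -C "$tmp/build" rev-list --count HEAD   # prints 1: history is truncated
```

Sparse checkout (`git sparse-checkout set code/ tools/`, Git 2.25+) narrows the working tree further, but unlike a clientspec it cannot remap paths to arbitrary locations.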
Some of the tools use complex mappings to compose a completely custom view of the depot. For example, the stress-testing system can operate on any feature branch: on start it generates a clientspec for the specified branch that maps all needed binaries and configuration files to the server. It maps all stress-testing scenarios and configs from Main, overriding them with branch-specific configs where present (using '+' in the clientspec). Pretty complex.
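For the curious, a sketch of such an overlay mapping (the branch, server, and path names are invented for illustration):

```
View:
    //depot/Main/stress/configs/...                //stress-01/configs/...
    +//depot/Branches/feature-x/stress/configs/... //stress-01/configs/...
    //depot/Branches/feature-x/bin/...             //stress-01/bin/...
```

The '+' overlay line lays the branch's configs over Main's, so a branch-specific file wins wherever both exist.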
You can't have two branches checked out in Git simultaneously, so this would require branching all the configs from Main, which is probably not a big deal. But you are still unable to customize the layout of things. This can be a serious issue when you bind your version control to third-party tools, and it will probably require twiddling with symbolic links.
Locks are another interesting topic. In a distributed version control system you just don't have them. Now imagine several designers tweaking weapon parameters, or developers editing the localization files. Even if those files are in a text format, tools like Excel put huge amounts of metadata in them, making them practically unmergeable. Perforce is configured to lock such file types for exclusive editing.
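For reference, this kind of policy lives in the Perforce typemap; a sketch, with example file patterns rather than our real configuration:

```
TypeMap:
    binary+l //depot/....xls
    binary+l //depot/....max
    text+l   //depot/.../localization/...
```

The '+l' modifier makes `p4 edit` take an exclusive open, so a second designer gets an error up front instead of an unmergeable conflict later.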
As I said earlier, I think of the VCS as one of the central systems in the company, not just a developer toy. When you have project planning docs, agreements, publisher SDKs, etc. checked in to the VCS, access control is crucial.
But hey, even if you take the vanilla case for Git (source code only in the repo), how will you organize the stabilization process? Git manages permissions on a per-repository basis only, so you would have to manage permissions at the push/pull level between Stabilization, Main, and Release repositories.
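In Perforce this is a few lines in the protections table; a sketch, with hypothetical group names and paths:

```
Protections:
    write  group developers   *  //depot/Main/...
    read   group developers   *  //depot/Stabilization/...
    write  group release-eng  *  //depot/Stabilization/...
    write  group release-eng  *  //depot/Release/...
```

Developers can read but not submit to Stabilization, while release engineering controls the path to Release; that per-path granularity is exactly what per-repository permissions can't express.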
Git is decentralized! Haha! Eat this, Perforce! ...you may say.
Yes, it is. So what? Look at GitHub, the most popular hosting service for Git: all it does is take Git's branching models and centralize them. You can do peer-to-peer exchange, but do you really need it? Most people just fork existing projects and work on them just as they would in a centralized VCS.
So why throw away all the benefits of centralization, like simple solutions for big files, change tracking, shelves, etc.? Yes, I will not be able to read the whole history of the project while flying on a plane, but I prefer to read a book while flying anyway. And I feel safer knowing that the whole project can't be stolen so easily. :)