July 22, 2013

Using Git as an API for Perforce - Part 2



In the first part of this series, we installed and configured a new Git Fusion instance by running the OVA that Perforce provides as a free download.

We added an ecdump Perforce account and configured it, via ~/.ssh/config, to use our custom-generated SSH keys, keeping that configuration separate from other SSH identities.

In this part, we will continue with our project, by setting up some intermediate Git repositories, and by demonstrating the Git commands we need to automate changelist submissions to Perforce via Git Fusion.

The Project: ECDUMP

I wrote the ecdump program to capture all the metadata (source code and configuration properties) from our Electric Commander (EC) build automation framework.

Having so much unversioned code embedded in the EC framework was unnerving to a company whose motto is Version Everything. The backup solutions the team had developed previously were designed for disaster recovery, not for daily development.

We had already used Git Fusion to solve a similar problem with Jenkins. Jenkins has the handy SCM Sync Configuration plugin, which interfaces to Git – we simply added a Git Fusion client, so we can track all the configuration changes users make to Jenkins, all in Perforce.

The ecdump program provides a similar capability for Electric Commander. The output from ecdump is a configurable directory tree, containing all projects, procedures, properties, and resources of interest. For our environment, it creates about 115,000 directories and files, containing about 215 MB of data per dump. Most of these files are very small, and there is a high degree of redundancy, making it a perfect candidate for the object compression designed into Git and Perforce.

Git Repository Layout

I opted for a 2-layer Git repository structure for my development work.

The ecscm-master clone is a bare Git repository used to aggregate all of the checkins from the ecdump program. This is the repository used to push hourly changes to Perforce, via the Git Fusion agent.

I then have a standard (non-bare) Git workspace, cloned from ecscm-master, that is used to commit the changes for each run of ecdump. This second repository is named ecscm-work.

A nice feature of this design is that I can remove the local Git master or local Git work repositories at any time, and defer any pushes to Git Fusion until all the underlying automation scripts are fully debugged. This layout isolates interactions with the remote Git repository to the local master, which is a recommended pattern for working with Git: keep the clone of the remote “pure”, and do your branching experiments in a clone of the clone. That way you can remove your working clone at any time without incurring the expense of recloning the remote, and you can also have multiple working clones that you isolate for different purposes (running different test suites, building in one and testing in another, maintaining builds of different branches, etc.).
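The two-tier layout can be sketched in a few commands. This is only an illustration under stated assumptions: the scratch directory and the ".git" suffix on the bare repository are hypothetical, and the local master is initialized empty here, whereas in the real setup it would be a clone of the Git Fusion remote.

```shell
#!/bin/sh
# Sketch of the two-tier layout. All paths are hypothetical.
set -eu

base=$(mktemp -d)                  # scratch area; substitute a real location
master="$base/ecscm-master.git"    # tier 1: bare aggregation repository
work="$base/ecscm-work"            # tier 2: ordinary working clone

# Assumption: the master is created empty for this demo. In practice it
# would be cloned (bare) from the Git Fusion remote instead.
git init --quiet --bare "$master"

# The working clone talks only to the local master, never to the remote.
git clone --quiet "$master" "$work" 2>/dev/null

echo "master: $master"
echo "work:   $work"
```

Because the working clone's only remote is the local master, it can be deleted and recreated cheaply without touching Git Fusion.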

The dual-tier layout is not strictly necessary, but often proves convenient. If you use a bare layout for the local master as I did, the cost of this convenience is quite small – after the initial push, my local master was smaller than the dump tree by a factor of 60! After several weeks of operation, the compression factor is still more than 20. Having two repositories allows me to keep the bare clone on an SSD partition, which makes the local operations particularly speedy (on the order of 1-2 seconds).

To improve the speed of the dump source-sync phase, I keep ecscm-work up to date with respect to ecscm-master, and just move the .git directory from ecscm-work to the top level of the latest ecdump output directory for each run. Once the changes are pushed to ecscm-master, the .git directory is parked back in its home under ecscm-work.
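The .git parking trick might look roughly like the following. The directory names and the sample file are hypothetical stand-ins for a real ecdump output tree, and the commit and push steps are left out to keep the focus on the move itself.

```shell
#!/bin/sh
# Sketch of borrowing the .git directory for a fresh dump tree.
# All paths and file names are hypothetical.
set -eu

base=$(mktemp -d)
work="$base/ecscm-work"
dump="$base/dump-latest"           # stand-in for the newest ecdump output

git init --quiet "$work"           # stand-in for the existing working clone
mkdir -p "$dump/projects"
echo "sample property" > "$dump/projects/example.txt"

# Move the repository metadata into the dump tree, so Git treats the
# dump output as its working directory without copying any files.
mv "$work/.git" "$dump/.git"

# Git now reports the differences between the last commit and this dump.
changes=$(git -C "$dump" status --porcelain)
echo "$changes"

# ...add, commit, and push would happen here...

# Park the metadata back in its home under the working clone.
mv "$dump/.git" "$work/.git"
```

Moving a directory within one filesystem is a rename, so this avoids copying the large dump tree into the working clone on every run.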

Designing the Perforce Client for ecscm-master

One of our first tasks is to decide where to add our new project to the Perforce repository. This is as simple as adding a new Perforce client:

Client: ecscm-master
Owner: ecdump
Description:
    SCM area for ecdump updates
Root: /tmp/ecscm-master
View:
    //depot/ecdumps/ecscm-master/... //ecscm-master/...

I have included “master” in the directory and client name to indicate that the repository will map to a Git master branch. It is not a bad idea to establish some sort of precedent for mapping Git branches to Perforce directories. Git allows any number of branches in one repository, but in Perforce, branches are handled as directories, that is, strict subtrees. Git branches can become somewhat more tangled. Fortunately, Zig Zichterman has written an article that will help you untangle the concepts.

Initializing the ecscm-master Client View

Once we have our client view, the next step is to add a top-level file in our client to force p4 to create the corresponding top-level directory in the depot. This is necessary because neither Git nor Perforce tracks empty directories.

$ p4 client -i < ecscm-master.spec
$ rm -rf /tmp/ecscm-master
$ mkdir -p /tmp/ecscm-master
$ cd /tmp/ecscm-master
$ echo "This is the SCM location for ecdumps." > README.txt
$ p4 add README.txt
$ p4 submit -d "add README file"

After doing this by hand a few times, I encapsulated the above steps in a little shell script. This allowed me to easily revert the Git Fusion OVA back to its base state, by first restoring the backup tar we created in Part 1, and then running this new shell script to recreate the client view and submit the README file.

Git Commands

To automate the processing of each new ecdump, the script needs to:

  1. Run 'git status' to see if anything has changed.
  2. Run 'git add --all' to add new changes.
  3. Run 'git commit' to submit the changes locally.
  4. Run 'git push' to push the changes to the local master.
  5. Run 'git push' in the local master to push changes to Git Fusion.

After step 5, the changes are immediately visible in the Perforce depot, and are associated with a standard Perforce changelist number.

As mentioned earlier, I wanted to avoid step 5 until all the other steps were running smoothly. Conveniently, I was able to replay the data over and over again in my local development environment until I got it right. This was accomplished by adding --push and -clean flags to the ecsync script, which wrapped the above command sequence into an operational program.
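The five steps, with the final push held behind a flag, can be sketched as a small shell function. This is not the actual ecsync script: the function name sync_dump, its argument order, the commit message, and the committer identity are all hypothetical, and it assumes the local master has a remote named origin that points at Git Fusion.

```shell
#!/bin/sh
# Hypothetical sketch of the commit-and-push sequence; not the real ecsync.
set -eu

# Usage: sync_dump WORKTREE MASTER [--push]
sync_dump() {
    tree=$1
    master=$2
    flag=${3:-}

    # Step 1: stop early if the dump is identical to the last commit.
    if [ -z "$(git -C "$tree" status --porcelain)" ]; then
        echo "no changes"
        return 0
    fi

    # Step 2: stage additions, modifications, and deletions.
    git -C "$tree" add --all

    # Step 3: commit locally (the identity shown is an assumed placeholder).
    git -C "$tree" -c user.name=ecdump -c user.email=ecdump@example.com \
        commit --quiet -m "ecdump snapshot"

    # Step 4: push the current branch to the local master.
    git -C "$tree" push --quiet origin HEAD

    # Step 5: only when asked, push from the local master to Git Fusion.
    # Assumes "origin" in the master is the Git Fusion remote.
    if [ "$flag" = "--push" ]; then
        git -C "$master" push --quiet --all origin
    fi

    echo "synced"
}
```

Calling sync_dump without --push exercises everything up to the local master, which is how the sequence can be replayed safely during development.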

Moving From Development to Production

Once my initial development work was done, the final task was to adapt the new ecsync script to the production environment. This turned out to be pretty easy – I just had to incorporate a call to ecsync from the run-ecdump framework I had already implemented.

As during development, I avoided calling the final remote push until the local pushes were all working smoothly.

During development, I used only 3 sample dumps, but I had over 75 dumps to process for the production job. As with most things, I hit a few minor snags running the data, but since I wasn’t pushing my experiments to the production Perforce server, I was able to debug these problems with zero impact on that server or its users.

After the debug cycle, I simply changed a flag in the scripts to activate the final push to the production Perforce server.

And yes, it all ran smoothly, much to my relief!


I hope that after reading these first two articles demonstrating the power of Git Fusion as an API for Perforce, you will feel empowered to implement your own project using Git Fusion. As you can see, the addition of Git as an API for Perforce adds potent new options for testing and developing source control automation solutions for the Perforce environment.