March 30, 2012

Massive Automation: Agile, Continuous Integration, and Perforce


Continuous integration and Agile development are two distinct but related trends in software development. Continuous integration (CI) is really about increasing transparency, but it usually manifests as automated build and test routines. Put briefly, building and testing early and often is a great way to make sure that you’re actually building what you intended, and to expose potential problems early on.

Agile development has several foundational guidelines, and increased transparency is widely seen as one of its key benefits. In practice, Agile teams like to test their work immediately after (or in some cases before or during) development, to make sure that the product actually delivers value to the customer. That, in turn, has increased the need for rapid automated build and test cycles.

The natural culmination of these trends is often described with words like massive and extreme. One of our customers presented a talk at the last user conference on successfully scaling Agile, and their Agile processes involve tens of thousands of automated tests and over 100 scrum teams.

At Perforce, we’ve been watching this trend very closely. Automation, particularly at this scale, puts a tremendous load on the SCM server. Build automation can account for well over 50% of the load on a Perforce server, and that can start to affect the end-user experience. Perforce proxy servers certainly help, as they can easily offload the file distribution work from the main server. Replicated servers are another part of the solution, as they can completely handle the load from a purely read-only build process.

Figure: Perforce federated architecture

Improved Replication Support for Automation

As I mentioned in an earlier article, the 2012.1 release includes some very useful enhancements to replicated servers. In the context of handling automation, two new replication behaviors will support even more types of build and test processes.

  • In the build server mode, the replica supports bound workspaces. A bound workspace is a Perforce workspace that is purely local to the replica. In other words, all of the workspace metadata is kept on the replica and not passed on to the central server. Many build processes do need to maintain workspace state information to facilitate incremental builds, and now they can use a build server replica to run an otherwise read-only build without putting any load on the main server.
  • In the smart proxy (command-forwarding) mode, the replica can be used for all Perforce commands. Read-only commands are serviced by the replica, and write commands are proxied through to the main server. If your build process needs to submit build artifacts or make labels, this replica mode will let you transparently take advantage of the replica for read-only activity, which is normally the bulk of an automated process.
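To make the command-forwarding idea concrete, here is a toy sketch in Python. It is not how the Perforce server is implemented; the command set is just a small sample of reporting commands assumed to be read-only, and the function names are hypothetical. The point is only to show why a build script that is mostly reporting ends up mostly on the replica.

```python
# Illustrative only: a toy model of command forwarding,
# not Perforce's actual implementation.

# Assumption: a small sample of reporting commands that only read metadata.
READ_ONLY_COMMANDS = {"changes", "describe", "filelog", "files", "fstat"}


def route(command: str) -> str:
    """Return which server would service a command in this toy model."""
    # Anything not known to be read-only is conservatively sent to the main server.
    return "replica" if command in READ_ONLY_COMMANDS else "main"


def summarize(commands: list[str]) -> dict[str, int]:
    """Count how many commands in a build script land on each server."""
    counts = {"replica": 0, "main": 0}
    for command in commands:
        counts[route(command)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical build script: mostly reporting, with a submit and a label at the end.
    build_script = ["files", "filelog", "changes", "describe", "submit", "label"]
    print(summarize(build_script))
```

In this model only the final submit and label reach the main server, which matches the description above: the bulk of an automated process is read-only.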

Measuring the Impact

I did a simple test recently to see just how much of an impact the new replication technology would have.  In my setup, I used a Perforce server located at our home base in Alameda.  Just to establish a baseline, I connected from a workstation on the same LAN and ran a few activities that would be part of a typical automated build process:

  • Fully populating the workspace for a clean build.  This operation requires file content transfer to the workspace and metadata updates on the server.
  • Running an incremental workspace update for an incremental build.  This operation requires metadata checks on the server and possibly some file transfer activity.
  • Cleaning the workspace (removing all files).  This operation requires file activity in the workspace and metadata updates on the server.
  • Running a single reporting command on the entire workspace, as might be done for release notes.  This requires metadata access on the server.
  • Running a reporting command on each file in the workspace, as might be done for a more complex set of release notes.  This requires metadata access on the server.
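If you want to try a similar comparison yourself, a small timing harness along these lines may be enough. This is a sketch under assumptions: the post does not say which commands were actually run, so the p4 invocations below are only plausible stand-ins for the activities listed above, and the function names are hypothetical.

```python
import subprocess
import time

# Assumption: these p4 invocations stand in for the listed activities;
# the article does not say which commands were actually used.
ACTIVITIES = {
    "populate": ["p4", "sync", "-f", "//..."],   # full (forced) workspace population
    "incremental": ["p4", "sync"],               # incremental update
    "clean": ["p4", "sync", "//...#none"],       # remove all files from the workspace
    "report": ["p4", "files", "//..."],          # one reporting command over everything
}


def time_activity(argv, run=subprocess.run):
    """Run one command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    # check=True raises if the command fails, so a failure is not mistaken for a fast run.
    run(argv, check=True, capture_output=True)
    return time.perf_counter() - start


def time_all(activities, run=subprocess.run):
    """Time every activity in order and return a name -> seconds mapping."""
    return {name: time_activity(argv, run) for name, argv in activities.items()}
```

Calling `time_all(ACTIVITIES)` from inside a configured workspace, once per connection setup (LAN, WAN, proxy, replica), gives numbers you can compare side by side. The `run` parameter exists so the harness can be exercised without a Perforce server.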

Next I set up a workspace on a server in a remote office which is connected over a WAN.  I ran the same activities while connected directly to the server in Alameda.  As might be expected, there was a penalty for the remote file transfer activity and metadata access.

Then I set up a proxy server in the remote office, and repeated the activities.  That significantly reduced the penalty for the file transfer activity, as the file content was cached locally.  However, the penalty for network activity for metadata was still present.  That's about what I'd expect: a proxy makes file content available locally, but still has to talk to the main server for any metadata activity.

Finally, I set up a build server replica in the remote office.  In this test, there was very little performance penalty, and the activities completed about as quickly as when I was using a workspace on the same LAN as the main server.  This last test shows the benefit of using a build server replica: file content, all metadata used for reporting, and workspace metadata are all available locally, with no communication with the main server required.  That eliminates some of the impact of network latency and, again, moves the work off the main server.

As with all performance testing, your mileage may vary.  Automated build performance will depend greatly on your hardware, network, and other factors.  However, generally speaking, a proxy server is a good solution if most of the cost of your build process is incurred during file transfer activity.  For the most demanding environments, a build server replica will give you the largest performance improvement.
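One rough way to reason about which option fits is a back-of-the-envelope model: build time is roughly file transfer time plus the number of metadata round trips multiplied by network latency. The sketch below is a deliberate simplification, and every number in it is hypothetical rather than a measurement from the tests described here.

```python
def estimated_seconds(file_bytes, round_trips, bandwidth_bytes_per_s, latency_s):
    """Very rough cost model: transfer time plus per-request latency."""
    return file_bytes / bandwidth_bytes_per_s + round_trips * latency_s


# Hypothetical workload: 1 GB of file content and 10,000 metadata requests.
FILE_BYTES = 1_000_000_000
ROUND_TRIPS = 10_000

# Hypothetical networks: a fast local link and a slower, high-latency WAN link.
LAN = {"bandwidth_bytes_per_s": 100_000_000, "latency_s": 0.0005}
WAN = {"bandwidth_bytes_per_s": 5_000_000, "latency_s": 0.08}


def direct_over_wan():
    # Both file content and metadata cross the WAN.
    return estimated_seconds(FILE_BYTES, ROUND_TRIPS, **WAN)


def with_proxy():
    # File content comes from the local cache; metadata still crosses the WAN.
    return estimated_seconds(FILE_BYTES, 0, **LAN) + estimated_seconds(0, ROUND_TRIPS, **WAN)


def with_build_replica():
    # File content and metadata are both served locally.
    return estimated_seconds(FILE_BYTES, ROUND_TRIPS, **LAN)


if __name__ == "__main__":
    for name, fn in [("direct", direct_over_wan), ("proxy", with_proxy), ("replica", with_build_replica)]:
        print(f"{name}: {fn():.0f} s")
```

With these made-up inputs the proxy removes the transfer term but leaves the latency term, which is the pattern described above. A workload dominated by file transfer would look quite different, so plug in your own figures before drawing conclusions.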

Nuts and Bolts

How do you configure a build server replica’s behavior? In a nutshell:

  • Follow the normal replica configuration to get started.
  • Use the new p4 server command to name the replica and describe its behavior.
  • Use the p4 serverid command to specify the server ID of the replica server.
  • Create build workspaces that are bound to the replica by entering the replica's server ID in the ServerID field in the workspace form.
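For the last step, build systems often generate workspace forms rather than editing them by hand. Here is a minimal sketch that renders such a form with the ServerID field filled in. The client name, owner, root, depot path, and server ID used in the example are all hypothetical, and the form is reduced to a handful of fields; check the System Administrator's Guide for the authoritative details.

```python
def bound_client_spec(client, owner, root, server_id, depot_path="//depot/..."):
    """Render a minimal workspace form bound to a replica via its ServerID field."""
    fields = [
        f"Client:\t{client}",
        f"Owner:\t{owner}",
        f"Root:\t{root}",
        # Binding the workspace: the replica's server ID goes in the ServerID field.
        f"ServerID:\t{server_id}",
        # Assumption: a single-line view mapping one depot path into the workspace.
        f"View:\n\t{depot_path} //{client}/...",
    ]
    return "\n\n".join(fields) + "\n"


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(bound_client_spec("build-ws", "builder", "/tmp/build-ws", "build-replica-1"))
```

The resulting text can be piped to `p4 client -i`, which reads a workspace form from standard input.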

Full details are in the release notes and System Administrator's Guide.

Where to Go From Here

As that user conference presentation shows, using proxies, replicas, and brokers gives you a tremendous amount of flexibility to support your development practices – even if you need to use words like massive and extreme to describe them.

Not sure how to use all of these new tools to support your development and automation? Don’t panic: running a cutting-edge development shop is hard, and we’re here to help. Perforce Support and Consulting can offer great advice on everything from Perforce configuration to best practices. Let us know if you need help!