October 8, 2013

You're Probably Not Running Enough Archive Pull Threads

Perforce replica servers can be configured in a variety of ways. To learn more about replica servers, run 'p4 help replication' and look through the documentation for replica servers, or consult the System Administrator's Guide.

One of the important decisions that you must make when configuring a replica server involves how it will handle its archive of versioned file contents.

For some replica servers, no file contents are necessary; the only thing that the replica server needs is the database metadata. For example, you might choose this option if you were building a replica server to aid with offline checkpointing, so that you can regularly checkpoint your server database without impacting production use on your main server.

A replica server may also maintain a partial copy of the archive files, holding only those archives which are regularly referenced by the users of that replica. In this configuration, the replica operates similarly to a Perforce proxy, caching those archives which are in use and not holding unused archives.

Or, a replica server may maintain a complete copy of the archive files. This is critical for a replica which is used for disaster recovery, or as a standby spare machine, and is also useful for a replica which is being used as the basis for periodic backups of your archive file data, again without impacting production use on your main server.

For those replica servers which actively maintain copies of archive files, you should specify one or more 'pull -u' threads to retrieve new file content from the master server and store it on the replica.
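As a rough sketch of what that configuration might look like, the small Python helper below prints the 'p4 configure set' commands for one metadata pull thread plus a chosen number of 'pull -u' threads, using the startup.N configurables described in the replication documentation. The server name Replica1, the one-second polling interval, and the helper name itself are assumptions made for illustration; check the generated commands against the documentation for your server version before running any of them.

```python
def pull_thread_config(server_name, archive_threads, interval_secs=1):
    """Build 'p4 configure set' commands for a replica's pull threads.

    startup.1 is the metadata pull thread; startup.2 onward are the
    archive ('pull -u') threads. The server name and interval are
    placeholders, not values taken from any real installation.
    """
    if archive_threads < 1:
        raise ValueError("need at least one 'pull -u' thread")
    commands = [
        f'p4 configure set {server_name}#startup.1="pull -i {interval_secs}"'
    ]
    for n in range(2, archive_threads + 2):
        commands.append(
            f'p4 configure set {server_name}#startup.{n}="pull -u -i {interval_secs}"'
        )
    return commands


# Print the commands for a hypothetical replica with three archive threads.
for line in pull_thread_config("Replica1", 3):
    print(line)
```

The helper only prints commands; it never talks to a server, so you can review the output before applying anything.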

Here, you have a choice: how many 'pull -u' threads should you run?

Years ago, when I was setting up the first internal replica servers and writing the first draft of the replica setup documentation, I gave my example replica server 2 'pull -u' threads. That example has been reproduced in many places, and it is now common to see the suggestion that your replica server should have 2 'pull -u' threads.

But there's nothing that restricts you to having 2 'pull -u' threads, and in fact there are a number of reasons why you would be better off with a different configuration.

New files tend to be added to a Perforce server in batches. It's relatively rare to see a changelist with just a single file in it; changelists often have dozens or hundreds of files, and changelists with thousands of new files are not uncommon.

So new file arrival in your master server is probably "bursty", and if your replica learns about a "burst" of new files, and has only 1 or 2 'pull -u' threads available to retrieve those new files, it's going to take the replica a while to transfer those files.
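To get a feel for the numbers, here is a deliberately simplified back-of-the-envelope model in Python. It assumes every file takes the same amount of time to transfer and that per-file latency, not network bandwidth, is the limiting factor, so the threads split the burst evenly. The burst size and the per-file time are made-up illustrative values, not measurements from any real server.

```python
import math


def burst_drain_seconds(new_files, pull_threads, seconds_per_file):
    """Estimate how long a burst takes to drain under the idealized model.

    Each thread transfers one file at a time, so the burst needs
    ceil(new_files / pull_threads) rounds of transfers.
    """
    if pull_threads < 1:
        raise ValueError("need at least one pull thread")
    return math.ceil(new_files / pull_threads) * seconds_per_file


# Hypothetical burst: 5000 new files at an assumed 0.05 seconds per file.
for threads in (2, 5, 12):
    estimate = burst_drain_seconds(5000, threads, 0.05)
    print(f"{threads:2d} threads: roughly {estimate:.1f} seconds")
```

Real transfers will not scale this neatly, since the threads share the same network link and disks, but the model shows why a burst that overwhelms 2 threads can look quite different with a few more.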

Moreover, many replica servers are used for build automation, and it's common for automated builds to reference the very latest file contents immediately after they have been submitted. So, a common pattern is for your users to submit new files to the master server, then almost instantly the build automation software attempts to access those files on your replica.

Replica servers are built to handle the situation in which a user references a file that is scheduled to be transferred from the master server but has not yet arrived: the file is retrieved from the master server "on demand," synchronously, by the command which referenced that file.

But this synchronous on-demand file transfer is not very efficient. It is fine for handling the occasional file reference, and the occasional situation in which a replica command needs access to a file which the replica doesn't have in its cache. But if you try to 'p4 sync' many thousands of new files from a replica, and none of those files are in the replica's cache, performance can suffer greatly.

Happily, you can avoid this performance problem, or at least make it much rarer, simply by running more 'pull -u' threads. Ideally, you'd like all file transfers from the master to the replica to be handled by the 'pull -u' threads, and no file transfers to require "on-demand" retrieval. When the 'pull -u' threads retrieve files and transfer them to the replica, all of the work happens in the background, asynchronously, and user access to your replica server is as efficient as possible.

Having extra 'pull -u' threads costs you very little: each archive pull thread uses only a tiny bit of memory, and the archive pull threads only wake up and consume resources when there are pending file transfer requests in the replica work queue; the rest of the time they are just sleeping in the background.

So, if you're running a production replica server, and you have a fairly active site, and you're only using 2 archive pull threads, you're probably not running enough archive pull threads.

Try increasing the number of archive pull threads. Perhaps try defining 5 archive pull threads, and see how your replica behaves. Keep an eye on your archive transfer queue by running 'p4 pull -l -s' regularly, and monitor the queue size. If you see that the queue is still building up, you may need even more pull threads. I've heard of a few sites running more than a dozen pull threads, although since archive pull threads are very industrious workers, you really need an unusual situation to have that many.
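If you would rather script that monitoring than eyeball it, something along these lines could be a starting point. This sketch assumes the summary from 'p4 pull -l -s' is a single line of the form "File transfers: N active/N total, bytes: N active/N total."; that format is my assumption here, so compare it with what your server actually prints. The function names are hypothetical, and the sample line is invented rather than captured from a real replica.

```python
import re

# Assumed shape of the 'p4 pull -l -s' summary line; verify on your server.
SUMMARY_RE = re.compile(
    r"File transfers:\s*(\d+)\s*active/(\d+)\s*total,\s*"
    r"bytes:\s*(\d+)\s*active/(\d+)\s*total"
)


def parse_pull_summary(text):
    """Extract the file and byte counts from one summary line."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("unrecognized pull summary: %r" % text)
    files_active, files_total, bytes_active, bytes_total = map(int, match.groups())
    return {
        "files_active": files_active,
        "files_total": files_total,
        "bytes_active": bytes_active,
        "bytes_total": bytes_total,
    }


def queue_is_growing(totals, min_samples=3):
    """True if the queue total rose between every pair of consecutive samples."""
    if len(totals) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(totals, totals[1:]))


# Invented sample line, standing in for captured command output.
sample = "File transfers: 3 active/120 total, bytes: 4096 active/1048576 total."
print(parse_pull_summary(sample)["files_total"])
```

In practice you would capture the command's output on a schedule, collect the files_total values, and treat a steadily growing series as a hint that the replica could use more archive pull threads.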

How many archive pull threads are you running, and what number have you found works best for you? Drop me a line, and let me know!