May 16, 2013

Perforce Replication Lag & The Facts of Life


Recently, I happened across an interesting article on the MongoLab web site: Replication Lag & The Facts of Life.

MongoDB is a very popular modern "NoSQL" database which is extremely powerful, but it can be complicated to configure and to monitor.

Now, it so happens that I've spent the last several years of my life thinking quite hard about replication lag, so I was immediately attracted to this article.

And as I read the article, I realized that, although the MongoLab team are talking quite specifically about MongoDB, many of the issues they describe, and many of the strategies they recommend, apply to any asynchronous replication product.

And, in particular, these issues are of concern to Perforce users.

So let's take a more detailed look at Perforce Replication Lag, following the same overall structure as the MongoLab team do in their article.

Although the details vary (because Perforce servers are different from MongoDB servers), the overall principles are, not surprisingly, very similar.

  1. What is replication lag?

    Perforce replica servers replicate information from the master server asynchronously. This means that when you submit a changelist to Perforce, the submission completes immediately on the Perforce central server, with no waiting for the replica servers.

    Subsequently, each replica server retrieves information about the new changelist from the central server, asynchronously and in the background, and copies that information to the replica, at which point the users of the replica server see that changelist in the results of their commands.

    (This approach is also sometimes called "log shipping"; if you're searching the web to learn more about the fundamentals of this technology, try searching for "log shipping" as well as "asynchronous replication".)

    Perforce replica servers typically replicate the file contents as well as the changelist metadata. This file content replication also occurs asynchronously, as background threads in the replica transfer the content of the newly-submitted files from the master.

    Asynchronous replication is an extremely high performance implementation (as you would expect from Perforce; performance matters very much to us), but it does mean that the replication of metadata and archive contents is not immediate.

    This means that, as a practical matter, the replica servers in your Perforce installation may be slightly out of date as compared to your master server.

    We call this "replication lag".

  2. Why is lag problematic?

    Often, replication lag causes no problems at all, but it can introduce complications into your use of Perforce.

    A command which runs in a replica server returns results based on the replica's database; these results might be slightly older than the results that would be returned by the master server.

    For example, commands such as p4 counter change or p4 changes, which are commonly used by automated build tools to query for new submissions to the server, will not notice a new changelist when run against a replica until that changelist has been replicated to it.

    So if your build tool submits a changelist to the central server, and then expects that changelist to be instantly visible in the output of p4 changes on the replica server, it will not get the desired behavior.
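
    One way a build tool can cope is to poll the replica until the changelist arrives. The following is a minimal sketch, not a definitive implementation: the function names, the timeout values, and the idea of passing the replica's address as a P4PORT string are all assumptions made for illustration. It also assumes that the change counter is an adequate readiness signal for your workflow; a stricter check could look for the changelist in the output of p4 changes instead.

```python
import subprocess
import time


def replica_change_counter(port):
    """Return the 'change' counter as reported by the server at `port`."""
    out = subprocess.run(
        ["p4", "-p", port, "counter", "change"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def wait_for_change(change, get_counter, timeout=300.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until get_counter() >= change.

    Returns True once the counter has caught up, or False if `timeout`
    seconds pass first. `get_counter` is any zero-argument callable, so
    the polling logic can be exercised without a live server.
    """
    deadline = clock() + timeout
    while True:
        if get_counter() >= change:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

    A build script might call wait_for_change(12345, lambda: replica_change_counter("replica.example.com:1666")) immediately after submitting changelist 12345 to the central server; the changelist number and host name here are placeholders.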

    Another issue is that if you are using Perforce replica servers to provide an extra level of disaster recovery for your site, you need to be aware of replication lag, since it affects the exposure of your organization to loss of data during a true disaster.

    For example, if a tornado should strike your primary data center, taking out your central Perforce server, and your operations team decide that you need to fail over to your remote disaster recovery replica, determining the replication lag is necessary in order to inform your users about how much work they may have lost.

  3. What causes a replica to fall behind?

    Replica lag can occur due to resource issues: the central server may be a more powerful machine than the replica, and hence it can take the replica extra time to replicate work.

    Network bottlenecks can also arise between the central server and the replica, causing delays in transfer of data to the replica.

    Certain replica configuration settings can also affect replica lag: if the metadata pull thread timer is set quite high (for example, pull -i 3600), the replica polls for new metadata only once an hour, so its metadata can be up to an hour out of date; similar issues can arise with archive file contents if the replica is configured with too few pull -u threads.

  4. How do I measure lag?

    The primary tools for measuring lag on a Perforce replica server are the p4 pull -l -j and p4 pull -l -s commands.

    The p4 pull -l -j command tells you the current situation regarding replication lag for your replica. The output will look something like:

        Current replica journal state is:	Journal 1136,	Sequence 53080551.
        Current master journal state is:	Journal 1136,	Sequence 53080551.
        The statefile was last modified at:    2013/03/26 15:54:02.
        The replica server time is currently:  2013/03/26 15:54:02 -0700 PDT

    In this case, there is no lag at all; your replica is completely up to date with your central server.

    But if your replica's journal position is less than the master's journal position, there is work in the master's journal which has not yet been replicated to this replica.

    Journal positions are reported as Big Ugly Numbers (e.g., 53080551); in fact, these numbers are byte offsets within the server's journal file. So if the difference between the replica journal state and the master journal state is, say, 150234169, then the replica has 150,234,169 bytes of journal data which it has not yet replicated.
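
    The subtraction described above is easy to automate. Here is a small sketch that parses the two "journal state" lines and returns the difference. The function name is made up for this example, and the regular expression is based only on the sample output shown above, so treat the exact output format as an assumption. The sketch also assumes that sequence numbers are only comparable when both servers report the same journal number, since each number is an offset within one journal file.

```python
import re

_STATE = re.compile(
    r"Current (replica|master) journal state is:\s*"
    r"Journal (\d+),\s*Sequence (\d+)"
)


def journal_lag(pull_lj_output):
    """Return (journals_behind, bytes_behind) from `p4 pull -l -j` output.

    bytes_behind is None when the replica and master report different
    journal numbers, because the two offsets then refer to different files.
    """
    states = {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in _STATE.finditer(pull_lj_output)
    }
    if "replica" not in states or "master" not in states:
        raise ValueError("unrecognized p4 pull -l -j output")
    r_journal, r_seq = states["replica"]
    m_journal, m_seq = states["master"]
    journals_behind = m_journal - r_journal
    bytes_behind = m_seq - r_seq if journals_behind == 0 else None
    return journals_behind, bytes_behind
```

    For the fully-caught-up sample output above, this returns (0, 0).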

    Similarly, the p4 pull -l -s command tells you the current situation regarding file content lag for your replica. The output will look something like:

        File transfers: 2 active/10 total, bytes: 60400 active/135000 total.

    or, if you happen to run this on a fully-up-to-date replica, it would read:

        File transfers: 0 active/0 total, bytes: 0 active/0 total.

    Perforce replicas report their lag in units of bytes because the most common reason for replication lag is a network constraint, and it is useful to know the number of bytes that remain to be transferred.

    Since file content can be quite sizable, a Perforce replica may routinely be several tens of gigabytes behind the central server, but given the normal cycles and patterns of usage, the replica may easily catch up on this work during off hours when the network is less contended.
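
    To turn a byte count into a rough expectation of when the replica will catch up, you can divide the backlog by the bandwidth you expect to have available. The sketch below parses the "File transfers" line and does that division. The function names are invented for this example, the pattern is based only on the sample output shown above, and the bandwidth figure is something you would have to supply from your own measurements.

```python
import re

_TRANSFERS = re.compile(
    r"File transfers: (\d+) active/(\d+) total, "
    r"bytes: (\d+) active/(\d+) total"
)


def file_backlog(pull_ls_output):
    """Parse `p4 pull -l -s` output into a dict of integer counts."""
    match = _TRANSFERS.search(pull_ls_output)
    if match is None:
        raise ValueError("unrecognized p4 pull -l -s output")
    names = ("active_files", "total_files", "active_bytes", "total_bytes")
    return dict(zip(names, map(int, match.groups())))


def catch_up_seconds(total_bytes, bytes_per_second):
    """Estimate transfer time, assuming a steady rate and no new submits."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return total_bytes / bytes_per_second
```

    As a worked example with made-up numbers: a backlog of 40 GB (40 * 10**9 bytes) over a link that sustains 50 MB per second (50 * 10**6 bytes per second) gives catch_up_seconds a result of 800 seconds, or a little over 13 minutes. Real transfers will take longer if users keep submitting while the replica is catching up.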

  5. How do I monitor for lag?

    As the MongoLab team observe:

    It is critical that the replication lag of your replica set(s) be monitored continuously. Since you have to sleep occasionally, this is a job best done by robots. It is essential that these robots be reliable, and that they notify you promptly whenever a replica set is lagging too far behind.

    If you are operating one or more Perforce replicas at your site, you should be actively monitoring the lag of your replicas.

    A simple way to do this is to write a small script to issue the two commands mentioned above: p4 pull -l -j and p4 pull -l -s, and to append the output of these commands to a long-lived log file.

    Then, using a facility like the Unix cron scheduler or the Windows Task Scheduler, arrange to run your script regularly, around the clock. For example, you might arrange to run the script every 15 minutes.

    Over a period of weeks or months, you can then review the information in these log files, to get a feeling for how replication lag behaves on your replica.
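
    A monitoring script along these lines might look like the following sketch. It is only one possible shape: the function names and the timestamped header format are choices made for this example, and it assumes that the p4 client is on the PATH and already configured (P4PORT, P4USER, and a valid ticket) to talk to the replica.

```python
import subprocess
from datetime import datetime, timezone

# The two lag-reporting commands discussed above.
COMMANDS = (
    ["p4", "pull", "-l", "-j"],
    ["p4", "pull", "-l", "-s"],
)


def run_p4(argv):
    """Run one p4 command and return stdout and stderr together as text."""
    result = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.stdout


def append_lag_snapshot(log, run=run_p4, now=None):
    """Write a timestamped snapshot of both commands to the open file `log`."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    for argv in COMMANDS:
        log.write(f"=== {stamp} {' '.join(argv)}\n")
        output = run(argv)
        # Keep every record newline-terminated so the log stays line-oriented.
        log.write(output if output.endswith("\n") else output + "\n")
```

    The scheduled job would then open the long-lived log file in append mode and call append_lag_snapshot on it, for example: with open("replica-lag.log", "a") as log: append_lag_snapshot(log). The log file name is a placeholder. Because errors from p4 are captured along with normal output, a failed command shows up in the log rather than vanishing silently.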

  6. What can I do to minimize lag?

    The specific suggestions that the MongoLab team provide are not directly useful for a Perforce replica, since Perforce replicas are not the same as MongoDB servers. However, there are some basic tips that you can use to keep a lid on replication lag and prevent it from being a problem.

    Feed Your Replica.

    Replication requires CPU, memory, disk space, and network resources; a replica which is starved of any one of these resources will experience increased delay. Precisely sizing a replica is not easy, because different replicas are deployed for different purposes.

    Some replicas will need substantially fewer resources than your central server, while other replicas may actually need more resources. For example, those replicas which are deployed to offload build automation tasks from your central server may find that they are experiencing a dramatically higher load than the main server, since build automation can deliver a very high workload.

    Watch Out for Logout.

    If your monitoring tools tell you that a stable replica server has unexpectedly stopped replicating (that is, the lag is growing and the replica's journal position is not changing), the most common cause is that the service user which is used to authenticate the replica server to the master server has become logged out. Simply log the service user back into the master and replication will resume.

    This is a good reason to include a p4 login -s command for the service user in your automated monitoring job, so you can keep an eye on service user ticket expiration.
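
    A check of this kind could be sketched as follows. This is an illustration under several assumptions: the function name is invented, the service user name and master address are whatever your site uses, and the sketch assumes that p4 login -s exits with a non-zero status when the user has no valid ticket. It should be run under the same operating system account and ticket file that the replica server itself uses, so that it inspects the same ticket.

```python
import subprocess


def service_user_logged_in(user, port, run=subprocess.run):
    """Return True if `p4 login -s` succeeds for `user` against `port`.

    `run` defaults to subprocess.run and can be swapped for a stand-in
    when a real server is not available.
    """
    result = run(
        ["p4", "-p", port, "-u", user, "login", "-s"],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    # Assumption: a zero exit status means a valid ticket exists.
    return result.returncode == 0
```

    Your monitoring job can call this alongside the lag commands and raise an alert as soon as it returns False, rather than waiting for the lag numbers to reveal that replication has stalled.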

    Configure Appropriately.

    Replica servers have a variety of configuration settings that can be used to tune your replica, and several of these settings can have a big impact on replication lag.

    Ensure that the interval timer for your metadata pull thread is as small as possible; for many sites, pull -i 1 is quite appropriate, as the metadata pull thread's polling loop is quite efficient.

    Keep an eye on the file content backlog with pull -l -s, and if it is getting large, you should consider adding more pull -u threads to keep up with the workload.

    Consider enabling network compression if network bandwidth appears to be your bottleneck, and if you have adequate CPU resources.

    Be Aware of the Chain Gang.

    If you deploy Perforce replica servers in a chained topology, the management of replication lag becomes more complex, because each replica in the chain introduces an additional lag to its target.

    Avoid deploying long replica chains; in all the cases that I've seen so far, a replica chain has never needed to get longer than three replicas.
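
    To see how the lag adds up along a chain, consider only the delay introduced by polling. In the simplified model sketched below, each replica polls its upstream server at its own pull -i interval, and a change can wait up to one full interval at every hop. The function name is invented for this example, and the model deliberately ignores transfer time and server load, so it gives a lower bound on the worst case rather than a prediction.

```python
from itertools import accumulate


def cumulative_polling_delay(pull_intervals):
    """Worst-case polling delay, in seconds, at each hop of a replica chain.

    `pull_intervals` lists the pull -i interval of each replica, starting
    with the replica closest to the master.
    """
    if any(interval < 0 for interval in pull_intervals):
        raise ValueError("pull intervals must be non-negative")
    return list(accumulate(pull_intervals))
```

    For a chain of three replicas polling every 1, 1, and 60 seconds, this returns [1, 2, 62]: the last replica in the chain can be up to 62 seconds behind the master from polling alone.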

    Consider Selective Replication.

    If, despite all your best efforts, your replica server simply can't keep up with the master server's workload, it may be necessary to filter out some of the data.

    Although filtering is an extremely powerful technique to address replication lag and resource consumption issues, it has some subtle consequences, so please discuss the particulars of your situation with the Perforce Technical Support team before enabling replica filtering, so they can help you design a solution that works best for your situation.

  7. Don't let replication lag take you by surprise.

    As the MongoLab team note, managing replication lag involves compromises:

    the "right" balance will be different in different situations.

    Your first step toward managing a problem, of course, is to become aware of it, learn more about it, and gather information about the extent of the issue and how it is affecting your production work.

Hopefully this article has given you some ideas that you can apply in your own situation, and you can pleasantly surprise your user community by ensuring that replication lag is never an issue for them!