January 29, 2014

Some of the Wonders of the Sync Command


Image: Steffe via Flickr

An acquaintance of mine, a relatively new user of Perforce, said to me the other day:

I ran 'p4 sync', but I realized I wanted to do something else first, so I hit Control C and aborted it. Later, I ran 'p4 sync' again, and it picked right up where it had left off.

How did it do that?

Well, 'p4 sync' is certainly one of the two most important Perforce commands (the other one being 'p4 integrate'), and over the decades we have made many improvements and refinements to its operation.

The simple restartability of 'p4 sync', however, is one of its most basic and oldest features. Here's how it works.

For every client workspace in a Perforce installation, the Perforce server keeps track of which files that workspace has retrieved from the server. This list of files is called your "havelist", and you can see it by running the command 'p4 have'.

When the 'p4 sync' command causes the server to send a file to your machine, the server first sends the file, and then sends a message to the client requesting an acknowledgement of that file.

After the client has written that file to the filesystem on your workstation, the client sends an acknowledgement back to the server.

When the server receives that acknowledgement, the server updates your havelist, which is stored, by the way, in the 'db.have' table in your server's database. So when the server needs to know what files your client has, it just looks at your havelist.

And the sync command thus knows that it doesn't need to send you a file that you already have.
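To make the havelist idea concrete, here is a toy model in Python. It is only a sketch: the function names, the depot paths, and the dictionary standing in for 'db.have' are all made up for illustration, and the real server's data structures are certainly different.

```python
# Toy model only: the real havelist lives in the server's db.have table.
# All names and paths here are hypothetical.

def files_to_send(head_revisions, havelist):
    """Return the files whose head revision differs from what the client has."""
    return {
        path: rev
        for path, rev in head_revisions.items()
        if havelist.get(path) != rev
    }

def acknowledge(havelist, path, rev):
    """Record that the client confirmed it wrote `path` at revision `rev`."""
    havelist[path] = rev

head = {"//depot/a.c": 3, "//depot/b.c": 1, "//depot/c.c": 7}
have = {"//depot/a.c": 3, "//depot/b.c": 1}

# Only c.c is missing from the havelist, so only c.c needs to be sent.
print(files_to_send(head, have))

# Once the client acknowledges c.c, a second sync has nothing to do.
acknowledge(have, "//depot/c.c", 7)
print(files_to_send(head, have))
```

The point of the sketch is that the havelist only changes when an acknowledgement arrives, never merely because a file was sent.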

That's the simple explanation, but things are in fact far more complex than that, and most importantly I haven't mentioned the Control C part at all yet.

During the 'p4 sync' command, the server and the client are both busy: the server is sending files to the client as fast as the network connection will allow, and the client is writing those files to its filesystem as fast as the workstation will allow. And each time the client finishes one of those files, it sends an acknowledgement back to the server so that the server will know.

So data flows in both directions on the network connection: files are flowing from the server to the client, and acknowledgements are flowing from the client back to the server.

All of this data is flowing at the same time, asynchronously and overlapped. When the server sends the client a file, it doesn't stop and wait for the acknowledgement, for that would be very slow, and Perforce is not slow! The server moves immediately on to the next file; it knows it will (eventually) get the acknowledgement for that first file.

As long as the acknowledgement messages make it back to the server, the server knows that the client successfully received those files.
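The overlapped flow of files and acknowledgements can be sketched with two queues and a couple of threads. This is an illustration under simplifying assumptions, not the Perforce wire protocol: the queues stand in for the two directions of the network connection, and 'run_sync' and its helpers are hypothetical names.

```python
import queue
import threading

def run_sync(paths):
    """Send every path without waiting for acknowledgements; return the havelist."""
    files = queue.Queue()   # server -> client: files
    acks = queue.Queue()    # client -> server: acknowledgements
    havelist = []           # what the server believes the client has

    def client():
        while True:
            path = files.get()
            if path is None:        # server has nothing more to send
                acks.put(None)
                return
            # A real client would write the file to disk here.
            acks.put(path)

    def ack_reader():
        while True:
            path = acks.get()
            if path is None:        # client has acknowledged everything
                return
            havelist.append(path)

    workers = [threading.Thread(target=client),
               threading.Thread(target=ack_reader)]
    for w in workers:
        w.start()

    # The sender never waits for an acknowledgement before the next file.
    for path in paths:
        files.put(path)
    files.put(None)

    for w in workers:
        w.join()
    return havelist

print(run_sync(["A", "B", "C"]))
```

Because the sender just keeps putting files on the queue, the time spent waiting for any single acknowledgement never stalls the transfer.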

So now what about that Control C part?

Well, suppose that you are syncing files A, B, C, D, and E. And further suppose that you have already received files A and B, and you are in the middle of receiving file C when you abort the sync command by hitting Control C.

At that point, the 'p4' executable is terminated. However, the messages that it has already sent to the server are not lost! In fact, the client and server software work very hard to ensure that all your pending acknowledgements get processed. When the Control C signal is sent to the p4 client program, our signal handler ensures that we flush all our network buffers to the operating system. And the TCP/IP software in the network ensures that all data that was given to it does, in fact, get sent up to the server (although this may take some time).

The server, meanwhile, doesn't instantly abort its server-side processing when the network notifies it that the client program has been aborted. The server does stop sending new files, but it continues to read the pending acknowledgements from the network connection until every last acknowledgement has been received.

Now, there are a few situations which can disrupt this: if you use, say, 'kill -9' rather than a simple Control C to terminate your 'p4 sync' command, then the client program doesn't get to run its signal handler to flush its buffers, in which case some pending acknowledgements can be lost because the client hasn't given them to TCP/IP yet (so: please don't do that! Just use Control C instead!).

And, if you reboot your workstation right after you hit Control C, the TCP/IP network stack may not have had enough time to finish sending all those pending acknowledgements back up to the server before you rebooted your machine.

Or, if your 'p4 sync' command was aborted because the network that connects the client and server failed (perhaps you unplugged the network cable accidentally, or the router that connects your workstation to the network crashed, or some network administrator changed the firewall rules and accidentally rebooted the network firewall while your sync command was still running), then not only will you stop receiving files, but the server may not have received all your pending acknowledgements, either.

There's nothing we can do about that: if the network throws the messages away, then the server doesn't know about those acknowledgements. It's no big deal; it just means the server may re-send you a few extra files that you already have, the next time you run the sync command.

But as long as the network remains alive, the server will eventually receive all those pending acknowledgements and process them, which is why you can simply restart your 'p4 sync' command later, and know that it will pick up automatically where it left off.
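Here is the A-through-E example as a small simulation. It is a sketch, not server code: 'sync' and its 'interrupt_after' parameter are hypothetical, and a Python set stands in for the havelist.

```python
def sync(wanted, havelist, interrupt_after=None):
    """Send every wanted file not in the havelist; stop early if interrupted.

    interrupt_after is a hypothetical knob: the number of files the client
    finishes writing (and acknowledges) before the user hits Control C.
    """
    sent = []
    for path in wanted:
        if path in havelist:
            continue                  # acknowledged during an earlier sync
        if interrupt_after is not None and len(sent) == interrupt_after:
            break                     # Control C: this file is never acknowledged
        sent.append(path)
        havelist.add(path)            # the acknowledgement reaches the server
    return sent

wanted = ["A", "B", "C", "D", "E"]
have = set()

first = sync(wanted, have, interrupt_after=2)   # aborted while receiving C
second = sync(wanted, have)                     # restarted later

print(first)    # ['A', 'B']
print(second)   # ['C', 'D', 'E']
```

If an acknowledgement had been lost (say, by removing "B" from the set before the second call), the second sync would simply send B again, which is the "few extra files" case described above.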

In fact, this mechanism is so reliable that, a few years back, we added a special option so that 'p4 sync' can perform these restarts itself: if you specify the '-r' option with a retry count (it is a global 'p4' option, so it goes before the command name, as in 'p4 -r 3 sync'), then if the network should crash in the middle of your sync, 'p4' will retry the command itself, picking right up where it left off (assuming that the network connection can be re-established later). This is a very nice feature for situations in which you have an unreliable and flaky network connection between your workstation and the Perforce server.
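The reason automatic retries are safe is that each attempt starts from whatever the havelist already records. The following sketch shows that idea with a plain retry loop; it is not how the 'p4' client is implemented, and 'sync_with_retries', 'make_flaky_sync', and the simulated 'ConnectionError' are all assumptions made for the example.

```python
def sync_with_retries(attempt_sync, retries):
    """Call attempt_sync until it succeeds or the retries are used up."""
    for attempt in range(retries + 1):
        try:
            return attempt_sync()
        except ConnectionError:
            if attempt == retries:
                raise                 # out of retries: report the failure

def make_flaky_sync(wanted, havelist, fail_after):
    """Hypothetical sync whose connection drops once, after fail_after files."""
    state = {"failed": False}

    def attempt():
        delivered = 0
        for path in wanted:
            if path in havelist:
                continue              # acknowledged in an earlier attempt
            if not state["failed"] and delivered == fail_after:
                state["failed"] = True
                raise ConnectionError("network dropped")
            havelist.append(path)     # file written and acknowledged
            delivered += 1
        return list(havelist)

    return attempt

have = []
attempt = make_flaky_sync(["A", "B", "C", "D"], have, fail_after=2)
print(sync_with_retries(attempt, retries=1))   # ['A', 'B', 'C', 'D']
```

The second attempt skips A and B because they are already in the havelist, so the retry costs nothing beyond the files that were still outstanding.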

Of course, this is only one of the many things that the sync command does. Although its primary job is to send you new files, it does many other things as well:

  • If you sync to a new revision of a file which you have opened for edit, the sync command of course doesn't overwrite your changes and lose your hard work! Instead, it realizes that you have to merge your changes with the newly-submitted changes in the repository, and so it schedules a resolve to allow you to carefully merge that work together.
  • Sometimes, the sync command must not only send you new file content, it must delete files from your workstation (because a submitted changelist has deleted them from the repository). These deletions, of course, also use acknowledgements, so if you restart such a sync, the server won't tell you to re-delete the files that have already been deleted from your workstation.
  • Often, particularly if you are using the Perforce streams feature, you may be switching from one stream to another, but that doesn't mean that all the files in the new stream have to be sent to your workstation. Using cryptographic digests, the server knows which files in the new stream have the same content as the files in the old stream, and so it doesn't resend those files, only the files that are actually different.
  • The server doesn't write each acknowledgement to the database the instant it is received, because that would be slow and expensive, and the server is never slow and expensive! Instead, the server accumulates the acknowledgements into batches and writes them to the database when it has enough of them to justify the database cost.
  • If you have made some local changes to your workstation which mean that the server's knowledge of what files you have is no longer correct, that's no problem: you can just perform a "force" sync and the server will gladly re-send you all the files for your workspace (and will fix up its havelist in the process).
  • You might be issuing the sync command via a proxy or replica, in which case the sync command knows that it doesn't need to send you the files all the way from the master server; it can deliver those files to your workstation from that proxy or replica, which means the files have a much shorter distance to travel over the network.
  • In fact, if you're using an Edge Server, the master server doesn't have to get involved at all! The Edge Server does all the work to handle your sync command.
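The digest comparison mentioned in the list above (for switching streams) can be illustrated in a few lines. This is a sketch under stated assumptions: SHA-256 is used only because it is handy in Python's standard library, the function names and file contents are invented, and a real server would compare digests it has already stored rather than hashing file content on the fly.

```python
import hashlib

def digest(content: bytes) -> str:
    """Hex digest of a file's content (hash choice is an assumption)."""
    return hashlib.sha256(content).hexdigest()

def files_needing_transfer(old_stream, new_stream):
    """Return paths in new_stream whose content differs from the old stream's copy."""
    old_digests = {path: digest(data) for path, data in old_stream.items()}
    return sorted(
        path
        for path, data in new_stream.items()
        if old_digests.get(path) != digest(data)
    )

old = {"src/main.c": b"int main(){}", "README": b"v1"}
new = {"src/main.c": b"int main(){}", "README": b"v2", "NEWS": b"hi"}

# main.c is identical in both streams, so it is not sent again.
print(files_needing_transfer(old, new))   # ['NEWS', 'README']
```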

Over the nearly two decades that the sync command has existed, it has been enhanced many times (in fact, way back then, it was actually known as the 'p4 get' command -- did you know that?). But we're a long way from being done, and we're constantly putting new improvements into the sync command.

In the 2013.3 release, many enhancements were made to p4 sync. On a properly-configured server, the sync command can now perform many of its database queries in "lockless" fashion, which means sync commands on a heavily-loaded server are much less disruptive to other concurrently-running commands (such as populate or submit). And the most recent version of the sync command takes much less memory, so installations which have very large codebases (workspaces with tens of millions of files synced to them) can now support much higher loads.

But we won't stop there! In the 2014.1 release of the server, which will be released next month, you'll be able to use a new feature of the sync command to specify that multiple files should be delivered in parallel. "Parallel sync," as we call it, is not appropriate for all situations, but in certain situations it can substantially reduce the elapsed time of a large sync operation.

And, did I mention: executing your commands as fast as possible is what Perforce is all about. We've been doing that for the past two decades, and you can count on us doing that for decades to come.

So sync, and enjoy.