Networked Software Development

SCM over the Internet and Intranets

Introduction

Many commercial software configuration management (SCM) systems support the corporate LAN (local area network); any user who is connected via a LAN can usually transparently access SCM repositories as if they were on the user's desktop workstation. However, due to their reliance on the high speed and low latency of LANs, not all commercial SCM systems are able to extend that transparent support to a slower WAN (wide area network) or the Internet. Those that do typically have a not-so-transparent solution. There, local users have first-class access to the repository while remote users must deal with a read-only replica, face massive source code merging down the road, or deal with often severe bottlenecks associated with higher latency and lower bandwidth.

Nowadays, modern corporations and their software development organizations are becoming increasingly dispersed. After a corporate acquisition, a key development team may be spread across the country or around the world and connected by a corporate intranet. Telecommuting developers often have powerful development environments sitting on their desktops at home, connecting via the Internet back to a central office. Some fledgling startups have no corporate headquarters at all, but instead collaborate entirely over the Internet.

For these organizations, Networked Software Development is a necessity. In this paper we discuss how Perforce -- The Fast SCM System -- can handle such environments, using WANs such as far-flung corporate intranets or the Internet. Our proposition is that Perforce is not hampered by the low speed and high latency of such long-haul networks. We also describe the pieces needed to make operating over such networks secure.

The Problem With WANs

The two metrics of a network's performance are its bandwidth and its latency. Bandwidth is simply the number of bits per second that can be moved from one computer to another, and ranges nowadays from 56 kilobits per second through a dial-up modem to 10 gigabits per second over copper and fiber; latency is the delay in getting each packet from one end to the other, and ranges from under 1ms for Ethernet to the better part of a second (or more) on the Internet.

In the world of the LAN, few users would tolerate anything less than 100Mb/s in their working environment, and gigabit Ethernet is commonplace; on the WAN side, however, much of the world is connected at cable or T1 speeds or lower. While improvements in technology are rapidly increasing the bandwidth remote workers enjoy, the physical limits imposed by the speed of electricity in wires, light in fibre optics, or microwaves through space prevent a reduction in the latencies of WANs: London and California are at least 30ms apart even with the fastest network connectivity. In practice, due to the storing, forwarding, and switching that packets typically endure when travelling over long networks, the latency of most WANs is 100ms or more.
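
The 30ms figure can be checked with a short calculation. The sketch below is only illustrative: the distance is an assumed approximate great-circle distance between London and the California coast, and the two-thirds factor for light in glass fibre is a common rule of thumb rather than a figure from this paper.

```python
# Lower bound on one-way latency set by signal propagation speed alone.
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
FIBRE_FACTOR = 2 / 3                # assumption: light in fibre travels at about 2/3 c
LONDON_TO_CALIFORNIA_KM = 8_600     # assumption: approximate great-circle distance


def one_way_latency_ms(distance_km, speed_km_s=SPEED_OF_LIGHT_KM_S):
    """Return the time in milliseconds for a signal to cover distance_km."""
    return distance_km / speed_km_s * 1000.0


vacuum_ms = one_way_latency_ms(LONDON_TO_CALIFORNIA_KM)
fibre_ms = one_way_latency_ms(
    LONDON_TO_CALIFORNIA_KM, SPEED_OF_LIGHT_KM_S * FIBRE_FACTOR
)

print(f"vacuum: {vacuum_ms:.1f} ms, fibre: {fibre_ms:.1f} ms")
```

Under these assumptions the floor is a little under 30ms in a vacuum and somewhat over 40ms in fibre, before any switching or queuing delay is added.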

The problem with WANs is that their long latency dramatically reduces the performance of any protocol that depends on synchronous communication: such protocols are bounded by the round-trip time (RTT), which is the time it takes to send a data packet to the remote end plus the time it takes to receive a response. While the RTT is a few milliseconds (usually under 5-10 ms) on a LAN, a WAN may frequently endure an RTT of a second or more. Thus any network protocol that sends data and waits for a response pays the full RTT on every exchange and will typically run hundreds of times slower over a WAN than over a LAN.
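
As a rough illustration of that slowdown, the sketch below multiplies a number of blocking exchanges by the RTT. The count of 300 exchanges is a hypothetical workload chosen for the example; the two RTT values are the LAN and WAN figures quoted above.

```python
def synchronous_elapsed_s(round_trips, rtt_s):
    """Total wait for a protocol that blocks for a reply after every request."""
    return round_trips * rtt_s


ROUND_TRIPS = 300   # hypothetical number of request/response exchanges
LAN_RTT_S = 0.005   # 5 ms, the LAN figure from the text
WAN_RTT_S = 1.0     # 1 s, the WAN figure from the text

lan_s = synchronous_elapsed_s(ROUND_TRIPS, LAN_RTT_S)
wan_s = synchronous_elapsed_s(ROUND_TRIPS, WAN_RTT_S)
slowdown = wan_s / lan_s

print(f"LAN: {lan_s:.1f} s, WAN: {wan_s:.0f} s, slowdown: {slowdown:.0f}x")
```

Because the elapsed time scales linearly with the RTT, the slowdown is simply the ratio of the two RTTs, whatever the number of exchanges.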

The network file system (NFS) protocols that lie beneath the architectures of some commercial SCM systems are a case in point. These systems achieve their network transparency by relying on the fact that NFS protocols make remote disks seem local. In fact, some SCM systems have no network support per se, but are dependent solely upon access to networked disks. Network file systems themselves are usually built on the premise that their transport layers are fast: NFS tries to make remote volumes appear local, and NFS protocols assume that the network will answer at near-disk speeds. When the network delivers a response too late, NFS will often just fall down. As NFS falls, so does any SCM system that relies on it.

Certain network file system implementations (NetBEUI on NT, NFS over TCP on 4.4 BSD) have overcome some of the problems of working over slower WANs, but they still cannot deliver data any faster than the network can. SCM systems are applications that expect fast access to the disks, and they typically do disk-intensive operations like scanning directories or opening many files. Remote disks mounted over slow WANs cannot deliver data at the needed rate, and the resulting performance is more or less intolerable.

The bottom line is that SCM solutions like IBM's Rational ClearCase, Serena's PVCS®, MKS Source Integrity, and Microsoft's SourceSafe all rely on NFS for their transparency, and none of them offers its normal solution over WANs. ClearCase has a WAN offering called MultiSite, but it is markedly different from what the LAN product supports: it basically supports read-only replicas that are synchronized (by the user) over the WAN.

Perforce Network Support

Like the SCM solutions mentioned above, Perforce has an architecture that works over a network. However, Perforce has two specific design features that make it considerably less reliant on the performance of the network. These design features are:

  1. Local files but centralized metadata.

    A critical component of Perforce's design locates data so as to minimize network use. The repository metadata lives on the central server, so global state is all in one easy-to-access location. On the other hand, the files that a user works on are copied to a local disk for faster access. This means that operations such as editing, compiling, or searching all run at local disk speeds without requiring any network interaction.

    Only explicit Perforce operations make use of the network. When copying files between the user's client host and the central server, Perforce uses its global metadata to transfer only what has changed since the last copy. When reporting, the Perforce Server has fast, local access to all metadata, so only distilled information needs to be sent back over the network to the client.

  2. TCP/IP-based message queuing protocol.

    Perforce uses a streaming, message queuing protocol built directly atop TCP/IP. It does not rely on any NFS access. Furthermore, the Perforce protocol is asynchronous, so multiple messages may be sent and responses are processed as and when they arrive. This minimizes the impact of network latency on throughput.

    The difference between TCP/IP-based streams and UDP-based NFS is significant. With Perforce, the latency of the network affects Perforce only a few times each time the client command is invoked by the user. Thus, over a network with a 100ms latency, the user might notice a delay of an extra 300ms, or about a third of a second, after typing a Perforce client command and before the streaming response begins; the response then proceeds at full network bandwidth. This delay is primarily TCP handshake overhead, almost always tolerable and usually unnoticeable. With the NFS access required by other SCM solutions, the latency of the network affects each file I/O operation, and these can easily number in the hundreds for something as simple as a file checkout. The result is impractical, as users refuse to wait minutes to check out a single file.
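
The contrast between the two designs can be sketched with a simple cost model. This is not a description of the Perforce wire protocol: the function names, the message count, and the per-message send time are all hypothetical, and the three latency hits are taken from the 300ms figure quoted above.

```python
def stop_and_wait_s(messages, one_way_latency_s, send_time_s):
    """Every message is sent, then the sender idles for a full round trip."""
    return messages * (send_time_s + 2 * one_way_latency_s)


def streamed_s(messages, one_way_latency_s, send_time_s, latency_hits=3):
    """Latency is paid a fixed number of times; messages then flow back to back."""
    return latency_hits * one_way_latency_s + messages * send_time_s


LATENCY_S = 0.100    # 100 ms one-way latency, as in the text
MESSAGES = 300       # hypothetical number of messages or file I/O operations
SEND_TIME_S = 0.010  # hypothetical time to put one message on the wire

per_operation = stop_and_wait_s(MESSAGES, LATENCY_S, SEND_TIME_S)
per_command = streamed_s(MESSAGES, LATENCY_S, SEND_TIME_S)

print(f"latency paid per operation: {per_operation:.1f} s")
print(f"latency paid per command:   {per_command:.1f} s")
```

In the streamed case the latency term stays constant as the message count grows, so total time is governed by how fast the link can carry the data rather than by how far away the server is.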

Since a Perforce user normally works against the local disk, he or she is only aware of the speed of the network when performing explicit SCM operations, and since Perforce operations are geared to minimize the effects of network latency, only network bandwidth is a real factor in the performance of Perforce network operations.

Perforce's requirements for network bandwidth are at their highest when transferring files from the repository to the client or vice-versa. Basically, the network is the limit on file transfer rates. How long the user has to wait then depends on the number and size of the files being transferred. For change submissions, the files being transferred are the ones changed by the user and are often fairly small in number. When the client is synchronizing with the repository and pulling down the combined efforts of the other clients of the repository, the number of files transferred can grow larger. In these cases, use of the Perforce Proxy Server to maintain a local cache of submitted file revisions will dramatically reduce the bandwidth requirements and often give users LAN-speed file delivery.
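
To give a feel for the waiting times involved, the sketch below divides a payload size by the link speed. The 10 MB payload is a hypothetical example, and the model ignores protocol overhead, compression, and any caching by a proxy.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Ideal time to move size_bytes over a link of the given speed."""
    return size_bytes * 8 / bits_per_second


PAYLOAD_BYTES = 10_000_000  # hypothetical sync of about 10 MB of changed files

LINKS = {
    "64 Kb/s ISDN": 64_000,
    "1 Mb/s WAN": 1_000_000,
    "100 Mb/s LAN": 100_000_000,
}

for name, speed in LINKS.items():
    print(f"{name:>13}: {transfer_seconds(PAYLOAD_BYTES, speed):8.1f} s")
```

A small change submission is often a tiny fraction of this payload, which is why submissions tend to feel quick even on slow links while large synchronizations do not.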

Note that the network is not always the limiting factor: at the highest network speeds (over 1000 Mb/s) it is often the client's filesystem that limits the speed of file transfers. This happens not because the quantity of data is so large, but in part because writing new files on a client's filesystem includes creating directory entries. On some UNIX-like operating systems, directory updates are handled synchronously. This improves the integrity of the on-disk data in the case of a power failure or system crash, but carries quite a performance penalty.

Non-file interactions with the repository such as change reports or user configuration result in very little data transfer and are usually not affected by network bandwidth.

Empirical reports from users support the notion that if the network is 1Mb/s or faster, interactions with Perforce appear to be as responsive as if the client and server were on the same fast LAN. At 64Kb/s (ISDN speeds), the speed of the network is noticeable when transferring files but not for reporting or other operations.

The intelligent architecture of Perforce's communications makes it capable of performing SCM over any TCP/IP network, be it a fast LAN, a slower corporate WAN, or the (even slower) Internet.

Three Levels of Security

Once it becomes possible to open up access to the SCM repository to all users on the network, the most immediate concern will be security. The Internet, of course, is not a safe place for the software assets of any company, but even the corporate intranet can be a hostile place for small groups operating within a larger company.

This section discusses measures that are necessary to protect software developed across a network. The severity of each measure depends both on the sensitivity of the software and the insecurity of the network.

Level One - Perforce Protections

Perforce has a built-in mechanism for performing simple host authentication, based on TCP/IP host addresses. Called protections and managed by the protect command, this mechanism allows the Perforce repository administrator to limit the hosts that can access the repository. For a corporate intranet, this is often sufficient. It has the advantage that it can selectively enable read or write access to any parts of the repository for any collection of hosts. This makes it easy to give users on certain hosts the ability to look at data in the repository but not modify it. For example, the technical support organization may be permitted to look at source code or software developers may have read-only access to documentation.
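
The idea of host-based read and write access can be modelled in a few lines. The sketch below is a simplified illustration only: it does not reproduce the syntax of the Perforce protections table or its actual matching rules, and the networks, paths, and function names are all hypothetical.

```python
import ipaddress

LEVELS = {"read": 1, "write": 2}

# Hypothetical rules: (access level, allowed network, repository path prefix)
RULES = [
    ("write", "10.1.0.0/16", "//depot/src/"),  # developer hosts
    ("read", "10.2.0.0/16", "//depot/src/"),   # technical support hosts
    ("read", "10.1.0.0/16", "//depot/doc/"),   # developers: read-only documentation
]


def access_level(host, path, rules=RULES):
    """Return the highest level granted to host for path, or None if no rule matches."""
    address = ipaddress.ip_address(host)
    best = None
    for level, network, prefix in rules:
        if address in ipaddress.ip_network(network) and path.startswith(prefix):
            if best is None or LEVELS[level] > LEVELS[best]:
                best = level
    return best


print(access_level("10.1.4.7", "//depot/src/main.c"))     # developer host
print(access_level("10.2.0.9", "//depot/src/main.c"))     # support host
print(access_level("192.168.1.1", "//depot/src/main.c"))  # unknown host
```

A host that matches no rule gets no access at all, which is the property that makes address-based protections useful on an intranet.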

The Perforce protections mechanism can also limit access based on user name, but Perforce does not strongly authenticate the user name, so this is not recommended as a security measure. It is instead a safety device to keep users from accidentally modifying data in the repository. For example, if a code line is frozen for a release, that part of the repository may allow read/write access only to users in the product release organization.

The Perforce Server on its own is not hardened against denial of service attacks (where unauthorized users repeatedly attempt to connect, just to use resources), nor does it encrypt all data passed between the client and server. Thus, a default Perforce Server install is intended for use within environments where espionage, packet-sniffing, or attack techniques are not a concern. For more adverse environments, firewalls and encryption with strong authentication are required.

Level Two - Firewalls

If a company intends to operate in a hostile network environment either by connecting to the Internet or by participating in a large corporate intranet, firewalls may be in order. A firewall is an excellent barrier between the trusted LAN and the untrusted wider network. Perforce uses plain TCP/IP connections, so configuring a firewall to permit authorized Perforce use is straightforward. In addition to any other configuration, a firewall needs only to pass Perforce connections (usually port 1666) through from trusted hosts to the Perforce repository machine.
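
Once the firewall rule is in place, a quick way to confirm that it passes Perforce traffic is to attempt a plain TCP connection to the server port from a trusted host. The sketch below does only that; the function name is hypothetical, and a successful connection shows that the port is reachable, not that the server behind it is a Perforce Server.

```python
import socket


def can_connect(host, port=1666, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling can_connect with the repository machine's host name from a trusted host should return True, while the same call from a host outside the trusted set should return False if the firewall is doing its job.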

A firewall itself provides protection against hostile networks, but doesn't help when sensitive data must travel over those hostile networks.

Level Three - Encryption and Strong Authentication

If a company plans to develop sensitive software over its corporate network or plans to involve the Internet at all in software development, then maximum security measures must be taken, including encryption and strong authentication. Strong authentication is a means to verify that a TCP/IP connection made with the Perforce Server is from a trusted user on a trusted host and not just from someone who has gained physical access to the network; it often uses public key cryptography. Encryption software scrambles data so that people snooping the network can't make use of it.

Many commercial and open source VPN technologies provide both strong authentication and strong encryption for distributed users based on a simple TCP/IP connection. Perforce's streamed message protocol is compatible with most (if not all) such implementations, so integrating Perforce with existing corporate security policy is relatively easy.

In practice, a combination of the above mechanisms is required to allow authorized Perforce connections while securing the LAN from a wider network. A VPN should be used to provide authentication and encryption, a firewall must be used to block non-VPN connections, and Perforce protections can further limit access to selected parts of the repository. The result of all this is a Perforce repository that can be securely accessed from anywhere on the Internet or corporate intranet.

Conclusion

Networked Software Development is a real need in many companies and is likely to continue to be the norm in the future. Perforce supports networked software development, and when used in conjunction with security tools (versions of which are often freely available), Perforce can make the Internet or corporate intranet as viable a development environment as a LAN.

PVCS is a trademark of Serena Software, Inc.