August 23, 2010

Version Control Repository: Distributed or Centralized?

Not too long ago, "distributed" in the context of version control referred to the location of the users. Lately the term is also being used to describe the location of the repository, and you have probably seen the advent of distributed version control systems (DVCS) like Git and Mercurial. You are probably quite familiar with Perforce’s centralized architecture: a central repository that all users connect to and pull code from into their own workspaces. If you are part of a “remote” group of users, you can set up a Perforce Proxy as an intermediary to mitigate WAN transmission overhead. In fact, most commercial version control products follow a similar model: a central repository, or multiple nodes on the network, that users work off of. DVCS products use a different architecture: a repository for each user.

Wait a second: a repository for every user? Isn’t that overkill? Well, if you work in an enterprise where a controlled development environment is paramount, that is probably what you would think. However, there are several motivations behind such an architecture, mostly driven by the needs of the open source software (OSS) development world:

  • Better performance: Open source SCM tools generally lack performance, particularly across a WAN, for operations such as commits, viewing revision history, or reverting changes. Having the entire repository local to the user gets around that problem.
  • No central control: Unlike commercial software development, the OSS community is loosely managed in terms of development standards, procedures and policies. A DVCS does not mandate a canonical development “trunk” branch; one has to be declared and managed by someone (the curator).
  • Working disconnected: The OSS community is a loosely coupled network of individual developers who largely work on their own and like the idea of being isolated from others, especially when trying out experimental code branches.
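
To make the contrast between the two architectures concrete, here is a minimal toy sketch in Python. It is not how Perforce, Git, or Mercurial are implemented; the class names (`Repository`, `CentralizedUser`, `DistributedUser`) and the simplified `push` logic are hypothetical and exist only for illustration. It shows the one structural difference described above: in the centralized model a commit goes straight to the shared repository and so needs a connection, while in the distributed model a commit lands in the user's own repository and sharing is a separate step.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy repository: an ordered list of change descriptions."""
    history: list = field(default_factory=list)

    def commit(self, change: str) -> None:
        self.history.append(change)


class CentralizedUser:
    """Has only a workspace; every commit goes to the shared server repository."""

    def __init__(self, server: Repository):
        self.server = server

    def commit(self, change: str, online: bool = True) -> None:
        if not online:
            # Without a connection there is nowhere to commit to.
            raise ConnectionError("centralized commit needs the server")
        self.server.commit(change)


class DistributedUser:
    """Has a full local repository; publishing changes is an explicit step."""

    def __init__(self):
        self.local = Repository()

    def commit(self, change: str) -> None:
        # Purely local operation: no network involved.
        self.local.commit(change)

    def push(self, remote: Repository) -> None:
        # Toy assumption: the remote history is a prefix of the local history,
        # so only the trailing changes need to be sent. Real tools must also
        # handle histories that have diverged.
        missing = self.local.history[len(remote.history):]
        remote.history.extend(missing)


# Centralized: the commit is visible on the server immediately.
server = Repository()
alice = CentralizedUser(server)
alice.commit("fix typo")

# Distributed: commits accumulate locally, then are shared in one step.
shared = Repository()
bob = DistributedUser()
bob.commit("experiment 1")
bob.commit("experiment 2")
bob.push(shared)
```

In this sketch, calling `alice.commit("more work", online=False)` raises `ConnectionError`, whereas `bob.commit(...)` works with or without a connection. That is the "working disconnected" motivation in its simplest form.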

So how does Perforce stack up, you wonder? Overall performance is one of Perforce’s notable strengths, and the Perforce Proxy architecture works quite well to support distributed development. Having a central “gatekeeper” is an organizational choice, and Perforce can institute as much or as little centralized control over things like user check-ins as you want. As far as working disconnected is concerned, Perforce has recently added features to P4V that allow all user actions (adds, edits, deletes) to be cached in a local “offline” repository and subsequently merged with the main repository. Besides, in today’s uber-connected world, finding ourselves without connectivity is the exception rather than the norm.

DVCS is still an evolving field; the tools, procedures and practices in its ecosystem haven’t quite settled down yet. It seems to have broader appeal in the OSS community than in the commercial enterprise. That is not to say that Perforce is unsuitable for OSS development; in fact, we are happy to provide free Perforce licenses if you develop software that is licensed or otherwise distributed exclusively under an Open Source license.