August 19, 2015

Opening the Hood on Git Fusion

Git at Scale

The latest release of Git Fusion brings major improvements to initial push performance, as mentioned in a previous blog post. Let's open the hood on the other new functionality in the Git Fusion 2015.2 release.

Better lock management

The locking mechanism in Git Fusion has been enhanced to allow concurrent read access to a repository. The locking system now permits either multiple readers (fetch) or a single writer (push). Any translation, whether Helix-to-Git or Git-to-Helix, requires a write lock. For that reason, a fetch will perform the Helix-to-Git translation only if it can do so without waiting on a write lock. Otherwise, the fetch skips the translation phase and simply returns whatever an earlier fetch or push has already translated. A push, in contrast, requires a write lock and therefore waits for any active readers to finish. Readers that arrive after a writer has started waiting are made to wait until the writer has had a chance to proceed, so the lock mechanism prevents writers from being starved.
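To make these reader/writer rules concrete, here is a minimal in-process sketch using Python's `threading.Condition`. It is an analogy, not Git Fusion's implementation. The real locks are Helix keys and files shared between processes, and `WriterPreferringRWLock` and `fetch` are hypothetical names. The sketch shows three behaviours. Many readers can hold the lock at once. A waiting writer blocks readers that arrive later. A fetch translates only if it can take the write lock without waiting.

```python
import threading
from typing import Callable


class WriterPreferringRWLock:
    """Many readers or one writer; new readers queue behind a waiting writer (hypothetical)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0          # active readers (fetches)
        self._writer = False       # True while a writer (push or translation) holds the lock
        self._waiting_writers = 0  # writers blocked in acquire_write()

    def acquire_read(self) -> None:
        with self._cond:
            # Block while a writer holds the lock or is waiting for it,
            # so a steady stream of fetches cannot starve a push.
            while self._writer or self._waiting_writers:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self, blocking: bool = True) -> bool:
        with self._cond:
            if not blocking:
                # Succeed only if nobody holds the lock and no writer is queued.
                if self._writer or self._readers or self._waiting_writers:
                    return False
                self._writer = True
                return True
            self._waiting_writers += 1
            try:
                while self._writer or self._readers:
                    self._cond.wait()
            finally:
                self._waiting_writers -= 1
            self._writer = True
            return True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def fetch(lock: WriterPreferringRWLock,
          translate: Callable[[], None],
          serve: Callable[[], str]) -> str:
    """Translate only if the write lock is free right now, then serve under a read lock."""
    if lock.acquire_write(blocking=False):
        try:
            translate()  # stands in for the Helix-to-Git translation
        finally:
            lock.release_write()
    lock.acquire_read()
    try:
        return serve()   # returns whatever has been translated so far
    finally:
        lock.release_read()
```

Letting a waiting writer block newly arriving readers is what keeps a busy fetch workload from starving a push. The trade-off is that fetches may briefly queue behind a push.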

Given the increased complexity of the locking mechanism, this release includes a new script, p4gf_lock_status.py, that prints the status of any active locks. It reports on every lock it can find, including Helix key-based locks, Git Fusion instance-specific file-based locks, the disk-usage lock, and the common-reviews lock.

The repository-specific locks can also detect and remove stale locks automatically. This applies to both the Helix key-based locks and the file-based locks.
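One common way to detect a stale file-based lock is to record the owner's host name and process ID in the lock file. The lock is then treated as stale once that process no longer exists. The sketch below assumes that scheme. The JSON file format and the helper names `acquire_file_lock` and `remove_if_stale` are hypothetical. This post does not describe Git Fusion's actual staleness checks, especially for key-based locks. The sketch relies on POSIX `os.kill(pid, 0)` semantics, so it is not suitable for Windows.

```python
import json
import os
import socket


def acquire_file_lock(path: str) -> None:
    """Create the lock file, recording the owner's host and PID (hypothetical format)."""
    # O_EXCL makes creation fail with FileExistsError if the lock is already held.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    with os.fdopen(fd, "w") as f:
        json.dump({"host": socket.gethostname(), "pid": os.getpid()}, f)


def _process_exists(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # POSIX: signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_if_stale(path: str) -> bool:
    """Delete the lock file if its owner on this host has exited; return True if removed."""
    try:
        with open(path) as f:
            owner = json.load(f)
    except FileNotFoundError:
        return False
    if owner.get("host") != socket.gethostname():
        return False  # a process on another machine cannot be checked this way
    if _process_exists(owner["pid"]):
        return False
    # Note: not race-free if several processes clean up the same lock at once.
    try:
        os.remove(path)
    except FileNotFoundError:
        return False
    return True
```

The host check matters when several Git Fusion instances share storage, because a PID is only meaningful on the machine that issued it.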

Improved audit log details

This release brings the biggest changes to the audit log since its inception. To begin with, general exception errors and their stack traces are no longer sent to the audit log. Beyond the usual record of each pull or push, the only failures recorded there are now failures to gain access to a repository (that is, the user does not exist or does not have permission).

The audit log also captures more relevant information. Access to a repository is now logged at the point where the underlying Git command (git-upload-pack or git-receive-pack) is invoked, rather than immediately after the request is accepted. As a result, the audit log contains exactly one line per access, whether it succeeds or fails. It also means that a request rejected by a preflight check is not logged at all, which makes sense because no content is transferred in that case. For the same reason, special commands are not logged either, since no content is exchanged.
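The logging rule above is one line per invocation of git-upload-pack or git-receive-pack, written whether the command succeeds or fails. It can be sketched with a `try`/`finally` around the command. The logger name `gf.audit`, the line format, and the `serve` helper are assumptions for illustration. This post does not show Git Fusion's actual audit format.

```python
import logging
from typing import Callable

# Hypothetical logger name; the real audit log's destination and format are not shown here.
audit_log = logging.getLogger("gf.audit")
audit_log.setLevel(logging.INFO)

ACTIONS = {"git-upload-pack": "pull", "git-receive-pack": "push"}


def serve(user: str, repo: str, service: str, run: Callable[[], bool]) -> bool:
    """Invoke a Git service and write exactly one audit line for it.

    Preflight checks and special commands are assumed to return before
    serve() is called, so they never reach the audit log.
    """
    action = ACTIONS[service]
    ok = False
    try:
        ok = run()  # stands in for running git-upload-pack / git-receive-pack
        return ok
    finally:
        # Runs on success, on failure, and when run() raises: one line per access.
        audit_log.info("%s %s %s %s", user, action, repo,
                       "success" if ok else "failure")
```

Because the audit line is written in the `finally` clause, anything that returns before `serve()` is called, such as a preflight rejection, leaves no trace in the log.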

Note that over HTTP, Git sends multiple requests for a single operation, typically two or three. This cannot be avoided, so each request is logged separately in the audit log.

Read-only Git Fusion installation support

Setting READ_ONLY to true in the [environment] section of the configuration file named by the P4GF_ENV environment variable makes the Git Fusion instance refuse push requests. It also prohibits initializing a repository via clone or pull. Fetches from such an instance return only the changes that have already been translated (and cached) by another Git Fusion instance that is not operating in read-only mode.
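As a rough illustration, the sketch below reads the READ_ONLY flag from the file named by P4GF_ENV using Python's `configparser`, then applies the rules just described. It assumes an INI-style `READ_ONLY = true` line under `[environment]`. The helper names `read_only_enabled` and `allow_request` are hypothetical and do not reflect Git Fusion's internal code.

```python
import configparser
import os
from typing import Optional

# Assumed layout of the file named by P4GF_ENV:
#
#   [environment]
#   READ_ONLY = true


def read_only_enabled(path: Optional[str] = None) -> bool:
    """Return True if READ_ONLY is true in the [environment] section."""
    path = path or os.environ.get("P4GF_ENV")
    if not path:
        return False
    cp = configparser.ConfigParser()
    cp.read(path)  # a missing file is silently skipped by ConfigParser.read
    return cp.getboolean("environment", "READ_ONLY", fallback=False)


def allow_request(read_only: bool, action: str, repo_initialized: bool) -> bool:
    """Hypothetical policy mirroring the read-only behaviour described above."""
    if not read_only:
        return True
    if action == "push":
        return False         # read-only instances never accept pushes
    return repo_initialized  # fetches are served, but clone/pull cannot initialize a repo
```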

This feature makes it easier to configure multiple Git Fusion instances, with one connected to an edge or replica server and another connected to the commit/primary server. In such a setup, the read-only instance serves content to a remote team more quickly than a Git Fusion instance pointing directly at the commit server could. Pushing through an edge or replica server is considerably slower, so administrators can use this feature to prevent committers from pushing there. Pushing to the Git Fusion instance connected to the commit server is often faster.

To keep the Git Fusion instance connected to the edge or replica server up to date, the Git Fusion instance connected to the commit/primary server should run p4gf_poll.py on a regular basis (e.g. using cron or a similar scheduler).

Note that this feature only limits the activity of the Git Fusion instance configured as “read-only”. It does not mean that this instance makes no changes whatsoever to the content of the Helix Versioning Engine. It will, in fact, modify keys and create clients, as it must in order to serve Git Fusion repositories.

As you can see, Git Fusion 2015.2 includes a series of targeted investments that pay huge dividends in terms of performance and memory use. Working with Git in Perforce Helix keeps getting better, so why not download it and give it a spin today?