August 9, 2010

Selective Encryption To The Cloud Via A +X Trigger


While researchers work on the problem of generic, secure remote computation and storage, Perforce users can already take advantage of the advanced features of Tahoe-LAFS to keep the versioned file store of the Perforce server in an otherwise untrusted cloud of remote storage servers. The +X filetype has recently made this much simpler.


What Is Cloud Computing

The term "cloud computing" has been bandied about a lot lately. Early moves towards remote computing services include commercial "cloud" offerings such as Amazon's Elastic Compute Cloud and the Open Source private cloud platform Eucalyptus, and, at a much lower level, parallel computing platforms such as the Open Source Hadoop family of projects. Beyond those lie far more advanced concepts that have yet to make it out of academic research papers into practical form.

However much these efforts promise to converge into something ubiquitous, that hasn't happened yet, although it is getting close. One of the biggest obstacles to that convergence into a trustworthy general-purpose platform is the lack of well-vetted, user-controlled strong cryptography that protects not only sensitive data but also the computation performed on it. Promising advances in both areas include Tahoe-LAFS and Hadoop-on-Tahoe secure remote computation.

Introduction To Tahoe-LAFS

Tahoe-LAFS lets users store data remotely without having to place much trust in the network, the storage nodes, or even the reliability of individual machines. Tahoe is one of the most conceptually advanced storage systems available, and it uses cryptographically sound "capabilities" to grant different levels of access. Someone holding a "read-only" capability can read an object's contents but not change them. Someone holding a "verify" capability can check that the stored data is intact without being able to read it. A third person holding a "read/write" capability has full access to the stored object.
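
To make the idea of weaker capabilities derived from stronger ones concrete, here is a minimal Python sketch of one-way attenuation, loosely modelled on the hash chain Tahoe uses for mutable files: the read key is a hash of the write key and the verify key is a hash of the read key, so any holder can step down to weaker capabilities but never back up. This is an illustration under stated assumptions, not Tahoe's capability format; the tag strings and the use of plain SHA-256 are assumptions, and real Tahoe capabilities carry extra fields.

```python
import hashlib
import os


def _one_way(tag: bytes, key: bytes) -> bytes:
    """Tagged SHA-256: easy to compute forward, infeasible to invert."""
    return hashlib.sha256(tag + b":" + key).digest()


def attenuate(kind: str, key: bytes) -> dict[str, bytes]:
    """Return every capability derivable from one (kind, key) pair.

    write -> read -> verify; each arrow is a one-way hash, so a holder can
    always derive weaker capabilities but never recover stronger ones.
    """
    caps = {kind: key}
    if kind == "write":
        key, kind = _one_way(b"read", key), "read"
        caps["read"] = key
    if kind == "read":
        caps["verify"] = _one_way(b"verify", key)
    elif kind != "verify":
        raise ValueError(f"unknown capability kind: {kind!r}")
    return caps


if __name__ == "__main__":
    writer = attenuate("write", os.urandom(32))
    reader = attenuate("read", writer["read"])
    assert reader["verify"] == writer["verify"]  # a reader can hand out a verify cap
    assert "write" not in reader                 # but cannot recover the write key
```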

Tahoe-LAFS encrypts each file on the client to produce ciphertext, splits that ciphertext into redundant erasure-coded shares, and then spreads the shares across remote, untrusted storage nodes. Because of how erasure coding works, any x of the y shares are enough to reconstruct the original, so up to y-x shares can disappear from a Tahoe grid for any reason and your file remains available. The overhead is surprisingly light compared with simple replication: storage grows by a factor of only y/x. Both x and y are fully under the user's control (Tahoe calls them k and N, and defaults to 3-of-10).
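
The x-of-y property can be demonstrated with a toy erasure code. The Python sketch below is a systematic Reed-Solomon-style code over the prime field of size 257: each group of k bytes defines a polynomial whose values at x = 1..k are those bytes, each of the n shares stores the polynomial's value at one point, and Lagrange interpolation through any k points recovers it. Tahoe's real encoder is zfec, which works over GF(2^8) and is far faster; the prime field here is an assumption chosen for readability, and this code is illustrative only.

```python
P = 257  # a prime just above 255, so every byte value is a field element


def _interpolate(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Fermat's little theorem: den^(P-2) is the inverse of den mod P.
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data: bytes, k: int, n: int) -> list[list[int]]:
    """Produce n shares such that any k of them rebuild `data`."""
    if not 1 <= k <= n < P:
        raise ValueError("need 1 <= k <= n < 257")
    padded = data + b"\x00" * (-len(data) % k)
    shares: list[list[int]] = [[] for _ in range(n)]
    for start in range(0, len(padded), k):
        # Each group of k bytes fixes a polynomial whose values at x = 1..k
        # are those bytes; share s stores its value at x = s + 1.
        group = [(x + 1, byte) for x, byte in enumerate(padded[start:start + k])]
        for s in range(n):
            shares[s].append(_interpolate(group, s + 1))
    return shares


def decode(available: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the first `length` bytes from any k shares, keyed by share index."""
    chosen = sorted(available)[:k]
    if len(chosen) < k:
        raise ValueError(f"need {k} shares, only {len(chosen)} available")
    out = bytearray()
    for pos in range(len(available[chosen[0]])):
        points = [(s + 1, available[s][pos]) for s in chosen]
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])


if __name__ == "__main__":
    original = b"versioned file contents"
    k, n = 3, 10  # Tahoe's default 3-of-10 encoding
    shares = encode(original, k, n)
    survivors = {i: shares[i] for i in (1, 4, 9)}  # 7 of the 10 shares are lost
    assert decode(survivors, k, len(original)) == original
    print(f"storage expansion: {n / k:.2f}x")  # prints "storage expansion: 3.33x"
```

With 3-of-10, the file survives the loss of any seven shares at 3.33 times its size, whereas plain replication would need eight full copies to survive seven losses.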

Hadoop On Tahoe Impossible? Not Anymore

Hadoop-on-Tahoe secure computing is an impressive step towards the concept of secure, remote, parallel computing. This achievement still requires you to trust portions of the remote VM, but I am optimistic that humans will eventually be able to encrypt not only remote data, but also some kinds of remote algorithms.

Perforce Archive Triggers

Perforce version 2009.1 introduced the +X archive trigger, which gives administrators control over how and where Perforce stores versioned files. You can now, for example, replace the usual RCS- and zlib-based back-end storage with a mechanism of your own devising.

The way this trigger works is simple. For every file with the +X filetype modifier, P4D calls the matching archive trigger, passing the operation, the file name, and the revision number as arguments. For a "write" operation, P4D pipes the file's contents to the trigger's standard input; for a "read" operation, it waits for the trigger to write the file's contents to its standard output; and for a "delete" operation, it passes no data and relies on the trigger to delete the file. P4D then waits for the trigger to exit and collects its exit status. This makes writing an archive trigger trivial.
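
The protocol is small enough to sketch in full. The following Python version stores revisions in an ordinary local directory rather than in Tahoe, purely to show the read/write/delete dispatch; it is not the tahoe_backend.pl script discussed below, and ARCHIVE_ROOT and the <root>/<file>/<rev> layout are hypothetical choices for this sketch.

```python
import os
import shutil
import sys

# Hypothetical storage root for this sketch; override it with the
# ARCHIVE_ROOT environment variable.
ARCHIVE_ROOT = os.environ.get("ARCHIVE_ROOT", "/var/tmp/p4-archive")


def archive_path(depot_file: str, rev: str, root: str = ARCHIVE_ROOT) -> str:
    """Map a file and revision to a local path: <root>/<file>/<rev>."""
    # Strip leading slashes so os.path.join nests the file under root.
    return os.path.join(root, depot_file.lstrip("/"), rev)


def handle(op, depot_file, rev, stdin, stdout, root=ARCHIVE_ROOT) -> int:
    """Perform one archive operation and return the trigger's exit status.

    Unexpected errors (e.g. a missing revision on read) raise, and Python
    then exits non-zero, which P4D sees as a failed trigger.
    """
    path = archive_path(depot_file, rev, root)
    if op == "write":
        # P4D pipes the revision's contents to the trigger's standard input.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as dst:
            shutil.copyfileobj(stdin, dst)
    elif op == "read":
        # P4D reads the revision's contents from the trigger's standard output.
        with open(path, "rb") as src:
            shutil.copyfileobj(src, stdout)
    elif op == "delete":
        # No data flows either way; the trigger just removes the revision.
        os.remove(path)
    else:
        print(f"unknown archive operation: {op!r}", file=sys.stderr)
        return 1
    return 0


# P4D passes exactly three arguments: %op% %file% %rev%.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(handle(*sys.argv[1:4], sys.stdin.buffer, sys.stdout.buffer))
```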

tahoe_backend.pl As An Archive Trigger

The short script described below has been tested against Tahoe-LAFS 1.7.1, the current release at the time of writing. NOTE: This script runs only on UNIX hosts, although Tahoe-LAFS itself works just fine on Windows.

Since this article is not comprehensive in scope, I will assume you have a working Tahoe storage grid up and running and available to the user that runs P4D. For more information, see the Tahoe-LAFS quickstart documentation. Once the grid is ready, install the trigger by adding an entry like this to the trigger table:

p4 triggers
happyday archive //... "/path/tahoe_backend.pl %op% %file% %rev%"

The trigger script referenced in that entry is available as tahoe_backend.pl. As you can see, it is trivial to create: it is only a page or two long, and consists mostly of comments. Next, configure your typemap to add the +X modifier to the files you want routed through the trigger:

p4 typemap
+X //...

WARNING: From this point on, every file with the +X modifier is passed through the archive trigger, so proceed carefully! Note also that files already in the depot continue to be stored normally; the typemap above only routes newly added files through the archive trigger.
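
Since tahoe_backend.pl itself is not reproduced in this article, here is a hypothetical Python equivalent that maps each operation onto the Tahoe command-line client. The alias name p4archive and the flat naming scheme (URL-quoted file path, a comma, then the revision) are assumptions for this sketch, and so is the CLI behaviour noted in the comments (put reading standard input from "-", get writing to standard output when no local file is given, rm removing an entry); check these against tahoe --help for your installed version.

```python
import subprocess
import sys
from urllib.parse import quote

# Hypothetical alias, created once beforehand with `tahoe create-alias p4archive`.
TAHOE_ALIAS = "p4archive"


def remote_path(depot_file: str, rev: str, alias: str = TAHOE_ALIAS) -> str:
    """Flatten a file path plus revision into a single Tahoe file name."""
    # quote(..., safe="") escapes "/" too, so no intermediate directories are needed.
    return f"{alias}:{quote(depot_file, safe='')},{rev}"


def tahoe_command(op: str, depot_file: str, rev: str) -> list[str]:
    """Build the (assumed) tahoe CLI invocation for one archive operation."""
    target = remote_path(depot_file, rev)
    if op == "write":
        return ["tahoe", "put", "-", target]  # assumed: "-" means read stdin
    if op == "read":
        return ["tahoe", "get", target]       # assumed: no local file means stdout
    if op == "delete":
        return ["tahoe", "rm", target]        # later releases also name this "unlink"
    raise ValueError(f"unknown archive operation: {op!r}")


def main(argv: list[str]) -> int:
    if len(argv) != 4:
        print("usage: tahoe_backend.py OP FILE REV", file=sys.stderr)
        return 2
    try:
        cmd = tahoe_command(*argv[1:4])
    except ValueError as err:
        print(err, file=sys.stderr)
        return 1
    # stdin and stdout are inherited, so P4D's pipe flows straight through tahoe,
    # and tahoe's exit status becomes the trigger's exit status.
    return subprocess.call(cmd)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The trigger table entry would then call this script with the same %op% %file% %rev% arguments in place of the Perl one.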

An interesting implication is that the trigger can be applied selectively, not only to newly added files but also to specific Tahoe grids: each trigger entry has its own depot file pattern, so different sets of files can be sent to different trigger commands and hence to different grids. If, for example, your lawyers insist that all .XML documents be carefully encrypted, you might use trigger entries like these:

lawyer1 archive //....XML "/path/tahoe_backend.pl %op% %file% %rev%"
lawyer2 archive //....xml "/path/tahoe_backend.pl %op% %file% %rev%"
lawyer3 archive //....Xml "/path/tahoe_backend.pl %op% %file% %rev%"

This example uses three separate entries to catch the common casings of the .XML extension; on a Perforce server running in case-sensitive mode, this is necessary. Less common mixes such as .xMl would still slip past these three patterns.
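
Full coverage requires one entry per casing, and the count doubles with every letter of the extension: a three-letter extension needs 2^3 = 8 patterns. A small Python helper can generate them; the lawyerN trigger names and the /path/tahoe_backend.pl script path simply follow the example above and are placeholders.

```python
from itertools import product


def case_variants(ext: str) -> list[str]:
    """Every upper/lower-case spelling of ext; 'xml' yields 8 variants."""
    choices = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in ext]
    return ["".join(combo) for combo in product(*choices)]


def trigger_lines(ext: str, script: str) -> list[str]:
    """One archive trigger entry per casing, named lawyer1, lawyer2, ..."""
    return [
        f'lawyer{i} archive //....{variant} "{script} %op% %file% %rev%"'
        for i, variant in enumerate(case_variants(ext), start=1)
    ]


if __name__ == "__main__":
    for line in trigger_lines("xml", "/path/tahoe_backend.pl"):
        print(line)
```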

From now on, standard Perforce operations use the encrypted storage grid provided by Tahoe-LAFS: files are encrypted, erasure-coded into shares, and distributed across the grid. The benefits are many:

  • Remote storage is trivial to expand; simply add more commodity nodes that participate in the Tahoe grid.
  • Files are error-tolerant to a user-specified degree.
  • With convergent encryption (Tahoe's default for immutable files), identical files uploaded with the same convergence secret produce identical ciphertext and are automatically de-duplicated.
  • Files are encrypted and unreadable to any curious remote node operators.
  • Local administration is limited to the metadata.
  • The trigger is simple, so it is easy to maintain and fix.

It's easy to peer into a crystal ball and see the potential of something this powerful. Technology like Tahoe-LAFS tends to be far enough ahead of its time that network and computing power haven't quite caught up with it for general use, yet it is practical enough for specific cases. An encrypted Perforce versioned file tree in a Tahoe storage grid is one such case.

Further Considerations

Since a user can override the default typemap, if your policy is to ensure that the versioned file tree is encrypted at all times, you should put some additional measures in place:

  • A change-submit trigger can be used to ensure that the +X modifier is present on every submitted file.
  • Denying write access to the directory enclosing the metadata (and to the rest of the P4D host system) will prevent users from submitting files if the filetype-enforcement trigger above fails.
  • All encryption keys and configuration should be held by multiple people for redundancy and trustworthiness; this might be a good candidate for a secret-sharing scheme.
  • Beyond what Perforce archive triggers can configure, Tahoe-LAFS can also be used as a FUSE backend, so perhaps even the Perforce metadata could be pushed out to the cloud. This would probably perform very poorly, but it is an interesting idea.
  • Since Perforce makes heavy use of $TMP for storing intermediate files, consider putting your OS's equivalent of $TMP on a memory-based file system. Otherwise, some of your files will be written (then deleted) in unencrypted form on potentially non-volatile hard drives.
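
The filetype check at the heart of such a change-submit trigger might look like the Python sketch below. It assumes the trigger has already collected a map from each file in the pending change to its filetype, for example by parsing p4 opened output for that change; that collection step is an assumption and is not shown. A non-zero exit status from a change-submit trigger rejects the submit.

```python
def has_x_modifier(filetype: str) -> bool:
    """True if a Perforce filetype such as 'text+X' or 'binary+FX' carries +X."""
    _base, _sep, modifiers = filetype.partition("+")
    return "X" in modifiers  # uppercase X only; lowercase x means executable


def files_missing_x(opened: dict[str, str]) -> list[str]:
    """Return the depot paths whose filetype lacks the +X modifier."""
    return sorted(path for path, ftype in opened.items() if not has_x_modifier(ftype))


def check_change(opened: dict[str, str]) -> int:
    """Exit status for a change-submit trigger: non-zero rejects the submit."""
    missing = files_missing_x(opened)
    for path in missing:
        print(f"{path} must use the +X filetype modifier")
    return 1 if missing else 0
```

Because the check looks for an uppercase X after the "+", executable files (+x, or the older xtext type) are not mistaken for files routed through an archive trigger.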

Special Thanks

Thank you to Mr. Zooko O'Whielacronx for an invaluable technical review. Thank you to our own Jason Gibson for a final touch-up review.