October 31, 2012

Git Fusion and Working with Open Source

Git at Scale

Figure 1: With an integration repository, you can quickly collect and organize your internal changes -- and publish them for outside open-source developers to discuss and apply.

Git Fusion enables a new approach to working with open source. Instead of making complex trade-offs between the demands of in-house and open-source schedules and requirements, you can design a new kind of workflow that enables open-source cooperation with minimal disruption to the company process. 

Previously, companies were forced to take one of three approaches to working with open source:

Approach 1: Treat an open-source component as a black box. This is the most common approach.

Advantages: Limited complexity. Relatively easy to deploy new versions from upstream if needed.

Disadvantages: Often requires complex workarounds. The lack of a one-line fix in the upstream project may require a complex special case in your own code.

Approach 2: Modify the open-source component and contribute relevant changes to the project.

Advantages: Upstream developers help maintain your bug fixes and enhancements by testing them in combination with their new code. Upgrading is relatively easy as the upstream project changes. Upstream developers become aware of what you need from the project.

Disadvantages: All developers involved need to be aware of open-source practices and project culture, details of which may vary from project to project. Developers must commit changes to open-source and in-house repositories separately.

Approach 3: Modify the open-source component, but do not contribute changes.

Advantages: Easy for all developers to apply site-specific changes without knowledge of open-source norms.

Disadvantages: Upgrades to new upstream versions are harder because of the complex process of reconciling local changes with upstream. Future upstream changes may affect your project in undesirable ways, since upstream developers who have not seen your work are less aware of your needs.

With Git Fusion, a company can work on its own "master" in-house branch of an open source project, and make it easy for an internal developer who is also a participant in the open source project to handle integration with upstream. That developer can construct a branch that is cleaned up for release, and push it to a single, carefully administered public Git server. The developer or team responsible for tracking open source can also pull down new versions manually, or the company can automatically populate a branch that tracks upstream. This process for working with open-source projects is similar to the process for working with other external contributors, such as contract developers.
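As a rough illustration of the "automatically populate a branch that tracks upstream" idea, a scheduled job could fetch from the open source project and push the result to an in-house branch. The sketch below is only an assumption about how that might look: the remote names `upstream` and `internal`, the branch name `upstream-master`, and the temporary local repositories standing in for the real servers are all hypothetical, and whether Git Fusion accepts the pushed branch depends on how it is configured.

```shell
#!/bin/sh
# Sketch only: every path and name below is a hypothetical stand-in.
set -eu
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

demo=$(mktemp -d)
cd "$demo"

# Stand-in for the open source project's main repository.
git init -q upstream-src
cd upstream-src
git symbolic-ref HEAD refs/heads/master
echo "v1" > lib.c
git add lib.c
git commit -q -m "Upstream release"
cd "$demo"

# Stand-in for the in-house Git server (Git Fusion in this article).
git init -q --bare internal.git

# Repository that the scheduled job runs in.
git init -q mirror
cd mirror
git remote add upstream "$demo/upstream-src"
git remote add internal "$demo/internal.git"

# The two commands a scheduler (cron, CI job) would repeat:
git fetch -q upstream
git push -q internal refs/remotes/upstream/master:refs/heads/upstream-master
```

Keeping the tracking branch under its own name means in-house work on "master" is never overwritten by the job.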

To fix an internal bug, a developer might do some work that touches both in-house code and an open source component. For convenience, the original developer works in a single Perforce workspace or Git repository that contains both open-source and in-house code. The original work can all happen in a single Perforce submit, or a single Git commit and push. Then, using an integration repository, a developer within the company who is familiar with open-source practices can extract new commits containing just the open-source work.

An example might look something like Figure 1 above. The integration repository has three remotes: "public", pointing at the system outside the firewall that we use to distribute code changes to upstream; "internal", pointing at Git Fusion; and "upstream", pointing at the open source project's main repository. We can make a branch that tracks internal, and a branch that tracks the upstream master:

 $ git branch --track internal internal/master
 $ git branch --track upstream upstream/master
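These branch commands assume the three remotes already exist and have been fetched. A minimal sketch of that setup follows. It uses temporary local repositories as stand-ins for the real servers, so every path here is hypothetical. In real use, the arguments to `git remote add` would be the URLs of your own public server, your Git Fusion instance, and the upstream project.

```shell
#!/bin/sh
# Sketch only: local stand-ins replace the real servers.
set -eu
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

demo=$(mktemp -d)
cd "$demo"

# Stand-in for the open source project's main repository, with one commit.
git init -q upstream-src
cd upstream-src
git symbolic-ref HEAD refs/heads/master
echo "v1" > lib.c
git add lib.c
git commit -q -m "Initial upstream commit"
cd "$demo"

# Stand-ins for Git Fusion and for the public server outside the firewall.
git clone -q --bare upstream-src internal.git
git init -q --bare public.git

# The integration repository itself, with its three remotes.
git init -q integration
cd integration
git remote add upstream "$demo/upstream-src"
git remote add internal "$demo/internal.git"
git remote add public   "$demo/public.git"

# Fetch so that internal/master and upstream/master exist locally.
git fetch -q upstream
git fetch -q internal
git remote -v
```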

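Now, we'll make a branch, starting from upstream, to hold the changes that will go upstream:

 $ git checkout -b for-upstream upstream

Next, we cherry-pick an in-house commit that touches the open-source component, editing its commit message as we go:

 $ git cherry-pick --edit 6c02dd61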
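One way to find candidate commits is to list the internal commits that touch the open-source files and are not yet in upstream. The sketch below builds a throwaway repository to show the idea. The file names `lib.c` and `inhouse.c`, the commit messages, and the single-repository layout are all assumptions made for illustration. A commit that mixes in-house and open-source changes would still need to be split by hand.

```shell
#!/bin/sh
# Sketch only: a throwaway repository with hypothetical files and messages.
set -eu
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/upstream
echo "v1" > lib.c
git add lib.c
git commit -q -m "Upstream: initial import"

# In-house history: product commits interleaved with fixes to lib.c.
git checkout -q -b internal
echo "secret" > inhouse.c
git add inhouse.c
git commit -q -m "In-house: add product code"
echo "fix 1" >> lib.c
git commit -q -am "Fix null check in lib"
echo "more" >> inhouse.c
git commit -q -am "In-house: tweak product code"
echo "fix 2" >> lib.c
git commit -q -am "Fix off-by-one in lib"

# List internal commits, not in upstream, that touch the open-source file.
git log --oneline upstream..internal -- lib.c

# Pick them, oldest first, onto a branch that starts from upstream.
git checkout -q -b for-upstream upstream
git cherry-pick $(git rev-list --reverse upstream..internal -- lib.c)
git log --oneline upstream..for-upstream
```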

We can cherry-pick multiple commits here, to gather in all the in-house work that we want to submit. When we're done (assuming we had three of them), we'll rebase:

 $ git rebase -i HEAD~3

In the editor, I'll replace the "pick" at the beginning of the lines with "edit" and "squash" to construct the series of commits to publish. At the end, I get an editor with the commit messages concatenated, and I can edit them into something that upstream will like. I'll add my "Signed-off-by" line with my name and email address, to make my participation clear to the other project contributors. Now I have one, or a few, nice clean commits to submit upstream. I'll push them to public:

 $ git push public for-upstream
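When the whole series should collapse into a single commit, a soft reset followed by a fresh commit is a non-interactive alternative to the interactive rebase. This is a different technique from the one used above, shown here only as a sketch. The repository, file name, and commit messages are hypothetical, and `git commit -s` adds the Signed-off-by line from the configured committer identity.

```shell
#!/bin/sh
# Sketch only: squash three picked commits into one signed-off commit.
set -eu
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

cd "$(mktemp -d)"
git init -q .
echo "base" > lib.c
git add lib.c
git commit -q -m "Upstream base"

# Three commits standing in for the cherry-picked in-house work.
git checkout -q -b for-upstream
for n in 1 2 3; do
    echo "change $n" >> lib.c
    git commit -q -am "WIP $n (internal ticket reference)"
done

# Move the branch back three commits but keep all changes staged...
git reset -q --soft HEAD~3
# ...then record them as one commit with a message written for upstream.
git commit -q -s -m "lib: fix input handling"

git log -1 --format=%B
```

Interactive rebase remains the better fit when the result should be several commits rather than one.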

So, no matter what happened to the open source code internally, I can rebase the internal work in the correct way for the project, rewrite the commit message to be relevant and not mention in-house code, and contribute normally. Everything outgoing matches the project's "guidelines for contributors" without affecting the content or pace of the original work.
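A quick check before pushing can catch outgoing commits that are missing a sign-off. The loop below is a sketch under assumptions: it uses a throwaway repository with hypothetical branch contents, and it treats any line starting with "Signed-off-by: " as sufficient, which may be looser than a given project's rules.

```shell
#!/bin/sh
# Sketch only: count commits on for-upstream that lack a Signed-off-by line.
set -eu
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/upstream
echo "base" > lib.c
git add lib.c
git commit -q -m "Upstream base"

# One commit with a sign-off and one without, for demonstration.
git checkout -q -b for-upstream
echo "fix 1" >> lib.c
git commit -q -s -am "lib: fix null check"
echo "fix 2" >> lib.c
git commit -q -am "lib: fix off-by-one"

# Inspect every commit that would go out and is not already in upstream.
missing=0
for c in $(git rev-list upstream..for-upstream); do
    if ! git log -1 --format=%B "$c" | grep -q '^Signed-off-by: '; then
        echo "missing sign-off: $(git log -1 --format='%h %s' "$c")"
        missing=$((missing + 1))
    fi
done
echo "$missing commit(s) without a sign-off"
```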

Thanks to Greg Kroah-Hartman of the Linux Foundation for reviewing a draft of this article.