July 13, 2015

How to Extend Open Source Projects

Integration
When starting a complex project, it's wise to verify you aren't reinventing the wheel. With the wealth of open source projects available, it's likely there are useful components (be they libraries, a web server, etc.) or, if you're particularly lucky, a product that is close to what you need.
 

 
In the case of a product in particular, you'll almost certainly need to adjust how things work. This may be as simple as fixing bugs, adding features, or adjusting workflow, docs, installation, etc.
 
Whenever you can, the easiest and best course of action is to simply contribute these changes back to the larger community. This allows you to avoid diverging and continue to benefit from the hard work of others contributing to the project.
 
Sometimes though, you'll simply need to diverge. If you want to add a feature or make a change that is critical for you but unpopular or misaligned with the community’s goals for the product, divergence is your best choice.
 
But how do we fork without forking? How can we diverge yet continue to give back in other areas and continue to accept change?
 
My team has been working on this issue while extending GitLab CE into GitSwarm by adding the ability to bidirectionally mirror Git projects with the Perforce Helix versioning engine. This is a powerful feature: it allows Perforce developers to deal with large assets while their Git co-workers access smaller views of the project. But it's not on GitLab's critical path, so we've had to diverge from them in order to add it.
 
Our approach, which has been working quite successfully, is based on two key tenets:
 
  1. Don't touch their files!
  2. Run-time extension
 
By not touching their files beyond two minor tweaks to get into the app, we're able to avoid merge conflicts. This lets us keep the automated tests running frequently, so we can take community changes promptly and in smaller increments, and easily identify any problems they introduce.
 
As we're not touching their files, we're programmatically extending the existing app's logic by embedding a Rails Engine inside of it. Ruby and Rails are particularly well suited to this approach. The ability to re-open existing classes and change their behavior for all instances of the class is extremely powerful.
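To make the idea of re-opening a class concrete, here is a minimal plain-Ruby sketch. It is an illustration only, not code from GitSwarm or GitLab: the Project class and the mirrored? method are hypothetical stand-ins, and in a real Rails Engine the second definition would live in the engine's own files.

```ruby
# Hypothetical stand-in for a class defined by the upstream application.
class Project
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# An instance created before our extension is loaded.
existing = Project.new("widgets")

# Our extension, kept in a separate file in practice: re-open the class
# and add a method without editing the upstream definition.
class Project
  def mirrored?
    name.start_with?("mirror-")
  end
end

# The new method is available on every instance, old and new.
puts existing.mirrored?
puts Project.new("mirror-assets").mirrored?
```

Because Ruby resolves methods at call time, the object created before the class was re-opened picks up the new behavior as well.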
 
This approach works well if you want to do things before or after the existing logic of the application runs, or to conditionally prevent the existing logic from firing. If you find you frequently need to get into the middle of large methods in the existing application, it's likely you'll need to contribute back some refactoring or consider alternative methods of adjustment.
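One way to get before/after hooks and a conditional short-circuit in plain Ruby is Module#prepend, sketched below. This is an assumption about how such an extension could be written, not a description of GitSwarm's actual code; Project, push, archived? and MirrorExtension are all hypothetical names. It assumes a Ruby version where prepend is a public method (2.5 or later).

```ruby
# Hypothetical stand-in for upstream application logic.
class Project
  attr_reader :log

  def initialize(archived: false)
    @archived = archived
    @log = []
  end

  def archived?
    @archived
  end

  def push(ref)
    @log << "pushed #{ref}"
    :ok
  end
end

# Our extension wraps the upstream method instead of editing it.
module MirrorExtension
  def push(ref)
    # Conditionally prevent the existing logic from firing.
    return :rejected if archived?

    log << "before push"   # runs before the upstream logic
    result = super         # the upstream logic, untouched
    log << "after push"    # runs after the upstream logic
    result
  end
end

# Place the extension ahead of Project in the method lookup chain.
Project.prepend(MirrorExtension)

active = Project.new
active_result = active.push("main")

archived = Project.new(archived: true)
archived_result = archived.push("main")
```

The wrapper can only act around the call to super, which is why a change needed in the middle of a large upstream method is a sign that the method should be refactored upstream first.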
 
If you'd like more details on our approach, including what's worked well and what needs improvement, check out my upcoming webinar! Or download a copy of GitSwarm and take a peek for yourself.