April 11, 2012

A Maven Dependency Management Example

This is the second post in a three-part series about managing dependencies with Maven.

In a recent article, I described a few high-level ways to manage a complex dependency tree in your projects. Now I'll go through a simple example using a Maven repository to help build and maintain a Java project.

Background

When Perforce rolled out the Remote Administration program, I helped to write part of the reporting tools. Part of the program is periodic reports on Perforce server performance. I designed the reports in BIRT, an open source report engine, and then wrote a Java program to invoke BIRT and generate the report automatically. 

BIRT is a moderately complex product by itself, so my Java program has to pull in the right set of BIRT libraries to compile and run. My program also needs log4j, a logging system.

Managing Dependencies in Ant

When I first built the project, I wrote an Ant build script to compile, package, and run the program. The compilation task needs a single log4j JAR file and all the JARs included with BIRT in the classpath. So I downloaded these libraries and checked them in to a 3rdParty folder in our software depot.

The Ant script declares two properties to reference the location of these libraries, and then includes them in the classpath for compilation:

<property name="log4j" location="../../../3rd_party/log4j/log4j-1.2.15.jar"/>
<property name="birtlib" location="../../../3rd_party/BIRT/birt-runtime-2_6_1/ReportEngine/lib"/>

<javac srcdir="${src}"
            destdir="${bin}">
            <classpath>
                <pathelement location="${log4j}"/>
                <fileset dir="${birtlib}">
                    <include name="**/*.jar"/>
                </fileset>
            </classpath>
</javac>

It seemed OK when I first wrote it. After all, this is a Java program that won't change often, and there were only a few of us working on it.

But a couple of problems quickly surfaced:

  • The relative paths to the libraries in the depot are fragile. Not everyone maps things into their workspace in the same place, and so you start thinking about declaring environment variables to identify locations, and lots of other techniques that I'm not fond of.
  • An important piece of information was buried at the end of the two property declarations: the versions of log4j and BIRT that I was using (1.2.15 and 2.6.1, respectively). Nothing declared those versions explicitly, so they were easy to overlook.
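
For illustration, the environment-variable workaround mentioned in the first bullet might look something like the sketch below. The variable name THIRD_PARTY_HOME is hypothetical (it is not from the original build script), and every developer would have to set it before building.

```xml
<!-- Expose operating system environment variables as env.* properties -->
<property environment="env"/>

<!-- THIRD_PARTY_HOME is a hypothetical variable pointing at the synced 3rd-party folder -->
<property name="third.party" location="${env.THIRD_PARTY_HOME}"/>

<!-- Same libraries as before, now resolved relative to that variable -->
<property name="log4j" location="${third.party}/log4j/log4j-1.2.15.jar"/>
<property name="birtlib" location="${third.party}/BIRT/birt-runtime-2_6_1/ReportEngine/lib"/>
```

This removes the fragile relative paths, but it adds a setup step for every workspace, and the library versions are still hidden inside path strings.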

Stream Paths?

When Perforce streams was released, I thought about using streams to manage the dependencies on log4j and BIRT. I could easily set up a stream with these paths (simplified for clarity):

share ...
import lib/log4j/... //3rdParty/log4j/1.2.15/...
import lib/BIRT/... //3rdParty/BIRT/2.6.1/...

That's a little better. I declare the dependencies centrally, and I can include the version information somewhere prominent. The libraries go into a predictable location in the workspace, so my build scripts are more consistent.

But it's still not perfect.

  • This technique doesn't really scale well when you have, say, 75 dependencies to declare. It's harder to manage and causes a performance impact as well (all those view lines make the Perforce server do extra work).
  • By default, changes in the paths in the main stream will flow down to all child streams instantly, without any throttling. Does the consultant working on the next version of the program in a development stream really want her dependencies to change right away? Or does she want to control when that happens?
  • Dependency trees are tough to manage. For example, BIRT itself actually depends on other libraries like commons-codec. So if I go with a straight directory path approach, I need to explicitly add these other dependencies to my stream paths or just bundle them up into the BIRT folder.
  • Although you can use the spec depot and other tools to keep track of how stream paths changed over time, and roll back a workspace so that it has older dependencies, the versioning on stream definitions isn't quite as powerful or easy to use as the versioning on regular files.

Maven

So finally I tried Maven. First, I set up a skeletal Maven project by running mvn archetype:generate and picking a generic template. That lays out a predictable directory structure for a project that uses Java source code and packages into a JAR file. (I'm omitting a few details for brevity; Maven can be quite a sophisticated tool and I'm really only scratching the surface here.) It also generates the POM (Project Object Model) for this project.
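
As a rough idea of what a freshly generated POM contains, here is a minimal sketch. The groupId, artifactId, and version are hypothetical placeholders rather than the coordinates of the actual reporting project, and the exact output depends on the archetype you pick.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- Hypothetical coordinates; substitute your own -->
    <groupId>com.example.reports</groupId>
    <artifactId>report-runner</artifactId>
    <version>1.0-SNAPSHOT</version>

    <!-- Package the compiled classes into a JAR file -->
    <packaging>jar</packaging>

    <dependencies>
        <!-- Dependency declarations go here -->
    </dependencies>
</project>
```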

Since the POM is just an XML file, I keep it in the project folder and check it in. That means it's fully versioned in Perforce. I can run Time-Lapse View to see how my dependencies change over time, and use Revision Graph to see how the file is branched and merged. If I change the POM in the main branch, a colleague working in a development branch will see that available as a pending merge. She can choose how and when to accept my dependency changes in her branch, just using regular merge tools.

The log4j Dependency

In my Maven POM, I declared a dependency on log4j:

<dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.15</version>
</dependency>

log4j, like a lot of open source tools, is hosted in a public Maven repository that's available by default. Terrific - that's all I need to do, and now log4j is available for building and running my program. If I need to update to a newer version, all I need to do is update the version declaration.
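
One common way to keep that version in a single, visible place is a POM property. The sketch below is an assumption about how you might arrange it, not necessarily how the original project was set up; the property name log4j.version is arbitrary.

```xml
<properties>
    <!-- Arbitrary property name; change the value here to upgrade log4j -->
    <log4j.version>1.2.15</log4j.version>
</properties>

<dependencies>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <!-- Resolved from the property declared above -->
        <version>${log4j.version}</version>
    </dependency>
</dependencies>
```

With this layout, a version bump shows up as a one-line change near the top of the POM, which is easy to spot in a diff or a merge.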

The BIRT Dependency: Making a Local Maven Repository

BIRT wasn't quite as easy. There are no public repositories that host BIRT in a way that's easy to consume. So I set up my own repository using Archiva.

Setting up a private repository is something that most Maven users want to do eventually. It lets you publish your own libraries as artifacts available to other teams in your company, and it also proxies for some of the public repositories, potentially improving performance.

To use Archiva, I added a little snippet to my Maven settings file, declaring my own repository as the default mirror.

<mirror>
      <id>archiva.default</id>
      <url>http://my.maven.repo:8080/archiva/repository/internal/</url>
      <mirrorOf>*</mirrorOf>
</mirror>
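
For context, the mirror element sits inside a mirrors block in the settings file, which by default lives at ~/.m2/settings.xml for each user. A minimal complete file might look like this sketch, assuming no other settings are needed:

```xml
<!-- ~/.m2/settings.xml (default per-user location) -->
<settings>
    <mirrors>
        <mirror>
            <id>archiva.default</id>
            <url>http://my.maven.repo:8080/archiva/repository/internal/</url>
            <!-- "*" sends requests for every repository through this mirror -->
            <mirrorOf>*</mirrorOf>
        </mirror>
    </mirrors>
</settings>
```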

Next I imported all of the BIRT library JAR files into Archiva, in this case just using the web interface. In order to make my POM simpler, I also installed a standalone BIRT project with its own POM that declares all of the BIRT library files and their dependencies. Here's a snippet from that POM, showing the artifact ID and a couple of the dependencies:

<groupId>org.eclipse.birt</groupId>
<artifactId>birt-container</artifactId>
<version>2.6.1</version>
<packaging>pom</packaging>

<!-- Excerpt only: the full POM declares every BIRT library and its dependencies -->
<dependencies>
        <dependency>
            <groupId>birt</groupId>
            <artifactId>coreapi</artifactId>
            <version>2.6.1</version>
        </dependency>
        <dependency>
            <groupId>commons-codec</groupId>
            <artifactId>commons-codec</artifactId>
            <version>1.0</version>
        </dependency>
</dependencies>

By installing this POM, I can easily reference BIRT - all of its libraries and dependencies - as a single logical artifact in my project's POM:

<dependency>
        <groupId>org.eclipse.birt</groupId>
        <artifactId>birt-container</artifactId>
        <version>2.6.1</version>
        <type>pom</type>
</dependency>

Building

Now that I have a POM that references the log4j and BIRT libraries, and I have BIRT installed in my local repository, I can build and run my program just by invoking normal Maven goals:

mvn package

The POM is short and sweet. Maven will automatically pull dependencies from my repository as necessary.

Recap

So how has Maven helped me manage dependencies in this simple project?

  • I've got a simple text file with all of my dependency declarations. This file is versioned, which means I can branch it, merge it, see the history, and roll back as necessary.
  • I can update dependency versions by changing a single line in the POM. Downstream streams or branches can choose when and how to accept POM changes.
  • Maven automatically injects dependencies where necessary for building, testing, and running a project. I don't need to worry about workspace paths in my build scripts anymore.
  • Maven scales well to complex dependency trees because it resolves transitive dependencies automatically. That's how I was so easily able to wrap all the BIRT dependencies into a single declaration in my project POM.
  • The stream, branch, and workspace views in Perforce are simple: good for users and good for performance.

And my apologies to the other consultants - I haven't shared these improvements yet. 

The Right Tool for the Job

In my last article and this one, I've laid out a few approaches for managing dependencies in a software project. There are a huge number of home-brew and off-the-shelf solutions for dependency management, but I think a lot of them fit in the general picture I've laid out.

Which tool you use will depend greatly on your projects, environment, and development teams. But as a piece of general advice, I'd recommend looking at dedicated dependency management systems sooner rather than later. Even in the simple Java program I described in this article, managing dependencies gets complicated sooner than you'd expect.

Don't hesitate to ask for help: Perforce Consulting and Support can give you advice and hands-on assistance.

Read part 1 and part 3 of this series on managing dependencies with Maven.