March 5, 2019

How to Use Docker Volumes for Massive Builds


Docker containers have become an integral part of test, development, and Continuous Integration workflows. One of the benefits of using containers is their relatively small size and short-lived nature. Teams can generate clean, reproducible builds that are quick to deploy. Plus, there is no leftover environment data or custom tools to break the build.

Everything gets thrown away after each run. That is a benefit of Docker, but it is also a problem.

If you are managing a lot of large assets — like many Helix Core customers are — you need a way to deploy a container without slowing things down. Copying large numbers of potentially very large files — such as graphics, movies, and sounds — thousands of times can severely impact performance.

Persistent external storage, like Docker volumes, significantly reduces the time it takes to pull large amounts of code and non-code assets into a container. This accelerates your container workflow and enhances team productivity.

In this blog, we will review the different ways to use Helix Core to allow your containers to access unchanged code and other large binary assets.

Building Without Docker Containers

Before Docker, build agents like Jenkins would sync your source code from Helix Core using a Helix Core workspace. Helix Core kept track of the file revisions synced into the build agent’s workspace.

When updating the build agent’s workspace with the latest file versions, Helix Core would send only the changed files. Helix Core uses a data table (db.have) to track the revisions synced into a workspace. You can see this list by using the p4 have command.
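As a rough illustration of that incremental behavior, the commands below sync a long-lived workspace and then list the revisions the server has recorded for it. This is only a sketch: the workspace name `build-agent-ws` and the depot path are placeholders, and `run` is a hypothetical dry-run helper that prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command to stderr unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*" >&2
  else
    "$@"
  fi
}

# Sync a long-lived workspace, then show the revisions recorded for it.
# Arguments: workspace name and depot path (both placeholders).
sync_and_list() {
  ws=$1
  depot_path=$2
  run p4 -c "$ws" sync "$depot_path"   # only files that changed are sent
  run p4 -c "$ws" have "$depot_path"   # lists the revisions tracked in db.have
}

sync_and_list build-agent-ws "//depot/project/..."
```

Setting `DRY_RUN=0` runs the same commands against whichever server your `P4PORT` points at.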

Building and syncing without a container was quick and efficient. But this method lacked all the advantages a container offers, such as a clean and managed environment.

So how can you efficiently sync files into a temporary build environment?

Issues Using Virtual Machines

Companies turned to virtual machines (or VMs) for a solution. VMs provided a way to take a snapshot of the machine’s state, a kind of versioning. But this still lacked the more formal management needed to control the machine’s environment. Other drawbacks included:

  • VMs have a lot of complexity, especially when compared to containers.
  • VMs need to be provisioned and configured.
  • VMs consume more IT resources and can be more expensive than containers.

Builds can break when provisioning an entire environment on a VM just to check your latest code change. This would cause a significant delay to your development and subsequent pipelines. This is why we see teams increasingly embrace containers to develop, test, and deploy within their Continuous Integration/Continuous Delivery (CI/CD) pipelines.

Puppet, Chef, and Vagrant all offered ways to manage an environment, but Docker stands out as a way to efficiently control the entire machine’s environment.

Using Docker for Continuous Integration

Docker containers give both developers and DevOps teams more agility. They are a lightweight alternative to VMs. They can be created in seconds and killed off when they have fulfilled their purpose.

Docker has proven to deliver performance improvements to Continuous Integration workflows for teams:

  • Building web apps
  • Employing microservices architectures
  • Working on projects that don’t build or deploy with large amounts of code

With containers, you get a clean sync every time, and there are no leftover environment settings from earlier builds. There is no need for different tool chains to be installed.

So Do You Need a Helix Core Workspace for Docker?

If the container is short-lived, there are a couple of options with Helix Core. Should you use a new Helix Core workspace every time the container starts? Do you even need a Helix Core workspace if the source is synced each run?

Using a Helix Core workspace has benefits. It allows you to:

  • Control the view
  • Map source files
  • Pin revisions
  • Apply filters

The more important question: should you track the file revisions when syncing with the db.have list? Let’s review your options.

Provisioning a New Helix Core Workspace

Setting up a new workspace for each container has benefits. When running a build, you know there is nothing left over in the workspace, and new workspaces are quick to start.

But each time you create a container, you need to delete the old workspace. If you do not delete it, you can clog the Helix Core db.have list. Better still, since the workspace is only used once, skip recording the have list by using the p4 sync -p command. In the Jenkins plugin, you can choose the ‘SyncOnly’ option and uncheck ‘Populate Have List’ to avoid populating the ‘have list.’

checkout perforce(
  credential: 'myID',
  populate: syncOnly(have: false),
  workspace: manualSpec(cleanup: true,
    view: '//depot/project/... //${P4_CLIENT}/...')
)


Keen-eyed readers may have spotted a new ‘cleanup’ option in the Jenkins p4-plugin. This is not exposed in the Pipeline Syntax Snippet Generator. Setting this option to true will delete the Helix Core workspace after the initial sync.

But be careful. It is important to ensure that no subsequent pipeline steps require the Helix Core workspace. Alternatively, there is a ‘p4cleanup’ pipeline step (an alias of ‘cleanup’) that, if set to true, will remove the Helix Core workspace and all local files.
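Outside Jenkins, the same one-shot pattern can be sketched with plain p4 commands: create a throwaway workspace, populate it with p4 sync -p so no have list is written, and then delete the workspace spec. The workspace name and depot path below are placeholders, and `run` is a hypothetical dry-run helper that only prints commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command to stderr unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*" >&2
  else
    "$@"
  fi
}

# Create a throwaway workspace, sync without a have list, then delete it.
# Arguments: workspace name and depot path (both placeholders).
one_shot_sync() {
  ws=$1
  depot_path=$2
  # Create the workspace from the server's default spec
  # (its root defaults to the current directory).
  run p4 client -o "$ws" | run p4 client -i
  # -p populates the files but does not update the have list.
  run p4 -c "$ws" sync -p "$depot_path"
  # Remove the workspace spec; the synced files stay on disk for the build.
  run p4 client -d "$ws"
}

one_shot_sync ci-build-1234 "//depot/project/..."
```

As with the ‘cleanup’ option, this only works if no later step needs the workspace to exist on the server.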



Reusing a Helix Core Workspace

If you choose to reuse the Helix Core workspace, you can skip the cleanup and deletion steps. You would need to ignore the ‘have list’, as a previous sync would no longer have the relevant information. You can achieve this using the ‘Force sync’ populate option. But again, because the ‘have list’ information is not used, it may be better to uncheck the ‘Populate Have List’ option.

Note: You cannot set ‘Force Sync’ and unset ‘Populate Have List’ because this is an invalid state. You cannot ignore a list you never created.
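On the command line, the two choices correspond roughly to p4 sync -f (resend every file regardless of the have list) and p4 sync -p (never write the have list). The sketch below treats them as mutually exclusive modes, mirroring the note above. The function name, mode names, workspace name, and depot path are all made up for illustration, and `run` is a hypothetical dry-run helper.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command to stderr unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*" >&2
  else
    "$@"
  fi
}

# Sync a reused workspace in exactly one of two modes.
# Arguments: workspace name, mode (force | no-have), depot path.
sync_reused() {
  ws=$1
  mode=$2
  depot_path=$3
  case "$mode" in
    force)
      # Resend every file, whatever the have list says.
      run p4 -c "$ws" sync -f "$depot_path"
      ;;
    no-have)
      # Populate files without recording them in the have list.
      run p4 -c "$ws" sync -p "$depot_path"
      ;;
    *)
      echo "unknown mode: $mode (use force or no-have)" >&2
      return 1
      ;;
  esac
}

sync_reused docker-build-ws no-have "//depot/project/..."
```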

Building the 1 TB Project With Docker Volumes

Many Helix Core users don’t just deal with small source repositories. They also have large code bases with hundreds of thousands or millions of files. Some of these code bases include large asset files, like binaries, and a single project may exceed 1 TB. So throwing away the workspace with the container only to resync all the files simply isn’t an option.

So how can Docker scale to support large files and code bases?

The answer is this: some of the persistent data used by Docker rarely changes. Implementing Docker volumes allows you to compile some assets outside of the container, and then pull them into the build.

We are not scaling the container. Instead, we are placing some of the code, as well as non-code binary assets (graphics, etc.), outside the Jenkins build machine. This saves time compiling and copying large files. These large assets are readily available on external storage, and they’re ready for use.    

How to Use Docker Volumes With Helix Core

Follow these steps to create persistent, external storage.

  1. Mount your build area outside of the container, so it can persist through builds.

  2. Use 'AutoClean' (a Perforce reconcile/clean) if the area is shared or is at risk of pollution.

  3. Alternatively, use 'SyncOnly' if you can guarantee the Docker volume is always used by the same Jenkins job, agent, and Helix Core workspace. 
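The steps above can be sketched with the Docker CLI. This is a minimal illustration under several assumptions: the volume name `p4-build-cache`, the image `example/build-image`, and the `./build.sh` script are hypothetical, and the image is assumed to ship the p4 client with its connection settings already configured. The `autoclean` mode runs p4 clean before syncing, while `synconly` trusts the volume contents.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command to stderr unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*" >&2
  else
    "$@"
  fi
}

VOLUME=p4-build-cache        # hypothetical volume name
IMAGE=example/build-image    # hypothetical image with p4 and the toolchain

# Build inside a container whose workspace lives on a persistent volume.
# Argument: autoclean (shared or possibly polluted area) or synconly.
build_with_volume() {
  mode=$1
  case "$mode" in
    autoclean) prep='p4 clean && p4 sync' ;;  # restore the area, then update
    synconly)  prep='p4 sync' ;;              # update only
    *)
      echo "unknown mode: $mode (use autoclean or synconly)" >&2
      return 1
      ;;
  esac
  # Step 1: the build area lives on a named volume outside the container.
  run docker volume create "$VOLUME"
  # Steps 2 and 3: mount the volume, prepare the workspace, then build.
  run docker run --rm -v "$VOLUME:/workspace" -w /workspace "$IMAGE" \
    sh -c "$prep && ./build.sh"
}

build_with_volume autoclean
```

Because the volume outlives each container, only changed files need to be transferred on later runs.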

Dealing With Concurrent Access

It is vital to prevent concurrent access to a shared workspace mounted by Docker. It would be a bad idea to have multiple Docker instances building from the same set of external files — unless you are sure that generated assets or intermediary files are not visible to another container.

While Helix Core workspace reuse has its advantages, concurrent access of the same workspace will lead to issues. This is especially true if two or more executors are trying to update the same ‘have list.’
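One simple way to keep two builds out of the same workspace is a lock that a job must hold while it runs. The sketch below is an illustration rather than a Helix Core or Jenkins feature: it relies on mkdir being atomic, and the function name and lock path are made up for the example.

```shell
#!/bin/sh
# Run a command while holding an exclusive lock on a shared workspace.
# mkdir is atomic, so only one caller can create the lock directory.
# The body runs in a subshell so its variables stay local to each call.
with_workspace_lock() (
  lock_dir=$1
  shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "workspace is busy: $lock_dir" >&2
    return 1
  fi
  "$@"
  status=$?
  rmdir "$lock_dir"   # release the lock whether or not the command succeeded
  return "$status"
)

# Example with a hypothetical lock path; a real job would run its sync and
# build here instead of echo.
lock="${TMPDIR:-/tmp}/p4-workspace-demo.lock.$$"
with_workspace_lock "$lock" echo "building in shared workspace"
```

A job that is killed mid-build leaves the lock directory behind, so a production setup would also need stale-lock handling or a locking mechanism provided by the CI server.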

Put Power Into Continuous Integration With Helix Core

Docker is an incredibly powerful tool to accelerate Continuous Integration and give developers near-instant feedback. It also provides better fidelity with what you are running in production. By using external storage, like Docker volumes, to persist some of the code and binaries, you can realize these benefits even with massive projects.
