May 27, 2010

JournalReader: Journal and Checkpoint Analysis

Integration

Now, if you are of a fearful disposition, it is definitely time to press the back button on your browser, 'cause there are lots of dragons here.

<waiting>

Still here?

Alright, then let's begin.

When I have my consulting hat on, I occasionally need to analyze a customer's checkpoint. Since I wanted to understand the Perforce journal and checkpoint format anyway, I decided to write a little Java program that would let me do the analysis myself.

Now, this is not an (officially) supported tool, but a framework used by Perforce consultants and, potentially, by curious and motivated administrators and programmers.

The idea, right from the start, was to be able to run different kinds of analyzers and manipulators on the checkpoint data: to collect statistics, but also to rewrite the checkpoint, and ultimately to dump the checkpoint and journal into an SQL database (hence the choice of Java, with its JDBC drivers). This is different from the most excellent tool P4toDB, written by my friend and esteemed colleague Jason Gibson: JournalReader is a static analysis tool for an arbitrary checkpoint, while P4toDB is mostly designed to continually update an SQL database from a live Perforce Server via 'p4 export'. This makes P4toDB more useful for ongoing analysis of a Perforce server; on the other hand, it requires P4D 2009.2, since it uses the schema and journal extraction interfaces introduced in that release.

JournalReader has a static schema compiled in. It can be used against a checkpoint from any Perforce version released so far; the flip side is that you will need to wait for an update of the tool before you can analyze checkpoints from a newer release (don't worry, 2010.1 is already supported).

One fact that helps a lot with journals and checkpoints is that they are text files and therefore easy to parse. So why not simply use 'grep' and 'sed'?

The problem is that a single entry in the journal or checkpoint can span several lines; just think of the description field of a change. In the journal format, strings are surrounded by @ symbols (see the discussion in the post The P4 command line client in colour), which makes the format challenging to parse with line-oriented tools like grep.
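To see why a record-aware parser is needed, here is a minimal Java sketch of a journal tokenizer. It is not JournalReader's actual parser; it assumes a simplified format in which fields are separated by single spaces, strings are wrapped in @...@ with a literal @ written as @@, and a newline ends a record only when it appears outside a quoted string. The class name JournalTokenizer and the sample record are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of a journal record tokenizer (not JournalReader's parser).
 * Assumes: fields separated by spaces, strings wrapped in @...@ with a literal
 * @ written as @@, and a newline outside a string terminating the record.
 */
class JournalTokenizer {
    private static final int NONE = -2; // marks an empty push-back buffer
    private final Reader in;
    private int pushedBack = NONE;

    JournalTokenizer(Reader in) {
        this.in = in;
    }

    private int read() throws IOException {
        if (pushedBack != NONE) {
            int c = pushedBack;
            pushedBack = NONE;
            return c;
        }
        return in.read();
    }

    /** Returns the fields of the next record, or null at end of input. */
    List<String> nextRecord() throws IOException {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inString = false;
        boolean inField = false;
        int c;
        while ((c = read()) != -1) {
            if (inString) {
                if (c != '@') {
                    field.append((char) c);      // newlines are kept inside strings
                } else {
                    int next = read();
                    if (next == '@') {
                        field.append('@');       // @@ is an escaped @
                    } else {
                        inString = false;        // closing delimiter
                        pushedBack = next;       // re-examine the following char
                    }
                }
            } else if (c == '@') {
                inString = true;
                inField = true;
            } else if (c == ' ' || c == '\n') {
                if (inField) {                   // a separator ends the current field
                    fields.add(field.toString());
                    field.setLength(0);
                    inField = false;
                }
                if (c == '\n') {
                    return fields;               // end of record
                }
            } else if (c != '\r') {
                field.append((char) c);          // unquoted value, e.g. a number
                inField = true;
            }
        }
        if (inField) {
            fields.add(field.toString());
        }
        return fields.isEmpty() ? null : fields;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative input: the last field spans two lines and contains an escaped @.
        String journal = "@pv@ 1 @db.example@ 42 @first line\nsecond line @@ here@\n";
        JournalTokenizer tokenizer = new JournalTokenizer(new StringReader(journal));
        List<String> record;
        while ((record = tokenizer.nextRecord()) != null) {
            // one record with 5 fields, the last one spanning two lines
            System.out.println(record.size() + " fields: " + record);
        }
    }
}
```

Because the tokenizer tracks whether it is inside a string, a multi-line change description comes back as a single field instead of being split into several "records", which is exactly where grep and sed go wrong.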

I will discuss the inner workings and some of the more interesting analysis and manipulation tools I have written (they are called "actions") in a later post. For now, let me simply demonstrate some of the fun applications of JournalReader:

java -jar journalReader.zip checkpoint.911.gz

This command will read the checkpoint and print it out on stdout. That sounds quite boring and entirely useless, I know (zcat would have been much faster), but it is good practice to have an identity function to verify the inner workings of a tool. Despite the trivial outcome, there is a lot going on inside the JournalReader:

  • The checkpoint is decompressed and parsed, record by record
  • Each record is identified by its type (insert, update, delete) and by the table it refers to
  • An internal schema object is attached to the record, which allows subsequent actions to access each field of the record by name and type
  • An action (here the default action, PrintAction) is performed on each record
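The last three steps (identifying the record, attaching a schema, handing it to an action) can be sketched in Java as follows. This is an illustration under stated assumptions, not JournalReader's code: it assumes each record starts with a two-letter operation code (pv, rv, dv, vv for put, replace, delete and verify value, mirroring the putValue, replaceValue, deleteValue and verifyValue callbacks of the Action interface), followed by a table version and the table name, and it uses a tiny hand-written schema (db.counters with name and value fields, for illustration only) in place of the compiled-in one. RecordHandler and RecordDispatcher are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callbacks, one per value operation (not JournalReader's Action API). */
interface RecordHandler {
    void put(String table, Map<String, String> fields);
    void replace(String table, Map<String, String> fields);
    void delete(String table, Map<String, String> fields);
    void verify(String table, Map<String, String> fields);
}

/** Names the values of a tokenized record via a schema and routes it by operation. */
final class RecordDispatcher {
    private final Map<String, List<String>> schema; // table -> ordered field names
    private final RecordHandler handler;

    RecordDispatcher(Map<String, List<String>> schema, RecordHandler handler) {
        this.schema = schema;
        this.handler = handler;
    }

    /** Assumed record layout: [op, tableVersion, table, value1, value2, ...]. */
    void dispatch(List<String> record) {
        String op = record.get(0);
        String table = record.get(2);
        List<String> names = schema.get(table);
        List<String> values = record.subList(3, record.size());
        if (names == null || names.size() != values.size()) {
            throw new IllegalArgumentException("no matching schema for " + table);
        }
        // Pair each value with its field name so handlers can access fields by name.
        Map<String, String> named = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            named.put(names.get(i), values.get(i));
        }
        switch (op) {
            case "pv": handler.put(table, named); break;
            case "rv": handler.replace(table, named); break;
            case "dv": handler.delete(table, named); break;
            case "vv": handler.verify(table, named); break;
            // Marker records are out of scope for this sketch.
            default: throw new IllegalArgumentException("unsupported operation " + op);
        }
    }
}

class DispatchDemo {
    public static void main(String[] args) {
        Map<String, List<String>> schema = new HashMap<>();
        schema.put("db.counters", Arrays.asList("name", "value")); // illustrative only
        RecordHandler printer = new RecordHandler() {
            public void put(String t, Map<String, String> f) { System.out.println("put " + t + " " + f); }
            public void replace(String t, Map<String, String> f) { System.out.println("replace " + t + " " + f); }
            public void delete(String t, Map<String, String> f) { System.out.println("delete " + t + " " + f); }
            public void verify(String t, Map<String, String> f) { System.out.println("verify " + t + " " + f); }
        };
        // prints: put db.counters {name=change, value=911}
        new RecordDispatcher(schema, printer)
                .dispatch(Arrays.asList("pv", "1", "db.counters", "change", "911"));
    }
}
```

Keeping the schema lookup separate from the handler means a new action only has to deal with named fields, never with raw positions in the record.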

Not convinced? Try this for size:

java -jar journalReader.zip \
     --action journal.action.ClientWorkspaceReporter checkpoint.911.gz

Now the output is a list of all workspaces with their age, the number of files synced to each workspace and the number of files opened in it. Running the same operation against a large live Perforce Server would lock that server for quite a while. It would also not be idempotent: retrieving the number of open files for a workspace updates that workspace's 'Accessed' field, rendering any later analysis of obsolete workspaces useless.

Now your administrator can analyze the client workspaces at his or her leisure and decide which workspaces need to be cleaned up or deleted.
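Because everything happens offline on the checkpoint file, a report like this boils down to a single pass with a map keyed by workspace name. The following Java sketch counts synced files per workspace. It is not the ClientWorkspaceReporter source; the class name HaveCounter is hypothetical, and it assumes that the first value of each db.have record is a client-syntax path such as //workspace/dir/file, from which the workspace name can be read. The sample values are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: counts synced files per workspace from db.have values,
 * assuming the first value is a client-syntax path such as //ws/dir/file.c.
 */
final class HaveCounter {
    private final Map<String, Integer> filesPerWorkspace = new TreeMap<>();

    void accept(List<String> haveValues) {
        String clientPath = haveValues.get(0);
        if (!clientPath.startsWith("//")) {
            return; // not client syntax: skip rather than guess
        }
        // The workspace name is the first path component after the leading //.
        int slash = clientPath.indexOf('/', 2);
        String workspace = slash < 0 ? clientPath.substring(2) : clientPath.substring(2, slash);
        filesPerWorkspace.merge(workspace, 1, Integer::sum);
    }

    Map<String, Integer> counts() {
        return filesPerWorkspace;
    }

    public static void main(String[] args) {
        HaveCounter counter = new HaveCounter();
        counter.accept(Arrays.asList("//ws1/src/a.c", "//depot/src/a.c", "3"));
        counter.accept(Arrays.asList("//ws1/src/b.c", "//depot/src/b.c", "1"));
        counter.accept(Arrays.asList("//ws2/a.c", "//depot/src/a.c", "2"));
        System.out.println(counter.counts()); // prints {ws1=2, ws2=1}
    }
}
```

A TreeMap keeps the report sorted by workspace name, which makes it easy to eyeball; none of this touches the server, so no 'Accessed' field is updated.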

ClientWorkspaceReporter implements journal.action.Action, which defines the following contract:

public interface Action {
 public void help();
 public void start(Options options) throws Exception;
 public void finish() throws Exception;
 public String[] parseArgs(String[] args);

 public void putValue(JournalEntry entry) throws Exception;
 public void replaceValue(JournalEntry entry) throws Exception;
 public void deleteValue(JournalEntry entry) throws Exception;
 public void verifyValue(JournalEntry entry) throws Exception;
 public void commitMarker(JournalEntry entry) throws Exception;
 public void flushMarker(JournalEntry entry) throws Exception;
}

The class journal.action.BaseAction provides default implementations of all these methods. With this base class, writing a filtering Action that only prints certain entries from a checkpoint is a trivial 5-line job.
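The base-class pattern is what keeps such a filter short. The sketch below uses hypothetical stand-ins (Entry, NoOpAction, TableFilter) rather than the real journal.action classes, whose internals are not shown here: the base class supplies do-nothing defaults for every callback, so the filter overrides only the one it cares about.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a parsed journal entry. */
final class Entry {
    final String op;
    final String table;
    final List<String> values;

    Entry(String op, String table, List<String> values) {
        this.op = op;
        this.table = table;
        this.values = values;
    }

    @Override
    public String toString() {
        return op + " " + table + " " + values;
    }
}

/** Hypothetical base class: every callback defaults to doing nothing. */
abstract class NoOpAction {
    void start() {}
    void finish() {}
    void putValue(Entry e) {}
    void replaceValue(Entry e) {}
    void deleteValue(Entry e) {}
    void verifyValue(Entry e) {}
}

/** A filter that passes on only the put-value entries of one table. */
final class TableFilter extends NoOpAction {
    private final String table;
    private final Consumer<Entry> sink;

    TableFilter(String table, Consumer<Entry> sink) {
        this.table = table;
        this.sink = sink;
    }

    @Override
    void putValue(Entry e) {
        if (e.table.equals(table)) {
            sink.accept(e);
        }
    }

    public static void main(String[] args) {
        TableFilter filter = new TableFilter("db.counters", System.out::println);
        filter.start();
        filter.putValue(new Entry("pv", "db.counters", Arrays.asList("change", "911")));
        filter.putValue(new Entry("pv", "db.have", Arrays.asList("//ws1/a.c", "//depot/a.c", "1")));
        filter.finish();
        // prints only: pv db.counters [change, 911]
    }
}
```

Handing the filter a Consumer instead of hard-wiring System.out keeps it easy to test; in a command-line tool you would simply pass System.out::println, as the main method does.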

If you are interested, download JournalReader from the public depot. It comes with the usual "No Warranty" license, but it is safe to use for analysis. Checkpoint manipulation is a completely different kettle of fish, which we will get into in another post :-).

There is a help system built in (try java -jar journalReader.zip --help), since I can never remember the options myself.

JournalReader is available on the public depot (public.perforce.com:1666) under //guest/sven_erik_knop/java/JournalReader/, or you can browse it through P4Web.

Happy hacking.

Sven Erik