April 1, 2010

Converting the Audit Log to SQL

Version Control

The Perforce server's audit log records information about access to the content of depot files. For tracking purposes, it answers the questions of who, what, when, where, and how. Every action or p4 command that exposes content to a user creates one or more entries in the log. This data can be important to track and understand, so in this post I'll illustrate how to take the text-formatted audit log and convert it into an SQL database, along with examples of how to use the results.

The log has one entry per line, per file accessed. A single Perforce command can generate multiple lines. There are two line formats you'll encounter: the first is for a direct client-to-server connection, and the second is for when a proxy sits between the two.

date time [email protected]         clientIP command     file#rev
date time [email protected] proxyIP/clientIP command-src file#rev

Audit log commands that you might encounter are: branch, annotate, diff, deliver, merge, print, revert, resolve, sync and grep. Note that "deliver" is a server -> proxy transfer, and that the IP can be the string 'unknown' if there was a problem.

Here are two examples. Notice that special characters are encoded in the file name. See the Limitations on characters in filenames and entities section of the command reference for details.

2010/03/26 11:57:26 user1@ws1 1.2.3.4 sync //depot/file%23#1
2010/03/26 16:10:09 user1@ws1 10.0.0.1/1.2.3.4 print-proxy //depot/abc#1
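To make the two formats concrete, here's a minimal Python sketch of parsing both line variants. The field names, the regex, and the sample user/client/IP values are mine, not Perforce's; treat it as a starting point, not the auditlog2sql.pl implementation.

```python
import re

# One regex covers both variants: the proxy form adds "proxyIP/" in
# front of the client IP and uses a hyphenated command like "print-proxy".
LINE = re.compile(
    r'^(?P<date>\S+) (?P<time>\S+) '
    r'(?P<user>[^@ ]+)@(?P<client>\S+) '
    r'(?:(?P<proxyIP>\S+)/)?(?P<clientIP>\S+) '
    r'(?P<cmd>[\w-]+) '
    r'(?P<file>.+)#(?P<rev>\d+)$'
)

def parse(line):
    """Return a dict of fields for one audit log line, or None."""
    m = LINE.match(line.rstrip('\n'))
    if not m:
        return None
    d = m.groupdict()
    # Special characters in file names arrive %-encoded (e.g. %23 for '#').
    d['file'] = re.sub(r'%([0-9A-Fa-f]{2})',
                       lambda h: chr(int(h.group(1), 16)), d['file'])
    d['rev'] = int(d['rev'])
    return d

row = parse('2010/03/26 11:57:26 user1@ws1 1.2.3.4 sync //depot/file%23#1')
# row['file'] is '//depot/file#' (decoded) and row['rev'] is 1
```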

Now that you have an idea of what the audit log is about, you can take a look at the script (auditlog2sql.pl) I wrote to parse the log and write the data to an SQLite database. Its only dependency is the sqlite3 executable. There's also a version that uses Perl's DBI (Database Interface) module, which makes it easier to adapt the script for use with other databases. There isn't much difference between the two, other than the DBI version being about 18% slower (something I haven't looked into). There's nothing fancy going on in the script, but it should be useful as a reference.

If your server doesn't already have audit logging enabled, you'll need to either set the P4AUDIT environment variable or pass the '-A' flag when restarting the server.
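For example (the root, log path, and port below are illustrative; adjust them for your installation):

```shell
# Option 1: point P4AUDIT at the log file before starting the server
export P4AUDIT=/p4_logs/audit.log

# Option 2: name the audit log on the command line with -A
p4d -r /p4_root -A /p4_logs/audit.log -p 1666 -d
```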


Now on to the examples. The first thing we need is some log data to process. To start off, we'll create a new file with two revisions:

p4 add file
p4 submit -d 'rev 1' file
p4 edit file
p4 submit -d 'rev 2' file

One thing to note: if a user adds and submits a new file, there won't be an audit log entry for it. The rationale is that the server hasn't actually sent the user any file content - the audit log tracks access.

Now if we remove the file and re-sync, it'll get logged.

p4 sync file#none
p4 sync file#head
cat /p4_logs/audit.log
2010/03/31 13:27:46 user1@ws1 1.2.3.4 sync //depot/file#2

Now create a few more log entries.

p4 sync file#1
p4 sync file

The converter script can be called in two ways. The first is to specify both the name of the log file and the resulting database. The second is to have the script read the log from its STDIN and write to the default database file 'p4audit.db'. The database and its table are created if they don't exist; otherwise they are used as-is.

cmd> auditlog2sql.pl   /p4_logs/audit.log p4audit.db
cmd> auditlog2sql.pl < /p4_logs/audit.log

This is what the generated schema looks like:

cmd> sqlite3 p4audit.db .schema
CREATE TABLE p4audit ( date INT, user TEXT, client TEXT, proxyIP TEXT, clientIP TEXT, cmd TEXT, cmdSrc TEXT, file TEXT, rev INT );
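As a sketch of how rows land in that schema, here's the same table driven from Python's built-in sqlite3 module. The schema is the one shown above; the sample row values are made up for illustration.

```python
import calendar
import sqlite3
import time

conn = sqlite3.connect(':memory:')  # you'd use p4audit.db in real use
conn.execute("CREATE TABLE p4audit ( date INT, user TEXT, client TEXT,"
             " proxyIP TEXT, clientIP TEXT, cmd TEXT, cmdSrc TEXT,"
             " file TEXT, rev INT )")

# The date column holds Unix time, which is why the queries in this
# post display it with DATETIME(date, 'unixepoch', 'localtime').
stamp = calendar.timegm(time.strptime('2010/03/31 13:27:46',
                                      '%Y/%m/%d %H:%M:%S'))
conn.execute("INSERT INTO p4audit VALUES (?,?,?,?,?,?,?,?,?)",
             (stamp, 'user1', 'ws1', None, '1.2.3.4',
              'sync', None, '//depot/file', 2))

for row in conn.execute("SELECT user, file, rev FROM p4audit"
                        " WHERE cmd = 'sync'"):
    print(row)  # -> ('user1', '//depot/file', 2)
```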

Now we can start playing with the data. The first thing that comes to mind is that we can now tell the difference between the 'have' and the 'had' of a workspace.

cmd> sqlite3 -column p4audit.db
sqlite> SELECT DATETIME( date, 'unixepoch', 'localtime' ), rev FROM p4audit WHERE cmd = 'sync' AND user = 'user1' AND client = 'ws1' AND file = '//depot/file';
2010-03-31 13:27:46    2
2010-03-31 13:29:49    1
2010-03-31 13:36:33    2

If all you care about is the sync history, you might do the following, then keep the 'had' table around for every chunk of audit log, for as long as you care to know the history of a workspace.

CREATE TABLE had AS SELECT date, user, client, file, rev FROM p4audit WHERE cmd = 'sync';
DROP TABLE p4audit;

Want to know who has seen files in a particular directory?

SELECT DISTINCT user FROM p4audit WHERE file LIKE '//depot/secrets/%';

Since the audit log grows as fast as your users pull data, I expect it will be rotated frequently. One way to use auditlog2sql is to convert each rotated log into a separate, named database, then extract data from each one into another, smaller database. SQLite can attach a database to the current session, so your audit pipeline might look like this: rotate the text-based audit log, convert it to a database, open the database, attach the database to extract into, extract the data, close, and repeat. You could keep the source database files on some offline storage system and use the smaller ones for whatever reports you need.

Here's an example of how to convert two audit logs, connect the resulting databases and run a query between them. The active database you're using when in the SQLite shell is called 'main' by default.

cmd> ./auditlog2sql.pl audit1.log audit1.db
cmd> ./auditlog2sql.pl audit2.log audit2.db
cmd> sqlite3 audit2.db
sqlite> ATTACH 'audit1.db' AS prior;
sqlite> SELECT DISTINCT main.p4audit.user AS user FROM main.p4audit JOIN prior.p4audit ON main.p4audit.user = prior.p4audit.user;
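If you'd rather script that attach-and-join step than drive the sqlite3 shell by hand, the same flow looks like this in Python. The database names match the ones above; the single-column p4audit stand-in and its sample users are made up to keep the sketch short.

```python
import sqlite3

# Build two tiny stand-ins for converted audit databases.
for name, rows in [('audit1.db', [('user1',), ('super',)]),
                   ('audit2.db', [('super',), ('user2',)])]:
    db = sqlite3.connect(name)
    db.execute("CREATE TABLE IF NOT EXISTS p4audit (user TEXT)")
    db.executemany("INSERT INTO p4audit (user) VALUES (?)", rows)
    db.commit()
    db.close()

# Open the current period, attach the prior one, and find users
# who show up in both logs.
db = sqlite3.connect('audit2.db')
db.execute("ATTACH 'audit1.db' AS prior")
both = [r[0] for r in db.execute(
    "SELECT DISTINCT main.p4audit.user FROM main.p4audit"
    " JOIN prior.p4audit ON main.p4audit.user = prior.p4audit.user")]
print(both)  # only 'super' appears in both sample databases
```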

Here's a more useful example - it shows how to record all the file content your super user has ever seen. The first block is the bootstrap; the next one is the form that every subsequent import will follow.

cmd> sqlite3 audit1.db
sqlite> ATTACH 'super_seen.db' AS super_seen;
sqlite> CREATE TABLE super_seen.p4audit AS SELECT * FROM p4audit WHERE user = 'super';
sqlite> .exit
cmd> sqlite3 audit2.db
sqlite> ATTACH 'super_seen.db' AS super_seen;
sqlite> INSERT INTO super_seen.p4audit SELECT * FROM p4audit WHERE user = 'super';
sqlite> .exit

If you're not keen on storing the converted audit data in binary form, you can make a text-based SQL export of the database like this:

sqlite3 p4audit.db .dump

For those curious about how P4toDB relates to this: the difference is that P4toDB reflects the current state of the server (and all of its metadata), while the audit log captures content-access history. P4toDB provides one database per P4D instance, while auditlog2sql creates many.

If you come up with interesting audit-log queries, I'd like to hear about them!