February 8, 2010

Introducing P4toDB - the Perforce integration with SQL databases.


I'm pleased to announce beta availability of a new tool for administrators called P4toDB. P4 -> to -> DataBase is a one-way replicator of Perforce metadata into an SQL database. This affords you more accessible metadata, and all the amenities of the SQL toolchain such as an existing employee skillset, the SQL server itself, ODBC drivers, integrations with reporting tools, programming libraries and so forth. You can also replicate multiple source Perforce servers into a single destination SQL database, so the sky's the limit for what you can do with it!

You can find P4toDB under the Related Software page under the Tools & Utilities heading.

Supported databases are: Postgres 8, Microsoft SQL Server 2005, MySQL 5 (with InnoDB), IBM DB2 9 and Oracle 11.

Here are some minimal quick-start steps for setting up a test environment. You really ought to read the README.txt at some point, though. If you're a Perforce administrator without much knowledge of database administration, these steps will get you going to play around with it, but I recommend finding your DBA and collaborating since this tool will require some database-specific knowledge - installation defaults won't always cut it. E.g. the default maximum number of rows in a table might be too low.

I'm going to use MySQL syntax and command-line interface for the examples, since many people use that tool and it's easy to set up. If you don't have a convenient SQL server already available, then Postgre is another good low-barrier-to-entry choice. The rest of the supported databases are of course excellent options if they are available.


Step 0: Get a suitable Java 6 runtime.

Sun's JRE or JDK are fine. The JDK comes with some extra diagnostic tools which are nice, but not necessary. If you're unsure what to get, look for the Java Platform, Standard Edition. Now's a good time to get it anyways since Sun just released update 18 recently.

Step 1: Set up a new Perforce server.

Normally you'd go to the download page, but for the sake of expediency, go right to the 2009.2 section of the ftp site, browse to your platform's directory and grab the server executable "p4d". Put it in a temporary directory and start it up in the foreground;

p4d -r . -vserver=3 -Llog.txt -p localhost:5000

Leave off the "-Llog.txt" if you want to have the log printed to the console so you can see what's going on behind the scenes. Then, follow-up with taking a checkpoint, since that's the initial data that P4toDB uses to populate the SQL database. Note that uncompressed journals and checkpoints are a requirement.

p4 -p localhost:5000 admin checkpoint

Step 2: Configure the SQL server.

If you're trying this out on your own machine and don't already have an SQL server installed, go to the vendor's website and find the correct package. It's pretty straight-forward using the default settings. After that, connect to the database with administrative credentials, create the database and grant the P4toDB user access.

CREATE DATABASE p4todb USING CHARACTER SET utf8; GRANT create, create view, index, insert, update, delete, select ON p4todb.* to 'p4todb'@'localhost' IDENTIFIED BY 'password';

You could also be really blunt about it and "grant all privileges ...". Note that I specified the UTF-8 character set - unless you have a good reason not to do this, I recommend doing it to avoid unicode issues down the road.

Step 3: Get the SQL database's JDBC 4 driver.

These are not always made very visible by the SQL vendor. Sometimes they have their own product page, and sometimes they're hidden away in some obscure package. Here are links to MySQL and Postgre's.

Step 4: Prepare P4toDB.

Unzip p4todb.zip, go to the extracted directory and check out the README, release notes and sample queries. Note that you will launch P4toDB via a platform-specific wrapper script (which you'll probably only ever change in order to tell the JRE to use more memory). Lastly, put the JDBC JAR file into P4toDB's "lib" directory (where you'll also find a copy of the P4-Java JAR).

Step 5: Create a configuration file for P4toDB.

Normally you'd start off with the template (p4todb --sample-config), but these four lines will suffice for this scenario. Call the file "test.properties".

P4PORT = localhost:5000 P4USER = perforce DRIVER = com.mysql.jdbc.Driver URL    = jdbc:mysql://localhost/p4todb?user=p4todb&password=password

The URL is vendor-specific. The README gives examples for other databases.

Step 6: Run it!

p4todb --config-file test --checkpoint 1

Note that if you have a large amount of data, that this could take quite a while. You should choose the tables you replicate carefully. If you want to monitor progress along the way, you can either query the P4toDB configuration table (SELECT `number`, `offset` FROM `p4todb_cfg`; where the offset is the number of bytes into the checkpoint) or turn logging to FINER in the configuration file. To give you a rough idea of the speed, my desktop machine does about 50GB a day. Yes, this is definitely something I'd like to improve upon in the future.

Step 7: See that that slick replication thing is happening. (*puts on a pair of sunglasses*)

We first take a look at what's currently in the SQL database, then update something on the Perforce-side and re-check the database.

mysql> SELECT * FROM `counters`;
| name      | value  |
| journal   | 1      |
| upgrade   | 21     |

cmd> p4 -p 5000 counter -f ha-ha! 1234 (Polling defaults to once every 5 seconds.)

mysql> SELECT * FROM `counters`;
| name      | value  |
| journal   | 1      |
| upgrade   | 21     |
| ha-ha!    | 1234   |

And there you have it! Now you can do all sorts of interesting near-realtime stuff. Schema documentation is described in the README. I encourage you to share interesting queries that you come up with. Perhaps we will have a wiki page for that at some point. If that is something you'd find interesting, let us know (via Support). Here are some quick examples queries that I came up with when writing this post. An equivalent to "p4 changes -m 1" to find the current highest changelist number is "SELECT `change` FROM `rev` ORDER BY `change` DESC LIMIT 1;" To search changelist descriptions, perhaps looking for cephalopod lovers:

SELECT `change`, `desc` FROM `change` WHERE `desc` LIKE '%squid%';
| change | desc    |
|  17234 | Squid!  |

To report the number of changes submitted by user:

SELECT `user`, COUNT( `user` ) AS num_changes FROM `change` GROUP BY `user` ORDER BY num_changes DESC LIMIT 10;
| user      | num_changes |
| qzoltan   |       46982 |
| harish    |       19803 |
| mall      |        2965 |
| lmitch    |        1237 |
| alurnr    |         826 |
| zog       |         742 |
| edee      |         676 |
| loola     |         631 |
| jiminez   |         537 |
| gonltz    |         442 |

You could count the most-synced files with this query by looking at the 'have' table and the 'dfile' column. This doesn't seem to be a very efficient query though, as it takes quite a while on my small-ish data set. The following query shows which workspaces have a particular file synced. Note the 'BINARY' modifier - MySQL does case-insensitive comparisons by default, so a cast to the binary character set is necessary to make it case-sensitive. The string manipulation is to extract the workspace name. SELECT SUBSTRING_INDEX( SUBSTRING( `cfile`, 3), '/', 1) AS workspace, SUBSTRING( `cfile` FROM 2 + LENGTH( SUBSTRING_INDEX( `cfile`, '/', 3))) AS workspace_file FROM `have` WHERE `dfile` = BINARY '//depot/proj/specialtests/Bar1.txt' ORDER BY workspace ASC; A couple of other things that come to mind are: querying labels (I see some of you perking up back there ;), checking clients/users without updating the modification time, checking password strength, etc. I think there is a lot of opportunity to build your own tools on top of this data as well, so I'm interested to hear what people come up with.