Examples of verifying replica integrity

The p4 journaldbchecksums command provides a set of tools for ensuring data integrity across a multi-server installation. The command provides the ability to:

  • Compute database table checksums
  • Compute changelist checksums
  • Compute database table block checksums
  • Unload database table content in a time-consistent fashion on master/replica or commit/edge

When the rpl.checksum.* server configurables are set, they control the behavior and invocation of p4 journaldbchecksums commands when certain server events occur, such as journal rotation or the submission of a new change to the server. The p4 journaldbchecksums command can also be run manually.

Note

The examples below assume you are familiar with the Support Knowledge article on Journal notes.

Database Table Checksums

The following form of the p4 journaldbchecksums command:

p4 journaldbchecksums [-t tableincludelist | -T tableexcludelist] [-l N]

causes the server to write journal notes containing table checksum information:

p4 journaldbchecksums -t db.rev

@nx@ 12 1487712216 @41@ 9 -933920831 0 4 0 @db.rev@ @@ @@ @@ @@

Edge/replica servers automatically verify the table checksums when processing these notes. They write the results to the server log and, if one is configured, to an integrity structured log:

Table db.rev checksums match. 2017/02/21 13:23:36 version 9: expected 0xC8557FC1, actual 0xC8557FC1

p4 logparse -m1 -F f_table=db.rev -T 'f_date f_results' integrity.csv
... f_date 2017/02/21 13:23:36 219149298
... f_results match
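
The journal note and the server log line in this example appear to carry the same checksum in two notations: the note contains -933920831, while the log reports 0xC8557FC1. The following is a minimal Python sketch of that conversion, assuming the note stores the checksum as a signed 32-bit integer (an inference from these sample values, not a documented format). The function name is hypothetical.

```python
def checksum_to_hex(signed_value: int) -> str:
    """Render a signed 32-bit checksum in the notation used by the server log."""
    # Masking with 0xFFFFFFFF reinterprets the value as an unsigned 32-bit integer.
    return f"0x{signed_value & 0xFFFFFFFF:X}"


print(checksum_to_hex(-933920831))  # 0xC8557FC1
```

Converting the value this way makes it easier to match a note in the journal against the corresponding line in the replica's log.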

The result of a table checksum comparison is one of the following: match, DIFFER, or empty. In general, the remedy for unexpected checksum differences, whether caused by failed replication or other reasons, is to restore the edge/replica server database from a new checkpoint on the commit/master server.

The following examples show DIFFER and empty results:

Table db.have checksums DIFFER. 2017/02/21 13:08:38 version 3: expected 0x3BB210EE, actual 0xB1BF3E83

p4 logparse -F f_results=DIFFER -T 'f_date f_table' integrity.csv
... f_date 2017/02/21 13:08:38 203821071
... f_table db.have

Table db.ldap checksums empty. 2017/02/24 11:33:54 version 0: expected 0x0, actual 0x0.

p4 logparse -F f_results=empty -T f_table integrity.csv
... f_table db.ldap
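
When no integrity structured log is configured, the same results can be pulled from the server log text. The sketch below is one possible approach in Python; the regular expression is inferred from the sample log lines in this section and may need adjusting for other server versions, and the function name is hypothetical.

```python
import re

# Pattern inferred from the sample log lines; \s+ also tolerates a wrapped line.
LINE_RE = re.compile(
    r"Table (?P<table>\S+) checksums (?P<result>match|DIFFER|empty)\.\s+"
    r"(?P<date>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"version (?P<version>\d+): "
    r"expected (?P<expected>0x[0-9A-Fa-f]+), actual (?P<actual>0x[0-9A-Fa-f]+)"
)


def tables_that_differ(log_text: str) -> list[str]:
    """Return the names of tables whose checksum result is DIFFER."""
    return [m["table"] for m in LINE_RE.finditer(log_text) if m["result"] == "DIFFER"]


sample = (
    "Table db.rev checksums match. 2017/02/21 13:23:36 version 9: "
    "expected 0xC8557FC1, actual 0xC8557FC1\n"
    "Table db.have checksums DIFFER. 2017/02/21 13:08:38 version 3: "
    "expected 0x3BB210EE, actual 0xB1BF3E83\n"
)
print(tables_that_differ(sample))  # ['db.have']
```

Only DIFFER results are returned because match and empty results normally need no action.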

Possible reasons for table checksums to DIFFER:

  • The database structure diverged as the result of software upgrades. There are certain 'on-the-fly' upgrades that are performed against a database table when data in a table is accessed and this has the potential to generate differing checksums.
  • The database structure diverged as the result of replaying a checkpoint or journal. When administrators replay journal data or journal patches using p4d -jr, the transactions replayed into the database are not themselves journaled. This might generate differing checksums. When replaying journal data in a distributed environment, always use p4d -s -jr so the replayed transactions are journaled. This enables downstream edge/replica servers to replay them as part of the normal replication process. Be aware that p4d -jr run against a replica server only updates the replica's database files. This might generate differing checksums.
  • The database structure diverged as the result of journal filtering. When filtering is active in your replication process, not all journal checksums are expected to match.

Changelist Checksums

The following form of the p4 journaldbchecksums command:

p4 journaldbchecksums -c change

causes the server to compute a checksum of an individual submitted changelist. This checksum is written as a journal note:

p4 journaldbchecksums -c 12073
@nx@ 15 1487961638 @41@ 12073 1 0 0 0 @46B19358420B468668781A002BA0AC15@ @@ @@ @@ @@

Replica servers automatically verify the checksum of the change when processing these notes and write the results to the integrity structured log:

p4 logparse -F f_change=12073 -T f_results integrity.csv
... f_results match
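
The p4 logparse output in these examples consists of lines of the form "... field value". If the results feed a monitoring script, a small parser can turn them into records. This is a minimal Python sketch; the function name is hypothetical, and the rule that a repeated field name starts a new record is an assumption made for illustration.

```python
def parse_tagged(output: str) -> list[dict[str, str]]:
    """Group '... name value' lines into one dict per record."""
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in output.splitlines():
        if not line.startswith("... "):
            continue  # skip blank lines and anything that is not a tagged field
        name, _, value = line[4:].partition(" ")
        if name in current:
            # Assumption: a repeated field name marks the start of a new record.
            records.append(current)
            current = {}
        current[name] = value
    if current:
        records.append(current)
    return records


sample = "... f_date 2017/02/21 13:08:38 203821071\n... f_table db.have\n"
print(parse_tagged(sample))
# [{'f_date': '2017/02/21 13:08:38 203821071', 'f_table': 'db.have'}]
```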

Server behavior depends on the setting of the rpl.checksum.change configurable.

Database Table Block Checksums

The following form of the p4 journaldbchecksums command:

p4 journaldbchecksums -s -t tablename [ -b blocksize ][-v N]

causes the server to scan the specified database table. The table is scanned in blocks. The number of records in a block is specified by the -b flag, which defaults to 5,000. For each block, the server computes a block checksum and writes it as a journal note:

p4 journaldbchecksums -s -t db.have

@nx@ 17 1487964567 @41@ 3 1 313 0 0 @db.have@ @@@//Talkhouse/build/jar/Talkhouse.jar@@ @ @@@//Jam/MAIN/src/glob.c@@ @ @2BCDA450287C03DE3433AEB6278EA4AA@ @@
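
Text fields in a journal line are delimited by @, and an @ inside a field is doubled, which is why the two path fields in the note above look unusual. The following Python sketch splits a note into fields under that quoting rule only. The function name is hypothetical, and the sketch does not assign a meaning to any field.

```python
def split_journal_fields(line: str) -> list[str]:
    """Split one journal line into fields, undoing @...@ quoting."""
    fields: list[str] = []
    i, n = 0, len(line)
    while i < n:
        if line[i].isspace():
            i += 1
        elif line[i] == "@":
            # Quoted field: runs to the next single '@'; '@@' is a literal '@'.
            i += 1
            chars: list[str] = []
            while i < n:
                if line[i] == "@":
                    if i + 1 < n and line[i + 1] == "@":
                        chars.append("@")
                        i += 2
                        continue
                    i += 1  # closing delimiter
                    break
                chars.append(line[i])
                i += 1
            fields.append("".join(chars))
        else:
            # Bare field (for example a number): runs to the next whitespace.
            start = i
            while i < n and not line[i].isspace():
                i += 1
            fields.append(line[start:i])
    return fields


note = (
    "@nx@ 17 1487964567 @41@ 3 1 313 0 0 @db.have@ "
    "@@@//Talkhouse/build/jar/Talkhouse.jar@@ @ @@@//Jam/MAIN/src/glob.c@@ @ "
    "@2BCDA450287C03DE3433AEB6278EA4AA@ @@"
)
fields = split_journal_fields(note)
print(len(fields))  # 14
print(fields[9])    # db.have
```

Under this rule, the four example notes in this section each split into the same number of fields.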

Replica servers automatically verify these blocks when processing these notes and write output to the integrity structured log if configured:

p4 logparse -F 'f_table=db.have' -T 'f_results f_checkSum f_checkSum2' integrity.csv
... f_checkSum 2BCDA450287C03DE3433AEB6278EA4AA
... f_checkSum2 D41D8CD98F00B204E9800998ECF8427E
... f_results failed
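
In this sample, the f_checkSum2 value is the well-known MD5 digest of empty input. Assuming block checksums are MD5 digests (an inference from their 32-hex-digit form, not something this section states), a value like this would suggest that one side found no records in the block. A small Python check, with a hypothetical function name:

```python
import hashlib

# MD5 of zero bytes, upper-cased to match the logparse output.
EMPTY_MD5 = hashlib.md5(b"").hexdigest().upper()


def looks_like_empty_block(checksum: str) -> bool:
    """True when a block checksum equals the MD5 digest of empty input."""
    return checksum.upper() == EMPTY_MD5


print(looks_like_empty_block("D41D8CD98F00B204E9800998ECF8427E"))  # True
print(looks_like_empty_block("2BCDA450287C03DE3433AEB6278EA4AA"))  # False
```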

This command can be used with large tables on a production system because the table is unlocked between each block. Inspecting the results of the block verifications reveals the location of damage that affects only part of a database table.

Database Table Unload

The following form of the p4 journaldbchecksums command:

p4 journaldbchecksums -u filename -t tablename [-v N] [-z]

causes the server to unload the specified database table to the specified file. The command also writes a journal note describing this action:

p4 journaldbchecksums -u working.txt -t db.working
@nx@ 16 1487964861 @41@ 10 0 0 0 0 @db.working@ @working.txt@ @@ @@ @@

Replica servers automatically unload the same table to the same file when processing these notes. If only a file name is specified with -u, as in the example above, the unload files are created in the P4ROOT directory of both servers. Any relative path specified with -u is relative to P4ROOT. Absolute paths to the unload file can also be used. For either a relative or an absolute path, ensure that every referenced directory exists on both master and replica before running the unload.

Unloading the tables in this way allows you to compare the contents of the table in a time-consistent fashion. This command is recommended only for tables that are small. The -z flag specifies that the file should be compressed.
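
Once both files have been copied to one machine, the comparison can be scripted. The following Python sketch reports records present on only one side. It assumes one record per line and that files written with -z are gzip-compressed; both are assumptions to verify against your own unload files, and the function names are hypothetical.

```python
import gzip
from pathlib import Path


def read_records(path: Path) -> set[str]:
    """Load an unload file as a set of lines (assumes one record per line)."""
    # Assumption: -z output is gzip; detect it by the gzip magic bytes.
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        return {line.rstrip("\n") for line in fh if line.strip()}


def compare_unloads(master: Path, replica: Path) -> tuple[set[str], set[str]]:
    """Return (records only on master, records only on replica)."""
    m, r = read_records(master), read_records(replica)
    return m - r, r - m
```

A call such as compare_unloads(Path("master/working.txt"), Path("replica/working.txt")) returns two empty sets when the tables agree. The paths here are placeholders.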