Parallel checkpointing, dumping, and recovery

The p suboption to -jc, -jd and -jr allows the use of parallel threads for writing to, and reading from, multiple checkpoint files, one per database table.

For example, to specify 4 parallel threads, use the -N numberOfThreads option:

$ p4d -r . -N 4 -jcp cp
Checkpointing files to cp.ckp.9154...

$ p4d -r . -N 4 -jrp cp.ckp.9154
Recovering from cp.ckp.9154...

Although the number of parallel threads is typically controlled by the db.checkpoint.threads configurable, the two examples above with -N numberOfThreads show that the p4d command line can override that value.

The m multifile option is available for:

parallel Checkpoint and journal options. For example, -jcpm [-N numberOfThreads] [-Z | -z] [prefix]
parallel Journal dump and restore filtering. For example, -jdpm [-N numberOfThreads] [-Z | -z] [prefix]

where

if [-N numberOfThreads] is omitted, the db.checkpoint.threads configurable determines the number of threads to use for the checkpoint
-Z compresses the checkpoint, -z compresses both the checkpoint and journal, and if neither -Z nor -z is included, no compression occurs. See Helix Core Server (p4d) Reference.

When the directory argument to -jcp, -jdp, or -jrp is specified as a relative path (one that does not start with a / or \), the directory is relative to the server root, P4ROOT.
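The relative-path rule above can be sketched as a small shell function. This is only an illustration of the rule as stated here, not how p4d resolves paths internally; the function name resolve_ckp_dir and the sample paths are hypothetical.

```bash
#!/usr/bin/env bash

# resolve_ckp_dir ROOT DIR
# Hypothetical helper: prints the directory that a checkpoint directory
# argument would refer to, following the rule stated in the text.
resolve_ckp_dir() {
    local root=$1 dir=$2
    case $dir in
        # Starts with / or \ : treated as absolute, used as given
        /*|\\*) printf '%s\n' "$dir" ;;
        # Anything else is taken relative to the server root (P4ROOT)
        *)      printf '%s\n' "${root%/}/$dir" ;;
    esac
}

resolve_ckp_dir /p4/root cp.ckp.9154
resolve_ckp_dir /p4/root /backup/cp.ckp.9154
```

The sketch checks only the leading character, exactly as the rule is worded, so it makes no attempt to recognize other absolute forms such as drive-letter paths.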

File naming convention within a recovery directory if using parallel option

After a successful checkpoint or dump request, the file for a specific table db.xxx is created in this directory as db.xxx.ckp, or as db.xxx.ckp.gz if the checkpoint is compressed.

For example:

db.archive.ckp.gz
db.archive.ckp.gz.md5
db.archmap.ckp.gz
db.archmap.ckp.gz.md5
...

or, for a non-compressed checkpoint or dump:

db.archive.ckp
db.archive.ckp.md5
db.archmap.ckp
db.archmap.ckp.md5
...

The files with the .md5 suffix contain the MD5 sum of their matching replay file.

When the multifile suboption (m) is specified, the files for a specific table db.xxx are named db.xxx_bbbbbbbb.ckp, where bbbbbbbb is a batch number that distinguishes each file.

The batch number always consists of 8 lowercase hexadecimal digits.

For example:

...
db.config.ckp.gz
db.config.ckp.gz.md5
...
db.revcx_00000001.ckp.gz
db.revcx_00000001.ckp.gz.md5
db.revcx_00000002.ckp.gz
db.revcx_00000002.ckp.gz.md5
db.revcx_00000003.ckp.gz
db.revcx_00000003.ckp.gz.md5
db.revcx_00000004.ckp.gz
db.revcx_00000004.ckp.gz.md5
db.revcx_00000005.ckp.gz
db.revcx_00000005.ckp.gz.md5
db.revcx_00000006.ckp.gz
db.revcx_00000006.ckp.gz.md5
db.revcx_00000007.ckp.gz
db.revcx_00000007.ckp.gz.md5
db.revcx_00000008.ckp.gz
db.revcx_00000008.ckp.gz.md5
db.revcx_00000009.ckp.gz
db.revcx_00000009.ckp.gz.md5
db.revcx_0000000a.ckp.gz
db.revcx_0000000a.ckp.gz.md5
...
db.locks_00000001.ckp.gz
db.locks_00000001.ckp.gz.md5
db.locks_00000002.ckp.gz
db.locks_00000002.ckp.gz.md5
db.locks_00000003.ckp.gz
db.locks_00000003.ckp.gz.md5
...
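As a rough illustration of the naming convention above, the following bash sketch splits a checkpoint file name into its table name and batch number. The function name parse_ckp_name is hypothetical, and the pattern is inferred only from the example names shown here, so treat it as an assumption rather than a specification.

```bash
#!/usr/bin/env bash

# parse_ckp_name FILE
# Hypothetical helper: prints "<table> <batch>" for a checkpoint file name.
# The batch number is printed in decimal, or "-" when the table was not split.
# Returns 1 for names that do not look like checkpoint files (for example .md5 files).
parse_ckp_name() {
    local name=${1##*/}   # drop any leading directory
    local re='^(db\.[A-Za-z0-9.]+)(_([0-9a-f]{8}))?\.ckp(\.gz)?$'
    if [[ $name =~ $re ]]; then
        local table=${BASH_REMATCH[1]} batch=${BASH_REMATCH[3]}
        if [ -n "$batch" ]; then
            # 16#... converts the 8 hexadecimal digits to a decimal number
            printf '%s %d\n' "$table" "$((16#$batch))"
        else
            printf '%s -\n' "$table"
        fi
    else
        return 1
    fi
}

parse_ckp_name db.revcx_0000000a.ckp.gz
parse_ckp_name db.config.ckp.gz
```

Converting the batch number to decimal makes it easier to check that no batch is missing from a sequence.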

The m option might improve performance

When the p suboption is specified for a dump or checkpoint operation, each table is dumped into its own file in the checkpoint directory. At some sites, a few of the tables, such as db.have and db.integed, might be so large that the checkpoint operation needs more time for them than for tables of an average size. The m suboption causes any large table to be split into multiple output files in the checkpoint directory. With multi-threading, these output files are processed in parallel.

Tip

The size of checkpoint data might be smaller with multi-file checkpoints than with single-file checkpoints. This might occur if parallel checkpoints result in better compression ratios.

How to know the parallel checkpoint has completed

You can determine when a checkpoint has completed by using either of these two methods:

Poll the checkpoint history (ckphist) record. We recommend this method because it works whether the checkpoint failed or succeeded. This method was introduced in Helix Core Server version 2023.1.1
Poll for the existence of the md5 file. This method is still supported, but it does not indicate whether a checkpoint has failed.

Poll for the ckphist record

The syntax to poll the ckphist record is p4d -xj --jnum ckpnum

where ckpnum is the checkpoint number used to name the checkpoint file and directory.

The checkpoint command reports:

$ p4d -jcmp
Checkpointing files to checkpoint.4...

Example shell script:

#!/usr/bin/bash
P4D=$HOME/bin/p4d

# Start the multifile checkpoint request in the background
"$P4D" -r . -jcmp -N 4 > out &
sleep 1

# Read the output from p4d
out=$(cat out)

# Extract the checkpoint number from the p4d output
ckpnum=$(expr "$out" : '.*checkpoint\.\([0-9]*\).*')

# Search for the ckphist record for that checkpoint number
rec=$("$P4D" -r . -xj --jfield="startDate,endDate,jfile,failed" --jnum "$ckpnum")

# See what we've got
echo "Found $rec"

Poll for the md5 file

In the same directory that contains the checkpoint or dump directory, look for the consolidated .md5 file. This file is created when the operation has completed, whether the operation was successful or not. In the following example, this file is named checkpoint.3.md5

$ p4d -r . -jcmp -z
Checkpointing files to checkpoint.3...
Rotating journal to journal.2.gz...
$ ls -ld checkpoint.*
drwxrwxr-x 2 perforce perforce 12288 Jun 29 11:11 checkpoint.3
-r--r--r-- 1 perforce perforce  9784 Jun 29 11:11 checkpoint.3.md5
$
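A polling loop for the consolidated .md5 file might look like the sketch below. The function name wait_for_md5, the timeout, and the polling interval are illustrative assumptions. As noted earlier, the presence of the file shows only that the operation has finished, not that it succeeded.

```bash
#!/usr/bin/env bash

# wait_for_md5 FILE [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Hypothetical helper: returns 0 once FILE exists, or 1 if TIMEOUT_SECONDS
# elapse first. The defaults (1 hour, 5 seconds) are arbitrary choices.
wait_for_md5() {
    local md5file=$1 timeout=${2:-3600} interval=${3:-5}
    local waited=0
    while [ ! -e "$md5file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Example: wait up to 2 seconds for a file that is never created
if wait_for_md5 /nonexistent/checkpoint.3.md5 2 1; then
    echo "checkpoint finished"
else
    echo "timed out"
fi
```

In practice the first argument would be the path reported by the checkpoint command with .md5 appended, such as checkpoint.3.md5 in the example above.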

The consolidated .md5 file contains one line for each checkpoint file created in the checkpoint directory. Each line includes the checkpoint file name, the MD5 digest, and the epoch timestamp. For example:

MD5 (checkpoint.3/db.config.ckp.gz) = 5A32E66EE638A52F480F476B0B78191E 1688033506 
MD5 (checkpoint.3/db.configh.ckp.gz) = B26E2EBA2E35B5F138792549A585276D 1688033506 
MD5 (checkpoint.3/db.counters.ckp.gz) = D9A5E3CE0728B6206E4A746CB6854994 1688033506 
MD5 (checkpoint.3/db.nameval_00000001.ckp.gz) = 035080F2CDFDB5BE9FC5E9D640CF5ABA 1688033506 
MD5 (checkpoint.3/db.nameval_00000004.ckp.gz) = 7D052BA5C906C7C3087FE49DB4FCD48D 1688033506
...

The ordering of the records is significant. Checkpoint files that were completed first by a parallel thread are at the top of the file.
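Because the records appear in completion order, reading the consolidated .md5 file from top to bottom shows which checkpoint files finished first. The following bash sketch extracts the timestamp and file name from each record. The function name list_checkpoint_files is hypothetical, and the line format is assumed to be exactly as in the example above.

```bash
#!/usr/bin/env bash

# list_checkpoint_files MD5FILE
# Hypothetical helper: prints "<epoch> <file>" for each record, preserving the
# order in the file. Lines that do not match the assumed format are skipped.
list_checkpoint_files() {
    local md5file=$1 line
    # Assumed record format: MD5 (<file>) = <32 hex digits> <epoch>
    local re='^MD5 \((.+)\) = ([0-9A-Fa-f]{32}) ([0-9]+)[[:space:]]*$'
    while IFS= read -r line || [ -n "$line" ]; do
        if [[ $line =~ $re ]]; then
            printf '%s %s\n' "${BASH_REMATCH[3]}" "${BASH_REMATCH[1]}"
        fi
    done < "$md5file"
}

# Example with one record copied from the listing above
sample=$(mktemp)
printf '%s\n' 'MD5 (checkpoint.3/db.config.ckp.gz) = 5A32E66EE638A52F480F476B0B78191E 1688033506 ' > "$sample"
list_checkpoint_files "$sample"
rm -f "$sample"
```

The sketch only reads the records. It does not recompute or compare digests.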

Prefix for parallel checkpoints

Checkpoint files are placed into a newly-created directory based on the prefix for the checkpoint or dump. Specifying a prefix on the p4d -jcp command overrides the prefix set by the journalPrefix configurable.

Configurables for tuning checkpoints

The values and purpose of the configurables that you can use to tune checkpoints:

Configurable	Default	Min	Max	Use
`db.checkpoint.reqlevel`	`4`	`2`	`20`	Only database files at this level or deeper in the btree are considered to be split into multiple checkpoint files during a checkpoint or dump request.
`db.checkpoint.worklevel`	`3`	`2`	`20`	The page level examined within a database table that supplies the record keys used to split that table during a multifile checkpoint operation.
`db.checkpoint.numfiles`	`10`	`1`	`20000`	Used to determine how many checkpoint files should be generated during a multifile checkpoint operation. This value can be overridden by the `--numfiles` option of the `p4 dbstat` command.
`db.checkpoint.threads`	`0`	`0`	`4096`	Maximum number of threads to use in a checkpoint, dump, or recovery. The value must be `2` or greater for a multifile request to split a table. Many factors might affect performance (CPU, memory, disks, controllers, file system, system load), so no simple way exists to determine the best value. Start with a value such as `4`, `6`, or `8`, then monitor processor and I/O performance to determine whether a larger value is appropriate for your system.

How the number of checkpoint files is calculated

The value of numfiles is used to determine the number of keys to generate. The total number of pages found at the worklevel is divided by the "effective numfile" (en) value. The "effective numfile" value is calculated by this formula:

en = n ^ (a - w)

where

n is the value of db.checkpoint.numfiles

a is the depth of the current database file, and

w is the value of db.checkpoint.worklevel

For example, if a = 5, w = 3, and n = 10, then

en = 10 ^ (5 - 3) = 10 ^ 2 = 100

so in this case setting db.checkpoint.numfiles to 10 results in 100 "effective numfiles".
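The same arithmetic can be reproduced in bash, which is convenient when comparing settings for tables of different depths. This is a minimal sketch of the formula only. The function name effective_numfiles is hypothetical, and how the server handles a table whose depth is below the worklevel is not covered here, so the sketch simply rejects that case.

```bash
#!/usr/bin/env bash

# effective_numfiles N A W
# Hypothetical helper: prints n ^ (a - w), where
#   N = db.checkpoint.numfiles, A = depth of the database file,
#   W = db.checkpoint.worklevel
effective_numfiles() {
    local n=$1 a=$2 w=$3
    if [ "$a" -lt "$w" ]; then
        # Assumption: the formula is not applied when depth < worklevel
        echo "depth $a is below worklevel $w" >&2
        return 1
    fi
    echo $(( n ** (a - w) ))
}

effective_numfiles 10 5 3   # prints 100
```

With the same settings, a table of depth 4 gives 10 ^ (4 - 3) = 10, so the effective value grows quickly with table depth.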