Tuning Perforce for Performance
Your Perforce server should normally be a light consumer of system resources. As your installation grows, however, you might want to revisit your system configuration to ensure that it is configured for optimal performance.
This chapter briefly outlines some of the factors that can affect the performance of a Perforce server, provides a few tips on diagnosing network-related difficulties, and offers some suggestions on decreasing server load for larger installations.
Tuning for performance
In general, Perforce performs well on any server-class hardware platform. The following variables can affect the performance of your Perforce server.
Memory
Server performance is highly dependent upon having sufficient memory.
Two bottlenecks are relevant. The first bottleneck can be avoided by
ensuring that the server doesn't page when it runs large queries, and
the second by ensuring that the db.rev
table (or at
least as much of it as practical) can be cached in main memory:
- Determining memory requirements for large queries is fairly straightforward: the server requires about 1 kilobyte of RAM per file to avoid paging; 10,000 files will require 10 MB of RAM.
- To cache db.rev, the size of the db.rev file in an existing installation can be observed and used as an estimate. New installations of Perforce can expect db.rev to require about 150-200 bytes per revision, and roughly three revisions per file, or about 0.5 kilobytes of RAM per file.
Thus, if there is 1.5 kilobytes of RAM available per file, or 150 MB for 100,000 files, the server does not page, even when performing operations involving all files. It is still possible that multiple large operations can be performed simultaneously and thus require more memory to avoid paging. On the other hand, the vast majority of operations involve only a small subset of files.
For most installations, a system with 1.5 kilobytes of RAM per file in the depot suffices.
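As a back-of-the-envelope check, the sizing guidance above can be expressed as a short calculation (a sketch only; the 1 KB and 0.5 KB per-file figures are the rough estimates given above):

```shell
# Rough RAM sizing for a 100,000-file depot, using the per-file
# estimates above (~1 KB/file for large queries, ~0.5 KB/file for db.rev).
files=100000
query_kb=$(( files ))        # ~1 KB per file to avoid paging on large queries
dbrev_kb=$(( files / 2 ))    # ~0.5 KB per file to cache db.rev
total_mb=$(( (query_kb + dbrev_kb) / 1000 ))
echo "Estimated RAM: ${total_mb} MB"    # 150 MB, matching the figure above
```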
Filesystem performance
Perforce is judicious with regards to its use of disk I/O; its metadata is well-keyed, and accesses are mostly sequential scans of limited subsets of the data. The most disk-intensive activity is file check-in, where the Perforce server must write and rename files in the archive. Server performance depends heavily on the operating system's filesystem implementation, and in particular, on whether directory updates are synchronous. Server performance is also highly dependent upon the capabilities of the underlying hardware's I/O subsystem.
Although Perforce does not recommend any specific hardware configuration or filesystem, Linux servers are generally fastest (owing to Linux's asynchronous directory updating), but they may have poor recovery if power is cut at the wrong time. The BSD filesystem (also used in Solaris) is relatively slow but much more reliable. NTFS performance falls somewhere between these two extremes.
Performance in systems where database and versioned files are stored on NFS-mounted volumes is typically dependent on the implementation of NFS in question or the underlying storage hardware. Perforce has been tested and is supported under the Solaris implementation of NFS.
Under Linux and FreeBSD, database updates over NFS can be an issue because file locking is relatively slow; if the journal is NFS-mounted on these platforms, all operations will be slower. In general (but in particular on Linux and FreeBSD), we recommend that the Perforce database, depot, and journal files be stored on disks local to the machine running the Perforce server process.
These issues affect only the Perforce server process (p4d). Perforce applications, (such as p4, the Perforce Command-Line Client) have always been able to work with client workspaces on NFS-mounted drives (for instance, workspaces in users' home directories).
Disk space allocation
Perforce disk space usage is a function of three variables:
- Number and size of client workspaces
- Size of server database
- Size of server's archive of all versioned files
All three variables depend on the nature of your data and how heavily you use Perforce.
The client file space required is the size of the files that your users will need in their client workspaces at any one time.
The server's database size can be calculated with a fair level of accuracy; as a rough estimate, it requires 0.5 kilobytes per user per file. (For instance, a system with 10,000 files and 50 users requires 250 MB of disk space for the database). The database can be expected to grow over time as histories of the individual files grow.
The size of the server's archive of versioned files depends on the sizes of the original files stored and grows as revisions are added. For most sites, allocate space equivalent to at least three times the aggregate size of the original files.
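A sketch of the database estimate above (0.5 kilobytes per user per file), reproducing the 250 MB figure for 10,000 files and 50 users:

```shell
# Database size estimate: ~0.5 KB per user per file (rough guidance above).
files=10000
users=50
db_kb=$(( files * users / 2 ))   # 250,000 KB
db_mb=$(( db_kb / 1000 ))        # ~250 MB, matching the example above
echo "Estimated database size: ${db_mb} MB"
# Versioned file archive: allocate at least 3x the aggregate original size.
```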
The db.have
file holds the list of files opened in
client workspaces. This file tends to grow more rapidly than other files
in the database. If you are experiencing issues related to the size of
your db.have
file and are unable to quickly switch
to a server with adequate support for large files, deleting unused
client workspace specifications and reducing the scope of client
workspace views can help alleviate the problem.
Monitoring disk space usage
Use the p4 diskspace command to monitor diskspace usage. By default, p4 diskspace displays the amount of free space, diskspace used, and total capacity of any filesystem used by Perforce.
By default, the Perforce Server rejects commands when free space on the filesystems housing P4ROOT, P4JOURNAL, P4LOG, or TEMP falls below 10 megabytes. To change this behavior, set the filesys.P4ROOT.min (and corresponding) configurables to your desired limits:
Configurable | Default Value | Meaning
---|---|---
filesys.P4ROOT.min | 10M | Minimum diskspace required on server root filesystem before server rejects commands.
filesys.P4JOURNAL.min | 10M | Minimum diskspace required on server journal filesystem before server rejects commands.
filesys.P4LOG.min | 10M | Minimum diskspace required on server log filesystem before server rejects commands.
filesys.TEMP.min | 10M | Minimum diskspace required for temporary operations before server rejects commands.
filesys.depot.min | 10M | Minimum diskspace required for any depot before server rejects commands. (If there is less than filesys.depot.min diskspace available for any one depot, commands are rejected for all depots.)
If the user account that runs the Perforce Server process is subject to
disk quotas, the Server observes these quotas with respect to the
filesys.*.min
configurables, regardless of how much
physical free space remains on the filesystem(s) in question.
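For example, the thresholds can be raised with p4 configure (a sketch; the 2G values are arbitrary illustrations, not recommendations):

```shell
p4 configure set filesys.P4ROOT.min=2G      # require 2 GB free on the server root
p4 configure set filesys.P4JOURNAL.min=2G   # require 2 GB free on the journal filesystem
```

These are configuration commands against a live server, so no output is shown; verify the settings afterward with p4 configure show.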
To estimate how much disk space is currently occupied by specific files in a depot, use the p4 sizes command with a block size corresponding to that used by your storage solution. For example, the command:
p4 sizes -a -s -b 512 //depot/...
shows the sum (-s
) of all revisions
(-a
) in //depot/...
, as
calculated with a block size of 512 bytes.
//depot/... 34161 files 277439099 bytes 5429111 blocks
The data reported by p4 sizes actually reflects the diskspace required when files are synced to a client workspace, but can provide a useful estimate of server-side diskspace consumption.
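Because p4 sizes reports both logical bytes and blocks, the allocated storage footprint can be approximated by multiplying the block count by the block size (a sketch using the example output above):

```shell
# From the example output above: 5429111 blocks at a 512-byte block size.
blocks=5429111
blocksize=512
bytes=$(( blocks * blocksize ))
echo "Approximate allocated storage: ${bytes} bytes"
```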
Network
Perforce can run over any TCP/IP network. Although we have not yet seen the network become a limiting factor, the more bandwidth the better.
Perforce uses a TCP/IP connection for each client interaction with the
server. The server's port address is defined by P4PORT
,
but the TCP/IP implementation picks a client port number. After the
command completes and the connection is closed, the port is left in a
state called TIME_WAIT
for two minutes. Although the
port number ranges from 1025
to
32767
, generally only a few hundred or thousand can
be in use simultaneously. It is therefore possible to occupy all
available ports by invoking a Perforce command many times in rapid
succession, such as with a script.
By default, idle connections are not kept alive. If your network silently drops idle connections, this behavior may cause unexpected connectivity issues. (Consider a p4 pull thread that transfers data between a master server and a replica at a remote site; depending on each site's respective business hours and user workloads, such a connection could be idle for several hours per day.) Four configurables are available to manage the state of idle connections.
Configurable | Default Value | Meaning
---|---|---
net.keepalive.disable | 0 | If non-zero, disable the sending of TCP keepalive packets.
net.keepalive.idle | 0 | Idle time (in seconds) before starting to send keepalives.
net.keepalive.interval | 0 | Interval (in seconds) between sending keepalive packets.
net.keepalive.count | 0 | Number of unacknowledged keepalives before failure.
If your network configuration requires keepalive packets, consider
setting net.keepalive.idle
to a suitably long value,
for example 3,600 seconds (1 hour), and an interval measured in tens of
minutes.
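The suggestion above could be applied with commands such as the following (a sketch; the specific values are illustrative only):

```shell
p4 configure set net.keepalive.idle=3600      # begin keepalives after 1 hour idle
p4 configure set net.keepalive.interval=1800  # then probe every 30 minutes
```

As with any configurable change, confirm the result with p4 configure show before relying on it.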
CPU
The Perforce versioning service is relatively lightweight in terms of CPU resource consumption; in general, CPU power is not a major consideration when determining the platform on which to install a Perforce server.
Improving concurrency with lockless reads
Prior to Release 2013.3, commands that only read data from the database took a read-lock on one (or more) database tables. Although other commands could read from the tables at the same time, any commands attempting to write to the read-locked tables were forced to wait for the read-lock to be released before writing could begin. Currently, the default behavior is to allow some commands to perform lock-free reads (or "peeks") on these tables, without sacrificing consistency or isolation. This provides a significant performance improvement by ensuring that write operations on these tables can run immediately, rather than being held until the read-lock is released.
Note
Lockless reads require that server locks be enabled. Because server locking can cause issues for long-duration syncs, the server lock controlling sync (server.locks.sync) is currently disabled by default.
To change the setting of lockless reads on your Perforce Server, use the
p4 configure set
db.peeking=N
command.
Any change to db.peeking
requires a server restart to
take effect.
Possible values for db.peeking
are as follows:
Value | Meaning
---|---
0 | Lockless reads are disabled, and the old table locking order is used. This corresponds to the behavior of Perforce at release 2013.2 and below.
1 | The new table locking order is used, but lockless reads remain disabled. This configuration is intended primarily for diagnostic purposes.
2 | The new locking order is used and lockless reads are enabled. This configuration is expected to provide the best performance results for most sites. It is the default value.
3 | As with 2, but revision metadata is also read without locks. This configuration involves a trade-off between concurrency and command completion speed; in general, if a repository has many revisions per file, then some commands will complete more slowly with db.peeking=3.
Commands implementing lockless reads
When peeking is enabled, commands that only read data from the database run lockless; among others, these include p4 files, p4 fstat, p4 sync, p4 sizes -a, and p4 print -a.
A number of other commands run partially lockless; in most cases these commands operate lock-free, but lockless operation is not guaranteed.
Overriding the default behavior
You can override the db.peeking
setting on a
per-command basis by using the -Zpeeking=
flag
followed by your preferred value. For example, to disable peeking for
one command, run the following command:
p4 -Zpeeking=1 fstat
and compare the results with:
p4 -Zpeeking=2 fstat
Measuring the effectiveness of lockless reads
To determine whether read locks are impacting performance (and the
extent to which enabling lockless reads has improved performance), you
can examine the server logs, or you can use the
-Ztrack
flag to output, for any given command,
the lines that would be written to the P4LOG
. For
example:
p4 -Zpeeking=1 -Ztrack sync
produces output for 11 database tables. The relevant lines here are
those that refer to "locks read/write
".
... --- db.counters --- pages in+out+cached 3+0+2 --- locks read/write 1/0 rows get+pos+scan put+del 1+0+0 0+0 --- db.user --- pages in+out+cached 3+0+2 --- locks read/write 1/0 rows get+pos+scan put+del 1+0+0 0+0 ...
The 1 appearing in each table's locking results ("locks read/write 1/0") shows one read lock taken per table. By contrast, the diagnostic output from:
p4 -Zpeeking=2 -Ztrack sync
... --- db.counters --- pages in+out+cached 3+0+2 --- locks read/write 0/0 rows get+pos+scan put+del 1+0+0 0+0 ...
shows that the sync operation completed without any read or write locks required on db.counters (and, if you try it yourself, on many other tables); when peeking is enabled, many commands take read/write 0/0 locks (or at least, fewer locks).
Side-track servers must have the same db.peeking level
A single Perforce instance can detect and ignore inadvertent attempts to override db.peeking that would change table locking order and risk deadlock. (For example, if you attempt to use db.peeking=3 on a server for which peeking is disabled because db.peeking is set to 0 or unset, the service ignores the attempt altogether and the command proceeds with the old behavior.)
In the case of "side-track servers" described in the following Knowledge Base article:
http://answers.perforce.com/articles/KB_Article/Setting-Up-a-Side-track-Server
this protection is not available.
Warning
All side-track servers must have the same db.peeking setting as the main server; otherwise, server deadlock may result.
Diagnosing slow response times
Perforce is normally a light user of network resources. Although it is possible that an extremely large user operation could cause the Perforce server to respond slowly, consistently slow responses to p4 commands are usually caused by network problems. Any of the following can cause slow response times:
- Misconfigured domain name system (DNS)
- Misconfigured Windows networking
- Difficulty accessing the p4 executable on a networked file system
A good initial test is to run p4 info. If this does not respond immediately, then there is a network problem. Although solving network problems is beyond the scope of this manual, here are some suggestions for troubleshooting them.
Hostname vs. IP address
Try setting P4PORT
to the service's IP address instead of
its hostname. For example, instead of using:
P4PORT=host.domain:1666
try using:
P4PORT=1.2.3.4:1666
with your site-specific IP address and port number.
On most systems, you can determine the IP address of a host by invoking:
ping hostname
If p4 info responds immediately when you use the IP address, but not when you use the hostname, the problem is likely related to DNS.
Windows wildcards
In some cases, p4 commands on Windows can result in a delayed response if they use unquoted filepatterns with a combination of depot syntax and wildcards, such as:
p4 files //depot/*
You can prevent the delay by putting double quotes around the file pattern, like this:
p4 files "//depot/*"
The cause of the problem is the p4 command's use of a
Windows function to expand wildcards. When quotes are not used, the
function interprets //depot
as a networked computer
path and spends time in a futile search for a machine named
depot
.
DNS lookups and the hosts file
On Windows, the
%SystemRoot%\system32\drivers\etc\hosts
file can be
used to hardcode IP address-hostname pairs. You might be able to work
around DNS problems by adding entries to this file. The corresponding
UNIX file is /etc/hosts
.
Location of the p4 executable
If none of the above diagnostic steps explains the sluggish response time, it's possible that the p4 executable itself is on a networked file system that is performing very poorly. To check this, try running:
p4 -V
This merely prints out the version information, without attempting any network access. If you get a slow response, network access to the p4 executable itself might be the problem. Copying or downloading a copy of p4 onto a local filesystem should improve response times.
Working over unreliable networks
To set a hard upper bound on how long a connection is willing to wait on
any single network read or write, set the net.maxwait
configurable to the number of seconds to wait before disconnecting with
a network error. Users working over unreliable connections can set the net.maxwait value either in their P4CONFIG files, or use -vnet.maxwait=t on a per-command basis, where t is the number of seconds to wait before timing out.
Note
Although net.maxwait
can be set on the Perforce
server, it is generally inadvisable to do so. For example, if
net.maxwait
is set to 60 on the server, users of
the Command-Line Client must complete every interactive form within
one minute before the command times out. If, however, individual
users set net.maxwait
in their own
P4CONFIG
files (which reside on their own workstations)
their connections are not subject to this limitation; commands only
fail if the versioning service takes more than 60 seconds to respond
to their requests.
It is useful to combine net.maxwait
with the
-rN
global option, where
N
is the number of times to attempt
reconnection in the event that the network times out. For example:
p4 -r3 -vnet.maxwait=60 sync
attempts to sync the user's workspace, making up to three attempts to resume the sync if interrupted. The command fails after the third 60-second timeout.
Because the format of the output of a command that times out and is
restarted cannot be guaranteed (for example, if network connectivity is
broken in the middle of a line of output), avoid the use of
-r
on any command that reads from standard input.
For example, the behavior of the following command, which reads a list
of files from stdin and passes it to p4 add, can
result in the attempted addition of "half a filename" to the depot.
find . -print | p4 -x - -r3 add
To prevent this from happening (for example, if adding a large number of files over a very unreliable connection), consider an approach like the following:
find directoryname -type f -exec p4 -r5 -vnet.maxwait=60 add {} \;
All files (-type f) in directoryname are found, and added one at a time, by invoking the command "p4 -r5 -vnet.maxwait=60 add" for each file individually.
After all files have been added, assign the changelist a changelist number with p4 change, and submit the numbered changelist atomically with:
p4 -r5 -vnet.maxwait=60 submit -c changenum
If connectivity is interrupted, the numbered changelist submission is resumed.
Preventing server swamp
Generally, Perforce's performance depends on the number of files a user tries to manipulate in a single command invocation, not on the size of the depot. That is, syncing a client view of 30 files from a 3,000,000-file depot should not be much slower than syncing a client view of 30 files from a 30-file depot.
The number of files affected by a single command is largely determined by:
- p4 command-line arguments (or selected folders in the case of GUI operations)
Without arguments, most commands operate on, or at least refer to, all files in the client workspace view.
- Client views, branch views, label views, and protections
Because commands without arguments operate on all files in the workspace view, it follows that the use of unrestricted views and unlimited protections can result in commands operating on all files in the depot.
When the server answers a request, it locks down the database for the duration of the computation phase. For normal operations, this is a successful strategy, because the server can "get in and out" quickly enough to avoid a backlog of requests. Abnormally large requests, however, can take seconds, sometimes even minutes. If frustrated users press CTRL+C and retry, the problem gets even worse; the server consumes more memory and responds even more slowly.
At sites with very large depots, unrestricted views and unqualified commands make a Perforce server work much harder than it needs to. Users and administrators can ease load on their servers by:
- Using "tight" views
- Assigning protections
- Limiting maxresults
- Limiting simultaneous connections with server.maxcommands
- Unloading infrequently-used metadata
- Writing efficient scripts
- Using compression efficiently
- Other server configurables
Using tight views
The following "loose" view is trivial to set up but could invite trouble on a very large depot:
//depot/... //workspace/...
The loose view maps the entire depot into the client workspace; for most users, this can be "tightened" considerably. The following view, for example, is restricted to specific areas of the depot:
//depot/main/srv/devA/...       //workspace/main/srv/devA/...
//depot/main/drv/lport/...      //workspace/main/dvr/lport/...
//depot/rel2.0/srv/devA/bin/... //workspace/rel2.0/srv/devA/bin/...
//depot/qa/s6test/dvr/...       //workspace/qa/s6test/dvr/...
Client views, in particular, but also branch views and label views, should also be set up to give users just enough scope to do the work they need to do.
Client, branch, and label views are set by a Perforce administrator or by individual users with the p4 client, p4 branch, and p4 label commands, respectively.
Two of the techniques for script optimization (described in Using branch views and The temporary client workspace trick) rely on similar techniques. By limiting the size of the view available to a command, fewer commands need to be run, and when run, the commands require fewer resources.
Assigning protections
Protections (see “Administering Perforce: Protections”) are actually another type of Perforce view. Protections are set with the p4 protect command and control which depot files can be affected by commands run by users.
Unlike client, branch, and label views, however, the views used by protections can be set only by Perforce superusers. (Protections also control read and write permission to depot files, but the permission levels themselves have no impact on server performance.) By assigning protections in Perforce, a Perforce superuser can effectively limit the size of a user's view, even if the user is using "loose" client specifications.
Protections can be assigned to either users or groups. For example:
write user sam * //depot/admin/...
write group rocketdev * //depot/rocket/main/...
write group rocketrel2 * //depot/rocket/rel2.0/...
Perforce groups are created by superusers with the p4
group command. Not only do they make it easier to assign
protections, they also provide useful fail-safe mechanisms in the form
of maxresults
and maxscanrows
,
described in the next section.
Limiting database queries
Each Perforce group has an associated maxresults,
maxscanrows, and maxlocktime
value. The default for each is unset
, but a superuser
can use p4 group to limit it for any given group.
MaxResults prevents the server from using excessive memory by limiting
the amount of data buffered during command execution. Users in limited
groups are unable to run any commands that buffer more database rows
than the group's MaxResults
limit. (For most sites,
MaxResults
should be larger than the largest number
of files anticipated in any one user's individual client workspace.)
Like MaxResults
, MaxScanRows
prevents certain user commands from placing excessive demands on the
server. (Typically, the number of rows scanned in a single operation is
roughly equal to MaxResults
multiplied by the average
number of revisions per file in the depot.)
Finally, MaxLockTime
is used to prevent certain
commands from locking the database for prolonged periods of time. Set
MaxLockTime
to the number of milliseconds for the
longest permissible database lock.
To set these limits, fill in the appropriate fields in the p4
group form. If a user is listed in multiple groups, the
highest of the MaxResults
(or
MaxScanRows
, or MaxLockTime
)
limits (including unlimited
, but
not including the default unset
setting) for those groups is taken as the user's
MaxResults
(or MaxScanRows
, or
MaxLockTime
) value.
Example 28. Effect of setting maxresults, maxscanrows, and maxlocktime.
As an administrator, you want members of the group
rocketdev
to be limited to operations of 20,000
files or less, that scan no more than 100,000 revisions, and lock
database tables for no more than 30 seconds:
Group:       rocketdev
MaxResults:  20000
MaxScanRows: 100000
MaxLockTime: 30000
Timeout:     43200
Subgroups:
Owners:
Users:
    bill
    ruth
    sandy
Suppose that Ruth has an unrestricted (loose) client view. She types:
p4 sync
Her sync command is rejected if the depot contains more than 20,000 files. She can work around this limitation either by restricting her client view, or, if she needs all of the files in the view, by syncing smaller sets of files at a time, as follows:
p4 sync //depot/projA/...
p4 sync //depot/projB/...
Either method enables her to sync her files to her workspace, but without tying up the server to process a single extremely large command.
Ruth tries a command that scans every revision of every file, such as:
p4 filelog //depot/projA/...
If there are fewer than 20,000 revisions, but more than 100,000
integrations (perhaps the projA
directory
contains 1,000 files, each of which has fewer than 20 revisions and
has been branched more than 50 times), the
MaxResults
limit does not apply, but the
MaxScanRows
limit does.
Regardless of which limits are in effect, no command she runs will be
permitted to lock the database for more than the
MaxLockTime
of 30,000 milliseconds.
To remove any limits on the number of result lines processed (or
database rows scanned, or milliseconds of database locking time) for a
particular group, set the MaxResults
or
MaxScanRows
, or MaxLockTime
value
for that group to unlimited
.
Because these limitations can make life difficult for your users, do not
use them unless you find that certain operations are slowing down your
server. Because some Perforce applications can perform large operations,
you should typically set MaxResults
no smaller than
10,000, set MaxScanRows
no smaller than 50,000, and
MaxLockTime
to somewhere within the 1,000-30,000
(1-30 second) range.
For more information, including a comparison of Perforce commands and the number of files they affect, type:
p4 help maxresults
p4 help maxscanrows
p4 help maxlocktime
from the command line.
MaxResults, MaxScanRows and MaxLockTime for users in multiple groups
As mentioned earlier, if a user is listed in multiple groups, the
highest numeric MaxResults
limit of all the groups
a user belongs to is the limit that affects the user.
The default value of unset
is
not a numeric limit; if a user is in a group
where MaxResults
is set to
unset
, he or she is still limited by the highest
numeric MaxResults
(or
MaxScanRows
or MaxLockTime
)
setting of the other groups of which he or she is a member.
A user's commands are truly unlimited only when the user belongs to no
groups, or when any of the groups of which the user is a member have
their MaxResults
set to
unlimited
.
Limiting simultaneous connections
If monitoring is enabled (p4 configure set monitor=1
or higher), you can set the server.maxcommands
configurable to limit the number of simultaneous command requests that
the service will attempt to handle.
Ideally, this value should be set low enough to detect a runaway script or denial of service attack before the underlying hardware resources are exhausted, yet high enough to maintain a substantial margin of safety between the typical average number of connections and your site's peak activity.
If P4LOG
is set, the server log will contain lines of the
form:
Server is now using nnn active threads.
You can use the server log to determine what levels of activity are typical for your site. As a general guideline, set server.maxcommands to 200-500% of your anticipated peak activity.
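One way to find your site's peak, given a log in the format shown above, is to extract the thread counts and take the maximum (a sketch; the sample log content is hypothetical):

```shell
# Hypothetical excerpt of a P4LOG in the format described above.
cat > /tmp/p4log.sample <<'EOF'
Server is now using 12 active threads.
Server is now using 47 active threads.
Server is now using 23 active threads.
EOF
# Extract the thread counts, then report the maximum observed.
peak=$(sed -n 's/.*now using \([0-9][0-9]*\) active threads.*/\1/p' /tmp/p4log.sample | sort -n | tail -1)
echo "Peak: ${peak} threads; consider server.maxcommands >= $(( peak * 2 ))"
```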
Unloading infrequently-used metadata
Over time, a Perforce server accumulates metadata associated with old
projects that are no longer in active development. On large sites,
reducing the working set of data, (particularly that stored in the
db.have
and db.labels
tables)
can significantly improve performance.
Create the unload depot
To create an unload depot named //unload
, enter
p4 depot unload, and fill in the resulting form as
follows:
Depot:  unload
Type:   unload
Map:    unloaded/...
In this example, unloaded metadata is stored in flat files in the unloaded directory beneath your server root (that is, as specified by the Map: field).
After you have created the unload depot, you can use p4 unload and p4 reload to manage your installation's handling of workspace and label-related metadata.
Unload old client workspaces, labels, and task streams
The p4 unload command transfers infrequently-used
metadata from the versioning engine's db.*
files
to a set of flat files in the unload depot.
Individual users can use the -c
,
-l
, and -s
flags to
unload client workspaces, labels, or task streams that they own. For
example, maintainers of build scripts that create one workspace and/or
label per build, particularly in continuous build environments, should
be encouraged to unload the labels after each build:
p4 unload -c oldworkspace
p4 unload -l oldlabel
Similarly, developers should be encouraged to unload (p4
unload -s oldtaskstream
) or
delete (p4 stream -d
oldtaskstream
) task streams after
use.
To manage old or obsolete metadata in bulk, administrators can use the -a, -al, or -ac flags in conjunction with the -d date and/or -u user flags to unload all labels and workspaces older than a specific date, owned by a specific user, or both.
By default, only unlocked labels or workspaces are unloaded; use the
-L
flag to unload locked labels or workspaces.
To unload or reload a workspace or label, a user must be able to scan
all the files in the workspace's have list and/or
files tagged by the label. Set MaxScanRows and
and
MaxResults
high enough (see
MaxResults, MaxScanRows and MaxLockTime for users in multiple groups) that users do not need to ask for
assistance with p4 unload or p4
reload operations.
Accessing unloaded data
By default, Perforce commands such as p4 clients,
p4 labels, p4 files, p4
sizes, and p4 fstat ignore unloaded
metadata. Users who need to examine unloaded workspaces and labels (or
other unloaded metadata) can use the -U
flag
when using these commands. For more information, see the
P4
Command Reference.
Reloading workspaces and labels
If it becomes necessary to restore unloaded metadata back into the
db.have
or db.labels
table,
use the p4 reload command.
Scripting efficiently
The Perforce Command-Line Client, p4, supports the scripting of any command that can be run interactively. The Perforce server can process commands far faster than users can issue them, so in an all-interactive environment, response time is excellent. However, p4 commands issued by scripts — triggers, review daemons, or command wrappers, for example — can cause performance problems if you haven't paid attention to their efficiency. This is not because p4 commands are inherently inefficient, but because the way one invokes p4 as an interactive user isn't necessarily suitable for repeated iterations.
This section points out some common efficiency problems and solutions.
Iterating through files
Each Perforce command issued causes a connection thread to be created and a p4d subprocess to be started. Reducing the number of Perforce commands your script runs is the first step to making it more efficient.
To this end, scripts should never iterate through files running Perforce commands when they can accomplish the same thing by running one Perforce command on a list of files and iterating through the command results.
For example, try a more efficient approach like this:
for i in `p4 diff2 path1/... path2/...`
do
    [process diff output]
done
Instead of an inefficient approach like:
for i in `p4 files path1/...`
do
    p4 diff2 path1/$i path2/$i
    [process diff output]
done
Using list input files
Any Perforce command that accepts a list of files as a command-line argument can also read the same argument list from a file. Scripts can make use of the list input file feature by building up a list of files first, and then passing the list file to p4 -x.
For example, your script might look something like this:
for component in header1 header2 header3
do
    p4 edit ${component}.h
done
A more efficient alternative would be:
for component in header1 header2 header3
do
    echo ${component}.h >> LISTFILE
done
p4 -x LISTFILE edit
The -x flag instructs p4 to read arguments, one per line, from the named file. If the file is specified as - (a dash), the standard input is read.
By default, the server processes arguments from -x in batches of 128 arguments at a time; you can change the number of arguments processed by the server by using the -b batchsize flag to pass arguments in batches of different sizes.
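The batching behavior can be illustrated without a Perforce server. The sketch below uses xargs as a stand-in for p4's batch handling; the file names and the batch size of two are hypothetical.

```shell
# Stand-in demonstration of argument batching (xargs, not p4 itself).
# Five hypothetical file names are processed two at a time, mirroring
# how arguments read with -x are handed to the server in fixed batches.
printf '%s\n' f1.h f2.h f3.h f4.h f5.h > LISTFILE
xargs -n 2 echo "batch:" < LISTFILE
# prints:
#   batch: f1.h f2.h
#   batch: f3.h f4.h
#   batch: f5.h
rm -f LISTFILE
```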
Using branch views
Branch views can be used with p4 integrate or p4 diff2 to reduce the number of Perforce command invocations. For example, you might have a script that runs:
p4 diff2 pathA/src/... pathB/src/...
p4 diff2 pathA/tests/... pathB/tests/...
p4 diff2 pathA/doc/... pathB/doc/...
You can make it more efficient by creating a branch view that looks like this:
Branch: pathA-pathB
View:
    pathA/src/...   pathB/src/...
    pathA/tests/... pathB/tests/...
    pathA/doc/...   pathB/doc/...
...and replacing the three commands with one:
p4 diff2 -b pathA-pathB
Limiting label references
Repeated references to large labels can be particularly costly. Commands that refer to files using labels as revisions will scan the whole label once for each file argument. To keep from hogging the Perforce server, your script should get the labeled files from the server, and then scan the output for the files it needs.
For example, this:
p4 files path/...@label | egrep "path/f1.h|path/f2.h|path/f3.h"
imposes a lighter load on the Perforce server than either this:
p4 files path/f1.h@label path/f2.h@label path/f3.h@label
or this:
p4 files path/f1.h@label
p4 files path/f2.h@label
p4 files path/f3.h@label
The "temporary client workspace" trick described below can also reduce the number of times you have to refer to files by label.
On large sites, consider unloading infrequently-referenced or obsolete labels from the database. See Unloading infrequently-used metadata.
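The filtering half of this pattern can be exercised locally. In the sketch below, the listing variable stands in for the output of p4 files path/...@label (the file names and revision fields are made up); egrep then narrows it to the files of interest in a single pass.

```shell
# "listing" fakes the output of: p4 files path/...@label
# (hypothetical files and changes -- no server involved).
listing='path/f1.h#3 - edit change 101 (text)
path/f2.h#1 - add change 99 (text)
path/f9.h#2 - edit change 100 (text)'
# One local scan picks out only the files we need:
echo "$listing" | egrep "path/f1.h|path/f2.h|path/f3.h"
# prints the path/f1.h and path/f2.h lines only
```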
The temporary client workspace trick
Most Perforce commands can process all the files in the current workspace view with a single command-line argument. By making use of a temporary client workspace with a view that contains only the files on which you want to work, you might be able to reduce the number of commands you have to run, or to reduce the number of file arguments you need to give each command.
For instance, suppose your script runs these commands:
p4 sync pathA/src/...@label
p4 sync pathB/tests/...@label
p4 sync pathC/doc/...@label
You can combine the command invocations and reduce the three label scans to one by using a client workspace specification that looks like this:
Client: XY-temp
View:
    pathA/src/...   //XY-temp/pathA/src/...
    pathB/tests/... //XY-temp/pathB/tests/...
    pathC/doc/...   //XY-temp/pathC/doc/...
Using this workspace specification, you can then run:
p4 -c XY-temp sync @label
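A script can create such a workspace on the fly and remove it afterward. The following is a sketch, assuming the workspace specification above has been saved to a file named spec.txt (a hypothetical name); it requires a connection to a running Perforce server.

```
p4 client -i < spec.txt      # create (or update) the XY-temp workspace from the spec
p4 -c XY-temp sync @label    # one label scan covers all three paths
p4 client -d XY-temp         # delete the temporary workspace when done
```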
Using compression efficiently
By default, revisions of files of type binary are compressed when stored on the Perforce server. Some file formats (for example, .GIF and .JPG images, .MPG and .AVI media content, and files compressed with .gz compression) already include compression as part of the file format. Attempting to compress such files on the Perforce server consumes server CPU resources with little or no savings in disk space.
To disable server storage compression for these file types, specify such files as type binary+F (binary, stored on the server in full, without compression), either from the command line or from the p4 typemap table.
For more about p4 typemap, including a sample typemap table, see Defining filetypes with p4 typemap.
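For instance, a typemap table along the following lines would store several common pre-compressed formats in full; this is a sketch, and the depot path and the particular set of extensions are examples only:

```
TypeMap:
        binary+F //depot/....gif
        binary+F //depot/....jpg
        binary+F //depot/....mpg
        binary+F //depot/....avi
        binary+F //depot/....gz
```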
Other server configurables
The Perforce server has many configurables that may be changed for performance purposes.
A complete list of configurables may be found by running p4 help configurables.
Parallel processing
When syncing your workspace, depending on the number and size of files being transferred, the p4 sync command might take a long time to execute. You can speed up processing by having this command transfer files using multiple threads. To do so, set the net.parallel.max configurable to a value greater than one and use the --parallel option to the p4 sync command. Parallel processing is most effective with long-haul, high-latency networks, or with other network configurations that prevent a single TCP flow from using all available bandwidth. Parallel processing might also be appropriate when working with large compressed binary files, where the client must perform substantial work to decompress the file.
See the description of the p4 sync command in the P4 Command Reference for additional information.
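As a sketch (the thread count and depot path are illustrative, and a running Perforce server that supports parallel sync is assumed):

```
p4 configure set net.parallel.max=4            # allow up to 4 transfer threads
p4 sync --parallel=threads=4 //depot/proj/...  # sync using 4 concurrent threads
```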
Checkpoints for database tree rebalancing
Perforce's internal database stores its data in structures called Bayer trees, more commonly referred to as B-trees. While B-trees are a very common way to structure data for rapid access, over time, the process of adding and deleting elements to and from the trees can eventually lead to imbalances in the data structure.
Eventually, the tree can become sufficiently unbalanced that performance is negatively affected. The Perforce checkpoint and restore processes (see Backup and recovery concepts) re-create the trees in a balanced manner; consequently, you might see some increase in server performance following a backup, a removal of the db.* files, and the re-creation of the db.* files from a checkpoint.
Rebalancing the trees is normally useful only if the database files have become more than about 10 times the size of the checkpoint. Given the length of time required for the trees to become unbalanced during normal Perforce use, we expect that the majority of sites will never need to restore the database from a checkpoint (that is, rebalance the trees) for performance reasons.
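Rebalancing by checkpoint and restore proceeds along these lines (a sketch: the server must be stopped before the db.* files are replaced, and the server root path and checkpoint name are illustrative):

```
p4d -r /p4root -jc                         # take a checkpoint (creates, e.g., checkpoint.42)
mkdir /p4root/old.db
mv /p4root/db.* /p4root/old.db/            # set the unbalanced db.* files aside
p4d -r /p4root -jr /p4root/checkpoint.42   # re-create balanced db.* files from the checkpoint
```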
The 2013.3 release of Perforce changed the B-tree implementation, extending the maximum size of a db.* file to 16TB and bringing other significant scalability improvements.
The changes to the B-trees between Perforce 2013.2 and 2013.3 require that any upgrade that crosses this release boundary must be performed by taking a checkpoint with the older release and restoring that checkpoint with the newer release. See Upgrading p4d - between 2013.2 and 2013.3 for details.