Unicode

This section covers issues and configuration settings specific to a Unicode-enabled Perforce Helix Core Server and the Jenkins build environment.

Charsets

A Perforce Client Workspace must specify a Charset appropriate to the platform (Windows, Linux, and so on) when accessing files from a Unicode-enabled Helix Server. The Workspace configuration provides a CHARSET dropdown, which defaults to 'none' for non-Unicode Helix Servers. For more information about Perforce character sets, please refer to our internationalization notes.

UTF8 BOM

The UTF8 charset supports an optional BOM (Byte Order Mark) at the start of the file, represented by the byte sequence 0xEF 0xBB 0xBF. If a file has been added to Perforce with the type set to utf8, the default behavior is to sync the file with a BOM, even if the file did not originally contain one.
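To make the byte sequence concrete, the following minimal Python sketch checks whether a block of file data begins with the UTF-8 BOM and strips it if present. The function names has_utf8_bom and strip_utf8_bom are hypothetical helpers for illustration only; they are not part of Perforce or the Jenkins plugin.

```python
import codecs

# The UTF-8 BOM is the three bytes 0xEF 0xBB 0xBF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (hypothetical helper)."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(UTF8_BOM):]
    return data
```

A build step that compares synced files byte-for-byte against their originals could use a check like this to see whether a BOM was added during sync.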

There is an undocumented Perforce configurable, filesys.utf8bom, which can be modified to change the default behavior:

filesys.utf8bom  (default 1)    Set to 0 to write utf8 files without a BOM
                                Set to 1 to write utf8 files with a BOM
                                Set to 2 to write utf8 BOM only on Windows
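
The three values can be read as a simple decision rule. The sketch below only restates the table above in Python; it is an illustration, not the actual Perforce or P4Java implementation, and the function name writes_bom and its on_windows parameter are assumptions made for this example.

```python
def writes_bom(setting: int, on_windows: bool) -> bool:
    """Restate the filesys.utf8bom table: is a BOM written for utf8 files?

    Illustrative only; this mirrors the documented values and is not
    taken from the Perforce source.
    """
    if setting == 0:
        return False       # never write a BOM
    if setting == 1:
        return True        # always write a BOM (the default)
    if setting == 2:
        return on_windows  # write a BOM only on Windows
    raise ValueError(f"unsupported filesys.utf8bom value: {setting}")
```

For example, writes_bom(2, on_windows=False) describes a Linux build node with the configurable set to 2, where utf8 files would be synced without a BOM.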

For Jenkins to make use of the configurable, you need to set it as a JVM property on the Jenkins Master or Build Slave. For example:

-Dcom.perforce.p4java.filesys.utf8bom=0

For most use cases, you will need to set the JVM option for the slave build node:

  1. From the Jenkins dashboard, click Manage Jenkins and click Manage Nodes.
  2. Click New Node or select an existing node and click Configure.
  3. If you selected New Node, enter a Node name, select Permanent Agent, and click OK. Configure the node as required.

    Image of Node Configuration

  4. Click Advanced and add the -D property to the JVM Options.

    Image of JVM Options

  5. Click Save.
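
After the node has been reconfigured and a build has synced the workspace, one way to confirm the setting took effect is to scan the workspace for files that still begin with the BOM. The following Python sketch assumes the workspace is an ordinary directory tree on the build node; the function name find_bom_files and the example path are hypothetical.

```python
from pathlib import Path

# The UTF-8 BOM is the three bytes 0xEF 0xBB 0xBF.
UTF8_BOM = b"\xef\xbb\xbf"


def find_bom_files(root):
    """Return a sorted list of regular files under root that start with a UTF-8 BOM."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Only the first three bytes are needed to detect the BOM.
            with path.open("rb") as handle:
                if handle.read(3) == UTF8_BOM:
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical workspace location; replace with the node's actual workspace root.
    for bom_file in find_bom_files("/path/to/jenkins/workspace"):
        print(bom_file)
```

With the property set to 0, an empty result after a fresh sync suggests that utf8 files are being written without a BOM. Files of other types that legitimately contain these bytes will also be listed, so treat the output as a hint rather than proof.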