November 1, 2007

Anatomy of a Share

Surround SCM

Applies to Surround SCM 2008 and earlier

Improper understanding of Surround SCM file sharing can lead to undesired situations. This article explains what actually happens in the backend database when files are shared. It should help you understand how Surround SCM shares files and the benefits of its approach. To learn more about how to share files, refer to the Surround SCM Sharing Strategy wiki article.

The main purpose of Surround SCM is to preserve file history, which is one reason Surround SCM can be used in heavily regulated industries. It is also the main reason why no undo is available in some situations, and why a user can get into an irreversible situation: Surround SCM will not allow a user to perform any action that compromises the history of a file.

Note: In Surround SCM 2009, data is stored in an RDBMS. Some of the limitations mentioned in this article are not present in Surround SCM 2009.

The Example - How Sharing Works

A simple setup was created to test different scenarios and to show what occurs in each of them. First, a mainline branch named 'Sharing Branch' is created, along with two child repositories, Repo A and Repo B.

Figure 1 - Sharing Branch

In the database, three folders are created under the mainline:

Figure 2 - Initial Directories

These folders all follow the same naming convention: eight lowercase letters, starting at aaaaaaaa, with the final letter incremented for each new repository. The mainline's root repository, which has the same name as the mainline, corresponds to the aaaaaaaa folder. Because Repo A was created next, it corresponds to the aaaaaaab folder, and Repo B corresponds to aaaaaaac. These folders are referred to as archive directories.
There is one archive directory assigned to every unique repository path ("Sharing Branch/Repo A", for example). Archive directories contain the repository database files that hold all the metadata for files in the repository, for every branch that the repository appears in. The actual file delta data and header data needed to reconstruct every revision of a file are stored in the archive directory's data sub-directory (for example, aaaaaaaa/data), which is referred to as the archive delta directory. The key point is that the archive delta files contain data that is referenced by the metadata of all branches of the mainline. This allows Surround SCM to determine common ancestors across branches, even when files or repositories are renamed, removed, or shared in one branch but not in another.
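As a rough illustration of the naming scheme, the sketch below hands out archive directory names in repository creation order. It is a hypothetical model, not Surround SCM code, and the function names are made up. Treating the name as a base-26 counter that carries past 'z' is also an assumption, because the article only shows aaaaaaaa, aaaaaaab, and aaaaaaac.

```python
import string


def archive_dir_name(index: int, width: int = 8) -> str:
    """Return the index-th archive directory name ('a' counts as zero).

    Assumption: only aaaaaaaa, aaaaaaab and aaaaaaac are shown in the article,
    so treating the name as a base-26 counter that carries past 'z' is a guess.
    """
    letters = string.ascii_lowercase
    chars = []
    for _ in range(width):
        index, digit = divmod(index, 26)
        chars.append(letters[digit])
    if index:
        raise ValueError("index does not fit in the given width")
    return "".join(reversed(chars))


def assign_archive_dirs(repo_paths):
    """Give each unique repository path the next archive directory, in creation order."""
    mapping = {}
    for path in repo_paths:
        if path not in mapping:
            mapping[path] = archive_dir_name(len(mapping))
    return mapping


dirs = assign_archive_dirs([
    "Sharing Branch",          # mainline root repository
    "Sharing Branch/Repo A",
    "Sharing Branch/Repo B",
])
for path, folder in dirs.items():
    print(f"{folder}  <-  {path}  (delta files in {folder}/data)")
```

The mapping is keyed by repository path rather than by branch, which reflects the point above that one archive directory serves every branch the repository appears in.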
After the repositories are set up, a new file is added to Repository A and immediately shared with Repository B:

Figure 3 - Shared File

A series of corresponding files are added to the folders in the database to match these actions. First, the 'mytextfile.txt' file is added. This is a copy of the file that was added plus some delta information that Surround SCM uses to identify that this is the original version of the file. This file is the archive delta file.

Figure 4 - Repository A Contents in Database

Surround SCM then creates the corresponding 'mytextfile.txt.info', which includes information about the file that helps Surround SCM keep track of all the revisions. This file is the header data file.

Figure 5 - Contents of Info File

When the file was shared, a file named 'mytextfile.txt.linked' was created. This tells Surround SCM where the file was shared to.

Figure 6 - Contents of Linked File

In the folder that corresponds to the repository that the file was shared to, a file named 'mytextfile.txt.link' was created.

Figure 7 - Files in Folder for Repository B

The .link file contains a reference to the share's 'base' (its source). The base of a share is the original file that was shared.

Figure 8 - Contents of Link File

The file that is created in the archive directory for the destination repository is referred to as an archive link. When a share is created in a branch, Surround SCM creates a corresponding archive link in the underlying archive directories. This keeps all revisions of the file in the same place, which is needed for promotes and rebases between branches that contain the repository.
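The add-and-share sequence above can be sketched as a toy model in which each data folder is just a set of file names. The helper names (`add_file`, `share_file`, `link_base`) are hypothetical, and the model ignores the actual contents of the .info and .linked files. Placing the .linked file in Repo A's folder is an assumption based on the figures described above.

```python
from collections import defaultdict

# Hypothetical model: archive directory -> set of file names in its data folder.
data_folders = defaultdict(set)
# Contents of each .link file reduced to one fact: the archive it points at (its base).
link_base = {}


def add_file(folder: str, name: str) -> None:
    """Adding a file creates the archive delta file and its header (.info) file."""
    data_folders[folder].update({name, name + ".info"})


def share_file(src: str, dst: str, name: str) -> None:
    """Record the destination in the source folder (.linked) and create an
    archive link (.link) in the destination folder that names the base."""
    if name not in data_folders[src]:
        raise FileNotFoundError(f"{src}/data/{name} does not exist")
    data_folders[src].add(name + ".linked")
    data_folders[dst].add(name + ".link")
    link_base[f"{dst}/data/{name}.link"] = f"{src}/data/{name}"


add_file("aaaaaaab", "mytextfile.txt")                 # added to Repo A
share_file("aaaaaaab", "aaaaaaac", "mytextfile.txt")   # shared to Repo B

for folder in sorted(data_folders):
    print(folder + "/data:", sorted(data_folders[folder]))
```

Only the destination folder gets a .link entry, with no delta data of its own, which is why every revision stays in Repo A's folder.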
It is important to note that the files in the database may not always have the same name as the file originally added. The name may be modified so that it is portable to other file systems, or because it contains Unicode characters. Surround SCM will also choose a different name if the original name would conflict with its own naming convention (for example, if a file named test.info is added).

Breaking Shares

First, here is a high-level definition of both types of break share available in Surround SCM.

Local Break Share

A local break share refers to selecting "Break Share" from the context menu of a file. A local break share is local to the branch it is performed on. If the share exists on other branches, it is left intact there. Internally, with a local break share, there is still a single archive object, and Surround SCM creates separate deltas for the changes made in each repository to keep them apart. Surround SCM breaks the share in the repository database for the branch in which the break was performed, but the file share and archive link remain in all other branches. The archive link remains because Surround SCM needs all the file delta information in the same place. That way it can auto-merge changes when promoting or rebasing between branches that still contain the unbroken share.

Global Break Share

A global break share refers to selecting "Break Share All Branches" from the "Tools" > "Administration" menu. It is global because it applies to all branches. Internally, the archive object is copied to the folder for the other repository, so the files become truly separate copies in the database. The global break share breaks the internal archive link by copying the archive delta files to both locations, so each location now has the complete history. A global break share also breaks the file share in all branches.

Restoring Shares

It is important to understand the difference between the base of a share and the actual share.
  • The base of a share is the file that is shared to another repository location.
  • The share displays the history of its share base. All check-ins to the share are really performed on the base. A share has no history of its own.
This distinction matters because, in the examples below, destroying the share sometimes allows you to restore a broken share. In all of those scenarios, it is the share that is destroyed, not the base of the share. Removing a file behaves differently on the base of a share than on the share itself, which is why some scenarios are irreversible. A user may not be allowed to destroy the archive delta file if it is referenced in other branches. In that case, the user will not be able to create a share over the top of the archive delta files in the original base location. A remove on the base of a share performs an implicit local break share and then marks the base as removed. If you destroy the base, you permanently remove its history from the branch and from the repository database. However, because the broken share is still in the branch, the archive link remains and the archive delta files in the base location cannot be destroyed.
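To make the local/global distinction concrete, here is a minimal toy model. It assumes that each branch maps a repository to an entry pointing at an archive object, and that each delta is tagged with the branch and repository it belongs to. None of these classes or functions come from Surround SCM. They only mirror three behaviors described above:

- A check-in to a share is recorded against the base.
- A local break share changes one branch only and keeps the single archive object.
- A global break share copies the archive so the other repository gets its own complete history.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """One archive object (hypothetical); deltas are tagged (branch, repository, text)."""
    folder: str
    deltas: list = field(default_factory=list)


@dataclass
class Entry:
    """What one branch sees for one repository's copy of the file."""
    archive: Archive
    base_repo: str   # repository that owns the history while the file is shared
    shared: bool


def check_in(branches, branch, repo, text):
    entry = branches[branch][repo]
    # A share has no history of its own: while shared, the change goes to the base.
    owner = entry.base_repo if entry.shared else repo
    entry.archive.deltas.append((branch, owner, text))


def local_break_share(branches, branch, repo):
    # Only this branch changes; the single archive object (and archive link) stays.
    branches[branch][repo].shared = False


def global_break_share(branches, repo, dst_folder):
    # Copy the archive delta data so the repository gets its own complete history,
    # then break the share in every branch.
    source = next(view[repo] for view in branches.values()).archive
    copy = Archive(dst_folder, list(source.deltas))
    for view in branches.values():
        view[repo] = Entry(copy, base_repo=repo, shared=False)


a = Archive("aaaaaaab")   # Repo A's data folder holds the only archive object
branches = {
    name: {"Repo A": Entry(a, "Repo A", True), "Repo B": Entry(a, "Repo A", True)}
    for name in ("Sharing Branch", "Sharing 1.x")
}

check_in(branches, "Sharing Branch", "Repo B", "edit through the share")
local_break_share(branches, "Sharing 1.x", "Repo B")
check_in(branches, "Sharing 1.x", "Repo B", "baseline-only edit")
print(a.deltas)                                    # both deltas live in one archive
print(branches["Sharing Branch"]["Repo B"].shared)  # still shared on the mainline

global_break_share(branches, "Repo B", "aaaaaaac")
print(branches["Sharing Branch"]["Repo B"].archive.folder)
```

Keeping per-repository tags inside one archive is what lets the model (like the description above) still relate the two copies after a local break. The global break trades that for two independent histories.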

Single Branch Scenario

Local Break Share

In the single branch setup above, "Break Share" is selected from the context menu.

Figure 9 - Local Break Share

The file is no longer shared, which means that any change made in one repository will not be propagated to the other repository. The real question is what happens in the database. Browsing to the database directory reveals that the files are still linked, and the contents look just as they did in the figures above.

Figure 10 - Repository Contents After Local Break Share

Restoring the Share

To restore the share, the file in Repo B is removed and destroyed. This allows us to share the file from Repo A to Repo B again, because destroying the file in Repo B did two things:
  • It removed all contents from the aaaaaaac/data/ folder (the data folder for Repo B).
  • It removed the file mytextfile.txt.linked from the aaaaaaab/data/ folder (the data folder for Repo A).
This returns the database to the state it was in before the file was shared to Repo B.
This behavior occurs because we are in a single branch scenario. If the file existed in other branches, the files in the data folders would not be removed, since they would be needed for those other branches. The file can now be shared again from Repo A to Repo B.

Global Break Share

Now we will see what happens when a global break share occurs. Select "Break Share All Branches" from the "Tools" > "Administration" menu. Figure 11 below illustrates the contents of both folders after the global break share action.

Figure 11 - Data Folders' Contents After Global Break Share

Each data folder now contains its own archive copy of the file as well as its own .info file.

Restoring the Share

To restore the share, the file in Repo B is removed and destroyed. This allows the file to be shared again from Repo A to Repo B, because destroying the file in Repo B removed all contents from the aaaaaaac/data/ folder (the data folder for Repo B). This returns the database to the state it was in before the file was shared to Repo B.
This behavior occurs because we are in a single branch scenario. If the file existed in other branches, the files in the data folder would not be removed, since they would be needed for those other branches.
The file can now be shared again from Repo A to Repo B.
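The destroy behavior in both single branch examples can be sketched as follows. The sketch assumes a simple per-file reference set that records which branches still contain the file. The function `destroy_file` and its parameters are hypothetical. Data files are deleted only when the last reference goes away, which is why the same steps leave everything in place if another branch still contains the file.

```python
def destroy_file(folders, refs, folder, name, branch, base_folder=None):
    """Destroy `name` in archive directory `folder` on `branch` (hypothetical model).

    folders: archive directory -> set of file names in its data folder
    refs:    (archive directory, file name) -> branches that still contain the file
    Returns True if the data files were actually deleted.
    """
    users = refs.setdefault((folder, name), set())
    users.discard(branch)
    if users:
        # Another branch still needs this history, so nothing is deleted.
        return False
    # Last reference gone: delete the archive (or archive link) and its companions.
    folders[folder] = {f for f in folders[folder]
                       if f != name and not f.startswith(name + ".")}
    if base_folder is not None:
        # Assumption: the base no longer needs its .linked marker.
        folders[base_folder].discard(name + ".linked")
    return True


def after_local_break():
    """Data folders after a local break share: identical to the shared state."""
    return {
        "aaaaaaab": {"mytextfile.txt", "mytextfile.txt.info", "mytextfile.txt.linked"},
        "aaaaaaac": {"mytextfile.txt.link"},
    }


# Single branch: Repo B's data and the .linked marker both disappear.
single = after_local_break()
single_deleted = destroy_file(
    single, {("aaaaaaac", "mytextfile.txt"): {"Sharing Branch"}},
    "aaaaaaac", "mytextfile.txt", "Sharing Branch", base_folder="aaaaaaab")

# Two branches contain the file: destroying it on one leaves every data file in place.
multi = after_local_break()
multi_deleted = destroy_file(
    multi, {("aaaaaaac", "mytextfile.txt"): {"Sharing Branch", "Sharing 1.x"}},
    "aaaaaaac", "mytextfile.txt", "Sharing 1.x", base_folder="aaaaaaab")

print(single_deleted, sorted(single["aaaaaaac"]))
print(multi_deleted, sorted(multi["aaaaaaac"]))
```

After a global break share, the same call with `base_folder=None` would clear Repo B's own archive copy and .info file, matching the second restore described above.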

Multiple Branch Scenario

To illustrate how shares work in a multiple branch scenario, a baseline branch is created. The new branch is named "Sharing 1.x", and it contains both Repo A and Repo B. As illustrated by Figure 12 below, the files are also shared in the new branch.

Figure 12 - New Branch Contents

The data folders in the backend database look the same; no new files are added because of the new branch. Remember that creating a new branch simply creates 'pointers' that indicate which deltas belong to which branch.

Local Break Share

To illustrate how a local break share works in a multiple branch scenario, "Break Share" is selected from the context menu of the file in Repo A on the baseline branch. As a result, the files on the baseline branch are no longer shared, but the files on the mainline branch are still shared.

Figure 13 - Mainline and Baseline Branches

Restoring the Share

To restore the share, the file in Repo B on the baseline branch must be destroyed. This allows the file to be shared again, because in the backend database both files were still pointing to the same archive object. Before the share was broken locally, the link contained a reference to this branch, and the file used the same delta pointers in both repositories. When the share was broken on this branch, the link remained for this branch, but separate delta pointers were kept so that changes stay separate. When the file was destroyed, the file's reference to this branch was removed. The share can be restored in two ways:
  • Share the file from Repo A to Repo B again.
  • Rebase from the mainline branch. This places the file in Repo B as a new file.
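The multiple branch case above can be sketched as a hypothetical model. The data folders are common to all branches of the mainline, while each branch holds only pointers. Branching copies pointers and writes no files. The local break and the destroy change only the baseline's pointers, so Repo B's folder still holds nothing but an archive link and a new share is allowed. The names `create_branch`, `share`, and `rebase_file` are illustrative assumptions, and the rebase is reduced to copying the parent's pointer.

```python
import copy

# Hypothetical model: data folders are common to every branch of a mainline,
# while each branch is only a table of pointers (file path -> entry).
folders = {
    "aaaaaaab": {"mytextfile.txt", "mytextfile.txt.info", "mytextfile.txt.linked"},
    "aaaaaaac": {"mytextfile.txt.link"},
}
mainline = {
    "Repo A/mytextfile.txt": {"archive": "aaaaaaab", "shared": True},
    "Repo B/mytextfile.txt": {"archive": "aaaaaaab", "shared": True},
}


def create_branch(parent):
    """Branching copies pointers only; no data files are written."""
    return copy.deepcopy(parent)


def share(branch, src_path, dst_path, dst_folder, name):
    """Share into the destination unless a real archive object already exists there."""
    if name in folders[dst_folder]:
        raise RuntimeError(f"{dst_folder}/data/{name} is already an archive object")
    branch[dst_path] = {"archive": branch[src_path]["archive"], "shared": True}


def rebase_file(child, parent, path):
    """Simplified rebase: the child takes a copy of the parent's pointer."""
    child[path] = copy.deepcopy(parent[path])


baseline = create_branch(mainline)
baseline["Repo B/mytextfile.txt"]["shared"] = False   # local break share, baseline only
del baseline["Repo B/mytextfile.txt"]                 # destroy the file in Repo B

# Way 1: share from Repo A to Repo B again. Repo B's folder only holds an
# archive link, so nothing blocks the share.
share(baseline, "Repo A/mytextfile.txt", "Repo B/mytextfile.txt",
      "aaaaaaac", "mytextfile.txt")

# Way 2: on another branch copy where the file was also destroyed, rebase from the mainline.
other = create_branch(mainline)
del other["Repo B/mytextfile.txt"]
rebase_file(other, mainline, "Repo B/mytextfile.txt")

print(baseline["Repo B/mytextfile.txt"], mainline["Repo B/mytextfile.txt"])
```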
Global Break Share

After the share is restored, "Break Share All Branches" is selected from the "Tools" > "Administration" menu while the file is selected in Repo A on the baseline branch. This breaks the share on both branches, as the image below illustrates:

Figure 14 - Branches After Global Break Share

Browsing the database reveals the same results as when the share was broken globally with only the mainline branch. Each data folder now contains its own archive object of the file as well as its own .info file (see Figure 11 above).

Restoring the Share

Restoring the share in this scenario is more difficult and, in some cases, impossible. In the previous scenarios, all we had to do was destroy the file in the repository that we wanted to share the file to. That is no longer sufficient. Suppose we want to restore the share on the baseline branch, so we destroy the file in Repo B on the baseline branch. A subsequent share attempt returns the following error:

Figure 15 - Error Message When Sharing

The error is returned because the file still exists in the mainline branch. Remember that, because we previously did a global break share, there is now a separate archive object for this file in the Repo B data folder. Surround SCM cannot create a linked file where an actual archive object already exists.

For this to work, Surround SCM would probably have to take all the deltas from the archive object file in the Repo B data folder and merge them into the corresponding archive object file in the Repo A data folder. Surround SCM also cannot assume that the files started out as the same file, so it would need logic that works across many different scenarios. After somehow merging the two archive object files, it would have to delete the file from the Repo B data folder, create a .link file, and sort everything out. It would also have to modify the .info file in the Repo A data folder to keep track of which deltas belong to which branch. As in the case of the mainline branch, it would also have to track which deltas belong to which repository. This would not be easy to implement, and in many scenarios it would be hard to decide what the 'expected' or 'correct' behavior is.

The next logical step, then, is to destroy the file on the mainline branch as well. In our simple scenario, this works: completely destroying the file from Repo B on the mainline branch results in a successful share attempt on the baseline branch. The share can then be propagated to the mainline branch with a promote.

Snapshot Branch Scenario

However, this is an overly simplified example. Suppose you had already created snapshot branches that include this file in Repo B. You would not be able to destroy the file in Repo B on those branches because a snapshot branch is read-only, so the only recourse is to destroy the snapshot branch. To illustrate this, a snapshot branch is created off the mainline branch, as illustrated below:

Figure 16 - Snapshot Branch

Now that we have a snapshot branch, we repeat the same steps as above:
  • Globally break the share while the file in Repo B on the baseline branch is selected.
  • Destroy the file in Repo B on the baseline branch.
  • Destroy the file in Repo B on the mainline branch.
Reattempting the share results in the same error message shown before:

Figure 17 - Error Message When Sharing
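The sequence in this section can be modeled as follows. The model assumes a single set that records which branches still reference Repo B's archive object after the global break share. The names are hypothetical, including the snapshot branch name "Snapshot". The sketch mirrors the behavior described around this figure and in the paragraph that follows:

- The share stays blocked while a real archive object exists.
- Destroying a whole branch drops its reference without deleting any data.
- Only a purge of unreferenced versions (the Analyze Utility step) clears the folder.

```python
# Hypothetical model of the state after a global break share: Repo B's data
# folder (aaaaaaac) holds a real archive object that several branches reference.
FOLDER, NAME = "aaaaaaac", "mytextfile.txt"
folders = {FOLDER: {NAME, NAME + ".info"}}
refs = {"Sharing Branch", "Sharing 1.x", "Snapshot"}   # branches containing the file
READ_ONLY = {"Snapshot"}                               # snapshot branches are read-only


def can_share_into():
    """An archive link cannot be created where a real archive object exists."""
    return NAME not in folders[FOLDER]


def destroy_file(branch):
    """Destroy the file on one branch; the data goes only with the last reference."""
    if branch in READ_ONLY:
        raise PermissionError(f"{branch} is read-only")
    refs.discard(branch)
    if not refs:
        folders[FOLDER] -= {NAME, NAME + ".info"}


def destroy_branch(branch):
    """Destroying a whole branch drops its reference without any per-file check."""
    refs.discard(branch)


def purge_unreferenced():
    """Purge step: remove the archive object once no branch references it."""
    if not refs:
        folders[FOLDER] -= {NAME, NAME + ".info"}


destroy_file("Sharing 1.x")       # baseline branch
destroy_file("Sharing Branch")    # mainline branch
after_destroys = can_share_into()         # the snapshot still references the file
destroy_branch("Snapshot")
after_branch_destroy = can_share_into()   # the archive object is still on disk
purge_unreferenced()
after_purge = can_share_into()
print(after_destroys, after_branch_destroy, after_purge)
```

Skipping the reference check in `destroy_branch` stands in for the performance trade-off the article describes, in which the expensive scan across every branch is left to the Analyze Utility.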

Let's assume destroying the snapshot branch is acceptable. Destroying the snapshot branch and then sharing the file again still results in the same error, because destroying a snapshot branch is not the same as destroying a file. There is still an archive object file in the Repo B data folder. Surround SCM cannot assume that this was the last branch containing the file, and checking every branch for that on each branch destroy would hurt server performance.

To completely remove the file, use the Surround SCM Analyze Utility to compact the database. The utility performs a detailed check of every branch to ensure that the file is not referenced. To run it, start the Surround SCM Analyze Utility, select "File" > "Open Database", and select the mainline branch that you want to compact. This compacts the entire database for that mainline branch. The only setting that has to be selected is "Purge unreferenced file versions from archive storage". After running the utility, the output includes the following excerpt:

Directory contains a Surround SCM Mainline database.
aaaaaaac/data/mytextfile.txt - was purged.

You are now able to share the file.

Losing the Share

With branching, the only way to preserve the share in the new branch is if the branch contains the base of the share. In the previous examples, the share was preserved because Repo A was included in the new branch. Had it not been included, the file would have lost its share in the new branch. To better illustrate this, let's rewind to the database as it was before any branches were created. Suppose a child repository named "SubRepoB" is created under Repo B, and the file is shared from Repo B to SubRepoB. The result is illustrated in the following image:

Figure 18 - Sub Repository With Shared File

Note that the base of the share still points to Repo A, even though the file was shared from Repo B to SubRepoB. We now create a branch off Repo B and name it "Repo B 1.x". Because the root of this branch is Repo B, Repo A is not included in the branch. Because Repo A is the base of the share, the share is lost, as illustrated by the following image:

Figure 19 - Repo B Baseline Branch
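The rule that a share survives branching only when the new branch contains its base can be sketched as a path check. This assumes a branch includes exactly the repository paths under its root, and the function names are hypothetical. Both shares record Repo A as their base, matching Figure 18.

```python
from pathlib import PurePosixPath


def in_branch(repo_path: str, branch_root: str) -> bool:
    """True if repo_path is the branch root or lies underneath it."""
    root = PurePosixPath(branch_root)
    path = PurePosixPath(repo_path)
    return path == root or root in path.parents


def branch_shares(shares, branch_root):
    """Split the shares that land in a new branch into kept and lost.

    A share is kept only if its base is also inside the branch."""
    kept, lost = [], []
    for share_path, base_path in shares.items():
        if not in_branch(share_path, branch_root):
            continue  # the share itself is not part of the new branch
        (kept if in_branch(base_path, branch_root) else lost).append(share_path)
    return kept, lost


# Both shares point at the original file in Repo A as their base, even though
# the second one was shared onward from Repo B to SubRepoB.
shares = {
    "Sharing Branch/Repo B/mytextfile.txt": "Sharing Branch/Repo A/mytextfile.txt",
    "Sharing Branch/Repo B/SubRepoB/mytextfile.txt": "Sharing Branch/Repo A/mytextfile.txt",
}

repo_b_branch = branch_shares(shares, "Sharing Branch/Repo B")   # like "Repo B 1.x"
full_branch = branch_shares(shares, "Sharing Branch")            # branch from the root
print(repo_b_branch)
print(full_branch)
```

Comparing path components with `PurePosixPath` rather than raw string prefixes keeps a sibling such as "Repo BB" from being mistaken for part of "Repo B".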

Restoring the Share - Don't Try This at Home!

Based on the previous methods for restoring a share, you might think that one way to restore the share in this scenario is to destroy the file from SubRepoB on both branches. You would then be able to share the file from Repo B to SubRepoB on the baseline branch, right? You could then promote to restore the share to SubRepoB on the mainline branch, correct? This is incorrect. While these steps may appear to restore the share, they actually corrupt the file in the database. The problem only gets worse once changes are made on the baseline branch and later promoted to the mainline branch, because the changes will not be propagated to all repositories. The shares will also point to different repositories as their base. This prevents Surround SCM from being able to get or check out the file from at least one of the repositories.

Note: The issues outlined here have been addressed in Surround SCM 2009. The data corruption issue has been fixed, and in the snapshot branch scenario, it is no longer necessary to destroy the snapshot branch to be able to share the file.

Conclusion

Sharing in Surround SCM can be beneficial in managing digital assets. However, it is important to understand how sharing works in the database to prevent problems or unexpected behavior. With the examples in this article, you should now have a basic understanding of how sharing works in the database and of the difference between a local and a global break share. If you would like to implement sharing, it is strongly recommended that you do so with Surround SCM 2009.