Lycos

Mission Impossible:
Distributed Web Content Publishing at Lycos

Authors:
David Markley
Scott Money

August 1, 2000

Rev 1.1

  1. Abstract
  2. This paper discusses how Perforce helped meet some of the challenges of publishing web content for Lycos. It gives a brief history of web publishing at Lycos, including the special challenges faced in such a large, distributed system, and then presents the current publishing system, including the Java-based tools built around Perforce.

  3. Introduction
  4. So here's the problem. Lycos needed content editors at various locations to add and change content without stepping on each other's changes, and to get that content through QA and out to our front-end web servers in under 5 minutes. We also needed to be able to roll back that content to any particular point in time and track what changes had been made. Not possible, you say? We have developed a set of Java-based tools, wrapped around Perforce, that support this web-publishing scenario.

  5. Problem History
  6. In the beginning, DOD created the Internet. And the Internet was without form, and void; and darkness was upon the face of the user... The early days of the web were the early days of Lycos.

    Before most of the nifty web tools and languages available today were developed, Lycos was quickly becoming a major portal to the Internet. This brought about a lot of fast-paced, creative development within Lycos to solve its site management problems. One of the major problems facing Lycos was how to let multiple editors get their content and template changes into live service.

    There was, of course, the option of letting all the editors work directly within some shared file system. In this case, each front-end web server would simply serve its content directly from that file system. This would have been a reasonable option, if performance weren't such a major issue.

    Another option was to have editors work on their content locally and then move it to a central repository. From this repository, content could then be copied out to the front-end web servers on command. Each front-end would thus hold a local copy of the content, and the performance requirements would be met. This, in a more complex form, is the option that Lycos originally chose to use.

    Problems arose over time with this system, however. Users could potentially work on the same piece of content and overwrite each other's work. It also became clear that copying content from the central repository was not scalable. There was a process in place for determining which files had actually changed before copying the content, but as the number of front-end machines increased, so did the time it took to get content changes "live."

  7. Enter Perforce
  8. At about this time, Perforce was introduced at Lycos as a replacement for VSS and CVS. Because Perforce has a very small client that doesn't run all the time and a workspace that keeps user files local, it appeared that it might prove useful as a major part of the solution to Lycos' content publishing mission.

    Users would be able to work on local copies of their content and check that content into Perforce with less chance of simply overwriting another user's work. Even if they did overwrite another user's work, a history of that work would be preserved.

    Once content was checked in, it could be verified on a development web server and then passed to QA for review by moving it from one Perforce branch to another. Likewise, once QA had reviewed the changes on a QA web server, they could move the content to live service by branching those changes to a third Perforce branch. This is the basic process for publishing used today at Lycos.
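The promotion step described above can be sketched as the `p4` commands a tool might issue. This is only an illustration under stated assumptions: the branch paths `//depot/dev`, `//depot/qa`, and `//depot/live` and the class name `BranchPromotion` are hypothetical, since the paper does not name the actual depot layout. Only `p4 integrate` and `p4 submit -d` are real Perforce commands.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds the p4 commands that promote one change between branches. */
class BranchPromotion {
    // Assumed branch layout; the paper does not give the real depot paths.
    static final String DEV = "//depot/dev";
    static final String QA = "//depot/qa";
    static final String LIVE = "//depot/live";

    /** Command that opens the files touched by one change for integration into the target branch. */
    static List<String> integrateCommand(String from, String to, int change) {
        // The revision range @N,@N limits the integration to changelist N.
        String source = from + "/...@" + change + ",@" + change;
        return Arrays.asList("p4", "integrate", source, to + "/...");
    }

    /** Command that submits the integrated files with a descriptive message. */
    static List<String> submitCommand(String from, String to, int change) {
        String description = "Promote change " + change + " from " + from + " to " + to;
        return Arrays.asList("p4", "submit", "-d", description);
    }

    public static void main(String[] args) {
        int devChange = 1234; // example changelist submitted on the development branch
        System.out.println(String.join(" ", integrateCommand(DEV, QA, devChange)));
        System.out.println(String.join(" ", submitCommand(DEV, QA, devChange)));
    }
}
```

Promotion from QA to live would follow the same pattern, using the changelist number created by the submit on the QA branch rather than the original development changelist.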

    This and other basic Web Content Management solutions were described well in Laura Wingerd's presentation:

    Web author workspaces <-> DEPOT <-> Web site workspace <-> Web server

  9. Not All Content
  10. One caveat that must be made is that extremely dynamic content does not necessarily fit well within this model of content publishing. There are other mechanisms available to better handle such information sources.

    The difference between extremely dynamic and rapidly changing content is somewhat arbitrary. Some of the pages that are processed through our Perforce-based system change three or more times a day. Extremely dynamic content might be defined as content that can be updated multiple times per minute throughout the day.

  11. More Issues of Scale
  12. While Perforce solved many of the issues surrounding Lycos' publishing mission, there were some additional issues of scale. Lycos now has content developers across the country and must replicate its content to front-end web servers at multiple locations. This is the overall structure of our portion of the Lycos content publishing system:

    [Figure: Server replication at Lycos]

  13. The Tools
  14. To publish our content with Perforce, we needed a set of tools that would operate on any platform and allow us to control the flow of content. The two main components of this toolkit would be a reviewer that monitored changes made to the Perforce depot and a robot that would be capable of updating its client workspaces on command.

    1. P4Reviewer
    2. The P4Reviewer should be familiar to anyone who has received e-mail from one of the other reviewers available from Perforce's supporting applications page. At its core, it is still simply a daemon that monitors changes made to the Perforce depot and takes action on those changes. Our P4Reviewer also sends mail to users who have registered paths in the depot within their "Reviews:" criteria. The P4Reviewer pays special attention to the user names it handles, however.

      When the P4Reviewer identifies a "robot" user, it determines which web servers or collections of web servers need to be informed of the change. Instead of sending e-mail, the P4Reviewer sends an XML message to the appropriate P4Robots to tell them to update to the correct change.
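The routing decision described above can be illustrated with a small sketch. It assumes a naming convention in which robot accounts start with `robot-`; the paper does not say how robot users are recognized, so that rule, the class name `ReviewRouting`, and the sample paths are all hypothetical. The wildcard handling follows Perforce's documented path syntax, where `...` matches across directories and `*` matches within one. In practice Perforce can answer the "who reviews this change" question itself with `p4 reviews -c`; the matching is spelled out here only to show the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of how a reviewer daemon might route one changed file. */
class ReviewRouting {
    /** Converts a Perforce path pattern to a regex: "..." spans directories, "*" stays within one. */
    static Pattern toRegex(String p4Pattern) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < p4Pattern.length()) {
            if (p4Pattern.startsWith("...", i)) {
                regex.append(".*");
                i += 3;
            } else if (p4Pattern.charAt(i) == '*') {
                regex.append("[^/]*");
                i += 1;
            } else {
                regex.append(Pattern.quote(String.valueOf(p4Pattern.charAt(i))));
                i += 1;
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Assumed convention (not from the paper): robot accounts start with "robot-". */
    static boolean isRobot(String user) {
        return user.startsWith("robot-");
    }

    /** Returns the users whose review patterns match the changed depot path. */
    static List<String> reviewersOf(String depotPath, Map<String, List<String>> reviewsByUser) {
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : reviewsByUser.entrySet()) {
            for (String pattern : entry.getValue()) {
                if (toRegex(pattern).matcher(depotPath).matches()) {
                    matched.add(entry.getKey());
                    break; // one matching pattern is enough for this user
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, List<String>> reviews = new LinkedHashMap<>();
        reviews.put("editor1", Arrays.asList("//depot/live/news/..."));
        reviews.put("robot-web", Arrays.asList("//depot/live/..."));
        for (String user : reviewersOf("//depot/live/news/index.html", reviews)) {
            // Robots get an update message; people get e-mail.
            System.out.println(user + " -> " + (isRobot(user) ? "XML update message" : "e-mail"));
        }
    }
}
```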

    3. P4Robot

    The P4Robot is basically a listener. It will listen for XML messages from the P4Reviewer and react accordingly. It also updates the P4Reviewer when the message sent has been acted upon. This update is important for the P4Reviewer, since it must ensure that all the P4Robots that need to make a change have done so successfully.
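The request and acknowledgement exchange described above might look like the following sketch. The paper does not publish its XML format, so the `<update>` and `<ack>` elements, their attributes, and the class name `RobotMessages` are invented for illustration. The sync command uses the real `p4 sync @change` form to bring a client workspace to a given changelist.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical message helpers; the element and attribute names are assumptions. */
class RobotMessages {
    /** Reviewer side: builds the update request for one changelist. */
    static String updateRequest(int change) {
        return "<update change=\"" + change + "\"/>";
    }

    /** Robot side: reads the changelist number out of an update request. */
    static int parseChange(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            if (!"update".equals(root.getTagName())) {
                throw new IllegalArgumentException("unexpected message: " + root.getTagName());
            }
            return Integer.parseInt(root.getAttribute("change"));
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed message", e);
        }
    }

    /** Robot side: the p4 command that brings the given client workspace to the change. */
    static List<String> syncCommand(String client, int change) {
        return Arrays.asList("p4", "-c", client, "sync", "@" + change);
    }

    /** Robot side: acknowledgement sent back once the sync has finished. */
    static String acknowledgement(String robot, int change, boolean ok) {
        // Assumes robot names are plain host names that need no XML escaping.
        return "<ack robot=\"" + robot + "\" change=\"" + change
                + "\" status=\"" + (ok ? "ok" : "failed") + "\"/>";
    }

    public static void main(String[] args) {
        String request = updateRequest(1234);
        int change = parseChange(request);
        System.out.println(String.join(" ", syncCommand("web01-active", change)));
        System.out.println(acknowledgement("web01", change, true));
    }
}
```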

    The P4Robot can maintain one or more client workspaces. In our environment, we have them maintain two workspaces. One workspace is always "active." The active workspace is where updates sent from the P4Reviewer are made. The other workspace is "live." The live workspace contains the content that is currently being served by the web server. After all P4Robots have acknowledged an update to their active areas, the P4Reviewer will have them swap areas in order to push that update live.
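The active/live arrangement described above resembles double buffering, and can be sketched as follows. This is a minimal illustration, not the Lycos code: the class names, robot names, and workspace names are hypothetical, and the real tools presumably also handle timeouts and failed updates, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the active/live swap; all names here are assumptions. */
class ActiveLiveSwap {

    /** Reviewer side: tracks which robots still have to acknowledge a change. */
    static class Coordinator {
        private final int change;
        private final Set<String> pending;

        Coordinator(int change, Set<String> robots) {
            this.change = change;
            this.pending = new HashSet<>(robots);
        }

        /** Records one acknowledgement; returns true once every robot has confirmed the change. */
        boolean acknowledge(String robot, int ackedChange) {
            if (ackedChange == change) {
                pending.remove(robot);
            }
            return pending.isEmpty();
        }
    }

    /** Robot side: the two client workspaces and their current roles. */
    static class Workspaces {
        private String active;
        private String live;

        Workspaces(String active, String live) {
            this.active = active;
            this.live = live;
        }

        String active() { return active; }

        String live() { return live; }

        /** Exchanges the roles so the freshly updated workspace starts serving content. */
        void swap() {
            String previousLive = live;
            live = active;
            active = previousLive;
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator(1234, new HashSet<>(Arrays.asList("web01", "web02")));
        Workspaces web01 = new Workspaces("ws-a", "ws-b");
        coordinator.acknowledge("web01", 1234);
        if (coordinator.acknowledge("web02", 1234)) {
            web01.swap(); // in the real system every robot would be told to swap
        }
        System.out.println("live workspace on web01: " + web01.live());
    }
}
```

Waiting for every acknowledgement before swapping means all front-ends switch to the new content together, rather than serving a mix of old and new pages while updates are still in progress.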

  15. Java P4 Package
  16. At the core of our tools is a Java P4 Package that contains wrapper classes for P4 objects. This set of wrapper objects has proven invaluable to us in our tool development. They started out as part of a package that included the P4Reviewer and P4Robot, but it was clear that they were better off as their own package for general use in other tools.
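To give a feel for what a wrapper class in such a package could look like, here is a minimal sketch that wraps one line of `p4 changes` output in a Java object. The class name `P4Change`, its fields, and the sample user and client names are hypothetical; the paper does not describe the package's actual classes or how they talk to the server.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical wrapper for one changelist, parsed from a line of "p4 changes" output. */
final class P4Change {
    // Expected shape: Change 1234 on 2000/08/01 by user@client 'description '
    // Pending changes carry an extra marker such as *pending* before the description.
    private static final Pattern LINE = Pattern.compile(
            "^Change (\\d+) on (\\S+) by ([^@\\s]+)@(\\S+)(?: \\*\\w+\\*)? '(.*)'$");

    final int number;
    final String date;
    final String user;
    final String client;
    final String description;

    private P4Change(int number, String date, String user, String client, String description) {
        this.number = number;
        this.date = date;
        this.user = user;
        this.client = client;
        this.description = description;
    }

    /** Parses one output line; throws IllegalArgumentException if the line has another shape. */
    static P4Change parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a 'p4 changes' line: " + line);
        }
        return new P4Change(Integer.parseInt(m.group(1)), m.group(2), m.group(3),
                m.group(4), m.group(5).trim());
    }

    public static void main(String[] args) {
        P4Change change = parse("Change 1234 on 2000/08/01 by editor1@editor1-ws 'Update front page '");
        System.out.println(change.number + " by " + change.user + ": " + change.description);
    }
}
```

Keeping parsing inside the wrapper lets tools such as a reviewer, a robot, or a web interface work with typed objects instead of raw command output.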

    We have been able to use this P4 Package to act as a programmatic interface for several other applications, including:

    1. P4Upload
    2. This utility allows users to upload documents they have open for add or edit to any web server that will allow an HTTP PUT. It is used primarily by content developers who want to preview their content on a web server before checking in their changes. This reduces the number of changes checked into Perforce and improves productivity.

      [Figure: P4Upload screenshot]
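The upload step can be sketched with the standard `HttpURLConnection` class. This is an assumption-laden illustration rather than P4Upload itself: the class name `PreviewUpload`, the mapping from workspace path to URL, and the command-line arguments are hypothetical. The list of files open for add or edit would come from a command such as `p4 opened`, which is not shown here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of previewing a locally edited file through HTTP PUT. */
class PreviewUpload {
    /** Maps a file inside the workspace to its URL on the preview server. */
    static String targetUrl(String serverBase, Path workspaceRoot, Path file) {
        // Assumes file names contain no characters that need URL escaping.
        String relative = workspaceRoot.relativize(file).toString().replace('\\', '/');
        return serverBase.endsWith("/") ? serverBase + relative : serverBase + "/" + relative;
    }

    /** Sends the file with HTTP PUT and returns the server's status code. */
    static int put(String url, Path file) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("PUT");
        connection.setDoOutput(true);
        try (OutputStream out = connection.getOutputStream()) {
            Files.copy(file, out);
        }
        int status = connection.getResponseCode();
        connection.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java PreviewUpload <serverBase> <workspaceRoot> <file>");
            return;
        }
        Path root = Paths.get(args[1]);
        Path file = Paths.get(args[2]);
        String url = targetUrl(args[0], root, file);
        System.out.println("PUT " + url + " -> " + put(url, file));
    }
}
```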

    3. P4Web

    Since the P4 Package is Java-based, it was possible to make direct use of its classes in the development of a JavaServer Pages (JSP) interface. Through this interface, users can browse the depot, view all the common P4 objects and attributes, integrate changes, edit files, and monitor P4Reviewer and P4Robot status.

    The main page of this interface prompts the user to look at their changes in a particular section of the depot.

    [Figure: P4 Package Web interface]

    The login page prompts the user for their Perforce username and password. It is important to note that the server we are using here is set up with a Secure Sockets Layer (SSL) certificate.

    The user is also allowed to pick the stylesheet that they want to use throughout their session. The entire site uses this stylesheet, so it is possible to change things like the awful color scheme.

    [Figure: P4 Package Web login]

    As you can see from this list of changes, it is also possible to see many other objects in the Perforce system.

    [Figure: P4 Package Web P4 Changes]

    There are other systems available to browse the depot, but ours will also allow the user to edit text files through the web. Most of our users also like being able to see the modification time of the files in this way.

    [Figure: P4 Package Web P4 Tree]

  17. Conclusion

Perforce has proven to be an extremely powerful tool in publishing web content. The same excellence that can be found in its traditional SCM role is the reason for its success in this other critical mission.