October 15, 2014

How to Use the P4Search API


At the MERGE 2014 conference, I had the pleasure of presenting with Sven Erik Knop on the topic of P4Search, our popular new web service for content-based searches across your Perforce environment. Our talk explained the inner workings of P4Search, its setup and applications, and explored ideas on how to extend this great and essential tool. You'll find our presentation here.

Our presentation was limited to 30 minutes, and there were many interesting topics we could not get to. One such topic was the P4Search API, which I'll elaborate a bit on here.

A brief overview of the P4Search API

P4Search indexes file revisions stored in the Perforce Server. “Classic” users of Perforce may find it easy to get to stored assets using a search index; however, the user community at large who would consume P4Search is likely to be less technical.

A Perforce user who produces assets typically knows where their files are located and relies less on flexible search capabilities. Consider, for example, a specialized designer who creates an asset that will be consumed, but not edited, by a larger audience of designers and marketers via an intranet portal. Having search capabilities available in Perforce client programs like P4 CLI and P4V is not going to be very helpful to those consumers. To open up search to this larger audience, P4Search offers a RESTful API.

What does P4Search return?

Ultimately, users want to get access to the content stored in the Perforce Server. The desired search result boils down to the identifier of a depot file revision and some closely associated attributes. The result is formatted in a web-friendly fashion, currently JSON (JavaScript Object Notation), so an example search result looks like this:

[{
"type":"text",
"time":"1194548486",
"action":"edit",
"rev":"3",
"depotFile":"//depot/Talkhouse/rel1.5/com/walkerbros/common/estuff/EBolt.java",
"change":"1980"
}]

Once you extract and concatenate depotFile and rev, you have all you need to fuel p4 print or to prefix it with a Swarm URL to get to the content you want.
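If you prefer a scripting language over shell tools, the extract-and-concatenate step could look roughly like the Python sketch below. It parses the sample result shown above and joins depotFile and rev with a #. The helper names to_file_spec and print_command are made up for this illustration, and the p4 command is only assembled, not executed.

```python
import json

# Sample result copied from the example search result above.
SAMPLE_RESULT = """
[{
  "type": "text",
  "time": "1194548486",
  "action": "edit",
  "rev": "3",
  "depotFile": "//depot/Talkhouse/rel1.5/com/walkerbros/common/estuff/EBolt.java",
  "change": "1980"
}]
"""


def to_file_spec(hit):
    """Join depotFile and rev into the usual Perforce revision syntax."""
    return f"{hit['depotFile']}#{hit['rev']}"


def print_command(hit):
    """Build an argument list for `p4 print -q`; running it is left to the caller."""
    return ["p4", "print", "-q", to_file_spec(hit)]


hits = json.loads(SAMPLE_RESULT)
specs = [to_file_spec(hit) for hit in hits]
print(specs[0])
```

The argument list returned by print_command can be handed to subprocess.run on a machine where the p4 client is installed and logged in.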

Exploring RESTful APIs

In a Perforce world, users feel comfortable using tools like P4 CLI or grep, awk and sed to parse and manipulate text. When working with RESTful APIs, however, talking to HTTP web services is as important as dealing with JSON-formatted data. In a terminal, curl is a fundamental tool that talks HTTP, and you likely already have it installed. You can make your extended curl life a little easier with resty, a neat REST client for Bash. And dealing with JSON gets easier with Jsawk. As the name suggests, it provides awk-like functionality combined with some JavaScript. Jsawk requires a JavaScript interpreter on the command line, which is what SpiderMonkey provides. If you don’t have these tools already, do yourself a favor and install them.

Prepare resty to talk to P4Search

First, you need to source resty and tell it which host to talk to, along with any other parameters to pass on to curl:

. resty http://<host>:<port>/api -H "Content-Type: application/json"

Running just resty with no parameters should give you something like:

http://perforce.p4demo.com:8088/api*

Next, you can check to see whether P4Search is accessible:

$ resty HEAD /search
HTTP/1.1 200 OK
Date: Sun, 17 Aug 2014 13:59:18 GMT
Content-Language: en-US
Content-Type: application/json;charset=UTF-8
Pragma: no-cache
Cache-Control: no-cache, no-store, max-age=0
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-Length: 55
Server: Jetty(8.1.14.v20131031)

Prepare to search

Accessing the search index via P4Search’s search controller requires Perforce authentication, as users are not supposed to see search results for content they aren’t allowed to read. For a successful search, you need the user’s ID and password or, even better, a login ticket. A call to p4 login -p -a can return this information.

The first search

A search is a POST to the P4Search API with the query encoded as JSON data. The most basic search can be fired off like this:

resty POST /search '{"userId":"Joe_Coder", "ticket":"068E2C45060AA13A1706CFF5CF986F23", "queryRaw":"merge2014"}'

In this example “merge2014” is the text to search for and it should return a result. Success is indicated by a status message of OK and a code 200 very much like a successful HTTP return:

{"status":{"message":"OK","code":"200"},"payload":[...]}

The payload is an array that holds the JSON-encoded result data as described above. You can, and eventually should, limit the result set by adding a rowCount sibling to the queryRaw element in the request.
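For readers who would rather script the request than type it into resty, here is a minimal Python sketch of the same POST using only the standard library. The function names are invented for this example, the API base is the demo host shown earlier, and passing rowCount as an integer is an assumption, since the request format above only tells us that it sits next to queryRaw.

```python
import json
import urllib.request

# Demo host from the resty output earlier; substitute your own host and port.
API_BASE = "http://perforce.p4demo.com:8088/api"


def build_search_body(user_id, ticket, query, row_count=None):
    """Assemble the JSON document that is POSTed to /search."""
    body = {"userId": user_id, "ticket": ticket, "queryRaw": query}
    if row_count is not None:
        # rowCount is a sibling of queryRaw; the integer type is an assumption.
        body["rowCount"] = row_count
    return body


def post_search(body, api_base=API_BASE):
    """Send the request and return the decoded JSON response."""
    request = urllib.request.Request(
        api_base + "/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_search_body(
    "Joe_Coder", "068E2C45060AA13A1706CFF5CF986F23", "merge2014", row_count=10
)
print(json.dumps(body))
```

Calling post_search(body) against a reachable P4Search instance should return the status and payload structure described above.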

Access search results

This is where you can make use of Jsawk. First figure out whether you have a successful search. Simply pipe the result of the call to resty into a call to jsawk and write some simple JavaScript.

resty ... | jsawk 'if (this.status.code==200) return "Success"; else return "Failure"'

Instead of returning the word “Success”, you can also return the payload and thus get rid of the status message moving forward.

resty ... | jsawk 'if (this.status.code==200) return this.payload; else return null'

You will get a JSON array that is nice to use in web applications. For the purpose of this article you pipe it further into yet another Jsawk call:

resty ... | jsawk '...'| jsawk -n 'out(this.depotFile + "#" + this.rev)'

in order to see something familiar on screen:

//depot/Talkhouse/rel1.0/com/walkerbros/common/widget/ENut.java#5
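The same two-step pipeline (check the status, then print depotFile#rev for each hit) can be sketched in Python as follows. The response below is hand-written in the shape shown earlier rather than real server output, and the function names are hypothetical. Note that the sample response carries the code as the string "200", so the sketch compares it as text.

```python
import json


def extract_payload(response):
    """Return the payload list if the status code is 200, otherwise None."""
    status = response.get("status", {})
    # The code arrives as a string in the sample response, so compare as text.
    if str(status.get("code")) == "200":
        return response.get("payload", [])
    return None


def file_specs(response):
    """Format every hit as depotFile#rev; a failed search yields an empty list."""
    payload = extract_payload(response)
    if payload is None:
        return []
    return [f"{hit['depotFile']}#{hit['rev']}" for hit in payload]


# Hand-written response in the shape shown earlier, not real server output.
sample = json.loads("""
{"status": {"message": "OK", "code": "200"},
 "payload": [{"depotFile": "//depot/Talkhouse/rel1.0/com/walkerbros/common/widget/ENut.java",
              "rev": "5"}]}
""")

for spec in file_specs(sample):
    print(spec)
```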

More sophisticated queries

By now, you have mastered accessing P4Search results and it’s time to explore some more advanced query options.

Adding depot paths

You can add an array of depot paths, given in depot path syntax, to the JSON object posted to the /search endpoint of P4Search. A file revision is only returned in the result if it is located in any of the given paths.
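As a rough illustration, a request body restricted to two release branches might be put together like this in Python. The key name paths matches the complete example further down; the helper with_paths and its check for the leading // are only this sketch's own convenience, not something P4Search requires.

```python
import json


def with_paths(body, paths):
    """Return a copy of the request body with a paths array added."""
    for path in paths:
        # Depot syntax starts with two slashes; this check is a local sanity test.
        if not path.startswith("//"):
            raise ValueError(f"not in depot syntax: {path}")
    result = dict(body)
    result["paths"] = list(paths)
    return result


base = {
    "userId": "Joe_Coder",
    "ticket": "068E2C45060AA13A1706CFF5CF986F23",
    "queryRaw": "merge2014",
}
body = with_paths(
    base,
    ["//depot/Talkhouse/rel1.0/...", "//depot/Talkhouse/rel1.5/..."],
)
print(json.dumps(body))
```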

Adding Solr fields

Indexed search fields defined by the Solr schema can be used in the request, too. They need to be provided in an additional array named searchFields with the elements field and value. One common field to use would be “headrevision”. It is either true or false depending on whether the given file revision is at head or not. By default every search with P4Search will add headrevision:true to the query in order to limit the result to only tip revisions. Another common field is “filename”. A complete example would be:

resty POST /search '{"userId":"Joe_Coder", "ticket":"068E2C45060AA13A1706CFF5CF986F23", "queryRaw":"","paths":["//depot/Talkhouse/rel1.0/..."],"searchFields":[{"field":"headrevision","value":"*"}, {"field":"filename","value":"EBolt.java"}]}' | jsawk 'if (this.status.code==200) return this.payload; else return null' | jsawk -n 'out(this.depotFile + "#" + this.rev)'

If you only want to filter using searchFields and paths, simply set queryRaw to an empty string. The same result could be produced with a plain call to p4 files, but the example above highlights how fields can be addressed. Unlike paths, the various searchFields are combined by a logical “and” operator.
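Typing the searchFields array by hand gets tedious quickly. The Python sketch below shows one possible way to generate it from keyword arguments and reproduces the request body of the example above; the helper name search_fields is invented for this illustration.

```python
import json


def search_fields(**fields):
    """Turn keyword arguments into the searchFields array of field/value pairs."""
    return [{"field": name, "value": value} for name, value in fields.items()]


body = {
    "userId": "Joe_Coder",
    "ticket": "068E2C45060AA13A1706CFF5CF986F23",
    "queryRaw": "",  # empty: filter by paths and fields only
    "paths": ["//depot/Talkhouse/rel1.0/..."],
    "searchFields": search_fields(headrevision="*", filename="EBolt.java"),
}
print(json.dumps(body))
```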

File names or content

The queryRaw string can be prefixed depending on whether you want to search for strings in file names or in the content of the files. A prefix of “f” or “files:” looks at file names, whereas “c” or “content:” inspects the indexed content.

Tunneling Lucene queries through P4Search

It might well be that your needs are more complex and you want to benefit from some of the more advanced query features that the underlying Lucene index provides. P4Search will allow you to tunnel an original Lucene search string without manipulation by prefixing the query with “l” or “literal”. For an overview on query options please review this article.
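To keep the prefixes straight in a script, a small helper along these lines may be useful. It is only a sketch: the function name is hypothetical, it uses the long prefix forms, and it assumes every prefix is separated from the query by a colon, as in files: and content:.

```python
VALID_PREFIXES = ("files", "content", "literal")


def prefixed_query(kind, text):
    """Build a queryRaw string such as 'files:EBolt' or 'literal:merge201?'."""
    if kind not in VALID_PREFIXES:
        raise ValueError(f"unknown prefix: {kind}")
    # Assumption: prefix and query text are joined by a single colon.
    return f"{kind}:{text}"


print(prefixed_query("files", "EBolt"))
print(prefixed_query("content", "merge2014"))
print(prefixed_query("literal", "merge201?"))
```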

Wildcard searches

Lucene allows for more flexible searches, starting with rather simple patterns. Assume you want to find a file revision whose content contains merge2013 or merge2014. "queryRaw":"literal:merge201?" matches both and, because ? stands for any single character, gives you the whole decade.
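To get a feel for what the single-character wildcard covers, you can try the pattern locally. The Python sketch below uses fnmatchcase from the standard library purely as a stand-in, because its ? also matches exactly one character; it does not run Lucene, and the list of terms is made up.

```python
from fnmatch import fnmatchcase

pattern = "merge201?"

# Made-up terms to test the pattern against.
terms = ["merge2009", "merge2013", "merge2014", "merge2019", "merge20145", "merge201"]

# Keep only the terms where ? is filled by exactly one character.
matches = [term for term in terms if fnmatchcase(term, pattern)]
print(matches)
```

Terms with no character or more than one character in the wildcard position, such as merge201 and merge20145, are left out.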

Ranges

Sometimes you might want ranges. Imagine you’ve stored images and indexed the aperture used by the camera when taking the pictures. Apache Tika (embedded in Solr) will extract the EXIF data from a JPEG file and store it in the Lucene index. Provided Solr is configured accordingly, searching for it is easy. Because the aperture is a numerical value, it is practical to search for values from 1.4 to 2. It’s as easy as defining the query as "queryRaw":"l:aperture:[1.4 TO 2]".
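If you build such range queries in code, a tiny helper can take care of the bracket syntax. The Python sketch below is one way to do it; the function name is hypothetical, and the exclusive variant relies on Lucene's convention of using curly braces for ranges that exclude their end points.

```python
def range_query(field, low, high, inclusive=True):
    """Build a tunneled Lucene range query such as 'l:aperture:[1.4 TO 2]'."""
    # Square brackets include the end points, curly braces exclude them.
    opening, closing = ("[", "]") if inclusive else ("{", "}")
    return f"l:{field}:{opening}{low} TO {high}{closing}"


query = range_query("aperture", 1.4, 2)
print(query)
```

The resulting string goes into the queryRaw element of the request as it is.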

Conclusion

There are many more interesting use cases and topics to cover when accessing P4Search via its RESTful API. Please stay tuned for more blog posts on this topic.