A contestant must implement 4 major functions: StartQuery(), EndQuery(), MatchDocument(), and GetNextAvailRes(). The detailed parameters and specifications of these functions are described here.
These functions will be called by the testing framework. StartQuery() adds a query to the set of active queries, while EndQuery() removes it from that set. Each query is associated with the required matching type (exact, edit distance, or Hamming distance), and matching distance (for non exact matching).
MatchDocument() matches a document with the current set of active queries, and saves the result somewhere in the main memory, until it is requested by GetNextAvailRes(). That is, the number of calls to GetNextAvailRes() will be equal to the number of calls to MatchDocument(). Instead of letting MatchDocument() return the results directly, GetNextAvailRes() is introduced to allow the contestant to process several calls to MatchDocument() at the same time, using pthreads (which is recommended since the testing machine has 12 cores). However, using parallelism is not a requirement.
To successfully deliver results of a document, GetNextAvailRes() must return the document ID, along with all query IDs matching the document, sorted by query ID. Results of each document must be delivered exactly once. A call to GetNextAvailRes() must deliver results of any undelivered document, unless there are no such results.
The total size of active queries and temporary results will be at most 25% of the available main memory. The total number of calls to StartQuery() will be at least twice as the total number of calls to MatchDocument(). Since we are trying to model real queries, many queries will exhibit significant overlap.
The task API header file, a basic implementation of the task interface, the test driver along with an example workload, and a Makefile are available in the tarball: sigmod2013contest-1.1. The README file inside the tarball explains how to compile and run the code.
The submissions will be evaluated on a machine with the following specs:
|Processor||Intel Xeon X5650|
|Processor Speed||2.67 GHz|
|Configuration||2 processors (12 cores total)|
|L1 Cache Size||64 KB/core|
|L2 Cache Size||256 KB/core|
|L3 Cache Size||12MB|
|Main Memory||96 GB|
|Operating System||Ubuntu 12.10 (kernel: 3.5.0-17-generic)|
* Teams must consist of individuals currently registered as students (graduate or undergraduate) in an academic institution. Teams can register on the contest site after March 1, 2013. Multiple teams from an academic institution may participate.
* Registered teams are allowed to submit (as a test) a 64-bits Linux library implementing the contest interface functions in C/C++. The test submission will be evaluated and its results will appear on the leaderboard. The team institution and country will not be shown in the leaderboard if this is specified during team registration. Teams are allowed to make several test submissions.
* Qualifying teams must submit the source code of their final implementation before 16 April. This final submission will be evaluated after the deadline and the finalists will be chosen based on it. Teams can submit several times, but only the last final submission will be considered.
* A submission must contain only code written by the team or open-source licensed software. Source code from books or public articles is permitted, such that a clear note about the source exists.
* By participating in this contest, each team agrees to publish its source code in case it is selected as a finalist. A team is eligible for the prize if at least one of its members will come to present his work at the SIGMOD 2013 conference.