Implementation and Evaluation of a Load-Aware Graph Partitioner for the FliX Framework
While there are many proposals for path indexes on XML documents, none of them is perfectly suited for indexing large-scale collections of interlinked XML documents. The FliX framework integrates the existing approaches by first partitioning the collection into structurally homogeneous subcollections and then choosing and creating the most adequate index structure for each subcollection. Queries are incrementally evaluated by first evaluating the query in its originating subcollection and then following links to connected subcollections.
The current implementation of FliX considers only structural features of the XML graph for building partitions. The goal of this thesis is to integrate statistics about the query load into the partitioning process. The new partitioner should create a partitioning that is "optimal" for the given query load.
A possible structure of the thesis could look like this:
- Build a theoretic model that models the cost needed to evaluate a collection of queries (e.g., using the number of "jumps" between subcollections as measure) with a given partitioning into subcollections
- Derive a partitioning algorithm that, given a query load, builds a partitioning that minimizes the cost to execute the query
- Implement the algorithm with the current FliX implementation
- Evaluate the solution
•
Guidance: Ralf Schenkel
Student: Mohammad Alrifai
Level: Master's Thesis
Status: finished
Prerequisites: Basic Knowledge about XML, experience with Java
•
last change: Ralf Schenkel, June 8, 2006.