Implementation and Evaluation of a Load-Aware Graph Partitioner for the FliX Framework

 
 
   Abstract

While there are many proposals for path indexes on XML documents, none of them is perfectly suited for indexing large-scale collections of interlinked XML documents. The FliX framework integrates the existing approaches by first partitioning the collection into structurally homogeneous subcollections and then choosing and creating the most adequate index structure for each subcollection. Queries are incrementally evaluated by first evaluating the query in its originating subcollection and then following links to connected subcollections.

The current implementation of FliX considers only structural features of the XML graph for building partitions. The goal of this thesis is to integrate statistics about the query load into the partitioning process. The new partitioner should create a partitioning that is "optimal" for the given query load.

A possible structure of the thesis could look like this:

  • Build a theoretic model that models the cost needed to evaluate a collection of queries (e.g., using the number of "jumps" between subcollections as measure) with a given partitioning into subcollections
  • Derive a partitioning algorithm that, given a query load, builds a partitioning that minimizes the cost to execute the query
  • Implement the algorithm with the current FliX implementation
  • Evaluate the solution

   Organization

Guidance:        Ralf Schenkel
Student:           Mohammad Alrifai
Level:              Master's Thesis
Status:             finished
Prerequisites:  Basic Knowledge about XML, experience with Java

   Additional Information and Literature

Back to the list of topics.

last change: Ralf Schenkel, June 8, 2006.