Generation of Instance Relationships in Ontologies by Analyzing Web Tables and Forms

 
 
   Abstract

The WordNet thesaurus, currently the main source for our ontology service implementation, is a relatively complete representation of real-world concepts. One type of information it lacks is real-world instances of concepts, such as car models (Audi A8, Mercedes C220, etc.). Such information would be of great use for classifying HTML pages as well as for finding inputs to web forms.

The goal of this thesis is to find a way to generate instances of concepts by crawling the Web and analyzing information in the tables and forms of web sites. As an (oversimplified) example, consider a drop-down element of a web form that offers several car models. When such an element is labeled "car", we can infer that its entries are instances of the concept "car". The BINGO! system and the HTML2XML framework, both developed by our group, are to be used as the underlying tools.
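The drop-down example can be sketched in Java roughly as follows. This is only an illustration under simplifying assumptions, not the method developed in the thesis: it assumes the form has already been converted to well-formed XML (as a tool like HTML2XML might produce), that the label is tied to the drop-down through matching `for`/`id` attributes, and that the label text names the concept directly. The class and method names (`FormInstanceSketch`, `extractInstances`) are hypothetical, and only the standard Java XML parser is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of BINGO! or HTML2XML.
class FormInstanceSketch {

    /**
     * Maps each label text (the candidate concept) to the option texts
     * (the candidate instances) of the drop-down element it labels.
     * Assumes well-formed XML input and label/select pairs linked by for/id.
     */
    static Map<String, List<String>> extractInstances(String xml) {
        Document doc;
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }

        // Index the label text by the id of the form control it refers to.
        Map<String, String> conceptByControlId = new LinkedHashMap<>();
        NodeList labels = doc.getElementsByTagName("label");
        for (int i = 0; i < labels.getLength(); i++) {
            Element label = (Element) labels.item(i);
            String target = label.getAttribute("for");
            if (!target.isEmpty()) {
                String concept =
                    label.getTextContent().trim().toLowerCase(Locale.ROOT);
                conceptByControlId.put(target, concept);
            }
        }

        // Collect the option texts of every labeled drop-down element.
        Map<String, List<String>> instances = new LinkedHashMap<>();
        NodeList selects = doc.getElementsByTagName("select");
        for (int i = 0; i < selects.getLength(); i++) {
            Element select = (Element) selects.item(i);
            String concept = conceptByControlId.get(select.getAttribute("id"));
            if (concept == null) {
                continue; // unlabeled drop-down: no concept to attach to
            }
            List<String> values =
                instances.computeIfAbsent(concept, k -> new ArrayList<>());
            NodeList options = select.getElementsByTagName("option");
            for (int j = 0; j < options.getLength(); j++) {
                String text = options.item(j).getTextContent().trim();
                if (!text.isEmpty()) {
                    values.add(text);
                }
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        String form =
            "<form>"
            + "<label for=\"model\">car</label>"
            + "<select id=\"model\">"
            + "<option>Audi A8</option>"
            + "<option>Mercedes C220</option>"
            + "</select>"
            + "</form>";
        // Prints the candidate instances found for each labeled concept.
        System.out.println(extractInstances(form));
    }
}
```

Real forms are rarely this clean: labels are often missing or generic ("Please select"), and option lists may contain placeholders, so a practical approach would need further filtering and a mapping from label text to ontology concepts.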

   Organization

Guidance:       Ralf Schenkel, Jens Graupmann
Student:          Jun Cai
Level:              Master's Thesis
Status:             finished
Start:              April 2003
Prerequisites:  Experience with Java

   Additional Information and Literature


last change: Ralf Schenkel, May 19th, 2004.