Ontology
An ontology is a specification of a conceptualization. The world's greatest ontology problem is how to make sense of the web. How can multiple providers exchange data? How do you specify the location of web content?

The Open Directory Project is an amazing effort to solve the web ontology problem. In this project, a large number of human editors collaborate to build an index of the web in RDF (serialized as yet another XML dialect), which is available for download. Many search engines use the index as a seed for generating their own web indexes.

I have started to use the index in my own research, and I have written a few Perl programs to process the data. These programs are available, or will be made available, under the GPL. Their purpose is to show which subject areas the Open Directory editors cover and to understand the structure of the Open Directory in depth. Since I wish to become an editor in some of my areas of interest, I want to understand the editors, their interests, and the structure of the information.

Programs

License Terms
GNU General Public License. Contact support@tomacorp.com for alternative licensing.
RDF Trimmer
This program reads the large RDF file and writes out a subset of its content. It removes certain top-level categories, and the list of categories to remove is easy to change in the program.
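The trimming idea can be sketched in Python as follows. This is not the author's Perl program, only a minimal illustration under stated assumptions. It streams the file with xml.etree.ElementTree.iterparse, so each record can be processed and discarded as it is read, and it drops records whose top-level category is in an exclusion set. The record shape <page url="..."><topic>Top/...</topic></page>, the function names top_level and trim, and the excluded categories are all hypothetical. The real Open Directory dump uses its own namespaced element names, which would need to be checked before adapting this.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical exclusion list: edit this set to change which subset is kept.
EXCLUDED_TOP_LEVEL = {"Adult", "World"}


def top_level(topic_path):
    """Return the first category under "Top", e.g. "Top/Arts/Music" -> "Arts"."""
    parts = topic_path.split("/")
    return parts[1] if len(parts) > 1 and parts[0] == "Top" else ""


def trim(source, excluded=EXCLUDED_TOP_LEVEL):
    """Yield (url, topic) for each <page> whose top-level category is kept.

    Assumes the hypothetical record shape
    <page url="..."><topic>Top/...</topic></page>, not the real ODP schema.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            topic = elem.findtext("topic", default="")
            if top_level(topic) not in excluded:
                yield elem.get("url"), topic
            # Drop the record's children and text; the full dump is very large.
            elem.clear()


if __name__ == "__main__":
    sample = b"""<dump>
<page url="http://a.example/"><topic>Top/Arts/Music</topic></page>
<page url="http://b.example/"><topic>Top/World/Deutsch</topic></page>
</dump>"""
    for url, topic in trim(io.BytesIO(sample)):
        print(url, topic)
```

Because the exclusion set is a plain module-level constant (or the excluded argument), changing which top-level categories are removed is a one-line edit, matching the description above.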

A Mathematical Approach

Jon Kleinberg has an amazing approach that can categorize web content using the eigenvectors of a matrix built from the links among pages found with a web search engine. It is described in his paper "Authoritative Sources in a Hyperlinked Environment." I think this approach could be the basis of a powerful tool for Open Directory editors.
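The core of Kleinberg's method (HITS) can be sketched as a power iteration. The sketch assumes the link graph among the pages has already been collected from search results. Each page gets an authority score a and a hub score h. With A as the adjacency matrix, repeatedly setting a = A^T h and h = A a, with normalization after each round, converges to the principal eigenvectors of A^T A (authorities) and A A^T (hubs). The function name hits, the dict-of-lists graph format (assumed to contain no duplicate links), and the fixed iteration count are illustrative choices for this sketch, not details from the paper.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a graph {page: [linked pages]}."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update a = A^T h: sum the hub scores of pages linking to p.
        auth = {p: 0.0 for p in pages}
        for q, targets in links.items():
            for p in targets:
                auth[p] += hub[q]
        # Hub update h = A a: sum the authority scores of pages that q links to.
        hub = {q: sum(auth[p] for p in links.get(q, ())) for q in pages}
        # Normalize each vector to unit length so the iteration converges
        # to the principal eigenvectors instead of growing without bound.
        for vec in (auth, hub):
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            for k in vec:
                vec[k] /= norm
    return hub, auth


if __name__ == "__main__":
    graph = {"h1": ["a1", "a2"], "h2": ["a1"], "h3": ["a1"]}
    hub, auth = hits(graph)
    print("best authority:", max(auth, key=auth.get))
    print("best hub:", max(hub, key=hub.get))
```

In this small graph, all three hub pages link to a1 and only h1 also links to a2, so a1 gets the highest authority score and h1 the highest hub score.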

Other Ontology Links

I saved these links while I was looking for someone who was working on the general web ontology problem.

12/18/2000

By toma