An ontology is a specification of a conceptualization.
The world's greatest ontology problem is "how do you make sense out of
the web?" How can multiple providers exchange data? How do you specify
the location of web content?
The Open Directory Project
is an amazing effort to solve the web ontology problem. In this
project, a large number of human editors collaborate to create
an index of the web in RDF (an XML-based format), which is
available for download. Many search engines use the index
as a seed for generating their own web indexes.
I have started to use the index in my own research, and I have
written a few Perl programs to process the data. These
programs are available, or will be made available, under the GPL.
The purpose of the programs is to show the scope of the work of
the Open Directory editors, and to understand the structure
of the Open Directory in depth. Since I wish to become an
editor in some of my areas of interest, I want to understand
the editors, their interests, and the structure of the information.
Programs
- License Terms
- GNU General Public License. Contact support@tomacorp.com
for alternative licensing.
- RDF Trimmer
- This program reads the large RDF file and writes out a subset
of the content. It removes certain top-level categories; the list
of removed categories can be easily changed in the program.
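To give a feel for what the trimmer does, here is a minimal sketch in Python rather than Perl. It is not the program itself. It assumes that each record in the dump is a `<Topic>` or `<ExternalPage>` element whose opening and closing tags start their own lines, that a topic's category path appears as `r:id="Top/..."`, and that a page's category appears as `<topic>Top/...</topic>`. The names `EXCLUDED` and `trim` and the excluded categories are made up for the illustration.

```python
import io
import re

# Hypothetical list of top-level categories to drop; edit to taste.
EXCLUDED = {"Adult", "World"}

# Assumed record layout: <Topic ...> ... </Topic> and
# <ExternalPage ...> ... </ExternalPage>, one tag per line.
RECORD_OPEN = re.compile(r"\s*<(Topic|ExternalPage)\b")
RECORD_CLOSE = re.compile(r"</(Topic|ExternalPage)>")
# The record's own category: a Topic id or an ExternalPage <topic> element.
CATEGORY = re.compile(r'(?:r:id="|<topic>)Top/([^/"<]+)')


def top_level(record):
    """Return the top-level category of a buffered record, or None."""
    for line in record:
        match = CATEGORY.search(line)
        if match:
            return match.group(1)
    return None


def trim(lines, out, excluded=EXCLUDED):
    """Copy lines to out, skipping records filed under an excluded category."""
    record = None  # lines of the record being buffered, or None between records
    for line in lines:
        if record is None and RECORD_OPEN.match(line):
            record = []
        if record is None:
            out.write(line)  # header, footer, anything outside a record
            continue
        record.append(line)
        if RECORD_CLOSE.search(line):
            if top_level(record) not in excluded:
                out.writelines(record)
            record = None
    if record:
        out.writelines(record)  # unterminated final record: keep it as-is
```

Because the sketch buffers only one record at a time, it never holds the whole file in memory. To try it, pass an open file for the dump as `lines` and any writable text stream as `out`.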
A Mathematical Approach
Jon Kleinberg
has an amazing approach that can categorize web content using the
eigenvectors of a matrix built from links found with web search engines.
This is described in his paper
"Authoritative Sources in a Hyperlinked Environment". I think this approach
could be used to generate a powerful tool for Open Directory editors.
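The core of the idea can be sketched in a few lines of Python. This is my own toy illustration of the iterative hub and authority update, not code from the paper. It assumes the link graph is already in hand as a dictionary mapping each page to the pages it links to, and it leaves out the step of collecting and expanding that graph from search engine results. The function name `hits` and the toy graph are made up for the example. Repeating the two updates with normalization is a power iteration, so the scores settle toward the principal eigenvectors of the matrices built from the link structure.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a dict of page -> linked pages."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages that link in.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub update: sum the authority scores of the pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize both vectors to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Toy graph: two pages point at "c", which points at "d".
toy = {"a": ["c"], "b": ["c"], "c": ["d"]}
hub, auth = hits(toy)
print(sorted(auth, key=auth.get, reverse=True))
```

Pages with high authority scores are the ones pointed to by good hubs, and good hubs are the ones that point to high-authority pages. For an editor, the hub list for a topic could suggest candidate sites to review.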
Other Ontology Links
I saved these links while I was looking for someone who was working
on the general web ontology problem.

12/18/2000
By toma