Title: Semantic problems of thesaurus mapping
Speaker: Martin Doerr, Foundation for Research and Technology, Hellas (FORTH), Greece
Global access to ever more heterogeneous data sources makes terminology and the interoperability of thesauri an increasingly urgent problem, whether the aim is to improve the performance of full-text retrieval systems or to support controlled vocabularies in structured data and, increasingly, in metadata. Thesauri are created in different languages, with different scopes and points of view, and at different levels of abstraction or detail. The total body of knowledge we would like to build into general-purpose information access tools in the form of thesauri is distributed over numerous terminological resources. There is little hope of creating THE super-resource, given the lack of manpower and of theoretical understanding. Interest is therefore shifting from merging thesauri to translating between interlinked thesauri or, even more loosely, to switching between federated thesauri. In all these approaches, a central problem is mapping the equivalent parts of different thesauri, which requires a well-founded understanding of how terms can be compared.
A method for mapping thesauri based on an extension of ISO 5964 is presented, and a proposal is made for how recall and precision can be controlled in the transition from one thesaurus to another. In practice, however, thesaurus mapping can become fairly difficult because of a series of problems. These may lie in different coverage of a domain, different levels of abstraction, or differences between pre- and postcoordination.
Even more difficult to capture are the different aspects under which a term is split into narrower terms, and the complementary polysemy of natural terms. An empirical case study of the criteria used for creating narrower terms is presented, and ideas for a more effective representation are discussed.
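The aspect problem can be made concrete with a small data structure. The sketch below assumes, hypothetically, that each narrower term records the broader term it refines and the aspect (criterion of division) used to create it, and that the broader terms have already been mapped to the same label in both thesauri. `NarrowerTerm`, `by_aspect`, `map_narrower` and the example terms are illustrative only. They are not the representation proposed in the talk. If the target thesaurus splits the same broader term under a different aspect, none of its narrower terms is comparable, and the only safe mapping is the shared broader term.

```python
from collections import defaultdict
from typing import NamedTuple


class NarrowerTerm(NamedTuple):
    term: str
    broader: str  # assumed already mapped, i.e. same label in both thesauri
    aspect: str   # hypothetical criterion of division, e.g. "material"


def by_aspect(terms: list[NarrowerTerm], broader: str) -> dict[str, list[str]]:
    """Group the narrower terms of `broader` by the aspect used to split it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for t in terms:
        if t.broader == broader:
            groups[t.aspect].append(t.term)
    return {aspect: sorted(names) for aspect, names in groups.items()}


def map_narrower(term: str, source: list[NarrowerTerm],
                 target: list[NarrowerTerm]) -> list[str]:
    """Return target candidates for a narrower source term.

    If the target splits the same broader term under the same aspect, its
    narrower terms for that aspect are returned as candidates for manual
    comparison; otherwise only the shared broader term is a safe mapping.
    """
    entry = next((t for t in source if t.term == term), None)
    if entry is None:
        raise KeyError(term)
    groups = by_aspect(target, entry.broader)
    return groups.get(entry.aspect, [entry.broader])


# Hypothetical example: A splits "chairs" by material, B by function.
THESAURUS_A = [
    NarrowerTerm("wooden chairs", "chairs", "material"),
    NarrowerTerm("metal chairs", "chairs", "material"),
]
THESAURUS_B = [
    NarrowerTerm("office chairs", "chairs", "function"),
    NarrowerTerm("dining chairs", "chairs", "function"),
]
```

Here `map_narrower("wooden chairs", THESAURUS_A, THESAURUS_B)` returns `["chairs"]`. Because B divides chairs by function rather than by material, "wooden chairs" can only be mapped upward.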