MESMUSES methodology
Lessons learned and open issues…
Alain Michard
Florence, June 2003
MESMUSES broad vision
- Just like several other projects
- SW is all about semantic interoperability
  - Sharing machine-readable terminologies and classification schemes
- Science and culture are collective and international
- Semantic Web methodology should be highly relevant for managing and sharing scientific and cultural information
Some key S&T issues in the Project
- Model: is RDFS / OWL-Lite adequate?
- Schema authoring: method and tools needed!
- Metadata: where does it come from?
- Automatic indexing: experiments with a categorizer
The basic SW model
[Figure: three-layer diagram; labels recovered from the slide]
- Schema: Dwelling, Person, Artefact (Lives-in, Produces, Owner); House, Artist, Artwork (Lives-in, Creates)
- Surrogates: e.g. a catalogue record
    Type: printed text, monograph
    Author(s): Zola, Émile (1840-1902)
    Title(s): L'assommoir [Printed text] / by Emile Zola
    Edition: 50th ed.
    Publication: Paris: G. Charpentier, 1878
    Physical description: 111-569 p.
    Record no.: FRBNF 35963044
- Real-world entities
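The schema and surrogate layers of this model can be sketched as plain triples. This is a minimal illustration, not the project's implementation: the direction of the subclass links, the property names and the `ex:` identifiers are all assumptions made for the example.

```python
# Schema layer: class hierarchy built from the slide's labels.
# Which class specialises which is an assumption.
SUBCLASS_OF = {
    "House": "Dwelling",
    "Artist": "Person",
    "Artwork": "Artefact",
}

# Surrogate layer: (subject, property, value) triples describing
# real-world entities. The "ex:" identifiers are hypothetical.
SURROGATES = [
    ("ex:zola", "type", "Artist"),
    ("ex:assommoir", "type", "Artwork"),
    ("ex:zola", "creates", "ex:assommoir"),
    ("ex:assommoir", "title", "L'assommoir"),
]


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is a transitive subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def instances_of(ancestor):
    """List subjects whose declared type falls under ancestor."""
    return [s for s, p, o in SURROGATES if p == "type" and is_a(o, ancestor)]
```

Querying by a general class then also finds surrogates typed with a more specific one: `instances_of("Person")` finds the surrogate declared as an `Artist`.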
Model and Schema Language
- Typed attributes are needed
  - XML-Schema types
  - Derived types (e.g. Celsius temperature, Gregorian date, etc.)
  - Enumerated types, thesauri
- Time-stamping
- Cardinality constraints
- Explicit transitivity of properties (e.g. geographic inclusion)
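To illustrate why explicit transitivity matters for a property such as geographic inclusion, here is a minimal sketch that derives the implied pairs from the stated ones. The function name and the place names are assumptions chosen for the example.

```python
def transitive_closure(pairs):
    """Return every (a, c) pair implied by chaining the given (a, b) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a is in b, and b (== c) is in d, so a is in d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Stated facts only: each place is located in its direct container.
LOCATED_IN = {
    ("Florence", "Tuscany"),
    ("Tuscany", "Italy"),
    ("Italy", "Europe"),
}

INFERRED = transitive_closure(LOCATED_IN)
```

Without a declared transitive property, a search for resources about Europe would miss one indexed only under Florence. With it, the pair `("Florence", "Europe")` is part of `INFERRED`.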
Schema authoring issues (1)
- Find the right level of abstraction
  - Is « Glucid » a class or an instance? Or is it sometimes a class and sometimes an instance?
- Avoid the « KR » attitude and practices!
  - It's all about indexing resources with shared terminologies, not about representing human knowledge!
Schema authoring issues (2)
[Figure: example schema graph; labels translated from French]
- Classes: Process, Complex process, Elementary process, System, Structure, Organism, Apparatus, Organ, Tissue, Cell, Molecule, Major Theme, GTANS
- Relations: ISA, is-composed-of, consumes, transforms, produces, eliminates, involves, triggers, requires, is-regulated-by, is-carried-out-by, is-documented-by, is-explained-by
Schema authoring issues (3)
Schema authoring issues (4)
- Authoring tools are badly needed
  - Graphical representation of the schema
  - Zooming on sub-graphs (hierarchies)
  - Versioning
- Consider using a UML authoring environment?
- Established methodology and tutorials are needed
Creating Surrogates
- Data extraction and fusion from structured sources
  - R-DB, XML-DB, LDAP
- Updating
  - When?
  - Should not create duplicates!
- Detect cross-references
  - Authority lists
  - Thesauri
  - Lexical distance???
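The "lexical distance" idea for avoiding duplicates can be sketched as follows: normalise a name, then compare it with each authority-list entry by edit distance. This is only one possible approach, not the one used in the project; the function names, the threshold of 2 and the sample authority list are assumptions.

```python
import unicodedata


def normalize(name):
    """Lowercase and strip accents so 'Émile' and 'Emile' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_authority(name, authority_list, max_distance=2):
    """Return the closest authority entry within max_distance, or None."""
    best, best_distance = None, max_distance + 1
    for entry in authority_list:
        distance = levenshtein(normalize(name), normalize(entry))
        if distance < best_distance:
            best, best_distance = entry, distance
    return best


# Hypothetical authority list for the example.
AUTHORITY = ["Zola, Émile", "Hugo, Victor"]
```

An incoming value such as "Zolla, Emile" would then be attached to the existing entry rather than creating a new surrogate, while a name with no close match returns `None` and can be treated as new. Whether a purely lexical threshold is safe enough is exactly the open question the slide raises.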
Automatic Categorization
- Automatic indexing
  - By extracting metadata from resources
  - By automatic categorization
- Define hierarchies of « concepts » inside the schema
- Seeding with representative documents
- Machine learning to create categorizers
- Pros: enriched search functionality
- Cons: hierarchies of categories are static
  - Adding a category may change the categorizers of the others
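The seeding step can be illustrated with a deliberately simple categorizer: one word-count profile per category, built from its seed documents, and assignment by cosine similarity. This is a sketch under stated assumptions, not the categorizer used in the experiments; the category names and seed texts are invented for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def train(seeds):
    """Build one word-count profile per category from its seed documents."""
    profiles = {}
    for category, documents in seeds.items():
        profile = Counter()
        for document in documents:
            profile.update(bag_of_words(document))
        profiles[category] = profile
    return profiles


def categorize(text, profiles):
    """Return the category whose profile is most similar to the text."""
    words = bag_of_words(text)
    return max(profiles, key=lambda c: cosine(words, profiles[c]))


# Hypothetical seed documents for two categories.
SEEDS = {
    "Cell": ["cell membrane nucleus", "cell division nucleus"],
    "Organ": ["heart organ blood", "liver organ tissue"],
}
# The same seeds plus one extra category.
SEEDS_EXTENDED = dict(SEEDS, Tissue=["muscle tissue fibre", "nervous tissue"])
```

With `SEEDS`, the text "muscle tissue fibre" is assigned to "Organ"; with `SEEDS_EXTENDED` it moves to "Tissue". In this sketch the existing profiles stay untouched and only the assignment shifts, whereas a learner that trains categories against each other would need the other categorizers retrained, which is the cost the slide points to.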
Bottom-line…
- RDFS schema authoring may be more difficult than E-R modelling
- Debates on syntactic features are irrelevant
  - Should be grounded on real-world implementations and testbeds
  - A new query language (e.g. RQL) is not high priority
- We have not addressed the « logical rules » layer
- Semantic Web vs. Community Webs