- Number of slides: 53
Tearing down walls and Building bridges Principles and pragmatics of a Semantic Culture Web
Overview • Virtual collections and Semantic Web • Semantic collection-search demonstrator – For cultural heritage objects • Metadata & vocabulary representation and enrichment • Principles for knowledge engineering on the Web
Acknowledgements • Part of the large Dutch knowledge-economy project MultimediaN • Partners: VU, CWI, UvA, DEN, ICN • People: Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jacco van Ossenbruggen, Guus Schreiber, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga • Artchive.com, Rijksmuseum Amsterdam, Dutch ethnology museums (Amsterdam, Leiden), National Library (Bibliopolis)
Hypothesis • Semantic Web technology is particularly useful in knowledge-rich domains • Or, formulated differently: if we cannot show added value in knowledge-rich domains, then it may have no value at all
The Web: resources and links Web link URL
The Semantic Web: typed resources and links [Diagram: the painting “Woman with hat” (SFMOMA) is linked through the Dublin Core property “creator” to the ULAN entry for Henri Matisse; Web link, URL]
Principle 1: semantic annotation • Description of web objects with “concepts” from a shared vocabulary
Principle 2: semantic search • Search for objects which are linked via concepts (semantic link) • Use the type of semantic link to provide meaningful presentation of the search results [Diagram: query “Paris”; Montmartre is linked to Paris via partOf]
The myth of a unified vocabulary • In large virtual collections there are always multiple vocabularies – In multiple languages • Every vocabulary has its own perspective – You can’t just merge them • But you can use vocabularies jointly by defining a limited set of links – “Vocabulary alignment” • It is surprising what you can do with just a few links
Principle 3: vocabulary alignment • Query: “Tokugawa” • AAT style/period: Edo (Japanese period), Tokugawa – AAT is Getty’s Art & Architecture Thesaurus • SVCN period: Edo – SVCN is a local in-house ethnology thesaurus
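A minimal sketch of the idea on this slide, not the demonstrator's actual implementation: works are found through the concepts they link to, and the chain of link types is kept so it can be shown with each result. The triples, the identifiers (`work:1`, `place:Paris`) and the property names `depicts` and `creationSite` are assumptions made up for illustration; only `partOf` and the Paris/Montmartre example come from the slide.

```python
# Toy triple store: (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = [
    ("work:1", "depicts", "place:Montmartre"),
    ("place:Montmartre", "partOf", "place:Paris"),
    ("work:2", "creationSite", "place:Paris"),
]
LABELS = {"place:Paris": "Paris", "place:Montmartre": "Montmartre"}


def semantic_search(keyword):
    """Return (work, path) pairs; path is the chain of link types to the matched concept."""
    targets = {c for c, label in LABELS.items() if label.lower() == keyword.lower()}
    results = []
    for s, p, o in TRIPLES:
        if not s.startswith("work:"):
            continue
        if o in targets:
            # The work is annotated directly with the matched concept.
            results.append((s, [p]))
        else:
            # One extra hop: the annotation concept is itself linked to the match.
            for s2, p2, o2 in TRIPLES:
                if s2 == o and o2 in targets:
                    results.append((s, [p, p2]))
    return results


for work, path in semantic_search("Paris"):
    print(work, "via", " / ".join(path))
```

Keeping the path with each hit is what allows a result list to say why an object matched, for instance "depicts a place that is part of Paris", instead of only listing the object.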
A link between two thesauri
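To illustrate how a single alignment link can widen a search, here is a small sketch under stated assumptions. The concept identifiers (`aat:Edo`, `svcn:Edo`), the object identifiers and the annotation table are hypothetical; only the labels and the fact that the two thesauri are linked come from the slides.

```python
# Labels per concept, in two separate vocabularies (identifiers are hypothetical).
LABELS = {
    "aat:Edo": ["Edo (Japanese period)", "Tokugawa"],
    "svcn:Edo": ["Edo"],
}
# One manually or automatically established alignment link between the thesauri.
ALIGNMENTS = [("aat:Edo", "svcn:Edo")]
# Which concept each collection object is annotated with (made-up data).
ANNOTATIONS = {"object:1": "svcn:Edo", "object:2": "aat:Edo"}


def aligned_concepts(concept):
    """Return the concept plus every concept directly aligned with it."""
    out = {concept}
    for a, b in ALIGNMENTS:
        if a == concept:
            out.add(b)
        if b == concept:
            out.add(a)
    return out


def find_objects(keyword):
    """Find objects annotated with a matching concept or one aligned with it."""
    hits = {
        c for c, labels in LABELS.items()
        if any(keyword.lower() == label.lower() for label in labels)
    }
    expanded = set().union(*(aligned_concepts(c) for c in hits))
    return sorted(o for o, c in ANNOTATIONS.items() if c in expanded)


print(find_objects("Tokugawa"))
```

Without the alignment link, the query "Tokugawa" would only reach objects annotated with the AAT concept; with it, the object annotated in the in-house thesaurus is found as well.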
Levels of interoperability • Syntactic interoperability – using data formats that you can share – XML family is the preferred option • Semantic interoperability – How to share meaning / concepts – Technology for finding and representing semantic links
Distributed vs. centralized collection data • Minimal requirement: collection object has image URI • Preference for external metadata, accessed through a protocol such as OAI • In practice, external metadata access is still cumbersome
http://e-culture.multimedian.nl/demo/search
Search strategies • Basic search: keyword-oriented • Advanced search: – Tweaking default search parameters – Time-related queries • Faceted search • Relation search – How are two URIs related?
Keyword search with semantic clustering 1. B-tree of literals plus Porter stem and metaphone index 2. Find resources with matching labels • Default resources are “Work”s 3. Find related resources by one-way graph traversal • owl:inverseOf is used • Threshold used for constraining search 4. Cluster results (group instances)
Search: WordNet patterns that increase recall without sacrificing precision
Term disambiguation is a key issue in semantic search • Post-query – Sort search results based on different meanings of the search term – Mimics Google-type search • Pre-query – Ask user to disambiguate by displaying list of possible meanings – Interface is more complex, but more search functionality can be offered
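The four steps above can be sketched roughly as follows. This is a simplified illustration, not the demonstrator's code: `crude_stem` is a stand-in for a real Porter stemmer, there is no metaphone index, a plain dictionary replaces the B-tree, and the threshold is assumed to be a maximum path length (the slides do not say how the threshold is defined). All data, identifiers and property names are hypothetical.

```python
from collections import defaultdict, deque


def crude_stem(word):
    """Stand-in for a Porter stemmer: lowercase and strip a plural 's'."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and len(w) > 3 else w


LABELS = {  # resource -> label literal (hypothetical data)
    "ulan:Matisse": "Henri Matisse",
    "aat:Fauve": "Fauves",
    "work:WomanWithHat": "Woman with hat",
}
# Directed edges, followed in one direction only.
EDGES = [
    ("ulan:Matisse", "creatorOf", "work:WomanWithHat"),
    ("aat:Fauve", "styleOf", "ulan:Matisse"),
]
WORKS = {"work:WomanWithHat"}

# Step 1: index every label token by its stem.
INDEX = defaultdict(set)
for resource, label in LABELS.items():
    for token in label.split():
        INDEX[crude_stem(token)].add(resource)


def search(keyword, max_steps=2):
    """Return {path of link types: set of works reached along that path}."""
    # Step 2: resources whose labels match the keyword.
    starts = INDEX.get(crude_stem(keyword), set())
    clusters = defaultdict(set)
    # Step 3: breadth-first, one-way traversal, cut off after max_steps links.
    queue = deque((r, ()) for r in sorted(starts))
    while queue:
        node, path = queue.popleft()
        if node in WORKS:
            # Step 4: cluster works by the path that led to them.
            clusters[path].add(node)
            continue
        if len(path) < max_steps:
            for s, p, o in EDGES:
                if s == node:
                    queue.append((o, path + (p,)))
    return dict(clusters)


print(search("fauve"))
```

Clustering by path means that works reached as "style of the creator" and works whose own title matched end up in separate groups, which is one plausible reading of "group instances".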
Faceted search • Use Dublin Core scheme to formulate complex queries • Navigate through relevant metadata
What do you need to do to make your collection part of a Semantic Culture Web? Four activities
From metadata to semantic metadata 1. Make vocabulary interoperable 2. Align metadata schema 3. Enrich metadata 4. Align vocabulary
Activity 1: syntactic vocabulary interoperability • Making vocabularies available in the Web standard RDF • Many organizations already do this • W3C provides the SKOS template to make this almost straightforward • Effort required: at most a few days
Multi-lingual labels for concepts
Semantic relation: broader and narrower • No subclass semantics assumed!
Activity 2: aligning the metadata schema • Specify your collection metadata scheme as a specialization of Dublin Core • With RDF/OWL this is easy/trivial! • Cf. DC Application Profiles
Aligning VRA with Dublin Core • VRA is a specialization of Dublin Core for visual resources • VRA properties “material.medium” and “material.support” are specializations of the Dublin Core property “format” • vra:material.medium rdfs:subPropertyOf dc:format . • vra:material.support rdfs:subPropertyOf dc:format .
Activity 3: enriching the metadata • Extracting additional concepts from an annotation – Matching the string “Paris” to a vocabulary term • Information-extraction techniques exist (and continue to be developed) • Effort required can be up to a few weeks – The more concepts, the better, but no need to be perfect!
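What the sub-property declarations buy in practice can be shown with a few lines of plain Python. This is only a sketch of the rdfs:subPropertyOf entailment, assuming a tiny in-memory triple list; the works and their values are made up, and a real deployment would use an RDF store with RDFS reasoning rather than this hand-rolled lookup.

```python
# Declared specializations: VRA property -> the Dublin Core property it refines.
SUBPROPERTY_OF = {
    "vra:material.medium": "dc:format",
    "vra:material.support": "dc:format",
}
# Collection metadata (hypothetical works and values).
TRIPLES = [
    ("work:1", "vra:material.medium", "oil paint"),
    ("work:1", "vra:material.support", "canvas"),
    ("work:2", "dc:format", "photograph"),
]


def is_subproperty(prop, ancestor):
    """True if prop equals ancestor or specializes it, directly or indirectly."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = SUBPROPERTY_OF.get(prop)
    return False


def values(subject, prop):
    """All values of prop for subject, including values of its sub-properties."""
    return [o for s, p, o in TRIPLES if s == subject and is_subproperty(p, prop)]


print(values("work:1", "dc:format"))
print(values("work:1", "vra:material.medium"))
```

A client that only knows Dublin Core can ask for `dc:format` and still receive the more specific VRA values, while a VRA-aware client keeps the distinction between medium and support.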
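The simplest form of enrichment, matching strings in a free-text annotation against vocabulary labels, might look like the sketch below. It is deliberately naive (exact whole-word matching, no disambiguation), and the vocabulary entries and concept identifiers are hypothetical; real information-extraction pipelines do considerably more.

```python
import re

# Lower-cased vocabulary label -> candidate concept identifiers (hypothetical).
VOCAB = {
    "paris": ["place:Paris"],
    "henri matisse": ["ulan:HenriMatisse"],
}


def enrich(text, vocab=VOCAB):
    """Return the concepts whose label occurs as whole words in the text."""
    found = []
    lowered = text.lower()
    for label, concepts in vocab.items():
        # \b keeps "paris" from matching inside words such as "comparison".
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.extend(concepts)
    return found


print(enrich("Painted by Henri Matisse in Paris, 1905"))
```

In line with the slide, such a matcher does not need to be perfect: every concept it adds gives the search one more semantic link to follow.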
Example textual annotation
Resulting semantic annotation (rendered as HTML with RDFa)
RDFa: embedding RDF in (X)HTML • Regular HTML with RDFa • Resulting RDF statements
Activity 4: aligning the vocabulary • Find semantic links between vocabulary terms – Derain (ULAN) related-to Fauve (AAT) • Automatic techniques exist, but performance varies • Often a combination of automatic and manual alignment • Effort strongly dependent on vocabularies – But “a little semantics goes a long way” (Hendler)
Learning alignments • Learning relations between art styles in AAT and artists in ULAN through NLP of art-historical texts – “Who are Impressionist painters?”
Extracting additional knowledge from scope notes
Principles for knowledge engineering on the Web
Principle 1: Be modest! • Ontology engineers should refrain from developing their own idiosyncratic ontologies • Instead, they should make the available rich vocabularies, thesauri and databases available in web format • Initially, only add the originally intended semantics
Principle 2: Think large! Doug Lenat: "Once you have a truly massive amount of information integrated as knowledge, then the human-software system will be superhuman, in the same sense that mankind with writing is superhuman compared to mankind before writing."
Principle 3: Develop and use patterns! • Don’t try to be (too) creative • Ontology engineering should not be an art but a discipline • Patterns play a key role in methodology for ontology engineering • See for example patterns developed by the W3C Semantic Web Best Practices group http://www.w3.org/2001/sw/BestPractices/ • SKOS can also be considered a pattern
Principle 4: Don’t recreate, but enrich and align • Techniques: – Learning ontology relations/mappings – Semantic analysis, e.g. OntoClean – Processing of scope notes in thesauri
Principle 5: Beware of ontological over-commitment!
Principle 6: Specifying a data model in OWL does not make it an ontology! • Papers about your own idiosyncratic “university ontology” should be rejected at SW conferences • The quality of an ontology does not depend on the number of OWL constructs used
Principle 7: Required level of formal semantics depends on the domain! • In our semantic search we use three OWL constructs: – owl:sameAs, owl:TransitiveProperty, owl:SymmetricProperty • But cultural heritage is very different from medicine and bioinformatics – Don’t over-generalize on requirements for e.g. OWL
Perspectives • Basic Semantic Web technology is ready for deployment • Research themes: – Scalability, vocabulary alignment, metadata extraction • Web 2.0 facilities fit well: – Involving community experts in annotation – Personalization • Social barriers have to be overcome!
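To give a feel for how little machinery the three constructs require, here is a sketch that derives extra links from them by naive forward chaining. It is an illustration under assumptions, not the project's reasoner: `partOf` is assumed to be declared transitive, the owl:sameAs link between the two Edo concepts is hypothetical, and owl:sameAs is treated only as a symmetric and transitive link (no substitution into other statements).

```python
def closure(triples, transitive=(), symmetric=()):
    """Add links entailed by transitive and symmetric properties and owl:sameAs."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            equivalence = p == "owl:sameAs"
            if equivalence or p in symmetric:
                new.add((o, p, s))
            if equivalence or p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            # Nothing new was derived: the fixed point is reached.
            return facts
        facts |= new


# Hypothetical example data.
TRIPLES = [
    ("place:Montmartre", "partOf", "place:Paris"),
    ("place:Paris", "partOf", "place:France"),
    ("aat:Edo", "owl:sameAs", "svcn:Edo"),
]
INFERRED = closure(TRIPLES, transitive={"partOf"})

for triple in sorted(INFERRED - set(TRIPLES)):
    print(triple)
```

The loop terminates because only finitely many triples can be built from the given resources and properties; for large graphs a real store would compute such closures incrementally or at query time.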