- Number of slides: 73
Data Mining versus Semantic Web
Veljko Milutinovic, vm@etf.bg.ac.yu
http://galeb.etf.bg.ac.yu/vm
© Fraunhofer IPSI
This material was developed with financial help of the WUSA fund of Austria.
Data Mining versus Semantic Web
§ Two different avenues leading to the same goal!
§ The goal: Efficient retrieval of knowledge from large compact or distributed databases, or from the Internet
§ What knowledge is: Synergistic interaction of information (data) and its relationships (correlations)
§ The major difference: Placement of complexity
Essence of Data Mining
§ Data and knowledge are represented with simple mechanisms (typically HTML) and without metadata (data about data).
§ Consequently, relatively complex algorithms have to be used (complexity migrates to retrieval request time).
§ In return, low complexity at system design time!
Essence of Semantic Web
§ Data and knowledge are represented with complex mechanisms (typically XML) and with plenty of metadata (a byte of data may be accompanied by a megabyte of metadata).
§ Consequently, relatively simple algorithms can be used (low complexity at retrieval request time).
§ However, metadata design and maintenance add considerable complexity at system design time.
Major Knowledge Retrieval Algorithms (for Data Mining)
§ Neural Networks
§ Decision Trees
§ Rule Induction
§ Memory Based Reasoning, etc.
§ Consequently, the stress is on algorithms!
Major Metadata Handling Tools (Semantic Web)
§ XML
§ RDF
§ Ontology Languages
§ Verification (Logic + Trust) Efforts in Progress
§ Consequently, the stress is on tools!
Semantic Web Tutorial Structure (Overview)
§ Introduction to the Semantic Web
§ XML Technologies for the Semantic Web
§ Defining vocabularies with RDF
§ Ontologies and ontology languages
§ Challenges for the Semantic Web
§ References
World Wide Web - Today
[Figure labels: Information consumer; preferences; Information request; Search Engines (e.g. Google), Information Portals; Indexing, references, collections; Information and Service Providers]
Semantic Web - Vision
[Figure labels: User; Preferences; Calendar; Request/Task; Interpretation; Agents (Communication, Negotiation, Planning, Decisions, Proofs); "Trust" Services (Ratings, Signatures, Certificates); S+ = Semantically enriched information; Information and Service Provider]
A Definition of the Semantic Web
"Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation"
Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001
Why?
§ To use the large amount of information on the Web more effectively
§ To enable more advanced automated processing on the Web - machines can "understand" the content
§ Intelligent browsers to help you find what you are looking for
§ To derive new information from existing information (reasoning) - Virtual global database
§ Advanced applications and services become possible, e.g. in
  - e-business
  - e-government
  - e-learning
Examples
§ Context-awareness -- linking based on the meaning of the information elements
§ Filtering -- you could rate the pages you visit, and the ratings are later used for automatic general recommendations
§ Annotations -- you could add comments to the information on the Web, and these comments can be shown to other visitors
§ Privatization -- you can create your own database of information from the Web
How? - Semantic Web layer model
[Timeline figure, 1985 to 2010. Recoverable labels: Document Exchange Format (SGML, HyTime), 1985; Foundation of Web today (HTTP, HTML), 1990; Self-Describing Documents (XML, RDF), 2000; Shared Terminology (DAML+OIL, OWL); Trusted Web Resources, 2010; Human; Machine]
Building Blocks (Semantic Web)
§ Metadata: data about data; labeling and structuring information in a document
§ URI (Uniform Resource Identifier): a universal and unique name for any resource, e.g. http://www.something.com/one
Minimalist Design
• Making it as simple as possible
• Simplicity helps the future evolution of the Semantic Web
Inference
• Deriving new data from existing data
• Merging data repositories gives new information
• Allows the creation of more powerful applications (intelligent agents)
• Unfortunately, complete inference is possible only when the semantics are defined formally in a language (e.g. "First Order Predicate Logic" languages)
XML Technologies for the Semantic Web
§ Overview
§ XML Instances
§ XML Document Type Definition
§ XML Linking
§ XML Schema
§ XML Query Language
What is an XML Document?
File format (instance):
    <?xml version="1.0"?>
    <a>
      <b id="x1">
        <c>David</c>
        <c>Marie</c>
      </b>
      <d/>
      <b id="x2">
        <c>John</c>
      </b>
    </a>
Tree structure (instance): a has the children b (id=x1), d and b (id=x2); the first b has the children c (David) and c (Marie); the second b has the child c (John).
Schema (Document Type Definition, DTD): a contains b* and d; b has the attribute id and contains c*.
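The mapping from file format to tree structure can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an illustration, not part of the original tutorial; the helper name `outline` is made up for this example.

```python
import xml.etree.ElementTree as ET

# The instance document from the slide.
DOC = """<?xml version="1.0"?>
<a>
  <b id="x1"><c>David</c><c>Marie</c></b>
  <d/>
  <b id="x2"><c>John</c></b>
</a>"""

def outline(element, depth=0):
    """Return one indented line per element, mirroring the tree structure."""
    label = element.tag
    if "id" in element.attrib:
        label += f" id={element.attrib['id']}"
    if element.text and element.text.strip():
        label += f": {element.text.strip()}"
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(DOC))
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one level of the tree drawn on the slide.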
The XML Stack
[Figure labels, top to bottom: Specific Applications; Standardized Applications (XHTML, SVG, SMIL, P3P, MathML); Layout (XSL, CSS); Hyperlinks (XLink, XPointer); Metadata (RDF, RDFS); API (DOM, SAX); Schemas (XSD, DTDs); Queries (XPath, XQuery); Namespaces; XML 1.0; Locators (URI); Unicode]
Example of songs.xml
• Example of describing a song in songs.xml using music.dtd
    <song>
      <title>Gipsy song</title>
      <artist>Vlatko Stefanovski</artist>
      <type class="ETHNO"/>
      <download class="YES"/>
      <comments/>
    </song>
  song is the parent element defined in music.dtd; title, artist, type, download and comments are child elements defined in music.dtd.
music.dtd
    <!ELEMENT song (title, artist, album?, type, format?, download, comments?)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT artist (#PCDATA)>
    <!ELEMENT type EMPTY>
    <!ATTLIST type
      class (CLASSICAL | ROCK | POP | RAP | JAZZ | TECHNO | ETHNO) #REQUIRED>
    <!ELEMENT download EMPTY>
    <!ATTLIST download
      class (YES | NO) "YES">
    <!ELEMENT comments (#PCDATA)>
  song is the parent element; the elements listed in its content model are the child elements. Attributes describe content; (YES | NO) is the list of values for download.
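The content model of song reads like a regular expression over child element names, which the following sketch makes explicit. It is not a DTD validator: it checks only the order and presence of the children, ignores attributes, and the name `matches_song_model` is an assumption made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Content model of <song> from music.dtd:
# (title, artist, album?, type, format?, download, comments?)
SONG_MODEL = re.compile(
    r"title,artist,(album,)?type,(format,)?download,(comments,)?"
)

def matches_song_model(xml_text):
    """Check only the order and presence of the child elements of <song>."""
    song = ET.fromstring(xml_text)
    children = "".join(child.tag + "," for child in song)
    return song.tag == "song" and SONG_MODEL.fullmatch(children) is not None

valid = matches_song_model(
    '<song><title>Gipsy song</title><artist>Vlatko Stefanovski</artist>'
    '<type class="ETHNO"/><download class="YES"/><comments/></song>'
)
# title and artist are swapped, so the sequence is not allowed.
invalid = matches_song_model(
    '<song><artist>Vlatko Stefanovski</artist><title>Gipsy song</title>'
    '<type class="ETHNO"/><download class="YES"/></song>'
)
print(valid, invalid)
```

A real application would hand the document and the DTD to a validating parser instead.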
XML Linking Simple Link XPointer Extended Link Group
XPath
• A language that enables us to address parts of an XML document (elements, attributes, …)
• Select the title elements of the song elements of the catalog element, and all the artist elements in the document:
    /catalog/song/title | //artist
  (/ selects the child element; // selects any element in the document; | selects several paths)
• Select the titles of all the song elements of the catalog element that have a download element with the value yes:
    /catalog/song[download='yes']/title
Also…
• Use * to select unknown XML elements: /catalog/*/artist
• Use @attribute_name to specify an attribute: //song[@type='classical']
• XPath expressions (logical, arithmetical): /catalog/song[duration<5]
• XPath functions: count(), id(), last(), name(), concat(), string(), translate(), sum(), round(), false(), not(), …, e.g. /catalog/song[last()]
• To select nodes from the XML document (IE): xmlDoc.selectNodes("/catalog/song/title/text()"), where the argument is the path
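Several of the path expressions above can be tried with Python's standard `xml.etree.ElementTree`, which implements a subset of XPath. The catalog below is made-up sample data, and because ElementTree paths are evaluated relative to an element, the leading /catalog is written as `.` on the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample catalog, invented for this illustration.
CATALOG = """<catalog>
  <song type="classical"><title>Sonata XY</title><artist>Mozart</artist><download>yes</download></song>
  <song type="ethno"><title>Gipsy song</title><artist>Vlatko Stefanovski</artist><download>no</download></song>
</catalog>"""

catalog = ET.fromstring(CATALOG)

# /catalog/song/title
titles = [t.text for t in catalog.findall("./song/title")]
# //artist
artists = [a.text for a in catalog.findall(".//artist")]
# /catalog/song[download='yes']/title
downloadable = [t.text for t in catalog.findall("./song[download='yes']/title")]
# //song[@type='classical']
classical = [s.find("title").text for s in catalog.findall(".//song[@type='classical']")]
# /catalog/song[last()]
last_title = catalog.find("./song[last()]/title").text

print(titles, artists, downloadable, classical, last_title)
```

The union operator |, arithmetic comparisons such as duration<5, and most XPath functions are outside the ElementTree subset; a full XPath engine would be needed for those.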
XPointer
• Locates portions of other XML documents (elements, attributes, …) without the need to place anchors inside those documents (as in HTML)
• More robust to changes in the target document
• URL + XPath
• http://www.music.org/first.xml/#xpointer(//song/title[1])
  (the URL of the document we point into, followed by the XPointer expression in the XPath language)
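The "URL + XPath" composition can be illustrated by taking the reference apart again. This sketch uses `urllib.parse.urldefrag` from the Python standard library; the function name `split_xpointer` is an assumption for this example, and only the simple xpointer(...) form shown on the slide is handled.

```python
from urllib.parse import urldefrag

def split_xpointer(reference):
    """Split a reference into (document URL, XPath inside xpointer(...))."""
    url, fragment = urldefrag(reference)
    if fragment.startswith("xpointer(") and fragment.endswith(")"):
        return url, fragment[len("xpointer("):-1]
    return url, None

doc_url, path = split_xpointer(
    "http://www.music.org/first.xml/#xpointer(//song/title[1])"
)
print(doc_url)
print(path)
```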
XML Schema
• XML Schema defines a class of XML documents
• Defines (explains) the datatypes, elements, and attributes
• Defines and catalogues vocabularies for classes of XML documents
• A document described by an XML schema can be called an instance (parallel to OOP)
• The schema language considerably extends the capabilities of XML 1.0 document type definitions (DTDs), most importantly with datatypes
Limitations of DTDs
§ Syntax: not XML
§ Practically no reuse of content models
§ Constructors: element set with content model
§ Datatypes: essentially only "String"
    <!ELEMENT song (title, artist, album?, type, format?, download, comments?)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT artist (#PCDATA)>
    <!ELEMENT type EMPTY>
    <!ATTLIST type
      class (CLASSICAL | ROCK | POP | RAP | JAZZ | TECHNO | ETHNO) #REQUIRED>
    <!ELEMENT download EMPTY>
    <!ATTLIST download
      class (YES | NO) "YES">
    <!ELEMENT comments (#PCDATA)>
XML Schema Components
• An XML Schema comprises a set of schema components
• There are three groups of components:
  - Primary components: Simple type definitions, Complex type definitions, Attribute declarations, Element declarations
  - Secondary components: Attribute group definitions, Identity-constraint definitions, Model group definitions, Notation declarations
  - "Helper" components: Annotations, Model groups, Particles, Wildcards, Attribute Uses
Example: song type definition
    <xsd:complexType name="song">
      <xsd:sequence>
        <xsd:choice>
          <xsd:element name="title" type="xsd:string"/>
          <xsd:element name="artist" type="xsd:string"/>
        </xsd:choice>
      </xsd:sequence>
      <xsd:attribute name="length" type="xsd:duration"/>
    </xsd:complexType>
  song is a complex type; the element and attribute declarations use the simple types xsd:string and xsd:duration.
  xsd is used to denote the XML Schema namespace.
Reusability of schemas
• xs:include - to include a schema from another document (copy-paste)
    <xs:include schemaLocation="collection.xsd"/>
• xs:redefine - same, plus it lets you redefine the schema
• xs:import - reusing definitions from other namespaces (a system of libraries)
    <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="myxml.xsd"/>
  Now we can reference an external element from the imported namespace in our schema
Defining vocabularies with RDF
§ Motivation for RDF
§ RDF Instances
§ Basic concepts and building blocks
§ Syntax options
§ Reification
§ Collections
§ RDF Schema: Defining your own Vocabularies
§ Supporting Interoperability with RDF
What do we NOT get from XML?
Superimposing (meta) information:
§ XML combines metainformation and content
Datatypes that we can "reason" about:
§ Example: CLASSICAL | ROCK | POP | RAP | JAZZ | TECHNO | ETHNO is just a choice of allowed strings. We cannot represent that DIXIE is a subclass of JAZZ, or that BLUES overlaps with ROCK, ETHNO
Bottom-up reuse of vocabularies:
§ Independently evolved XML Schemas for one and the same thing
§ How do you model an "address"?
RDF: Defining Semantics on the Web
§ There is a need to describe resources on the Web in a form that can be interpreted by machines across the Web
§ Interpretation depends on the context of a resource, e.g. Jaguar (car vs. beast)
§ Using their experience and cognitive abilities, humans may infer the context of a resource in many ways, even if it is not made explicit
§ Software can interpret context only if it is described explicitly and formally
§ RDF and the ontology languages building upon RDF provide means to explicate (part of) this context
RDF - Resource Description Framework
§ Defines a framework for structuring and describing resources, such as documents, in the Semantic Web
§ Enables the definition of vocabularies for the description of resources in an application domain
§ Goals:
  Ø Extensibility, interoperability, and reuse of vocabularies
  Ø Improved support for interpretation of data by machines
The RDF Data Model
§ Simple but powerful data model for the description of resources and the creation of metadata
§ Consists of three core concepts:
  Ø Resource
  Ø Property
  Ø Statement
  + Class (in RDF Schema)
§ Similar to other modeling approaches (e.g. object-oriented modeling), but property-centric, not class-centric
RDF Statement and Graph
§ Each triple (S, P, O), drawn as node - arc - node, represents an RDF statement
Example: Gipsy song is performed by Vlatko Stefanovski.
  subject (resource): http://www.music.org/songs/g/gipsySong (the song, represented by its entry in a (fictive) song directory)
  predicate (property): Performed by
  object (resource or literal): http://www.artist.org/stefanovsk (the artist, represented by his homepage)
Arcs in the RDF Graph
An arc
§ represents the predicate of an RDF statement
§ is labeled with a URI referring to an RDF property
§ is directed, pointing from the subject of a statement to the object of a statement
  subject: http://www.music.org/songs/g/gipsySong
  predicate: music:performedby
  object: http://www.artist.org/stefanovsk
RDF Resource
• The Resource forms the central concept in RDF
• Anything that can be described can act as a resource: Web page, part of a Web page, Web site, book, photograph, person, …
• Resources are identified by a resource identifier: a URI (plus optional anchor IDs)
• Comparable to an entity (in the Entity Relationship model) or an object (in an object-oriented model)
RDF Property
An RDF Property is used to express
• A characteristic of a resource, or
• A binary relation between resources
• A predicate in a statement
• A property can be compared to a (binary) relationship among entities (in the Entity Relationship model)
Example
The individual whose name is Vlatko Stefanovski and whose email is V.Stefanovski@artists.org, is the artist of http://www.music.org/songs/g/gipsySong
[Graph: the URI reference http://www.music.org/songs/g/gipsySong has a music:artist arc to a blank node; the blank node has a person:name arc to the literal Vlatko Stefanovski and a person:homepage arc to http://www.artists.org/stefanovski]
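A graph like this one can be held in memory as a plain set of (subject, predicate, object) triples. The sketch below is a toy illustration rather than an RDF library; the label `_:artist` for the blank node and the helper names are assumptions made for this example.

```python
# Each statement is a (subject, predicate, object) triple.
# "_:artist" stands for the blank node in the graph.
SONG = "http://www.music.org/songs/g/gipsySong"
triples = {
    (SONG, "music:artist", "_:artist"),
    ("_:artist", "person:name", "Vlatko Stefanovski"),
    ("_:artist", "person:homepage", "http://www.artists.org/stefanovski"),
}

def objects(graph, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}

def artist_names(graph, song):
    """Follow music:artist, then person:name, across the blank node."""
    return {
        name
        for artist in objects(graph, song, "music:artist")
        for name in objects(graph, artist, "person:name")
    }

print(artist_names(triples, SONG))
```

The blank node never needs a global name: it only connects the song to the properties of its artist.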
XML Serialization
• How to translate the RDF graph structure into XML's tree-oriented notation
    <rdf:Description rdf:about="http://www.music.org/songs/g/gipsySong">
      <music:performedby>
        <rdf:Description>
          <person:name>Vlatko Stefanovski</person:name>
          <person:homepage>
            <rdf:Description rdf:about="http://www.artists.org/stefanovski">
            </rdf:Description>
          </person:homepage>
        </rdf:Description>
      </music:performedby>
    </rdf:Description>
[Graph: http://www.music.org/songs/g/gipsySong has a music:performedby arc to a node with a person:name arc to Vlatko Stefanovski and a person:homepage arc to http://www.artists.org/stefanovski]
Reification
§ Latin: Res = thing, hence Reification = "Thing Making"
§ Statements themselves can be considered as resources (things) in RDF. Thus, it is possible to make statements about statements (Reification).
§ Possible applications:
  Ø Definition of a context for a statement with respect to time, place, validity, …
  Ø Embedding a statement into a discourse (claims, doubts, proofs of statements)
  Ø …
Example:
  Statement A: <sonata XY> <composer> <Mozart>
  Statement B: <music expert A> <claims> <statement A>
               <music expert C> <doubts> <statement A>
Reification Syntax
§ The statement to be reified has to be modeled as an RDF resource
§ The RDF vocabulary provides special constructs for this purpose:
  • The class rdf:Statement, which is the type of all RDF statements
  • The property rdf:type, which is used to associate an RDF resource with a class
  • The property rdf:subject, which refers to the subject of the modeled statement (i.e. to the described resource)
  • The property rdf:predicate, which refers to the property used as a predicate in the modeled statement
  • The property rdf:object, which refers to the object of the modeled statement (i.e. the property value)
How to create a reified statement?
• Associate the subject, predicate and object of the statement with a resource of type rdf:Statement. This is done by using the rdf:subject, rdf:predicate and rdf:object properties.
[Graph: a statement node with rdf:type rdf:Statement, rdf:subject www.operas.org/Zauberflöte, rdf:predicate music:composer and rdf:object www.artists.org/Mozart; the original statement is the music:composer arc from www.operas.org/Zauberflöte to www.artists.org/Mozart]
How to create a reified statement?
• Now the created node, which represents the statement, can be used as the object or subject of another RDF statement: the statement becomes a resource
[Graph: a statement node with rdf:type rdf:Statement, rdf:subject www.operas.org/SonataXY, rdf:predicate music:composer and rdf:object www.artists.org/Mozart; the statement node has a music:claimedBy arc to www.musicExperts.org/ExpertA]
XML Syntax for Reification
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:music="http://ipsi.fhg.de/music-schema#">
      <rdf:Description>
        <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
        <rdf:subject rdf:resource="http://www.operas.org/SonataXY"/>
        <rdf:predicate rdf:resource="http://ipsi.fhg.de/music-schema#Composer"/>
        <rdf:object rdf:resource="http://www.artists.org/Mozart"/>
        <music:claimedBy rdf:resource="http://www.musicExperts.org/ExpertA"/>
      </rdf:Description>
    </rdf:RDF>
  music:claimedBy is a property of the statement.
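In triple form, reifying a statement amounts to emitting four triples about a new statement node, after which further properties can be attached to that node. The following is a minimal sketch under that reading; the function name `reify` and the blank-node label `_:stmtA` are assumptions made for this example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MUSIC = "http://ipsi.fhg.de/music-schema#"

def reify(statement_id, subject, predicate, obj):
    """Return the four triples that describe one statement as a resource."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]

# Statement A: <sonata XY> <composer> <Mozart>
graph = reify(
    "_:stmtA",
    "http://www.operas.org/SonataXY",
    MUSIC + "Composer",
    "http://www.artists.org/Mozart",
)
# A statement about statement A: it is claimed by expert A.
graph.append(("_:stmtA", MUSIC + "claimedBy", "http://www.musicExperts.org/ExpertA"))

for triple in graph:
    print(triple)
```

Note that the reified description does not assert statement A itself; it only talks about it.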
RDF Collections
§ An RDF Container models a collection of resources.
§ The RDF model supports three types of containers:
  • Bag (rdf:Bag) - an unordered list of resources or literals
  • Sequence (rdf:Seq) - an ordered list of resources or literals
  • Alternative (rdf:Alt) - a list of resources or literals that represent alternatives for the (single) value of a property
§ Bag and Sequence can be used for multivalued properties
Container - RDF Graph Syntax
Example: Collection of Arias
[Graph: http://www.opera.org/Zauberflöte has a music:arias arc to a container node; the container node has rdf:type rdf:Seq and the arcs rdf:_1 to …/Aria1, rdf:_2 to …/Aria2, rdf:_3 to …/Aria3 and rdf:_4 to …/Aria4]
Container - XML Syntax
    <rdf:RDF>
      <rdf:Description rdf:about="http://www.operas.org/Zauberflöte">
        <music:arias>
          <rdf:Seq>
            <rdf:li rdf:resource="…/Aria1"/>
            <rdf:li rdf:resource="…/Aria2"/>
            <rdf:li rdf:resource="…/Aria3"/>
            <rdf:li rdf:resource="…/Aria4"/>
          </rdf:Seq>
        </music:arias>
      </rdf:Description>
    </rdf:RDF>
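The rdf:li elements of the XML syntax correspond to the numbered membership properties rdf:_1, rdf:_2, … of the graph syntax. The sketch below shows that expansion for an ordered container, which the RDF vocabulary names rdf:Seq; the function name `sequence_triples` and the node label `_:arias` are assumptions, and the member URIs are left elided as in the slides.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def sequence_triples(container, members):
    """Expand an ordered member list into rdf:type and rdf:_n triples."""
    triples = [(container, RDF + "type", RDF + "Seq")]
    for position, member in enumerate(members, start=1):
        triples.append((container, f"{RDF}_{position}", member))
    return triples

arias = sequence_triples("_:arias", ["…/Aria1", "…/Aria2", "…/Aria3", "…/Aria4"])
for triple in arias:
    print(triple)
```

The order of the members is carried only by the property names, so the triples themselves can be stored in any order.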
Statements about a Container and its members
§ rdf:about is used to make a statement about the container as a whole
§ rdf:aboutEach is used to make a statement about each of the members of the container
§ rdf:aboutEachPrefix makes a statement about each member resource of a Bag that is only implicitly defined. This Bag contains all the resources whose fully resolved resource identifiers begin with the character string given as the value of the attribute
RDF Schema
The RDF vocabulary description language, RDF Schema, stresses
  Ø Reuse and extension of existing schemata
  Ø Semantic enrichment by concept hierarchies
Enables statements on the schema level to:
  Ø define classes of resources
  Ø define relationships between these classes
  Ø define the kinds of properties that instances of those classes have
  Ø define relationships between properties
  Ø restrict possible combinations of classes and relationships/properties
Allows mixing of schemata
Defining an RDF Class (Example)
Class:
    <rdfs:Class rdf:about="http://www.ipsi.fhg.de/music-schema#MusicComposition">
      <rdfs:subClassOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
      <rdfs:label>MusicComposition</rdfs:label>
    </rdfs:Class>
Instance:
    <rdf:Description rdf:about="http://www.operas.org/Zauberflöte">
      <rdf:type rdf:resource="http://www.ipsi.fhg.de/music-schema#MusicComposition"/>
      …
    </rdf:Description>
[Graph: the instance has an rdf:type arc to music:MusicComposition, which has an rdfs:subClassOf arc to rdfs:Resource]
Class-centric vs. Property-centric
Class-centric
§ Attributes as part of the class definition
§ Stresses common structure
Property-centric
§ Property as first-class object
§ Stresses extensibility and flexibility with respect to properties
[Graph labels: music:composer; rdfs:domain; rdfs:range; music:MusicTitle; person:Person; rdf:label; rdfs:Literal]
Defining Concept Hierarchies
rdfs:subClassOf represents a specialization relationship between RDF classes (transitive).
[Graph: a class hierarchy over music:Genre, music:Modern, music:Classic, music:Sonata, music:Rock, music:Opera and music:RockOpera, connected by rdfs:subClassOf arcs; http://www.operas.org/Zauberflöte is attached by rdf:type arcs, one of them to music:MusicTitle]
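Because rdfs:subClassOf is transitive, all superclasses of a class can be collected by following the arcs repeatedly. The sketch below does this for one assumed reading of the hierarchy in the figure; the exact arcs are a guess at the original drawing, and the function name `superclasses` is made up for this example.

```python
# Assumed reading of the figure: child class -> direct superclasses.
SUBCLASS_OF = {
    "music:Modern": {"music:Genre"},
    "music:Classic": {"music:Genre"},
    "music:Rock": {"music:Modern"},
    "music:Sonata": {"music:Classic"},
    "music:Opera": {"music:Classic"},
    "music:RockOpera": {"music:Rock", "music:Opera"},
}

def superclasses(cls, subclass_of=SUBCLASS_OF):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

print(sorted(superclasses("music:RockOpera")))
```

An instance of music:RockOpera is therefore also an instance of every class in the returned set.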
RDF Property Hierarchies
rdfs:subPropertyOf
• Is used to specify that one property is a specialization of another property
• If a resource r has value v for property p1, and property p1 is a subproperty of p2, then r also has value v for property p2.
[Graph: sings has an rdfs:subPropertyOf arc to performs; Cher has a sings arc to "SomeSong", so SomeSong is a value for sings and is also a value of performs]
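The subproperty rule can be applied mechanically until no new statements appear. The following is a minimal sketch of that inference over plain tuples, not an RDF Schema reasoner; the function name `infer_superproperties` is an assumption made for this example.

```python
def infer_superproperties(triples, sub_property_of):
    """Add (r, p2, v) for every (r, p1, v) where p1 is a subproperty of p2."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat so that chains of subproperties are followed
        changed = False
        for subject, predicate, value in list(inferred):
            for parent in sub_property_of.get(predicate, ()):
                if (subject, parent, value) not in inferred:
                    inferred.add((subject, parent, value))
                    changed = True
    return inferred

facts = {("Cher", "sings", "SomeSong")}
closure = infer_superproperties(facts, {"sings": {"performs"}})
print(sorted(closure))
```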
Ontologies and Ontology Languages
§ What is an Ontology?
§ The Ontology Language OWL
  Ø Taxonomies
  Ø Property Restrictions
Ontology
"An ontology is a specification of a conceptualization." *
• A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose.
* T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199-220, 1993
Ontology Languages
• Ontology languages are semantic markup languages for defining ontologies
• DAML+OIL is a combination of two predecessor ontology languages:
  • DAML - DARPA Agent Markup Language
  • OIL - Ontology Inference Layer
• OWL (Web Ontology Language) is the successor of DAML+OIL, currently developed by the W3C Web Ontology Group (Status: Working Draft)
• Building on RDF ideas
• OWL Lite is a subset of OWL
OWL Characteristics
§ OWL enables the definition of
  Ø various types of relationships between classes (in addition to subclass hierarchies)
  Ø additional restrictions for property values
  Ø additional types of relationships between properties
  Ø different kinds of properties
§ OWL distinguishes between classes and instances (objects) on the one side, and data types and values on the other side (XML Schema datatypes)
Ontology Definition
§ The body of the ontology consists of:
  Ø classes
  Ø properties
  Ø instances (for use in class definitions)
§ The main component of an ontology is a taxonomy, i.e. a class hierarchy
Further Class Relationships
A class definition may also contain other class relationships
§ owl:disjointWith - this property is used to express that a class is disjoint with another class (no instances in common)
§ owl:sameClassAs - this property is used to express that a class is equivalent to another class (same instances)
§ The values of these properties are defined by a class expression, which in the simplest case is the name (URI) of a class
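One practical use of owl:disjointWith is consistency checking: a resource typed with two disjoint classes signals an error in the data. The sketch below illustrates that check; the disjointness declaration between music:Sonata and music:Opera and the type assignments are hypothetical and not taken from the tutorial's ontology.

```python
def disjointness_violations(types_of, disjoint_pairs):
    """Return the resources typed with both classes of a disjoint pair."""
    return {
        resource
        for resource, classes in types_of.items()
        for first, second in disjoint_pairs
        if first in classes and second in classes
    }

# Hypothetical declaration: music:Sonata owl:disjointWith music:Opera.
disjoint_pairs = {("music:Sonata", "music:Opera")}
types_of = {
    "www.operas.org/Zauberflöte": {"music:Opera"},
    # Deliberately inconsistent entry, to show what the check reports.
    "www.operas.org/SonataXY": {"music:Sonata", "music:Opera"},
}
violations = disjointness_violations(types_of, disjoint_pairs)
print(violations)
```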
Properties in OWL
§ OWL properties are derived from RDF properties
§ It is possible to define
  Ø different types of properties, where several property types can be combined with each other
  Ø relationships between properties
Logic and Proof
• Deduction; checking a document against a set of rules
• Add predicate logic and quantifiers
• Logic + Digital Signature Proof
[Dialogue in the figure: "Oh yeah! Prove it." "You owe me $30." "The check is in the email!"]
  {Purchased(user1, book1, AOL); www.confirm.com#t1221122}
  {Priceof(book1, $30); AOL-history.DB#t29293910}
  {Purchased(a, b, c) & Priceof(b, d) → Owes(a, c, d); www.ont.com/prodont}
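The rule in the last line combines two kinds of facts into a new one. A minimal sketch of applying it is shown below; it covers only the logic step, not the digital signatures or source references attached to each fact, and the function name `derive_owes` is an assumption made for this example.

```python
def derive_owes(purchases, prices):
    """Apply Purchased(a, b, c) & Priceof(b, d) -> Owes(a, c, d)."""
    return {
        (buyer, seller, prices[item])
        for buyer, item, seller in purchases
        if item in prices
    }

purchases = {("user1", "book1", "AOL")}  # Purchased(user1, book1, AOL)
prices = {"book1": 30}                   # Priceof(book1, $30)
owes = derive_owes(purchases, prices)
print(owes)
```

A proof of the derived fact would list the two input facts, their sources and the rule that was applied.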
Instead of a Conclusion: IPSI Projects
Scalable Technology and Applications for the Semantic Web
Individualized Electronic Newspaper
Dictionary of Art
XML Broker: Integrating Web Resources via XML
[Figure: the XML Broker answers an XSL query by combining several XML sources into one result document]
Source (golf course):
    <golfplatz id="platz0001">
      <adresse> [...] </adresse>
      <policy> ... </policy>
      <handicap>
        <wochentag>34</wochentag>
        <wochenende>34</wochenende>
      </handicap>
    </golfplatz>
Source (www.wetter.de):
    <www.wetter.de>
      <wetter>
        <plz>87724</plz>
        <datum>981001</datum>
        <temperatur>16</temperatur>
        <regen>90</regen>
        <wind>9</wind>
        <prognose>13</prognose>
      </wetter>
      <!-- ... -->
    </www.wetter.de>
Source (www.reiseplanung.de):
    <www.reiseplanung.de>
      <route>
        <von>53757</von>
        <nach>93333</nach>
        <entfernung>481.9</entfernung>
        <fahrzeit>274</fahrzeit>
        <karte>5375793333.gif</karte>
      </route>
      <!-- ... -->
    </www.reiseplanung.de>
Result (outline as shown in the figure):
    <golfdemo>
      <golfplatz> <adresse> ... </adresse> <greenfee> ... </golfplatz>
      <wetter> ... </wetter>
      <route> ... </route>
    </golfdemo>
Virtual Gallery