
- Number of slides: 69
The Growing Semantic Web
Dr. Mark Greaves, Vulcan Inc.
markg@vulcan.com
© 2009 Vulcan Inc.

What is the Semantic Web? 4 Answers
- The Web of Data
  – A fully distributed web-based system to publish logical assertions
  – A way to link to someone else’s data, augment it, and add to it
  – Democratic, crowd-based, scalable knowledge engineering
  – The hottest area of Artificial Intelligence right now
- The set of W3C standards for simple logic languages on the web
  – RDF, RDFS, OWL (Lite, DL, Full dialects), OWL 2 (EL, QL, RL profiles)
  – Weaker than First Order Logic, more easily authorable, decidable, tractable in most cases using tableaux provers
- The largest formal knowledge base on Earth
  – Also the messiest formal knowledge base on Earth
- A revolution in the way we think of data, crowds, and schemas
  – Scaling at last comes to symbolic AI
  – Massive, partial, participatory, logically weak, dynamic, schema-last

Talk Outline: The Growing Semantic Web
- The Origins of the Semantic Web
  – DARPA’s DAML Program
  – RDF, OWL, and the Semweb Infrastructure
- Semantic Web Evolution to 2009
- Semantic Web Transformation in three areas of application
  – Markets and Companies

The Roots of the Semantic Web
- Semantic technology has been a distinct research field for decades
  – Symbolic Logic (from Russell and Frege)
  – Knowledge Representation Systems in AI
    • Semantic Networks (Bill Woods, 1975)
    • US and EU R&D programs in information integration (80s and 90s)
    • Development of simple tractable “description logics” for classification
    • Conceptual Graphs and Formal Concept Analysis
  – Relational Algebras and Schemas in Database Systems
  – Library Science (classifications, thesauri, taxonomies)
- So, What Sparked the Semantic Web? What’s new was the Web!
  – The material needed to answer almost any question is somewhere on the web
  – A massive infrastructure of data servers, protocols, authentication systems, presentation languages, and thin clients that can be leveraged

The Beginnings of the US Semantic Web: DARPA’s DAML Program
Problem: Computers cannot process most of the information stored on web pages
Solution: Augment the web to link machine-readable knowledge to web pages
  – Extend RDF with Description Logic
  – Extensibility via frame-based language design
  – Create the first fully distributed web-scale knowledge base out of networks of hyperlinked facts and data
Approach: Design a family of new web languages
  – Basic knowledge representation (OWL)
  – Reasoning (SWRL, OWL/P, OWL/T)
  – Process representation (OWL/S)
  – Build definition and markup tools
  – Link new knowledge to existing web page elements
  – Test design approach with operational pilots in US
[Diagram: the Semantic Web (OWL over HTTP) layered on the Existing Web (HTML/XML over HTTP), with links via URLs. Computers require explicit knowledge to reason with web pages; people use implicit knowledge to reason with web pages.]

What is RDF?
- RDF defines entities and relations for an area of knowledge
  – Assertions are triples of (resource, property, resource) or (resource, property, value)
  – Assertions are precise enough to be interpreted set-theoretically by machines
  – Graphically represented as a directed concept graph with typed links
  – Simple RDF semantics enables reuse of domain knowledge via class hierarchies
- RDF is a web data language with web-friendly syntax
  – All elements in triples are either primitive values or URIs
  – RDF Schema adds classes, domains, ranges, and inheritance of domains and ranges
[Graphic from Mary Pulvermacher, MITRE: an RDF graph whose nodes include Vehicle, Manufacturer, Truck, Car, Year, Mustang, Ford, 1964 Mustang, and Joe’s Mustang, with the values 1964 and 1234, linked by hasManufacturer, manufactures, manufacturedIn, isa, typeOf, and hasVIN. Legend: Resource, Property, Value.]

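The triple model on this slide can be sketched in a few lines of plain Python. This is only an illustration: the exact edges of the MITRE graphic cannot be recovered from the slide text, so the toy triples below are assumptions, and `match` and `superclasses` are hypothetical helper names rather than part of any RDF library.

```python
# Toy data loosely following the vehicle graphic; the exact edges are assumed.
TRIPLES = {
    ("Car", "isa", "Vehicle"),
    ("Truck", "isa", "Vehicle"),
    ("Mustang", "isa", "Car"),
    ("1964 Mustang", "isa", "Mustang"),
    ("Joe's Mustang", "typeOf", "1964 Mustang"),
    ("Joe's Mustang", "hasVIN", "1234"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None pattern slot."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def superclasses(triples, cls):
    """Follow isa links transitively, the way a class hierarchy is reused."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for _, _, parent in match(triples, s=current, p="isa"):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(match(TRIPLES, s="Joe's Mustang"))
print(sorted(superclasses(TRIPLES, "1964 Mustang")))
```

Every assertion is a (resource, property, resource-or-value) tuple, and the class hierarchy is recovered by walking `isa` edges, which is the kind of reuse the slide attributes to RDF Schema.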
What Does OWL Add?
- Description Logic Semantics
  – Relations between classes
    • Equivalent class (e.g., US_President and PrincipalResidentofWhiteHouse)
    • Disjoint class (e.g., Male and Female), subClassOf
  – Derived classes (intersectionOf, unionOf, complementOf)
  – Property characteristics (inverseOf, transitive, symmetric, subPropertyOf, equivProperty, etc.)
  – Range and Cardinality constraints (e.g., birthMother has exactly one value, which is a person)
- Generic (logically-grounded) ability to combine assertions into inferences according to the rules of the DL (for OWL, SHIQ)
  Given: spouseOf rdf:type SymmetricProperty
  And: Mary spouseOf Jim
  Can conclude: Jim spouseOf Mary
Graphic from Mary Pulvermacher, MITRE

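The spouseOf inference on this slide can be reproduced with a single rule over a set of triples. The sketch below is a hand-rolled illustration, not a description-logic reasoner; `apply_symmetric_properties` is a hypothetical name, and the `owl:` prefix on SymmetricProperty is spelled out here as an assumption about how the slide's term would be written.

```python
def apply_symmetric_properties(triples):
    """Add (o, p, s) for every (s, p, o) whose property is declared symmetric."""
    symmetric = {
        s for (s, p, o) in triples
        if p == "rdf:type" and o == "owl:SymmetricProperty"
    }
    inferred = {(o, p, s) for (s, p, o) in triples if p in symmetric}
    return set(triples) | inferred


facts = {
    ("spouseOf", "rdf:type", "owl:SymmetricProperty"),  # Given
    ("Mary", "spouseOf", "Jim"),                        # And
}
closed = apply_symmetric_properties(facts)
print(("Jim", "spouseOf", "Mary") in closed)            # Can conclude
```

A real OWL reasoner applies many such rules together and iterates until nothing new can be derived; this sketch applies only the symmetry rule, once.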
From XML to OWL: Increasingly Expressive Options for Web Data Markup

XML
  Issue addressed: how to express data in text?
  XML Solution: “wrap” data within start tag/end tags, and empower users to create their own tags
  Example: <altitude>50,000 feet</altitude>
    (start tag, end tag; the altitude element holds an unconstrained text string)

XML Schema (XMLS)
  Issue addressed: how should the type structure of the data be expressed?
  XML Schema Solution: XML templates
  Example: <element name="altitude" type="integer"/>
    (altitude is constrained to be an integer)

RDF and RDF Schema
  Issue addressed: how can data support statements ...
  RDF Solution: use a subject, property, object pattern
  Example (an instance of the class, with its properties and values):
    <Fighter rdf:ID="F16">
      <altitude>1500 feet</altitude>
      <builder>Lockheed</builder>
    </Fighter>

OWL
  Issue addressed: how to express data semantics?
  OWL Solution: use inheritance and a description logic to express restrictions and describe entailment
  Example:
    <owl:Class rdf:ID="Fighter">
      <rdfs:subClassOf rdf:resource="#Aircraft"/>
    </owl:Class>
    (Fighter class: a Fighter is a type of Aircraft and inherits its properties)

Progression: HTML, XML & XMLS, RDF, OWL

DAML Program Elements
- Web Ontology Language (OWL) (2/10/04)
  – Enables knowledge representation and tractable inference in a web standard format
  – Based on Description Logics and RDF
  – Done by the W3C WebOnt Working Group
- OWL Reasoning Languages
  – SWRL: Supports business rules and linking between OWL ontologies (based on RuleML)
  – OWL/P Proof Language: Allows software components to exchange chains of reasoning
  – OWL/T Trust Language: Represents trust that OWL and SWRL inferences are valid
- Semantic Web Services (OWL/S)
  – Allows discovery, matching, and execution of web services based on action descriptions
  – Unifies semantic data models (OWL)
[Diagram: DAML Program Technical Flow (~$45M over 5 years, FY01 to FY05): Web Ontology Language (OWL); SWRL: Rules; OWL/P: Proof; OWL/S: Semantic Web Services; OWL/T: Trust. Legend: Completed standards process; Started standards process; Unfinished.]
Each DAML Program Element includes specifications, software tools, coordination teams, and use cases

The Semantic Web in 2009
[Figure: “The Famous Semantic Web Technology Stack”, with layers marked by maturity: Mature; Commercial Cutting Edge; Active Research and Standards Activity (labeled “Still Research” on the first build of the slide)]

Completing the Semantic Web Picture
Other Technologies Impact the Semantic Web
[Figure: the technology stack (Mature; Commercial Cutting Edge; Active Research and Standards Activity) surrounded by:]
- Scalable Reasoning Systems
- Combined RDF/OWL and RDBMS Systems
- More Ontologies
- Tag Systems
- Microformats
- Social Authorship
- Description LP, SWRL, SILK...
- A Huge Base of RDF data

Beyond RDF and OWL: 2009 Semantic Web Infrastructure

Server Infrastructure
- Markup Languages
  – HTML-friendly markup dialects: Microformats and RDFa
  – OWL 2 is a Candidate Recommendation
- Triplestores and SPARQL Servers
  – Stores for 1B triples now available, though with caveats around write performance
  – Commercial: AllegroGraph, Virtuoso, BigOWLIM, Oracle 11g Semantic Technologies...
  – Open Source: 4Store, Sesame, Redland...
  – Next step is parallel web delivery architectures
- Semantic Web Reasoners
  – Commercial: Oracle 11g RDFS/OWL engine, Ontobroker,
- Entity Name Service (Okkam, DBpedia)

User-layer Tools
- Vocabularies and Design Tools
  – Ontologies: Dublin Core, FOAF, SIOC...
  – Open Source: Protégé, SWOOP...
  – Commercial: TopBraid Composer, Schema
- Semweb Data Generation
  – RDF / RDBMS front-ends
  – Entity extractors and taggers (OpenCalais)
  – Zemanta-type author’s assistants
  – Semantic wikis
- Semweb Data Exploitation
  – Semweb search engines (Sindice, Watson, Falcon...)
  – Yahoo SearchMonkey / Google Rich Snippets
  – Browser extensions and facets
- Visualization Tools
  – Simile Project (http://simile.mit.edu/)

State of Semantic Web Work in the US
- DAML finished in 2005, with no follow-ons
  – NIH (Protégé, NCBO), NSF, some small DoD funding
  – PAL/CALO funded broader semantic/AI work
- But... leading-edge Venture Capital moved in
  – Vulcan, Crosslink, In-Q-Tel, Benchmark, Intel Capital...
- An emerging commercialization ecosystem
  – Startup/Small: Radar, Metaweb, Evri, AdaptiveBlue...
  – Midsized: Metatomix, Dow Jones, Reuters/OpenCalais, Franz...
  – Large: Yahoo!, Google, Oracle, IBM, HP, Microsoft...
  – Recession is taking its toll on everybody
- Emphasis is mostly on the Semantic dimension of the Semantic Web
  – That was where the money and customers were
  – RDBMS scale and orientation, powerful analytics for Business Intelligence
  – Centralized workflows for ontology definition and management
  – Use cases surrounding corporate data integration and document

Semantic Web Work in the EU
- Continuing Large Public-Sector Investments
  – Framework 6 (2002-6): More than €100M in several different programs
  – Framework 7 (2007-13): ~€1B/year for information and communications technologies
    • Semantics is more present as a general systems technology
    • Future Internet and Digital Libraries thrusts
- Two Dedicated Multi-site Semweb R&D Institutes
  – DERI: 100+ people and the world leader in research
  – Semantic Technology Institute International
  – A strong and growing cadre of graduate students
- Emphasis on the Social and Web Dimensions of the Semantic Web
  – Web-scale Linked Data, social networks, simple scalable imperfect inference
  – Ontology and data dynamics, imperfections, versioning
  – Semantically-boosted collaboration with limited knowledge engineer involvement
Clear R&D leadership but lags in Semweb commercialization

Evolving Conceptions for the Semantic Web
Initial Semantic Web Conception*
- Semantic markup would be tightly associated with individual web pages
  – Entity extraction and ontology-aware NLP
  – Document segmentation technologies
  – Manual annotation tools
- Need an all-encompassing ontology or set of logically compatible ontologies
  – Agreement was always elusive
- Need to achieve scale and consistency in semantic page markup
- Core problem is labeling free-text web pages with a (predefined) ontology vocabulary
  – “Translate the Web for machines”
  – Markup lives in associated RDF/XML docs
* By most people but not Tim Berners-Lee
[Diagram: the Semantic Web (RDF/OWL over HTTP) layered on the Hypertext Web (HTML/XML over HTTP), with links via URLs]

The Situation for Semantic Web and ML/DM in 2007
- Problem: Semantic Web will never see network effects until a sufficiently large amount of semantic markup/metadata is available
- Solution: ML/DM techniques can generate consistent, cost-effective semantic markup, either semi-automatically or fully automatically
  – Data/instance level: ML/DM derives semantic metadata for web resources (URIs)
    • Segmentation of heterogeneous pages (simple web pages to blogs/forums/Amazon...)
    • IE-based and ontology-driven annotation
      – Standard text-oriented techniques (categorization, clustering, partial parsing, relation extraction, topic extraction, fully-/semi-/un-supervised learning, etc.)
      – Schema induction from the web (see Google Squared)
      – Leverage links, tags, deep web query results & wrapper induction, MIME-types,
      – Stylized and ungrammatical language use including telegraphic speech
    • Content cleaning and normalization
    • Duplicate entity recognition and merging (currently >6000 “George Bush” mentions...)
  – Model level: ML/DM generates semantic annotation schema
    • Derive RDF/OWL vocabularies, taxonomies, lexical nets, ontologies from web data
    • Propose ontology alignments, restrictions, and mapping rules
Pretty Standard ML/DM Stuff: Find some data, clean it, annotate it, categorize/cluster it, and map it

Evolving Conceptions for the Semantic Web (Part 2)
The Semantic Web in 2009
- The Web is a publishing platform for formal knowledge as well as pages
  – RDFa allows mixed web text and RDF
  – RDF/OWL documents can be published by web-connected databases
  – Huge numbers of knowledge publishers
  – Linkage via owl:sameAs, owl:equivalentClass, owl:equivalentProperty, skos:exactMatch, etc.
- Core problem is maintaining a set of evolving and partial agreements on semantic models and labels
  – Semantic Web rule languages (SWRL, SILK, etc.) can be used to specify mappings and symbolic relations
  – Overall semantic cohesion is increased as more data is mapped together
[Diagram: Semantic Data Publishers (publication of RDF and SPARQL endpoints) feed the Semantic Web (RDF/OWL over HTTP), layered on the Hypertext Web (HTML/XML over HTTP); RDFa spans both layers; links via URLs]

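Linkage via owl:sameAs amounts to grouping identifiers from different publishers into clusters that name the same thing. A minimal sketch of that grouping, assuming a hand-written union-find and made-up `ex:` identifiers (none of this comes from the talk):

```python
def same_as_clusters(triples):
    """Group identifiers connected by owl:sameAs links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                parent[root_s] = root_o

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical links published by three independent sources.
LINKS = [
    ("ex:a/Berlin", "owl:sameAs", "ex:b/Berlin"),
    ("ex:b/Berlin", "owl:sameAs", "ex:c/city42"),
    ("ex:a/Paris", "owl:sameAs", "ex:b/Paris"),
    ("ex:a/Berlin", "rdfs:label", "Berlin"),  # ignored: not a sameAs link
]
print(same_as_clusters(LINKS))
```

Each new sameAs link can only merge clusters, which is one way to read the slide's point that cohesion increases as more data is mapped together. The sketch treats every link as trustworthy, which real web data does not guarantee.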
What does this mean for ML/DM on the Semantic Web?
The Semantic Web is developing into a web-scale social platform for data integration and fusion
- Semantic Web = “The World Wide Database”
  – Anyone can read, anyone can write, with very loose version control
  – “Schema-last” data engineering
    • The semweb is a database with a constantly-changing Entity/Relation diagram
    • High data flexibility at the cost of non-optimized queries
  – Most current Semantic Web data originates in RDBMSs via RDF and web servers
- ML/DM on the Semantic Web is really about extending, cleaning, and making useful the largest and most dynamic database in the world
- Opens the door to a more collaborative type of semantic annotation
  – 2007 idea: SW would be tied to the text web, and so annotation systems were key
    • OpenCalais is currently gaining a dominant position
    • Thomson Reuters is also getting a vast amount of data about entities
  – 2009 idea: SW is (mostly) an issue of translating relational DB semantics

Before we continue: 2 challenges about the Semantic Web
- The Semantic Web is not consistent
  – Semantic Web data is published by many people who have not read the specifications
    • The extant Semantic Web is a mix of formal and informal usages
    • Consistent usage is only found in islands
  – Semantic structures are common and often unmaintained
    • There is substantial non-RDF semantic data describing the same concepts (XML with reasonable offline semantic models, microformats, tag data, link data, etc.)
    • Sindice catalogs >36000 uses of the term “publication” (in RDF, tags, RDFa, microformats); how do you choose?
  – Semantic Web rules are currently rare, but use is growing
- RDF/OWL has very limited expressivity for quantities
  – N-ary relations are awkward to express
    • Reification is only a partial solution
    • W3C recommends class-based accounts of relations but this is clumsy
  – RDF/OWL core semantics are truth-functional, with no good support for units, probabilities, weights, vectors, uncertainty, etc.

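The awkwardness of n-ary relations shows up as soon as a value needs a unit. In the class-based pattern, the relation itself becomes a node, and each argument hangs off it as a separate binary triple. The sketch below illustrates that pattern under assumptions: `nary_to_triples`, the `ex:` vocabulary, and the measurement values are all invented for illustration.

```python
import itertools

_node_ids = itertools.count(1)


def nary_to_triples(relation_class, roles):
    """Encode one n-ary relation instance as binary triples around a fresh node."""
    node = f"_:rel{next(_node_ids)}"
    triples = [(node, "rdf:type", relation_class)]
    triples.extend((node, role, value) for role, value in roles.items())
    return node, triples


# "Joe's Mustang weighs 1500 kg" needs four arguments, so it costs five triples.
node, triples = nary_to_triples(
    "ex:Measurement",
    {
        "ex:of": "ex:JoesMustang",
        "ex:quantity": "ex:weight",
        "ex:value": "1500",
        "ex:unit": "ex:kilogram",
    },
)
for triple in triples:
    print(triple)
```

One simple fact turns into an extra node plus one triple per argument, and every query over it has to join through that node, which gives a sense of why the slide calls the approach clumsy.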
First Generation Semantic Web Applications: Semantically-Boosted Search and Classification
- A really old problem type
  – Semantics as the keystone technology for unstructured Information Retrieval
  – Requires powerful NLP and document interpretation systems
    • Often also requires powerful semantic representations (e.g., events or causality)
    • Can use semantic web KR but usually augments it
- Market Segments and Players
  – Enterprise Document Management (EDM) and search systems (Autonomy, Endeca...)
  – Email autoclassifiers and inbox managers (Xiant)
  – Web question answering: Hakia, Powerset, TrueKnowledge, Cycorp (inCyc)...
  – Semantics for Search Result Presentation: Yahoo! SearchMonkey
- Some lessons with applying semantic web technology in this space
  – Still waiting for a compelling match between technical capability and business need

Semantic Web in Search and Presentation
- As far as I know, no one is yet using Semantic Web data for query interpretation or relevance judgments on the web
  – Some issues are scale and trustworthiness (the lesson of the HTML <META> tag)
  – Powerset used to use DBpedia data in its knowledge base
- Yahoo! SearchMonkey (see also Google’s Rich Snippets) uses semantic data to enhance presentation
  – GreaseMonkey-style presentation reformatting for search snippets
  – Yahoo’s crawler indexes and interprets RDFa, microformats, delicious data, etc.
  – Display URL as an enhanced result, with standard or custom presentations
  – Incentives: “Structured data is the new SEO” (Dries Buytaert, Drupal)
[Screenshot: an example search result snippet for a Facebook profile page]

Second Generation Semantic Web Applications: Strategic Enterprise Information Technology
- An only slightly newer problem type
  – Business exploitation of structured enterprise data (RDBMS, Spreadsheet, ERP data)
    • Backwards to Data Management to reduce cost of managing, migrating, integrating
    • Forwards to Business Process Management
  – Support for unified query, analytics, and application access
    • Includes SOA integration, Enterprise Application Integration
- Market Segments and Players
  – Gartner estimates that EII software and services alone is $14B/year, with 40% growth over 5 years (pre-recession numbers, though)
  – Very complex market space includes EAI, Entity Analytics, MDM, BI, BPM, CPM...
  – Huge entrenched players (IBM, SAP, Oracle...) and major consulting shops
- Some lessons with applying semantic web technology in this space
  – Fundamental problem is understanding the semantics from legacy systems, not in KR

Layers of Semantic Data in Strategic Enterprise IT
[Diagram: layers ordered along an arrow labeled “Increasing Value of Semantic Data Models”:]
- Semantic data modeling
- MDM: Master Data Management
- Automatic data management
- EII: Information integration leadership
- BI: Better business intelligence
- Financial modeling & intelligence
- Semantic process modeling
- BAM and CPM w/ predictive analytics
Sales Pitch for Semantic Data Models
  – Promote flexibility and improvisation in the face of dynamism
  – Expose business processes as rules, for governance and compliance
  – Can be driven all the way through the architecture, from SOA to CPM dashboards

Semantic Web in Strategic Enterprise IT
- A few companies are using semantic web technology in Enterprise IT
  – Majority of revenue is usually from consulting engagements, not software products or services
  – Carefully-curated external ontologies with logic engines (e.g. F-Logic) are used for high-end data integration
  – RDF has been useful as a KR language
- General Data outsourcing
  – Do for general data what Bloomberg and Capital-IQ do for financial data
    • Collect data, clean it, add value, sell as a service
  – Semweb-driven variant of Wolfram Alpha
    • Ultra-calculator (Mathematica) combined with a massive hand-curated almanac
    • Similar to Google’s special computations, but much more powerful
    • Displays the solution results using templates from Mathematica’s visualization tools
  – Semantic web has huge amounts of data, but cleaning, maintaining, and linking it has not been cost effective
    • ML/DM techniques on semantic web data are in use at Evri and Freebase
  – [Unnamed Startups] use ML/DM and Semantic Web for

Third Generation Semantic Web Applications: Web 2.0 and the Socio-Semantic Web
- A new problem type
  – “Semantic Web should allow people to have a better online experience” (Alex Iskold, AdaptiveBlue)
  – Enhance the human activities of content creation, publishing, linking my data to other data, socializing, forming community, purchasing satisfying things, browsing, etc.
  – Improve the effectiveness of advertising
- Market Segments and Players
  – Mashup systems and consumer-oriented semantic web services (Drupal, Ning, ...)
  – Semantic enhancements to blogs and wikis (Zemanta, Faviki, Ontoprise, Radar, ...)
  – Semantics for Social Networking (MySpace RDF service and microformats, Facebook RDF models, etc.)
- Some lessons with applying semantic web technology in this space
  – If we don’t have semantic convergence, then semantics isn’t a differentiator

Web Knowledge Bases: Metaweb and Freebase
- Massive amounts of almanac-style RDF data (Creative Commons license) that is readily available from partners
- Social authoring tools and wiki-style consensus combined with controlled reconciliation by Metaweb personnel

Semantic Blogger Support: Zemanta
- Automatic link, image, keyword, tag suggestions for blogs and email
  – Average semi-professional blogger spends ~20 mins adding “decorative” content
- Key technology is fast NLP and categorization, plus personalization and mining of aggregated choice streams
- Accuracy is guaranteed because users explicitly add the suggestions (feedback)
  – Zemanta inserts RDFa and standard semantic markup in the background
  – Includes user specified friends/feeds/photos/etc as well as standard ones

Third Generation Example: Semantic Wikis
- Wikis are tools for Publication and Consensus
- MediaWiki (software for Wikipedia, Wikimedia, Wikibooks, etc.)
  – Most successful Wiki software
    • High performance: 10K pages/sec served, scalability demonstrated
    • LAMP web server architecture, GPL license
  – Publication: simple distributed authoring model
    • Wikipedia: >2.9M English articles, 400K Russian, >2.5M images, #8 Alexa traffic rank
  – Consensus achieved by global editing and rollback
    • Fixpoint hypothesis, although consensus is not static
    • Gardener/admin role for contentious cases
- Semantic Wikis apply the wiki idea to structured (typically RDFS) information
  – Authoring includes instances, data types, vocabularies, classes
  – Natural language text used for explanations
  – Automatic list generation from structured data, basic analytics, database imports
  – Reuse of wiki knowledge
  – See e.g., http://smwforum.ontoprise.com for one powerful semantic wiki
Semantic Wiki Hypotheses:
  (1) Significant interesting Semantic Data can be collected cheaply
  (2) Wiki mechanisms can be used to maintain consensus on vocabularies and cla

An Example of (Manually Annotated) Semantic Wiki

Semantic MediaWiki and Rules
- SMW Simple Rules
  – Implement basic rules
  – Automates some authoring and data maintenance
  – Templates to help users create rules
  – Publishable in SWRL
- SMW Importers/Exporters
  – Web Services
  – Vocabulary and data files
  – MS Office and Excel Docs
  – Semantic Web representations

More complex: National Cancer Institute (NCI) Thesaurus

Using Semantic Web data from Wikis: Project Halo
- Project Halo: SME-based authoring for scientific question-answering systems
- Project Halo Goal: To determine whether tools can be built to facilitate robust knowledge formulation, query and evaluation by domain experts, with ever-decreasing reliance on knowledge engineers
  – Can SMEs build robust question-answering systems that demonstrate excellent coverage of a given syllabus, the ability to answer novel questions, and produce readable domain appropriate justifications using reasonable computational resources?
  – Will SMEs be capable of posing questions and complex problems to these systems?
  – Do these systems address key failure, scalability and cost issues encountered in expert systems?
- Experimental Scope: Selected portions of the AP syllabi for chemistry, biology and physics
  – Example: Balance the following reactions, and indicate whether they are examples of combustion, decomposition, or combination
    (a) C4H10 + O2 → CO2 + H2O
    (b) KClO3 → KCl + O2
    (c) CH3CH2OH + O2 → CO2 + H2O
    (d) P4 + O2 → P2O5
    (e) N O + H O HNO

AURA: Automated User-centered Reasoning and Acquisition System
- Aura is a tool to help users formalize high school-level scientific knowledge
- Aura can then reason with that knowledge
- Users can ask English questions and get answers and explanations

Symbiosis Between Aura and Semantic Wikis
- Classical Knowledge Engineering
  – Expressive knowledge representation
  – Sophisticated testing and debugging
- Knowledge Engineering in Aura
  – Acquires knowledge for deductive Q/A that can be used for answering AP questions in sciences
    • Uses a DL-style class taxonomy, and logic programming style rules
  – Requires substantial hours of training for knowledge formulation
- Semantic Web Knowledge Engineering
  – Simple knowledge representation
  – Quantity at some expense of quality
- Knowledge Engineering in SMW+
  – Tool for online authoring and consensus-building around semantic web content
  – Captures knowledge at the level of RDFS with many extensions
  – Collective editing for quality control
  – Gardening appropriate for scientific knowledge
  – Almost walk up and use system
- Can we use the Semantic MediaWiki to capture knowledge that could be used for Q/A in AURA?
  – Factual knowledge (e.g., atomic number for carbon is 6, solubility constraints, etc.)
  – Taxonomic knowledge (e.g., eukaryotic and prokaryotic are two types of cells)
- Knowledge creation would be faster, distributed, and cheaper

Example: Wikipedia Article on Organelle

Source Text of Article on Organelle in Semantic MediaWiki

Fact Box With the Annotations in Semantic MediaWiki

Ontology Browser for Biology Data in Semantic MediaWiki

Aura/Semantic MediaWiki Use Case
- Semantic MediaWiki includes relevant knowledge
- Aura knowledge formulation engineer (KFE) searches for knowledge during knowledge formulation
- The KFE notices useful information in SMW+
- The KFE maps the knowledge into Aura
  – Uses a derivative of Ontomap
  – Mapping suggestions from FOAM (Framework for Ontology Alignment and Mapping) from Univ. Karlsruhe AIFB
  – ETL workflow
- The knowledge is mapped into Aura and is

AURA User Searches for Information

Aura User Notices Useful Information in Wiki

Aura User Maps Wiki Knowledge into Aura KB (with FOAM proposals)

Wiki Knowledge Available in Aura for Question Answering

Fourth Generation Semantic Web: The Web of Data meets the Future Internet
- A problem of scale
  – The number of Internet devices is starting to explode (again!)
    • Mobile devices, embedded systems, and sensors
    • In 2008, Google reported 1 trillion unique URLs, ~200M web sites
    • Total 2008 web page estimates are ~30 billion (significant variation in these estimates)
  – Gartner (May 2007, Report G00148725)
    • “By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies”
  – Can Semantic Web technologies work at web scales?
    • Sindice (www.sindice.com) is now indexing >10B triples/microformats over 65M pages
    • 20% of 30 billion pages @ 1000 triples per page = 6 trillion triples
    • 30 billion and 1000 are underestimates
- Some lessons with applying semantic web technology in this space
  – Can we exploit billions of triples, microformats, ontologies, rules, and services?
Material from Frank van Harmelen, Vrije Universiteit, Amsterdam

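The 6 trillion figure follows directly from the three numbers on the slide. The short calculation below reproduces it and, as an extra step not on the slide, compares it with the >10B triples attributed to Sindice; the variable names are illustrative only.

```python
PAGES_2008 = 30_000_000_000        # ~30 billion web pages (slide's estimate)
ONTOLOGY_SHARE_PERCENT = 20        # Gartner: 20% use Semantic Web ontologies
TRIPLES_PER_PAGE = 1_000           # slide's per-page assumption
SINDICE_TRIPLES = 10_000_000_000   # >10B triples indexed, per the slide

semantic_pages = PAGES_2008 * ONTOLOGY_SHARE_PERCENT // 100
total_triples = semantic_pages * TRIPLES_PER_PAGE
scale_gap = total_triples // SINDICE_TRIPLES

print(f"{semantic_pages:,} pages with ontology-based markup")
print(f"{total_triples:,} triples")
print(f"{scale_gap}x the Sindice index size quoted on the slide")
```

Under these assumptions the projected total is 6,000,000,000,000 triples, roughly 600 times the index size quoted for Sindice, and the slide itself notes that both inputs are underestimates.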
Fourth Generation Example: DBpedia
- Mine Wikipedia for assertions
  – Mainly from Wikipedia Factboxes
    • ~23M triples
  – Category assertions
- DBpedia 3.3 dataset (May 09 Wikipedia)
  – ~2.6M things, ~274M triples
    • 213K persons, 328K places, 57K music albums, 36K films, 609K links to images, 3.2M links to relevant external web pages, 4.9M links into RDF datasets
  – Classifications via Wikipedia categories, YAGO, and WordNet synsets
  – One of the largest broad knowledge bases in the world
- Simple queries over extracted data
  – “Things near the Eiffel Tower”
  – “The official websites of companies with more than 50000 employees”

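A query such as “the official websites of companies with more than 50000 employees” is a join over three triple patterns: the type, the employee count, and the homepage. The sketch below runs that join over a tiny in-memory list. The company names, the `ex:` property names, and the figures are invented for illustration and are not DBpedia's actual vocabulary or data.

```python
# Hypothetical facts in the shape of factbox-style extracted triples.
FACTS = [
    ("ex:AcmeCorp", "rdf:type", "ex:Company"),
    ("ex:AcmeCorp", "ex:numberOfEmployees", 120000),
    ("ex:AcmeCorp", "ex:homepage", "http://acme.example/"),
    ("ex:TinyCo", "rdf:type", "ex:Company"),
    ("ex:TinyCo", "ex:numberOfEmployees", 12),
    ("ex:TinyCo", "ex:homepage", "http://tiny.example/"),
]


def objects(facts, subject, prop):
    """All values of one property for one subject."""
    return [o for s, p, o in facts if s == subject and p == prop]


def homepages_of_large_companies(facts, min_employees):
    """Join type, employee-count, and homepage patterns on the shared subject."""
    companies = sorted(
        s for s, p, o in facts if p == "rdf:type" and o == "ex:Company"
    )
    results = []
    for company in companies:
        counts = objects(facts, company, "ex:numberOfEmployees")
        if any(count > min_employees for count in counts):
            results.extend(objects(facts, company, "ex:homepage"))
    return results


print(homepages_of_large_companies(FACTS, 50000))
```

Against a real endpoint the same join would normally be written as a SPARQL query with three triple patterns and a numeric filter; the Python version only shows the shape of the computation.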
DBpedia for Users
[Screenshots: DBpedia Mobile; “Query Wikipedia like a database”]

Fourth Generation Example: Linked Open Data
- Build an RDF Data Commons
  – Create a single, simple set of rules for publishing and linking RDF data
    • Use URIs as names for things
    • Use HTTP URIs so that people can look up those names
    • When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
    • Include links to other URIs, so that they can discover more things
  – Set RDF links between data items from different data sources
- May 2009 LOD dataset
  – ~4.7B triples, and ~140M RDF interlinks, and growing faster than I can track
  – Database linkage means that LOD will soon be

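The four publishing rules imply a simple discovery loop: look up a URI, read the triples returned, and queue any URIs they mention. The sketch below imitates that loop without touching the network. `TOY_WEB` and `lookup` stand in for real HTTP lookups, and the `example` hosts and their data are assumptions made for illustration.

```python
from collections import deque

# Stand-in for three independent data sources; each URI "serves" its own triples.
TOY_WEB = {
    "http://a.example/alice": [
        ("http://a.example/alice", "foaf:knows", "http://b.example/bob"),
    ],
    "http://b.example/bob": [
        ("http://b.example/bob", "foaf:based_near", "http://c.example/berlin"),
    ],
    "http://c.example/berlin": [
        ("http://c.example/berlin", "rdfs:label", "Berlin"),
    ],
}


def lookup(uri):
    """Placeholder for an HTTP GET that returns RDF describing the URI."""
    return TOY_WEB.get(uri, [])


def follow_links(start, max_lookups=10):
    """Breadth-first discovery: collect triples and queue every new URI seen."""
    seen, queue, collected = {start}, deque([start]), []
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for s, p, o in lookup(uri):
            collected.append((s, p, o))
            if isinstance(o, str) and o.startswith("http://") and o not in seen:
                seen.add(o)
                queue.append(o)
    return collected


for triple in follow_links("http://a.example/alice"):
    print(triple)
```

Starting from one URI, the loop reaches data held by two other sources purely through the links in the returned triples. The `max_lookups` cap is a design choice for the sketch, since on the real web the frontier can grow without bound.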
The Growing Web of Linked Data
[Figures: snapshots of the Linked Data web for May 2007, September 2007, September 2008, and March 2009]

Topic Distribution in the March 2009 Linked Datasets
[Figure: datasets grouped by topic: Music, Online Activities, Publications, Geography, Cross-Domain, Life Sciences. Figure labels: 4,500,000 triples; 180,000 data links.]

Common Tag Specification
- Instead of tagging with language terms, tag with terms + RDFa
  – Distinguish between “jaguar” the animal, the car company, and the operating system
  – Provides metadata for each Common Tag and relations to other Common Tags
    • The Barack Obama Common Tag includes <employment, President of the United States> and <spouse, Michelle Obama>

Semantic Dynamism at Web Scale
- Semantics are always changing
  – Per minute, there are:
    • 100 edits in Wikipedia (144K/day)
    • 200 tags in del.icio.us (288K/day)
    • 270 image uploads to flickr (388K/day)
    • 1100 blog entries (1.6M/day)
  – Will the Semantic Web be less dynamic?
- There is no “right ontology”
  – Ontologies are abstractions
    • Different applications lead to different ontologies
    • Ontology authors make design choices all the time
  – Google Base: >250K schemas
  – “Ontologies = Politics”
- Intentionally false material (Spam)
  – Lesson of the HTML <META> tag
How Do We Use this Dynamic Data for Decision Support?
Material from Denny Vrandečić, AIFB

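The per-day figures on this slide are the per-minute rates multiplied by the 1,440 minutes in a day and then rounded. A quick check, with dictionary keys chosen here for readability:

```python
MINUTES_PER_DAY = 24 * 60  # 1,440

PER_MINUTE = {
    "Wikipedia edits": 100,
    "del.icio.us tags": 200,
    "flickr image uploads": 270,
    "blog entries": 1100,
}

per_day = {name: rate * MINUTES_PER_DAY for name, rate in PER_MINUTE.items()}

for name, total in per_day.items():
    print(f"{name}: {total:,} per day")
```

The exact products are 144,000, 288,000, 388,800, and 1,584,000, which match the slide's rounded 144K, 288K, 388K, and 1.6M.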
Fourth Generation Application: The Large Knowledge Collider
■ EC Framework 7 Program
  – Lead partners: Univ. Innsbruck and Vrije University Amsterdam, plus 12 partners
■ Goals of LarKC – Scaling to Infinity
  – A platform for massive distributed incomplete reasoning
  – Remove the scalability barriers of currently existing reasoning systems for the Semantic Web
  – Combine reasoning/retrieval and search
■ Reasoning pipeline
  – Want to trade off answer quality and answer timeliness
  – Heavy emphasis on probability, decision theory, anytime algorithms
  – Plugin architecture, with sampling
  – Explicit cost models
■ Public releases of LarKC platform, with APIs

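The trade-off between answer quality and answer timeliness can be illustrated with a toy anytime reasoner. This is not the LarKC API or architecture; it is a hypothetical sketch in which the "reasoning" is a transitive closure, the cost model is a plain step budget, and sampling means reasoning over only a fraction of the input facts. All names and parameters are assumptions.

```python
import random


def anytime_closure(edges, budget, sample_fraction=1.0, seed=0):
    """Derive transitive facts until a fixpoint or until the budget runs out.

    edges: iterable of (a, b) facts; budget: max number of derivation attempts.
    Returns (facts, complete); complete is True only if a fixpoint was reached.
    """
    rng = random.Random(seed)
    edges = sorted(set(edges))
    # Sampling step: reason over only part of the input to cut cost.
    k = min(len(edges), max(1, int(len(edges) * sample_fraction)))
    facts = set(rng.sample(edges, k))
    steps = 0
    changed = True
    while changed:
        changed = False
        for a, b in sorted(facts):
            for c, d in sorted(facts):
                if steps >= budget:
                    return facts, False  # out of budget: return a partial answer
                steps += 1
                if b == c and (a, d) not in facts:
                    facts.add((a, d))
                    changed = True
    return facts, True


chain = [("a", "b"), ("b", "c"), ("c", "d")]
partial, done_early = anytime_closure(chain, budget=0)
full, done = anytime_closure(chain, budget=1000)
print(len(partial), done_early)
print(len(full), done)
```

A partial answer is always a subset of the full one, so a caller can stop whenever the answer is good enough for the decision at hand.
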
Fourth Generation Application: The Large Knowledge Collider
■ Encourage participation through Thinking@home
■ Exploiting web-scale semantics is the new frontier
  – Generations 1 and 2 used web resources to support classical KR approaches
  – Generation 3 (social semantic web) leverages web social patterns for KR
■ Fourth generation applications address general web-scale KR

Fourth Generation Application: The Large Knowledge Collider
■ The real money in semantics will be made in apps/tools that exploit web-scale data
  – The cost of semantic data creation is going to zero
  – The size of semantic data is going to web-scale
  – At this scale, logic-based techniques break down
■ ML/DM are key technologies to allow this exploitation to happen

Summing up: The Growing Semantic Web
■ In mid-2004...
  – RDF and OWL had just been standardized
  – Advances were made via traditional corporate/public R&D programs
  – The first wave of semantic web startups (many of which have since failed)
  – Implementations were technically very sophisticated, but fully custom and had no web involvement
  – A few early conferences (ISWC, SemTech) and session tracks

Summing up: The Growing Semantic Web
■ Now in 2009...
  – The Semantic Web is the most exciting thing happening on the web
  – RDF assertions scaling into the billions, with little to no programmatic control
  – Search engines are starting to develop products
  – Best Buy is publishing products, store descriptions, and hours in RDFa

Summing up: The Growing Semantic Web
■ ML/DM is at the center of the needed techniques

Hvala (Thank You)
Disclaimer: The preceding slides represent the views of the author only. All brands, logos and products are trademarks or registered trademarks of their respective companies.

Google Squared