  • Number of slides: 19

OpenLink Virtuoso - Faceted Views over Large-Scale Linked Data
Orri Erling, Program Manager - Virtuoso Development Team, OpenLink Software
© 2007 OpenLink Software, All rights reserved

Dimensions of Web Usage
• Web 1.0: Publishing for All (Citizen Publisher) via Web Sites
• Web 2.0: Commentary for All (Citizen Journalist) via Blogging and Social Networks, with User Generated Content across Data Silos
• Web 3.0: Analysis for All (Citizen Analyst) via Linked Data, enabling Data Mobility and Meshing. Your Data Is Your Statement; Applications Float on a Cloud of Data across a federation of HTTP-accessible Data Spaces
• Meanwhile, in the DBMS world, ad hoc data access and manipulation has consistently won over hard-coded alternatives, e.g. SQL over CODASYL. Today we see Linked Data as delivering the "ad hoc" factor in "best of both worlds" fashion, relative to alternatives (including RDBMS), across the Web and/or within Intranets & Extranets.

The Challenges
• Scale of Instance Data: 10^9 - 10^11 Triples
• Scale of Ontology: 100,000s of Classes
• Faceted Browsing, Text and Structure
• Deployment and Provisioning

It Is Not Only About The Warehouse
• Up until now, you design the warehouse for the application, load the data, and make a data island
• With Linked Data, the warehouse is self-filling, based on published data using terms from commonly shared vocabularies
• Virtuoso facilitates the above through integrated RDFization middleware; you populate the warehouse as you query, and the system simply gets smarter in line with your natural work patterns.

It Is Not Only About Publishing Your Data
• Having secrets does not mean using a secret language
• Private environments still benefit from common vocabularies and terms
• People and organizations publish anyway: now it is about publishing for use in applications and integration, whether internet, extranet, or intranet
• Linked Data and Virtuoso deliver on the Data Spaces concept: express any statement for which there is a vocabulary, and the data exposed by the statement can be found, joined, and processed (e.g. Meshups). Basically, the network is the database.

Solutions
• Virtuoso 6, Single Server and Cluster Editions
• SPARQL and SQL With The Right Extensions for serious BI-style analytics
• Integrated Web Services Platform, Suite of RDF-izers (Extractor & LOD Cloud Lookup variants)
• Server Hosted Facet Browsing Service (via REST API), Entity Ranking, Other Building Blocks for Web 2.0 Style Development

The lod.openlinksw.com Demo
• 4.2 GTriples on 2 Commodity Servers
• Full Text and Structured Querying
• SPARQL End Point
• Faceted Browsing Interface for Quick Discovery and Simple Report Composition
• Usage Statistics across Source & Reference Graph IRIs, plus IFP and owl:sameAs usage stats
• VoID Graph Providing Rich Description of hosted Data Sets
If OpenLink does not host it with enough capacity or the right data, you can procure your own infrastructure and get the software from us. From now on, anybody who chooses can be a search and analytics player.
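As a rough illustration of how a client might address a SPARQL end point such as the one in the demo, the sketch below builds a SPARQL Protocol GET URL. The `/sparql` path and the helper name `build_sparql_url` are assumptions made for this example; the slides name only the host. No network request is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the "/sparql" path is illustrative; the slides only name the host.
ENDPOINT = "http://lod.openlinksw.com/sparql"


def build_sparql_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL that carries the query in the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# A trivial query that asks for any ten subjects.
query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
url = build_sparql_url(query)
```

The resulting URL can be handed to any HTTP client; result format negotiation and error handling are left out of the sketch.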

Technology
• SPARQL Augmented With Run Time Inferencing
• Entity Ranks for Better Search
• Anytime Query Answering for Quick Approximate Results
• A User Interface Combining Discovery and Query Building
• Easy Web Services APIs and SPARQL for Developing Applications


Run Time Taxonomies
• No Materialization, Select Taxonomy At Query Time
• Query Optimization Knows About Class and Property Hierarchies
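The idea of choosing a taxonomy at query time, without materializing inferred triples, can be sketched in a few lines. This is a generic illustration and not Virtuoso's implementation; the function names and the tiny data set are hypothetical.

```python
from collections import defaultdict


def subclasses_of(root, subclass_edges):
    """Return root plus every class below it, given (child, parent) edges."""
    children = defaultdict(set)
    for child, parent in subclass_edges:
        children[parent].add(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def instances_of(cls, type_triples, subclass_edges):
    """Find instances of cls or its subclasses; nothing is written back to the data."""
    wanted = subclasses_of(cls, subclass_edges)
    return sorted(s for s, c in type_triples if c in wanted)


# Hypothetical data: (instance, class) pairs and two alternative taxonomies.
types = [("rex", "Dog"), ("tweety", "Bird")]
taxonomy_a = [("Dog", "Mammal"), ("Mammal", "Animal")]
taxonomy_b = taxonomy_a + [("Bird", "Animal")]

with_a = instances_of("Animal", types, taxonomy_a)
with_b = instances_of("Animal", types, taxonomy_b)
```

The same stored type data gives different answers depending on which taxonomy the query selects, which is the point of deferring inference to run time.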

Run Time Identity
• Optionally Follow owl:sameAs links
• Optionally consider any two entities sharing an IFP (inverse functional property) value to be the same
• No materialization: control sameAs and IFP following at query time, at the triple pattern level
• For Ad Hoc, Do Identity at Run Time
• For Deep Analytics and Batch Processing, Normalize Identities at Load Time
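For the load-time option, one common way to normalize identities is a union-find pass over the owl:sameAs pairs, followed by rewriting every triple to use a canonical identifier. The sketch below is a generic illustration under that assumption, not a description of Virtuoso's loader; all names and sample IRIs are hypothetical.

```python
def canonical_ids(same_as_pairs):
    """Map each identifier to the smallest identifier in its sameAs cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra  # the smaller root stays as the representative
    return {x: find(x) for x in list(parent)}


def normalize(triples, ids):
    """Rewrite subjects and objects to their canonical identifiers."""
    return [(ids.get(s, s), p, ids.get(o, o)) for s, p, o in triples]


# Hypothetical sameAs links and data.
pairs = [("ex:b", "ex:a"), ("ex:c", "ex:b"), ("ex:x", "ex:y")]
ids = canonical_ids(pairs)
clean = normalize([("ex:c", "ex:knows", "ex:y")], ids)
```

After this pass, queries no longer need to follow identity links, which suits batch analytics; ad hoc use can leave the data untouched and resolve identity per query instead.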

Entity Ranking
• References and the Rank of the Referrer Contribute to Rank, as In Web Search
• Can Customize Weight By Graph, Predicate
• Can Run Ranks on Selected Subsets
• Ranks Are Calculated in a Batch Run
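A PageRank-style iteration with per-predicate weights gives a feel for this kind of ranking. The slides do not specify the actual formula, so the sketch below is a swapped-in textbook technique; the damping factor, iteration count, function name, and sample triples are all assumptions, and objects are assumed to be entities rather than literals.

```python
def entity_ranks(triples, predicate_weight=None, damping=0.85, iterations=20):
    """Weighted PageRank-style ranks over (subject, predicate, object) triples."""
    predicate_weight = predicate_weight or {}
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    if not nodes:
        return {}
    out_weight = {n: 0.0 for n in nodes}
    edges = []
    for s, p, o in triples:
        w = predicate_weight.get(p, 1.0)  # unlisted predicates count fully
        if w > 0:
            edges.append((s, o, w))
            out_weight[s] += w
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):  # one batch run, fixed number of passes
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[x] for x in nodes if out_weight[x] == 0.0)
        nxt = {x: (1 - damping) / n + damping * dangling / n for x in nodes}
        for s, o, w in edges:
            nxt[o] += damping * rank[s] * w / out_weight[s]
        rank = nxt
    return rank


# Hypothetical data: two entities cite c, and c cites d.
sample = [("a", "cites", "c"), ("b", "cites", "c"), ("c", "cites", "d")]
ranks = entity_ranks(sample)
```

Passing a `predicate_weight` mapping (for example a weight of 0 for a predicate that should not confer rank) corresponds to the per-predicate customization mentioned above; restricting the input triples corresponds to ranking a selected subset.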

Entity Name Service
• Autocompletion of URIs
• Autocompletion of Label-Like Properties
• Ranked List of Synonyms
• Statistics on Where a URI is Defined and Where it is Referenced

Virtuoso Anytime Query Feature
• Partial Results in Fixed Time
• Useful for Interactive Browsing, Query Development over large data sets (e.g. LOD Cloud)
• On public SPARQL end points, Protects Against DoS, still giving samples of the answers
• Metering of query resource utilization
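The "partial results in fixed time" behaviour can be illustrated with a time-budgeted loop that returns whatever it has collected when the deadline passes, together with a flag saying whether the answer is complete. This is a minimal conceptual sketch, not how the Virtuoso engine is built; the function name and the injectable `clock` parameter are assumptions for the example.

```python
import time


def anytime_results(rows, budget_seconds, clock=time.monotonic):
    """Collect rows until the time budget is spent.

    Returns (results, complete); complete is False when the deadline
    cut the evaluation short and results holds only a sample.
    """
    deadline = clock() + budget_seconds
    results = []
    for row in rows:
        if clock() >= deadline:
            return results, False
        results.append(row)
    return results, True


# Usage: take whatever can be produced from a lazy row source in 50 ms.
sample_rows, complete = anytime_results(iter(range(1000)), 0.05)
```

Because the caller always gets an answer within the budget, a public end point stays responsive under expensive queries while still returning useful samples.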

The LOD Cloud Faceted Search, Find, and Lookup Services
• Access via Web Services, SPARQL
• Developed in Virtuoso using SQL, SPARQL, and Stored Procedures
• Part of Virtuoso 6.x Open Source Edition (Single Server Edition only)

Experience
• If Data In Memory, Interactive Time and Linear Scale
• RDF Aware Query Optimizer is Key
• Parallel Execution Engine: 1 Thread/Query/Partition
• For Generic Linked Data, RDF Representation With 4 Indices, plus Full Text Indexing on Literal Objects
• For Specialized Tasks, SQL + Stored Procedures With Parallel Programming Model (of course output will be Linked Data)
• Unlimited Cross Partition Joining With Near Full Platform Utilization; not a problem with the right message flow

Some Performance Data
• Current Live Instance Setup: 2 Linux boxes with 2 x 4-core Xeons, each with 32 GB RAM, for a data set in excess of 4.2 Billion Triples
• 3.2 Million Single Triple Lookups Per Second
• Load Rates over 100K Triples/sec
• Entity Ranks for 4.2 GTriples in 30 min/Iteration
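A quick back-of-the-envelope calculation puts these figures in proportion. It uses only the numbers quoted on the slide; treating the quoted load rate as a floor, and dividing lookups evenly over 2 boxes x 2 Xeons x 4 cores = 16 cores, are simplifying assumptions of this sketch.

```python
TRIPLES = 4_200_000_000        # data set size from the slide
LOAD_RATE = 100_000            # triples per second ("over 100K", so a floor)
LOOKUPS_PER_SEC = 3_200_000    # single triple lookups per second
CORES = 2 * 2 * 4              # assumption: lookups spread evenly over all cores
RANK_ITERATION_SEC = 30 * 60   # one entity-rank iteration

# Loading at the quoted floor rate gives an upper bound on load time.
load_seconds = TRIPLES / LOAD_RATE
load_hours = load_seconds / 3600

# Per-core lookup rate under the even-spread assumption.
lookups_per_core = LOOKUPS_PER_SEC / CORES

# Triples covered per second during one ranking iteration.
rank_triples_per_sec = TRIPLES / RANK_ITERATION_SEC

print(f"load time: at most {load_hours:.1f} hours")
print(f"lookups per core: {lookups_per_core:,.0f} per second")
print(f"ranking pass: {rank_triples_per_sec:,.0f} triples per second")
```

At the stated floor rate the full data set loads in under 12 hours, and the per-core lookup figure works out to 200,000 per second.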

Deployment: Rent or Buy?
To Handle 10 GTriples 100% in RAM or 50 GTriples With a Decent Working Set:
• For Intermittent Use, 1 TB of RAM and 256 Virtual Cores at EC2 is $1228 Per Day
• For Purchase, a Cluster With 1 TB RAM and 120 Nehalem Cores Lists Around $75K
* April 2009 US retail prices
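The rent-or-buy comparison reduces to a break-even calculation on the two quoted prices. The sketch below ignores power, hosting, administration, and depreciation, none of which the slide quantifies, so it shows only the raw list-price crossover.

```python
import math

RENT_PER_DAY = 1228      # USD per day, EC2 figure from the slide
PURCHASE_PRICE = 75_000  # USD, approximate cluster list price from the slide

# Days of rental whose cumulative cost equals the purchase price.
break_even_days = PURCHASE_PRICE / RENT_PER_DAY

# First whole day on which cumulative rent exceeds the purchase price.
first_day_rent_exceeds = math.ceil(break_even_days)

print(f"break-even after about {break_even_days:.1f} rental days")
```

On list prices alone, renting stays cheaper for roughly two months of cumulative use, which matches the slide's framing of renting for intermittent use.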

Conclusions
• Applications exploiting open data access across heterogeneous data sources at Web Scale are now within anyone's reach
• Usable for public Web Sites or for in-house Business Analytics
• Web-enabled Open Data Access & Analysis for All!