
ace8c88d990b49797c7da1221bcb3c13.ppt
- Number of slides: 24
Massive Scalability for RDF Storage and Analysis
Presented by
David Wood, CTO
Tom Adams, Sales Engineer
Andrew Newman, Software Engineer
Tucana Technologies, Inc.
Reston, Virginia USA
May 2004
© Copyright Tucana Technologies, Inc. 2003-2004. All rights reserved. T 004 v 03

Agenda
• The Tucana Knowledge Server and Kowari
• Where we fit
• Performance metrics & scaling
• Real-world deployment examples
• Where are we headed?

Tucana and Kowari
• The Tucana Knowledge Server is a secure, distributed, scalable, transaction-safe, native RDF database.
  – Stores, manages and analyzes RDF data
  – iTQL/RDQL query language support
  – Single instance scales to 1 billion triples
  – Federated query capability available
  – JRDF & Jena API support
  – Pluggable data models (full text, RDBMSs, etc.)
  – Commercial (academic licenses available)
  – 100% Java 1.4.2
• http://www.tucanatech.com/

Tucana and Kowari
• Kowari is the Open Source basis of the Tucana Knowledge Server
  – MPL v1.1
  – No security, limited APIs/documentation, no pluggable data models
  – Limited data types (string, URI, datetime, number)
  – Limited scaling (>10M triples on 32-bit, >50M on 64-bit)
  – No graph-based analysis algorithm support (graph segment matching)
• http://www.kowari.org/
Colophon: Kowari is a small Australian marsupial and Tucana is a constellation in the Southern sky.

Tucana Knowledge Server (TKS) in Enterprise Architecture

Tucana Knowledge Server (TKS) Data Flow & Federation

Tucana System Interfaces
Data Sources
• RDF native
• Structured data sources (e.g. RDBMS) via importation
• Metadata from unstructured data sources via entity extractors
• XML or other tagged formats via XSLT
• Rich Site Summary (RSS) feeds
Access
• Web services (SOAP, WSDL)
• COM (ASP, etc.)
• JavaBean
• Java APIs (JRDF & Jena)
• JSP tag library
• XSLT Descriptors
• Query language
• Command line
• Web UI
• RDF/OWL editors/viewers via evolving industry APIs

Tucana Supported Platforms
• Runs 64- or 32-bit (requires Java 1.4.2)
• GNU/Linux on Intel or Opteron
• Sun Solaris on SPARC or Intel (Opteron coming Dec '04)
• Windows on Intel
  – NT 4
  – 2000
  – XP
• Note: AIX, HP/UX, Mac OS X operational
  – Future support on roadmap based upon customer demand

Performance Metrics
• Read/Write comparisons to RDBMSs when storing RDF
• Load performance
• Query execution performance
• Go triple crazy!

Read/Write Comparison

Load Performance

Query Execution Performance

Go triple crazy!
• 32-bit: about 100 million statements (using explicit I/O, which is now the default on 32-bit platforms)
• 64-bit: about a billion statements (using mapped I/O)

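The difference between explicit and mapped I/O can be illustrated with a minimal sketch. It is written in Python for brevity and is not Tucana or Kowari code: the record layout (three 64-bit node IDs per statement) and all function names are assumptions made for illustration. Explicit I/O seeks to a record and copies it with read(); mapped I/O maps the file into the process's address space and decodes the record in place. Because a mapping consumes virtual address space, a 32-bit process can map far less data than a 64-bit one.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: subject, predicate, object as 64-bit IDs.
TRIPLE = struct.Struct("<QQQ")


def write_triples(path, triples):
    """Write triples as fixed-width binary records."""
    with open(path, "wb") as f:
        for t in triples:
            f.write(TRIPLE.pack(*t))


def read_explicit(path, index):
    """Explicit I/O: seek to the record, then copy it out with read()."""
    with open(path, "rb") as f:
        f.seek(index * TRIPLE.size)
        return TRIPLE.unpack(f.read(TRIPLE.size))


def read_mapped(path, index):
    """Mapped I/O: map the file into memory and decode the record in place."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return TRIPLE.unpack_from(m, index * TRIPLE.size)


def demo():
    """Write three records, then read one back with each strategy."""
    triples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_triples(path, triples)
        return read_explicit(path, 1), read_mapped(path, 2)
    finally:
        os.remove(path)
```

Both functions return the same data for the same index; only the access path differs.
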
Why do we scale?
• Designed from the ground up to be scalable
• Optimized for reads/very fast writing
• Dealing with low level aspects of file system
• Have lots of room for further speedups
  – Drop indices, increase triple block size, flatten tree
• Bottlenecks
  – Virtual memory limits of OS
  – Thread stacks
  – Sharing same area of VM

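The "drop indices" speedup involves a trade-off that a toy example can make concrete. The sketch below is an assumption-laden illustration in Python, not the store's actual on-disk structure: the class name, the three orderings (spo, pos, osp) and the counters are all hypothetical. Each index adds one write per statement; dropping an index reduces write cost, but patterns that the index would have served fall back to a scan.

```python
from collections import defaultdict


class TripleIndexes:
    """Toy in-memory store; each ordering is one index over the same triples."""

    # Position of (subject, predicate, object) that leads each ordering.
    ORDERINGS = {"spo": 0, "pos": 1, "osp": 2}

    def __init__(self, orderings=("spo", "pos", "osp")):
        self.indexes = {name: defaultdict(set) for name in orderings}
        self.writes = 0          # index entries written so far
        self.last_lookup = None  # name of the index used, or "scan"

    def add(self, s, p, o):
        triple = (s, p, o)
        for name, index in self.indexes.items():
            index[triple[self.ORDERINGS[name]]].add(triple)
            self.writes += 1

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        pattern = (s, p, o)
        for name, index in self.indexes.items():
            key = pattern[self.ORDERINGS[name]]
            if key is not None:
                candidates = index.get(key, set())
                self.last_lookup = name
                break
        else:
            # No index leads with a bound term: scan every stored triple.
            first = next(iter(self.indexes.values()))
            candidates = set().union(*first.values())
            self.last_lookup = "scan"
        return sorted(
            t for t in candidates
            if all(q is None or q == v for q, v in zip(pattern, t))
        )
```

With all three orderings, a lookup by object uses the osp index; with only spo kept, the same lookup returns the same answer but has to scan.
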
Real-World Deployment Examples
• Business Needs Satisfied
• Enterprise Software Company
• Automobile Manufacturer
• Genomics Research
• Defense Integrator

Business Needs Satisfied
• Get answers to questions
  – Inferencing and discovery
  – Change impact and dependency analysis
  – Variable views of data elements and their relationships
• Unify disparate information sources
  – Metadata repositories
  – Unstructured information (MS Office and PDF documents, email, content management, web pages, RSS sources/news feeds)
  – Other complex data sources
• Share and re-use knowledge
  – Within and between enterprises

Enterprise Software Company
• Critical Need: Provide automated document routing based on a business-specific ontology.
• Solution: Classify documents against ontology, store classifications and ontology in the Tucana Knowledge Server and build multiple business applications on top.
• Result: Standards-based metadata management unifies and delivers change impact analysis across a multi-application distributed, staged, software environment.

Auto Manufacturer
• Critical Need: Analyze quality test and measurement over time for trends
  – Relying on entrenched vendor - a Tucana OEM
  – OEM tried RDBMS – not an option
• Solution: Embed Tucana Knowledge Server into OEM's existing product for test and measurement.
• Result: Enables high value trend analyses that have not been possible before for customer.

Genomics Research
• Critical Need: Collaborative project with big pharma
  – Concerned with Oracle flexibility & "schema hell"
  – Need scalability & secure collaborative environment
• Solution: Rapidly analyze data in Tucana Knowledge Server using application they co-develop with integration partner.
• Result: Deliver collaborative research system for use with strategic customer to accelerate joint discovery and competitive advantage to both companies.

Defense Integrator
• Critical Need: Intel agency overwhelmed with data
  – Automated analysis to improve decision speed & accuracy.
  – Prototype software does not scale / agency requires COTS
• Solution: Deploy Tucana Knowledge Server with metadata extraction incumbent (SRA NetOwl) and scale to billions of records
• Result: More automated analysis, faster accurate decisions against large data volumes on scalable COTS platform

Analyze Disparate Data - Now
[Diagram labels: Query Engine API; RDF API; RSS Feeds; Full Text (Lucene); RDF]

Analyze Disparate Data - Soon
[Diagram labels: Query Engine; XPath; SQL; RDBMS; XML DB]

Analyze Disparate Data - Soon
[Diagram labels: Single Query; RDBMS; Other Data Sources; Representation]
Note: Distributed queries already supported.

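The idea of a single query spanning several data sources can be sketched in a few lines. This is a hedged Python illustration only: it does not use iTQL or any Tucana API, and `make_source`, `federated_query` and the sample data are hypothetical names and values. Each source answers the same triple pattern, and the federating layer merges the answers and removes duplicates.

```python
def make_source(triples):
    """Wrap a list of triples as a queryable source (hypothetical adapter)."""
    def query(s=None, p=None, o=None):
        pattern = (s, p, o)
        return [
            t for t in triples
            if all(q is None or q == v for q, v in zip(pattern, t))
        ]
    return query


def federated_query(sources, s=None, p=None, o=None):
    """Send one pattern to every source; merge results, dropping duplicates."""
    seen = set()
    results = []
    for source in sources:
        for triple in source(s, p, o):
            if triple not in seen:
                seen.add(triple)
                results.append(triple)
    return results


# Two stand-in sources, e.g. a native RDF store and statements from a feed.
rdf_store = make_source([
    ("doc1", "author", "wood"),
    ("doc2", "author", "adams"),
])
feed_store = make_source([
    ("doc3", "author", "wood"),
    ("doc1", "author", "wood"),  # duplicate of a statement in rdf_store
])

by_wood = federated_query([rdf_store, feed_store], p="author", o="wood")
```

A real federation layer would also have to handle remote calls, differing schemas and join ordering, none of which this sketch attempts.
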
Thank You
David Wood (david@tucanatech.com)
Tom Adams (tom@tucanatech.com)
Andrew Newman (andrew@tucanatech.com)
Tucana Technologies, Inc.
http://www.tucanatech.com/
http://www.kowari.org/