
  • Number of slides: 40

Knowledge and Provenance: A knowledge model perspective
Carole Goble, University of Manchester, UK

Talk roadmap
• What is this provenance about and for?
• Knowledge for Provenance
  – Knowledge technologies
  – How do we represent knowledge for and about provenance?
• The Provenance of Knowledge
  – Where do knowledge assertions come from?

my Context
Knowledge-driven middleware for data-intensive in silico experiments in biology
http://www.mygrid.org.uk

Any and every experimental item attracts provenance (so long as you can ID it).
• Experimental design components – workflow specifications; query specifications; notes describing objectives; applications; databases; relevant papers; the web pages of important workers, services
• Experimental instances that are records of enacted experiments – data results; a history of services invoked by a workflow engine; instances of services invoked; parameters set for an application; notes commenting on the results
• Experimental glue that groups and links design and instance components – a query and its results; a workflow linked with its outcome; links between a workflow and its previous and subsequent versions; a group of all these things linked to a document discussing the conclusions of the biologist

Provenance is metadata …
• intended for sharing, retrieving, integrating, aggregating and processing.
• generated with the hope that it is comprehensive enough to be future-proofed.
• recorded for those who we do not yet know will use the object and who will likely use it in a different way.
• machine-computable: free text is of limited help.
• Provenance is the knowledge that makes
  – An item interpretable and reusable within a context
  – An item reproducible or at least repeatable.
• It's part of the information model of any system

Question: What ATPase superfamily proteins are found in mouse?
1. Q9CQV8 O70468 143B_MOUSE from Swiss-Prot version 30, 05/11/02, 16:45 GMT, EBI server.
2. O70455, P54775 143B_MOUSE from Swiss-Prot version 29, 05/11/02 16:45 GMT, local copy.
3. P43686 and P54775 derived by a distributed query over DB1 and DB2.
4. InterPro (no particular version) is a pattern database for protein superfamilies and domains for GPCRs but you need an account.
5. The publicly available workflow mouse ATPase (http://www.somelab.edu/bio/carole/wf/3345.wsfl) will generate the result from data in your personal repository and you have permission to run the services it needs. Click to run it.
6. The Attwood lab expertise is in nucleotide binding proteins (ATPase superfamily proteins are nucleotide binding proteins).
7. Jones published a new paper on this in Nature Genetics two weeks ago, and you have an account to access it on-line.
8. Smith in your lab asked this question yesterday and the answer he got is annotated by a commentary in his e-Log Book.
9. P43686 (human) calculated by applying the algorithm ABC located at NCBI using data in database AAA.
Kinds of knowledge illustrated (slide callouts):
• Database query (know-what)
• Virtual data products (know-how)
• Workflow (know-how)
• Personalised profile (know-whom-to)
• Collaboration & community (know-where, know-when)
• Digital archive (know-which)
• Provenance (know-wherefrom)
• Replicas (know-which)
• Ontology and Inference (know-whether)
• Authorisation, Authentication and Accounting (know-who)
• Explanation (know-why)
• Annotation & notes (know-that)

Provenance forms
• Derivations
  – A path like a workflow, script or query.
  – Linking items, usually in a directed graph.
  – An explanation of when, by whom and how something was produced.
  – Execution, process-centric
• Annotations
  – Attached to items or collections of items, in a structured, semi-structured or free text form.
  – Annotations on one item or linking items.
  – An explanation of why, when, where, who, what, how.
  – Data-centric
[Figure: derivation graph of parameter sets, from "mass = 200, decay = bb / ZZ / WW" down to derived items such as "mass = 200, decay = WW, stability = 1, event = 8" and "mass = 200, decay = WW, stability = 1, plot = 1"]

Semantic discovery – services & workflows
• Services and workflows in registry have RDF and OWL descriptions
• Selection by the types of inputs they use, outputs they produce, the bioinformatics tasks they perform…
• Querying using RDQL over RDF UDDI registry for operational metadata
• Matching using FaCT OWL classification for concept-based metadata
[Screenshots: a registry browser; a workflow wizard]

Provenance forms in myGrid
• Derivations
  – FreeFluo Workflow Enactment Engine provides a detailed provenance record stored in the myGrid Information Repository (mIR) describing what was done, with what services and when
  – XML document, soon to be an RDF model
• Annotations
  – Every mIR object has Dublin Core provenance properties described in an attribute value model

Provenance of data
• Operational execution trail
[Diagram: a process node with edges input → Gene:AC005412.6; output → SNP:000010197; run_for → urn:Clare Jennings; by_service → lsid:HGVBase_retrieve; plus start time and end time]
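The execution trail on this slide can be read as a set of subject, predicate, object assertions. A minimal Python sketch, assuming a hypothetical process identifier (`urn:process:1`, since the slide does not name the process node) and plain tuples in place of an RDF store; start and end times are omitted because the slide gives no values:

```python
# Operational execution trail as (subject, predicate, object) triples.
# "urn:process:1" is a hypothetical identifier for the unnamed process node.
PROCESS = "urn:process:1"

trail = [
    (PROCESS, "input", "Gene:AC005412.6"),
    (PROCESS, "output", "SNP:000010197"),
    (PROCESS, "run_for", "urn:Clare Jennings"),
    (PROCESS, "by_service", "lsid:HGVBase_retrieve"),
]

def objects(triples, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(trail, PROCESS, "output"))
```

The property names are taken from the slide labels; everything else is illustrative.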

Provenance of knowledge
• Declarative semantic execution trail
[Diagram: the assertion Gene:AC005412.6 contains_single_nucleotide_polymorphism SNP:000010197 is "as stated by" a process with input Gene:AC005412.6, output SNP:000010197, run_for urn:Claire Jennings, by_service lsid:HGVBase_retrieve, start time and end time]

Provenance of knowledge
• Trust and attribution
[Diagram: the assertion Gene:AC005412.6 contains_single_nucleotide_polymorphism SNP:000010197 is "disputed by" urn:Carole Goble and "as stated by" a process with input Gene:AC005412.6, output SNP:000010197, run_for urn:Claire Jennings, by_service lsid:HGVBase_retrieve, start time and end time]

Provenance of knowledge
• Aggregation and integration
[Diagram: the assertion Gene:AC005412.6 contains_single_nucleotide_polymorphism SNP:000010197 is "as stated by" two processes: one run_for urn:Bill Jones, by_service lsid:BIGDbretrieve; the other run_for urn:Claire Jennings, by_service lsid:HGVBase_retrieve, with input Gene:AC005412.6 and output SNP:000010197. Each process has a start time and end time]
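The "Provenance of knowledge" slides treat a domain assertion as something that other statements ("as stated by", "disputed by") can point at. A minimal Python sketch of that idea, assuming hypothetical process identifiers (`urn:process:hgvbase`, `urn:process:bigdb`) and a nested dictionary standing in for RDF reification:

```python
from collections import defaultdict

# The domain-level assertion (the "knowledge") taken from the slides.
claim = ("Gene:AC005412.6", "contains_single_nucleotide_polymorphism", "SNP:000010197")

# Statements about the claim. The process identifiers are hypothetical.
attributions = [
    (claim, "as_stated_by", "urn:process:hgvbase"),  # run_for urn:Claire Jennings
    (claim, "as_stated_by", "urn:process:bigdb"),    # run_for urn:Bill Jones
    (claim, "disputed_by", "urn:Carole Goble"),
]

def summarise(statements):
    """Group the sources of each claim by the kind of attribution."""
    summary = defaultdict(lambda: defaultdict(list))
    for target, relation, source in statements:
        summary[target][relation].append(source)
    return summary

summary = summarise(attributions)
print(len(summary[claim]["as_stated_by"]), "supporting,",
      len(summary[claim]["disputed_by"]), "disputing")
```

Keeping the claim separate from the statements about it is what lets independent runs be aggregated and lets trust be attached later without rewriting the claim.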

20,000 feet and ground level
Top Down provenance
– What is going on?
– Unification and summaries of collective provenance knowledge.
– Collaborative, Awareness, Experience base, Scientific Corporate memory.
– “What projects have something to do with human SNPs?”
– “What experiments use the PSI-BLAST service regardless of version?”
Bottom Up provenance
– Where did this data object http://doh.dah.ac.uk/… come from?
– Which version of SwissProt was run in workflow http://blah.ac.uk/…?
[Diagram: build up layers of provenance knowledge: User, Trust, Domain, Experiment, Execution, Data, Services, Workflow]
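The two directions of questioning can be illustrated over a single derivation graph. A minimal Python sketch, assuming an invented `DERIVED_FROM` mapping with hypothetical identifiers; it is not the myGrid data model:

```python
# Each item maps to the items it was directly derived from.
# All identifiers here are hypothetical.
DERIVED_FROM = {
    "data:result": ["workflow-run:7"],
    "workflow-run:7": ["workflow:template", "service:SwissProt-v30"],
    "workflow:template": [],
    "service:SwissProt-v30": [],
}

def lineage(item):
    """Bottom up: every item reachable by following derived-from links."""
    found = []
    queue = list(DERIVED_FROM.get(item, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(DERIVED_FROM.get(current, []))
    return found

def uses(source):
    """Top down: every item whose lineage includes `source`."""
    return [item for item in DERIVED_FROM if source in lineage(item)]

print(lineage("data:result"))
print(uses("service:SwissProt-v30"))
```

`lineage` answers "where did this data object come from?", while `uses` answers the summary-style question "what depends on this service?".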

Provenance for People and Machines
[Diagram: the provenance layers Experiment, User, Trust, Services, Domain, Data, Execution and Workflow. Axis labels: Subjective / Objective; Contextual / Context-free; People / Machines; Manual/semi-automated / Automated]

1. Explicitly capture Context
• Reuse methods and strategies (e.g., protocols)
• Make explicit the situational bias that is normally implicit
• Enable future generations of scientists to follow our work
• To capture meaning, we must devise a way of representing concepts and their relationships
Hero http://hero.geog.psu.edu/Hero_knowledge_management.pdf Downloaded 301103

1. Explicitly capture Context
Using models and terms
• that can be shared and interpreted
• that are extensible and preclude premature restrictions
• that are navigable and computationally processable
Hero http://hero.geog.psu.edu/Hero_knowledge_management.pdf Downloaded 301103

2. Bridge islands of exported provenance
[Diagram: Service 1, Workflow 1, Experimental Investigation 1, Service 2, Data 1]

Not all exports are the same
[Diagram: Service 1, Workflow 1, Experimental Investigation 1, Service 2, Data 1]

So we need to…
• Uniquely identify items through URIs and Life Science Identifiers (GSH/GSR/Handle.net…)
• Explicitly expose provenance by assertions in a common data model…
• Publish and share consensually agreed ontologies so we can share the provenance metadata and add in background knowledge…
• Then we can query, filter, integrate and aggregate the provenance metadata…
• and reason over it to infer more provenance metadata using rules…
• and attribute trust to the provenance…
• Flexibly, so that we do not cast models and terms in stone, and so can cope with different degrees of description.
What’s an Ontology?
• A common vocabulary of terms
• Some specification of the meaning of the terms
• Concepts, relationships, axioms
• A shared consensual understanding for people and machines

W3C Metadata language/model: Resource Description Framework
• Common model for metadata
• Assertions as triples (subject, predicate, object) forming graphs.
• Associate URIs (LSIDs) with other URIs (LSIDs).
• Associate URIs with OWL concepts (which are URIs).
• RDQL, repositories, integration tools, presentation tools
• Query over, link together, aggregate, integrate assertions.
• Avoids pre-commitment
  – Self-describing
  – Incremental
  – Extensible
  – Advantage and drawback.
[Graphic labels: Data, Workflow, Experiment, User, Service. Graphic based on Tim Berners-Lee, http://www.w3.org/2003/Talks/0521-www-keynote-tbl/slide22-0.html]
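Because every assertion has the same three-part shape, independently exported graphs can be aggregated by set union and queried with one pattern-matching function. A minimal Python sketch, assuming made-up LSID-style identifiers and property names (`uses_service`, `produced`, `ex:SequenceAlignmentService` are illustrative, not myGrid vocabulary) and plain sets in place of an RDF repository or RDQL:

```python
def match(graph, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(t for t in graph
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# Two independently exported graphs. All identifiers are hypothetical.
workflow_export = {
    ("urn:lsid:example.org:workflow:1", "uses_service", "urn:lsid:example.org:service:1"),
    ("urn:lsid:example.org:workflow:1", "produced", "urn:lsid:example.org:data:1"),
}
service_export = {
    ("urn:lsid:example.org:service:1", "rdf:type", "ex:SequenceAlignmentService"),
}

merged = workflow_export | service_export  # aggregation is plain set union

# Which services did workflow 1 use, and what kind of thing is each?
for _, _, service in match(merged, s="urn:lsid:example.org:workflow:1", p="uses_service"):
    print(service, match(merged, s=service, p="rdf:type"))
```

The join works only because both exports use the same identifier for the service, which is the role the slides give to URIs and LSIDs.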

Bridging islands
[Diagram: Service 1, Workflow 1, Experimental Investigation 1, Service 2, Data 1]

Bridging islands: Concepts and LSID
[Diagram: Service 1, Service 2, Workflow 1, Experimental Investigation 1 and Data 1, with RDF exports]

W3C Ontology language/model: OWL
• Continuum of expressivity
  – Concepts, roles, individuals, axioms
  – From simple frames to description logics
  – Sound and complete formal semantics
  – Compositional and property based
• Reasoning to infer classification
• Eas(ier) to extend and evolve and merge ontologies
• A web language
• Tools, tools!
[Diagram: lineage of OWL, with labels DAML, OIL, RDF, DAML+OIL, OWL]

Bridging islands: Concepts and LSIDs
[Diagram: Service 1, Service 2, Workflow 1, Experimental Investigation 1 and Data 1, with RDF exports]

Bridging islands: Concepts and LSIDs
[Diagram: Service 1, Service 2, Workflow 1, Experimental Investigation 1 and Data 1, each identified by an LSID, with RDF exports]

Layers of Knowledge Languages
[Diagram, top to bottom: Attribution; Explanation; Rules & Inference; Ontologies; Metadata; Standard Syntax; Identity. Wedding cake courtesy of Tim Berners-Lee]

myGrid: everything has a concept & LSID
[Diagram: Workflows; Literature; Provenance record of workflow runs; Notes; Ontologies; People; Data holdings; Services]

Linking objects to objects via URIs and LSIDs
[Diagram: People who wrote the workflow; Literature; People to notify of the workflow status; Provenance record of workflow runs; Provenance of the workflow template; Related workflows; Notes; Data holdings; Ontologies describing workflows; Services used]

Generated link anchors
Lymphocyte and neutrophil are subsumed by the concept white blood cell
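Subsumption of this kind can be illustrated by following "is a kind of" links transitively. A minimal Python sketch, assuming a toy hierarchy in which the `blood cell` parent is an invented addition; a description-logic classifier infers such links from concept definitions, whereas this sketch only walks links that are already stated:

```python
# Direct "is a kind of" links. A toy hierarchy, not a real ontology;
# "blood cell" is a hypothetical parent added for illustration.
SUBCLASS_OF = {
    "lymphocyte": {"white blood cell"},
    "neutrophil": {"white blood cell"},
    "white blood cell": {"blood cell"},
}

def ancestors(concept):
    """All concepts that subsume `concept`, found by following links transitively."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def subsumed_by(concept, candidate):
    """True if `candidate` is a (possibly indirect) parent of `concept`."""
    return candidate in ancestors(concept)

print(subsumed_by("lymphocyte", "white blood cell"))  # True
```

A link generator could use such a check to anchor a document annotated with "lymphocyte" to resources described as "white blood cell".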

Annotating a workflow log with concepts
1. Choose the ontology
2. Select an area to annotate with
3. Select the concept
4. Provide a description
5. Create the annotation

Generating provenance
[Diagram labels: Scufl workflow template; RDF registry with OWL descriptions; identify workflow; bind services; input data & parameters (mIR); FreeFluo WFEE; workflow execution (startTime, endTime, service instances invoked …); execution provenance log; data and metadata from the run (RDF+OWL); workflow knowledge template; knowledge arising from workflow (RDF+OWL)]

P Afflard et al, The Grid(s)? @ Novartis, presented at PRISM PharmaGrid retreat, July 2003

William Pike, Ola Ahlqvist, Mark Gahegan, Sachin Oswal, Supporting Collaborative Science through a Knowledge and Data Management Portal, in 1st Semantic Web Conference (ISWC 2003) Workshop on Retrieval of Scientific Data, Florida, USA, October 2003

Two views of a gravity model concept from the Hero CODEX web tool
• An ontological description shows how one geoscientist constructs a model
• A social network reveals which users favour different instances of the model, with edge length suggesting the degree of support.
William Pike, Ola Ahlqvist, Mark Gahegan, Sachin Oswal, Supporting Collaborative Science through a Knowledge and Data Management Portal, in 1st Semantic Web Conference (ISWC 2003) Workshop on Retrieval of Scientific Data, Florida, USA, October 2003

Collaboratory for Multi-Scale Chemical Science CMCS
• “Pedigree Graph” portlet showing provenance relationships between resources (colour coded by original relationship type).
• CMCS Pedigree Browser showing the metadata and relationships of the selected data set.

Provenance dimensions connected by concepts and identifiers
[Diagram labels: project; Services; Workflow instances; Author; workflow template. Based on http://www.w3.org/2003/Talks/0521-www-keynote-tbl/slide22-0.html]

Reflections: annotations
• Annotation metadata model for myGrid holdings is a graph
  – If it waddles like RDF and quacks like RDF, it’s RDF
  – Experiments in RDF scalability
  – Co-existence of RDF and other data models (relational)
• Acquisition of annotations and adverts
  – Automated by mining WSDL docs, mining ws-info docs
  – Deep annotation works OK for bioinformatic service concepts (it’s an EMBL record) but…
  – Annotating with biologically meaningful concepts is harder
    • Data in the mIR (it’s a lymphocyte)
    • Manual annotation cost is high!
  – Service/workflow publication tools
• Dealing with change
  – Ontology changes; service changes; annotations change.

Random Thoughts
• Where does the knowledge come from (see Luc)?
• How do we model trust (see Luc)?
• Scalability of Semantic Web technologies?
• Visualisation of knowledge (see Monica)?
• What’s the lifecycle of provenance?
• Different knowledge models for different disciplines?
Knowledge
• Layers of provenance
• Provenance that is domain knowledge
• Provenance for context vs execution workflow provenance
• People vs machine
• Different models for different items but still needs to be integrated
• Technologies for sharing and integrating that are flexible.

Talk provenance
• myGrid http://www.mygrid.org.uk
  – Jun Zhao, Mark Greenwood, Chris Wroe, Phil Lord, Chris Greenhalgh, Luc Moreau, Robert Stevens
• Hero http://hero.geog.psu.edu/
  – William Pike, Ola Ahlqvist, Mark Gahegan, Sachin Oswal
• Collaboratory for Multi-Scale Chemical Science CMCS
  – James D. Myers, Carmen Pancerella, Carina Lansing, Karen L. Schuchardt, Brett Didier
• Chimera
  – Michael Wilde, Ian Foster
• Knowledge Space
  – Novartis
• And special thanks to Ian Cottam for heroic support when my laptop died yesterday afternoon.