Translational Medicine from a Semantic Web Perspective
Eric Neumann, W3C
June 16, 2006

Drug Discovery and Medicine
• Health
• Practice
• Safety
• Prevention
• Privacy
• Knowledge
Image: Hygieia, G. Klimt

Data Expansion
• Large Data Sets
• Variables >> Samples
• Many New Data Types
• Combine Which Formats?

Where Information Advances are Most Needed
• Supporting Innovative Applications in R&D
  – Translational Medicine (Biomarkers)
  – Molecular Mechanisms (Systems)
  – Data Provenance, Rich Annotation
• Clinical Information
  – eHealth Records, EDC, Clinical Submission Documents
  – Safety Information, Pharmacovigilance, Adverse Events, Biomarker data
• Standards
  – Central Data Sources
    • Genomics, Diseases, Chemistry, Toxicology
  – MetaData
    • Ontologies
    • Vocabularies

Knowledge
“... is the human acquired capacity (both potential and actual) to take effective action in varied and uncertain situations.”
How does this translate into using Information Systems better in support of Innovation?

Drug Discovery Challenges
Knowledge Predictiveness
• Knowledge of Target Mechanisms
• Knowledge of Toxicity
• Knowledge of Patient-Drug Profiles

Current Challenges: Drug Discovery
• Business
  – Costly, lengthy drug discovery process (12-14 years)
  – Poor funding to find new uses for existing therapies (i.e., antibiotics)
  – Insufficient economic drivers for certain disease areas
  – Discovery and clinical trials design not well aligned with anticipating adverse effect detection
    • Post-launch surveillance is weak
• Science & Technology
  – Counteracting the legacy of “Silos”
  – How to break away from the DD “conveyor belt model” to the “Translation model”
    • gaining and sharing insights throughout the process
  – The Benefit of New Targets for New Diseases
  – How to best identify safety and efficacy issues early on, so that cost and failure are reduced
    • A D3 Knowledge-base: Druggability and Safety

The Big Picture - Hard to understand from just a few Points of View

Complete view tells a very different Story

Distributed Nature of R&D
Silos of Data…

Existing Web Data Throttles the R&D Potential
R&D Scientist Integrating Data Manually
Static, Untagged, Disjoint
Diagram labels: LIMS, Bioinformatics, Cheminformatics, Public Data Sources

Data Integration: Biology Requirements
Diagram labels: Papers, Disease, Proteins, Genes, Retention Policy, Assays, Compounds, Audit Trail, Curation, Ontology, Experiment, Tools

Semantic Web Data Integration
R&D Scientist
Dynamic, Linked, Searchable
Diagram labels: LIMS, Bioinformatics, Cheminformatics, Public Data Sources

Raw Data → Semantic Bridge → New Applications
Raw Data (formats and standards): MAGE ML, GO, CDISC, BioPAX, PSI XML, ICH, ASN.1, XLS, SAS Tables, CSV
New Applications: Decision Support, Biomarker Qualification, Translational Research, Target Validation, Safety, Toxicity

Key Technologies Pharmaceutical Companies Use to Exchange Knowledge

New Regulatory Issues Confronting Pharmaceuticals
Diagram labels: Tox/Efficacy, ADME Optim
Source: Innovation or Stagnation, FDA Report, March 2004

Key Functionality
• Ubiquity
  – Same identifiers for anything from anywhere
• Discoverability
  – Global search on any entity
• Interoperability
  – => Application independence: “Recombinant Data”

Additional Functionality
• Provenance
  – Origin and history of data and annotations
• Scalability
  – Over all potentially relevant data and content
• Authentication/Security
  – Single user and team identity and granular data security
  – Non-repudiation of authorship
  – Encryption of graphs
  – Policy Awareness
• Data Preservation
  – Long-term persistence by minimizing API needs

Translational Research and Personalized Medicine
- Two significant areas of HCLS activity
- Span most areas of activity
Diagram labels: Biomedical Research, Clinical Research, Clinical Practice, Translational Medicine, Personalized Medicine

HCLS Framework: Biomedical Research
• Molecular, Cellular and Systems Biology/Physiology
  – Organism as an integrated and interacting network of genes, proteins and biochemical reactions
  – Human body as a system of interacting organs
• Molecular Cell Biology/Genomic and Proteomic Research
  – Gene Sequencing, Genotyping, Protein Structures
  – Cell Signaling and other Pathways
• Biomarker Research
  – Discovery of genes and gene products that can be used to measure disease progression or impacts of drug
• Pharmaco-genomics
  – Impact of genetic inheritance on
• Drug Discovery and Translational Research
  – Use of preclinical research to identify promising drug candidates

HCLS Framework: Clinical Research
• Clinical Trials
  – Determination of efficacy, impact and safety of drugs for particular diseases
• Pharmaco-vigilance/ADE Surveillance
  – Monitoring of impacts of drugs on patients, especially safety and adverse event related information
• Patient Cohort Identification and Management
  – Identifying patient cohorts for drug trials is a challenging task
• Translational Research
  – Test theories emerging from pre-clinical experimentation on disease affected human subjects
• Development of EHRs/EMRs for both clinical research and practice
  – Currently EHRs/EMRs focussed on clinical workflow processes
  – Re-using that information for clinical research and trials is a challenging task

Translational Research
• Improve communication between basic and clinical science so that more therapeutic insights may be derived from new scientific ideas - and vice versa.
• Testing of theories emerging from preclinical experimentation on disease-affected human subjects.
• Information obtained from preliminary human experimentation can be used to refine our understanding of the biological principles underpinning the heterogeneity of human disease and polymorphism(s).
• http://www.translational-medicine.com/info/about
• Reference NIH Digital Roadmap activity

Personalized Medicine
• Propagation of insights from Genomic research into clinical practice
• Impact of new Molecular diagnostic tests hitting the market
  – How can they be incorporated into clinical care?
  – How does one update current clinical guidelines to incorporate the use of these tests?
  – How can one enable novel clinical decision support?
• How can phenotypic characteristics and genomic markers be used to:
  – Stratify patient populations
  – “Personalize” clinical care
• Genetic test results as risk factors
• Therapeutic use of genomic markers

Ecosystem: Current State
Characterized by silos with uncoordinated supply chains leading to inefficiencies in the system
Diagram areas: Biomedical Research, Clinical Trials/Research, Clinical Practice
Diagram labels: Patients, Public, National Institutes of Health, FDA, Center for Disease Control, Pharmaceutical Companies, Payors, Universities, Academic Medical Centers (AMCs), Clinical Research Organizations (CROs), Hospitals, Doctors

Ecosystem: Goal State
/* Need to expand this with Biomedical Research + Clinical Practice */
Diagram labels: Biomedical Research, Clinical Practice
/* Need to expand this to include Healthcare and Biomedical Research Players as well… Show an integrated picture with “continuous” information flow */

Use Case Flow: Drug Discovery and Development
Diagram labels: Qualified Targets, Lead Generation, Lead Optimization, Toxicity & Safety, KD, Biomarkers, Molecular Mechanisms, Pharmacogenomics, Clinical Trials

Drug Discovery & Development Knowledge
Diagram labels: Qualified Targets, Molecular Mechanisms, Lead Generation, Toxicity & Safety, Lead Optimization, Pharmacogenomics, Biomarkers, Clinical Trials, Launch

Semantic Web Drug DD Application Space
Diagram labels: safety, Therapeutics, Critical Path, Chem Lib, manufacturing, NDA, Production, Genomics, Clinical Studies, HTS, eADME, Biology, Compound Opt, DMPK, genes, Patent, informatics

Opportunities for Semantics in HealthCare
• Enhanced interoperability via:
  – Semantic Tagging
  – Grounding of concepts in Standardized Vocabularies
  – Complex Definitions
• Semantics-based Observation Capture
• Inference on Diseases
  – Phenotypes
  – Genetics
  – Mechanisms
• Semantics-based Clinical Decision Support
  – Guided Data Interpretation
  – Guided Ordering
• Semantics-based Knowledge Management

Data Semantics in the Life Sciences
Diagram axis: Unstructured Data Types to Structured and Complex Data Types
Diagram labels: Pathways, Biomarkers; Publications + data; Image + Text; Categorical Taxonomic Data Items; Text + data items; Histology; Profiling genomics; Systems Biology; Complex Objects with Categorical/Taxonomic Data Items; Gene expression; Complex Objects; Clinical Findings; Composite Objects with Embedded “process”; Clinical trials

RDB => RDF
Virtualized RDF
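The RDB => RDF idea on this slide can be sketched roughly as follows. This is a minimal illustration and not the tooling used in the talk: the function name `rows_to_triples`, the `http://example.org/` base URI, and the sample row are all assumptions made for the example. Each row becomes a subject URI, each column a predicate, and each cell value an object.

```python
# Minimal sketch: expose relational rows as RDF-style triples.
# All names and URIs here are hypothetical, chosen only for illustration.

def rows_to_triples(table_name, rows, key, base="http://example.org/"):
    """Turn each row into (subject, predicate, object) tuples.

    The primary-key column builds the subject URI; every other
    non-empty column becomes one triple.
    """
    triples = []
    for row in rows:
        subject = f"{base}{table_name}/{row[key]}"
        for column, value in row.items():
            if column == key or value is None:
                continue  # the key is already encoded in the subject URI
            predicate = f"{base}{table_name}#{column}"
            triples.append((subject, predicate, value))
    return triples


# A one-row "compound" table (values are illustrative).
compounds = [{"id": 3820, "label": "kenpaullone", "mw": 327.17}]
triples = rows_to_triples("compound", compounds, key="id")

for triple in triples:
    print(triple)
```

A "virtualized" variant would compute these triples on demand at query time instead of materializing them, leaving the relational database as the system of record.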

Use-Case: COSA
Row Semantic: <rdf:type Subject>
Column Semantic: <rdf:type Gene>
Data Set

Use-Case: Experimental Design Definition
Diagram labels: Treatment W, Treatment Z, Control, Cultured Cells, Time Points, Staining, Visible Microscopy, Fluorescent Microscopy, Image Analysis

Case Study: Drug Safety ‘Safety Lenses’
• Lenses can ‘focus’ data in specific ways
  – Hepatotoxicity, genotoxicity, hERG, metabolites
• Can be “wrapped” around statistical tools
• Aggregate other papers and findings (knowledge) in context with a particular project
• Align animal studies with clinical results
• Support special “Alert-channels” by regulators for each different toxicity issue
• Integrate JIT information on newly published mechanisms of action

Example: Knowledge Aggregation
Courtesy of BG-Medicine

Case Study: Omics
ApoA1 …
… is produced by the Liver
… is expressed less in Atherosclerotic Liver
… is correlated with DKK1
… is cited regarding Tangier’s disease
… has Tx Reg elements like HNFR1
Subject Verb Object
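The ApoA1 statements above map directly onto (Subject, Verb, Object) triples. The sketch below is one possible encoding in plain Python; the predicate names (`isProducedBy`, and so on) and the `match` helper are hypothetical and only paraphrase the wording on the slide.

```python
# Minimal sketch: the slide's ApoA1 statements as subject-verb-object triples.
# Predicate names are invented for illustration; they are not from a real ontology.

triples = {
    ("ApoA1", "isProducedBy", "Liver"),
    ("ApoA1", "isExpressedLessIn", "Atherosclerotic Liver"),
    ("ApoA1", "isCorrelatedWith", "DKK1"),
    ("ApoA1", "isCitedRegarding", "Tangier's disease"),
    ("ApoA1", "hasTxRegElementLike", "HNFR1"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that agree with every pattern slot that is not None."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything known about ApoA1, then one specific question.
print(match(triples, s="ApoA1"))
print(match(triples, p="isCorrelatedWith"))
```

Leaving a slot as `None` acts as a wildcard, which is the same pattern-matching idea that RDF query languages build on.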

Scenario: Biomarker Qualification
• Biomarker Roles
  – Disease
  – Toxicity
  – Efficacy
• Molecular and cytological markers
  – Tissue-specific
  – High content screening derived information
  – Different sets associated with different predictive tools
• Statistical discrimination based on selected samples
  – Predictive power
  – Alternative cluster prediction algorithms
  – Support qualifications from multiple studies (comparisons)
• Causal mechanisms
  – Pathways
  – Population variation

BioMarker Semantics
Diagram labels: Disease, Pathways, +Samples, -Samples, Biomarker Set, Significance & Strength

Scenario: Toxicity
• Mechanisms
  – Tissue-selective, Species-specific
  – Pathways, Off-Targets
  – Metabolites, PK sensitivity
• Evidence
  – Biomarkers
    • In vitro assays (cell lines), Animal models, Clinical Phase 1
  – Literature
• Population Variation
  – Drug Metabolism to toxic forms (CYP, SULT, UGT)
  – Target interaction variability
  – Potential vs. Demonstrated
• Predictions
  – Data Mining Patterns
  – Computational Modeling
• Working Solutions
  – Chemical modifications
  – Dosing, Reformulation
  – Documented animal <=> human similarity and variation

Knowledge Mining using Semantic Web
“Gene Prioritization through Data Fusion” - Aerts et al, 2006, Nature Biotechnology
- Use of quantitative and qualitative information for statistical ranking.
- Can be used to identify novel genes involved in diseases

Case Study: BioPAX (Pathways)
[BioPAX listing: <bp:PATHWAYSTEP rdf:ID= …]

Case Study: BioPAX (Pathways)
[BioPAX listing: <bp:PATHWAYSTEP rdf:ID= …]
Diagram labels: Dishevelled to GSK3beta, IRREVERSIBLE-LEFT-TO-RIGHT, INHIBITION

Case Study: BioPAX (Pathways)
[BioPAX listing: <bp:PATHWAYSTEP rdf:ID= …]
Diagram labels: Modulation, Dishevelled to GSK3beta, IRREVERSIBLE-LEFT-TO-RIGHT, INHIBITION, affectedBy CHIR99102

Potential Linked Clinical Ontologies
Diagram labels: Clinical Obs, Disease Descriptions, SNOMED, Applications, CDISC, ICD10, RCRIM (HL7), Clinical Trials ontology, Disease Models, Mechanisms, Pathways (BioPAX), IRB, Tox, Genomics, Molecules
Legend: Extant ontologies, Under development, Bridge concept

Case Study: Drug Discovery Dashboards
• Dashboards and Project Reports
• Next generation browsers for semantic information via Semantic Lenses
• Renders OWL-RDF, XML, and HTML documents
• Lenses act as information aggregators and logic style-sheets
add { ls:TheraTopic hs:classView:TopicView }

Drug Discovery Dashboard
http://www.w3.org/2005/04/swls/BioDash
Topic: GSK3beta Topic
Disease: DiabetesT2
Alt Dis: Alzheimers
Target: GSK3beta
Cmpd: SB44121
CE: DBP
Team: GSK3 Team
Person: John
Related Set
Path: WNT

Bridging Chemistry and Molecular Biology
Semantic Lenses: Different Views of the same data
Diagram labels: BioPax Components, Target Model
urn:lsid:uniprot.org:uniprot:P49841
Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot
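The correspondence rule above can be read as a join on a shared LSID cross-reference. The following is a minimal sketch under that reading, not the rule engine used in the dashboard: the identifiers `target:T1`, `bpx:prot1`, `bpx:prot2`, the function name, and the second LSID (`P00000`) are hypothetical placeholders.

```python
# Minimal sketch of the slide's correspondence rule:
#   if target.xref.lsid == protein.xref.lsid then target correspondsTo protein
# Identifiers other than the P49841 LSID from the slide are hypothetical.

def apply_correspondence(targets, proteins):
    """Link every target to each BioPAX protein that shares its LSID xref.

    `targets` and `proteins` map an entity identifier to its xref LSID.
    Returns a set of (target, "correspondsTo", protein) triples.
    """
    # Index proteins by LSID so each target is matched in one lookup.
    by_lsid = {}
    for protein_id, lsid in proteins.items():
        by_lsid.setdefault(lsid, []).append(protein_id)

    return {
        (target_id, "correspondsTo", protein_id)
        for target_id, lsid in targets.items()
        for protein_id in by_lsid.get(lsid, [])
    }


targets = {"target:T1": "urn:lsid:uniprot.org:uniprot:P49841"}
proteins = {
    "bpx:prot1": "urn:lsid:uniprot.org:uniprot:P49841",
    "bpx:prot2": "urn:lsid:uniprot.org:uniprot:P00000",  # placeholder, no match
}
links = apply_correspondence(targets, proteins)
print(links)
```

The rule links two descriptions because they cite the same reference, without claiming that the two records are the same object.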

Bridging Chemistry and Molecular Biology
• Lenses can aggregate, accentuate, or even analyze new result sets
• Behind the lens, the data can be persistently stored as RDF-OWL
• Correspondence does not need to mean “same descriptive object”, but may mean objects with identical references

Pathway Polymorphisms
• Merge directly onto pathway graph
• Identify targets with lowest chance of genetic variance
• Predict parts of pathways with highest functional variability
• Map genetic influence to potential pathway elements
• Select mechanisms of action that are minimally impacted by polymorphisms
Diagram label: Non-synonymous polymorphisms from dbSNP

Knowledge Channels
[RSS listing: <item rdf:about= …]
High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic lymphocytic leukemia.
http://www.connotea.org/user/hannahr/uri/48e905bdb66310af85ad2e8503628e01
Posted by hannahr to CLLSignalling&Processes on Thu Jan 19 2006
hannahr
2006-01-19T11:24:03Z
CLLSignalling&Processes
A Sainz-Perez, H Gary-Gouy
PMID: 16408101
2006-01-12
Leukemia, 0887-6924

Knowledge Channels
[RSS listing: <item rdf:about= …]
High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic lymphocytic leukemia.
http://www.connotea.org/user/hannahr/uri/48e905bdb66310af85ad2e8503628e01
Posted by hannahr to CLLSignalling&Processes on Thu Jan 19 2006
hannahr
2006-01-19T11:24:03Z
CLLSignalling&Processes
A Sainz-Perez, H Gary-Gouy
PMID: 16408101
2006-01-12
Leukemia, 0887-6924
Annotation labels: P38 paper; Giles Day (expert); pf#P38, pf#Kinases (topic); N251; kChannel
nugget: This paper suggests a mechanism for P38 protection of CLL B-cells

GeneLogic GeneExpress Data
• Additional relations and aspects can be defined
Diagram labels: Diseased Tissue, Links to OMIM (RDF)

Bar View of GeneExpress

ClinDash: Clinical Trials Browser
• Values can be normalized across all measurables (rows)
• Samples can be aligned to their subjects using RDF rules
• Clustering can now be done over all measurables (rows)
Diagram labels: Subjects, Clinical Obs, Expression Data
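The slide does not say which normalization ClinDash applies. As one plausible reading, the sketch below z-scores each measurable (row) so that clinical observations and expression values end up on a comparable scale before clustering. The function name `normalize_rows` and the sample rows are assumptions made for illustration.

```python
# Minimal sketch: per-row z-score normalization, one possible way to make
# clinical observations and expression data comparable before clustering.
# The choice of z-scores and all row names are assumptions, not from the slide.
from statistics import mean, pstdev


def normalize_rows(table):
    """Map each row (measurable -> values per subject) to zero mean, unit spread."""
    normalized = {}
    for name, values in table.items():
        mu = mean(values)
        sigma = pstdev(values)
        # A constant row carries no signal; map it to zeros instead of dividing by 0.
        normalized[name] = [0.0 if sigma == 0 else (v - mu) / sigma for v in values]
    return normalized


# Rows are measurables, columns are subjects (illustrative numbers).
table = {
    "systolic_bp": [120.0, 130.0, 140.0],
    "gene_x_expression": [1.0, 2.0, 3.0],
}
normalized = normalize_rows(table)

for name, values in normalized.items():
    print(name, [round(v, 3) for v in values])
```

After this step the two rows have the same shape despite very different units, which is what allows a single clustering pass over all rows.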

W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group
• Interest Group formally launched Nov 2005: http://www.w3.org/2001/sw/hcls
• First Domain Group for W3C - “…take SW through its paces”
• An Open Scientific Forum for Discussing, Capturing, and Showcasing Best Practices
• Recent life science members: Pfizer, Merck, Partners HealthCare, Teranode, Cerebra, NIST, U Manchester, Stanford U, AlzForum
• SW Supporting Vendors: Oracle, IBM, HP, Siemens, AGFA
• Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)

HCLS Objectives
• Share use cases, applications, demonstrations, experiences
• Exposing collections
• Developing vocabularies
• Building / extending (where appropriate) core vocabularies for data integration

HCLS Activities
• BioRDF - data + NLP as RDF
• BioONT - ontology coordination
• Scientific Publishing - evidence management
• Adaptive Clinical Protocols and Pathways
• Clinical Trials

BioRDF: NeuroCommons.org
The Neurocommons project, a collaboration between Science Commons and the Teranode Corporation, is creating a free, public Semantic Web for neurological research. The project has three distinct goals:
1. To demonstrate that scientific impact and innovation is directly related to the freedom to legally reuse and technically transform scientific information.
2. To establish a legal and technical framework that increases the impact of investment in neurological research in a public and clearly measurable manner.
3. To develop an open community of neuroscientists, funders of neurological research, technologists, physicians, and patients to extend the Neurocommons work in an open, collaborative, distributed manner.

BioRDF: Reagents
RDF resources that describe various kinds of experimental reagents, starting with antibodies:
• Initial RDF that captures: Gene, the fact that this is an antibody, various kinds of pages about the antibody, such as vendor documentation, and any other properties that are explicitly captured in the source material
• Work with the Ontology task force to identify appropriate ontologies and vocabularies to use in the RDF.
• Write queries against the RDF to answer questions of the sort posed on the Alzforum's

BioRDF: NCBI
• NCBI Data: URIs and as RDF
• Terminology Integration: NLM’s UMLS, MeSH
  – SNOMED
• Olivier Bodenreider

BioRDF Neuro Tasks
• Aggregate facts and models around Parkinson’s Disease
• BIRN / Human Brain Project
• SWAN: scientific annotations and evidence
• Use RDF and OWL to describe
  – ‘Brain Connectivity’
  – Neuronal data in SenseLab

What does RDF get you?
• Structure is not format-rigid (i.e. tree)
  – Semantics not implicit in Syntax
  – No new parsers need to be defined for new data
• Entities can be anywhere on the web (URI)
• Define semantics into graph structures (ontologies)
  – Use rules to test data consistency and extract important relations
• Data can be merged into complete graphs
• Multiple ontologies supported
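The point that data can be merged into complete graphs can be illustrated with plain sets of triples: because entities are named by URIs, merging two graphs is a set union, and questions that neither source could answer alone become answerable. The sketch below assumes hypothetical `http://example.org/` URIs and a hypothetical helper `pathways_for_compound`; it is not tied to any particular RDF library.

```python
# Minimal sketch: merging two RDF-style graphs is a set union of triples.
# All URIs below are hypothetical example.org identifiers.

GSK3BETA = "http://example.org/protein/GSK3beta"
KENPAULLONE = "http://example.org/compound/kenpaullone"
WNT = "http://example.org/pathway/WNT"
FOR_TARGET = "http://example.org/vocab#forTarget"
PARTICIPATES_IN = "http://example.org/vocab#participatesIn"

# Two independently produced graphs that share one URI (the protein).
pathway_graph = {(GSK3BETA, PARTICIPATES_IN, WNT)}
chemistry_graph = {(KENPAULLONE, FOR_TARGET, GSK3BETA)}

merged = pathway_graph | chemistry_graph  # no schema mapping, no new parser


def pathways_for_compound(graph, compound):
    """Follow compound -> target -> pathway across the graph."""
    targets = {o for s, p, o in graph if s == compound and p == FOR_TARGET}
    return {o for s, p, o in graph if s in targets and p == PARTICIPATES_IN}


print(pathways_for_compound(chemistry_graph, KENPAULLONE))  # nothing to follow yet
print(pathways_for_compound(merged, KENPAULLONE))
```

The two-hop query only succeeds on the merged graph, because the shared protein URI is what joins the chemistry and pathway statements.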

RDF vs. XML example
Wang et al., Nature Biotechnology, Sept 2005
Diagram labels: AGML, HUPML

RDF Stripe Mode
Node > Edge > Node > Edge ….

RDF Graph

gsk:KENPAL rdf:type :Compound ;
    dc:source http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14698171 ;
    chemID "3820" ;
    clogP "2.4" ;
    kA "e-8" ;
    mw "327.17" ;
    ic50 { rdf:type :IC50 ; value "23" ; units :nM ; forTarget gsk:GSK3beta } ;
    chemStructure "C16H11BrN2O" ;
    rdfs:label "kenpaullone" ;
    synonym "bromo-paullone" ;
    smiles "C1C2=C(C3=CC=CC=C3NC1=O)NC4=C2C=C(C=C4)B" ;
    inChI "1/C16H11BrN2O/c17-9-5-6-14-11(7-9)12-8-15(20)18-13-4-2-1-3-10(13)16(12)1914/h1-7,19H,8H2,(H,18,20)/f/h18H" ;
    xref http://pubchem.ncbi.nlm.nih.gov/summary.cgi?cid=3820 .

Multiple Ontologies Used Together
Diagram labels: Disease, OMIM, UMLS, Group, FOAF, Disease Polymorphisms, SNP, Drug target ontology, UniProtein, BioPAX, Person, PubChem, Patent ontology, Chemical entity
Legend: Extant ontologies, Under development, Bridge concept

Case Studies

Case Study: NeuroCommons.org
• Public Data & Knowledge for CNS R&D
• Forum
• Available for industry and academia
• All based on Semantic Web Standards

NeuroCommons.org
The Neurocommons project, a collaboration between Science Commons and the Teranode Corporation, is creating a free, public Semantic Web for neurological research. The project has three distinct goals:
1. To demonstrate that scientific impact and innovation is directly related to the freedom to legally reuse and technically transform scientific information.
2. To establish a legal and technical framework that increases the impact of investment in neurological research in a public and clearly measurable manner.
3. To develop an open community of neuroscientists, funders of neurological research, technologists, physicians, and patients to extend the Neurocommons work in an open, collaborative, distributed manner.

HCLS Neuro Tasks
• Aggregate facts and models around Parkinson’s Disease
• SWAN: scientific annotations and evidence
• Use RDF and OWL to describe
  – Brain scans in The Whole Brain Atlas
  – Neural entries in NCBI’s Entrez Gene Database
  – ‘Brain Connectivity’
  – Neuronal data in SenseLab
  – Neurological Disease entries in OMIM

Conclusions: Key Semantic Web Principles
• Plan for change
• Free data from the application that created it
• Lower reliance on overly complex Middleware
• The value in "as needed" data integration
• Big wins come from many little ones
• The power of links - network effect
• Open-world, open solutions are cost effective
• Importance of "Partial Understanding"

What is the Semantic Web?
It’s Semantic Webs
It’s Text Extraction
It’s AI
It’s Web 2.0
It’s Data Tracking
It’s a Global Conspiracy
It’s Ontologies
• http://www.w3.org/2006/Talks/0125-hclsig-em/

W3C Roadmap
• Semantic Web foundation specifications
  – RDF, RDF Schema and OWL are W3C Recommendations as of Feb 2004
• Standardization work is underway in Query, Best Practices and Rules
• Goal of moving from a Web of Documents to a Web of Data
The Only Open and Web-based Data Integration Model Game in Town

The Current Web
• What the computer sees: “Dumb” links
• No semantics - treated just like
• Minimal machine-processable information

The Semantic Web
• Machine-processable semantic information
• Semantic context published – making the data more informative to both humans and machines

Google Graphs
Ranking Sites based on Topology
Associate Word frequencies with ranked sites

The Technologies: RDF
• Resource Description Framework
• W3C standard for making statements of fact or belief about data or concepts
• Descriptive statements are expressed as triples: (Subject, Verb, Object)
  – We call verb a “predicate” or a “property”
Diagram: Subject, Property, Object

What RDF Gets You
Universal, semantic connectivity supports the construction of elaborate structures.

Losing Connectedness in Tables
Fast Uptake and ease of use, but loose binding to entities and terms
Diagram labels: Casp2, Colon, Casp2, Endodermal, ?

Data Integration?
• Querying Databases is not sufficient
• Data needs to include the Context of Local Scientists
• Concepts and Vocabulary need to be associated
• More about Sociology than Technology
Diagram labels: Information, Knowledge

Standards - Why Not?
• Good when there’s a majority of agreement
• By vendors, for vendors?
• Mainly about Data Packing -- should be more about Semantics (user-defined)
• API dominated (Time trapped)
• Ease and Expressivity
• Too often they’re Brittle and Slow to develop
• “They’re great, that’s why there are so many of them”

Data Integration Enables Business Integration: Efficiency and Innovation
• Searching
• Visualization
• Analysis
• Reporting
• Notification
• Navigation

Searching…
#1 way for finding information in companies…