Database Integration in the context of the GUS (“Genomics Unified Schema”) Data Warehouse

Jonathan Crabtree
Senior Programmer/Analyst
Computational Biology and Informatics Laboratory
Center for Bioinformatics
University of Pennsylvania
http://www.cbil.upenn.edu

2001 Bioinformatics and Data Integration
Copyright © 2001 Computational Biology and Informatics Laboratory

Outline
• The problem(s) of database integration
• A warehouse-based solution
• The databases involved
  – GUS: an integrated warehouse of sequence data
  – RAD: a gene expression database
• Resources developed using GUS and/or RAD
• Queries enabled by the system
• Future directions

The Purpose of GUS
• Integrate existing databases and tools
  – a single point of access to what is already known
• Provide an automated “lab notebook”
  – a permanent record of work in progress
  – e.g., similarity searches, array data, etc.
• And ultimately: support data mining
  – a potential source of novel discoveries

What is GUS?
• A relational schema with over 180 tables
  – integrates many types of data relevant to genomics
  – is based on the central dogma of biology
  – represents organisms, biological systems, projects
• A relational data warehouse that contains
  – public sequence data
  – gene & genome annotation generated “in-house”
  – additional data sets from collaborators
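To make "based on the central dogma of biology" concrete, here is a minimal sketch of what such a schema shape might look like. It is an illustration only: the table and column names (taxon, gene, rna, protein) and the sample rows are hypothetical placeholders, not taken from the actual GUS schema, and SQLite stands in for whatever relational database the warehouse runs on.

```python
import sqlite3

# Hypothetical, heavily simplified tables. The real GUS schema has over
# 180 tables; these names are assumptions made for illustration.
SCHEMA = """
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE gene (
    gene_id  INTEGER PRIMARY KEY,
    taxon_id INTEGER NOT NULL REFERENCES taxon(taxon_id),
    symbol   TEXT NOT NULL
);
CREATE TABLE rna (
    rna_id    INTEGER PRIMARY KEY,
    gene_id   INTEGER NOT NULL REFERENCES gene(gene_id),
    accession TEXT NOT NULL
);
CREATE TABLE protein (
    protein_id  INTEGER PRIMARY KEY,
    rna_id      INTEGER NOT NULL REFERENCES rna(rna_id),
    description TEXT
);
"""


def build_demo_db():
    """Create an in-memory database with one placeholder row per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO taxon VALUES (1, 'Mus musculus')")
    conn.execute("INSERT INTO gene VALUES (10, 1, 'demo_gene')")
    conn.execute("INSERT INTO rna VALUES (100, 10, 'DEMO_RNA_1')")
    conn.execute("INSERT INTO protein VALUES (1000, 100, 'demo protein')")
    conn.commit()
    return conn


def proteins_for_taxon(conn, taxon_name):
    """Follow gene -> RNA -> protein for one organism in a single query."""
    rows = conn.execute(
        """
        SELECT g.symbol, r.accession, p.description
        FROM taxon t
        JOIN gene g    ON g.taxon_id = t.taxon_id
        JOIN rna r     ON r.gene_id = g.gene_id
        JOIN protein p ON p.rna_id = r.rna_id
        WHERE t.name = ?
        ORDER BY g.symbol
        """,
        (taxon_name,),
    )
    return rows.fetchall()
```

Because each table points back to the one before it, a single join can walk from an organism through its genes and transcripts to proteins, which is the kind of single point of access a warehouse is meant to provide.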

What is GUS (II)?
• A Perl API and annotation subsystem
  – lightweight object layer
  – supports high-level programmatic access...
  – but does not mandate it (i.e., SQL is an option)
• A generic user interface
  – Java Servlet-based (Apache JServ)
  – supports browsing
  – and also restricted ad-hoc queries
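The idea of a lightweight object layer that supports high-level access without mandating it can be sketched roughly as follows. This is not the GUS Perl API: the sketch is in Python, and the class names (Row, Gene, Rna), the method names, the tables and the sample rows are all hypothetical, chosen only to show the pattern.

```python
import sqlite3


class Row:
    """Minimal object wrapper around one table row (hypothetical design)."""

    table = None  # table name, set by subclasses
    key = None    # primary-key column, set by subclasses

    def __init__(self, conn, **fields):
        self.conn = conn
        self.fields = fields

    @classmethod
    def retrieve(cls, conn, key_value):
        """Load a single row by primary key, or return None if absent."""
        cur = conn.execute(
            f"SELECT * FROM {cls.table} WHERE {cls.key} = ?", (key_value,)
        )
        row = cur.fetchone()
        if row is None:
            return None
        names = [d[0] for d in cur.description]
        return cls(conn, **dict(zip(names, row)))

    def children(self, child_cls):
        """Load rows of child_cls whose foreign key matches this row.

        Assumes the child table names its foreign-key column after the
        parent's primary key (an assumption made for this sketch).
        """
        cur = self.conn.execute(
            f"SELECT * FROM {child_cls.table} WHERE {self.key} = ?",
            (self.fields[self.key],),
        )
        names = [d[0] for d in cur.description]
        return [child_cls(self.conn, **dict(zip(names, r))) for r in cur.fetchall()]


class Gene(Row):
    table, key = "gene", "gene_id"


class Rna(Row):
    table, key = "rna", "rna_id"


def demo_connection():
    """Build a tiny in-memory database with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT);
        CREATE TABLE rna (rna_id INTEGER PRIMARY KEY, gene_id INTEGER, accession TEXT);
        INSERT INTO gene VALUES (10, 'demo_gene');
        INSERT INTO rna VALUES (100, 10, 'DEMO_RNA_1');
        INSERT INTO rna VALUES (101, 10, 'DEMO_RNA_2');
        """
    )
    return conn
```

Object-style access would read `Gene.retrieve(conn, 10).children(Rna)`, while the same connection still accepts plain SQL such as `conn.execute("SELECT COUNT(*) FROM rna")`. The layer sits beside the database rather than in front of it.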

Schema Overview
• Ontologies
  – GO
  – Species
  – Tissue
  – Dev. Stage
• Special Features
  – Ownership
  – Protection
  – Algorithm
  – Evidence
  – Similarity
  – Versioning
• Genomic Sequence
  – Genes, gene models
  – STSs, repeats, etc.
  – Cross-species analysis
• Transcribed Sequence
  – Characterize transcripts
  – RH mapping
  – Library analysis
  – Cross-species analysis
  – DOTS
• Protein Sequence
  – Domains
  – Function
  – Structure
  – Cross-species analysis
• RAD (RNA Abundance DB): Transcript Expression
  – Arrays
  – SAGE
  – Conditions
• Pathways, Networks (under development)
  – Representation
  – Reconstruction
(Diagram labels: free text; under development)

Views of GUS
To support in-depth annotation of specific organisms and biological systems:
• Organisms
  – Human
  – Mouse
  – Plasmodium falciparum
• Biological systems
  – CNS
  – Endocrine pancreas
  – Hematopoiesis
(Diagram: organism and biological-system views of GUS; *not drawn to scale*)

URLs/References
• UPenn Computational Biology and Informatics Laboratory (CBIL)
  – http://www.cbil.upenn.edu
• GUS (“Genomics Unified Schema”)
  – schema browser:
    • http://www.allgenes.org/cgi-bin/schemaBrowser.pl?db=GUS
  – an example of a controlled vocabulary/ontology in GUS:
    • http://www.cbil.upenn.edu/anatomy.php3
  – sites implemented (in part or whole) as views of GUS:
    • http://www.allgenes.org
    • http://www.plasmodb.org
    • http://www.cbil.upenn.edu/EPConDB
  – reference: K2/Kleisli and GUS: Experiments in integrated access to genomic data sources. Davidson, S. B., Crabtree, J., Brunk, B. P., Schug, J., Tannen, V., Overton, G. C., Stoeckert, C. J., Jr. IBM Systems Journal: in press.

URLs/References (II)
• RAD (“RNA Abundance Database”)
  – schema browser:
    • http://www.cbil.upenn.edu/cgi-bin/RAD2/schemaBrowserRAD.pl
  – public web site:
    • http://www.cbil.upenn.edu/RAD2
  – reference: A relational schema for both array-based and SAGE gene expression experiments. Stoeckert, C., Pizarro, A., Manduchi, E., Gibson, M., Brunk, B., Crabtree, J., Schug, J., Shen-Orr, S., Overton, G. C. Bioinformatics: in press.