
- Number of slides: 45
Session id: 40263. Oracle Life Sciences Platform and 10g Preview. Charlie Berger, Sr. Director of Product Management, Life Sciences and Data Mining, charlie.berger@oracle.com, Oracle Corporation
Welcome to the Oracle Life Sciences User Group Meeting. Oracle HQ, Bldg 350 Conference Center, Redwood Shores, CA. September 10th, 2003, 8:30 am-7:30 pm
Oracle Life Sciences Day & User Group Meeting Agenda: 8:00-8:30 Breakfast; 8:30-8:45 Welcome; 8:45-9:45 Oracle's Platform for Life Sciences - New 10g Features Preview & Solicitation Process for Features in Next Release, Charlie Berger, Oracle Corporation; 9:45-10:30 New In Silico Drug Discovery Integrated Demo, Joyce Peng, Oracle Corporation; 10:30-10:50 Break; 10:50-11:30 European Bioinformatics Institute (EBI), Peter Stoehr: Managing Scientific Literature (Medline) and XML Data Within Oracle; 11:30-12:10 The Wellcome Trust Sanger Institute, Martin Widlake: Implementing a Terascale Data Store (20 TB); 12:10-1:00 Lunch & Wish List Feature Post-it Notes; 1:00-1:40 Wyeth Research, Peter Smith: 21 CFR Part 11 via Oracle Auditing at Wyeth
Oracle Life Sciences Day & User Group Meeting Agenda: 1:40-2:20 Sequence Search Capabilities in the Database, Myriad Proteomics; 2:20-3:00 Johnson & Johnson, Richard Guida & Rajesh Shah: Building a Secure Infrastructure with Oracle in Life Sciences, J&J PKI and Secure Connectivity to Oracle; 3:00-3:20 Break & Afternoon Refreshments; 3:20-4:00 Kyoto University, Japan, Susumu Goto: Integrating Biological Information and Pathways using Oracle, KEGG at Kyoto University; 4:00-4:40 BioMed Central Limited, Matthew Cockerill: Managing Scientific Images with Oracle - Multimedia Database Improves the Bottom Line; 4:40-5:20 Abbott Laboratories, Shon Naeymirad: Electronic Records, 21 CFR Part 11 and Oracle 9i; 5:20-5:30 Break; 5:30-6:30 ISV Lightning Rounds, Life Sciences ISV Partners; 6:30-7:30 ISV Reception and Demo Grounds
Oracle’s Commitment: "My industry is going to become pretty boring soon – I don't believe you'll ever see this proliferation of informatics companies or computer companies like you saw in the decade of the Nineties. The life sciences industry is where the horizons are wide open. There'll be lots and lots of companies born, lots of new products, lots of new science at least for the next 50 years. Because of that... we've decided to focus heavily on the life sciences industry." - Larry Ellison, CEO, Oracle Corporation, Bio-IT World magazine, premier issue, March 2002
Life Sciences Value Chain. [Diagram: Sample Data and Public/Private Data feed Discovery (In Silico, Wet Lab) and Development (Pre-Clinical Trials, Clinical Trials), followed by Manufacturing, Sales and Marketing. Participants shown: Pharmaceutical Company, Biotech/Pharmaceutical Research Labs, Biomedical Firm, Contract Research Organization, Regulatory Agency, Pharmaceutical Mfg. Plant, Distribution, Pharmacy, Hospital.]
Oracle’s Solutions for Life Sciences. [Diagram: Discovery; Development & Clinical; Manufacture/Supply Chain Management; Sales & Marketing; Finance; HR; Projects; Maintenance. Foundation: Database (manage all your data) and Application Server (run all your applications).]
Drug Discovery Economics 101: Better Data Management Accelerates Discovery. Goal: Accelerate the Discovery Process. [Chart: revenue versus years (axis marks at 15 and 20), showing R&D Costs, Product Launch Costs, Sales Revenue, Competition from Generics and Patent Expiry across the stages Identify and Validate Targets, Identify and Validate Leads, Pre-Clinical Trials, Clinical Trials.] Source: Ernst & Young, Price Waterhouse
Life Sciences Discovery: Genes and Proteins Run the Cell. [Figure labels: Organism, Cell, Nucleus, Chromosome, Gene (DNA), Gene (mRNA), Protein.] Graphics courtesy of the National Human Genome Research Institute
Life Sciences Challenge: Correlate Biological and DNA Variation. 3.2 billion letters of human DNA; ~2 million variation points (SNPs). SNP = Single Nucleotide Polymorphism: agaatttcat at[T/C]gtg gaagaggac. Graphics courtesy of the National Human Genome Research Institute
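The bracket notation on the slide, `[T/C]`, marks one position where two alleles are observed. As a minimal sketch, the two allele sequences could be expanded as follows. The function name `expand_snp` is hypothetical, and the code assumes the spaces in the slide's sequence are layout artifacts rather than part of the data.

```python
import re

def expand_snp(annotated):
    """Expand a single [X/Y] SNP marker into one sequence per allele."""
    # Assumption: whitespace in the slide text is a layout artifact.
    compact = "".join(annotated.split())
    pattern = r"([acgtACGT]*)\[([ACGT])/([ACGT])\]([acgtACGT]*)"
    match = re.fullmatch(pattern, compact)
    if match is None:
        raise ValueError("expected exactly one [X/Y] marker in a DNA string")
    left, allele_a, allele_b, right = match.groups()
    return [left + allele_a + right, left + allele_b + right]

# The example sequence from the slide.
alleles = expand_snp("agaatttcat at[T/C]gtg gaagaggac")
```

Each returned string differs from the other at exactly one position, which is what "single nucleotide polymorphism" means.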
Life Sciences Challenge: Correlate Diseases, Genes and Environment. Stroke; Breast cancer; Diabetes; Schizophrenia; Manic-depression; Myocardial Infarction; Hypertension; Obesity; Hyperlipidemia; Inflammatory Bowel Disease. Graphics courtesy of the National Human Genome Research Institute
Life Science Challenge: Exploding Volumes of Data. [Chart: data storage from 0 to 500 TB, 1994 to 2006, with "Today" marked.] “To meet the scientific goals we believe we need to add around 80-100 TB of storage each year for the next 5 years” - P. Butcher, The Sanger Centre
Life Science Challenge: Many Different Kinds of Data. Genomics; Proteomics; Modeling; Pathways; Clinical; Pharmacogenomics; Functional Genomics; Cheminformatics. Graphic modified from original courtesy of Sun Microsystems
Life Science Challenge Just A Few Biological Databases
Life Science Challenge: Typical Research Environment. Manage vast quantities of data; Access heterogeneous data; Find patterns and insights. [Diagram labels: Industrial Research Lab, Local Databases, Local Copies, Public Databases, Private/Service Databases, Partner or Collaborator.]
Oracle Vision: At the core is a data management platform. Run All Your Applications; Manage All Your Data. [Diagram labels: Clients (Browser, Mobile Device), Oracle 10g App Server, Oracle 10g Database Server.]
Introducing Oracle 10g • Runs all your applications • Stores all your information • Highly scalable, available, reliable • Secure • Easy to manage – Make individual systems self-managing – Manage thousands of servers at once
Oracle’s Platform for Life Sciences (Genomics, Proteomics, Cheminformatics, Pathways, Clinical): 1. Access heterogeneous data 2. Integrate a variety of data types 3. Manage vast quantities of data 4. Find patterns and insights 5. Collaborate securely
Oracle Life Sciences Platform: Access heterogeneous data; Manage vast quantities of data; Find patterns and insights
Oracle Life Sciences Platform (themes: Access heterogeneous data; Manage vast quantities of data; Find patterns and insights; Collaborate securely) • Transparent Gateways – Fast access using Oracle OCI • Generic Gateways – Access any data using ODBC • Distributed Queries – Perform searches across domains • External Tables – Ability to index and query external files • SQL*Loader – High performance data loader • Merge/Upsert – Enabling update and insert in one step • Transportable Tablespaces – Rapidly exchange tables • Oracle Streams – Rule-based subscription for information sharing • MySQL Toolkit – Easily move MySQL data into Oracle • UltraSearch – Search external sites & repositories • Real Application Clusters – Linear scalability • Application Server – Provide scalability for the middle tier • XML DB – Flexibly manage data • LOBs – Manage unstructured data • interMedia – Store & manage images • Text – Index & query text, e.g. literature searches • Extensibility Framework (Data cartridges) – Manage complex scientific data • iFS/Files – Share documents • Data Mining – Discover patterns & insights • Statistics – Perform basic statistics • Table Functions – Implement complex algorithms • OLAP & Discoverer – Interactive query & drill-down • Oracle Portal – Build personalized portals • Web Services – Standard communication between applications • Security – Enforce security • Auditing – Create audit trail to facilitate FDA compliance • Workflow – Automate laboratory & business processes • Collaboration Suite – Collaborate securely. [Example data sources shown: PubMed, GenBank, SwissProt SP-ML, MySQL.]
1. Access Heterogeneous Data. [Diagram labels: UltraSearch, External Sites, Distributed query, Flat files, External Table, Sybase, MySQL, Generic Connectivity, MySQL Migration Toolkit, DB links, Transportable Tablespaces, DB2, Transparent Gateway.]
1. Access Heterogeneous Data • Oracle Transparent Gateways – Integrate data from disparate systems • Generic Connectivity – ODBC/JDBC connectivity • Distributed Queries – Query across multiple Oracle and heterogeneous data sources • DB links – Connectivity between databases • External Tables – Access data from flat files • SQL*Loader – High performance data loader • Transportable Tablespaces – Rapidly move tablespaces between Oracle databases • Oracle Streams – Rule-based subscription for information sharing • UltraSearch – Query range of data repositories (web sites, files, email, databases, etc.) • Migration Toolkits – Tools to facilitate movement of data into Oracle • Merge/Upsert – Update and insert in one step
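The "update and insert in one step" behaviour of Merge/Upsert can be illustrated outside the database. The sketch below is plain Python, not Oracle's MERGE statement. The function `merge_rows` and the sample gene rows are hypothetical and exist only to show the matched/not-matched logic.

```python
def merge_rows(target, source, key):
    """Upsert: update rows whose key already exists, insert the rest.

    `target` maps key values to row dicts and is modified in place.
    Returns how many rows were updated and how many were inserted.
    """
    counts = {"updated": 0, "inserted": 0}
    for row in source:
        key_value = row[key]
        if key_value in target:
            # Matched: overwrite only the columns present in the incoming row.
            target[key_value].update(row)
            counts["updated"] += 1
        else:
            # Not matched: insert a copy of the incoming row.
            target[key_value] = dict(row)
            counts["inserted"] += 1
    return counts

# Illustrative data only.
genes = {"BRCA1": {"symbol": "BRCA1", "chromosome": "17", "note": "old"}}
incoming = [
    {"symbol": "BRCA1", "note": "updated annotation"},
    {"symbol": "TP53", "chromosome": "17", "note": "new"},
]
result = merge_rows(genes, incoming, key="symbol")
```

A single pass over the incoming rows handles both cases, which avoids running a separate update job followed by an insert job when reloading reference data.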
2. Integrate a Variety of Data Types. Genomics; Proteomics; Modeling; Pathways; Clinical; Pharmacogenomics; Functional Genomics; Cheminformatics. Graphic modified from original courtesy of Sun Microsystems
2. Integrate a Variety of Data Types • XML DB – Unite XML content and relational data – SQL & XML become one • LOBs – Manage unstructured data • Internet File System (Oracle Files) – Manage files and folders • Text – Index and query text content & documents (Word, PowerPoint, HTML, Adobe PDFs, etc.) • interMedia – Manage audio, video and image data
European Bioinformatics Institute (EBI) • Hosts major public databases (e.g. SwissProt, EMBL Nucleotide Sequence Database, Medline) on Oracle (total: > 5 TB) • Uses Oracle XML DB and Oracle Text for Medline (in development) – Size: 11 million records, 200 GB • Uses Oracle 9i Database and Application Server
2. Integrate a Variety of Data Types: Extensibility Framework (Data Cartridges) - Manage complex scientific data. [Diagram label: Oracle 9i Server]
Chemical Searching • Chemistry searching requires special techniques – Chemical name is not unique: “Viagra®”, “sildenafil citrate” – Chemists think graphically • The solution: – A graphical user interface – Specialized operators such as substructure search (“sss”), a chemical “contains”
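To make the idea of a chemical "contains" concrete, here is a deliberately simplified sketch of substructure search as subgraph matching. It is not how any particular chemistry cartridge works. The function `has_substructure` and the graph encoding (element symbols plus index pairs for bonds, ignoring bond orders and hydrogens) are assumptions made for illustration.

```python
def has_substructure(molecule, query):
    """Return True if `query` occurs as a subgraph of `molecule`.

    Each graph is (atoms, bonds): atoms is a list of element symbols and
    bonds is a set of frozenset({i, j}) atom-index pairs.
    """
    mol_atoms, mol_bonds = molecule
    q_atoms, q_bonds = query

    def extend(mapping):
        # mapping[q] is the molecule atom assigned to query atom q.
        q_idx = len(mapping)
        if q_idx == len(q_atoms):
            return True  # every query atom has been placed
        for m_idx, element in enumerate(mol_atoms):
            if element != q_atoms[q_idx] or m_idx in mapping:
                continue
            # Each query bond to an already placed atom must exist in the molecule.
            consistent = all(
                frozenset({mapping[prev], m_idx}) in mol_bonds
                for prev in range(q_idx)
                if frozenset({prev, q_idx}) in q_bonds
            )
            if consistent and extend(mapping + [m_idx]):
                return True
        return False

    return extend([])

def graph(atoms, bond_pairs):
    """Build the (atoms, bonds) tuple from a list of index pairs."""
    return (atoms, {frozenset(pair) for pair in bond_pairs})

ethanol = graph(["C", "C", "O"], [(0, 1), (1, 2)])   # C-C-O skeleton
c_o_bond = graph(["C", "O"], [(0, 1)])               # query: C-O
peroxide = graph(["O", "O"], [(0, 1)])               # query: O-O

contains_c_o = has_substructure(ethanol, c_o_bond)
contains_o_o = has_substructure(ethanol, peroxide)
```

A name lookup cannot answer this kind of question, since "Viagra" and "sildenafil citrate" share no text but describe the same structure. Production systems rely on far more efficient indexing than this exhaustive backtracking search.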
MDL Information Systems, Inc. • MDL Discovery Framework: a multi-tier system for managing and integrating discovery data and workflows – Domain-specific application and database services and API – Chemistry rules, drawing, and rendering – Single application access to multiple DBs and services • Key Advantages – Integrate data sources across R&D – Easily create web or client solutions – Quickly adopt new tools and methods for development • Oracle Features – Oracle 8i/9i Database Extensibility Option (chemical data cartridge) – Replication support – Oracle 9iAS J2EE services • www.mdl.com
IDBS • The ActivityBase Suite – Capture, manage and use chemical and biological data in life sciences discovery – Manage full range of disparate data types – The leading application for drug discovery research worldwide • Key Advantages – Integration framework for cheminformatics and bioinformatics data – Rich data context enables data quality – Supports manual and automated data capture & management – Maximizes the value of discovery data • Oracle Features – Chemistry cartridge (ChemXtra) – PL/SQL stored procedures – Java stored procedures – XML – Materialized views – Data warehousing – 9i compatible • www.id-bs.com
3. Manage Vast Quantities of Data • Grid support in Oracle 10g • Oracle Scales to Petabytes – Largest life sciences databases run Oracle – 80% market share - IDC • Partitioning – Divide and conquer • Oracle 10g Application Server – Provide scalability for middle tier • Oracle Data Guard – Protect data from human or system failures. [Chart: data storage from 0 to 500 TB, 1994 to 2006, with "Today" marked.]
3. Manage Vast Quantities of Data: Support for Grid • Distributed queries, External Tables, Security, RAC • Grid access to Oracle utilities through Globus Resource Allocation Manager (GRAM) – Export, Import, SQL*Plus • Grid access to Oracle 10g Database – Invoke PL/SQL routines specified in Globus Resource Specification Language • Grid Resource Information Service (GRIS) for Oracle Database – Discover & monitor Oracle databases
3. Manage Vast Quantities of Data • Real Application Clusters (RAC) – Start with one server, one database and grow as you grow – Linear scalability out of the box – Save on hardware and storage costs – Works with ALL applications – Fail-over transparent to users – Easy to administer. [Diagram labels: Data Loads, Proteomics Portal, Sample/Lab, High-speed interconnect, A-Z.]
Oracle Real Application Clusters: Works for All Applications. 1. Add new node 2. Start instance on new node. No code change
Oracle Real Application Clusters Greater Than 85% Scalability
Genentech, Inc. • Leading biotech company – Over 2 TB of data in Oracle, which serves as a centralized information resource for gene searching and database cross-referencing – Oracle used for the entire pipeline from research to clinical data to manufacturing and sales applications • Key Advantages of Oracle – Improved performance – Greater reliability – Genentech's corporate goal is 99.999% availability in a 24x7 environment • Oracle Environment – Oracle 9i Database – Real Application Clusters • "Oracle 9i Real Application Clusters provide the foundation for the scalable and highly available database infrastructure we require to meet our growing data demands in all areas of our business." -- Scooter Morris, Genentech, Inc.
The Dragon Genomics Center of Takara Bio Inc., specializing in large-scale sequencing, is among the highest speed genome-analyzing centers in Asia. • High-Level Project Goals – Manage data throughout every step of a complicated process – Create a laboratory information management system (LIMS) enabling large scale sequencing – Provide reliable backup and recovery of vast amounts of data • Key Benefits – Provided easy access and management for vast amounts of data – Ensured scalability needed to accommodate future growth • Oracle Environment – Oracle Database Enterprise Edition – Oracle 9iAS Enterprise Edition • "We trust Oracle in its ability to run terabyte-class databases in clustered environments with high availability. And we're pleased to say that Oracle has not disappointed us." -- Toru Suzuki, Project Manager, Dragon Genomics Center, Takara Bio Inc.
Bioinformatics Center, Institute for Chemical Research, Kyoto University. The Bioinformatics Center, Institute for Chemical Research, Kyoto University is leading biotechnology research thanks to its comprehensive studies in various areas, including the life sciences, information sciences, chemistry and physics. “In order to manage this massive amount of genetic information and to operate efficiently, it is essential to have a platform with paramount stability. Our web site receives accesses from all over the world continuously, 24 hours a day. In order to offer the latest information under such circumstances, performance is also an issue. In this sense, the Oracle Database was the most appropriate since it can handle this enormous amount of data in a fast and stable manner, 24 hours a day.” - Professor and Director Minoru Kanehisa, Bioinformatics Center, Institute for Chemical Research, Kyoto University
4. Find Patterns and Insights • Oracle Data Mining – Find relationships and clusters associated with healthy and diseased states – Naïve Bayes, Adaptive Bayes Networks, Attribute Importance, Association Rules, K-Means, O-Cluster, SVM, NMF algorithms – Data Mining for Java (DM4J) GUI wizards and results browser • Oracle Discoverer & Oracle OLAP – Interactive query & drill-down • Statistical functions – Perform basic statistics in Oracle, e.g. summary statistics (mean, stdev, median, quantiles), hypothesis testing, distribution fitting, correlations, linear regression • Oracle Text & Text Mining – Classify & cluster documents relevant to area of interest • Table Functions – Implement complex algorithms within the database
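For readers unfamiliar with the "basic statistics" listed above, the sketch below computes a few of them (mean, median, sample standard deviation, Pearson correlation) with Python's standard library. It only illustrates the quantities and does not represent Oracle's statistical functions. The helper names and all sample values are made up.

```python
import math
import statistics

def summarize(values):
    """Summary statistics for one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical measurements, for illustration only.
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
dose = [1.0, 2.0, 3.0, 4.0]
response = [2.0, 4.0, 6.0, 8.0]

summary = summarize(expression)
r = pearson(dose, response)
```

Computing such summaries where the data lives, as the slide suggests, avoids exporting large tables to a separate statistics package first.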
4. Find Patterns and Insights. Life sciences data: Functional Genomic Databases, Clinical Databases, Proteomics Database, Pharmacological Databases. Deductive Analysis: answer complex questions about the relationships in genomic, clinical and pharmacological data. Inductive Analysis: finding relationships for classification, class discovery and prediction.