- Number of slides: 34
Virginia Center for Grid Research The Global Bio Grid Andrew Grimshaw University of Virginia January, 2006
• Why Bio Grids? • Grid Basics • The Global Bio Grid
In ten years the world will be very different.
Think back ten years. • No web • Widespread internet was new • Human Genome Project still far from completion • Science (biology) done primarily in individual labs
Today • Billions a year in e-commerce • Internet everywhere • Broadband to your home • Wireless becoming pervasive • Pervasive devices are proliferating – motes • Sequencing of organisms a daily event • Bioinformatics hitting the mainstream
Tomorrow • $1000/sequence for humans – becomes standard clinical practice • “Biology is becoming an information science” (Large-Scale Biomedical Science: Exploring Strategies for Future Research, Institute of Medicine, National Research Council, 2003) • Global interconnected networks – grids • Provide transparent, secure access to data, applications, and on-demand compute • Research using not just your data, but all trusted data; not just your applications, but any trusted application • Implications for progress are significant
There are a number of “catches” • So much data! • So many organizations with so little trust! • So much complexity!
An IT guy’s view • Data is all over, in all different forms, with lots of different policies • Need to get the right data to the right place at the right time • Ontology problem – how do we compare and integrate the databases? • Need to understand semantics, automatically transform • Semantics • Knowledge Discovery – “mining”
This is where grids enter the picture (we do the plumbing)
Some lessons learned • 10+ years in academic and commercial grids • All/most problems are not technical • Users don’t want change! • Too many grids are technology-centric • Must keep “activation energy” low • Need a user-centric approach • There are at least four classes of users • Wide variance in computational savvy
What is a Grid? A grid is all about gathering together resources and making them accessible to users and applications. A grid enables users to collaborate securely by sharing processing, applications, work flows and processes, and data across heterogeneous systems and administrative domains for collaboration, faster application execution, and easier access to data. The emphasis is on secure access to a wide variety of resources
Characteristics of Grid systems • Numerous Resources • Connected by Heterogeneous, Multi-Level Networks • Ownership by Mutually Distrustful Organizations & Individuals • Different Security Requirements & Policies Required • Potentially Faulty Resources • Resources are Heterogeneous • Different Resource Management Policies • Geographically Separated
What grids are not • The solution to all problems • Clusters of machines • [email protected] • Any one particular technology
User’s view [Diagram labels: Users access data, run programs, provide shared services; Users collaborate; Grid; Site 0, Site 1, Site 2, Site 3; HPSS; Cluster]
Grid Computing Scenarios • Partner Grids: multiple owners, sites, domains; multiple file systems; Internet connectivity • Campus/Enterprise Grids: multiple owners, domains; multiple file systems; WAN connection • Cluster Grids: single owner, department, project; single domain, file system; LAN connection • Desktop Cycle Aggregation: limited acceptance in commercial enterprises
Standards • Global Grid Forum – ggf.org • OGSA – Open Grid Services Architecture • Web-Services based IPC • WSRF and possibly others • OGSA-BES – Basic Execution Service • OGSA-ByteIO – file IO • WS-Naming – abstract name to EPR (endpoint reference) • RNS-lite – Resource Name Space
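The two naming bullets above can be pictured with a toy model: a name space maps a human-readable path to an abstract name, and the abstract name is resolved to whatever address the resource currently has. The Python sketch below is only an in-memory illustration under that assumption. The class names, the URN, and the URLs are hypothetical and are not the actual WS-Naming or RNS interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointReference:
    """Toy stand-in for an EPR: an abstract name plus a current address."""
    abstract_name: str  # location-independent identifier
    address: str        # where the resource can be reached right now


class NameSpace:
    """Hypothetical in-memory directory: path -> abstract name -> address."""

    def __init__(self):
        self._paths = {}     # human-readable path -> abstract name
        self._bindings = {}  # abstract name -> current address

    def bind(self, path, abstract_name, address):
        self._paths[path] = abstract_name
        self._bindings[abstract_name] = address

    def move(self, abstract_name, new_address):
        # The resource migrated; its abstract name stays the same.
        self._bindings[abstract_name] = new_address

    def lookup(self, path):
        name = self._paths[path]
        return EndpointReference(name, self._bindings[name])

    def resolve(self, abstract_name):
        # Re-resolve an abstract name to a fresh endpoint reference.
        return EndpointReference(abstract_name, self._bindings[abstract_name])


# Usage: the client keeps the abstract name and rebinds after a migration.
ns = NameSpace()
ns.bind("/bio/uniprot", "urn:example:db-42", "https://site0.example.org/uniprot")
epr = ns.lookup("/bio/uniprot")
ns.move("urn:example:db-42", "https://site2.example.org/uniprot")
fresh = ns.resolve(epr.abstract_name)
print(epr.address, "->", fresh.address)
```

The point of the indirection is that a client holding a stale address can recover by asking again with the abstract name, without knowing where the resource moved.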
The Global Bio Grid
GBG concept • Federated access to multiple: data sources (public databases; commercial databases; in-house databases, annotations, etc.); application suites (including processes and workflows); compute resources • Shared among collaborative research teams: multiple research locations; virtual organizations • Built on evolving computing standards (GGF, I3C, WS-*)
Global Bio Grid • Datagrid using Avaki DG technology: ADG available free for “.edu”; UVA, NCBIO, U-Texas, Texas Tech; already operational; flat file and relational; working on an OGSA-compliant implementation • Compute grid at UVA on-line: 64 dual-processor Opterons available; Sunfires; hundreds of Windows machines; Legion 1.8 based – moving towards OGSA-compliant services • Applications: Biomarker; searching PubMed; hospital info integration
Three resource classes illustrate the Grid-effect • Data • Processing • Applications
Data • Suppose you have collaborators with critical databases (clinical, protein, other) that you need to use. • You use a number of databases that change on a regular basis. • You want to “mine” heterogeneous data sets (relational, flat-file, XML, …) in different locations – say in a hospital. • You want to produce, consume, or share derivative data products, e.g., the result of a set of joins and data transformation steps. • This applies to business data (BI/EII) as well as life science data.
DataGrid: Unifying fabric for data access • Transparent access to multiple DBs • Multiple domains • Highly secure, flexible access control • Automatic cache management and coherence [Diagram labels: Public DB (PDB, NCBI, EMBL); Data (SEQ_1, SEQ_2, SEQ_3); APP 1, APP 2; Research Institution; Biology Partner Institution; Biochemistry Partner Institution]
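The slide does not say how cache coherence is achieved, so the following Python sketch shows just one common approach as an assumption: the local cache asks the origin for a cheap version stamp before serving a copy, and refetches only when the stamp has changed. `OriginDatabase` and `CoherentCache` are hypothetical names for illustration, not part of Avaki DG.

```python
class OriginDatabase:
    """Hypothetical remote data source that versions each record."""

    def __init__(self):
        self._records = {}    # key -> (version, value)
        self.fetch_count = 0  # number of full-record transfers served

    def put(self, key, value):
        version = self._records.get(key, (0, None))[0] + 1
        self._records[key] = (version, value)

    def version(self, key):
        # Cheap metadata query: no payload is transferred.
        return self._records[key][0]

    def fetch(self, key):
        self.fetch_count += 1
        return self._records[key]


class CoherentCache:
    """Local cache that validates its copy against the origin's version."""

    def __init__(self, origin):
        self._origin = origin
        self._cache = {}  # key -> (version, value)

    def get(self, key):
        current = self._origin.version(key)
        cached = self._cache.get(key)
        if cached is None or cached[0] != current:
            # Missing or stale: pull the full record once and remember it.
            cached = self._origin.fetch(key)
            self._cache[key] = cached
        return cached[1]


# Usage: repeated reads are served locally until the origin changes.
origin = OriginDatabase()
origin.put("SEQ_1", "ACGT")
cache = CoherentCache(origin)
cache.get("SEQ_1")
cache.get("SEQ_1")
origin.put("SEQ_1", "ACGTTT")
print(cache.get("SEQ_1"), origin.fetch_count)
```

Validation on every read trades a small round trip for never serving stale data; a real data grid might instead use invalidation callbacks or time-based leases.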
Three Concrete Examples • KDS – “data mining” on widely separated data sets such as PubMed • “Map” UniProt datasets into the data grid, so researchers no longer need to spend time downloading the latest version • Extended Hospital
Extended Hospital [Diagram labels: HOSPITAL; Non-related Hospitals; Authorized Family; Data Warehouse; Clinics / Large Practices; Research Department; Domain Data; Emergency vehicles; Insurance companies]
Processing • Classic high-throughput computing • Suppose you have thousands of computationally intensive jobs to run • SW, CHARMm, Sequest, a.out • Your usage is bursty – you need a lot over a short period of time, but often have idle resources • You wish you had more!
Compute Grid: Shared access to processing • Flexible, location-independent access to virtually unlimited processing, on-demand • Scheduling, usage, management policies • System detects, recovers from job failures • Heterogeneous platform support • Usage accounting, as required [Diagram labels: Public DB (PDB, NCBI, EMBL); Data (SEQ_1, SEQ_2, SEQ_3); Processing (Cluster 1, Cluster 2, Cluster N); APP 1, APP 2; Research Institution; Biology Partner Institution; Biochemistry Partner Institution]
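To make "detects, recovers from job failures" concrete for a high-throughput workload, here is a minimal Python sketch that farms independent jobs out to a worker pool and resubmits any job that raises, up to a retry limit. It uses the standard-library `concurrent.futures` thread pool as a stand-in for grid nodes. `run_all` and `flaky_worker` are hypothetical names, and this is not how Legion schedules work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(jobs, worker, max_workers=4, max_attempts=3):
    """Run worker(job) for every job, resubmitting jobs that raise.

    Jobs must be hashable and unique. Returns (results, failed): results maps
    job -> return value, failed maps job -> last exception for jobs that
    exhausted max_attempts.
    """
    results, failed = {}, {}
    attempts = {job: 0 for job in jobs}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {}  # future -> job
        for job in jobs:
            attempts[job] += 1
            pending[pool.submit(worker, job)] = job
        while pending:
            # Iterate over a snapshot; resubmissions are picked up next round.
            for future in as_completed(list(pending)):
                job = pending.pop(future)
                try:
                    results[job] = future.result()
                except Exception as exc:
                    if attempts[job] < max_attempts:
                        attempts[job] += 1
                        pending[pool.submit(worker, job)] = job
                    else:
                        failed[job] = exc
    return results, failed


# Usage: a worker whose first attempt fails for every third job.
seen = set()


def flaky_worker(job):
    if job % 3 == 0 and job not in seen:
        seen.add(job)
        raise RuntimeError("simulated node failure")
    return job * job


results, failed = run_all(range(12), flaky_worker)
print(len(results), "succeeded;", len(failed), "failed")
```

Because the jobs are independent, a failed job can simply be rerun elsewhere; the retry cap keeps a job that always fails from looping forever.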
Concrete Examples • Biomarkers project wants to run Sequest-2 using public databases • Charmm/Amber • Gnomad (Altman et al.) • BLAST, FASTA, … • Autodock
Applications • Suppose you want to use applications or workflows developed, maintained, and supported by others – without the hassle of installing all of them on your gear. • Suppose you want to couple multiple applications developed at different institutions together.
Grid users share applications, employing multiple data & processing resources • Flexible binary management • No need to recompile applications • Securely share applications • Restrict who gains access • Restrict where apps run [Diagram labels: Public DB (PDB, NCBI, EMBL); Data (SEQ_1, SEQ_2, SEQ_3, SEQ_N); Processing (Cluster 1, Cluster 2, Cluster N); Applications (APP 1, APP 2, APP N); Research Institution; Biology Partner Institution; Biochemistry Partner Institution]
Better Research, Faster • Secure, wide-area access to global breadth of consistent, current data • Access to vast processing power • Ability to securely share proprietary data and applications, as needed [Diagram labels: Public DB (PDB, NCBI, EMBL); Data (SEQ_1, SEQ_2, SEQ_3); Processing (Cluster 1, Cluster 2, Cluster N); Applications (APP 1, APP 2, APP N); Research Institution; Biology Partner Institution; Biochemistry Partner Institution]
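The two restrictions on shared applications (who may use an application, and where it may run) can be read as a per-application policy that the scheduler consults before placing a job. The Python sketch below assumes a simple allow-list model. `AppPolicy` and `eligible_sites` are hypothetical names, and the slides do not describe the real policy mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class AppPolicy:
    """Hypothetical sharing policy attached to one application."""
    allowed_users: set = field(default_factory=set)  # who gains access
    allowed_sites: set = field(default_factory=set)  # where the app may run


def eligible_sites(policy, user, candidate_sites):
    """Return the candidate sites on which this user may run the application."""
    if user not in policy.allowed_users:
        return []
    return [site for site in candidate_sites if site in policy.allowed_sites]


# Usage: APP 1 is shared with one collaborator and limited to two clusters.
app1 = AppPolicy(allowed_users={"alice"}, allowed_sites={"Cluster 1", "Cluster 2"})
print(eligible_sites(app1, "alice", ["Cluster 1", "Cluster N"]))
print(eligible_sites(app1, "bob", ["Cluster 1", "Cluster N"]))
```

Keeping the policy with the application, rather than with each site, lets the owner change sharing rules in one place.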
Summary: Evolution in action [Timeline diagram. Era labels: 50’s; 60’s to 80’s; Today; Now & Future! Stage labels: Bare Metal Programming; Batch OS; Multi-User Timeshare; Low Level Network Programming; Grid & WS]
Summary • Grids will have a huge impact on the life sciences • Prototype GBG operational • Applications are underway • We’re always looking for new applications