  • Number of slides: 34

Virginia Center for Grid Research
The Global Bio Grid
Andrew Grimshaw, University of Virginia
January 2006

 • Why Bio Grids?
 • Grid Basics
 • The Global Bio Grid

In ten years the world will be very different.

Think back ten years.
 • No web
 • Widespread internet was new
 • Human Genome Project still far from completion
 • Science (biology) done primarily in individual labs

Today
 • Billions a year in e-commerce
 • Internet everywhere
 • Broadband to your home
 • Wireless becoming pervasive
 • Pervasive devices are proliferating (motes)
 • Sequencing of organisms is a daily event
 • Bioinformatics is hitting the mainstream

Tomorrow
 • $1000/sequence for humans; sequencing becomes standard clinical practice
 • “Biology is becoming an information science” (Large-Scale Biomedical Science: Exploring Strategies for Future Research, Institute of Medicine, National Research Council, 2003)
 • Global interconnected networks: grids
   • Provide transparent, secure access to data, applications, and on-demand compute
   • Research using not just your data but all trusted data, and not just your applications but any trusted application
   • Implications for progress are significant

There are a number of “catches”
 • So much data!
 • So many organizations with so little trust!
 • So much complexity!

An IT guy's view
 • Data is all over, in many different forms, with lots of different policies
 • Need to get the right data to the right place at the right time
 • Ontology problem: how do we compare and integrate the databases?
 • Need to understand semantics and transform data automatically
 • Semantics
 • Knowledge discovery (“mining”)

This is where grids enter the picture (we do the plumbing)

Some lessons learned
 • 10+ years in academic and commercial grids
 • All/most problems are not technical
 • Users don’t want change!
 • Too many grids are technology-centric
 • Must keep “activation energy” low
 • Need a user-centric approach
 • There are at least four classes of users
 • Wide variance in computational savvy

What is a Grid?
A grid is all about gathering together resources and making them accessible to users and applications. A grid enables users to collaborate securely by sharing processing, applications, workflows and processes, and data across heterogeneous systems and administrative domains for collaboration, faster application execution, and easier access to data. The emphasis is on secure access to a wide variety of resources.

Characteristics of Grid systems
 • Numerous resources
 • Connected by heterogeneous, multi-level networks
 • Ownership by mutually distrustful organizations and individuals
 • Different security requirements and policies required
 • Potentially faulty resources
 • Resources are heterogeneous
 • Different resource management policies
 • Geographically separated

What grids are not
 • The solution to all problems
 • Clusters of machines
 • [email protected]
 • Any one particular technology

Users' view
 • Users access data
 • Run programs
 • Provide shared services
 • Users collaborate
[Diagram: users connect through the Grid to Site 0, Site 1, Site 2, Site 3, HPSS, and a Cluster.]

Grid Computing Scenarios
 • Partner Grids
   • Multiple owners, sites, domains
   • Multiple file systems
   • Internet connectivity
 • Campus/Enterprise Grids
   • Multiple owners, domains
   • Multiple file systems
   • WAN connection
 • Cluster Grids
   • Single owner, department, project
   • Single domain, file system
   • LAN connection
 • Desktop Cycle Aggregation
   • Limited acceptance in commercial enterprises

Standards
 • Global Grid Forum (ggf.org)
 • OGSA: Open Grid Services Architecture
   • Web-Services-based IPC
   • WSRF and possibly others
   • OGSA-BES: Basic Execution Service
   • OGSA-ByteIO: file I/O
   • WS-Naming: abstract name to EPR (endpoint reference)
   • RNS-lite: Resource Name Space

The Global Bio Grid

GBG concept
 • Federated access to multiple
   • Data sources
     • Public databases
     • Commercial databases
     • In-house databases, annotations, etc.
   • Application suites (including processes and workflows)
   • Compute resources
 • Shared among collaborative research teams
   • Multiple research locations
   • Virtual organizations
 • Built on evolving computing standards (GGF, I3C, WS-*)

Global Bio Grid
 • Data grid using Avaki DG (ADG) technology
   • ADG available free for “.edu”
   • UVA, NCBIO, U-Texas, Texas Tech
   • Already operational
   • Flat file and relational
   • Working on an OGSA-compliant implementation
 • Compute grid at UVA on-line
   • 64 dual-processor Opterons available
   • Sun Fires
   • Hundreds of Windows machines
   • Legion 1.8 based, moving towards OGSA-compliant services
 • Applications
   • Biomarker
   • Searching PubMed
   • Hospital info integration

Three resource classes illustrate the Grid-effect
 • Data
 • Processing
 • Applications

Data
 • Suppose you have collaborators with critical databases (clinical, protein, other) that you need to use.
 • You use a number of databases that change on a regular basis.
 • You want to “mine” heterogeneous data sets (relational, flat-file, XML, …) in different locations, say in a hospital.
 • You want to produce, consume, or share derivative data products, e.g., the result of a set of joins and data transformation steps.
 • This applies to business data (BI/EII) as well as life science data.

DataGrid: Unifying fabric for data access
 • Transparent access to multiple DBs
 • Multiple domains
 • Highly secure, flexible access control
 • Automatic cache management and coherence
[Diagram: public DBs (PDB, NCBI, EMBL) and sequence data (SEQ_1, SEQ_2, SEQ_3) joined in a shared Data layer across a Research Institution and Biology and Biochemistry Partner Institutions running APP 1 and APP 2.]
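The "transparent access" and "cache coherence" bullets can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how Avaki DG or any OGSA service is implemented: the `Source` and `DataGrid` classes, the `mount` and `read` methods, and the version-number coherence check are all hypothetical names and choices made for illustration.

```python
from typing import Dict, Tuple


class Source:
    """A hypothetical remote database holding versioned records."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._records: Dict[str, Tuple[int, str]] = {}  # key -> (version, value)
        self.reads = 0  # counts full fetches, to show when the cache is used

    def put(self, key: str, value: str) -> None:
        # Every update bumps the record's version number.
        version = self._records.get(key, (0, ""))[0] + 1
        self._records[key] = (version, value)

    def version(self, key: str) -> int:
        return self._records[key][0]

    def fetch(self, key: str) -> Tuple[int, str]:
        self.reads += 1
        return self._records[key]


class DataGrid:
    """Maps logical paths such as 'ncbi/SEQ_1' onto mounted sources."""

    def __init__(self) -> None:
        self._mounts: Dict[str, Source] = {}
        self._cache: Dict[str, Tuple[int, str]] = {}

    def mount(self, prefix: str, source: Source) -> None:
        self._mounts[prefix] = source

    def read(self, path: str) -> str:
        prefix, key = path.split("/", 1)
        source = self._mounts[prefix]
        cached = self._cache.get(path)
        # Coherence check (assumed scheme): reuse the cached copy only if
        # its version still matches the version held by the source.
        if cached is not None and cached[0] == source.version(key):
            return cached[1]
        self._cache[path] = source.fetch(key)
        return self._cache[path][1]


if __name__ == "__main__":
    ncbi = Source("NCBI")
    ncbi.put("SEQ_1", "ACGT")
    grid = DataGrid()
    grid.mount("ncbi", ncbi)
    print(grid.read("ncbi/SEQ_1"))  # first read fetches from the source
    print(grid.read("ncbi/SEQ_1"))  # second read is served from the cache
```

The user only ever sees logical paths; which institution holds the record, and whether the value came from a cache, stays hidden behind `read`.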

Three Concrete Examples
 • KDS: “data mining” on widely separated data sets such as PubMed
 • “Map” UniProt datasets into the data grid
   • Researchers no longer need to spend time downloading the latest datasets
 • Extended Hospital

Extended Hospital
[Diagram: a HOSPITAL and its domain data connected to non-related hospitals, authorized family, a data warehouse, clinics / large practices, the research department, emergency vehicles, and insurance companies.]

Processing
 • Classic high-throughput computing
 • Suppose you have thousands of computationally intensive jobs to run
   • SW, CHARMm, Sequest, a.out
 • Your usage is bursty: you need a lot over a short period of time, but often have idle resources
 • You wish you had more!

Compute Grid: Shared access to processing
 • Flexible, location-independent access to virtually unlimited processing, on-demand
 • Scheduling, usage, management policies
 • System detects, recovers from job failures
 • Heterogeneous platform support
 • Usage accounting, as required
[Diagram: a Processing layer (Cluster 1, Cluster 2, … Cluster N) added above the Data layer of public DBs (PDB, NCBI, EMBL) and sequence data (SEQ_1, SEQ_2, SEQ_3), shared across a Research Institution and Biology and Biochemistry Partner Institutions running APP 1 and APP 2.]
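The "detects, recovers from job failures" bullet can be sketched as a submit loop that falls back to another resource when one fails. This is an illustrative sketch only, not the Legion or OGSA-BES mechanism: `Cluster`, `JobResult`, `submit`, and the simulated `flaky` and `healthy` runners are hypothetical names, and a real grid would also handle scheduling policy, timeouts, and accounting.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Cluster:
    """A hypothetical compute resource; `run` may raise to simulate a fault."""
    name: str
    run: Callable[[str], str]


@dataclass
class JobResult:
    job: str
    output: str
    attempts: List[str] = field(default_factory=list)  # clusters tried, in order


def submit(job: str, clusters: List[Cluster]) -> JobResult:
    """Try each cluster in turn until one completes the job."""
    attempts: List[str] = []
    for cluster in clusters:
        attempts.append(cluster.name)
        try:
            return JobResult(job, cluster.run(job), attempts)
        except RuntimeError:
            continue  # failure detected: move on to the next cluster
    raise RuntimeError(f"job {job!r} failed on all clusters: {attempts}")


def flaky(job: str) -> str:
    # Simulated fault, standing in for a crashed node.
    raise RuntimeError("node crashed")


def healthy(job: str) -> str:
    return f"done:{job}"


if __name__ == "__main__":
    grid = [Cluster("cluster-1", flaky), Cluster("cluster-2", healthy)]
    result = submit("blast-query-17", grid)
    print(result.output, "after trying", result.attempts)
```

The caller names the job, not the machine; which cluster ended up running it is reported back only as bookkeeping.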

Concrete Examples
 • Biomarkers project wants to run Sequest-2 using public databases
 • Charmm/Amber
 • Gnomad (Altman et al.)
 • BLAST, FASTA, …
 • Autodock

Applications
 • Suppose you want to use applications or workflows developed, maintained, and supported by others, without the hassle of installing all of them on your gear.
 • Suppose you want to couple multiple applications developed at different institutions together.

Grid users share applications, employing multiple data & processing resources
 • Flexible binary management
   • No need to recompile applications
 • Securely share applications
   • Restrict who gains access
   • Restrict where apps run
[Diagram: three layers, Applications (APP 1, APP 2, … APP N), Processing (Cluster 1, Cluster 2, … Cluster N), and Data (public DBs PDB, NCBI, EMBL and sequence data SEQ_1 … SEQ_N), shared across a Research Institution and Biology and Biochemistry Partner Institutions.]

Better Research, Faster
 • Secure, wide-area access to global breadth of consistent, current data
 • Access to vast processing power
 • Ability to securely share proprietary data and applications, as needed
[Diagram: the same three layers, Applications (APP 1, APP 2, … APP N), Processing (Cluster 1, Cluster 2, … Cluster N), and Data (PDB, NCBI, EMBL, SEQ_1, SEQ_2, SEQ_3), shared across a Research Institution and Biology and Biochemistry Partner Institutions.]

Summary: Evolution in action
[Figure: timeline with eras 50’s, 60’s to 80’s, Today, and Now & Future!, and stages Bare Metal Programming, Batch OS, Multi-User Timeshare, Low Level Network Programming, and Grid & WS.]

Summary
 • Grids will have a huge impact on the life sciences
 • Prototype GBG operational
 • Applications are underway
 • We’re always looking for new applications