BioGrid Challenges, Problems and Opportunities
Dheeraj Bhardwaj <dheerajb@cse.iitd.ac.in>
Department of Computer Science & Engineering
Indian Institute of Technology, Delhi – 110 016, India
http://www.cse.iitd.ac.in/~dheerajb
December 2003

[Diagram. Nodes: BIOLOGICAL PHENOMENON, DATA, MODEL. Arrow labels: measurement process; data analysis, learning; inference, conclusions]

Bioinformatics Vs. Biocomputing
[Diagram labels: Bioinformatics, IT, BT, Biocomputing]

“Maze” on a Jigsaw Puzzle
[Figure labels: Phenome, Genome, Biological Data]

Equipment for a New Quest
• High Performance Computers
• Data, Knowledge and Tools
• Collaboration of Human Experts
The illustrations are quoted from the following sites:
www.dnr.state.wi.us/org/aw/air/ed/educatio.htm
www.mtnbrook.k12.al.us/academy/2ndgrade/mtn/map.htm

Needs of High Performance Computing
• Increase of Genome Sequence Information
• Combinatorial Increase of Search Space: Genome * Transcriptome * Proteome * ... * Phenome
• Computer Simulation and Unknown Parameter Estimation
Knowledge integration in “Omic Space”

Needs of High Performance Computing
• Impact of Genome Sequence Projects
  ⇒ Human Genome (3,000 Mbp, 2000)
  ⇒ Rapid Increase of Genome Sequence Databases
  ⇒ Strong Computation Demand for Homology Search
• Start of Structural Genomics Projects
  ⇒ Determine 10,000 folds in 5 years
  ⇒ Strong Computation Demand for Molecular Simulation

1st Issue: Homology Search
• Rapid Increase of Data Size: doubles every year, updated daily (17 million entries, 50 GBytes as of Oct. 2002)
Rough estimate of homology search time for Mouse cDNA (5,000 Seq.) * Human Genome (3,000 Mbp):
  CPUs    1        8         32       256      6,400
  Time    1 year   1 month   1 week   1 day    1 hour
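The rough estimate above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes perfect linear speedup and takes the slide's one-year figure for a single CPU as the baseline; the function name is hypothetical.

```python
# Ideal-scaling check of the rough homology search estimate.
# Assumption: perfect linear speedup from a 1-CPU baseline of one year.
HOURS_PER_YEAR = 365 * 24


def ideal_runtime_hours(cpus, single_cpu_hours=HOURS_PER_YEAR):
    """Return the run time in hours if the work divides evenly over `cpus`."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return single_cpu_hours / cpus


for cpus in (1, 8, 32, 256, 6400):
    hours = ideal_runtime_hours(cpus)
    print(f"{cpus:>5} CPUs: {hours:8.1f} h ({hours / 24:6.1f} days)")
```

Under these assumptions the results land in the same range as the year, month, week, day and hour steps quoted on the slide; real runs would also pay for data distribution and scheduling.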

2nd Issue: Molecular Simulation
Nanosecond-order Molecular Dynamics simulation of protein molecules with 100,000 – 1,000 molecular weight
• Stability Analysis
• Affinity Analysis
• Folding Simulation
Ex. Ras p21
  # of residues: 189
  Molecular weight: 21 kD
  Oncogene Variant: Gly12 → Val
  5 ns: 1000 h on a 32 Gflops computer
[Figure labels: Mg, G, GTP, Lys16]

Needs of Resource Sharing
• Biological Databases (Unigene, TrEMBL, ...)
• Bioinformatics Tools (BLAST, HMMER, ...)
• Programming Language (Bioperl, Biojava, ...)

Needs of Human Collaboration

Grid for Bioinformatics
• Effective for “Embarrassingly Parallel Computation”: Homology Search, Motif Search, Unknown Parameter Estimation for Cellular Models, etc.
• “Distributed Resource Sharing” among organizations: Web Services, Workflow and Computational Pipeline, Autonomous Database Update, etc.
• “Field” for Human Collaboration: Group Work for Genome Annotation, Whole Cell Simulation, Collaboration between Biologists and Computer Scientists, etc.

Summary of Bioinformatics Trend
• Rapid increase of genomic database size causes severe overhead for database services
• Demand for Molecular Dynamics Simulation requires high performance computers (including special-purpose computers)
A new Bioinformatics Platform is needed for sharing databases and high performance computers

Strategic Technology Domain
[Diagram labels: Information Integration from Genome to Phenome; Modeling and Simulation from Molecular to Cell; Grid; High Performance Computing (PC-cluster, SMP, Vector)]

Evolution of the Scientific Process
• Pre-electronic
  – Theorize &/or experiment, alone or in small teams; publish paper
• Post-electronic
  – Construct and mine very large databases of observational or simulation data
  – Develop computer simulations & analyses
  – Exchange information quasi-instantaneously within large, distributed, multidisciplinary teams

[Chart. Axis labels: Algorithmic Complexity/Data Volume; Compute Requirements. Time axis 1970–2005 with eras: Mainframes, Vector Processors, Supercomputers, MPP/SMP, Scalable Parallel Systems, Distributed & Grid. Price-performance labels rise from ~3 MFLOPS per $ million through ~5, ~20, ~60 and ~200-400 MFLOPS, then ~2-3, ~5-8, ~20 and ~100 GFLOPS, up to ~1000 GFLOPS per $ million. Systems shown: UNIVAC 1100, CDC 1604/600, IBM 360/370, UNIVAC, IBM, CDC, DEC VAX/FPS, CRAY 1, CDC 203, CRAY XMP, CONVEX C1, ALLIANT, CRAY YMP, CONVEX C2, SGI Power Ch, IBM SP2, CM5, IBM SP, SGI Origin, SUN ES 10000, CRAY T3E, LINUX CLUSTERS, COMPUTATIONAL GRID]

HPC Applications Issues
• Architectures and Programming Models
  – Distributed Memory Systems (MPP, Clusters): Message Passing
  – Shared Memory Systems (SMP): Shared Memory Programming
  – Specialized Architectures: Vector Processing, Data Parallel Programming
  – The Computational Grid: Grid Programming
• Applications I/O
  – Parallel I/O
  – Need for high performance I/O systems and techniques, scientific data libraries, and standard data representation
• Checkpointing and Recovery
• Monitoring and Steering
• Visualization (Remote Visualization)
• Programming Frameworks

Future of Scientific Computing
• Require Large Scale Simulations, beyond reach of any machine
• Require Large Geo-distributed Cross Disciplinary Collaborations
• Systems getting larger by 2-3-4x per year!
  – Increasing parallelism: add more and more processors
• New Kind of Parallelism: GRID
  – Harness the power of Computing Resources which are growing

What do we want to Achieve?
• Develop High Performance Computing (HPC) Applications which are
  – Portable (Laptop, Supercomputers, Grid)
  – Future Proof
  – Grid Ready
• Develop HPC Infrastructure (Parallel & Grid Systems) which is
  – User Friendly
  – Based on Open Source
  – Efficient in Problem Solving
  – Able to Achieve High Performance
  – Able to Handle Large Data Volumes

Parallel Computer and Grid
A parallel computer is a “collection of processing elements that communicate and co-operate to solve large problems fast”.
A Computational Grid is an emerging infrastructure that enables the integrated use of remote high-end computers, databases, scientific instruments, networks and other resources.

A Comparison
SERIAL
• Fetch/Store
PARALLEL
• Compute/Communicate
• Cooperative game
GRID
• Discovery of Resources
• Interaction with remote application
• Authentication / Authorization
• Security
• Compute/Communicate
• Etc.

Serial and Parallel Algorithms - Evaluation
• Serial Algorithm
  – Execution time as a function of size of input
• Parallel Algorithm
  – Execution time as a function of input size, parallel architecture and number of processors used
Parallel System
A parallel system is the combination of an algorithm and the parallel architecture on which it is implemented.
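To make the distinction concrete, the sketch below evaluates a serial and a parallel run under a toy cost model. Everything in it is an assumption made for illustration: the cost constants, the tree-style combine term, and the function names are hypothetical. Only the standard definitions of speedup (serial time over parallel time) and efficiency (speedup over processor count) are taken as given.

```python
import math


def serial_time(n, t_op=1.0):
    """Serial execution time: depends only on the input size n."""
    return n * t_op


def parallel_time(n, p, t_op=1.0, t_comm=5.0):
    """Toy parallel time on p processors: an even share of the work plus a
    combine step whose cost grows with log2(p). Both constants are assumed."""
    return (n / p) * t_op + t_comm * math.log2(p)


def speedup(n, p):
    """Serial time divided by parallel time."""
    return serial_time(n) / parallel_time(n, p)


def efficiency(n, p):
    """Speedup per processor; 1.0 would be ideal."""
    return speedup(n, p) / p


for p in (1, 4, 16, 64):
    print(p, round(speedup(1000, p), 2), round(efficiency(1000, p), 2))
```

The same algorithm gives a different curve if the communication constant changes, which is why the slide evaluates the algorithm and the architecture together as one parallel system.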

What is the Grid
• “Grid Computing [is] distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high performance orientation … we review the “Grid problem”, which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources: what we refer to as virtual organizations.”
From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations” by Foster, Kesselman and Tuecke

Distributed Computing vs. GRID
• Grid is an evolution of distributed computing
  – Dynamic
  – Geographically independent
  – Built around standards
  – Internet backbone
• Distributed computing is an “older term”
  – Typically built around proprietary software and network
  – Tightly couples systems/organization

Web vs. GRID
• Web
  – Uniform naming access to documents (http://)
• Grid
  – Uniform, high performance access to computational resources
[Figure labels: Software Catalogs, Sensor nets, Colleges/R&D Labs]

Is the World Wide Web a Grid?
• Seamless naming? Yes
• Uniform security and Authentication? No
• Information Service? Yes or No
• Co-Scheduling? No
• Accounting & Authorization? No
• User Services? No
• Event Services? No
• Is the Browser a Global Shell? No

What does the World Wide Web bring to the Grid?
• Uniform Naming
• A seamless, scalable information service
• A powerful new meta-data language: XML
  – XML will be the standard language for describing information in the grid
  – SOAP (Simple Object Access Protocol)
    • Uses XML for encoding, HTTP for protocol
  – SOAP may become a standard RPC mechanism for Grid services
• Portal Ideas

The Ultimate Goal
• In future I will not know or care where my application is executed, as I will acquire and pay for these resources as I need them

Why Grids?
• Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed.
• The overall motivation for “Grids” is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.

Why Now?
• Moore’s law improvements in computing produce highly functional end systems
• The internet and burgeoning wired and wireless networks provide universal connectivity
• Changing modes of working and problem solving emphasize teamwork, computation
• Network exponentials produce dramatic changes in geometry and geography

Network Exponentials
• Network vs. computer performance
  – Computer speed doubles every 18 months
  – Network speed doubles every 9 months
  – Difference = order of magnitude per 5 years
• 1986 to 2000
  – Computers: x 500
  – Networks: x 340,000
• 2001 to 2010
  – Computers: x 60
  – Networks: x 4000
Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner, Caufield and Perkins.
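The doubling periods quoted above are enough to reproduce the order-of-magnitude claim. The sketch below assumes pure exponential growth at exactly those periods; the function names are hypothetical.

```python
def growth_factor(months, doubling_months):
    """Growth after `months` for a quantity that doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


def compare(years):
    """Return (computer growth, network growth, network/computer ratio)."""
    months = years * 12
    computers = growth_factor(months, 18)  # computer speed: doubles every 18 months
    networks = growth_factor(months, 9)    # network speed: doubles every 9 months
    return computers, networks, networks / computers


for years in (5, 9):
    c, n, ratio = compare(years)
    print(f"{years} years: computers x{c:.0f}, networks x{n:.0f}, ratio x{ratio:.1f}")
```

Over five years the ratio comes out at roughly 10, the stated order of magnitude. Over the nine years from 2001 to 2010 pure doubling gives x64 and x4096, close to the rounded x60 and x4000 on the slide.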

Why Grid?
We are seeing a Fundamental Change in Scientific Applications
• They have become multidisciplinary
• They require an incredible mix of various technologies and expertise
“Many problems require tightly coupled computers, with low latencies and high communication bandwidths; Grid computing may well increase … demand for such systems by making access easier”
- Foster, Kesselman, Tuecke, The Anatomy of the Grid
Motivation: When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances.
- Gilder Technology Report, June 2002

Convergence between e-Science and e-Business
• A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
• A biologist combines a range of diverse and distributed resources (databases, tools, instruments) to answer complex questions
• 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data
• Civil engineers collaborate to design, execute, & analyze shake table experiments
• An enterprise configures internal & external resources to support e-Business workload
From Steve Tuecke, 12 Oct ’01

Convergence between e-Science and e-Business
• Climate scientists visualize, annotate, & analyze terabyte simulation datasets
• An emergency response team couples real time data, weather model, population data
• A multidisciplinary analysis in aerospace couples code and data in four companies
• A home user invokes architectural design functions at an application service provider
• An insurance company mines data from partner hospitals for fraud detection

Important Grid Applications
• Data-intensive
• Distributed computing (metacomputing)
• Collaborative
• Remote access to, and computer enhancement of, experimental facilities

An Example Virtual Organization: CERN’s Large Hadron Collider
1800 Physicists, 150 Institutes, 32 Countries
100 PB of data by 2010; 50,000 CPUs?
www.griphyn.org
www.ppdg.org
www.eu-datagrid.org

Grid Communities & Applications: Data Grids for High Energy Physics
[Diagram: tiered data grid. Labels recovered from the figure:]
• Online System: ~PBytes/sec in, ~100 MBytes/sec out
  – There is a “bunch crossing” every 25 nsecs
  – There are 100 “triggers” per second
  – Each triggered event is ~1 MByte in size
• Offline Processor Farm: ~20 TIPS
• Tier 0: CERN Computer Centre, fed at ~100 MBytes/sec
• Tier 1: France Regional Centre, Germany Regional Centre, Italy Regional Centre, FermiLab (~4 TIPS); links of ~622 Mbits/sec or Air Freight (deprecated)
• Tier 2: Tier 2 Centres (~1 TIPS each), Caltech (~1 TIPS); links of ~622 Mbits/sec
• Institutes: ~0.25 TIPS each, with a physics data cache; ~1 MBytes/sec to workstations
• Tier 4: Physicist workstations
1 TIPS is approximately 25,000 SpecInt95 equivalents.
Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
www.griphyn.org
www.ppdg.net
www.eu-datagrid.org
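The rates on this slide follow from one another by simple arithmetic. The sketch below only recomputes them from the quoted figures (25 ns between bunch crossings, 100 triggers per second, ~1 MByte per event); the constant and function names are hypothetical.

```python
BUNCH_CROSSING_INTERVAL_S = 25e-9  # one bunch crossing every 25 ns
TRIGGERS_PER_SECOND = 100          # events kept by the trigger
EVENT_SIZE_MBYTES = 1.0            # approximate size of one triggered event


def crossing_rate_hz():
    """Bunch crossings per second."""
    return 1.0 / BUNCH_CROSSING_INTERVAL_S


def trigger_output_mbytes_per_s():
    """Data rate leaving the online system."""
    return TRIGGERS_PER_SECOND * EVENT_SIZE_MBYTES


def crossings_per_trigger():
    """How many bunch crossings occur for each event that is kept."""
    return crossing_rate_hz() / TRIGGERS_PER_SECOND


print(f"crossings/s: {crossing_rate_hz():.3g}")
print(f"output rate: {trigger_output_mbytes_per_s():.0f} MBytes/sec")
print(f"crossings per kept event: {crossings_per_trigger():.3g}")
```

The 100 MBytes/sec result matches the rate shown between the online system and the CERN computer centre; the heavy selection at the trigger is what brings the ~PBytes/sec detector stream down to that level.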

A Brain is a Lot of Data! (Mark Ellisman, UCSD)
And comparisons must be made among many.
We need to get to one micron to know the location of every cell. We’re just now starting to get to 10 microns. Grids will help get us there and further.

Biomedical Informatics Research Network (BIRN)
• Evolving reference set of brains provides essential data for developing therapies for neurological disorders (multiple sclerosis, Alzheimer’s, etc.)
• Today
  – One lab, small patient base
  – 4 TB collection
• Tomorrow
  – 10s of collaborating labs
  – Larger population sample
  – 400 TB data collection: more brains, higher resolution
  – Multiple scale data integration and analysis

The Grid: A Brief History
• Early 90s
  – Gigabit testbeds, metacomputing
• Mid to late 90s
  – Early experiments (e.g., I-WAY), academic software projects (e.g., Globus, Legion), application experiments
• 2002
  – Dozens of application communities & projects
  – Major infrastructure deployments
  – Significant technology base (esp. Globus Toolkit™)
  – Growing industrial interest
  – Global Grid Forum: ~500 people, 20+ countries

Today’s Grid
• A single system interface
• Transparent wide-area access to large data banks
• Transparent wide-area access to applications on heterogeneous platforms
• Transparent wide-area access to processing resources
• Security, certification, single sign-on authentication
  – Grid Security Infrastructure
• Data access, transfer & replication
  – GridFTP, Giggle
• Computational resource discovery, allocation and process creation
  – GRAM, Unicore, Condor-G

Grid Evolution
• First Generation Grid
  – Computationally intensive, file access/transfer
  – Bag of various heterogeneous protocols & toolkits
  – Recognizes internet, ignores web
  – Academic Team
• Second Generation Grid
  – Data intensive, knowledge intensive
  – Service based architecture
  – Recognizes Web and Web services
  – Global Grid Forum
  – Industry participation

Challenging Technical Requirements
• Dynamic formation and management of virtual organizations
• Online negotiation of access to services: who, what, why, when, how
• Establishment of applications and systems able to deliver multiple qualities of service
• Autonomic management of infrastructure elements
Open Grid Services Architecture
http://www.globus.org/ogsa

Elements of the Problem
• Resource sharing
  – Computers, storage, sensors, networks, …
  – Heterogeneity of device, mechanism, policy
  – Sharing conditional: negotiation, payment, …
• Coordinated problem solving
  – Integration of distributed resources
  – Compound quality of service requirements
• Dynamic, multi-institutional virtual orgs
  – Dynamic overlays on classic org structures
  – Map to underlying control mechanisms

The Grid
• Diverse Resources
  – Dynamic
  – Unreliable
  – Shared
• Administrative Issues
  – Security
  – Multiple organizations
  – Coordinated problem solving

Grid Technologies: Resource Sharing Mechanisms That …
• Address security and policy concerns of resource owners and users
• Are flexible enough to deal with many resource types and sharing modalities
• Scale to large number of resources, many participants, many program components
• Operate efficiently when dealing with large amounts of data & computation

Aspects of the Problem
1) Need for interoperability when different groups want to share resources
  – Diverse components, policies, mechanisms
  – E.g., standard notions of identity, means of communication, resource descriptions
2) Need for shared infrastructure services to avoid repeated development, installation
  – E.g., one port/service/protocol for remote access to computing, not one per tool/appln
  – E.g., Certificate Authorities: expensive to run
• A common need for protocols & services

Hence, a Protocol-Oriented View of Grid Architecture, that Emphasizes …
• Development of Grid protocols & services
  – Protocol-mediated access to remote resources
  – New services: e.g., resource brokering
  – “On the Grid” = speak Intergrid protocols
  – Mostly (extensions to) existing protocols
• Development of Grid APIs & SDKs
  – Interfaces to Grid protocols & services
  – Facilitate application development by supplying higher-level abstractions

The Hourglass Model
• Focus on architecture issues
  – Propose set of core services as basic infrastructure
  – Use to construct high-level, domain-specific solutions
• Design principles
  – Keep participation cost low
  – Enable local control
  – Support for adaptation
  – “IP hourglass” model
[Diagram labels, top to bottom: Applications, Diverse global services, Core services, Local OS]

Layered Grid Architecture (By Analogy to Internet Architecture)
• Application
• Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
• Resource: “Sharing single resources”: negotiating access, controlling use
• Connectivity: “Talking to things”: communication (Internet protocols) & security
• Fabric: “Controlling things locally”: access to, & control of, resources
Internet Protocol Architecture, shown alongside: Application, Transport, Internet, Link

Globus Toolkit™
• A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications
  – Offer a modular set of orthogonal services
  – Enable incremental development of grid-enabled tools and applications
  – Implement standard Grid protocols and APIs
  – Available under liberal open source license
  – Large community of developers & users
  – Commercial support

Building Grid Architecture & Globus Toolkit
[Diagram: layers with Globus Toolkit components, as far as the grouping can be recovered:]
• Application
• Collective: Grid Information Index service, Replica management, Certificate repository (MyProxy), Co-allocation library
• Resource: Grid Resource Information Service, Grid Resource Access & Management, GridFTP
• Connectivity: Internet protocol, Globus Security Infrastructure
• Fabric: Local OS, Resources to Share
[Additional label: Core Grid Services]

Key Protocols
• The Globus Toolkit™ centers around four key protocols
  – Connectivity layer:
    • Security: Grid Security Infrastructure (GSI)
  – Resource layer:
    • Resource Management: Grid Resource Allocation Management (GRAM)
    • Information Services: Grid Resource Information Protocol (GRIP) and Index Information Protocol (GIIP)
    • Data Transfer: Grid File Transfer Protocol (GridFTP)
• Also key collective layer protocols
  – Info Services, Replica Management, etc.

Why Grid Security is Hard?
• Resources being used may be extremely valuable & the problems being solved extremely sensitive
• Resources are often located in distinct administrative domains
  – Each resource may have own policies & procedures
• The set of resources used by a single computation may be large, dynamic, and/or unpredictable
  – Not just client/server
• It must be broadly available & applicable
  – Standard, well-tested, well-understood protocols
  – Integration with wide variety of tools

Grid Security Requirements
User View
1) Easy to use
2) Single sign-on
3) Run applications: ftp, ssh, MPI, Condor, Web, …
4) User based trust model
5) Proxies/agents (delegation)
Resource Owner View
1) Specify local access control
2) Auditing, accounting, etc.
3) Integration w/ local system: Kerberos, AFS, license mgr.
4) Protection from compromised resources
Developer View
• API/SDK with authentication, flexible message protection, flexible communication, delegation, ...
• Direct calls to various security functions (e.g. GSS-API)
• Or security integrated into higher-level SDKs: e.g. GlobusIO, Condor-G, MPICH-G2, HDF5, etc.

Convergence on Service Oriented Architecture
• Development of service oriented grid middleware using different technologies (such as Java/Jini, web services) to instantiate the service architecture.
[Diagram: a typical SOA. Nodes: Service Requester, Service Provider, Service Locator. Arrow labels: Register Service, Discover Service, Lookup, Match Services, Interaction with Service]
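The register, discover and interact pattern in the diagram can be sketched in a few lines. This is a minimal in-process illustration, not Jini, a web services stack, or any Globus API; the class, the service type string and the stand-in service are all hypothetical.

```python
class ServiceLocator:
    """Toy service locator: providers register, requesters discover."""

    def __init__(self):
        self._registry = {}

    def register(self, service_type, endpoint):
        """Provider side: advertise a callable under a service type."""
        self._registry.setdefault(service_type, []).append(endpoint)

    def discover(self, service_type):
        """Requester side: return every endpoint matching the service type."""
        return list(self._registry.get(service_type, []))


def blast_service(query):
    """Stand-in provider; a real service would run a homology search."""
    return f"results for {query}"


locator = ServiceLocator()
locator.register("homology-search", blast_service)  # register service
matches = locator.discover("homology-search")       # discover / match
result = matches[0]("Q1")                           # interact with service
print(result)
```

The requester never names the provider directly; it only asks the locator for a service type, which is what lets providers come and go without changes on the requester side.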

The future… Web Services
• Web services are self-describing applications that can find and interact with other web applications to complete complex tasks over the internet.
• Unlike the hard-wired applications of the client-server computing days, web services are loosely coupled software components that can find and interact with other components on the internet without manual human intervention.

The future… Web services
• Increasingly popular standards-based frameworks for accessing network applications
  – W3C standardization; Microsoft, IBM, SUN, others
• WSDL: Web Services Description Language
  – Interface definition language for web services
• SOAP: Simple Object Access Protocol
  – XML based RPC protocol, common WSDL target
• WS-Inspection
  – Conventions for locating service descriptions
• UDDI: Universal Description, Discovery, & Integration
  – Discovery for Web services

Open Grid Service Architecture (OGSA)
• Utilize standard Web services infrastructures
• Building on current Globus Toolkit:
  – Grid service: semantics for service interactions
  – Management of transient instances (& state)
  – Factory, registry, discovery, other services
  – Reliable and secure transport
• Multiple hosting targets: J2EE, .NET, “C”, …
• Service oriented architecture enables resource virtualization
• Delivery via open source Globus Toolkit 3.0
  – Leverage GT experience, code, mindshare

BioGrid approach
• Standardize interfaces
• Provide global directory of objects
• Distribute computation transparently
• Distribute data transparently
• Provide security on all object storage, transfer and communications
• Provide accountability, credibility and identification
• Bundle everything in a plug-and-play package

Typical Computing in Bioinformatics
[Diagram: a Job made of Task 1, Task 2, ..., Task 999, Task 1000 is split into four blocks (Task 1-250, Task 251-500, Task 501-750, Task 751-1000), each run against its own Software and DB]
A great many similar tasks, independent of each other
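The split shown in the diagram is a plain block partition of an embarrassingly parallel job. A minimal sketch follows, assuming tasks are numbered from 1 and that any remainder is spread over the first blocks; the function name is hypothetical.

```python
def split_into_blocks(num_tasks, num_nodes):
    """Split tasks 1..num_tasks into contiguous (first, last) blocks,
    one per node, with block sizes differing by at most one task."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    base, extra = divmod(num_tasks, num_nodes)
    blocks = []
    start = 1
    for i in range(num_nodes):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more nodes than tasks: leave this node idle
        blocks.append((start, start + size - 1))
        start += size
    return blocks


print(split_into_blocks(1000, 4))
```

For 1000 tasks and four nodes this yields the same four blocks as the diagram. Because the tasks do not depend on each other, the blocks can run with no communication until the results are collected.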

Bioinformatics Environment
[Diagram labels: OBIEnv User; Job (List of Tasks); Job Dispatcher (obidispatch); Node Search; Divided Jobs; Results; P2P Server; Environment Information Server; PostgreSQL; Set of Nodes (Node, Node, Node); Globus Toolkit; Environment Scanner (obiregist); DB, SW, HW; Reporting Environmental Information; Temporal Work Area for Job Execution; Local Authentication; List of OBIEnv Users (transferred and updated by the obiupdate command); Unauthorized Local Users]

Parallel Job Execution
Job (Task List):
  blast Q1 genbank
  blast Q2 genbank
  :
  blast Q10 genbank
[Diagram: the Job Dispatcher (obidispatch) asks the Environment Information Server “Nodes with TrEMBL and BLAST?” and distributes the queries over the set of nodes holding TrEMBL in pairs: Q1, Q2; Q3, Q4; Q5, Q6; Q7, Q8; Q9, Q10]
Tasks are independent of each other
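The dispatch step in the diagram has two parts: find the nodes that hold the required software and database, then hand each of them a contiguous share of the task list. The sketch below illustrates that idea only. It is not the obidispatch implementation, and the node names, dictionary layout and function names are hypothetical.

```python
def eligible_nodes(nodes, software, database):
    """Return the names of nodes reporting both the software and the database."""
    return [
        name
        for name, env in nodes.items()
        if software in env["software"] and database in env["databases"]
    ]


def dispatch(tasks, node_names):
    """Give each node a contiguous slice of the independent task list."""
    if not node_names:
        raise ValueError("no eligible nodes")
    per_node = -(-len(tasks) // len(node_names))  # ceiling division
    return {
        name: tasks[i * per_node:(i + 1) * per_node]
        for i, name in enumerate(node_names)
    }


# Hypothetical environment information, as a scanner might report it.
nodes = {f"n{i}": {"software": {"BLAST"}, "databases": {"TrEMBL"}} for i in range(1, 6)}
nodes["n6"] = {"software": {"BLAST"}, "databases": set()}  # lacks the database

tasks = [f"Q{i}" for i in range(1, 11)]
plan = dispatch(tasks, eligible_nodes(nodes, "BLAST", "TrEMBL"))
print(plan)
```

With ten queries and five eligible nodes each node receives two queries, matching the pairs in the diagram, and the node without the database is skipped.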

Typical Database Access in Bioinformatics
[Diagram: two panels, Mirroring and Web Services, each showing Site A and Site B. Application labels: App 1, App 1’, App 2, App 2’]

Database Federation and Computational Pipeline
[Diagram labels: Database Federation + Web Services; Computational Pipeline; data levels Genome, Transcriptome, Proteome, Metabolome, Phenome; applications App 1, App 2, App 3, App 4, App 5]

Virtual Organization on Grid
[Diagram labels: organizations A, B, C, D; Project VO on Grid]
The Project VO provides the boundary of knowledge sharing across geographical and organizational limits.

BioGrid Schematic
• Grid-aware client software
• Data and software resource directories
• Grid of processing computers

Open Grid Services Architecture

Future Grid Challenges
• Need a "power station" on the Grid
  – Buy (obtain) resources as required
• Need to understand how applications behave
  – Balance data transfer against shipping the computation
• Need scalable wide-area service discovery
  – Peer-to-peer or centralized servers
  – Metadata to describe Grid services
• Need to exploit distributed services
  – Grid service orchestration
  – Optimise service selection and recover from failure
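The trade-off between data transfer and compute shipping can be made concrete with a back-of-the-envelope estimate. The Python sketch below is a simplified model, not something taken from the presentation: it assumes a single link with fixed bandwidth, and every number and name in it (`choose_strategy`, the sizes, the run times) is hypothetical.

```python
def transfer_seconds(size_bytes, bandwidth_bytes_per_s):
    """Time to move size_bytes over a link of the given bandwidth."""
    return size_bytes / bandwidth_bytes_per_s


def choose_strategy(data_bytes, code_bytes, bandwidth_bytes_per_s,
                    compute_site_seconds, data_site_seconds):
    """Compare two options and return (strategy, estimated seconds).

    move_data:    copy the data to the compute site and run there.
    ship_compute: copy the program to the data site and run there.
    """
    move_data = (transfer_seconds(data_bytes, bandwidth_bytes_per_s)
                 + compute_site_seconds)
    ship_compute = (transfer_seconds(code_bytes, bandwidth_bytes_per_s)
                    + data_site_seconds)
    if ship_compute <= move_data:
        return ("ship_compute", ship_compute)
    return ("move_data", move_data)


# Assumed figures: 10 GB of data, a 50 MB program, a 100 MB/s link,
# 60 s to run at the fast compute site, 120 s at the slower data site.
large = choose_strategy(10_000_000_000, 50_000_000, 100_000_000, 60, 120)
# Same setup, but only 100 MB of data to move.
small = choose_strategy(100_000_000, 50_000_000, 100_000_000, 60, 120)

print(large)
print(small)
```

With the large data set, moving the program wins even though the data site is slower; with the small one, moving the data to the faster machine is cheaper. Real decisions would also weigh queue times, storage limits and licensing.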

The GRID is all about
• The coordinated, transparent, secure and effective utilization of geographically distributed heterogeneous resources (both hardware and software) for applications
To be Successful
• The Grid has to support applications in the same way that the power utilities support the use of household appliances
The Metaphor
• Computers act as generators of computational "power", and applications become computational appliances
• The software infrastructure acts as the utility responsible for managing the interaction between them

Whom Does Grid Computing Serve?
• The users and their applications
• Large, complex applications that need resources beyond the traditional
  – Parallel/distributed processing in a box
  – Put-it-together-yourself clusters
• Applications that describe multiple aspects of a system
• Applications consisting of multiple modules
• Applications with multi-source data
• Applications interfacing with measurement systems and visualization systems
Application programmers will be able to write applications that leverage TeraFlops of computation and PetaBytes of storage.

Grid System – Three Point Checklist
• Coordinates resources that are not subject to centralized control
• Uses standard, open, general-purpose protocols and interfaces
• Delivers nontrivial qualities of service

Applications Development on Grid
What do application developers need to think about in Grid environments?
• The requirements are very similar to those for an application that must run on many different architectures
• Developers must now also consider that not all processes in an application necessarily run on the same resource, or even on the same architecture
• Not all processes have access to the same environment or can reach the same set of remote resources

Hook enough computers together and what do you get? A new kind of utility that offers supercomputer processing on tap.

Access Grid
• High-end group work and collaboration technology
• Grid services being used for discovery, configuration, authentication
• O(50) systems deployed worldwide
• Basis for the SC2001 SC Global event in November 2001 – www.scglobal.org
[Figure labels: Presenter mic; Presenter camera; Ambient mic (tabletop); Audience camera]
www.accessgrid.org

Building Bridges for the Future of Science
Grid Computing is a paradigm that will have considerable impact on how computing resources will be provisioned, and Java™ technology is a primary technology that will enable it.