The Grid and its Implications for Science (and Industry)
Ian Foster, Argonne National Laboratory, University of Chicago, Globus Alliance
Royal Society of New Zealand, Wellington, March 16, 2004

My Affiliations: (1) Physical Organizations
• University of Chicago: founded 1890 (Rockefeller); 2,200 faculty; 13,000 students; 75 Nobel prizes
• Argonne National Laboratory, 25 miles away: operated by Univ. of Chicago for the Dept of Energy's Office of Science; founded 1943 (Fermi its 1st director); ~4,000 employees; 4,000 facility users; ~$500M annual budget

My Affiliations: (2) Virtual Organizations

Multidisciplinary Teams: Problem Solving in the 21st Century
• Teams organized around common goals
  ◊ Communities: "virtual organizations"
• With diverse membership & capabilities
  ◊ Heterogeneity is a strength, not a weakness
• And geographic and political distribution
  ◊ No location/organization possesses all required skills and resources
• Must adapt as a function of the situation
  ◊ Adjust membership, reallocate responsibilities, renegotiate resources

Overview
• Part I: Technology, science, & industry
  ◊ Technology trends
  ◊ Why distributed teams are important
  ◊ The need for infrastructure
  ◊ Implications beyond science
• Part II: New modes of working

Living in an Exponential World
[Chart: performance per dollar spent vs. number of years]
• Optical fiber (bits per second): doubling time 9 months?
• Data storage (bits per square inch): doubling time 12 months
• Silicon computer chips (number of transistors): doubling time 18 months
Source: Scientific American, January 2001

Moore's Law → Tiny Sensors
• Israeli video pill (www.givenimaging.com)
• Habitat sensors
• Project Neptune

Living in an Exponential World: (2) Storage
• Storage density doubles every 12 months
• Dramatic growth in online data (1 petabyte = 1,000 terabytes = 1,000,000 gigabytes):
  ◊ 2000: ~0.5 petabyte
  ◊ 2005: ~10 petabytes
  ◊ 2010: ~100 petabytes
  ◊ 2015: ~1,000 petabytes?
• Transforming entire disciplines in physical and, increasingly, biological sciences; humanities next?

Living in an Exponential World: (3) Networking (Coefficients Matter)
• Network vs. computer performance:
  ◊ Computer speed doubles every 18 months
  ◊ Network speed doubles every 9 months
  ◊ Difference = order of magnitude per 5 years
• 1986 to 2000 @ NCSA:
  ◊ Computers: x500
  ◊ Networks: x340,000
• 2001 to 2010: ??
  ◊ Computers: x60
  ◊ Networks: x4,000
Source: Scientific American, January 2001
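The "order of magnitude per 5 years" gap follows directly from the two doubling times. A quick sketch, using the slide's 9- and 18-month figures (the chart's assumptions, not measurements):

```python
# Sketch: how the "order of magnitude per 5 years" gap falls out of the
# two doubling times (9 months for networks, 18 months for computers).
def growth_factor(months, doubling_time_months):
    """Multiplicative growth after `months` given a fixed doubling time."""
    return 2 ** (months / doubling_time_months)

months = 5 * 12
computer = growth_factor(months, 18)   # ~10x over 5 years
network = growth_factor(months, 9)     # ~100x over 5 years
gap = network / computer               # ~10x: one order of magnitude

print(f"Computers: x{computer:.0f}, Networks: x{network:.0f}, gap: x{gap:.0f}")
```

The same exponentials roughly reproduce the decade-scale figures on the slide: 2^(10 years / 18 months) ≈ x100 for computers and 2^(10 years / 9 months) ≈ x10,000 for networks.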

Parallelism Has Come to Optical Networking (WDM)
"Parallel lambdas will drive this decade the way parallel processors drove the 1990s" (Larry Smarr)
Source: Steve Wallach, Chiaro Networks

The Opportunity (or Challenge): Computational Cornucopia
• Abundant computation, data, bandwidth:
  ◊ Simulations of unprecedented accuracy
  ◊ In many fields, too much data, not too little
  ◊ Ubiquitous internet: distance not a barrier
• But as a consequence:
  ◊ Rate of change accelerates
  ◊ Complex problems → multidisciplinary teams & sharing of resources & expertise
  ◊ Without infrastructure, you can't compete

Why Distributed Teams Are Important
• Increasingly challenging & complex problems:
  ◊ Global change, cosmology, biology, …
  ◊ Life sciences, manufacturing, mineral exploration
  ◊ Film production, game development, …
• Required expertise & resources distributed:
  ◊ People
  ◊ Computational capability
  ◊ Data
  ◊ Sensors

High Energy Physics Experiments

Network for Earthquake Eng. Simulation (www.neesgrid.org)
[Diagram: NEESgrid links remote users (faculty, students, practitioners, K-12), instrumented structures and sites, laboratory and field equipment, a simulation tools repository, a curated data repository, leading-edge computation, and global connections]

The Power of Data Integration: Virtual Observatories
[Chart: number & sizes of data sets as of mid-2002, grouped by wavelength]
• 12-waveband coverage of large areas of the sky
• Total about 200 TB of data
• Doubling every 12 months
• Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins

"Cyberinfrastructure"
"A new age has dawned in scientific & engineering research, pushed by continuing progress in computing, information, and communication technology, & pulled by the expanding complexity, scope, and scale of today's challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive 'cyberinfrastructure' on which to build new types of scientific & engineering knowledge environments & organizations and to pursue research in new ways & with increased efficacy." [NSF Blue Ribbon Panel report, 2003]

Cyberinfrastructure-enabled Knowledge Communities
• Virtual teams, communities, organizations, knowledge communities, environments/ecologies
• Cyberinfrastructure: equipment, software, people, institutions; computation, storage, communication and interface technologies
Slide courtesy Dan Atkins, U. Michigan

Infrastructure (1): 80% of Success is Showing Up*
• The world is shrinking rapidly, but not uniformly. E.g., from Chicago, I can send:
  ◊ 1 TB (10^12 bytes) to Geneva in 20 minutes
  ◊ 1 MB (10^6 bytes) to Wellington in 4 hours
• The dirty underside of exponentials:
  ◊ 9-month doubling → 10 years' lag in network deployment = 10,000x slower
• Without an always-on, high-speed network you might as well be on a different planet! → Broadband national & international nets
*Woody Allen
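The slide's numbers can be sanity-checked with a little arithmetic. This sketch takes the byte counts and durations from the slide and computes the effective bandwidth of each transfer, plus what a 10-year deployment lag costs under 9-month doubling:

```python
# Sketch checking the slide's arithmetic: effective bandwidth of the two
# transfers, and the "10,000x" penalty for a 10-year network lag.
def bandwidth_bps(nbytes, seconds):
    """Average throughput in bits per second."""
    return nbytes * 8 / seconds

geneva = bandwidth_bps(10**12, 20 * 60)        # 1 TB in 20 minutes
wellington = bandwidth_bps(10**6, 4 * 3600)    # 1 MB in 4 hours
print(f"Chicago -> Geneva:     {geneva / 1e9:.1f} Gb/s")
print(f"Chicago -> Wellington: {wellington:.0f} b/s")

# With capacity doubling every 9 months, a 10-year deployment lag costs:
lag_factor = 2 ** (10 * 12 / 9)
print(f"10-year lag factor: ~{lag_factor:,.0f}x")   # on the order of 10,000x
```

The two paths differ by roughly seven orders of magnitude in effective bandwidth, which is the point of the slide's "different planet" remark.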

Collaborating at Speed (www.internetnz.net.nz/public/ngi)
• "An innovative & globally connected economy, with state of the art national internet infrastructure delivering bandwidth at capacities and prices that encourage collaboration, and stimulate researchers & entrepreneurs to seek new challenges and business opportunities." (www.nginz.co.nz/about/strategy.html)
• "In the knowledge economy, the new roads & flight paths are the internet … Don't waste time with another report; just get on and build it." (Ian Taylor, p 38)

U.S. National Lambda Rail
• Very high-end experimental and research applications
• 4 x 10 Gb wavelengths initially; capable of 40 x 10 Gb wavelengths at buildout
• 2000 miles; 10 ms = 1000x campus latency
Source: John Silvester, Dave Reese, Tom West (CENIC)

StarLight International Interconnects
Source: Tom DeFanti, UIC

Trans-Pacific Optical Research Testbed

Infrastructure (2): Moving Beyond PC Science
• "PC science": science scoped to the data and computing that fits on a PC
  ◊ Limits questions asked & answers obtained
• Networks allow us to do much better:
  ◊ Build inexpensive clusters
  ◊ Harness idle desktops: Condor etc.
  ◊ Access remote supercomputers
  ◊ Integrate multiple datasets
  ◊ Create community data resources
• Need not be inordinately expensive
  ◊ But does require investment

Cyberinfrastructure & VOs Have Relevance Far Beyond Science
1) Virtualization of information technology:
  ◊ From vertical silos to on-demand access
  ◊ Improve efficiency of delivery, increase flexibility of use
  ◊ E.g., financial services, e-commerce
2) New applications, products, & services enabled by much computation & data:
  ◊ Media, life sciences, manufacturing, seismic exploration, online gaming, etc.

The Value of Grid Computing: IBM Perspective
• Increased efficiency
• Higher quality of service
• Increased productivity & ROI
• Reduced complexity & cost
• Improved resiliency

Grids: HP Perspective
[Diagram: evolution from today's clusters (Tru64, HP-UX, Linux, OpenVMS clusters, TruCluster, MC ServiceGuard) and grid-enabled systems, through the programmable data center (UDC; switch, compute fabric, storage) and virtual data center, toward a computing utility or Grid of shared, traded resources, with value increasing at each step]

Platform Symphony: Real-Time Online Processing
[Diagram: Applications (delivery) → application virtualization → Application Services (distribution) → infrastructure virtualization → Servers (execution)]
• Automatically connect applications to services
• Dynamic & intelligent provisioning
• Automatic failover
Source: The Grid: Blueprint for a New Computing Infrastructure (2nd Edition), 2004

Cost Savings from Grids
• The cost savings from grids will come in two waves:
  ◊ First from the adoption of clusters
  ◊ Then from the adoption of enterprise Grids
• Firms using clusters estimate that cost savings will be small at first, but will grow to 15% to 30% of IT costs in 2005-2008.
• Firms planning to use enterprise Grids estimate that they will experience a second wave of benefits, with savings growing to 15% to 30% by 2007-2010.
Source: Robert Cohen, "Grid Computing: Projected Impact on North Carolina's Economy & Broadband Use through 2010," Rural Internet Access Authority, September 2003. http://www.e-nc.org

Because of significant cost savings, in addition to productivity gains and changes in supply chain relationships (especially in autos), many early adopter industries show dramatic gains in productivity over the levels expected in the 2010 economic forecast. This is one of the more striking findings of the research. There is a real possibility that once Web services & grids converge, firms will begin to transform their business processes & achieve even higher productivity benefits than the ones that we describe.
Source: Robert Cohen

The use of Grids and Web services results in a substantial increase in bandwidth demand; the examples are derived from two separate studies. The economic model provides us with a way to see how much spending for communications services will shift from its assumed pattern through 2010. In many of the early adopter industries we studied, spending on communications services more than doubled by 2010. We believe that this is likely to reflect new spending on broadband access. We plan to revisit these estimates. We are also developing other approaches to examine the broadband story.
Source: Robert Cohen

Faster Computers + Sensors → New Modes of Working
• Open MRI and surgical theater
  ◊ Overlay of graphics from computed data & simulation
• Feedback to surgeon
  ◊ Change in location of landmarks and target tumor
• Feedback to MRI controls and radiologist, to modulate instrument and improve images
Images provided by Ron Kikinis & Steve Pieper of the Surgical Planning Laboratory, Brigham and Women's Hospital, Harvard
Slide: Larry Smarr, UCSD

New Modes of Working: Shell's Visualization Center
• Over a hundred installed for local collaborative decision support in the oil and gas industry
• "By being able to view the data together and talk on the spot, we were able to prepare for a pressure change not anticipated by people working on their own computers … and save significant amounts of money"
Shell Exploration & Production Company, www.shellus.com/sepco/technological_excellence.pdf

Part I: Summary
• New modes of working:
  ◊ Integration of theory, experiment, computing
  ◊ Enabled & driven by technology trends
• Requires new infrastructure:
  ◊ Broadband networks
  ◊ Beyond PC science
• Implications for industry as well as science:
  ◊ Virtualization
  ◊ New modes of working

Overview
• Part I: Technology, science, & industry
• Part II: New modes of working
  ◊ Grid technologies
  ◊ Upper Atmosphere Research Collaboratory
  ◊ Network for Earthquake Eng. Simulation
  ◊ Data-intensive science & virtual data
  ◊ Grid2003 Data Grid
  ◊ Earth System Grid
  ◊ New modes of research & funding

The Grid
"Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations"
1. Enable integration of distributed resources
2. Using general-purpose protocols & infrastructure
3. To achieve better-than-best-effort service

Example Grid Capabilities
• Engage via telepresence in an experiment at a remote facility
• Discover & access a genome analysis service (running on a high-end computer)
• Integrate data from multiple sources in support of global change research
• Harness computers across sites to process data from a physics experiment

The (Dubious) Power Grid Analogy
• Do we ship work to the power source? Or ship power to where work needs to be done?
• On-demand access to and integration of resources & services, regardless of location

The Emergence of Open Grid Standards
[Timeline, 1990-2010, with increasing functionality and standardization: custom solutions (1990) → Globus Toolkit (de facto standard, single implementation) → Open Grid Services Architecture (real standards, multiple implementations, ~2005) → managed, shared virtual systems; building on Web services, Internet standards, and computer science research]

Grid Technologies: Globus Toolkit & Open Grid Services Architecture
[Layered diagram: domain-specific services; program execution and data services; core services; WS-Resource Framework; Web services (messaging, security, etc.)]

Partners are Creating Strong GT-Based Grid Solutions
• Built on the Globus Toolkit: UK eScience Grid, NEESgrid, TeraGrid, BIRN Biomedical Grid, Fusion Grid, Earth System Grid, Access Grid, MPICH-G2, IBM Grid Toolkit, EU DataGrid, Butterfly Grid, NSF Middleware Initiative, Platform Globus, Virtual Data Toolkit
• Note that WSRF has little or no effect on solution users!

Examples of Production Grid Deployments
"Persistent deployment of Grid services in support of a diverse user community"
• Grid3/iVDGL (US), pre-WS: 22 sites, O(3000) CPUs, 2 countries; high energy physics
• LHC Computing Grid, pre-WS: 25 sites, international; high energy physics
• NorduGrid, pre-WS: 24 clusters, 724 CPUs, 6 countries; physics
• NASA IPG, pre-WS: 4 sites, O(3000) CPUs; aeronautics
• NEESgrid (prod. 2004), Web services: instruments, data, compute, collaborative; earthquake eng.
• TeraGrid (prod. Jan 04), pre-WS: 5 sites, expanding

From Tony Blair's Speech to the Royal Society (23 May 2002)
What is particularly impressive is the way that scientists are now undaunted by important complex phenomena. Pulling together the massive power available from modern computers, the engineering capability to design and build enormously complex automated instruments to collect new data, with the weight of scientific understanding developed over the centuries, the frontiers of science have moved into a detailed understanding of complex phenomena ranging from the genome to our global climate. Predictive climate modelling covers the period to the end of this century and beyond, with our own Hadley Centre playing the leading role internationally.

From Tony Blair's Speech to the Royal Society (23 May 2002)
The emerging field of e-science should transform this kind of work. It's significant that the UK is the first country to develop a national e-science Grid, which intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information. One of the pilot e-science projects is to develop a digital mammographic archive, together with an intelligent medical decision support system for breast cancer diagnosis and treatment. An individual hospital will not have supercomputing facilities, but through the Grid it could buy the time it needs. So the surgeon in the operating room will be able to pull up a high-resolution mammogram to identify exactly where the tumour can be found.

Upper Atmosphere Research Collaboratory
UARC slides: Dan Atkins

[Screenshot: UARC collaboratory workspace with computational models, team chat, session replay, annotation, work rooms, real-time instruments, archival data, and journals]

Evolved into a Network of Instruments (One Global Instrument)

Network for Earthquake Eng. Simulation (www.neesgrid.org)
[Diagram: NEESgrid links remote users (faculty, students, practitioners, K-12), instrumented structures and sites, laboratory and field equipment, a simulation tools repository, a curated data repository, leading-edge computation, and global connections]

NEESgrid High-level Structure

Architecture of NEESgrid Equipment Site

Substructure Pseudo-dynamic Testing
[Diagram: numerical simulation combined with physical tests; a structural test plus a geotechnical test at another lab, together with structural FE simulation 1 and geotechnical FE simulation 2, make up the total system]

Test Structure for Multi-Site Online Simulation Test (MOST) Experiment
[Diagram: UIUC experimental model, U. Colorado experimental model, and NCSA computational model, coupled through masses (m1) and forces (f1, f2)]
Note: for ease of programming, all computational models were written in Matlab.

MOST: A Grid Perspective
[Diagram: the UIUC and U. Colorado experimental models and the NCSA computational model each sit behind an NTCP server; a central simulation coordinator exchanges forces and displacements (f1, f2, x1, q1) with the three sites at each step]
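The coordination pattern behind MOST, a central time-stepping loop that queries remote substructures (physical or simulated) for restoring forces, can be sketched in a few lines. This is an illustrative sketch only: the function names and the toy linear springs standing in for the remote sites are assumptions, not the real NTCP protocol or the Matlab models.

```python
# Hedged sketch of pseudo-dynamic substructure coordination: a coordinator
# advances a central-difference time integration, asking each substructure
# for its restoring force at the current displacement.
def coordinator(substructures, steps, dt, mass, ground_accel):
    """Central-difference pseudo-dynamic integration over substructures."""
    d_prev, d = 0.0, 0.0          # displacements at steps n-1 and n
    history = []
    for n in range(steps):
        # Each substructure returns its restoring force for displacement d
        # (in the real experiment, this is a network round-trip per site).
        restoring = sum(s(d) for s in substructures)
        # m * a = -restoring force + external load
        accel = (-restoring + mass * ground_accel(n * dt)) / mass
        d_next = 2 * d - d_prev + accel * dt ** 2
        d_prev, d = d, d_next
        history.append(d)
    return history

# Toy substructures: two linear springs standing in for the remote sites.
site_a = lambda d: 40.0 * d
site_b = lambda d: 60.0 * d
out = coordinator([site_a, site_b], steps=200, dt=0.01, mass=1.0,
                  ground_accel=lambda t: 1.0 if t < 0.05 else 0.0)
```

The design point the slide makes is that the coordinator does not care whether a "substructure" is a physical shake-table specimen or an FE simulation; both just answer force queries behind the same interface.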

Tele-Control Services

Data Viewer

Number of Remote Participants in MOST Experiment

Data-Intensive Analysis
• Users & resources in many institutions …
  ◊ 1000s of users, 100s of institutions, petascale resources
• … engage in collaborative data analysis
  ◊ Both structured/scheduled & interactive
• Many overlapping virtual orgs must:
  ◊ Define activities
  ◊ Pool resources
  ◊ Prioritize tasks
  ◊ Manage data
  ◊ …

GriPhyN Overview
[Diagram: researchers and production managers compose analyses and plan executions through virtual data services (Chimera virtual data system, Pegasus planner, DAGman) over the Grid fabric (Globus Toolkit, Condor, Ganglia, etc.), with discovery, sharing, data, instrument, and storage-element services; packaged as the Virtual Data Toolkit]

(Early) Virtual Data Language
CMS "pipeline": pythia_input → pythia.exe → cmsim_input → cmsim.exe → writeHits → writeDigis

begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end

begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end

begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end

begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end

begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end

begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end
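The essential idea of the listing is that each transformation declares its input and output files, so a planner can derive the execution order (a DAG) purely from those declarations. The following sketch illustrates that idea; the dictionary entries are simplified from the listing above, and the planner is a hypothetical stand-in, not the real Chimera/VDL implementation.

```python
# Hedged sketch: transformations declared by their input/output files,
# and a planner that derives a valid execution order from the declarations.
transformations = {
    "cmkin_input.csh":  {"in": ["ntpl_file_path", "template_file", "num_events"],
                         "out": ["cmkin_param_file"]},
    "kine_make_ntpl":   {"in": ["cmkin_param_file"], "out": ["ntpl_file"]},
    "cmsim_input.csh":  {"in": ["ntpl_file"], "out": ["cmsim_param_file"]},
    "cms121.exe":       {"in": ["cmsim_param_file"], "out": ["fz_file", "hbook_file"]},
    "writeHits.sh":     {"in": ["fz_file"], "out": ["hits_db"]},
    "writeDigis.sh":    {"in": ["hits_db"], "out": ["digis_db"]},
}

def plan(transforms):
    """Topologically order transformations by file dependencies."""
    produced_by = {f: name for name, t in transforms.items() for f in t["out"]}
    order, done = [], set()
    def visit(name):
        if name in done:
            return
        for f in transforms[name]["in"]:
            if f in produced_by:           # external inputs have no producer
                visit(produced_by[f])
        done.add(name)
        order.append(name)
    for name in transforms:
        visit(name)
    return order

print(plan(transformations))
```

Because the order is derived rather than hand-written, any data product can also be re-derived on demand by walking the same graph backwards, which is the "virtual data" idea the following slides build on.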

Virtual Data Example: High Energy Physics
• A scientist discovers an interesting result in a search for WW decays of the Higgs boson (mass = 200), where only stable, final-state particles are recorded (stability = 1), and wants to know how it was derived.
• The scientist adds a new derived data branch (e.g., LowPt = 20, HighPt = 10000) … and continues to investigate.
[Diagram: derivation tree: mass = 200; decay = bb / ZZ / WW; stability = 1 or 3; event = 8; plot = 1; …]
Work and slide by Rick Cavanaugh and Dimitri Bourilkov, University of Florida

Virtual Data Example: Sloan Galaxy Cluster Analysis
[Task graph: Sloan data → … → galaxy cluster size distribution]
Jim Annis, Steve Kent, Vijay Sehkri, Neha Sharma (Fermilab); Michael Milligan, Yong Zhao (Chicago)

Virtual Data Example: Education (Work in Progress)
"We uploaded the data to the Grid & used the grid analysis tools to find the shower"

Grid2003: An Operational Grid (http://www.ivdgl.org/grid2003)
• 28 sites (2100-2800 CPUs) & growing, including Korea
• 400-1300 concurrent jobs
• 7 substantial applications + CS experiments
• Running since October 2003

Grid2003 Components
• Computers & storage at 28 sites (to date): 2800+ CPUs
• Uniform service environment at each site:
  ◊ Globus Toolkit provides basic authentication, execution management, data movement
  ◊ Pacman installation system enables installation of numerous other VDT and application services
• Global & virtual organization services:
  ◊ Certification & registration authorities, VO membership services, monitoring services
• Client-side tools for data access & analysis:
  ◊ Virtual data, execution planning, DAG management, execution management, monitoring
• IGOC: iVDGL Grid Operations Center

Grid2003 Metrics

Metric                                   | Target   | Achieved
Number of CPUs                           | 400      | 2762 (28 sites)
Number of users                          | > 10     | 102 (16)
Number of applications                   | > 4      | 10 (+CS)
Number of sites running concurrent apps  | > 10     | 17
Peak number of concurrent jobs           | 1000     | 1100
Data transfer per day                    | > 2-3 TB | 4.4 TB max

Grid2003 Applications To Date (www.ivdgl.org/grid2003/applications)
• CMS proton-proton collision simulation
• ATLAS proton-proton collision simulation
• LIGO gravitational wave search
• SDSS galaxy cluster detection
• ATLAS interactive analysis
• BTeV proton-antiproton collision simulation
• SnB biomolecular analysis
• GADU/Gnare genome analysis
• Various computer science experiments

[Diagram: genome analysis workflow: a sequence analysis module, whole-genome analysis and architecture module, networks analysis module, and phenotypes module link genomes, genome features, annotated genome maps, gene annotations and function assignments, conserved chromosomal gene clusters, operons/regulons/networks, and metabolic reconstructions (annotated stoichiometric matrices), yielding predictions of regulation and new pathways, functions of hypotheticals, and conjectures about gene functions, supported by proteomics and metabolic experimentation, comparisons, visualization, and metabolic engineering]

Grid2003 Usage

Earth System Grid
Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models

Under the Covers of ESG

A Powerful New Three-way Alliance is Transforming Science (& Industry)
• Theory, models & simulations → shared data
• Experiment & advanced data collection → shared data
• Computing: systems, notations, formal foundation → architecture, algorithms
• Requires much engineering and innovation; changes culture, mores, and behaviours
• CS as the "new mathematics" (George Djorgovski)

New Modes of Science → New Modes of Funding & Working
• Effective Grid-enabled science demands:
  ◊ Discipline scientists and computer scientists
  ◊ True multidisciplinary teams with time to experiment, make mistakes, build confidence
  ◊ Reward structures that encourage risk taking
  ◊ Projects at a scale to enable real progress
• Large-scale multidisciplinary projects can:
  ◊ Attract international attention & collaboration
  ◊ Attract students & show them science is fun
  ◊ Train new workforce with appropriate skills

Pasteur's Quadrant Research Model
[2x2 diagram, "focus on new knowledge creation?" vs. "focus on application?": Bohr (knowledge only), Edison (application only), Pasteur (both)]
• Classic linear research model: creation of knowledge (basic, curiosity-driven research), then application of knowledge
• Creating effective Grids requires, and can support, Pasteur's Quadrant research models
Slide courtesy Dan Atkins, U. Michigan

Experimental & Collaborative Approach
Experimental procedure:
• Mix together, and shake well:
  ◊ Scientists* with an overwhelming need to pool resources to solve fundamental science problems
  ◊ Computer scientists with a vision of a Grid that will enable virtual communities to share resources
• Monitor byproducts:
  ◊ Heat: sometimes incendiary
  ◊ Light: considerable, in eScience, computer science, & cyberinfrastructure engineering
  ◊ Operational cyberinfrastructure (hardware & software), with an enthusiastic and knowledgeable user community, and real scientific benefits

Problem-Driven, Collaborative Research Methodology
[Cycle diagram: design → build → deploy → apply → analyze, spanning computer science, open software, infrastructure, discipline science, and a global community]

Summary
• Technologies rapidly reshaping landscape:
  ◊ Networks, data, computers
• Distributed, multidisciplinary teams as locales for creative work:
  ◊ Human capital pre-eminent
  ◊ Integrate diverse people & other resources to address challenging problems
  ◊ Require infrastructure to participate
• Grid tools are enabling a broad spectrum of collaborative/distributed computing scenarios

For More Information
• Globus Alliance: www.globus.org
• Global Grid Forum: www.ggf.org
• GriPhyN: www.griphyn.org
• Background information: www.mcs.anl.gov/~foster
• "Collaborating at Speed": www.internetnz.net.nz/public/ngi
• The Grid: Blueprint for a New Computing Infrastructure, 2nd Edition: www.mkp.com/grid2