d80419a4d0a88d1f6ecf35fdc30804e0.ppt
- Number of slides: 99
Grid meets Economics: A Market Paradigm for “Resource Management and Scheduling” for World Wide Grid Computing
Rajkumar Buyya, Melbourne, Australia
www.buyya.com/ecogrid
WW Grid
Need Honest Answers!
- I want to have access to your Grid resources and want to know how many of you are willing to give me access in the following cases:
- I am unable to give you access to our Australian machines, but I want to have access to yours!
  - Want to solve academic problems
  - Want to solve business problems
- I am willing to gift you kangaroos! (bartering)
- I am willing to give you access to my machines, if you want. (sharing, but no measure and no QoS)
- I am willing to pay you dollars on a usage basis. (economic incentive, market-based, and QoS)
Overview
- A quick glance at today’s Grid computing
- Resource management challenges for next-generation Grid computing
- A glance at approaches to Grid computing
- Grid architecture for computational economy
- Economy Grid = Globus + GRACE
- Nimrod-G: Grid Resource Broker
- Scheduling experiments
- Case study: drug design application on Grid
- Conclusions
[Diagram labels: Grid, Economy, Scheduling, Economics]
Scalable HPC: Breaking Administrative Barriers & New Challenges
[Figure: performance (vertical axis) against administrative barriers: individual, group, department, campus, state, national, globe, inter-planet, universe. Platforms shown: desktop, SMPs or supercomputers, local cluster, enterprise cluster/Grid, global cluster/Grid, inter-planetary Grid!]
Why Grids? Large-scale explorations need them: killer applications.
- Solving grand challenge applications using modeling, simulation and analysis
[Figure labels: Aerospace, Life Sciences, CAD/CAM, Digital Biology, Internet & Ecommerce, Military Applications]
What is Grid?
- An infrastructure that logically couples distributed resources:
  - Computers: PCs, workstations, clusters, supercomputers, laptops, notebooks, mobile devices, PDAs, etc.
  - Software: e.g., ASPs renting expensive special-purpose applications on demand
  - Catalogued data and databases: e.g., transparent access to the human genome database
  - Special devices: e.g., a radio telescope (SETI@Home searching for life in the galaxy)
  - People/collaborators
- and presents them as an integrated global resource.
- It enables the creation of virtual enterprises (VEs) for resource sharing.
[Figure labels: data archives, wide area]
Grid Applications: Drivers
- Distributed HPC (supercomputing): computational science.
- High-throughput computing: large-scale simulation/chip design and parameter studies.
- Content sharing (free or paid): sharing digital content among peers (e.g., Napster).
- Remote software access/renting services: application service providers (ASPs).
- Data-intensive computing: data mining, particle physics (CERN), drug design.
- On-demand, real-time computing: medical instrumentation and network-enabled solvers.
- Collaborative: collaborative design, data exploration, education.
Building and Using Grids requires
- Services that make our systems Grid-ready!
- Security mechanisms that permit resources to be accessed only by authorized users.
- (New) programming tools that make our applications Grid-ready!
- Tools that can translate the requirements of an application/user into the requirements of computers, networks, and storage.
- Tools that perform resource discovery, trading, selection/allocation, scheduling and distribution of jobs, and collect results.
[Callout: Globus?]
Players in Grid Computing
What users want? Users in Grid Economy & Strategy
- Grid Consumers
  - Execute jobs for solving problems of varying size and complexity
  - Benefit by selecting and aggregating resources wisely
  - Trade off timeframe and cost
  - Strategy: minimise expenses
- Grid Providers
  - Contribute “idle” resources for executing consumer jobs
  - Benefit by maximizing resource utilisation
  - Trade off local requirements and market opportunity
  - Strategy: maximise return on investment
Challenges for Next Generation Grid Technology Development
Sources of Complexity in Resource Management for World Wide Grid Computing
- Size (large number of nodes, providers, consumers)
- Heterogeneity of resources (PCs, workstations, clusters, supercomputers, instruments, databases, software)
- Heterogeneity of fabric management systems (single system image OS, queuing systems, etc.)
- Heterogeneity of fabric management policies
- Heterogeneity of application requirements (CPU, I/O, memory, and/or network intensive)
- Heterogeneity in resource demand patterns (peak, off-peak, ...)
- Applications need different QoS at different times (time-critical results). The utility of experimental results varies from time to time.
- Geographical distribution of users, located in different time zones
- Differing goals (producers and consumers have different objectives and strategies)
- Insecure and unreliable environment
Traditional approaches to resource management & scheduling are NOT useful for Grid?
- They use a centralised policy that needs
  - complete state information and
  - a common fabric management policy, or a decentralised consensus-based policy.
- Due to too many heterogeneous parameters in the Grid, it is impossible to define/get:
  - a system-wide performance matrix and
  - a common fabric management policy that is acceptable to all.
- The “economics” paradigm has proved to be an effective institution for managing the decentralization and heterogeneity present in human economies!
  - Fall of USSR & emergence of US as world superpower! (monopoly?)
- So, we propose/advocate the use of computational economics principles in the management of resources and the scheduling of computations on the world-wide Grid.
- Think locally and act globally approach to Grid computing!
Benefits of Computational Economies
- It provides a nice paradigm for managing self-interested and self-regulating entities (resource owners and consumers).
- Helps in regulating supply and demand of resources.
  - Services can be priced in such a way that equilibrium is maintained.
- User-centric / utility-driven
- Scalable:
  - No need for a central coordinator (during negotiation)
  - Resources (sellers) and users (buyers) can make their own decisions and try to maximize utility and profit.
- Adaptable
- It helps in offering different QoS (quality of service) to different applications depending on the value users place on them.
- It improves the utilisation of resources.
- It offers an incentive for resource owners to be part of the Grid!
- It offers an incentive for resource consumers to be good citizens.
- There is a large body of proven economic principles and techniques available that we can easily leverage.
New challenges of Computational Economy
- Resource Owners
  - How do I decide prices? (economic models?)
  - How do I specify them?
  - How do I enforce them?
  - How do I advertise & attract consumers?
  - How do I do accounting and handle payments?
  - ...
- Resource Consumers
  - How do I decide expenses?
  - How do I express QoS requirements?
  - How do I trade off between timeframe & cost?
  - ...
- Any tools, traders & brokers available to automate the process?
[Figure labels: mix-and-match; Object-oriented; Internet/partial-P2P; Network-enabled Solvers; Market/Computational Economy]
Many Grid Projects & Initiatives
- Australia: Economy Grid, Nimrod-G, Virtual Lab, Active Sheets, DISCWorld, ... new coming up
- Europe: UNICORE, MOL, Lecce GRB, Poland MC Broker, EU Data Grid, EuroGrid, MetaMPI, Dutch DAS, XW, JaWS, and many more...
- Japan: Ninf, DataFarm, and many more...
- USA: Globus, Legion, Javelin, AppLeS, NASA IPG, Condor, Harness, NetSolve, AccessGrid, GrADS, and many more...
- Cycle Stealing & .com Initiatives: Distributed.net, SETI@Home, ..., Entropia, UD, Parabon, ...
- Public Forums: Global Grid Forum, P2P Working Group, IEEE TFCC, Grid & CCGrid conferences
http://www.gridcomputing.com
Many Testbeds? And who pays? Who regulates demand and supply?
[Figure labels: GUSTO (decommissioned), WW Grid (World Wide Grid), Legion Testbed, NASA IPG]
Testbeds so far: observations
- Who contributed resources & why?
  - Volunteers: for fun, challenge, fame, charismatic apps, public good, like the distributed.net & SETI@Home projects.
  - Collaborators: sharing resources while developing new technologies of common interest (Globus, Legion, Ninf, MC Broker, Lecce GRB, ...). Unless you know the lab leaders, it is impossible to get access!
- How long?
  - Short term: excitement is lost, too much admin overhead (Globus inst+), no incentive, policy change, ...
- What do we need? A Grid Marketplace!
  - Regulates supply and demand, offers incentive for being players, simple, scalable solution, quasi-deterministic: a proven model in the real world.
Building an Economy Grid (Next Generation Grid Computing!)
To enable the creation of:
- Grid Marketplace (competitive)
- ASP
- Service-Oriented Computing
- ...
And let users focus on their own work (science, engineering, or commerce)!
GRACE: A Reference Grid Architecture for Computational Economy
See PDPTA 2000 paper!
[Architecture diagram labels: Grid User, Application, Grid Resource Broker, Job Control Agent, Schedule Advisor, Grid Explorer, Trade Manager, Deployment Agent, Information Server(s), Sign-on, Secure, Grid Bank, Grid Market Services, Grid Middleware Services, Grid Node 1 ... Grid Node N, Trade Server, Trading, Pricing Algorithms, Accounting, Resource Reservation, Resource Allocation, JobExec, Storage, QoS, Health Monitor, Info, Misc. services, resources R1, R2, ..., Rm, Grid Service Providers]
Economic Models for Trading
- Commodity Market Model
- Posted Prices Model
- Bargaining Model
- Tendering (Contract Net) Model
- Auction Model
  - English, first-price sealed-bid, second-price sealed-bid (Vickrey), and Dutch (consumer: low, high, rate; producer: high, low, rate)
- Proportional Resource Sharing Model
- Shareholder Model
- Partnership Model
See SPIE ITCom 2001 paper (with Heinz Stockinger, CERN)!
Grid Components
- Grid Apps: Applications and Portals (Scientific, Engineering, Collaboration, Prob. Solving Env., Web enabled Apps, ...)
- Grid Tools: Development Environments and Tools (Languages, Libraries, Debuggers, Monitoring, Resource Brokers, Web tools, ...)
- Grid Middleware: Distributed Resources Coupling Services (Security, Information, Process, Resource Trading, Market Info, QoS, ...)
- Grid Fabric:
  - Local Resource Managers (Operating Systems, Queuing Systems, Libraries & App Kernels, TCP/IP & UDP, ...)
  - Networked Resources across Organisations (Computers, Clusters, Storage Systems, Data Sources, Scientific Instruments, ...)
Economy Grid = Globus + GRACE
See IPDPS HWC 2001 paper!
[Layered diagram. Layers: Grid Apps, Grid Tools, Grid Middleware, Grid Fabric. Section labels: Applications, High-level Services and Tools, Core Services, Local Services. Component labels: Science, Engineering, Commerce, Portals, ActiveSheet, ..., Nimrod/G, globusrun, Grid Status, DUROC, MPI-G, CC++, Heartbeat Monitor, Nexus, MDS, GASS, Globus Security Interface, GRAM, GARA, GMD, GBank, GRACE-TS, eCash, QBank, Condor, LSF, GRD, PBS, JVM, TCP, UDP, Linux, Irix, Solaris]
GRACE components
- A resource broker (e.g., Nimrod/G)
- Various resource trading protocols for different economic models
- A mediator for negotiating between users and grid service providers (Grid Market Directory)
- A deal template for specifying resource requirements and service offers
- Grid Trading Server
- Pricing policy specification
- Accounting (e.g., QBank) and payment management (GridBank, not yet implemented)
Grid Open Trading Protocols
Messages between Trade Manager and Trade Server (the Trade Server applies its pricing rules; exposed through an API):
- Get Connected
- Call for Bid(DT)
- Reply to Bid(DT)
- Negotiate Deal(DT)
- ...
- Confirm Deal(DT, Y/N)
- Cancel Deal(DT)
- Change Deal(DT)
- Get Disconnected
DT: Deal Template
- resource requirements (BM)
- resource profile (BS)
- price (anyone can set)
- status
  - change the above values
  - negotiation can continue
  - accept/decline
- validity period
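The deal template and the negotiate/confirm exchange can be pictured with a small sketch. Everything here is an assumption for illustration: the field names, the `negotiate` helper, and the fixed-step price concession are hypothetical and are not the GRACE API or its actual bargaining rule.

```python
from dataclasses import dataclass, replace
from enum import Enum


class DealStatus(Enum):
    OPEN = "open"          # negotiation can continue
    ACCEPTED = "accepted"
    DECLINED = "declined"


@dataclass(frozen=True)
class DealTemplate:
    # Hypothetical fields mirroring the slide's DT entries.
    requirements: str       # resource requirements (set by the buyer side)
    resource_profile: str   # resource profile (set by the seller side)
    price: float            # either side may change this while OPEN
    status: DealStatus = DealStatus.OPEN
    valid_for_s: int = 600  # validity period in seconds


def negotiate(dt, buyer_limit, seller_floor, step, max_rounds=10):
    """Toy bargaining: the seller concedes `step` per round until the
    buyer's limit is met or the seller's floor would be undercut."""
    for _ in range(max_rounds):
        if dt.price <= buyer_limit:
            return replace(dt, status=DealStatus.ACCEPTED)
        next_price = dt.price - step
        if next_price < seller_floor:
            break
        dt = replace(dt, price=next_price)
    return replace(dt, status=DealStatus.DECLINED)


offer = DealTemplate("10 CPU-hours", "Linux cluster", price=20)
deal = negotiate(offer, buyer_limit=14, seller_floor=10, step=2)
print(deal.status.value, deal.price)
```

Keeping the template immutable and returning a new copy per round mirrors the idea that each message carries a complete DT whose values either side may change until the status is final.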
Pricing, Accounting, Allocations and Job Scheduling Flow @ each site/Grid level
[Diagram components: Pricing Policy, Trade Server, DB@Each Site, QBank, GRID Bank (digital transactions), Resource Manager (IBM-LL/PBS/...), Compute Resources (clusters/SGI/SP/...)]
0. Make deposits, transfers, refunds, queries/reports.
1. Client negotiates for access cost.
2. Negotiation is performed per owner-defined policies.
3. If the client is happy, TS informs QB about the access deal.
4. Job is submitted.
5. Check with QB for “go ahead”.
6. Job starts.
7. Job completes.
8. Inform QB about resource utilization.
Service Items to be Charged
- CPU: user and system time
- Memory:
  - maximum resident set size, page size
  - amount of memory used
  - page faults: with/without physical I/O
- Storage: size, r/w/block I/O operations
- Network: msgs sent/received
- Signals received, context switches
- Software and libraries accessed
- Data sources (e.g., Protein Data Bank)
How to decide price?
- Fixed price model (like today’s Internet)
- Dynamic/demand-supply (like tomorrow’s Internet)
- Usage period
- Loyalty of customers (like airlines favoring frequent flyers!)
- Historical data
- Advance agreement (high discount for corporations)
- Usage timing (peak, off-peak, lunch time)
- Calendar based (holiday/vacation period)
- Bulk purchase (register 100 .com domains at once!)
- Voting: trade unions decide pricing structure
- Resource capability as benchmarked in the market!
- Academic R&D/public-good application users can be offered a cheaper rate compared to commercial use.
- Customer type: quality- or price-sensitive buyers.
- Can be prescribed by regulating (govt.) authorities
Payments: Options & Automation
- Buy credits in advance / GSPs bill the user later (“pay as you go”)
- Pay by electronic currency via Grid Bank
  - NetCash (anonymity), NetCheque, and PayPal
- NetCheque: http://www.isi.edu/gost/info/netcheque/
  - Users register with NC accounting servers, can write electronic cheques and send them (e.g., by email). When deposited, the balance is transferred from the sender's account to the receiver's account.
- NetCash: http://www.isi.edu/gost/info/netcash/
  - It supports anonymity and uses the NetCheque system to clear payments between currency servers.
- PayPal.com: account + email is linked to a credit card.
  - Enter the recipient’s email address and the amount you wish to request. The recipient gets an email notification and pays you at www.PayPal.com
Nimrod-G: The Grid Resource Broker
Soft Deadline and Budget-based Economy Grid Resource Broker for Parameter Processing on P2P Grids
Parametric Computing (What Users think of Nimrod Power)
[Figure: Parameters, Magic Engine, Multiple Runs; Same Program, Multiple Data. Courtesy: Anand Natrajan, University of Virginia]
Killer Application for the Grid!
See IPDPS 2000 paper!
P-study Applications: Characteristics
- Code (single program: sequential or threaded)
- High resource requirements
- Long-running instances
- Numerous instances (multiple data)
- High computation-to-communication ratio
- Embarrassingly/pleasantly parallel
Sample P-Sweep Applications
- Bioinformatics: Drug Design / Protein Modelling
- Sensitivity experiments on smog formation
- Ecological Modelling: Control Strategies for Cattle Tick
- Combinatorial Optimization: Meta-heuristic parameter estimation
- Data Mining
- Computer Graphics: Ray Tracing
- High Energy Physics: Searching for Rare Events
- VLSI Design: SPICE Simulations
- Finance: Investment Risk Analysis
- Civil Engineering: Building Design
- Automobile: Crash Simulation
- Electronic CAD: Field Programmable Gate Arrays
- Network Simulation
- Aerospace: Wing Design
- Astrophysics
Thesis
- Perform a parameter sweep (bag of tasks), utilising distributed resources, within “T” hours or earlier, at a cost not exceeding $M.
- Three options/solutions:
  - Use pure Globus commands
  - Build your own distributed app & scheduler
  - Use Nimrod-G (resource broker)
Executing Remotely
- Choose resource
- Transfer input files
- Set environment
- Start process
- Pass arguments
- Monitor progress
- Read/write intermediate files
- Transfer output files
- + Resource discovery, trading, scheduling, predictions, rescheduling, ...
[Screenshot labels: Summary View, Job View, Event View]
Using Pure Globus Commands
Do it all yourself! (manually)
Total Cost: $???
Build Distributed Application & Scheduler
- Build the app on a case-by-case basis
- Complicated construction
- E.g., AppLeS/MPI based
Total Cost: $???
Use Nimrod-G
- Aggregate job submission
- Aggregate view
Submit & Play!
Nimrod & Associated Family of Tools
- Remote Execution Server (on-demand Nimrod Agent)
- File Transfer Server
- P-sweep app composition: Nimrod/enFuzion
- Resource management and scheduling: Nimrod-G Broker
- Design optimisations: Nimrod-O
- App composition and online visualization: Active Sheets
- Grid simulation in Java: GridSim
- Drug design on Grid: Virtual Lab
- Upcoming?: HEPGrid (+U. Melbourne), GAVE (+Rutherford Appleton Lab): Grid (Un)Aware Virtual Engineering
Nimrod/G: A Grid Resource Broker
- A resource broker for managing and steering task farming (parametric sweep) applications on computational Grids based on deadline and computational economy.
- Key features
  - A single window to manage & control the experiment
  - Resource discovery
  - Resource trading
  - Scheduling & predictions
  - Transportation of data & results
  - Steering & data management
- It allows the study of the behaviour of some of the output variables against a range of different input scenarios.
A Glance at Nimrod-G Broker
See HPCAsia 2000 paper!
[Diagram labels: Nimrod/G Client, Nimrod/G Engine, Schedule Advisor, Trading Manager (TM), Grid Store, Grid Dispatcher, Grid Explorer (GE), Grid Middleware (Globus, Legion, Condor, etc.), Grid Information Server(s) (GIS), RM & TS on each node]
Legend: G = Globus-enabled node; L = Legion-enabled node; C = Condor-enabled node; RM: Local Resource Manager; TS: Trade Server
Nimrod/G Grid Broker Architecture
[Layered diagram labels. Nimrod Clients: Legacy Applications, Customised Apps (Active Sheet), P-Tools (GUI/Scripting) (parameter_modeling), Monitoring and Steering Portals. Nimrod Broker: Farming Engine, Programmable Entities Management (Resources, Jobs, Tasks, Channels, Agents, AgentScheduler, JobServer), Meta-Scheduler (Schedule Advisor with Algorithm1 ... AlgorithmN, Trading Manager, Grid Explorer), Database (Postgres), Dispatcher & Actuators (Globus-A, Legion-A, Condor-A, P2P-A, ...). Middleware: Globus, Legion, Condor, P2P, GTS, GMD, G-Bank, ... Fabric: Computers (PC/WS/Clusters), Local Schedulers (Condor/LL/Mosix/...), Storage, Networks, Database, Instruments (Radio Telescope). Annotations: XML?, IP hourglass?]
A Nimrod/G Monitor
[Screenshot labels: Deadline, Cost, Legion hosts, Globus hosts. Bezek is in both Globus and Legion domains.]
User Requirements: Deadline/Budget
Active Sheet: Spreadsheet Processing on Grid
[Figure labels: Nimrod Proxy, Nimrod/G]
See HPC 2001 paper!
Nimrod/G Interactions
[Diagram labels: Resource Discovery, Grid Info servers, Scheduler, Farming Engine, Grid Trade Server, Dispatcher, Process server, I/O server, Resource allocation (local), Queuing System, Nimrod Agent, User process, File access. Node types: root node, gatekeeper node, computational node. Sample request: “Do this in 30 min. for $10?”]
Adaptive Scheduling Algorithms
See HPDC AMS 2001 paper!
[Flowchart steps: Discover Resources; Establish Rates; Compose & Schedule; Distribute Jobs; Evaluate & Reschedule; Meet requirements? (Remaining Jobs, Deadline, & Budget?); Discover More Resources]
Cost Model
- Without cost, ANY shared system becomes unmanageable
- Charge users more for remote facilities than their own
- Choose cheaper resources before more expensive ones
- Cost units (G$) may be
  - Dollars
  - Shares in global facility
  - Stored in bank
Cost Matrix @ Grid site X
- Non-uniform costing
- Encourages use of local resources first
- Real accounting system can control machine usage
[Figure: cost matrix with rows User 1 ... User 5 and columns Machine 1 ... Machine 5; sample entries 1, 2, 1, 3]
Resource Cost = Function (cpu, memory, disk, network, software, QoS, current demand, etc.)
Simple: price based on peak time, off-peak, discount when demand is low, ...
Deadline and Budget-based Cost Minimization Scheduling
1. Sort resources by increasing cost.
2. For each resource in order, assign as many jobs as possible to the resource, without exceeding the deadline.
3. Repeat all steps until all jobs are processed.
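One pass of steps 1 and 2 can be sketched as below. This is a minimal illustration, assuming each resource runs its jobs one after another and can therefore finish `time_left // avg_job_time` jobs before the deadline; the `Resource` fields and the function name are hypothetical and not taken from Nimrod-G.

```python
from typing import NamedTuple


class Resource(NamedTuple):
    name: str
    cost_per_job: float   # price charged for running one job
    avg_job_time: float   # average job runtime on this resource (minutes)


def assign_cost_min(resources, n_jobs, time_left):
    """Cheapest resources first; each takes as many jobs as it can
    finish before the deadline. Returns (plan, jobs still unassigned)."""
    plan = {}
    unassigned = n_jobs
    for res in sorted(resources, key=lambda r: r.cost_per_job):
        capacity = int(time_left // res.avg_job_time)
        take = min(unassigned, capacity)
        if take > 0:
            plan[res.name] = take
            unassigned -= take
        if unassigned == 0:
            break
    return plan, unassigned


pool = [Resource("dear", 20, 5), Resource("cheap", 5, 5)]
plan, left_over = assign_cost_min(pool, n_jobs=20, time_left=60)
print(plan, left_over)
```

In a running broker this pass would be repeated (step 3) with updated runtimes and remaining time; any jobs left unassigned signal that the deadline cannot be met with the resources currently known.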
Deadline-based Cost-minimization Scheduling
- M: resources, N: jobs, D: deadline
- RL: resource list, which needs to be maintained in increasing order of cost
  - Note: the cost of any Ri is less than that of any of Ri+1, ..., Rm
- Ct: time when accessed (time now)
- Ti: job runtime (average) on resource i (Ri) [updated periodically]
  - Ti acts as a load profiling parameter.
- Ai: number of jobs assigned to Ri, where:
  - Ai = Min (No. of unassigned jobs, No. of jobs Ri can complete by the remaining deadline)
  - No. of unassigned jobs (at Ri) = Diff (N, (A1 + ... + Ai-1))
  - No. of jobs Ri can consume = Remaining time (D - Ct) DIV Ti
- ALG: invoke Job Assignment() periodically until all jobs are done.
- Job Assignment()/Reassignment():
  - Establish (RL, Ct, Ti, Ai) dynamically: resource discovery.
  - For all resources (i = 1 to M) { assign Ai jobs to Ri, if required }
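The assignment rule on this slide can be written compactly as follows. The floor stands for the slide's DIV, and the numbers in the worked lines are hypothetical, chosen only to show the arithmetic.

```latex
A_i = \min\!\left( N - \sum_{k=1}^{i-1} A_k,\;
      \left\lfloor \frac{D - C_t}{T_i} \right\rfloor \right),
      \qquad i = 1, \dots, M

% Hypothetical example: N = 20 jobs, D - C_t = 60 min, T_1 = T_2 = 5 min
% A_1 = \min(20,\ \lfloor 60/5 \rfloor) = 12
% A_2 = \min(20 - 12,\ \lfloor 60/5 \rfloor) = 8
```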
Deadline and Budget-based Time Minimization Scheduling
1. For each resource, calculate the next completion time for an assigned job, taking into account previously assigned jobs.
2. Sort resources by next completion time.
3. Assign one job to the first resource for which the cost per job is less than the remaining budget per job.
4. Repeat all steps until all jobs are processed. (This is performed periodically or at each scheduling event.)
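The loop above can be sketched as follows. This is an illustration under stated assumptions: jobs queue one after another on each resource, "remaining budget per job" is the unspent budget divided by the jobs not yet placed, and a resource qualifies when its cost is at most that amount (the slide says "less than"). The names are hypothetical.

```python
from typing import NamedTuple


class Resource(NamedTuple):
    name: str
    job_time: float       # average runtime of one job on this resource
    cost_per_job: float


def schedule_time_min(resources, n_jobs, budget):
    """Place jobs one at a time on the resource that would finish its
    next job soonest, among those affordable per remaining job."""
    assigned = {r.name: 0 for r in resources}
    remaining_budget = budget
    for jobs_left in range(n_jobs, 0, -1):
        budget_per_job = remaining_budget / jobs_left
        # Steps 1-2: next completion time accounts for jobs already queued.
        by_completion = sorted(
            resources, key=lambda r: (assigned[r.name] + 1) * r.job_time
        )
        # Step 3: first resource in that order that fits the budget.
        choice = next(
            (r for r in by_completion if r.cost_per_job <= budget_per_job),
            None,
        )
        if choice is None:
            break  # nothing affordable; remaining jobs stay unplaced
        assigned[choice.name] += 1
        remaining_budget -= choice.cost_per_job
    return assigned, remaining_budget


pool = [Resource("fast", 1, 10), Resource("slow", 2, 1)]
placement, unspent = schedule_time_min(pool, n_jobs=4, budget=22)
print(placement, unspent)
```

With a tight budget the first jobs go to the slow, cheap resource, which frees enough budget per remaining job for the fast one later on.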
Deadline and Budget-based Time+Cost Min. Scheduling
1. Split resources by whether cost per job is less than budget per job.
2. For the cheaper resources, assign jobs in inverse proportion to the job completion time (e.g., a resource with completion time = 5 gets twice as many jobs as a resource with completion time = 10).
3. For the dearer resources, repeat all steps (with a recalculated budget per job) until all jobs are assigned.
4. [Schedule/Reschedule] Repeat all steps until all jobs are processed.
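Step 2, the inverse-proportion split, is the least obvious part, so here is a minimal sketch of it alone. The rounding rule (largest fractional remainder gets the spare jobs) is an assumption, since the slide does not say how fractional shares are handled, and the function name is hypothetical.

```python
from fractions import Fraction


def split_inverse(n_jobs, completion_times):
    """Share n_jobs among resources in inverse proportion to their
    job completion times; fractional shares are rounded by handing
    spare jobs to the largest remainders (an assumed rule)."""
    weights = {name: 1 / Fraction(t) for name, t in completion_times.items()}
    total = sum(weights.values())
    exact = {name: n_jobs * w / total for name, w in weights.items()}
    # Floor of each exact share, computed on exact rationals.
    alloc = {name: e.numerator // e.denominator for name, e in exact.items()}
    spare = n_jobs - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:spare]:
        alloc[name] += 1
    return alloc


# The slide's example: completion time 5 gets twice as many jobs as 10.
shares = split_inverse(30, {"r5": 5, "r10": 10})
print(shares)
```

Exact fractions avoid floating-point drift, so the shares always add up to the number of jobs being split.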
Evaluation of Scheduling Heuristics
A Hypothetical Application on World Wide Grid
World Wide Grid (WWG)
[Map of sites connected over the Internet]
- Australia: Monash Uni.: Nimrod/G, Linux cluster, Solaris WS; Globus + Legion, GRACE_TS
- North America: ANL: SGI/Sun/SP2; USC-ISI: SGI; UVa: Linux cluster; UD: Linux cluster; UTK: Linux cluster; Globus/Legion, GRACE_TS
- Asia/Japan: Tokyo I-Tech.; ETL, Tsukuba: Linux cluster; Globus + GRACE_TS
- South America: Chile: cluster; Globus + GRACE_TS
- Europe: ZIB/FUB: T3E/Mosix; Cardiff: Sun E6500; Paderborn: HPCLine; Lecce: Compaq SC; CNR: cluster; Calabria: cluster; CERN: cluster; Poznan: SGI/SP2; Globus + GRACE_TS
Experiment-1 Setup
- Workload: 165 jobs, each needing 5 minutes of CPU time
- Deadline: 1 hr. and budget: 800,000 units
- Strategy: minimise cost and meet deadline
- Execution cost with cost optimisation
  - AU peak time: 471205 (G$)
  - AU off-peak time: 427155 (G$)
Resources Selected & Price/CPU-sec.
Resource Type & Size | Owner and Location | Grid services | Peaktime Cost (G$) | Offpeak cost
Linux cluster (60 nodes) | Monash, Australia | Globus/Condor | 20 | 5
IBM SP2 (80 nodes) | ANL, Chicago, US | Globus/LL | 5 | 10
Sun (8 nodes) | ANL, Chicago, US | Globus/Fork | 5 | 10
SGI (96 nodes) | ANL, Chicago, US | Globus/Condor-G | 15 | 15
SGI (10 nodes) | ISI, LA, US | Globus/Fork | 10 | 20
Execution @ AU Peak Time
Execution @ AU Offpeak Time
AU peak: Resources/Cost in Use
After the calibration phase, note the difference in the pattern of the two graphs. This is when the scheduler stopped using expensive resources.
AU offpeak: Resources/Cost in Use
Experiment-2 Setup
- Workload: 165 jobs, each needing 5 minutes of CPU time
- Deadline: 2 hrs. and budget: 396000 units
- Strategy: minimise time / cost
- Execution cost
  - Optimise cost: 115200 (G$) (finished in 2 hrs.)
  - Optimise time: 237000 (G$) (finished in 1 hr.)
- In this experiment, the time-optimised scheduling run costs about double that of the cost-optimised run. Users can now trade off time vs. cost.
Resources Selected & Price/CPU-sec.
Resource & Location | Grid services & Fabric | Cost/CPU sec. or unit | Jobs executed (Time_Opt) | Jobs executed (Cost_Opt)
Linux Cluster-Monash, Melbourne, Australia | Globus, GTS, Condor | 2 | 64 | 153
Linux-Prosecco-CNR, Pisa, Italy | Globus, GTS, Fork | 3 | 7 | 1
Linux-Barbera-CNR, Pisa, Italy | Globus, GTS, Fork | 4 | 6 | 1
Solaris/Ultra2, TITech, Tokyo, Japan | Globus, GTS, Fork | 3 | 9 | 1
SGI-ISI, LA, US | Globus, GTS, Fork | 8 | 37 | 5
Sun-ANL, Chicago, US | Globus, GTS, Fork | 7 | 42 | 4
Total Experiment Cost (G$) | | | 237000 | 115200
Time to Complete Exp. (Min.) | | | 70 | 119
Scheduling for Time Optimization
Scheduling for Cost Optimization
Application Case Study
The Virtual Laboratory Project: "Molecular Modelling for Drug Design" on Peer-to-Peer Grid
Drug Design: Data Intensive Computing on Grid
- A Virtual Laboratory for “Molecular Modelling for Drug Design” on Peer-to-Peer Grid.
- It provides tools for examining millions of chemical compounds (molecules) in the Protein Data Bank (PDB) to identify those having potential use in drug design.
- In collaboration with:
  - Kim Branson, Structural Biology, Walter and Eliza Hall Institute (WEHI)
http://www.csse.monash.edu.au/~rajkumar/dd@home/
DesignDrug@Home Architecture
A Virtual Lab for “Molecular Modeling for Drug Design” on P2P Grid
[Architecture diagram. Components: Resource Broker (RB maps suitable Grid nodes and Protein Data Bank), Market Directory, Data Replica Catalogue, Grid Info. Service, PDB1, PDB2, and Grid nodes each running a GTS (Grid Trade Server). Messages: “Screen 2K molecules in 30 min. for $10”; “Give me list of PDB sources of type aldrich_300?”; “service cost?”; “service providers?”; “mol.5 please?”; “mol.10 please?”; “get mol.10 from pdb1 & screen it.”]
Software Tools
- Molecular Modelling Tools (DOCK)
- Parameter Modelling Tools (Nimrod/enFuzion)
- Grid Resource Broker (Nimrod-G)
- Data Grid Broker
- Protein Data Bank (PDB) Management and Intelligent Access Tools
  - PDB database Lookup/Index Table Generation.
  - PDB and associated index-table Replication.
  - PDB Replica Catalogue (that helps in Resource Discovery).
  - PDB Servers (that serve PDB client requests).
  - PDB Brokering (Replica Selection).
  - PDB Clients for fetching Molecule Records (Data Movement).
- Grid Middleware (Globus and GRACE)
- Grid Fabric Management (Fork/LSF/Condor/Codine/...)
DOCK code* (Enhanced by WEHI, U of Melbourne)
- A program to evaluate the chemical and geometric complementarities between a small molecule and a macromolecular binding site.
- It explores ways in which two molecules, such as a drug and an enzyme or protein receptor, might fit together.
- Compounds which dock to each other well, like pieces of a three-dimensional jigsaw puzzle, have the potential to bind.
- So, why is it important to be able to identify small molecules which may bind to a target macromolecule?
- A compound which binds to a biological macromolecule may inhibit its function, and thus act as a drug.
- For example, by disabling the ability of the (HIV) virus to attach itself to a molecule/protein!
- With system-specific code changed, we have been able to compile it for Sun Solaris, PC Linux, SGI IRIX, and Compaq Alpha/OSF1.
* Original Code: University of California, San Francisco: http://www.cmpharm.ucsf.edu/kuntz/
Dock input file
score_ligand, minimize_ligand, multiple_ligands    yes, no [the extracted text gives only two values for these three keys]
random_seed                   7
anchor_search                 no
torsion_drive                 yes
clash_overlap                 0.5
conformation_cutoff_factor    3
torsion_minimize              yes
match_receptor_sites          no
random_search                 yes
...
maximum_cycles                1
ligand_atom_file              S_1.mol2    (molecule to be screened)
receptor_site_file            ece.sph
score_grid_prefix             ece
vdw_definition_file           parameter/vdw.defn
chemical_score_file           parameter/chem_score.tbl
flex_definition_file          parameter/flex.defn
flex_drive_file               parameter/flex_drive.tbl
ligand_contact_file           dock_cnt.mol2
ligand_chemical_file          dock_chm.mol2
ligand_energy_file            dock_nrg.mol2
Parameterized Dock input file
score_ligand                  $score_ligand
minimize_ligand               $minimize_ligand
multiple_ligands              $multiple_ligands
random_seed                   $random_seed
anchor_search                 $anchor_search
torsion_drive                 $torsion_drive
clash_overlap                 $clash_overlap
conformation_cutoff_factor    $conformation_cutoff_factor
torsion_minimize              $torsion_minimize
match_receptor_sites          $match_receptor_sites
random_search                 $random_search
...
maximum_cycles                $maximum_cycles
ligand_atom_file              ${ligand_number}.mol2    (molecule to be screened)
receptor_site_file            $HOME/dock_inputs/${receptor_site_file}
score_grid_prefix             $HOME/dock_inputs/${score_grid_prefix}
vdw_definition_file           vdw.defn
chemical_score_file           chem_score.tbl
flex_definition_file          flex.defn
flex_drive_file               flex_drive.tbl
ligand_contact_file           dock_cnt.mol2
ligand_chemical_file          dock_chm.mol2
ligand_energy_file            dock_nrg.mol2
Dock PlanFile (contd.)
parameter database_name label "database_name" text select oneof "aldrich" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre" "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute" "molecular_science" "molecular_diversity_preservation" "national_cancer_institute" "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE" default "aldrich_300";
parameter score_ligand text default "yes";
parameter minimize_ligand text default "yes";
parameter multiple_ligands text default "no";
parameter random_seed integer default 7;
parameter anchor_search text default "no";
parameter torsion_drive text default "yes";
parameter clash_overlap float default 0.5;
parameter conformation_cutoff_factor integer default 5;
parameter torsion_minimize text default "yes";
parameter match_receptor_sites text default "no";
parameter random_search text default "yes";
...
parameter maximum_cycles integer default 1;
parameter receptor_site_file text default "ece.sph";
parameter score_grid_prefix text default "ece";
parameter ligand_number integer range from 1 to 200 step 1;    (molecules to be screened)
Dock PlanFile
task nodestart
    copy ./parameter/vdw.defn node:.
    copy ./parameter/chem_score.tbl node:.
    copy ./parameter/flex.defn node:.
    copy ./parameter/flex_drive.tbl node:.
    copy ./dock_inputs/get_molecule node:.
    copy ./dock_inputs/dock_base node:.
endtask
task main
    node:substitute dock_base dock_run
    node:substitute get_molecule_fetch
    node:execute sh ./get_molecule_fetch
    node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
    copy node:dock_out ./results/dock_out.$jobname
    copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
    copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
    copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
endtask
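The `substitute` step turns the parameterized input file into a concrete one for each job by replacing `$name` and `${name}` placeholders with that job's parameter values. The sketch below imitates the idea with Python's `string.Template`; it is an analogy only, not how Nimrod implements substitution, and the template is a shortened, hypothetical excerpt of the Dock input file.

```python
from string import Template

# Shortened excerpt of a parameterized Dock input file (illustrative only).
DOCK_BASE = """\
score_ligand        $score_ligand
random_seed         $random_seed
ligand_atom_file    ${ligand_number}.mol2
receptor_site_file  ${receptor_site_file}
"""


def substitute(template_text, values):
    """Return the template with every placeholder replaced.
    Raises KeyError if a placeholder has no value."""
    return Template(template_text).substitute(values)


def sweep(template_text, fixed, ligand_numbers):
    """Yield one concrete input file per ligand number, as a
    parameter sweep over `ligand_number` would."""
    for number in ligand_numbers:
        yield substitute(template_text, {**fixed, "ligand_number": number})


fixed_values = {
    "score_ligand": "yes",
    "random_seed": 7,
    "receptor_site_file": "ece.sph",
}
dock_runs = list(sweep(DOCK_BASE, fixed_values, range(1, 4)))
print(dock_runs[0])
```

Each generated text corresponds to one job in the sweep, so a range of 1 to 200 for `ligand_number` would yield 200 independent docking runs.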
Nimrod/TurboLinux enFuzion GUI tools for Parameter Modeling
Docking Experiment Preparation
- Set up PDB DataGrid
  - Index PDB databases
  - Pre-stage (all) Protein Data Bank (PDB) on replica sites
  - Start PDB server
- Create Docking GridScore (receptor surface details) for a given receptor on the home node.
- Pre-staging large files required for docking:
  - Pre-stage Dock executables and the PDB access client on Grid nodes, if required (e.g., dock.Linux, dock.SunOS, dock.IRIX64, and dock.OSF1 on Linux, Sun, SGI, and Compaq machines respectively). Use globus-rcp.
  - Pre-stage/cache all data files (~3-13 MB each) representing receptor details on Grid nodes.
  - This can be done on demand by Nimrod/G for each job, but a few input files are too large and they are required for all jobs. So, pre-staging/caching at the http-cache or broker level is necessary to avoid the overhead of copying the same input files again and again!
Protein Data Bank
- Databases consist of small molecules from commercially available organic synthesis libraries, and natural product databases.
- There is also the ability to screen virtual combinatorial databases, in their entirety.
- This methodology allows only the required compounds to be subjected to physical screening and/or synthesis, reducing both time and expense.
Target Testcase
- The target for the test case: endothelin converting enzyme (ECE). This is involved in “heart stroke” and other transient ischemia.
- Is·che·mi·a: a decrease in the blood supply to a bodily organ, tissue, or part caused by constriction or obstruction of the blood vessels.
DataGrid Brokering
[Diagram. Components: Nimrod/G Computational Grid Broker (Algorithm1 ... AlgorithmN), PDB Broker, Data Replica Catalogue, Grid Info. Service, PDB Service, PDB2, Grid Service Providers GSP1, GSP2, GSP3, GSP4, ..., GSPm, GSPn. User request: “Screen 2K molecules in 30 min. for $10”. Messages, numbered 1 to 7 on the slide: “Screen mol.5 please?”; “advise PDB source?”; “PDB replicas please?”; “Is GSP4 healthy?”; “selection & advise: use GSP4!”; “mol.5 please?”; “process & send results”]
Nimrod/G in Action: Screening on World-Wide Grid
Any Scientific Discovery? Did your collaborator invent a new drug for xxxx? Not yet. Anyway, check out the announcement of Nobel Prize winners for next year!
Conclude with a comparison with the Electrical Grid… Where are we? Courtesy: Domenico Laforenza
Alessandro Volta in Paris in 1801, inside the French National Institute, shows the battery in the presence of Napoleon I. Fresco by N. Cianfanelli (1841) (Zoological Section "La Specola" of the Natural History Museum of Florence University)
“What?!?! Oh, mon Dieu! This is a mad man…”
“…and in the future, I imagine a worldwide Power (Electrical) Grid…”
2001 - 1801 = 200 Years
Can we Predict its Future? “I think there is a world market for about five computers.” Thomas J. Watson Sr., IBM Founder, 1943
What Enron, World Leader in Power and Natural Gas Distribution Business, Thinks of Economy Grid!
---- Original Message ----
Subject: Your papers on Economics & Grid allocation
Date: Wed, 14 Mar 2001 12:10:20 -0800
From: Lance_Norskog@enron.net
To: rajkumar@csse.monash.edu.au, davida@csse.monash.edu.au, jon@csse.monash.edu.au

Hello. I am researching mass computation infrastructures. The company I work for, Enron, is a worldwide commodity company. Our business model is to find mid-sized commodity markets that don't work like large-scale commodity markets (like wheat, gold, orange juice, etc.) and to restructure them. The division I work in is working to do this for long-distance fiber optic bandwidth. Other divisions are pursuing metals, paper pulp, etc. (We even have "weather derivatives", which is a betting parlor for industries that depend far too much on hot or cold weather. Somehow, this is legal!) So, my perspective on mass computation is from the point of view of how to make it a large-scale market.

The papers you have published and are presenting concentrate on the development of a "spot" market for the commodity of processor time. This is only part of the economic picture. (I just found "Calender based" among your list of bidding parameters, but your analysis seems to be very oriented to the spot market.)
Our customers are large corporations. They have a need not to solve some particular problem now and then, but every day. For example, take a department store chain that routes one million different items from warehouses into department stores every day, and wants to do it in a near-optimal way. They need to run this computation, with different input vectors, every business day! If that company is to commit to running its business with the Grid, it needs a few guarantees around its Grid use:
1) complete reliability and availability
2) a known price for every use, set far in advance
3) an active spot market in case its regular supplier fails
4) enforceable contracts guaranteeing quality of service

This last is a killer: the quantity of processing power you get can't be squishy. It has to be a measurable unit: "Java MIPS" is computer-driven.

Number 2, a known price for every use, is also missing from your analysis. A large, active, "liquid" commodity market can supply not merely spot market purchases, but also future needs at a price fixed today: "6 million MIPS from midnight to 3 AM every day for six months, starting July 1" will cost me this much per month. I can lock down my spreadsheet and know that I won't be driven out of business by a temporary shortage, because my supply is fully guaranteed.
Large commodity markets have the vast majority of their product change hands under such long-term agreements, rather than on the spot market. The strategy is to buy part of your needs in long-term contracts, part in medium-term contracts, and most of the rest on short-term, just to avoid getting stuck 5 years from now with vast amounts of a commodity you don't want. Every day you might have a missing 3-5% that you need to buy on the spot market. (I'm in California. We're having so much trouble because our utilities were banned from buying a heterogeneous basket of contracts and were required to buy all electricity on the spot market.)

I think my point is that while economics is a fine metaphor for making operational decisions in scheduling resources, those numbers will not be visible to end customers. Instead, the customers will buy blocks of resource for future delivery with pricing based on standard macro-economic factors like interest rates, falling machine prices, rising electricity prices, etc. The producers will use your economic-based techniques to direct their day-to-day operations and to make the interproducer spot market function.

Lance Norskog
Sr. Software Engineer
Enron Broadband Systems
----
Conclusions
- Grid Computing is emerging as a next-generation computing platform.
- The use of an economic paradigm for the management of resources in Grid Computing is essential to push the Grid into mainstream computing!
- Adaptive, scalable, and easy-to-use systems and tools are essential to make end users' lives easier.
- It is projected that the impact of the World-Wide Grid on the 21st century economy will be similar to the impact made by the electric power grid on the 20th century economy.
- To achieve this goal, in my humble opinion, use the Nimrod Family of Tools (not to mention the Nimrod-G Broker) along with Globus, of course!
- Enjoy the excitement of World-Wide Grid Computing!
Download Software & Information
- Nimrod & Parametric Computing: http://www.csse.monash.edu.au/~davida/nimrod/
- Economy Grid & Nimrod/G: http://www.buyya.com/ecogrid/
- Virtual Laboratory/DesignDrug@Home: http://www.buyya.com/dd@home/
- Grid Simulation (Java based): http://www.buyya.com/gridsim/
- World Wide Grid testbed: http://www.buyya.com/ecogrid/wwg/
- Looking for new volunteers to grow: please contact me to barter your & our machines!
- Want to build on our work/collaborate: talk to me now or email rajkumar@csse.monash.edu.au
Acknowledgements
- Special thanks to the following colleagues for sharing ideas/works:
  - David Abramson, Monash University
  - Jack Dongarra, University of Tennessee
  - Wolfgang Gentzsch, Sun Microsystems
  - Jon Giddy, DSTC @ Monash University
  - Domenico Laforenza, CNR/CNUCE, Italy
  - Globus Team!
- Unable to mention all names explicitly here; however, their efforts are recognized by featuring their work in this presentation.
- Colleagues from Asia/Japan, Europe (Italy, Germany, Switzerland, Poland, UK), US, & Chile for providing access to their machines for the WWG testbed.
Thank You… Any ??
Further Information
- Books:
  - High Performance Cluster Computing, V1, V2, R. Buyya (Ed), Prentice Hall, 1999.
  - The GRID, I. Foster and C. Kesselman (Eds), Morgan-Kaufmann, 1999.
- IEEE Task Force on Cluster Computing: http://www.ieeetfcc.org
- Global Grid Forum: www.gridforum.org
- IEEE/ACM CCGrid’xy: www.ccgrid.org
  - CCGrid 2002, Berlin: ccgrid2002.zib.de
- Grid workshop: www.gridcomputing.org
Further Information
- Cluster Computing Info Centre: http://www.buyya.com/cluster/
- Grid Computing Info Centre: http://www.gridcomputing.com
- IEEE DS Online - Grid Computing area: http://computer.org/dsonline/gc
- Compute Power Market Project: http://www.ComputePower.com


