b9e113e741efe06396441e37238fcb0c.ppt
- Number of slides: 24
An Agent-Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems
Iosif Legrand, California Institute of Technology
February 2006
The MonALISA Framework
- MonALISA is a dynamic, distributed service system able to collect any type of information from different systems, analyze it in near real time, and support automated control decisions and global optimization of workflows in complex grid systems.
- The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing agent-based subsystems that are registered as dynamic services and can collaborate and cooperate in performing a wide range of monitoring tasks. These agents can analyze and process the information in a distributed way and provide optimization decisions in large-scale distributed applications.
MonALISA is a Dynamic, Distributed Service Architecture
- The framework is based on a hierarchical structure of loosely coupled agents acting as distributed services: independent, autonomous entities able to discover one another and to cooperate using a dynamic set of proxies or self-describing protocols.
- An agent-based architecture makes it possible to invest the system with increasing degrees of intelligence, to reduce complexity, and to make global systems manageable in real time. For effective use of distributed resources, these services provide adaptability and self-organization.
MonALISA Service & Data Handling
[Diagram. Labels: Registration and Discovery with the Lookup Services; MonALISA Service; Data Cache Service & DB; Data Stores (Postgres, MySQL); WEB Service (WSDL, SOAP); Web client; Clients (other services); Communications via the ML Proxy; Java; Predicates & Agents; Applications; Configuration Control (SSL); User-defined loadable modules to write/send data]
The MonALISA Discovery System & Services
Fully distributed system with no single point of failure.
- Global services or clients: clients, HL services, repositories
- Proxies: dynamic load balancing; scalability & replication; security AAA for clients
- MonALISA services (agents): distributed system for gathering and analyzing information
- Network of JINI-LUSs (secure & public): distributed dynamic discovery, based on a lease mechanism and REN
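The lease mechanism named on this slide can be illustrated with a toy registry: a service stays visible only while it keeps renewing its lease, so a crashed service drops out of lookups without anyone deregistering it. This is a minimal sketch under stated assumptions, not the JINI API and not MonALISA code; `LeaseRegistry`, its methods and the 30-second default are hypothetical names and values chosen for illustration.

```python
import time


class LeaseRegistry:
    """Toy lookup service (hypothetical): entries vanish unless renewed."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock          # injectable so the sketch can be tested
        self._expiry = {}           # service name -> lease expiry time

    def register(self, name):
        """Grant a fresh lease to a new or returning service."""
        self._expiry[name] = self.clock() + self.lease_seconds

    def renew(self, name):
        """Extend a lease that is still valid; expired leases must re-register."""
        if name not in self.lookup():
            raise KeyError(f"lease for {name!r} has expired or never existed")
        self.register(name)

    def lookup(self):
        """Return the names of all services whose lease has not expired."""
        now = self.clock()
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)
```

A service that stops renewing (for example because its host went down) disappears from `lookup()` once its lease runs out, which is what removes the need for a central component to track failures.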
Monitoring the Internet2 Backbone Network
- Test for a Land Speed Record
- ~7 Gb/s in a single TCP stream from Geneva to Caltech
The UltraLight Network
[Network map. Labels: BNL, ESnet, IN/OUT]
Monitoring Network Topology: Latency, Routers
[Topology views: Networks, Routers, AS]
Monitoring the GLORIAD Ring
Monitoring Grid Sites, Running Jobs, Network Traffic, and Connectivity
[Panels: Jobs, Topology, Accounting]
Monitoring OSG: Resources, Jobs & Accounting
[Panels: Running Jobs, Accounting]
- 42 sites
- ~4,000 nodes (10,000 CPUs)
- Thousands of jobs
- 60,000 parameters
FTP Data Transfer between Grid Sites
[Plot: Total FTP Traffic per VO]
Bandwidth Challenge at SC 2005
- 151 Gb/s
- ~500 TB total in 4 h
End User / Client Agent: LISA (Localhost Information Service Agent)
- Authorization
- Service discovery
- Local detection of the hardware and software configuration
- Complete end-system monitoring: per-process load, I/O and network throughput, etc.
- End-to-end performance measurements
- Acts as an active listener for all events related to the requests generated by its local applications
Host Monitoring at SC 2005
- Many “network” problems are actually end-host problems: misconfigured or underpowered end-systems.
- The LISA application was designed to monitor the end host and its view of the network.
- For SC|05 we used LISA to gather the relevant host details related to network performance.
- Information on the system, TCP configuration and network device setup was gathered and made accessible from one site.
- Future plans are to coordinate this with LISA and deploy it as part of OSG. The Tier-2 centers are a primary target.
[Panels: Network Device Information, TCP Settings, Host/System Information]
Available Bandwidth Measurements
- Embedded Pathload module
Coordination Service for Available Bandwidth Measurements
- Enforces measurement fairness
- Avoids multiple probes on shared network segments
- Dynamic configuration of measurement timing
- Logs events
- Provides service redundancy by using a master-slave model
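One way to picture "avoids multiple probes on shared network segments" is a scheduler that groups measurement requests into rounds so that no two probes in the same round cross the same segment. The slide does not describe the actual algorithm; the greedy grouping below, the function name `schedule_probes` and the segment labels are assumptions made only for illustration.

```python
def schedule_probes(paths):
    """Group probes into rounds with no shared segment inside a round.

    paths: dict mapping a probe id to the network segments its path crosses.
    Returns a list of rounds, each a list of probe ids that may run together.
    Greedy first-fit grouping (an assumed strategy, not the service's own).
    """
    rounds = []  # each entry: (set of segments in use, list of probe ids)
    for probe_id, segments in paths.items():
        wanted = set(segments)
        for busy, members in rounds:
            if busy.isdisjoint(wanted):
                busy |= wanted          # reserve the segments for this round
                members.append(probe_id)
                break
        else:
            # Every existing round conflicts: open a new one.
            rounds.append((set(wanted), [probe_id]))
    return [members for _, members in rounds]


# Hypothetical example: two probes share segment "s2", so they are separated.
example = {
    "A-B": ["s1", "s2"],
    "C-D": ["s2", "s3"],
    "E-F": ["s4"],
}
print(schedule_probes(example))  # [['A-B', 'E-F'], ['C-D']]
```

Processing requests in arrival order also gives a simple form of fairness: a probe is never pushed behind requests that arrived later and conflict with it.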
Monitoring the Execution of Jobs and the Time Evolution
[Diagram. Labels: Submit a Job; Job DAG; Split Jobs (Job 1, Job 2, Job 31, Job 32); Lifelines for Jobs]
ApMon: Application Monitoring
Library of APIs (C, C++, Java, Perl, Python) that can be used to send any information to MonALISA services.
- Flexibility, dynamic configuration, high communication performance
- Automated system monitoring
- Accounting information
[Diagram: applications embed ApMon and send monitoring data over UDP/XDR to MonALISA services on MonALISA hosts. The ApMon configuration is generated automatically by a servlet / CGI script (Config Servlet) and reloaded dynamically. Label: No Lost Packages]
- Datagram content: Time; IP; procID; parameter 1: value; parameter 2: value; ...
- Application monitoring example: Mbps_out: 0.52; Status: reading; MB_inout: 562.4
- System monitoring example: load1: 0.24; processes: 97; pages_in: 83
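To make the "UDP/XDR" transport concrete, the sketch below packs a few named values using XDR encoding rules (big-endian, every item padded to a 4-byte boundary) and sends them in one UDP datagram. It does not use the ApMon library and does not reproduce the real ApMon wire format: the field order (cluster, node, count, then name/value pairs), the function names, and any host or port passed in are assumptions for illustration only.

```python
import socket
import struct


def xdr_string(text):
    """XDR string: 4-byte big-endian length, bytes, zero padding to 4 bytes."""
    data = text.encode("utf-8")
    padding = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_double(value):
    """XDR double: 8-byte big-endian IEEE 754."""
    return struct.pack(">d", value)


def build_datagram(cluster, node, params):
    """Hypothetical layout: cluster, node, parameter count, name/value pairs."""
    parts = [xdr_string(cluster), xdr_string(node), struct.pack(">i", len(params))]
    for name, value in params.items():
        parts.append(xdr_string(name))
        parts.append(xdr_double(float(value)))
    return b"".join(parts)


def send_parameters(host, port, cluster, node, params):
    """Fire-and-forget UDP send; returns the payload size in bytes."""
    payload = build_datagram(cluster, node, params)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return len(payload)
```

The system-monitoring values shown on the slide could be packed with `build_datagram("cluster", "node", {"load1": 0.24, "processes": 97, "pages_in": 83})`. UDP keeps the sender from blocking on the monitoring service, at the cost of possible datagram loss.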
MonALISA agents to create on demand an optical path or tree
[Diagram. Labels: Discovery & Secure Connection; ML Agent (in each MonALISA service); Optical Switch; Control and Monitor the switch; ML proxy services used in Agent Communication; end host runs an ML Daemon: >ml_path IP1 IP4 “copy file IP4”]
- Time to create a path on demand: <1 s, independent of the location and the number of connections
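The slide does not say how the agents choose which switches to connect. As an illustration only, a request such as `ml_path IP1 IP4` could be resolved by a breadth-first search over the known links, which returns a path with the fewest hops. The topology, the switch names (`SW1` to `SW3`) and the function `find_path` below are hypothetical.

```python
from collections import deque


def find_path(links, src, dst):
    """Fewest-hop path between two endpoints over undirected links, or None."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    previous = {src: None}      # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical topology with two hosts and three optical switches.
links = [("IP1", "SW1"), ("SW1", "SW2"), ("SW2", "IP4"),
         ("SW1", "SW3"), ("SW3", "SW2")]
print(find_path(links, "IP1", "IP4"))  # ['IP1', 'SW1', 'SW2', 'IP4']
```

In a deployment like the one on the slide, each hop of the result would correspond to a cross-connect that the agent controlling that switch has to set up.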
Monitoring and Controlling Optical Planes
[Panels: Controlling; Port power monitoring]
Monitoring Optical Switches
Agents to Create on Demand an Optical Path
Communities using MonALISA
- Major communities: OSG, CMS, ALICE, D0, STAR, VRVS, LGC RUSSIA, SE Europe GRID, APAC Grid, UNAM Grid, ABILENE, ULTRALIGHT, GLORIAD, LHC Net, RoEduNET
- Running 24x7 at 250 sites:
  - Collecting 250,000 parameters in near real time
  - Update rate of 25,000 parameter updates per second
  - Monitoring 12,000 computers, > 100 WAN links, and thousands of Grid jobs running concurrently
- Demonstrated at: SC 2003, Telecom World 2003, WSIS 2003, SC 2004, I2 2005, TERENA 2005, iGrid 2005, SC 2005
[Screenshot labels: MonALISA, ABILENE, CMS-DC04, GRID3, VRVS, ALICE]
The MonALISA Architecture Provides:
- Distributed registration and discovery for services and applications.
- Monitoring of all aspects of complex systems:
  - System information for computer nodes and clusters
  - Network information: WAN and LAN
  - Performance of applications, jobs or services
  - End-user systems and their performance
  - Video streaming
- Interaction with any other services to provide, in near real time, customized information based on monitoring data.
- Secure, remote administration for services and applications.
- Agents to supervise applications, trigger alarms, restart or reconfigure them, and notify other services when certain conditions are detected.
- The MonALISA framework is used to develop higher-level decision services, implemented as a distributed network of communicating agents, to perform global optimization tasks.
- Graphical user interfaces to visualize complex information.
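The supervising-agent bullet (trigger alarms, restart applications when certain conditions are detected) can be sketched as a small rule: raise an alarm each time a monitored value exceeds a threshold, and request a restart after several consecutive violations. This is an assumed policy for illustration, not MonALISA's agent API; `SupervisorAgent`, its parameters and the callbacks are hypothetical.

```python
class SupervisorAgent:
    """Hypothetical supervising rule: alarm on a violation, restart on a streak."""

    def __init__(self, threshold, max_violations, on_alarm, on_restart):
        self.threshold = threshold
        self.max_violations = max_violations
        self.on_alarm = on_alarm        # callback, e.g. notify another service
        self.on_restart = on_restart    # callback, e.g. restart the application
        self.violations = 0             # consecutive readings above threshold

    def observe(self, value):
        """Process one monitored value and return the action taken."""
        if value <= self.threshold:
            self.violations = 0
            return "ok"
        self.violations += 1
        if self.violations >= self.max_violations:
            self.violations = 0
            self.on_restart(value)
            return "restart"
        self.on_alarm(value)
        return "alarm"


# Usage with callbacks that only record what happened.
events = []
agent = SupervisorAgent(
    threshold=0.9,
    max_violations=3,
    on_alarm=lambda v: events.append(("alarm", v)),
    on_restart=lambda v: events.append(("restart", v)),
)
actions = [agent.observe(v) for v in (0.5, 0.95, 0.97, 0.99)]
print(actions)  # ['ok', 'alarm', 'alarm', 'restart']
```

Requiring a streak before restarting keeps a single noisy reading from bouncing the application, while the per-violation alarm still lets other services react early.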