- Eddy Caron
LEGO Team from GRAAL
• Anne Benoît (McF)
• Eddy Caron (McF)
• Frédéric Desprez (DR)
• Yves Caniou (McF)
• Raphaël Bolze (PhD)
• Pushpinder Kaur Chouhan (PhD)
• Jean-Sébastien Gay (PhD)
• Cedric Tedeschi (PhD)
DIET Architecture
[Architecture diagram: the Client contacts a Master Agent (MA); MAs can be interconnected through JXTA; requests are forwarded down a hierarchy of Local Agents (LA) to the Server front ends; the FAST library provides application modeling (LDAP) and system availabilities (NWS).]
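To summarize the request path through this hierarchy, here is a minimal, hypothetical Python sketch (all names are invented for illustration, not DIET code): a request enters at the Master Agent, is propagated down through Local Agents to the servers, and the best-ranked server is returned to the client.

```python
class SeD:
    """Server front end: answers with its own availability."""
    def __init__(self, name, load):
        self.name, self.load = name, load
    def candidates(self, service):
        return [(self.load, self.name)]   # a SeD offers itself with its current load

class Agent:
    """Master or Local Agent: forwards the request to its children and merges answers."""
    def __init__(self, children):
        self.children = children
    def candidates(self, service):
        found = []
        for child in self.children:
            found.extend(child.candidates(service))
        return found

# MA -> two LAs -> four SeDs; the client asks the MA and picks the least loaded server.
ma = Agent([Agent([SeD("sed1", 0.8), SeD("sed2", 0.3)]),
            Agent([SeD("sed3", 0.5), SeD("sed4", 0.9)])])
print(min(ma.candidates("dgemm")))   # -> (0.3, 'sed2')
```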
Data Management
Joint work with G. Antoniu, E. Caron, B. Del Fabbro, M. Jan
Data/replica management
• Two needs:
  - Keep data in place to reduce the overhead of communications between clients and servers
  - Replicate data whenever possible
• Two approaches for DIET:
  - DTM (LIFC, Besançon): hierarchy similar to DIET's, distributed data managers, redistribution between servers
  - JuxMem (Paris, Rennes): P2P data cache
• In NetSolve:
  - IBP (Internet Backplane Protocol): data cache
  - Request sequencing to find data dependencies
• Work done within the GridRPC Working Group (GGF)
  - Relations with workflow management
[Diagram: data exchanges between clients and Server 1 / Server 2]
Data management with DTM within DIET
• Persistence at the server level
• Avoids useless data transfers:
  - of intermediate results (C, D)
  - between clients and servers
  - between servers
• "Transparent" for the client
• Data Manager / Loc Manager
  - Hierarchy mapped onto the DIET one
  - Modularity
• Proposition to the GridRPC WG (GGF):
  - Data handles
  - Persistence flag
  - Data management functions
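As an illustration of the persistence idea above, here is a minimal, hypothetical sketch (not the actual DIET/DTM or GridRPC API; all names are invented) of a server-side data manager that keeps intermediate results in place under a handle, so a later request can reuse them without a new transfer from the client.

```python
import uuid

class DataManager:
    """Toy server-side data manager: stores persistent results under handles."""
    def __init__(self):
        self._store = {}

    def put(self, value, persistent=False):
        handle = str(uuid.uuid4())
        if persistent:
            self._store[handle] = value   # kept on the server for later reuse
        return handle, value

    def get(self, handle):
        return self._store[handle]        # reused in place: no client transfer

# C = A * B, then D = E + C: C stays on the server and is reused by handle.
dm = DataManager()
h_c, C = dm.put(4 * 5, persistent=True)   # first call returns a handle for C
D = 3 + dm.get(h_c)                       # second call reuses C without resending it
print(D)                                  # 23
```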
Performance (A = C * B)
[Performance chart]
Performance (C = A * B; D = E + C; A = tA, i.e. transpose)
[Performance chart]
JuxMem (PARIS project, IRISA, France)
• A peer-to-peer architecture for a data-sharing service in memory
• Persistence and data coherency mechanisms
• Transparent data localization
Built on a toolbox for the development of P2P applications:
• A set of protocols
• Each peer has a unique ID
• Several communication protocols (TCP, HTTP, …)
[Diagram: a set of peers communicating over TCP/IP and HTTP, including peers behind a firewall]
Visualization Work with Raphaël Bolze
VizDIET: a visualization tool
• Current view of the DIET platform
• Post-mortem analysis from log files is available
• Good scalability
• We can show:
  - Communication between agents
  - State of the SeDs
  - Available services
  - Persistent data name information
  - CPU, memory and network load
LogService
• CORBA communications
• Message ordering and scheduling
• Message filtering
• System state
Log. Service & DIET • Log. Service Componant w Log. Manager (LM) w Log. Central • Each Log. Manager receives information from agent and send them to Log. Central out of DIET structure. • Viz. Diet shows graphicaly all messages from Log. Service • Message transfert from agent using Log. Manager w No disc storage E. Caron - Réunion de lancement LEGO - 10/02/06
VizDIET v1.0
[Tool-chain diagram: an XML description of the DIET agents, DIET servers, physical machines and physical storage is given to GoDIET, which performs the distributed DIET deployment; LogService collects the traces and feeds VizDIET.]
Screenshot: Platform Visualization
Screenshots: Statistics module
Platform Deployment
Work from E. Caron, P.-K. Chouhan and A. Legrand
GoDIET: a tool for automated DIET deployment
• Automates configuration, staging, execution and management of a distributed DIET platform
  - Supports experiments at large scale
  - Faster and easier bulk testing
  - Reduces errors & debugging time for users
• Constraints:
  - Simple XML file
  - Console & batch modes
  - Integrates with visualization tools and CORBA tools
• Written in Java
DIET usage with contrib services
[Diagram: an XML description is given to GoDIET for the distributed deployment and administration of DIET; traces go to LogService, and subsets of the traces are forwarded to VizDIET.]
Launch process
• GoDIET follows the DIET hierarchy in launch order
• For each element to be launched:
  - Configuration file written to local disk [including parent agent, naming service location, hostname and/or port endpoint, …]
  - Configuration file staged to the remote disk (scp)
  - Remote command launched (ssh) [PID retrieved, stdout & stderr saved on request]
• Feedback from LogCentral is used to time the launch of the next element
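A minimal, hypothetical sketch of the per-element launch step described above: write the configuration file, stage it with scp, start the remote process with ssh, then wait for feedback before launching the next element. Hostnames, paths, the configuration keys, the binary name and the feedback callback are placeholders, not GoDIET's actual implementation.

```python
import subprocess
import tempfile

def launch_element(name, host, parent, naming_service, wait_for_feedback):
    # 1. Write the element's configuration file to local disk.
    cfg = f"name = {name}\nparent = {parent}\nnamingService = {naming_service}\n"
    with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as f:
        f.write(cfg)
        local_cfg = f.name

    # 2. Stage the configuration file to the remote disk (scp).
    remote_cfg = f"/tmp/{name}.cfg"
    subprocess.run(["scp", local_cfg, f"{host}:{remote_cfg}"], check=True)

    # 3. Launch the remote command (ssh), keeping the PID; stdout/stderr go to a file.
    #    The binary name "dietAgent" is assumed for illustration.
    cmd = f"nohup dietAgent {remote_cfg} > /tmp/{name}.out 2>&1 & echo $!"
    pid = subprocess.run(["ssh", host, cmd], check=True,
                         capture_output=True, text=True).stdout.strip()

    # 4. Wait for LogCentral feedback before launching the next element.
    wait_for_feedback(name)
    return pid
```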
GoDIET Console
• java -jar GoDIET.jar vthd4site.xml
GoDIET: before launch
GoDIET: after launch
• 27-second launch, including waiting for feedback
Grid'5000 DIET deployment
• 7 sites / 8 clusters
  - Bordeaux, Lille, Lyon, Orsay, Rennes, Sophia, Toulouse
• 1 MA
• 8 LA
• 574 SeDs
Scheduling Work with Alan Su, Peter Frauenkron, Eric Boix
Scheduling
• Plug-in scheduler
• Round-robin as the default scheduling policy
• Advanced scheduling is only possible with more information
• Existing schedulers in DIET use data from FAST and/or NWS
• Limitations:
  - Deployment of appropriate hierarchies for a given grid platform is non-obvious
  - Limited consideration of inter-task factors
  - Non-standard application- and platform-specific performance measures
  - FAST, NWS: low availability, SeD idles; for NWS, defining a default weighting is difficult (possible?)
Plug-in scheduling
• Plug-in scheduling facilities enable application-specific definitions of appropriate performance metrics
• An extensible measurement system
• Tunable comparison/aggregation routines for scheduling
• Composite requirements enable various selection methods:
  - Basic resource availability
  - Processor speed, memory
  - Database contention
  - Future requests
• Before / after, per component (a sketch follows below):
  Component | Before                                    | After
  SeD       | automatic performance estimate (FAST/NWS) | chosen/defined by the application programmer
  Agents    | exec. time sorting                        | "menu" of aggregation methods
  Client    | CLIENT CODE UNCHANGED
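The following is a minimal, hypothetical Python sketch of the plug-in idea above: each server publishes an application-defined estimation vector, and the agent ranks candidates with a pluggable aggregation routine. The metric names and the ranking function are invented for illustration and are not the DIET API.

```python
# Each SeD publishes an application-defined estimation vector.
servers = [
    {"name": "sed1", "free_cpu": 0.9, "free_mem_mb": 2048, "db_contention": 0.1},
    {"name": "sed2", "free_cpu": 0.4, "free_mem_mb": 8192, "db_contention": 0.7},
    {"name": "sed3", "free_cpu": 0.7, "free_mem_mb": 4096, "db_contention": 0.2},
]

# A plug-in aggregation routine chosen by the application programmer:
# here, prefer idle CPUs and penalize database contention.
def my_metric(est):
    return est["free_cpu"] - 0.5 * est["db_contention"]

def rank_servers(estimations, aggregator):
    """Agent-side step: sort candidate servers with the plugged-in routine."""
    return sorted(estimations, key=aggregator, reverse=True)

for est in rank_servers(servers, my_metric):
    print(est["name"], round(my_metric(est), 2))
# The client code is unchanged: it submits the same request either way.
```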
CoRI
• Collector: an easy interface for gathering performance and load information about a specific SeD
• Two modules (currently): CoRI-Easy and FAST
• Possible to extend with new modules: Ganglia, Nagios, R-GMA, Hawkeye, INCA, MDS, …
• CoRI-Easy uses fast, basic functions or simple performance tests
• Keeps the independence of DIET
• Able to run on "all" operating systems, to allow a default scheduling with basic information
[Diagram: the CoRI collector dispatching to its modules: CoRI-Easy, FAST, other]
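Below is a small, hypothetical sketch of the collector idea: a common interface that queries whichever measurement module is available, with a basic built-in probe standing in for CoRI-Easy and room for external modules. Module names and probes are illustrative only, not the DIET CoRI interface.

```python
import os

class EasyModule:
    """Stand-in for a basic, dependency-free module: simple local probes."""
    def collect(self):
        return {"nb_cpu": os.cpu_count(), "load_avg": os.getloadavg()[0]}

class Collector:
    """Common interface: ask modules in order of preference, fall back to basics."""
    def __init__(self, modules):
        self.modules = modules

    def collect(self):
        for module in self.modules:
            try:
                return module.collect()
            except Exception:
                continue            # module unavailable: try the next one
        return {}                   # nothing available: no data for the scheduler

# The preference list could also contain FAST-, Ganglia- or Nagios-backed modules;
# here only the basic module is present, so scheduling still gets basic information.
print(Collector([EasyModule()]).collect())
```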
Batch and parallel submissions Work with Yves Caniou
Difficulties of the problem
• Several SeD types
• Parallel or sequential jobs
• Submit a parallel job (pdgemm, …)
• Transparent for the user
• General API
[Diagram: an agent dispatching to SeD_batch, SeD_seq and SeD_parallel]
SeD_parallel
• SeD_parallel runs on the cluster front end
• Submitting a parallel job is system dependent:
  - NFS: copy the code?
  - MPI: LAM, MPICH?
  - Reservation?
  - Monitoring & performance prediction
[Diagram: the agent contacts SeD_parallel on the front end, which reaches the nodes through NFS]
SeD_batch
• SeD_batch runs on the cluster front end
• Submitting a parallel job is even more system dependent:
  - The previously mentioned problems
  - Numerous batch systems (GLUE, OAR, SGE, LSF, PBS, Condor, LoadLeveler) → homogenization? (see the sketch below)
  - Batch scheduler behavior → queues, scripts, etc.
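As a toy illustration of the homogenization question raised above, here is a hypothetical sketch that maps one abstract job description onto the submit commands and script directives of a few batch systems. The directives shown are simplified and may not match every batch system version or site configuration.

```python
def make_submission(system, nodes, walltime_min, script="job.sh"):
    """Return (submit command, resource directive) for a given batch system.
    Directives are simplified examples, not a complete or site-accurate mapping."""
    if system == "OAR":
        return (f"oarsub -l nodes={nodes},walltime=0:{walltime_min}:0 {script}", "")
    if system == "PBS":
        return (f"qsub {script}", f"#PBS -l nodes={nodes},walltime=00:{walltime_min}:00")
    if system == "SGE":
        # The parallel environment name ("mpi") is site-specific.
        return (f"qsub -pe mpi {nodes} {script}", f"#$ -l h_rt=00:{walltime_min}:00")
    if system == "LSF":
        return (f"bsub -n {nodes} -W {walltime_min} < {script}", "")
    raise ValueError(f"unsupported batch system: {system}")

for sys_name in ("OAR", "PBS", "SGE", "LSF"):
    cmd, directive = make_submission(sys_name, nodes=4, walltime_min=30)
    print(sys_name, "->", cmd, directive)
```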
Batch & parallel submissions • Asynchronous, long term production jobs • Still more problems w System dependent, numerous batch systems and their behavior w Performance prediction ! → Application makespan in function of #proc? → If reservation available, how to compute deadline? w Scheduling problems → Do we reserve when probing? How long hold it? → How to manage data transfers when waiting in the queue? w Co-scheduling? w Data & job migration?
Future Work
Future work
• LEGO applications with DIET
  - CRAL (RAMSES)
  - CERFACS
  - TLSE (update)
• Components and DIET
  - Which architecture?
• Deployment
  - Link between ADAGE and the theoretical solution on clusters [IJHPCA 06]?
  - Anne Benoît's approach
• …
Questions?
http://graal.ens-lyon.fr/DIET