


ATLAS-EDG Task Force report
Oxana Smirnova, LCG/ATLAS/Lund
oxana.smirnova@cern.ch
September 19, 2002, RHUL
ATLAS Software Workshop, 2002-09-03

Outline
- EDG overview
- ATLAS-EDG Task Force
- Use case
- Problems & solutions
- Summary

NB! Things are changing (improving) very rapidly; this report may become outdated tomorrow.

EU DataGrid project
- Started on January 1, 2001, to deliver by end 2003
  - Aim: to develop Grid middleware suitable for High Energy Physics, Earth Observation and biology applications
  - Development based on existing tools, e.g. Globus, LCFG, GDMP etc.
- The core testbed consists of the central site at CERN and a few facilities across Western Europe; many more sites are foreseen to join later
- By now it has reached a stability level sufficient to execute production-scale tasks

EDG Testbed
- EDG is committed to creating a stable testbed to be used by applications for real tasks
  - This started to materialize in mid-August…
  - …and coincided with the ATLAS DC1
  - ATLAS was given the first priority
- Most sites are installed from scratch using the EDG tools (RedHat 6.2 based)
  - Lyon has an installation on top of an existing farm
  - A lightweight EDG installation is available
- Central element: the Resource Broker (RB), which distributes jobs between the resources
  - Currently, only one RB (CERN) is available for applications
  - In the future, there may be an RB per Virtual Organization (VO)

EDG functionality as of today (chart borrowed from Guido Negri's slides)
[Diagram: the user submits a JDL job from the UI to the Resource Broker, which passes RSL to Computing Elements (CEs) at lxshare0393.cern.ch, lxshare033.cern.ch and testbed010.cern.ch; input and output files are moved between CASTOR (via rfcp), NFS storage and the CERN SE (lxshare0399.cern.ch) using GDMP or the Replica Manager (RM), with locations registered in the LDAP-based Replica Catalog (RC).]

ATLAS-EDG Task Force
- ATLAS is eager to use Grid tools for the Data Challenges
  - ATLAS Data Challenges are already on the Grid (NorduGrid, USA)
  - DC1/phase 2 (to start in October) is expected to be done using the Grid tools to a bigger extent
- The ATLAS-EDG Task Force was put together in August with the aims:
  - To assess the usability of the EDG testbed for the immediate production tasks
  - To introduce Grid awareness to the ATLAS collaboration
- The Task Force has representatives from both ATLAS and EDG: 40 members (!) on the mailing list, ca. 10 of them working nearly full time
- The initial task: to process 5 input partitions of Dataset 2000 on the EDG Testbed plus one non-EDG site (Karlsruhe); if this works, continue with other datasets

Task description (Dataset 2000)
- Input: a set of generated events as ROOT files (each input partition ca. 1.8 GB, 100,000 events); master copies are stored in CERN CASTOR
- Processing: ATLAS detector simulation using a pre-installed software release 3.2.1
  - Each input partition is processed by 20 jobs (5,000 events each)
  - Full simulation is applied only to filtered events, ca. 450 per job
  - A full event simulation takes ca. 150 seconds per event on a 1 GHz PIII processor
- Output: simulated events are stored in ZEBRA files (ca. 1 GB per output partition); an HBOOK histogram file and a log file (stdout+stderr) are also produced
- Total: 9 GB of input, 2,000 CPU-hours of processing, 100 GB of output
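The totals above can be cross-checked with a few lines of arithmetic; every number below comes from this slide, and the exact result of 1,875 CPU-hours is what the rounded "2,000 CPU-hours" figure corresponds to:

```python
# Back-of-the-envelope check of the Dataset 2000 totals (numbers from the slide).
partitions = 5                     # input partitions processed
jobs_per_partition = 20            # 5,000 events each
filtered_events_per_job = 450      # only filtered events are fully simulated
seconds_per_event = 150            # full simulation on a 1 GHz PIII
input_gb_per_partition = 1.8
output_gb_per_job = 1.0            # one ZEBRA output partition per job

jobs = partitions * jobs_per_partition
cpu_hours = jobs * filtered_events_per_job * seconds_per_event / 3600
input_gb = partitions * input_gb_per_partition
output_gb = jobs * output_gb_per_job

print(f"{jobs} jobs, {cpu_hours:.0f} CPU-hours, "
      f"{input_gb:.0f} GB in, {output_gb:.0f} GB out")
```

This reproduces the slide's totals: 100 jobs, ca. 2,000 CPU-hours (1,875 exactly), 9 GB of input and 100 GB of output.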

Execution of jobs
- It was expected that we could make full use of the Resource Broker functionality:
  - Data-driven job steering
  - Best available resources otherwise
- Input files are pre-staged once (copied from CASTOR and replicated elsewhere)
- A job consists of the standard DC1 shell script, very much the way it is done on a conventional cluster
- A Job Definition Language (JDL) is used to wrap up the job, specifying:
  - The executable file (script)
  - Input data
  - Files to be retrieved manually by the user
  - Optionally, other attributes (MaxCPU, Rank etc.)
- Storage and registration of output files is part of the job
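As a rough illustration of the wrapping step, a JDL file for one such job might look like the sketch below. The attribute names follow the ClassAd-style conventions of the EDG JDL; the script name, logical file name and output file names are invented for illustration and are not the actual DC1 names:

```
// Hypothetical JDL for one DC1 simulation job (all names illustrative)
Executable    = "dc1.simulation.sh";           // the standard DC1 shell script
InputSandbox  = {"dc1.simulation.sh"};         // shipped along with the job
InputData     = {"LF:dc1.input.partition.root"};  // pre-staged input partition
StdOutput     = "dc1.log";                     // stdout+stderr log file
StdError      = "dc1.log";
OutputSandbox = {"dc1.log", "histo.hbook"};    // retrieved manually by the user
Rank          = other.FreeCPUs;                // optional: prefer sites with free CPUs
```

The Resource Broker matches the InputData attribute against the Replica Catalog, which is what makes the data-driven steering mentioned above possible.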

Encountered obstacles
- EDG cannot replicate files directly from CASTOR and cannot register them in the Replica Catalog
  - Workaround: replication was done via the CERN SE; EDG is working on a better (though temporary) solution, and the CASTOR team is writing a GridFTP interface, which will help a lot
- Big file transfers break off after 1.2–1.3 GB
  - A known Globus API problem, temporarily fixed by using plain Globus instead of the EDG tools
- Jobs were "lost" by the system after 20 minutes of execution
  - A known problem of the Globus software, temporarily fixed at the expense of frequent job submission
- Static information system: if a site goes down, it has to be removed manually from the index
  - Attempts are under way to switch to the dynamic hierarchical MDS; not yet stable due to Globus bugs

Other minor problems
- Installation of ATLAS software:
  - Cyclic dependencies
  - External dependencies, especially on system software
- Authentication & authorization, users and services:
  - EDG can't instantly accept a dozen new national Certificate Authorities
  - A possible (quick) solution: an ATLAS CA?
  - The default proxy lives only 12 hours; users keep forgetting to request longer ones to accommodate long jobs
- Documentation:
  - Is abundant but not very user-oriented
  - Things are improving as more users are coming
- Information system:
  - Very difficult to browse/search and retrieve relevant info
- Data management:
  - Information about existing file collections is not easy to find
  - Management of output data is mostly manual (cannot be done via JDL)

Achievements
- A team of hard-working people across Europe
- ATLAS software (release 3.2.1) is packed into relocatable RPMs, distributed and validated elsewhere
- The DC1 production script is "gridified"; a submission script is produced
- A user-friendly testbed status monitor is deployed
- 5 Dataset 2000 input files are replicated to 5 sites (2 at each)
- After fixing the "long jobs" problem, 50% of the planned challenge was performed (5 researchers × 10 jobs); unfortunately, only the CERN testbed was fully available
- With the rest of the testbed being fixed, jobs are getting scheduled and executed elsewhere
- Second test: 4 input files (ca. 400 MB each) replicated to 4 sites; 250 jobs submitted, adjusted to run ca. 4 hours each; the jobs were distributed across the whole testbed by the Resource Broker

Summary
- Advantages of the Grid:
  - The possibility to execute tasks and move files over a distributed computing infrastructure using one single personal certificate (no need to memorize dozens of passwords)
  - The possibility to distribute the workload adequately and automatically, without logging in explicitly to each remote system
  - The possibility to do worldwide production in a perfectly coordinated way, using identical software (RPMs), scripts and databases
- Where we are now:
  - Several Grid toolkits are on the market
  - EDG is probably the most elaborate, but still in development
  - This development goes much faster with the help of users running real applications
  - The common efforts of the ATLAS-EDG Task Force proved that it is already possible to execute real tasks on the EDG Testbed
- Thanks to all the members for the efforts so far, but there's more to be done!

Cal Loomis, Fairouz Malek-Ohlsson, Gonzalo Merino, Armin Nairz, Guido Negri, Steve O'Neale, Laura Perini, Gilbert Poulard, Alois Putzer, Di Qing, Mario Reale, David Rebatto, Zhongliang Ren, Silvia Resconi, Alessandro De Salvo, Markus Schulz, Oxana Smirnova, Chun Lik Tan, Jeff Templon, Stan Thompson, Luca Vaccarossa, Peter Watkins