1882a0c31741471500db8554ffc2bc9c.ppt
- Number of slides: 17
Status of the WLCG Tier-2 Centres
M. C. Vetterli
Simon Fraser University and TRIUMF
WLCG Overview Board, CERN, October 27th 2008
Simon Fraser M. C. Vetterli – WLCG-OB, CERN; October 27, 2008 – #1
Sources of Information
§ Discussions with experiment representatives in July
§ APEL monitoring portal: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php
§ WLCG reliability reports: http://lcg.web.cern.ch/LCG/accounts.htm
§ October GDB meeting, dedicated to Tier-2 issues: http://indico.cern.ch/conferenceDisplay.py?confId=20234
§ Talks from the last OB & LHCC
Slides labeled with a * are from MV’s LHCC rapporteur talk
Tier-2 Performance Summary*
§ Overall, the Tier-2s are contributing much more now
§ Significant fractions of the Monte Carlo simulations are being done in the T2s for all experiments
§ Reliability is better, but still needs to improve
§ CCRC’08 exercise is generally considered a success for the Tier-2s
Tier-2 Centres in CCRC’08 – General*
§ Overall, the Tier-2s and the experiments considered the CCRC’08 exercise to be a success
§ The networking/data transfers were tested extensively; some FTS tuning was needed, but it worked out
§ Experiments tended to continue other activities in parallel, which is a good test of the system, although the load was not as high as anticipated
§ While CMS did include significant user analysis activities, the chaotic use of the Grid by a large number of inexperienced people is still to be tested
Tier-2 Issues/Concerns
As of CB and meetings with experiments this summer
§ Communications: Do Tier-2s have a voice? Is there a good mechanism for disseminating information?
§ Better monitoring: Pledges vs actual vs used
§ Hardware acquisitions: What should be bought? kSI2006?
§ Tier-2 capacity: Size of datasets? Effect of LHC delay?
§ …
Tier-2 Issues/Concerns
§ Upcoming onslaught of users: Some user analysis tests have been done but scaling is a concern
§ User Support: Ticketing system exists but it is not really used for user support issues. This affects Tier-2s especially.
§ Federated Tier-2s: Tools to federate? Monitoring? (averaging)
§ Interoperability of EGEE, OSG, and NDGF should be improved
§ Software/Middleware updates: Could be smoother; too frequent
Communications for Tier-2s
§ Identified by the T2s at the last CB as a serious problem. Interesting to me that many in experiment computing management did not share this concern.
§ Should communication be organized according to experiment or to Tier-1 association? There are also differing opinions on this. There are two issues:
  - Grid middleware/operations
  - Experiment software
§ My view after studying this is that the situation is OK for “tightly coupled” Tier-2s, but not for remote and smaller Tier-2s that are not well coupled to a Tier-1.
Communications for Tier-2s
§ Many lines of communication do indeed exist.
§ Some examples are:
CMS has two Tier-2 coordinators: Ken Bloom (Nebraska) and Giuseppe Bagliesi (INFN)
  - attend all operations meetings
  - feed T2 issues back to the operations group
  - write T2-relevant minutes
  - organize T2 workshops
ALICE has designated 1 Core Offline person in 3 to have privileged contact with a given T2 site manager
  - weekly coordination meetings
  - Tier-2 federations provide a single contact person
  - A Tier-2 coordinates with its regional Tier-1
Communications for Tier-2s
ATLAS uses its cloud structure for communications
  - Every Tier-2 is coupled to a Tier-1
  - 5 national clouds; others have foreign members (e.g. “Germany” includes Krakow, Prague, Switzerland; Netherlands includes Russia, Israel, Turkey)
  - Each cloud has a Tier-2 coordinator
Regional organizations, such as:
  + France Tier-2/3 technical group:
    - coordinates with Tier-1 and with experiments
    - monthly meetings
    - coordinates procurement and site management
  + GRIF: Tier-2 federation of 5 labs around Paris
  + Canada: Weekly teleconferences of technical personnel (T1 & T2) to share information and prepare for upgrades, large production, etc.
  + Many others exist; e.g. in the US and the UK
Communications for Tier-2s
§ Tier-2 Overview Board reps: Michel Jouvin and Atul Gurtu have just been appointed to the OB to give the Tier-2s a voice there.
§ Tier-2 mailing list: Actually exists and is being reviewed for completeness & accuracy
§ Tier-2 GDB: The October GDB was dedicated to Tier-2 issues
  + reports from experiments: role of the T2s; communications
  + talks on regional organizations
  + discussion of accounting
  + technical talks on storage, batch systems, middleware
Seems to have been a success; repeat a couple of times per year?
Tier-2 Installed Resources
§ But how much of this is a problem of under-use rather than under-contribution?
  A task force has been set up to extract installed capacities from the Glue schema
§ Monthly APEL reports still undergo significant modifications from first draft.
  Good because communication with T2s better
  Bad because APEL accounting still has problems
  Accounting seems to be very finicky; breaks when the CE or MON box is upgraded
§ How are jobs distributed to the Tier-2s?
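The under-use versus under-contribution question can be made concrete by splitting the used-to-pledged fraction into two factors. The sketch below is illustrative only: the function name, the dictionary keys, and the input numbers are hypothetical and are not taken from APEL, the Glue schema, or any WLCG report.

```python
# Illustrative sketch: separate "under-contribution" from "under-use".
# All names and numbers here are hypothetical, chosen only for the example.

def capacity_ratios(pledged: float, installed: float, used: float) -> dict:
    """Split used/pledged into installed/pledged and used/installed."""
    return {
        # Below 1 means the site installed less than it pledged.
        "installed_over_pledged": installed / pledged,
        # Below 1 means installed capacity is sitting idle.
        "used_over_installed": used / installed,
        # The headline number; equals the product of the two above.
        "used_over_pledged": used / pledged,
    }

# Made-up site: 1000 units pledged, 900 installed, 450 actually used.
example = capacity_ratios(1000.0, 900.0, 450.0)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

In this made-up case the low used-to-pledged figure comes mostly from idle installed capacity rather than from a shortfall in what was installed, which is the distinction the slide asks about.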
Tier-2 Hardware Questions
§ How does the LHC delay affect the requirements and pledges for 2009?
  + We are told to go ahead and buy what was planned, but we have already seen some under-use of CPU capacity and we have seen this for storage as well
Tier-2 Hardware Questions
§ How does the LHC delay affect the requirements and pledges for 2009?
  + We are told to go ahead and buy what was planned, but we have already seen some under-use of CPU and we are now starting to see this for storage as well
§ We need to use something other than SpecInt2000!
  + this benchmark is totally out-of-date & useless for new CPUs
  + continued delays in SpecHEP can cause sub-optimal decisions
Tier-2 Hardware Questions
§ Networking to the nodes is now an issue.
  + with 8 cores per node, 1 GigE connection ≈ 16.8 MB/sec/core
  + Tier-2 analysis jobs run on reduced data sets and can do rather simple operations
    have seen 7.5 MB/sec at ATLAS and much more (x10?)
  + Do we need to go to Infiniband?
  + We certainly need increased capability for the uplinks; we should have a minimum of fully non-blocking GigE to the worker nodes.
We need more guidance from the experiments
The next round of purchases is now!
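The per-core bandwidth figure on this slide is simple arithmetic, sketched below. Two assumptions are mine and not stated on the slide: that the 16.8 MB/sec/core value comes from reading 1 Gbit/s as 2^30 bit/s (reading it as 10^9 bit/s gives about 15.6), and that "x10?" means ten times the 7.5 MB/sec rate. The function names are hypothetical, and protocol overhead is ignored.

```python
# Illustrative sketch of the per-core network share on a worker node.
# Function names are hypothetical; protocol overhead is ignored.

def per_core_mb_per_s(link_bits_per_s: float, cores: int) -> float:
    """Evenly divide a link's raw bit rate among cores, in MB/s (10^6 bytes/s)."""
    return link_bits_per_s / 8 / 1e6 / cores

def node_demand_mb_per_s(rate_per_job: float, cores: int) -> float:
    """Aggregate input rate if every core runs one job reading at rate_per_job MB/s."""
    return rate_per_job * cores

CORES = 8
decimal_gige = per_core_mb_per_s(1e9, CORES)    # 1 Gbit/s read as 10^9 bit/s
binary_gige = per_core_mb_per_s(2**30, CORES)   # 1 Gbit/s read as 2^30 bit/s

observed = node_demand_mb_per_s(7.5, CORES)     # rate quoted for ATLAS
ten_times = node_demand_mb_per_s(75.0, CORES)   # assumed reading of "x10?"

print(f"per-core share: {decimal_gige:.1f} (decimal) or {binary_gige:.1f} (binary) MB/s")
print(f"node demand: {observed:.0f} MB/s at 7.5 MB/s per job, {ten_times:.0f} MB/s at ten times that")
```

Under these assumptions, eight jobs at 7.5 MB/sec need about half of a 1 GigE link, while eight jobs at ten times that rate would need several times more than the link can carry.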
Summary
§ The role of the Tier-2 centres has increased markedly in the last year
  >50% of Monte Carlo simulation is done in the T2s now.
§ The CCRC’08 exercise is considered a success by the Tier-2s and by the experiments.
§ Availability and reliability are up, but still need improvement.
§ Resource acquisition vs pledges is better but still needs work
§ Issues for Tier-2s:
  - communication should be (& is being) improved
  - work should ramp up on chaotic user analysis
  - reporting actual resources should be established
  - improved user support is needed


