
CMS Castor 2 Experience
On behalf of CMS Computing
Nick J. Sinanis
CERN Castor Review - CERN, June 8th, 2006

Castor 2 for CMS
  • January ’05 to Summer ’05
    • First Castor 2 tests with PhEDEx
  • Summer ’05 to December ’05
    • First tests for SC3 (Lassi Tuura)
      • Lots of problems
    • Production (Nikolay Darmenov)
      • Smooth when not hitting server problems
    • Transfers & Production - the 1st migrated activity
  • December ’05 to March ’06
    • Towards full migration
      • Had to understand the issues of
        • Applications (at CERN and outside CERN)
        • Configuration
Nick J. Sinanis, CERN - CMS Computing. Castor Review - CERN, June 8th, 2006

Castor 2 Migration
  • Mid-March
    • First CMS DS in place
      • WAN Pool
      • CMSPROD (relocated from castorgridsc)
  • Migration was finalized on the 2nd of May
    • Most delicate part: migrating all default accesses
      • Env switch
    • Delayed so as not to interfere with Physics TDR II
    • Some problems with
      • Libshift
      • Castor Env for Grid Jobs

Castor 2 Migration (II)
  • Strong and early buy-in from CMS
    • Started testing at the earliest possibility
    • Both sides learnt a lot, even at a cost of time
    • Had to coordinate the Castor 2 migration while
      • Having a changing software environment
      • Intense PTDR activities were taking place
        • Running old software
      • Had to respond to urgent production and transfer requests for the PTDR
  • Yet, we lack the full experience that could make us fully confident
    • More to come on this soon…

Castor 2 Issues
  • Configuration
  • Performance and Scalability
  • Operations
  • Support

Configuration
  • libshift.so
    • New ABI SO versioning policy requires change management!
      • Was easier with Castor 1
      • Castor 2 is expected to release libshift versions more frequently
    • This raises important issues such as:
      • Forward compatibility
      • How are patches handled?
      • Validation by experiments?
      • Effective distribution to sites?
      • Part of the OS, or like any other external?
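
The forward-compatibility question above can be made concrete with a minimal sketch. It assumes one common shared-library convention (same major number means the ABI is unchanged, a higher minor number only adds interfaces); the slides do not say which policy Castor 2 follows, and the file names and the helper names `parse` and `can_run_with` are hypothetical.

```python
import re

# Matches names such as "libshift.so.2.1"; the minor number is optional.
SONAME = re.compile(r"^(?P<name>lib[\w+-]+)\.so\.(?P<major>\d+)(?:\.(?P<minor>\d+))?$")


def parse(filename):
    """Split a versioned library file name into (name, major, minor)."""
    m = SONAME.match(filename)
    if m is None:
        raise ValueError(f"not a versioned shared library name: {filename!r}")
    return m["name"], int(m["major"]), int(m["minor"] or 0)


def can_run_with(built_against, installed):
    """Assumed rule: same library, same major, installed minor >= build minor."""
    b_name, b_major, b_minor = parse(built_against)
    i_name, i_major, i_minor = parse(installed)
    return b_name == i_name and b_major == i_major and i_minor >= b_minor


# Hypothetical versions: a site that has upgraded can still run older binaries...
print(can_run_with("libshift.so.2.1", "libshift.so.2.3"))  # True
# ...but a binary built against a newer release may not run at a site that lags behind.
print(can_run_with("libshift.so.2.3", "libshift.so.2.1"))  # False
```

Under this assumed rule, frequent libshift releases are harmless only if every site upgrades before experiment software built against the newer version arrives, which is why validation and distribution appear in the list above.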

Performance and Scalability
  • Our sole experience is with the SCs
    • Numbers look good so far
      • It was an enormous amount of work
    • They are not representative though
      • No opportunity to confirm SC3 rerun numbers
      • Neither T0 nor CAF access patterns have been tried yet!
  • The real CMS test of Castor 2 is CSA ’06!

CSA ’06
  • Computing, Software & Analysis Challenge 2006
    • Targeted for October, lasting one month
      • Early start mid-September
    • 20-40 Hz exercise from Tier-0 down to Tier-2
    • Simulate DAQ with 50 Mevts on Tier-0 disk
    • Run calibration jobs
    • Run Tier-0 PR
    • Write to tape
    • Ship full event to Tier-1s
      • 300 MB/s from Tier-0 to Tier-1s
  • CSA ’06 demonstrates the workflow
    • SC4 is a validation step for CSA ’06
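
As a rough consistency check of the targets above, the following sketch works out how long 50 Mevts last at 20-40 Hz and how much data 300 MB/s amounts to. The assumptions are mine, not from the slides: decimal units (1 MB = 1e6 bytes, 1 TB = 1e12 bytes), a 30-day month, and rates sustained without interruption.

```python
SECONDS_PER_DAY = 86_400


def days_to_process(events, rate_hz):
    """Days needed to push `events` through at a constant rate in Hz."""
    return events / rate_hz / SECONDS_PER_DAY


def volume_tb(rate_mb_per_s, days):
    """Data volume in TB moved at a constant rate over `days` (decimal units)."""
    return rate_mb_per_s * 1e6 * SECONDS_PER_DAY * days / 1e12


events = 50e6                       # 50 Mevts on Tier-0 disk
slow = days_to_process(events, 20)  # lower end of the 20-40 Hz range
fast = days_to_process(events, 40)  # upper end of the range
per_day = volume_tb(300, 1)         # Tier-0 to Tier-1 export, one day
per_month = volume_tb(300, 30)      # same rate over an assumed 30-day month

print(f"50 Mevts at 20 Hz: {slow:.1f} days")
print(f"50 Mevts at 40 Hz: {fast:.1f} days")
print(f"300 MB/s: {per_day:.2f} TB/day, {per_month:.1f} TB over 30 days")
```

Under these assumptions the 50 Mevts sample lasts roughly two to four weeks, which is in line with a one-month exercise, and the export rate corresponds to about 26 TB per day.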

Operations
  • Service Side
    • Several reconfigurations required so far
      • Maturity?
      • Or just flexibility?
      • Can Tier-1s deploying Castor 2 follow all this?
    • Many new components
      • Usually takes time to understand
      • Usually users manage to break things
    • Validation of significant service reconfigurations
      • Needs to be part of the experiment’s validation
        • Aim for this at start-up

Operations (II)
  • User Side
    • This mainly means “Special Users”
    • CSA ’06 is planned as an operations-aware exercise
      • Will need to have effective channels for this
  • Interface to Large Scale DM Tools
    • Administration tools
    • Monitoring is crucial
    • System-level monitoring is good and very detailed
      • But not what we actually need!

Service Monitoring
  • We need to couple monitoring to
    • User application (CAF)
    • Tier-0
      • Merge and Export buffers
  • But also to understand
    • Disk retention and GC policies
    • File replication
    • File pinning
    • Staging latencies
    • Active Users (top 20)
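
The “Active Users (top 20)” view above can be sketched as a simple aggregation. This is only an illustration of the kind of service-level summary being asked for; the `Request` record layout, the user and pool names, and the byte counts are all hypothetical and do not describe any actual Castor 2 log format or monitoring interface.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    """Hypothetical per-request record; not an actual Castor 2 log format."""
    user: str
    pool: str
    nbytes: int


def top_users(requests: Iterable[Request], n: int = 20):
    """Return the n users who moved the most bytes, largest first."""
    totals = Counter()
    for req in requests:
        totals[req.user] += req.nbytes
    return totals.most_common(n)


# Made-up sample data, purely for illustration.
sample = [
    Request("cmsprod", "cmsprod", 4_000),
    Request("phedex", "wan", 9_000),
    Request("user_a", "default", 1_500),
    Request("cmsprod", "cmsprod", 6_500),
]
ranking = top_users(sample, n=2)
print(ranking)  # [('cmsprod', 10500), ('phedex', 9000)]
```

The same grouping could be keyed on pool instead of user to separate CAF traffic from the Tier-0 merge and export buffers.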

Monitoring Example
  • What is all this traffic?
    • CMS _default cluster yesterday

SRM
  • SRM is a core strategy for CMS
    • Used since spring 2004 (DC04)
    • Part of C-TDR baseline
    • Practical interoperability critical
      • V1, V2 and recently agreed extensions
      • “SRM-as-used-by-X” insufficient
  • Vast majority of our Castor 2 problems had to do with Castor/SRM or interoperability
    • SRM request corruption and state confusion
    • Lack of “safe-for-everyday-use” srmCopy()
    • Inability to delete files (advisory delete)
  • We lack confidence in the SRM side

Support
  • It has been almost immediate!
    • Evident effort to dig into rather complex problems
      • e.g., SRM problems
    • Many thanks go to Olof, Jan et al., behind the scenes
  • Support people in the middle of
    • Systems
    • Users
    • Policies, etc.
  • Is it sufficient?
    • Aren’t we quickly burning precious resources?
    • Will the support level scale as we approach real experiment tests and startup?
  • What happened to the user manual?

Summary
  • Strong buy-in from CMS
  • Successful migration
  • Pending confirmation of performance and scalability in CSA ’06
  • Support has been fine
  • Inevitably we need to see the big picture
    • SRM
    • Service Monitoring tools

Questions