Datamanagement for PS, Informal workshop, 06.07.2010, A. Rothkirch, FSEC

Datamanagement for PS - Notes in brief
A. Rothkirch et al.

Outline
1. Introduction / Motivation
2. Current PETRA III/Hasylab computing infrastructure
   • PETRA III fileserver layout
   • PETRA III workgroup server (WGS) layout
3. Mid-term requirements (next 5 years)
   • Data amounts
   • Data management
   • Data processing
   • User management

Requirements for DESY Photon Science (PS) Computing
• The high-brilliance radiation source PETRA III has begun operation.
• Experiments use fast detectors with high temporal and spatial resolution, which produce huge amounts of data. This creates needs for:
  – data storage and management
  – data processing
  – access to data and compute power from outside DESY
  – user administration / management
  driven by the activities of CFEL, FLASH, GKSS, the PETRA III Extension and in-house research.
• Additional IT services used by PS: Windows + Linux installation, Windows FS, AFS, mail, networks, Oracle, printing, licenses (e.g. IDL).
• This presentation is intended to support the planning, management and coordination of IT resources.

Data storage and data management
Fundamental decision
• Data storage and processing at DESY IT (‘Rechenzentrum’, i.e. the computing centre)
Current status PETRA III
• 2 file server (FS) systems installed
• 1 blade center (workgroup servers, WGS) installed
• FS: total hard disc storage ~176 TB
• WGS: 16 x 8 physical cores
These systems are used by the DESY beamlines (BLs) to ensure BL operation. GKSS (P05, P07) and EMBL (P12, P13, P14) are not covered!
• The experiment hall is set up with 1 GbE networking.
• Selected experiments with high data volumes (HTPs, high-throughput experiments) are connected via 10 GbE single-mode fiber optics.

Brief specs for current infrastructure
I) 2 file server systems (NetApp 3140), each:
   – ~70 TiB net, RAID 6, 20% snapshot
   – ~88 TiB max. (no snapshot)
   • Total: ~140 / 176 TiB
   • Takeover option
   • 5-year maintenance / support
II) 1 blade center
   – 16 workgroup servers, 2 x 2.26 GHz quad-core, 24 GB RAM (8 cores + 8 HT)
III) 3 Dell R510 (NFS storage) [from mid-July]
   – ~40 TB Hasylab in-house storage + 20 TB dCache
   – Testbed for data migration
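The totals in these specs follow from simple multiplication. The short sketch below reproduces them from the per-system figures; the only addition is a conversion of the binary TiB values into decimal TB, included because the previous slide quotes the same storage as "~176 TB" and the two units differ by about 10%.

```python
# Back-of-the-envelope totals for the PETRA III infrastructure listed above.
# All input figures come from the slide; the TiB -> TB conversion is added.

TIB = 2**40  # bytes per tebibyte (binary unit)
TB = 10**12  # bytes per terabyte (decimal unit)

n_fileservers = 2
net_tib_per_fs = 70   # RAID 6, 20% snapshot
max_tib_per_fs = 88   # no snapshot

n_wgs = 16
physical_cores_per_wgs = 8  # dual quad-core
threads_per_core = 2        # Hyper-Threading (8 cores + 8 HT)
ram_gb_per_wgs = 24

total_net_tib = n_fileservers * net_tib_per_fs   # 140 TiB
total_max_tib = n_fileservers * max_tib_per_fs   # 176 TiB
total_max_tb = total_max_tib * TIB / TB          # about 193.5 decimal TB

total_cores = n_wgs * physical_cores_per_wgs     # 128 physical cores
total_threads = total_cores * threads_per_core   # 256 hardware threads
total_ram_gb = n_wgs * ram_gb_per_wgs            # 384 GB RAM

if __name__ == "__main__":
    print(f"File servers: {total_net_tib} TiB net / {total_max_tib} TiB max "
          f"(= {total_max_tb:.1f} TB decimal)")
    print(f"WGS: {total_cores} physical cores, {total_threads} threads, "
          f"{total_ram_gb} GB RAM")
```

In decimal units, the 176 TiB maximum corresponds to roughly 193.5 TB, so "~176 TB" on the previous slide is best read as TiB.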

I) PETRA III fileserver layout
Hasylab P3 BLs share storage and bandwidth.
• Limits (per single FS system; we have two):
  – 140–240 MB/s transfer rate (Linux), depending on file size (peaks of 300 MB/s seen with parallel transfers)
  – Up to 60 MB/s on Windows XP (no real 10 Gbit support)
  – Tests with Windows Server 2008 pending (IT); detector software/driver support unclear
• Connection Hall 47 ↔ FS: 4 x 10 Gbit default network + 2 x 10 Gbit extra
• BLs: n x 1 Gbit; HTP (high-throughput) BLs: 10 Gbit fiber
[Figure: layout diagram connecting the file server heads p3-fs01 to p3-fs04 with beamlines P01, P02, P03, P04, P06, P08, P09, P10, P11 and FSEC (AR); high-throughput BLs are marked.]
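To put these limits in perspective, the sketch below converts the quoted rates into the time needed to move one dataset. The 1 TB dataset size is a hypothetical example, MB and TB are assumed to be decimal units, and the 10 Gbit figure is the raw line rate without protocol overhead.

```python
# Rough transfer-time estimate for one file server system, using the
# throughput figures quoted on the slide. The 1 TB dataset is hypothetical;
# decimal units (1 MB = 1e6 bytes, 1 TB = 1e12 bytes) are assumed.

DATASET_BYTES = 1 * 10**12  # hypothetical 1 TB dataset

rates_mb_per_s = {
    "Windows XP (max.)": 60,
    "Linux (lower bound)": 140,
    "Linux (upper bound)": 240,
    "parallel transfers (peak)": 300,
    "10 Gbit line rate (raw)": 10_000 / 8,  # 10,000 Mbit/s / 8 = 1250 MB/s
}

def transfer_hours(n_bytes: float, rate_mb_per_s: float) -> float:
    """Hours needed to move n_bytes at a sustained rate given in MB/s."""
    return n_bytes / (rate_mb_per_s * 10**6) / 3600

if __name__ == "__main__":
    for label, rate in rates_mb_per_s.items():
        print(f"{label:28s} {rate:7.0f} MB/s -> "
              f"{transfer_hours(DATASET_BYTES, rate):5.2f} h")
```

At the quoted rates, 1 TB takes about 4.6 h on Windows XP and roughly 1.2 to 2 h on Linux. Even the 300 MB/s peak uses only about a quarter of the raw rate of a single 10 Gbit link, which suggests the file server, not the network, is the bottleneck for a single system.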

Boundaries
Special “Bios” for fileserver
• Allows a max. of 55 (?) discs per aggregate
  → max. ~40 TiB net, no snapshot
  → max. “directory size” defines the ‘experiment capacity’
• Number of files per directory depends on the protocol
• Rights management depends on the protocol
• Currently:
  – Access from Windows and Linux needed
  – Many single files from a few kB to 32 MB are written
  – Network File System (NFS) [rights: user/group/other] [2·10^6 files written in a test]
  – Directory sizes of 6.4 TB (RAID 6, 20% snapshot)
• Options for archiving are under first discussion with IT:
  * Boundaries: how, how much, what, how long, cost distribution
  * Speed issues: HTP experiments
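Whether the directory size or the number of files per directory becomes the binding limit depends strongly on the file size. The sketch below counts how many files fill one 6.4 TB directory at the two ends of the quoted size range. Decimal units are assumed, and 100 kB is a hypothetical stand-in for "a few kB".

```python
# How many files fit into one 6.4 TB directory for the file sizes mentioned
# on the slide? Decimal units are assumed; 100 kB is a hypothetical value
# standing in for "a few kB".

DIRECTORY_BYTES = 64 * 10**11    # 6.4 TB directory (RAID 6, 20% snapshot)
TESTED_FILE_COUNT = 2 * 10**6    # files written in the test

def files_per_directory(file_size_bytes: int) -> int:
    """Number of whole files of the given size that fill the directory."""
    return DIRECTORY_BYTES // file_size_bytes

if __name__ == "__main__":
    for label, size in [("32 MB", 32 * 10**6),
                        ("100 kB (hypothetical)", 100 * 10**3)]:
        n = files_per_directory(size)
        print(f"{label:22s} -> {n:>12,d} files "
              f"({n / TESTED_FILE_COUNT:.1f}x the tested 2e6)")
```

With 32 MB files a full directory holds 200,000 files, well below the 2·10^6 written in the test. With files of around 100 kB it would hold 64 million, about 32 times the tested count, so for small-file experiments the file count per directory, not only the capacity, may need checking.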

II) PETRA III workgroup server (WGS) layout
Blade center. Each WGS:
• Dual quad-core
• Nehalem architecture
• 2.23 GHz
• 8 cores (+ 8 HT)
• 24 GB RAM
• 16 GB swap
• SL Linux 5.3, kernel 2.6.18-128.7.1.el5, x86_64
• IDL 7.1
• Python 2.4.3 (default)
• Python 2.5.2 (as python2.5)
[Figure: blade center with servers p3-wgs01 to p3-wgs16, each either in a pool or dedicated to a BL. p3wgs.desy.de: load-balanced pool; high-throughput BLs P03, P06, P10, P11 marked; further roles shown: test system / version evaluation, data management.]

WGS setup (common to all)
• SL Linux 5.3 (64 bit)
• Software under /opt/products, e.g.
  – IDL, Python, Matlab, Maple, Mathematica etc.
• Dedicated 15 GB partition for special scientific software
  – available via ‘ini xray’ (set up on request by scientists)

    p3-wgs03:/afs/desy.de/user/r/rothkirc> ini xray
    Initializing XRAY setup...
      o bkchem - chemical drawing program
      o CCP4 6.1.3 - crystallography package
      o Clipper - C++ crystallographic library
      o COOT 0.6.1 - Crystallographic Object-Oriented Toolkit
      o PHENIX 1.4-3 - automated structure determination
      o Platon - multipurpose crystallographic tool
      o SHELX - crystal structure determination
      o XDS - X-ray Detector Software
    Initializing Module xray...

Mid-term requirements (next 5 years)
Data storage
• Users have limited infrastructure for storage (and processing).
• DESY has to provide the resources on site.
• Needs of DESY scientists:

Working group         Subgroup / beamline       Measured data [TB/year]   Total [TB/year]
In-house              W3_1 – W3_5               10 each                   50
CFEL                  Coherent X-Ray Imaging    200                       350
                      N.N.                      150
                      Theory group              -
PETRA III             P01 – P04                 70                        ~800
                      P05 (GKSS)                150
                      P06                       100
                      P07 (GKSS)                150
                      P08 – P11                 320
PETRA III Extension                                                       300
FLASH                                                                     100
Total                                                                     1600
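The group totals in the table above can be cross-checked with a few lines of Python. The values are copied from the table; the empty entry for the theory group is treated as 0.

```python
# Cross-check of the yearly measured-data estimates in the table (TB/year).
# Values are copied from the table; "-" (Theory group) is counted as 0.

estimates_tb_per_year = {
    "In-house (W3_1 - W3_5)": [10] * 5,
    "CFEL": [200, 150, 0],                  # Coherent X-Ray Imaging, N.N., Theory group
    "PETRA III": [70, 150, 100, 150, 320],  # P01-P04, P05, P06, P07, P08-P11
    "PETRA III Extension": [300],
    "FLASH": [100],
}

# Sum the entries of each working group, then the grand total.
subtotals = {group: sum(v) for group, v in estimates_tb_per_year.items()}
total = sum(subtotals.values())

if __name__ == "__main__":
    for group, tb in subtotals.items():
        print(f"{group:24s} {tb:5d} TB/year")
    print(f"{'Total':24s} {total:5d} TB/year")
```

The exact PETRA III beamline sum is 790 TB/year, which the table rounds to ~800, so the unrounded total is 1590 TB/year against the quoted 1600.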

Data amounts (next 5 years)
Measured data: 1600 TB/year
Assuming on average 30% for analysis and keeping the results of ½ year available on disc:
Disc storage (constant): 2 PB. This includes backup.
Rule for long-term storage: data resulting in a publication should be stored for 10 years.
Expectation: 50% of the data
Tape (growing): 1 PB/year, i.e. 5 PB over the next 5 years.
Experiments become operational at different times. PETRA III expectation: fully operational in spring 2012.
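The slide states the results (2 PB disc, 1 PB/year tape) but not the arithmetic behind them. The sketch below shows one possible reading of the listed assumptions that reproduces the rounded figures. How the factors are combined, and in particular treating "includes backup" as a full second copy (factor 2), is our assumption, not the authors' documented method.

```python
# One possible reading of the storage estimate (assumption): add 30% for
# analysis output, keep half a year of data on disc, double it for backup,
# and archive 50% of the data to tape.

MEASURED_TB_PER_YEAR = 1600
ANALYSIS_OVERHEAD = 0.30     # "30% on average for analysis"
DISC_RETENTION_YEARS = 0.5   # "data provision of 1/2 year results"
BACKUP_FACTOR = 2            # "This includes backup" (assumed: full copy)
ARCHIVED_FRACTION = 0.50     # "Expectation: 50% of data"
YEARS = 5

def disc_tb() -> float:
    """Constant disc capacity under the assumed reading, in TB."""
    return (MEASURED_TB_PER_YEAR * (1 + ANALYSIS_OVERHEAD)
            * DISC_RETENTION_YEARS * BACKUP_FACTOR)

def tape_tb_per_year() -> float:
    """Yearly growth of tape storage under the assumed reading, in TB."""
    return MEASURED_TB_PER_YEAR * (1 + ANALYSIS_OVERHEAD) * ARCHIVED_FRACTION

if __name__ == "__main__":
    print(f"Disc (constant): {disc_tb() / 1000:.2f} PB")
    print(f"Tape: {tape_tb_per_year() / 1000:.2f} PB/year, "
          f"{YEARS * tape_tb_per_year() / 1000:.1f} PB in {YEARS} years")
```

Under this reading the disc estimate is 2.08 PB and tape grows by 1.04 PB/year (5.2 PB over five years), consistent with the rounded 2 PB and 1 PB/year above. A different combination of the factors would give different numbers.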

Data management
Users need to store their data on site at DESY and need access to the data for analysis. Prerequisites:
• Implementation of authentication and authorization. The procedure must be user-friendly enough that PS users without detailed knowledge of electronic data processing can handle it.
• A portal is needed that provides effective access to the data. Aim: allow searching by specific criteria, organizing/sorting the data, and data transfer.

Data processing
After the data have been transferred to the RZ (computing centre), computing power is needed to process them. Tasks are:
• Reconstruction, to enable the final analysis or to give users a basis on which they can continue their work.
• Fast data analysis is becoming more and more important for monitoring and quality assessment during the experiment.
• The actual analysis can be controlled from computers within DESY or remotely by the user via a suitable interface.
The existing WGS are not yet used to full capacity. Increasing demands should be covered by scaling the existing infrastructure. For specific applications, parallel processing or cost-effective CPUs can be useful.
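The portal requirements above (searching by specific criteria, organizing/sorting the data, and data transfer) can be illustrated with a small in-memory catalogue. This is a hypothetical sketch: the Dataset fields, the search() and transfer_volume_gb() helpers, and the example entries and paths are invented for illustration and do not describe an existing DESY portal.

```python
# Hypothetical sketch of the portal functions named above: search by
# criteria, sort, and size a selection for transfer. Field names, helpers
# and example entries are invented; no real DESY portal API is implied.

from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional

@dataclass(frozen=True)
class Dataset:
    beamline: str   # e.g. "P10"
    proposal: str   # hypothetical experiment identifier
    taken_on: date
    size_gb: float
    path: str       # hypothetical storage path

def search(datasets: Iterable[Dataset],
           beamline: Optional[str] = None,
           since: Optional[date] = None,
           sort_key: str = "taken_on") -> List[Dataset]:
    """Filter by the given criteria (None = no restriction), then sort."""
    hits = [d for d in datasets
            if (beamline is None or d.beamline == beamline)
            and (since is None or d.taken_on >= since)]
    return sorted(hits, key=lambda d: getattr(d, sort_key))

def transfer_volume_gb(selection: Iterable[Dataset]) -> float:
    """Total size of a selection, e.g. to estimate a download."""
    return sum(d.size_gb for d in selection)

if __name__ == "__main__":
    catalogue = [
        Dataset("P10", "prop-A", date(2010, 6, 2), 850.0, "/data/p10/prop-A"),
        Dataset("P06", "prop-B", date(2010, 5, 20), 120.0, "/data/p06/prop-B"),
        Dataset("P10", "prop-C", date(2010, 5, 11), 430.0, "/data/p10/prop-C"),
    ]
    hits = search(catalogue, beamline="P10")
    print([d.proposal for d in hits], transfer_volume_gb(hits), "GB")
```

Running the example prints `['prop-C', 'prop-A'] 1280.0 GB`. A real portal would keep the catalogue in a database and tie every query to the authentication and authorization layer listed as the first prerequisite; neither is modelled here.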

User management
The Digital User Office DUO (from PSI) has been adopted for the management of PS users: DOOR is used successfully, and it can be assumed that it will need further development.

PNI-HDRI
The PNI institutes within the HGF face similar challenges. Among other things, they initiated the HDRI. Refer to the comments of Rainer Gehrke.

Brief summary
The increasing amount of data from PS experiments results in an increasing need for IT support:
• Acquisition, installation and operation of file and compute servers
• Comprehensive support and assistance for users in data management and processing
• User support regarding e.g. authentication, authorization, access to data and compute servers...