- Number of slides: 29
Oxford University Particle Physics Unix Overview
Sean Brisbane, Particle Physics Systems Administrator
Room 661, Tel 73389, s.brisbane1@physics.ox.ac.uk
14th October 2014 Graduate Lectures 1
Strategy
- Local Cluster Overview
- Connecting to it
- Grid Cluster
- Computer Rooms
- How to get help
14th October 2014 Graduate Lectures 2
Particle Physics Strategy: The Server / Desktop Divide
[Diagram] Servers: Virtual Machine Host, General Purpose Unix Server, Linux File Servers, Linux Worker nodes, Group DAQ Systems, Web Server, NIS Server, torque Server. Desktops: Win 7 PCs, Ubuntu PC, Linux Desktop.
Approx 200 desktop PCs with Exceed, PuTTY or ssh/X windows are used to access PP Linux systems.
14th October 2014 Graduate Lectures 3
Physics fileservers and clients
  Storage system:       Windows server | Central Linux fileserver | PP fileserver
  Client:               Windows | Central Ubuntu | PP Linux
  Recommended storage:  H: drive | /home folder | /home and /data folders
  Windows storage:      "H:" drive or "Y:\home" | /physics/home
  PP storage:           Y:\LinuxUsers\pplinux | /data/home, /data/experiment | /data/home
  Central Linux:        Y:\LinuxUsers\home\ | /network/home/particle
14th October 2014 Graduate Lectures 4
- Particle Physics Linux Unix Team (Room 661):
  - Pete Gronbech - Senior Systems Manager and GridPP Project Manager
  - Ewan MacMahon - Grid Systems Administrator
  - Kashif Mohammad - Grid and Local Support
  - Sean Brisbane - Local Server and User Support
- General purpose interactive Linux based systems for code development, short tests and access to Linux based office applications. These are accessed remotely.
- Batch queues are provided for longer and more intensive jobs, provisioned to meet peak demand and to give a fast turnaround for final analysis.
- Systems run Scientific Linux (SL), a free Red Hat Enterprise based distribution.
- The Grid & CERN have migrated to SL6. The majority of the local cluster is also on SL6, but some legacy SL5 systems are provided for those that need them.
- We will be able to offer you the most help running your code on the newer SL6; some experimental software frameworks still require SL5 (see the note after this slide for a quick way to check which release a machine runs).
14th October 2014 Graduate Lectures 5
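[Note] A quick way to check which Scientific Linux release a machine is running, shown here as a minimal sketch; the login name is a placeholder and pplxint8 is one of the interactive nodes described on the following slides:

  ssh username@pplxint8          # log in to an interactive node
  cat /etc/redhat-release        # prints e.g. "Scientific Linux release 6.x ..."
  uname -r                       # kernel version, useful when reporting problems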
Current Clusters
- Particle Physics Local Batch cluster
- Oxford's Tier 2 Grid cluster
14th October 2014 Graduate Lectures 6
PP Linux Batch Farm - Scientific Linux 6
Users log in to the interactive nodes pplxint8 & 9; the home directories and all the data disks (/home area or /data/group) are shared across the cluster and visible on the interactive machines and all the batch system worker nodes. Approximately 300 cores (430 incl. JAI/LWFA), each with 4 GB of RAM.
The /home area is where you should keep your important text files such as source code, papers and thesis.
The /data/ area is where you should put your big, reproducible input and output data.
[Diagram] Batch worker nodes: jailxwn01 & jailxwn02 (64 AMD cores each), pplxwn41-pplxwn60 (16 Intel 2650 cores each), pplxwn31-pplxwn38 (12 Intel 5650 cores each), pplxwn15 & pplxwn16 (8 Intel 5420 cores each). Interactive login nodes: pplxint8 & pplxint9.
14th October 2014 Graduate Lectures 7
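[Note] A minimal sketch of a batch job on this farm, assuming the torque/PBS batch system named on the strategy slide; the script name, program and /data paths are illustrative placeholders rather than the cluster's actual configuration. Keep the script and source in /home and write large output to /data.

  Contents of myjob.sh:
    #!/bin/bash
    #PBS -l nodes=1:ppn=1             # request a single core
    #PBS -l walltime=02:00:00         # request two hours of wall time
    cd $PBS_O_WORKDIR                 # move to the directory qsub was run from
    ./my_analysis --input /data/myexperiment/in.root --output /data/myexperiment/out.root

  Submit from pplxint8 or pplxint9 and monitor:
    qsub myjob.sh
    qstat -u $USER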
PP Linux Batch Farm - Scientific Linux 5
Legacy SL5 jobs are supported by a smaller selection of worker nodes, currently eight servers with 16 cores each and 4 GB of RAM per core.
All of your files are available from both SL5 and SL6, but the software environment is different, so code compiled for one operating system may not run on the other.
[Diagram] Batch worker nodes: pplxwn23-pplxwn30 (16 AMD 6128 cores each). Interactive login nodes: pplxint5 & pplxint6.
14th October 2014 Graduate Lectures 8
PP Linux Batch Farm - Data Storage
NFS is used to export data to the smaller experimental groups, where the partition size is less than the total size of a server.
[Diagram] NFS servers (pplxfsn): 40 TB data areas, 30 TB data areas, 19 TB home areas.
The data areas are too big to be backed up. The servers have dual redundant PSUs, RAID 6 and are running on uninterruptible power supplies. This safeguards against hardware failures, but does not help if you delete files.
The home areas are backed up nightly by two different systems: the Oxford ITS HFS service and a local backup system. If you delete a file, tell us as soon as you can, together with when you deleted it and its full name. The latest nightly backup of any lost or deleted files from your home directory is available at the read-only location /data/homebackup/{username}.
The home areas are quota'd, but if you require more space ask us.
Store your thesis on /home, NOT /data.
14th October 2014 Graduate Lectures 9
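[Note] A minimal sketch of recovering yesterday's copy of a deleted file from the read-only nightly backup area; the file name and the directory layout under /data/homebackup/{username} are illustrative:

  cp /data/homebackup/$USER/thesis/chapter1.tex ~/thesis/chapter1.tex
  # For anything older than the latest nightly backup, contact the admins (pp_unix_admin@physics.ox.ac.uk)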
Particle Physics Computing
The Lustre file system is used to group multiple file servers together to provide extremely large continuous file spaces. This is used for the Atlas and LHCb groups.
[Diagram] Lustre OSS04, 44 TB; pplxint5.
  df -h /data/atlas
  Filesystem               Size  Used  Avail  Use%  Mounted on
  /lustre/atlas25/atlas    366T  199T  150T   58%   /data/atlas
  df -h /data/lhcb
  Filesystem               Size  Used  Avail  Use%  Mounted on
  /lhcb25                  118T   79T   34T   71%   /data/lhcb25
14th October 2014 Graduate Lectures 10
14th October 2014 Graduate Lectures 11
Strong Passwords etc
- Use a strong password, not one open to dictionary attack!
  - fred123 - No good
  - Uaspnotda!09 - Much better
- More convenient* to use ssh with a passphrased key stored on your desktop.
  - *Once set up
14th October 2014 Graduate Lectures 12
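[Note] For a Linux or Mac desktop, a minimal sketch of setting up such a passphrased key (the PuTTY route is in the backup slides); ssh-copy-id assumes OpenSSH is installed locally, and the username and hostname are placeholders:

  ssh-keygen -t rsa -b 4096              # generate a key pair; choose a strong passphrase
  ssh-copy-id username@pplxint8          # install the public key on the cluster
  ssh-add                                # load the key into a running ssh-agent once per session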
Connecting with PuTTY
Question: How many of you are using Windows? & Linux? On the desktop?
Demo
1. Plain ssh terminal connection
   1. From 'outside of physics'
   2. From Office (no password)
2. ssh with X windows tunnelled to passive exceed
3. ssh, X windows tunnel, passive exceed, KDE Session
4. Password-less access from 'outside physics'
   1. See backup slides
http://www2.physics.ox.ac.uk/it-services/ppunix-cluster
http://www.howtoforge.com/ssh_key_based_logins_putty
14th October 2014 Graduate Lectures 13
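[Note] For those connecting from a Linux or Mac terminal rather than PuTTY, the equivalent commands as a sketch; the fully-qualified hostname is an assumption based on the physics.ox.ac.uk domain used elsewhere in these slides:

  ssh username@pplxint8.physics.ox.ac.uk        # plain ssh terminal connection
  ssh -X username@pplxint8.physics.ox.ac.uk     # with X windows tunnelled so graphical programs display locally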
14th October 2014 Graduate Lectures 14
SouthGrid Member Institutions
- Oxford
- RAL PPD
- Cambridge
- Birmingham
- Bristol
- Sussex
- JET at Culham
14th October 2014 Graduate Lectures 15
Current capacity
- Compute Servers
  - Twin and twin-squared nodes - 1770 CPU cores
- Storage
  - Total of ~1300 TB
  - The servers have between 12 and 36 disks; the more recent ones are 4 TB capacity each. These use hardware RAID and UPS to provide resilience.
14th October 2014 Graduate Lectures 16
Get a Grid Certificate
The new UKCA page http://www.ngs.ac.uk/ukca uses a JAVA based CERT WIZARD. You must remember to use the same PC to request and retrieve the Grid Certificate.
You will then need to contact central Oxford IT. They will need to see you, with your university card, to approve your request:
  To: help@it.ox.ac.uk
  Dear Stuart Robeson and Jackie Hewitt,
  Please let me know a good time to come over to the Banbury Road IT office for you to approve my grid certificate request.
  Thanks.
14th October 2014 Graduate Lectures 17
When you have your grid certificate…
Save it to a file in your home directory on the Linux systems, e.g.: Y:\Linuxusers\particle\home\{username}\mycert.p12
Log in to pplxint9 and run:
  mkdir .globus
  chmod 700 .globus
  cd .globus
  openssl pkcs12 -in ../mycert.p12 -clcerts -nokeys -out usercert.pem
  openssl pkcs12 -in ../mycert.p12 -nocerts -out userkey.pem
  chmod 400 userkey.pem
  chmod 444 usercert.pem
14th October 2014 Graduate Lectures 18
Now Join a VO
- This is the Virtual Organisation, such as "Atlas", so that:
  - You are allowed to submit jobs using the infrastructure of the experiment
  - You can access data for the experiment
- Speak to your colleagues on the experiment about this. It is a different process for every experiment!
14th October 2014 Graduate Lectures 19
Joining a VO
- Your grid certificate identifies you to the grid as an individual user, but it's not enough on its own to allow you to run jobs; you also need to join a Virtual Organisation (VO).
- These are essentially just user groups, typically one per experiment, and individual grid sites can choose to support (or not) work by users of a particular VO. Most sites support the four LHC VOs; fewer support the smaller experiments.
- The sign-up procedures vary from VO to VO: UK ones typically require a manual approval step, LHC ones require an active CERN account.
- For anyone interested in using the grid but not working on an experiment with an existing VO, we have a local VO we can use to get you started.
14th October 2014 Graduate Lectures 20
When that's done
- Test your grid certificate:
  > voms-proxy-init --voms lhcb.cern.ch
  Enter GRID pass phrase:
  Your identity: /C=UK/O=eScience/OU=Oxford/L=OeSC/CN=j bloggs
  Creating temporary proxy ..................... Done
- Consult the documentation provided by your experiment for 'their' way to submit and manage grid jobs
14th October 2014 Graduate Lectures 21
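[Note] To confirm the proxy really was created and to see how long it remains valid, you can query it; voms-proxy-info ships with the same VOMS client tools as voms-proxy-init, though the exact output fields depend on the installed version:

  voms-proxy-info --all        # show identity, VO attributes and remaining proxy lifetime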
Two Computer Rooms provide excellent infrastructure for the future
The new computer room built at Begbroke Science Park, jointly for the Oxford Super Computer and the Physics department, provides space for 55 (11 kW) computer racks, 22 of which will be for Physics. Up to a third of these can be used for the Tier 2 centre. This £1.5M project was funded by SRIF and a contribution of ~£200K from Oxford Physics. The room was ready in December 2007, and the Oxford Tier 2 Grid cluster was moved there during spring 2008. All new Physics High Performance Clusters will be installed here.
14th October 2014 Graduate Lectures 22
Local Oxford DWB Physics Infrastructure Computer Room
Completely separate from the Begbroke Science Park, a computer room with 100 kW cooling and >200 kW power has been built for ~£150K of Oxford Physics money. This local Physics department infrastructure computer room was completed in September 2007. It allowed local computer rooms to be refurbished as offices again, and racks that were in unsuitable locations to be re-housed.
14th October 2014 Graduate Lectures 23
Cold aisle containment
14th October 2014 Graduate Lectures 24
Other resources (for free)
- Oxford Advanced Research Computing (ARC)
  - A shared cluster of CPU nodes, "just" like the local cluster here
  - GPU nodes
    - Faster for 'fitting', toy studies and MC generation
    - *IFF* code is written in a way that supports them
  - Moderate disk space allowance per experiment (<5 TB)
  - http://www.arc.ox.ac.uk/content/getting-started
- Emerald
  - Huge farm of GPUs
  - http://www.cfi.ses.ac.uk/emerald/
- Both need a separate account and project
  - Come talk to us in RM 661
14th October 2014 Graduate Lectures 25
The end of the overview
Now more details of use of the clusters
- Help Pages
  - http://www.physics.ox.ac.uk/it/unix/default.htm
  - http://www2.physics.ox.ac.uk/research/particlephysics/particle-physics-computer-support
- ARC
  - http://www.arc.ox.ac.uk/content/getting-started
- Email
  - pp_unix_admin@physics.ox.ac.uk
14th October 2014 Graduate Lectures 26
BACKUP
14th October 2014 Graduate Lectures 27
Puttygen to create an ssh key on Windows (previous slide point #4)
- Enter a secure (strong) passphrase, then:
  - Save the private part of the key to a subdirectory of your local drive.
  - Paste the public key text into ~/.ssh/authorized_keys on pplxint.
14th October 2014 Graduate Lectures 28
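[Note] A minimal sketch of installing the pasted public key on the cluster side with the permissions OpenSSH expects; the key text shown is a placeholder for what PuTTYgen displays:

  mkdir -p ~/.ssh && chmod 700 ~/.ssh
  echo "ssh-rsa AAAA...yourkey... user@desktop" >> ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys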
Pageant
- Run Pageant once after login.
- Right-click on the Pageant symbol and "Add key" for your private (Windows ssh) key.
14th October 2014 Graduate Lectures 29