  • Number of slides: 16

GRACE at UCL

2 When one size can't fit all: Scalable HPC For Research Delivery
ISD/RITS/RCPS - Owain Kenway
Grace/Legion/Software Stack/Legion DI
www.ucl.ac.uk/research-it-services

3 State of Research Computing Services: Legion has been UCL's primary local compute resource since 2007.
  • Almost none of the original hardware is still in service.
  • Gradual upgrade over time.
  • Absorbing other services.
  • 7-year-old core network technology: 1G Ethernet.

4 State of Research Computing Services: Legion
Gradual upgrade over time means the service is fragmented:
  • 8 different node types!
  • Some have Infiniband, some don't!
  • PIs buy the hardware they need.

5 Parallel vs Serial
In general: Iridis 3 → parallel, Legion → high throughput.
Parallel
  • Single job spans multiple nodes
  • Tightly coupled parallelisation, usually in MPI
  • Sensitive to network performance
  • Currently primarily chemistry, physics, engineering
High throughput
  • Lots (tens of thousands) of independent jobs on different data
  • High I/O
  • Currently primarily biosciences and physics
  • In the future, digital humanities

6 Parallel
Many processes on many processors work simultaneously and communicate with each other.
Diagram labels: Input Data, Output Data
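To make "work simultaneously and communicate" concrete, here is a minimal sketch that simulates a tightly coupled job in plain Python. It is an illustration only, not code from Grace or Iridis: the smoothing problem and the function names are assumptions, and the "processes" are simulated one after another rather than run with MPI. The point is that every chunk must receive its neighbours' edge values before each step, which is why such jobs are sensitive to network performance.

```python
def smooth_whole(values, steps):
    """Reference: average each interior point with its two neighbours, repeatedly."""
    current = list(values)
    for _ in range(steps):
        nxt = current[:]
        for i in range(1, len(current) - 1):
            nxt[i] = (current[i - 1] + current[i] + current[i + 1]) / 3.0
        current = nxt
    return current


def smooth_decomposed(values, steps, parts):
    """Same computation with the data split across `parts` simulated processes.

    Assumes 1 <= parts <= len(values) so that no chunk is empty.
    """
    size = len(values) // parts
    chunks = [list(values[p * size:(p + 1) * size]) for p in range(parts)]
    chunks[-1].extend(values[parts * size:])  # remainder goes to the last chunk

    for _ in range(steps):
        # Communication phase: each chunk receives one edge value from each
        # neighbour (None marks the outer boundary of the whole domain).
        left_halo = [None] + [c[-1] for c in chunks[:-1]]
        right_halo = [c[0] for c in chunks[1:]] + [None]

        # Compute phase: each chunk updates its own points using the halos.
        new_chunks = []
        for p, chunk in enumerate(chunks):
            padded = [left_halo[p]] + chunk + [right_halo[p]]
            updated = []
            for i in range(1, len(padded) - 1):
                if padded[i - 1] is None or padded[i + 1] is None:
                    updated.append(padded[i])  # outer boundary stays fixed
                else:
                    updated.append((padded[i - 1] + padded[i] + padded[i + 1]) / 3.0)
            new_chunks.append(updated)
        chunks = new_chunks

    return [v for chunk in chunks for v in chunk]


if __name__ == "__main__":
    data = [0.0] * 4 + [9.0] + [0.0] * 5
    print(smooth_decomposed(data, steps=5, parts=3) == smooth_whole(data, steps=5))
```

No chunk can start a step until the exchange for that step has finished, so the slowest link sets the pace for the whole job.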

7 High Throughput
Many processes operate independently of each other and in any order.
Diagram labels: Input Data, Output Data
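By contrast, a high-throughput workload can be sketched as a set of tasks that never talk to each other. The example below is an assumption-laden illustration: `analyse` is a hypothetical per-sample task, and a local thread pool stands in for the cluster scheduler that would normally run each task as a separate job.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyse(sample):
    """Hypothetical per-sample task: it depends only on its own input."""
    return sum(x * x for x in sample)


def run_high_throughput(samples, workers=4):
    """Run every task independently and collect results by input index."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(analyse, s): idx for idx, s in enumerate(samples)}
        # Tasks may finish in any order; nothing waits on a neighbour.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return [results[i] for i in range(len(samples))]


if __name__ == "__main__":
    print(run_high_throughput([[1, 2], [3], [4, 5, 6]]))
```

Because no task needs another task's output, the interconnect between nodes matters far less here than the rate at which input and output data can be read and written.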

8 Iridis Retirement
In summer 2015, Southampton were due to retire Iridis. This meant that we would lose ~71 TeraFlops of compute capacity, and the ability to run large parallel jobs! We also wanted to retire the original Legion hardware, which was 7 years old, losing another 20 TeraFlops.
Luckily, we had £1.5 million to spend!

9 State of Research Computing Services: Grace went “into service” on 2nd December 2015.
  • Completely new service for parallel compute.
  • All nodes are connected to storage by 40 gigabit Infiniband.
  • Infiniband is the primary network in the cluster (IP over IB, so it looks like a “normal” network).
  • Designed with the network capacity to double in size over time.

10 To replace UCL's Iridis 3 service and the retired Legion nodes we required ~90 TeraFlops sustained. Grace was benchmarked at ~180 TeraFlops.
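The ~90 TeraFlops requirement follows from the two losses quoted on the Iridis retirement slide. A small sketch of that arithmetic, using only the approximate figures from the slides (the variable names are illustrative):

```python
def capacity_summary(iridis_tflops=71, legion_retired_tflops=20, grace_tflops=180):
    """Compare Grace's benchmark with the capacity being retired.

    All inputs are the approximate ("~") figures quoted in the slides.
    """
    required = iridis_tflops + legion_retired_tflops
    return {
        "required_tflops": required,
        "grace_tflops": grace_tflops,
        "grace_over_required": grace_tflops / required,
    }


if __name__ == "__main__":
    summary = capacity_summary()
    print(f"Capacity to replace: ~{summary['required_tflops']} TeraFlops")
    print(f"Grace / required: {summary['grace_over_required']:.2f}x")
```

The ratio comes out at roughly two, so Grace provides about twice the capacity it had to replace.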

11 Legion Grace

12 Legion/Grace have a common software stack.
Red Hat Enterprise Linux + Son of Grid Engine + Environment modules
Common set of:
  • Compilers (so you can compile your own code)
  • Libraries
  • Applications (it's likely the application you use is already available, or we can install it for you)
Scripted builds of applications (so we can easily install new versions for you)
xCAT management software (which allows us to manage the cluster)
Easy to move between the services (you have the same environment on both machines)
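One way to picture "the same environment on both machines" is a single job-script template that differs only in the resources it requests. The sketch below is hypothetical and not taken from the UCL documentation: the parallel environment name `mpi`, the module names, the job names and the commands are all placeholders, and the real values are site-specific. The `#$` directives used (`-S`, `-N`, `-l h_rt`, `-cwd`, `-pe`, `-t`) are standard Grid Engine options, and `module load` is the standard Environment Modules command.

```python
def render_job_script(name, walltime, modules, command,
                      parallel_slots=None, array_tasks=None, pe_name="mpi"):
    """Build the text of a Grid Engine job script.

    `pe_name` and the entries of `modules` are placeholders (assumptions);
    check the site documentation for the names that actually exist.
    """
    lines = [
        "#!/bin/bash -l",
        "#$ -S /bin/bash",
        f"#$ -N {name}",
        f"#$ -l h_rt={walltime}",
        "#$ -cwd",
    ]
    if parallel_slots is not None:
        # Tightly coupled job: request many slots in a parallel environment.
        lines.append(f"#$ -pe {pe_name} {parallel_slots}")
    if array_tasks is not None:
        # High-throughput job: one independent task per array index.
        lines.append(f"#$ -t 1-{array_tasks}")
    lines.append("")
    lines.extend(f"module load {module}" for module in modules)
    lines.append("")
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    shared_modules = ["example-compiler", "example-mpi"]  # hypothetical names
    print(render_job_script("coupled_sim", "12:00:00", shared_modules,
                            "mpirun ./simulate", parallel_slots=64))
    print(render_job_script("per_sample", "1:00:00", shared_modules,
                            "./analyse input.$SGE_TASK_ID", array_tasks=10000))
```

The module lines are identical in both scripts; only the `-pe` or `-t` request changes between a parallel job and a high-throughput array job.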

13 Wherever possible the UCL Research Computing Platform Services Team's work is Open Source and on GitHub:
https://github.com/UCL-RITS/rcps-buildscripts
https://github.com/UCL-RITS/rcps-modulefiles
You can deploy it on your own resources/desktop (application licenses permitting).

14 The Future: Legion “Data Intensive”
Although Legion now does only high throughput computing, it's not designed for it.
  • Some issues with I/O.
  • We need to retire some old hardware.
So the next major upgrade is redesigning Legion for HTC:
  • Replace old “Nehalem” nodes.
  • Replace/upgrade 1G Ethernet I/O subsystem.
  • Local mirroring of common datasets.
  • The then-current iteration of the software stack.
Coming ~summer 2017!

15 None of this would have been possible without:
UCL: Dr Ian Kirker, Heather Kelly, Brian Alston, Thomas Jones, Luke Sudbery, William Hay, Colin Byelong, Prof. Dario Alfe, Dr Javier Herrero, Dr Jörg Saßmannshausen, Mike Atkins, Greg Dyer
OCF/Lenovo/DDN: Georgina Ellis, Arif Ali, Jagjit Reehal, Jim Roche, Richard Mansfield
and certainly many, many others. THANKS!

Grace has effectively doubled the capacity for parallel compute available to researchers at UCL.
Visit www.ucl.ac.uk/research-it-services/grace to download these slides after the event.
What did you think? Join the conversation on Twitter with #GraceAtUCL. Don't forget to follow us for access to the event video and today's polling results.