
  • Number of slides: 43

Datacentre Management – HPC Practices
Dr. P. Sambath Narayanan
Senior Technology Architect, Customer Experience Centre
Sun Microsystems

The Networked Data Center
• No longer about just deploying more technology
• The computer centre has a strategic role in the organization
• IT infrastructure must provide competitive advantage
• Data is perceived as an asset
• The right data, right place, right time, right cost

The Growing Challenge
● Decreasing costs of HPC-class computing allow more systems-level research to be done in all fields
● However, the complexity of today's HPC environments keeps many from fully participating
● Open standards allow innovation across communities of researchers

Leveraging Innovation Throughout
[Figure: timeline from Customize (1980) through Standardize (1990) to Utilize (2000). Labels: TCP/IP in Every System, NFS, RISC, XML, JXTA, J2EE, J2ME, Network Identity, SAM-FS & QFS, Sun StorEdge Data Services Platform, Solaris 10 Operating System, Sun Grid, Sun Java Enterprise System, Java software ubiquity]

Sun's Open Source Initiatives
• $500M and 3,000 person-years = largest EVER contributed body of code
• 850+ members, 250+ JSRs; 3 complete J2EE/J2SE versions; 2 complete J2ME versions
• First Java IDE to support J2SE 5.0 language features
• Over 150 members; Sun: 1st Liberty-enabled Identity Management offering; 400M+ Liberty-enabled identities and clients forecast by Y.E. 2005
• Now on Java.net: monthly snapshots; J2SE 6; JRL license, open; New dialog
• Over 1,300 projects, 18 communities; hosts JSRs, over 110 user groups
• 7.5M lines of code, 2nd largest contribution EVER (after Solaris); translated into 45 languages

The Growing Challenge
● Grand challenge problems are too big for any single institution


Application Segmentation

Scale Up
● Large databases
● Enterprise apps: CRM, ERP, SCM
● Data warehousing, business intelligence
● Server consolidation/mainframe rehosting

Scale Out
● Web services, mail, messaging, security, firewall
● Applications server, database, ERP, CRM
● HPC, Compute Grid solutions
● Distributed databases
● Load balancing, business logic
● Network-facing, I/O intensive
● Server consolidation

Towards A Single Campus Grid
Untapped resources are available for everyone.

HPC Cluster

Compute Node Evolution

Top 5 HPC Requirements

Why Opteron?

63% Less Power Consumption
Sun Fire X4100: 550 W vs. 1470 W

Why Opteron?

75% Smaller Rackmount Size
Sun Fire X4100: 1U vs. 4U
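The percentages on the two Sun Fire X4100 comparison slides follow from the quoted figures. A minimal check, assuming both percentages are simple relative reductions; the function name `percent_reduction` is illustrative and not from the slides:

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` versus `baseline`, in percent."""
    return (baseline - new) / baseline * 100.0


# Figures quoted on the slides: 550 W vs. 1470 W, and 1U vs. 4U.
power_saving = percent_reduction(1470, 550)  # roughly 62.6, shown as 63% on the slide
rack_saving = percent_reduction(4, 1)        # 75.0

print(f"Power: {power_saving:.1f}% less, rack space: {rack_saving:.1f}% less")
```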

Top 5 HPC Requirements

What is InfiniBand (IB)?
• High performance interconnect
  > High bandwidth: 8/16 Gb/s today, roadmap to 96 Gb/s
  > Low latency: less than 10 microseconds
  > Low overhead: RDMA transport engine moves data reliably between applications
• Becoming standard in HPC
• Datacenter: distributed DBs (Oracle RAC) & other apps, rack systems
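How the bandwidth and latency figures interact can be sketched with a first-order model: transfer time is roughly the latency plus the message size divided by the bandwidth. This is an illustration only; it assumes the slide's 8 Gb/s and the 10 microsecond upper bound, and it ignores protocol overhead, encoding, and congestion. The function name is hypothetical.

```python
def transfer_time_s(message_bytes: int, bandwidth_gbps: float, latency_us: float) -> float:
    """First-order estimate: fixed latency plus serialization time."""
    latency_s = latency_us * 1e-6
    serialization_s = message_bytes * 8 / (bandwidth_gbps * 1e9)
    return latency_s + serialization_s


# Small messages are dominated by latency, large ones by bandwidth.
for size in (64, 64 * 1024, 16 * 1024 * 1024):
    t = transfer_time_s(size, bandwidth_gbps=8.0, latency_us=10.0)
    print(f"{size:>10} bytes: {t * 1e6:10.1f} microseconds")
```

For tightly coupled HPC codes that exchange many small messages, the latency term tends to matter more than the headline bandwidth.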

IB in Solaris
• Basic IB infrastructure in Solaris 10
• Continuing development driven by
  > New interface HW with improved capabilities
  > RAS requirements
  > New services, e.g. storage connection
  > Parity with IB on Linux (OpenIB)

N1 Definition
Software and services for lifecycle management of compute services and infrastructure

N1 Methodology
[Figure labels: Release Management, Change Management, Service Management, Billing, Orchestration, Business Services, Network, Servers, Storage]

N1 System Manager: Provision
• Discover bare metal servers
• Discover systems with OS
• Provision Solaris, Linux
• Create multiple OS profiles
• Provision Solaris patches
• Provision Red Hat RPMs
• Provision and update firmware

N1 System Manager: Monitor
• Hardware monitoring
• OS monitoring
• Monitor server "reachability"
• Define thresholds
• Log events
• Send notifications
• Industry standards (SNMP, IPMI)
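The "define thresholds, log events, send notifications" flow can be sketched generically. This is not the N1 System Manager interface; the `Threshold` class, the metric names, and the severity labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical warning/critical limits for one monitored metric."""
    metric: str
    warning: float
    critical: float


def evaluate(reading: float, threshold: Threshold) -> str:
    """Classify a reading against its threshold."""
    if reading >= threshold.critical:
        return "critical"
    if reading >= threshold.warning:
        return "warning"
    return "ok"


def check_readings(readings: dict, thresholds: dict) -> list:
    """Return (metric, value, severity) events for readings that cross a limit."""
    events = []
    for metric, value in readings.items():
        threshold = thresholds.get(metric)
        if threshold is None:
            continue  # no threshold defined for this metric
        severity = evaluate(value, threshold)
        if severity != "ok":
            events.append((metric, value, severity))
    return events


thresholds = {"cpu_temp_c": Threshold("cpu_temp_c", warning=70.0, critical=85.0)}
for event in check_readings({"cpu_temp_c": 90.0, "fan_rpm": 3000.0}, thresholds):
    print("EVENT:", event)  # a real system would log this and send a notification
```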

N1 System Manager: Manage
• Remote power on/off
• Remote command execution
• Hybrid UI
• Scriptable CLI
• Role-based access control
• Remote serial console

Hybrid UI

ILOM – Integrated Lights Out Manager
• Lights Out Management for Sun Fire systems
• Provides full local or remote access for setup, maintenance and ongoing monitoring/management of a single system
• Full remote KVM functionality
  > Including remote media support
• Browser-based UI and full CLI
• Access via Management Ethernet port, serial port or host OS (with suitable driver)
• Standards supported include HTTPS, LDAP, SSH 2.0, SNMP v1, v2c, v3, IPMI 2.0, DMTF 'SMASH' CLI

OS Options
• The No. 1 Unix operating system: available for all Sun systems
• Red Hat and SuSE Enterprise Linux: Sun x64 systems
• All Sun x64 systems are certified to run Microsoft Windows

Solaris 10: Blazing Performance
• 27 performance world records
• V20z's to E25K's
• Multiple workloads
• Single-core platforms
• Multi-core platforms

Solaris 10: Enterprise-class Ecosystem
• 2,300,000+ licenses
• 1,000's of applications
• 400+ supported platforms
• Follow-the-Sun support
• Guaranteed compatibility
• Vibrant open source community

Solaris 10 and Sun's Opteron Servers
• Performance
  > Platform-specific optimizations
  > Optimized memory management
  > 20 years of multi-thread tuning
  > Near-linear scalability
  > DTrace for massive performance opportunities
• Consolidation
  > Limitless partitioning with one license
• Multi-Core Support
  > Tools, Predictive Self-Healing, Scheduler
• Compatibility Guaranteed

Enterprise-class x64 Features
• Dynamic Tracing (DTrace)
• Solaris Containers
• Predictive Self-Healing
• ZFS
• Secure Execution
• Open Source Application Stack

Sun Grid Rack System: Grids Made Easy
• Easy and fast way to buy and deploy grids
  > Integrated racks directly from Sun's factory
  > Any combination of x64 servers
• Sample configurations with rules derived from real grid experience
  > Several HPTC sample configurations

Sun Grid Rack System
Updated for Sun Fire X2100, Sun Fire X4100 & Sun Fire X4200 Servers
• Easy-to-use web configurator
  > Sample configurations for industry applications
• Server nodes
  > Sun Fire X2100, Sun Fire X4100 & Sun Fire X4200 servers
  > Solaris 10 OS for x86, or Linux
• Infrastructure:
  > Interconnect: 3rd party (Cisco and others)
  > Software: Sun N1 System Manager, Sun N1 Grid Engine
  > Web Services option: N1000 series switches, Sun N1 Service Provisioning System, Sun Java

Sun Grid Rack System
Sun Customer Ready Systems Program: Sun CRS Delivered
• Higher Quality and Lower Risk
  > Helps reduce cost and risk in the deployment of horizontally scaled architectures
• Agile Deployment
  > Accelerate the deployment of grid-enabled applications by up to 90%, and reduce initial installation issues by up to 80%
• Higher Utilization
  > With Sun N1 Grid Engine software, customers experience up to 90% system utilization rate
• Easier to Manage
  > Redefining the entire rack system as the building block for the grid
• Lower Power and Cooling Costs

Faster risk analysis, 30% less heat
New Energy uses Sun Fire x64 servers and Solaris 10 to create a compute grid for faster Monte Carlo analysis, while generating 30% less heat than competing alternatives.

HPC Cluster Based on Opteron Architecture
• Lower Power and Cooling Costs
• Advanced Remote Management
• Industry Standard Design
• Enterprise-Class Features
• High Performance
• Flexible Choices of Multiple OSes

Sun Top 500 Systems, June 2005
● #37, USC, V60x, 2640 Xeon CPU (also some IBM Xeon and Dell x64)
  – Fell from #31 despite an increase from 5.7 to 7.2 TF
● #109, Nottingham, UK, V20z, 1024 Opteron CPU
● #172, Aachen, Germany, SF 25K, 672 USIV CPU
● #347, Idaho National Labs, V20z, 460 Opteron CPU
● #404, DLR Germany, V20z, 384 Opteron CPU
● #446, Cambridge UK, SF 15K, 900 USIII CPU

You Can Participate Too
● Sun HPC Consortium at SC|05
  – Seattle, November 2005
● Sun Application Tuning Seminar + HPCC
  – Aachen, Germany, Spring 2006
● Sun HPC Consortium at CCGrid
  – Singapore, May 2006

Sun's HPC Solution Center in Oregon
Ribbon-Cutting November 2005
● Demonstrates renewed Sun focus on Technical Computing and return to TOP500 leadership
● 6 TFLOPS (1536 cores of x64) for large-scale cluster testing and benchmarks
● Deployed in record time with Sun Grid Rack System
● Offers customers more flexibility and choice: puts Sun on the short list of terascale compute vendors
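The quoted cluster size implies a per-core figure that is easy to verify. A quick check, assuming the 6 TFLOPS is an aggregate over all 1536 cores (the slide does not say whether it is peak or measured):

```python
total_tflops = 6.0   # aggregate figure from the slide
cores = 1536         # x64 cores from the slide

gflops_per_core = total_tflops * 1000 / cores
print(f"{gflops_per_core:.2f} GFLOPS per core")
```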

Datacentre Management – HPC Practices
Dr. P. Sambath Narayanan
sambath.narayanan@sun.com

FS/Volume Model vs. ZFS

FS/Volume I/O Stack
• FS to Volume: Block Device Interface
  > "Write this block, then that block, ..."
  > Loss of power = loss of on-disk consistency
  > Workaround: journaling, which is slow & complex
• Volume to disks: Block Device Interface
  > Write each block to each disk immediately to keep mirrors in sync
  > Loss of power = resync
  > Synchronous and slow

ZFS I/O Stack
• ZFS to DMU: Object-Based Transactions
  > "Make these 7 changes to these 3 objects"
  > All-or-nothing
• DMU to Storage Pool: Transaction Group Commit
  > Again, all-or-nothing
  > Always consistent on disk
  > No journal: not needed
• Storage Pool: Transaction Group Batch I/O
  > Schedule, aggregate, and issue I/O at will
  > No resync if power lost
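The "all-or-nothing" idea can be illustrated with a toy in-memory model: changes are staged on a copy, and the committed state is replaced in one step only if every change succeeds. This is not ZFS or DMU code; the class `ToyPool` and its methods are hypothetical and exist only to show the principle.

```python
class ToyPool:
    """Toy model of all-or-nothing commits (not ZFS code)."""

    def __init__(self):
        self._committed = {}  # stands in for the consistent on-disk state

    def commit(self, changes):
        """Apply a group of (object_name, update_fn) changes atomically."""
        staged = dict(self._committed)  # work on a copy; committed state is untouched
        for name, update in changes:
            staged[name] = update(staged.get(name))
        # Single reference swap is the commit point: readers see either the
        # old state or the new one, never a partial mix.
        self._committed = staged

    def read(self, name):
        return self._committed.get(name)


pool = ToyPool()
pool.commit([("a", lambda old: 1), ("b", lambda old: 2)])


def failing_update(old):
    raise RuntimeError("simulated failure mid-transaction")


try:
    pool.commit([("a", lambda old: 99), ("c", failing_update)])
except RuntimeError:
    pass

print(pool.read("a"), pool.read("c"))  # the failed group left no trace
```

Because the old state is never modified in place, an interrupted group needs no journal replay or resync in this model, which mirrors the claim made on the slide.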

The Sun x64 Advantage
• Enterprise focus
  > World record performance w/ near-linear scalability
  > Design features: streamlining RAS for the x64 space
  > Superior energy-efficiency advantage over Intel-based systems
  > Dual-core ready
• Only x64 single-RU server with a list price below $750 USD
• An integrated system solution from a single vendor
  > Hardware
  > Networking
  > Middleware
  > Solaris OS