Fujitsu Strategies and Solutions for High-Performance Computing
Cristian Antonucci, Maria De Luca Coruzzi
Fujitsu: #2 in the Top500
ENEA, 3 July 2012
High Performance Computing with Fujitsu
- HPC roots
- K computer development
- Solution positioning and use
Fujitsu HPC Servers - past, present and future
Fujitsu has been developing the most advanced supercomputers in the world since 1977. PRIMERGY in HPC for 10 years.
Supercomputer and cluster series, 1977-2020 (timeline):
- F230-75 APU (1977): Japan's first vector (array) supercomputer
- VP Series (~1985)
- AP1000 cluster node: Gordon Bell Prize (1994, 95, 96)
- VPP500 / NWT (Numerical Wind Tunnel, developed with NAL): No. 1 in Top500 (Nov. 1993)
- VPP300/700
- AP3000 cluster node
- VPP5000: world's fastest vector processor (1999)
- PRIMEPOWER HPC2500: world's most scalable supercomputer (2003)
- PRIMERGY RX200 cluster node: Japan's largest cluster in Top500 (July 2004)
- HX600 cluster node; PRIMEQUEST; SPARC Enterprise
- PRIMERGY BX900 cluster node: most efficient performance in Top500 (Nov. 2008)
- FX1
- K computer: No. 1 in Top500 (June and Nov. 2011)
- PRIMERGY CX400: next x86 generation
- PRIMEHPC FX10, with trans-exascale and exascale generations to follow
K installation at RIKEN AICS* in Kobe
- LINPACK: 10.51 PFlops
- System memory: 17.6 PB
- No. of racks: 864
- No. of CPUs: 88,128
- No. of cores: 705,024
- No. of cables: > 20,000
Facilities: air handling units, chillers, seismic isolation structure (Kobe AICS facilities and aerial photo). First 8 racks installed on the 3rd computer floor, Oct. 1st, 2010.
*AICS: Advanced Institute for Computational Science. Courtesy of RIKEN.
"K computer" Achieves Goal of 10 Petaflops and Tokyo, November 2, 2011 — RIKEN and Fujitsu today announced that the "K computer“, which is a supercomputer currently under their joint development, has achieved a LINPACK benchmark performance of 10. 51 petaflops Highest Performance Efficiency with 93% 4 Copyright 2012 FUJITSU
Fujitsu's approach to the HPC market
Fujitsu covers HPC customer needs with tailored HPC platforms:
- FX10: petascale supercomputer
- PRIMERGY HPC solutions (NEW): divisional, departmental, and workgroup clusters
- Power workstations
Fujitsu PRIMEHPC FX10 - the next level of supercomputing, introduced Nov. '11
Node:
- Processor: SPARC64™ IXfx (1.848 GHz, 16 cores) x 1
- Theoretical computational performance: 236.5 GFLOPS
- Memory capacity: 32 GB or 64 GB
- Memory bandwidth: 85 GB/s
- Inter-node transfer rate: 5 GB/s x 2 (bidirectional) per link
System:
- No. of racks: 4 - 1,024
- Nodes: 384 - 98,304
- Theoretical computational performance: 90.8 - 23,248 TFLOPS
- Total memory: 12 - 6,291 TB
- Interconnect: "Tofu" interconnect
- Cooling method: direct water cooling + air cooling (optional: exhaust cooling unit)
RAS coverage ranges (SPARC64™ VIIIfx/IXfx): hardware-based error detection with possible self-recovery; hardware-based error detection; ranges in which errors do not affect actual operation.
Processor features - single CPU as a node design
SPARC64™ IXfx architecture:
- SPARC-V9 + HPC-ACE (Arithmetic Computational Extensions)
- 16 cores @ 1.848 GHz
- Floating-point arithmetic units in each core, with simultaneous floating-point arithmetic executions per core
- On-board InterConnect Controller (ICC)
- High performance per watt: 236.5 GFLOPS per CPU
- High-reliability, power-saving technologies
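The quoted per-CPU and system peaks are mutually consistent; as a quick check (assuming 8 double-precision flops per core per cycle, which is what the quoted figures imply):

\[ 16\ \text{cores} \times 1.848\ \text{GHz} \times 8\ \tfrac{\text{flops}}{\text{cycle}} \approx 236.5\ \text{GFLOPS per CPU} \]
\[ 384 \times 236.5\ \text{GFLOPS} \approx 90.8\ \text{TFLOPS}, \qquad 98{,}304 \times 236.5\ \text{GFLOPS} \approx 23{,}248\ \text{TFLOPS} \]

matching the node and system ranges on the FX10 specification slide.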
Tofu Interconnect
Single CPU and single interconnect controller. Very fast node-to-node communication:
- 10 links for inter-node connection
- 10 GB/s per link = 5 GB/s x 2 (bidirectional)
- Low latency (min. 1.5 μs between adjacent nodes)
- Total 100 GB/s off-chip bandwidth: feeds sufficient data to the high-performance CPU
Network topology: Tofu, Fujitsu's original 6D mesh/torus interconnect
- High communication performance
- High system scalability
- High fault tolerance
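A first-order way to combine the latency and bandwidth figures above is the classic alpha-beta model, T(m) = latency + m / bandwidth. The sketch below is a generic estimate using the slide's 1.5 μs and 5 GB/s per-direction figures, not a published Fujitsu model; it shows small messages being latency-bound and large ones bandwidth-bound:

```c
#include <stdio.h>

/* Alpha-beta estimate of point-to-point transfer time on one Tofu link:
   T(m) = latency + m / bandwidth. Figures taken from the slide above;
   the model itself is a generic first-order approximation. */
static double transfer_time_s(double bytes)
{
    const double latency_s   = 1.5e-6; /* min. adjacent-node latency */
    const double bandwidth_b = 5.0e9;  /* 5 GB/s per link, per direction */
    return latency_s + bytes / bandwidth_b;
}

int main(void)
{
    printf("8 B   : %6.2f us (latency-bound)\n",   transfer_time_s(8.0) * 1e6);
    printf("1 MiB : %6.2f us (bandwidth-bound)\n", transfer_time_s(1048576.0) * 1e6);
    return 0;
}
```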
Tofu Interconnect Technology
- Very fast node-to-node communication: 5 GB/s x 2 (bidirectional)
- Low latency: min. 1.5 μs between adjacent nodes, max. 4.4 μs for a 1 PFLOPS configuration
- Integrated MPI support for collective operations (Barrier and Reduction)
Topology:
- Physical topology: 6D torus/mesh, addressed by (x, y, z, a, b, c)
- 10 links per node: 6 links for the 3D torus and 4 links for the node group
- Node group: 12 nodes (2 x 3 x 2), addressed by (a, b, c) with a (2: mesh), b (3: torus), c (2: mesh); each node of a group connects into the x-y-z 3D torus
- User/application view: logical 3D torus (X, Y, Z)
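To make the addressing concrete, here is a minimal sketch of neighbor computation on a 6D mesh/torus with Tofu-style (x, y, z, a, b, c) coordinates. The a/b/c extents (2: mesh, 3: torus, 2: mesh) come from the slide; the x/y/z extents are hypothetical values chosen only to make the example runnable:

```c
#include <stdio.h>

#define DIMS 6  /* (x, y, z, a, b, c) */

/* x, y, z extents are assumptions for illustration; a=2 (mesh),
   b=3 (torus), c=2 (mesh) follow the node-group description above. */
static const int extent[DIMS]   = { 8, 8, 8, 2, 3, 2 };
static const int is_torus[DIMS] = { 1, 1, 1, 0, 1, 0 };

/* Step +1/-1 along dimension d; torus dims wrap, mesh dims have edges.
   Returns 0 if the move would fall off a mesh edge. */
static int neighbor(const int in[DIMS], int d, int step, int out[DIMS])
{
    for (int i = 0; i < DIMS; i++) out[i] = in[i];
    int c = in[d] + step;
    if (is_torus[d]) { out[d] = (c + extent[d]) % extent[d]; return 1; }
    if (c < 0 || c >= extent[d]) return 0;
    out[d] = c;
    return 1;
}

int main(void)
{
    int node[DIMS] = { 0, 0, 0, 0, 2, 0 }, nb[DIMS];
    if (neighbor(node, 4, +1, nb))  /* b is a 3-way torus, so 2 wraps to 0 */
        printf("b+ neighbor of (0,0,0,0,2,0): (%d,%d,%d,%d,%d,%d)\n",
               nb[0], nb[1], nb[2], nb[3], nb[4], nb[5]);
    return 0;
}
```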
High Performance Computing with Fujitsu PRIMERGY: x86-based HPC
Positioning of the PRIMERGY HPC Portfolio
Scale axes: density, HA, flexibility, capability vs. density, performance/cost, capacity.
- BX900: the optimal solution for high-end performance, density, RAS, and flexible I/O integration; highest flexibility in combining different requirements (I/O, interconnect, fat nodes, thin nodes)
- RX200 / RX300 / RX350: maximum expandability and scalability of I/O slots, HDDs, and RAM; Intel Xeon plus GPGPU cards; a bridge from desk side to cluster-in-a-box
- CX400 (NEW): optimal integration of HPC capability and capacity requirements in standard rack infrastructures
All based on the Intel® Xeon® processor E5-2600 family with up to 8 GT/s.
Modular HPC growth potential - from workgroup/departmental to the data center
Growth path: RX200 → CX400 (NEW) → BX400 → BX900, scaling in density, energy costs, capacity, and availability; flexibility to address all kinds of customer requirements.
NEW: skinless server PRIMERGY CX400
- Massive scale-out thanks to ultra-dense server nodes
- HPC GPU coprocessor support
Common strengths:
- Latest-generation Intel® Xeon® processor E5 series
- Highest memory performance plus high reliability
- Low-latency / high-bandwidth InfiniBand infrastructure
- Industry-leading blade server density
- Industry-leading I/O bandwidth capability
PRIMERGY performance boost by Intel E5 "Sandy Bridge"
New HPC support functions in the Romley / Sandy Bridge platform:
- Up to 2x FLOPS for HPC workloads with Intel® Advanced Vector Extensions (AVX)
- New operations to enhance vectorization
- Extended SSE FP register sizes
- Up to 4 channels of DDR3-1600
PRIMERGY S7/S3 benchmarks vs. the previous generation (baseline; source: Intel):
- Efficiency: SPECpower +73%; SAP SD per server power +43%
- Performance: HPC LINPACK +120%; SPECfp_rate_base2006 +80%; SPECint_rate_base2006 +70%; SAP SD +56%; VMmark 2.0 +39%
A performance advantage of up to 120% compared to the previous generation - up to 70% more overall performance, up to 120% in HPC scenarios - enables more workloads to run on the same system. With up to 120% more performance, PRIMERGY dual-socket servers support today's requirements and meet future demand.
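The "up to 2x FLOPS" claim comes from AVX widening the FP SIMD registers from 128 to 256 bits, i.e. from 2 to 4 doubles per instruction. Below is a minimal sketch of the kind of kernel that benefits - illustrative only, not Fujitsu benchmark code:

```c
#include <immintrin.h>  /* AVX intrinsics; compile with -mavx */

/* y[i] = a * x[i] + y[i], 4 doubles per 256-bit AVX operation
   (vs. 2 per 128-bit SSE operation - hence "up to 2x FLOPS").
   Sandy Bridge has no FMA, so multiply and add are separate. */
void daxpy_avx(double a, const double *x, double *y, long n)
{
    __m256d va = _mm256_set1_pd(a);
    long i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);
        __m256d vy = _mm256_loadu_pd(y + i);
        _mm256_storeu_pd(y + i, _mm256_add_pd(_mm256_mul_pd(va, vx), vy));
    }
    for (; i < n; i++)  /* scalar tail for the remaining elements */
        y[i] = a * x[i] + y[i];
}
```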
Most energy-efficient servers in the world
Fujitsu PRIMERGY achieves a world record in energy efficiency and holds several best-in-class ratings:
- World record in SPECpower_ssj2008, breaking the prestigious milestone of 6,000 overall ssj_ops/watt
  http://ts.fujitsu.com/ps2/press/read/news_details.aspx?id=6092
- Reduce energy consumption and current carbon footprint: up to 73% more performance per watt compared to the previous generation means
  - up to 33% less energy for the same performance level, making it easier to meet stringent environmental mandates for data centers
  - up to 66% more workload on the current power budget, without stressing current data center cooling
PRIMERGY CX400 - HPC design
The CX400 combines high-performance computing with high density at a lower overall investment. Main usage scenarios: Cloud and HPC.
High density / scalability in a 2U chassis; HPC requirements optimally fulfilled:
- Up to 4 nodes (1U) or 2 nodes (2U) per 2U chassis
- 2x Intel® Xeon® E5-2600 processors per node (Intel® Xeon® processor E5-2400 node coming soon)
- 16 DIMMs, up to 1600 MHz
- Redundant, hot-plug PSUs for enhanced availability / lower servicing effort
- Up to 24x HDD
- FDR InfiniBand interconnect option for the highest, most efficient bandwidth and lowest latency
- GPU option (2U node); support for Intel MIC planned for Q1/2013
PRIMERGY CX400 S1 - chassis
- Shared power: 2x PSU (1400 W each), 92% efficiency (80 PLUS Gold), hot-plug, redundant
- Shared cooling: 4x 80 W fans, redundant, non hot-plug
- Shared drive bay: hot-plug disk drives, front access
- Hot-plug server nodes, rear access, individually serviceable without disruption to other nodes
- Rear cabling for I/O; air flow: front-to-back
- Size: 2U, 447 x 775 x 87 mm (W x D x H), with hot-plug PSUs and 4x CX250 server nodes
- Fits into standard 19" racks: no need for over-sized rack depth or rear-expander mechanics
PRIMERGY CX2yy server nodes - double the performance per U with condensed dual-socket server nodes
CX250 S1: 1U server node, hot-plug, 2 CPUs - the standard node for HPC and cloud computing
- 2x latest Intel® Xeon® processor E5-2600 family
- Up to 512 GB RAM
- 1x PCIe expansion slot + 1x mezzanine card
- Half-wide 1U server node, 4x per CX400
CX270 S1: 2U server node, hot-plug, 2 CPUs + GPGPU option - HPC-optimized node with GPGPU acceleration
- 2x Intel® Xeon® processor E5-2600 family
- Up to 512 GB RAM
- 2x PCIe expansion slots
- 1x GPGPU option: NVIDIA Tesla 20 series (M2075 / M2090)
- Half-wide 2U server node, 2x per CX400
Per 2U chassis: up to 64 Intel Xeon processor cores, up to 2,048 GB memory, up to 36 TB local storage.
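The chassis-level totals follow from the node specs; a quick check, assuming a fully populated 4-node CX250 configuration with 8-core E5-2600 CPUs:

\[ 4\ \text{nodes} \times 2\ \text{CPUs} \times 8\ \text{cores} = 64\ \text{cores} \]
\[ 4\ \text{nodes} \times 512\ \text{GB RAM} = 2{,}048\ \text{GB of memory} \]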
Storage & Parallel File System with Fujitsu
Fujitsu Exabyte FS architecture example
Architecture (over an InfiniBand network, with an Ethernet management network):
- PRIMERGY HPC cluster clients
- Master node and file management node in a fail-over pair
- MDS nodes in a fail-over pair, FC-attached to FUJITSU ETERNUS DX controllers holding the MDT LUNs
- OSS nodes (OSS1-OSS4) in fail-over pairs, FC-attached to FUJITSU ETERNUS DX controllers holding the OST LUNs
Key properties:
- A file is not limited to the maximum size of an OST
- A file's data blocks can be striped across multiple OSSs and OSTs
- File striping improves I/O bandwidth; the aggregate I/O bandwidth is the sum of all OSSs that participate in the file system
- File system size is the sum of all OSTs configured for each of the file system's OSSs
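FEFS follows the Lustre model, so stripe placement is the standard round-robin mapping of stripe-sized chunks onto OSTs. The sketch below uses made-up stripe parameters and an assumed per-OSS bandwidth, purely to illustrate the arithmetic behind the striping and aggregate-bandwidth claims:

```c
#include <stdio.h>

int main(void)
{
    /* Example parameters - not FEFS defaults. */
    const long long stripe_size  = 1048576; /* 1 MiB chunks */
    const int       stripe_count = 4;       /* file spread over 4 OSTs */
    const double    oss_bw_gbs   = 2.5;     /* assumed per-OSS bandwidth */

    /* Round-robin mapping: chunk k of the file lives on OST k mod stripe_count. */
    long long offset = 5 * stripe_size + 4096;
    int ost = (int)((offset / stripe_size) % stripe_count);
    printf("file offset %lld falls on OST index %d\n", offset, ost);

    /* Aggregate bandwidth grows with the number of OSSs serving the file. */
    printf("aggregate I/O bandwidth ~ %.1f GB/s over %d OSSs\n",
           oss_bw_gbs * stripe_count, stripe_count);
    return 0;
}
```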
Fujitsu Exabyte File System (FEFS)
The main enhancements of FEFS are:
Scalability
- File systems scale from terabytes to a maximum of 8 exabytes
- Superior price-performance for clusters from several dozen nodes up to those comprising a million servers
High performance
- Up to 10,000 storage servers, with the world's highest throughput of 1 TB/s
- Metadata management capable of creating several tens of thousands of files per second: 1-3 times the performance of Lustre
High reliability
- Built-in redundancy at all levels of the file system (RAID disks, InfiniBand multipath, multiple-server and multiple-storage-unit configurations) enables failover while jobs are running
QoS
- Fair-share features for allocating resources among users prevent individual users from monopolizing I/O processing resources
- Per-node priority settings enable I/O processing bandwidth control for each node
- Directory-level quotas allow efficient use of disk capacity by monitoring and managing file system usage per user
Specification of FEFS
Fujitsu expanded the system limits (figures in parentheses are the design limits):
System limits
- Max. file system size: 100 PB (8 EB)
- Max. file size: 1 PB (8 EB)
- Max. # of files: 32 G (8 x 10^18) files
- Max. OST size: 100 TB (1 PB)
- Max. # of stripes: 20 K
- Max. block size (backend file system): 512 KB
Node scalability
- Max. # of clients: 1 M clients
- Max. # of OSTs: 20 K
Usability
- QoS (fair share / best effort): yes
- Directory quota: yes
- InfiniBand multi-rail: yes
Product lineup: ETERNUS DX8700 S2
The ETERNUS DX8700 S2 offers a unique modular architecture: starting with minimum initial costs, storage can be flexibly expanded according to business needs.
Rack-mountable modules:
- 3.5" type: max. 480 drives, max. cache capacity 16 GB
- 2.5" type: max. 960 drives, max. cache capacity 96 GB
DX8700 S2 specifications:
- Maximum drive number: 3,072
- Maximum storage capacity (physical): 2,764.8 TB with 2.5" SAS 900 GB drives; 4,608.0 TB with 3.5" Nearline SAS 3 TB drives
- Maximum cache capacity: 768 GB
- Host interfaces (ports per device): FC 2/4/8 Gbit/s (up to 128 ports); iSCSI 10 Gbit/s and FCoE 10 Gbit/s (up to 64 ports)
PRIMERGY HPC Ecosystem Reference Summary
Building Blocks of the PRIMERGY HPC Ecosystem
- Platforms: PRIMERGY servers, ETERNUS storage, network
- Cluster operation: PCM Edition
- ISV and research partnerships: PreDiCT Initiative, Open Petascale Libraries
- Consulting and integration services: sizing and design; proof of concept; complete assembly, pre-installation, and quality assurance; certified system and production environment; integration into the customer environment; ready-to-operate delivery
Ready to Go
HPC Wales - A Grid of HPC Excellence
Initiative's motivation and background:
- Position Wales at the forefront of supercomputing
- Promote research, technology, and skills
- Improve economic development: creation of 400+ quality jobs and 10+ new businesses
Implementation and rollout:
- Distributed HPC clusters across 15 academic sites in Wales
- Sophisticated tier model with central hubs, tier 1 and tier 2 sites, and a portal for transparent, easy use of resources
- Final rollout of PRIMERGY x86 clusters in 2012
Solution design:
- PRIMERGY CX250 and BX922
- User-focused solution to access distributed HPC systems from a desktop browser
- Multiple components integrated into a consistent environment with single sign-on
- Data accessible across the entire infrastructure, with automated movement driven by workflow
- Collaborative sharing of information and resources
Performance & technology:
- Best-in-class technology combination: ~200 TFlops aggregated peak performance
- InfiniBand, 10/1 Gb Ethernet
- ETERNUS DX online SAN (home FS); parallel file system (up to 10 GB/s): DDN Lustre; backup & archiving: Symantec, Quantum
- Latest Intel processor technology; mix of Linux and Windows OS; complete tuned software stack
Fujitsu value proposition:
- Full service engagement; completely integrated design
- Comprehensive engagement model at all levels (research, development, business)
- Professional delivery management and governance; end-to-end program management
Summary: 35 years of leadership in HPC
- ... with the K computer, PRIMEHPC FX10, and PRIMERGY servers
- Achieving petascale HPC today, and ready for exascale tomorrow
- HPC is instrumental for computational sciences and for product design in industry
- Fujitsu provides HPC solutions for problems of every size, enabling the adoption and efficient use of HPC in science and industry
C. Antonucci, Presales Practice - Email: cristian.antonucci@ts.fujitsu.com
M. De Luca Coruzzi, Presales Practice - Email: maria.delucacoruzzi@ts.fujitsu.com