NERSC Status and Plans for the NERSC User Group Meeting, February 22, 2001
BILL KRAMER
DEPUTY DIVISION DIRECTOR
DEPARTMENT HEAD, HIGH PERFORMANCE COMPUTING DEPARTMENT
kramer@nersc.gov
510-486-7577
Agenda
• Update on NERSC activities
• IBM SP Phase 2 status and plans
• NERSC-4 plans
• NERSC-2 decommissioning
ACTIVITIES AND ACCOMPLISHMENTS
NERSC Facility Mission
To provide reliable, high-quality, state-of-the-art computing resources and client support in a timely manner, independent of client location, while wisely advancing the state of computational and computer science.
2001 GOALS
• PROVIDE RELIABLE AND TIMELY SERVICE
— Systems: Gross Availability, Scheduled Availability, MTBF/MTBI, MTTR (see the sketch below)
— Services: Responsiveness, Timeliness, Accuracy, Proactivity
• DEVELOP INNOVATIVE APPROACHES TO HELP THE CLIENT COMMUNITY USE NERSC SYSTEMS EFFECTIVELY
• DEVELOP AND IMPLEMENT WAYS TO TRANSFER RESEARCH PRODUCTS AND KNOWLEDGE INTO PRODUCTION SYSTEMS AT NERSC AND ELSEWHERE
• NEVER BE A BOTTLENECK TO MOVING NEW TECHNOLOGY INTO SERVICE
• ENSURE ALL NEW TECHNOLOGY AND CHANGES IMPROVE (OR AT LEAST DO NOT DIMINISH) SERVICE TO OUR CLIENTS
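For reference, the reliability metrics named above are commonly related as follows; these are standard textbook definitions assumed for orientation, not formulas taken from the slides:

\[
\text{Gross availability} = \frac{\text{time available to users}}{\text{total wall-clock time}},\qquad
\text{Scheduled availability} = \frac{\text{time available to users}}{\text{total wall-clock time} - \text{scheduled outages}}
\]
\[
\text{MTBI} \approx \frac{\text{time in period}}{\text{number of interrupts}},\qquad
\text{MTTR} \approx \frac{\text{total repair time}}{\text{number of interrupts}}
\]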
GOALS (CONT'D)
• NERSC AND LBNL WILL BE A LEADER IN LARGE SCALE SYSTEMS MANAGEMENT & SERVICES
• EXPORT KNOWLEDGE, EXPERIENCE, AND TECHNOLOGY DEVELOPED AT NERSC, PARTICULARLY TO AND WITHIN NERSC CLIENT SITES
• NERSC WILL BE ABLE TO THRIVE AND IMPROVE IN AN ENVIRONMENT WHERE CHANGE IS THE NORM
• IMPROVE THE EFFECTIVENESS OF NERSC STAFF BY IMPROVING INFRASTRUCTURE, CARING FOR STAFF, AND ENCOURAGING PROFESSIONALISM AND PROFESSIONAL IMPROVEMENT
[Diagram: goal themes (timely innovative assistance, reliable service, technology transfer, consistent service & system architecture, excellent staff, new technology, staff effectiveness, research flow, change, wise integration, large-scale leadership) feeding the mission and success for clients and facility]
Major Accomplishments Since Last Meeting (June 2000)
• IBM SP placed into full service April 4, 2000 (more later)
— Augmented the allocations by 1M hours in FY 2000
— Contributed to 11M PE hours in FY 2000, more than doubling the FY 2000 allocation
— SP is fully utilized
• Moved entire facility to Oakland (more later)
• Completed the second PAC allocation process with lessons learned from the first year
Activities and Accomplishments
• Improved Mass Storage System
— Upgraded HPSS
— New versions of HSI
— Implementing Gigabit Ethernet
— Two STK robots added
— Replaced 3490 tape drives with 9840 drives (higher density and higher speed)
• Formed Network and Security Group
• Succeeded in external reviews
— Policy Board
— SCAC
Activities and Accomplishments
• Implemented new accounting system, NIM
— Old system was:
• Difficult to maintain
• Difficult to integrate with new systems
• Limited by 32 bits
• Not Y2K compliant
— New system:
• Web focused
• Uses available database software
• Works for any type of system
• Thrived in a state of increased security
— Open model
— Audits, tests
2000 Activities and Accomplishments
• NERSC firmly established as a leader in system evaluation
— Effective System Performance (ESP) recognized as a major step in system evaluation and is influencing a number of sites and vendors (see the sketch below)
— Sustained System Performance measures
— Initiated a formal benchmarking effort, the NERSC Application Performance Simulation Suite (NAPs), which may become the next widely recognized parallel evaluation suite
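For orientation, the ESP test runs a fixed mix of parallel jobs through the scheduler and reports an efficiency roughly of the following form; this is a paraphrase of the published ESP idea, not an exact restatement of the NERSC definition:

\[
E_{\mathrm{ESP}} \;\approx\; \frac{\sum_i p_i\, t_i}{P \times T}
\]

where \(p_i\) and \(t_i\) are the processor count and runtime of job \(i\) in the fixed workload, \(P\) is the number of processors in the system, and \(T\) is the observed time to complete the entire workload.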
Activities and Accomplishments
• Formed the NERSC Cluster Team to investigate the impact of commodity SMP clusters on high-performance parallel computing and to ensure the most effective use of division resources related to cluster computing
— Coordinates all NERSC Division cluster computing activities (research, development, advanced prototypes, pre-production, and user support)
— Initiated a formal procurement for a mid-range cluster
• In consultation with DOE, decided not to award it as part of NERSC program activities
NERSC Division (Rev: 02/01/01)
HORST SIMON, Division Director
DAVID BAILEY, Chief Technologist
WILLIAM KRAMER, Deputy Director
WILLIAM FORTNEY, Division Administrator & Financial Manager
• High Performance Computing Department: WILLIAM KRAMER, Department Head
— Advanced Systems (Tammy Welcome)
— Computational Systems (Jim Craw)
— Computer Operations & Networking Support (William Harris)
— Future Infrastructure, Networking & Security (Howard Walter)
— HENP Computing (David Quarrie)
— Mass Storage (Nancy Meyer)
— User Services (Francesca Verdier)
• High Performance Computing Research Department: ROBERT LUCAS, Department Head
— Applied Numerical Algorithms (Phil Colella)
— Imaging & Collaborative Computing (Bahram Parvin)
— Center for Bioinformatics & Computational Genomics (Manfred Zorn)
— Center for Computational Science & Engr. (John Bell)
— Scientific Data Mgmt Research (Arie Shoshani)
— Scientific Computing (Esmond Ng)
— Future Technologies (Robert Lucas, acting)
— Visualization (Robert Lucas, acting)
• Distributed Systems Department: WILLIAM JOHNSTON, Department Head; DEB AGARWAL, Deputy
— Collaboratories (Deb Agarwal)
— Data Intensive Dist. Computing (Brian Tierney (CERN), William Johnston, acting)
— Distributed Security Research (Mary Thompson)
— Networking (William Johnston, acting)
HIGH PERFORMANCE COMPUTING DEPARTMENT (Rev: 02/01/01)
WILLIAM KRAMER, Department Head
• Advanced Systems (TAMMY WELCOME): Greg Butler, Thomas Davis, Adrian Wong
• Computational Systems (JAMES CRAW): Terrence Brewer (C), Scott Burrow (I), Tina Butler, Shane Canon, Nicholas Cardo, Stephan Chan, William Contento (C), Bryan Hardy (C), Stephen Luzmoor (C), Ron Mertes (I), Kenneth Okikawa, David Paul, Robert Thurman (C), Cary Whitney
• Computer Operations & Networking Support (WILLIAM HARRIS): Clayton Bagwell Jr., Elizabeth Bautista, Richard Beard, Del Black, Aaron Garrett, Mark Heer, Russell Huie, Ian Kaufman, Yulok Lam, Steven Lowe, Anita Newkirk, Robert Neylan, Alex Ubungen
• Future Infrastructure, Networking & Security (HOWARD WALTER): Eli Dart, Brent Draney, Stephen Lau
• HENP Computing (DAVID QUARRIE*, CRAIG TULL, Deputy): Paolo Calafiura, Christopher Day, Igor Gaponenko, Charles Leggett (P), Massimo Marino, Akbar Mokhtarani, Simon Patton
• Mass Storage (NANCY MEYER): Harvard Holmes, Wayne Hurlbert, Nancy Johnston, Rick Un (V)
• User Services (FRANCESCA VERDIER): Mikhail Avrekh, Harsh Anand, Majdi Baddourah, Jonathan Carter, Tom DeBoni, Jed Donnelley, Therese Enright, Richard Gerber, Frank Hale, John McCarthy, R. K. Owen, Iwona Sakrejda, David Skinner, Michael Stewart (C), David Turner, Karen Zukor
Key: (C) Cray; (FB) Faculty UC Berkeley; (FD) Faculty UC Davis; (G) Graduate Student Research Assistant; (I) IBM; (M) Mathematical Sciences Research Institute; (MS) Masters Student; (P) Postdoctoral Researcher; (SA) Student Assistant; (V) Visitor; * On leave to CERN
HIGH PERFORMANCE COMPUTING RESEARCH DEPARTMENT (Rev: 02/01/01)
ROBERT LUCAS, Department Head
APPLIED NUMERICAL ALGORITHMS PHILLIP COLELLA Susan Graham (FB) Daniel Graves IMAGING & COLLABORATIVE COMPUTING BAHRAM PARVIN Anton Kast Daniel Martin (P) Peter McCorquodale (P) Greg Miller (FD) Brian Van Straalen Hui H Chan (MS) Ge Cong (V) Donn Davy Inna Dubchak Michael Lijewski Charles Rendleman Carl Anderson Mary Anderson Junmin Gu Jinbaek Kim (G) Andreas Mueller Vijaya Natarajan Frank Olken Ekow Otoo Elaheh Pourabbas (V) Arie Segev (FB) FUTURE TECHNOLOGIES M. Shinkarsky (SA) Alexander Sim John Wu VISUALIZATION ROBERT LUCAS (acting) David Culler (FB) James Demmel (FB) Lin-Wang Wang Michael Wehner (V) Chao Yang Woo-Sun Yang (P) ARIE SHOSHANI JOHN BELL William Crutchfield Marcus Day Jodi Lamoureux (P) Sherry Li Osni Marques Peter Nugent David Raczkowski (P) SCIENTIFIC DATA MANAGEMENT CENTER FOR COMPUTATIONAL SCIENCE & ENGINEERING Ann Almgren Vincent Beckner Qing Yang ESMOND NG Julian Borrill Xiaofeng He (V) Andrew Canning Yun He Chris Ding Parry Husbands (P) Tony Drummond Niels Jensen (FD) Ricardo da Silva (V) Plamen Koev (G) * Sylvia Spengler Sonia Sachs John Taylor SCIENTIFIC COMPUTING CENTER FOR BIOINFORMATICS & COMPUTATIONAL GENOMICS MANFRED ZORN Gerald Fontenay Masoud Nikravesh (V) ROBERT LUCAS (acting) Paul Hargrove Leonid Oliker Eric Roman Erich Strohmaier Michael Welcome Katherine Yelick (FB) Edward Bethel James Chen (G) Bernd Hamann (FD) James Hoffman (M) David Hoffman (M) Oliver Kreylos (G) Terry Ligocki John Shalf Soon Tee Teoh (G) Gunther Weber (G)
Key: (FB) Faculty UC Berkeley; (FD) Faculty UC Davis; (G) Graduate Student Research Assistant; (M) Mathematical Sciences Research Institute; (MS) Masters Student
FY 00 MPP Users/Usage by Discipline
FY 00 PVP Users/Usage by Discipline
NERSC FY 00 MPP Usage by Site
NERSC FY 00 PVP Usage by Site
FY 00 MPP Users/Usage by Institution Type
FY 00 PVP Users/Usage by Institution Type
NERSC System Architecture
[Diagram: FDDI/Ethernet 10/100/Gigabit networking and HIPPI connecting ESnet, HPSS (IBM and STK robots), the Cray T3E-900 (644/256), the Cray SV1s, the IBM SP NERSC-3 (604 processors / 304 GB memory), the IBM SP NERSC-3 Phase 2a (2532 processors / 1824 GB memory), PDSF, DPSS, the Millennium and LBNL clusters, a remote visualization server, a symbolic manipulation server, a research cluster, and the visualization lab]
Current Systems
• IBM SP RS/6000
— NERSC-3/Phase 2a (Seaborg): a 2532-processor system using 16-CPU SMP nodes with the "Colony" Double/Double switch. Peak performance is ~3.8 Tflop/s, with 12 GB of memory per computational node and 20 TB of usable, globally accessible parallel disk.
— NERSC-3/Phase 1 (Gseaborg): a 608-processor IBM RS/6000 SP system with a peak performance of 410 gigaflop/s, 256 gigabytes of memory, and 10 terabytes of disk storage. NERSC production machine.
• Cray T3E: NERSC-2 (Mcurie), a 696-processor MPP system with a peak speed of 575 gigaflop/s, 256 megabytes of memory per processor, 1.5 terabytes of disk storage, and a peak CPU performance of 900 Mflop/s per processor. NERSC production machine.
• Cray Vector Systems: NERSC-2, three Cray SV1 machines with a total of 96 vector processors in the cluster, 4 gigawords of memory, and a peak performance of 83 gigaflop/s. The SV1 Killeen is used for interactive computing; the remaining two SV1 machines (Bhaskara and Franklin) are batch-only machines.
• HPSS (High Performance Storage System): a modern, flexible, performance-oriented mass storage system designed and developed by a consortium of government and commercial entities. It is deployed at a number of sites and centers and is used at NERSC for archival storage.
• PDSF (Parallel Distributed Systems Facility): a networked distributed computing environment (a cluster of workstations) used by six large-scale high energy and nuclear physics investigations for detector simulation, data analysis, and software development. The PDSF includes 281 processors in compute nodes and eight disk vaults with file servers providing 7.5 TB of storage.
• PC Cluster Projects: a 36-node PC Cluster Project testbed available to NERSC users for trial use, and a 12-node Alpha "Babel" cluster used for Modular Virtual Interface Architecture (M-VIA) development and Berkeley Lab collaborations. Associated with the LBNL 160-CPU (80-node) cluster system for mid-range computing, with a Myrinet 2000 interconnect, 1 GB of memory per node, and 1 TB of shared, globally accessible disk.
Major Systems
• MPP
— IBM SP, Phase 2a
• 158 16-way SMP nodes
• 2144 parallel application CPUs / 12 GB per node
• 20 TB shared GPFS
• 11,712 GB swap space, local to nodes
• ~8.6 TB of temporary scratch space
• 7.7 TB of permanent home space
• 7-25 GB home quotas
• ~240 Mbps aggregate I/O measured from user nodes (6 HiPPI, 2 GE, 1 ATM)
— T3E-900 LC with 696 PEs
• UNICOS/mk
• 644 application PEs / 256 MB per PE
• 383 GB of swap space, 582 GB checkpoint file system
• 1.5 TB /usr/tmp temporary scratch space, 1 TB permanent home space
• Serial
— PVP: three J90/SV1 systems running UNICOS
• 4-20 GB home quotas
• 64 CPUs total / 8 GB of memory per system (24 GB total)
• 1.0 TB local /usr/tmp
• DMF managed
• ~35 MBps aggregate I/O measured from user nodes (2 HiPPI, 2 FDDI)
• Storage
— HPSS
• 8 STK tape libraries, 3490 tape drives
• 7.4 TB of cache disk
• 20 HiPPI interconnects, 12 FDDI connections, 2 GE connections
• Total capacity ~960 TB; ~160 TB in use
• HPSS Probe
— PDSF, Linux cluster
• 281 IA-32 CPUs
• 3 Linux and 3 Solaris file servers
• DPSS integration
• 7.5 TB aggregate disk space
• 4 striped fast Ethernet connections to HPSS
— LBNL Mid-Range Cluster
• 160 IA-32 CPUs
• Linux with enhancements
• 1 TB aggregate disk space
• Myrinet 2000 interconnect
• GigaEthernet connections to HPSS
T3E Utilization
[Chart: gross utilization reaching 95%, improving about 4.4% per month; annotations mark allocation starvation periods, full scheduling functionality, checkpointing, the start of capability jobs, and the merging of the systems]
SP Utilization
• In the 80-85% range, which is above original expectations for the first year
• More variation than the T3E
T3E Job Size
More than 70% of the jobs are "large"
SP Job Size
Full-size jobs account for more than 10% of usage; ~60% of the jobs are larger than ¼ of the maximum size
Storage: HPSS
NERSC Network Architecture
CONTINUE NETWORK IMPROVEMENTS
LBNL Oakland Scientific Facility
Oakland Facility
• 20,000 sf computer room; 7,000 sf office space
— 16,000 sf of computer space built out
— NERSC occupying 12,000 sf
• Ten-year lease with three five-year options
• $10.5M computer room construction costs
• Option for an additional 20,000+ sf computer room
LBNL Oakland Scientific Facility
Move accomplished between Oct 26 and Nov 4
Systems moved: SP, T3E, SV1s, HPSS, PDSF, other systems
Scheduled: 10/27 9 am; 11/3 10 am; 11/6 10 am; 11/3 10 am
Actual: no outage; 11/3 3 am; 11/2 3 pm; 10/31 9:30 am; 11/2 11 am; 11/1 8 am
Computer Room Layout
Up to 20,000 sf of computer space
Direct ESnet node at OC-12
2000 Activities and Accomplishments
• PDSF upgrade in conjunction with the building move
2000 Activities and Accomplishments
• netCDF parallel support developed by NERSC staff for the Cray T3E
— A similar effort is being planned to port netCDF to the IBM SP platform
• Communication for clusters: M-VIA and MVICH
— M-VIA and MVICH are VIA-based software for low-latency, high-bandwidth, inter-process communication
— M-VIA is a modular implementation of the VIA standard for Linux
— MVICH is an MPICH-based implementation of MPI for VIA (a small illustrative MPI sketch follows)
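As a rough illustration of the kind of message-passing code MVICH is meant to carry, below is a minimal MPI ping-pong sketch in C. It uses only standard MPI calls; the mpicc/mpirun command names and the message size are assumptions for the example, not details taken from the slides.

```c
/*
 * Minimal MPI ping-pong sketch (standard MPI calls only).
 * Hypothetical build/run with an MPICH-style wrapper such as the one MVICH provides:
 *   mpicc -o pingpong pingpong.c
 *   mpirun -np 2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int nbytes = 1 << 20;   /* 1 MB message, an arbitrary example size */
    const int iters  = 100;       /* number of round trips to time */
    int rank, size, i;
    char *buf;
    double t0, t1, per_roundtrip, mbytes_per_sec;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    buf = (char *)malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {          /* rank 0 sends first, then waits for the echo */
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else {                  /* rank 1 echoes each message back */
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        per_roundtrip  = (t1 - t0) / iters;                    /* seconds per round trip */
        mbytes_per_sec = 2.0 * nbytes / per_roundtrip / 1.0e6; /* both directions counted */
        printf("round trip: %.3f ms, bandwidth: %.1f MB/s\n",
               per_roundtrip * 1.0e3, mbytes_per_sec);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```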
FY 2000 User Survey Results
• Areas of most importance to users
— Available hardware (cycles)
— Overall running of the center
— Network access to NERSC
— Allocations process
• Highest satisfaction (score > 6.4)
— Problem reporting/consulting services (timely response, quality, follow-up)
— Training
— Uptime (SP and T3E)
— Fortran (T3E and PVP)
• Lowest satisfaction (score < 4.5)
— PVP batch wait time
— T3E batch wait time
• Largest increases in satisfaction from FY 1999
— PVP cluster (we introduced interactive SV1 services)
— HPSS performance
— Hardware management and configuration (we monitor and improve this continuously)
— HPCF website (all areas are continuously improved, with a special focus on topics highlighted as needing improvement in the surveys)
— T3E Fortran compilers
Client Comments from Survey
"Very responsive consulting staff that makes the user feel that his problem, and its solution, is important to NERSC"
"Provide excellent computing resources with high reliability and ease of use."
"The announcement managing and web-support is very professional."
"Manages large simulations and data. The oodles of scratch space on mcurie and gseaborg help me process large amounts of data in one go."
"NERSC has been the most stable supercomputer center in the country particularly with the migration from the T3E to the IBM SP."
"Makes supercomputing easy."
NERSC-3 Phase 2a/b
Result: NERSC-3 Phase 2a
• System built and configured
• Started factory tests 12/13
• Expect delivery 1/5
• Undergoing acceptance testing
• General production April 2001
• What is different that needs testing:
— New processors, new nodes, new memory system
— New switch fabric
— New operating system
— New parallel file system software
IBM Configuration (Phase 1 vs. Phase 2a/b)
• Compute nodes: 256 vs. 134*
• Processors: 256 x 2 = 512 vs. 134 x 16 = 2144*
• Networking nodes: 8 vs. 2
• Interactive nodes: 8 vs. 2
• GPFS nodes: 16 vs. 16
• Service nodes: 16 vs. 4
• Total nodes (CPUs): 304 (604) vs. 158 (2528)
• Total memory (compute nodes): 256 GB vs. 1.6 TB
• Total global disk (user accessible): 10 TB vs. 20 TB
• Peak (compute nodes): 409.6 GF vs. 3.2 TF*
• Peak (all nodes): 486.4 GF vs. 3.8 TF*
• Sustained system performance: 33 GF vs. 235+ GF / 280+ GF
• Production dates: April 1999 vs. April 2001 / Oct 2001
* A minimum; may increase due to the sustained system performance measure
What has been completed
• 6 nodes added to the configuration
• Memory per node increased to 12 GB for 140 compute nodes
• "Loan" of full memory for Phase 2
• System installed and braced
• Switch adapters and memory added to the system
• System configuration
• Security audit
• System testing for many functions
• Benchmarks being run and problems being diagnosed
Current Issues
• Failures of two benchmarks need to be resolved
— Best case: they indicate broken hardware, likely in the switch adapters
— Worst case: they indicate fundamental design and load issues
• Performance variation
• Loading and switch contention
• Remaining tests
— Throughput, ESP
— Full-system tests
— I/O
— Functionality
General Schedule
• Complete testing: TBD, based on problem correction
• Production configuration set up
— 3rd-party software, local tools, queues, etc.
• Availability test
— Add early users ~10 days after successful testing is complete
— Gradually add other users; complete ~40 days after successful testing
• Shut down Phase 1 ~10 days after the system is open to all users
— Move 10 TB of disk space; this configuration change will require Phase 2 downtime
• Upgrade to Phase 2b in late summer or early fall
NERSC-3 Sustained System Performance Projections
• Estimates the amount of scientific computation that can really be delivered
— Depends on delivery of Phase 2b functionality
— The higher the last number, the better, since the system remains at NERSC for 4 more years
[Chart: SSP projection over time, annotated with test/configuration and acceptance periods; software lags hardware]
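For orientation, a sustained-system-performance style metric is built from measured application rates rather than peak numbers; a generic form (an assumption for illustration, not necessarily the exact NERSC-3 contract definition) is:

\[
\mathrm{SSP} \;\approx\; N_{\mathrm{CPU}} \times \overline{r},\qquad
\overline{r} = \text{mean per-CPU rate over the application benchmark suite}
\]

On a scale of this kind, the configuration table above moves from 33 GF for Phase 1 to a committed 235+ GF (Phase 2a) and 280+ GF (Phase 2b).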
NERSC Computational Power vs. Moore's Law
NERSC 4
NERSC-4
• NERSC-4 IS ALREADY ON OUR MINDS
— PLAN IS FOR FY 2003 INSTALLATION
— PROCUREMENT PLANS BEING FORMULATED
— EXPERIMENTATION AND EVALUATION OF VENDORS IS STARTING
• ESP, ARCHITECTURES, BRIEFINGS
• CLUSTER EVALUATION EFFORTS
— USER REQUIREMENTS DOCUMENT (GREENBOOK) IMPORTANT
How Big Can NERSC-4 Be?
• Assume a delivery in FY 2003
• Assume no other space is used in Oakland until NERSC-4
• Assume cost is not an issue (at least for now)
• Assume technology still progresses
— ASCI will have a 30 Tflop/s system running for over 2 years
How Close Is 100 Tflop/s?
• Available gross space in Oakland is 3,000 sf without major changes
— Assume it is 70% usable; the rest goes to air handlers, columns, etc.
• That gives 3,000 sf of space for racks
• IBM system used for estimates; other vendors are similar
• Each processor is 1.5 GHz, yielding 6 Gflop/s
• An SMP node is made up of 32 processors
• 2 nodes in a frame
— 64 processors in a frame = 384 Gflop/s per frame
• Frames are 32-36" wide and 48" deep
— Service clearance of 3 feet in front and back (which can overlap)
— 3 ft by 7 ft is 21 sf per frame
Practical System Peak
• Rack distribution
— 60% of racks are for CPUs
• 90% are user/computation nodes
• 10% are system support nodes
— 20% of racks are for the switch fabric
— 20% of racks are for disks
• 5,400 sf / 21 sf per frame = 257 frames
• 277 nodes are directly used for computation
— 8,870 CPUs for computation
— System total is 9,856 CPUs (308 nodes)
• Practical system peak is 53 Tflop/s
— 0.192 Tflop/s per node x 277 nodes
— Some other places would claim 60 Tflop/s
(The arithmetic is collected below.)
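Collecting the assumptions from these two slides in one place, and using only the numbers quoted on them, the sizing works out roughly as:

\[
\frac{5{,}400\ \text{sf}}{21\ \text{sf/frame}} \approx 257\ \text{frames},\qquad
0.60 \times 257 \approx 154\ \text{CPU frames} \;\Rightarrow\; 154 \times 2 = 308\ \text{nodes}
\]
\[
0.90 \times 308 \approx 277\ \text{compute nodes},\qquad
277 \times 32 = 8{,}864\ \text{CPUs (quoted as 8,870)},\qquad
277 \times 0.192\ \text{Tflop/s} \approx 53\ \text{Tflop/s}
\]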
How Much Use Will It Be?
• Sustained vs. peak performance
— Class A codes on the T3E sampled at 11% of peak
— LSMS
• 44% of peak on the T3E
• So far 60% of peak on Phase 2a (maybe more)
• Efficiency
— The T3E runs at a 30-day average of about 95%
— The SP runs at a 30-day average of about 80+%
• Scheduling functionality still planned
(A rough way to combine these numbers is sketched below.)
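As an illustration only (not a NERSC projection), multiplying the practical peak by a sustained fraction and an assumed utilization of about 0.85 (between the quoted SP and T3E figures) gives a feel for delivered performance:

\[
53\ \text{Tflop/s} \times 0.11 \times 0.85 \approx 5\ \text{Tflop/s delivered for typical codes},\qquad
53\ \text{Tflop/s} \times 0.44 \times 0.85 \approx 20\ \text{Tflop/s for a highly tuned code such as LSMS}
\]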
How Much Will It Cost?
• Current cost for a balanced system is about $7.8M per Tflop/s
• Aggressive estimate
— Cost should drop by a factor of 4
— $1-2M per Tflop/s
— Many assumptions
• Conservative estimate
— $3.5M per Tflop/s
• Added cost to install, operate, and balance the facility is 20%
• The full cost is $140M to $250M
• Too bad
(The arithmetic behind the range is sketched below.)
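Applying those per-Tflop/s figures to the roughly 53-60 Tflop/s system sized earlier (the 60 Tflop/s figure reproduces the quoted range most closely) gives:

\[
60\ \text{Tflop/s} \times \$2\text{M/Tflop/s} \times 1.2 \approx \$144\text{M},\qquad
60\ \text{Tflop/s} \times \$3.5\text{M/Tflop/s} \times 1.2 \approx \$252\text{M},
\]

which is roughly the $140M to $250M range quoted on the slide.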
The Real Strategy
• Traditional strategy within existing NERSC Program funding
— Acquire new computational capability every three years, a 3x to 4x capability increase over existing systems (roughly Moore's-law pace; see below)
• Early, commercial, balanced systems with a focus on
— A stable programming environment
— Mature system management tools
— A good sustained-to-peak performance ratio
• Total value of $25M-$30M
— About $9-10M/yr using lease-to-own
• Have two generations in service at a time, e.g. the T3E and the IBM SP
• Phased introduction if technology indicates
• Balance other system architecture components
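For context (an observation, not a figure from the slides), a 3x to 4x increase every three years corresponds to a doubling time of

\[
\frac{36\ \text{months}}{\log_2 3} \approx 22.7\ \text{months}
\quad\text{to}\quad
\frac{36\ \text{months}}{\log_2 4} = 18\ \text{months},
\]

which is consistent with the "NERSC Computational Power vs. Moore's Law" chart referenced earlier.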
Necessary Steps
1) Accumulate and evaluate benchmark candidates
2) Create a draft benchmark suite and run it on several systems
3) Create the draft benchmark rules
4) Set basic goals and options for the procurement and then create a draft RFP document
5) Conduct market surveys (vendor briefings, intelligence gathering, etc.). We do this after the first items so we can be looking for the right information and can tell the vendors what to expect. It is often the case that we have to "market" to the vendors on why they should be bidding, since it costs them a lot.
6) Evaluate alternatives and options for the RFP and tests. This is also where we do a technology schedule (when what is available) and estimate prices, price/performance, etc.
7) Refine the RFP and benchmark rules for final release
8) Go through reviews
9) Release the RFP
10) Answer questions from vendors
11) Get responses and evaluate them
12) Determine best value, present results, and get concurrence
13) Prepare to negotiate
14) Negotiate
15) Put the contract package together
16) Get concurrence and approval
17) Vendor builds the system
18) Factory test
19) Vendor delivers it
20) Acceptance testing (and resolving issues found in testing); first payment is 2 months after acceptance
21) Preparation for production
22) Production
Rough Schedule
Goal: NERSC-4 installation in the first half of CY 2003
• Vendor responses (#11) in early CY 2002
• Award in late summer/fall of CY 2002
— This is necessary in order to assure delivery and acceptance (#22) in FY 2003
• A lot of work and long lead times (for example, we have to account for review and approval times, 90 days for vendors to craft responses, time to negotiate, ...)
• NERSC staff kick-off meeting the first week of March
— Some planning work has already been done
NERSC-2 Decommissioning
• RETIRING NERSC-2 IS ALREADY ON OUR MINDS
— IF POSSIBLE WE WOULD LIKE TO KEEP NERSC-2 IN SERVICE UNTIL 6 MONTHS BEFORE THE NERSC-4 INSTALLATION
• Therefore, expect retirement at the end of FY 2002
— It is "risky" to assume there will be a viable vector replacement
• A team is working to determine possible paths for traditional vector users
— Report due in early summer
SUMMARY
• NERSC does an exceptionally effective job delivering services to DOE and other researchers
• NERSC has made significant upgrades this year that position it well for future growth and continued excellence
• NERSC has a well-mapped strategy for the next several years