DOE Perspective on Cyberinfrastructure - LBNL
Gary Jung
Manager, High Performance Computing Services
Lawrence Berkeley National Laboratory
Educause CCI Working Group Meeting
November 5, 2009
Midrange Computing
• DOE ASCR hosted a workshop in Oct 2008 to assess the role of midrange computing in the Office of Science; the workshop found that this class of computation plays an increasingly important role in enabling Office of Science research.
• Although it is not part of ASCR's mission, midrange computing and the associated data management play a vital and growing role in advancing science in disciplines where capacity is as important as capability.
• Demand for midrange computing services is:
  o growing rapidly at many sites (>30% annual growth at LBNL)
  o the direct expression of a broad scientific need
• Midrange computing is a necessary adjunct to leadership-class facilities
Berkeley Lab Computing
• Gap between desktop computing and the national centers
• Midrange Computing Working Group (2001)
• Cluster support program started in 2002
  o Services for PI-owned clusters include: pre-purchase consulting, development of specs and RFPs, facilities planning, installation and configuration, ongoing cluster support, user services consulting, cybersecurity, and computer room colocation
• Currently 32 clusters in production, with over 1,400 nodes and 6,500 processor cores
• Funding: the institution provides support for infrastructure costs and technical development; researchers pay for the cluster and the incremental cost of support
Cluster Support Phase II: Perceus Metacluster
• All clusters interconnected into a shared cluster infrastructure
  o Permits sharing of resources and storage
    ~ Global home file system
  o One 'super master' node used to boot nodes across all clusters
    ~ Multiple system images supported
  o One master job scheduler submitting to all clusters
  o Simplifies provisioning of new systems and ongoing support
• Metacluster model made possible by Perceus software
  o Successor to Warewulf (http://www.perceus.org)
  o Can run jobs across clusters, recapturing stranded capacity (see the sketch below)
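Recapturing stranded capacity is, at heart, a routing decision: because one master scheduler sees every cluster, a job can be sent to whichever cluster currently has idle cores rather than waiting on its owner's machine. The deck does not describe the actual scheduler logic or products used, so the following is only a minimal Python sketch of that idea; the cluster names and core counts are hypothetical.

```python
# Minimal sketch (not LBNL's actual scheduler): route a job to the cluster
# with the most idle cores, so capacity stranded on one PI's cluster can be
# used by jobs from another group. Names and numbers are hypothetical.
clusters = {
    "geophysics": {"total_cores": 256, "busy_cores": 256},
    "chemistry":  {"total_cores": 512, "busy_cores": 300},
    "cosmology":  {"total_cores": 128, "busy_cores": 16},
}

def idle_cores(stats):
    return stats["total_cores"] - stats["busy_cores"]

def route_job(cores_requested):
    """Pick the cluster with the most idle cores that can still fit the job."""
    candidates = {name: idle_cores(s) for name, s in clusters.items()
                  if idle_cores(s) >= cores_requested}
    if not candidates:
        return None  # job waits in the metacluster-wide queue
    return max(candidates, key=candidates.get)

if __name__ == "__main__":
    print(route_job(64))  # -> 'chemistry' (212 idle cores vs 112 on 'cosmology')
```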
Laboratory-Wide Cluster - Drivers
"Computation lets us understand everything we do." – LBNL Acting Lab Director Paul Alivisatos
• 38% of scientists depend on cluster computing for research
• 69% of scientists are interested in cycles on a Lab-owned cluster
  o Early-career scientists are twice as likely to be 'very interested' as their later-career peers
• Why do scientists at LBNL need midrange computing resources?
  o 'On-ramp' activities in preparation for running at supercomputing centers (development, debugging, benchmarking, optimization)
  o Scientific inquiry not connected with 'on-ramp' activities
Laboratory-Wide Cluster "Lawrencium"
• Overhead-funded program
  o Capital equipment dollars shifted from business computing
  o Overhead-funded staffing - 2 FTE
• Production in Fall 2008
• General-purpose Linux cluster suitable for a wide range of applications
  o 198 nodes, 1,584 cores, DDR InfiniBand interconnect
  o 40 TB NFS home directory storage; 100 TB Lustre parallel scratch
  o Commercial job scheduler and banking system (sketched below)
  o #500 on the Nov 2008 Top 500
• Open to all LBNL PIs and collaborators on their projects
• Users are required to complete a survey when applying for accounts and later provide feedback on science results
• No user allocations at this time; this approach has been successful to date
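The slide pairs a commercial job scheduler with a "banking" system, i.e. accounting of the core-hours each project consumes. The product is not named here, so the snippet below is only a hypothetical Python sketch of what core-hour banking does; the project names and numbers are made up, and since the slide notes there are no user allocations at this time, usage is recorded rather than enforced.

```python
# Hypothetical sketch of core-hour "banking": charge each completed job's
# core-hours against its project. Per the slide, usage is tracked but not
# capped by allocations. All names and values are illustrative only.
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    core_hours_used: float = 0.0

@dataclass
class Job:
    project: str
    cores: int
    wall_hours: float

bank: dict[str, Project] = {}

def charge(job: Job) -> float:
    """Debit core-hours = cores * wall-clock hours to the job's project."""
    proj = bank.setdefault(job.project, Project(job.project))
    cost = job.cores * job.wall_hours
    proj.core_hours_used += cost
    return cost

charge(Job(project="materials_sim", cores=64, wall_hours=12.0))   # 768 core-hours
charge(Job(project="materials_sim", cores=128, wall_hours=2.5))   # 320 core-hours
print(bank["materials_sim"].core_hours_used)                      # 1088.0
```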
Networking - LBLNet
• Peer at 10 GbE with ESnet
• 10 GbE at the core; moving to 10 GbE to the buildings
• Goal is sustained high-speed data flows with cybersecurity
• Network-based IDS approach - traffic is innocent until proven guilty
  o Reactive firewall (see the sketch below)
  o Does not impede data flow; no stateful firewall
  o Bro cluster allows us to scale our IDS to 10 GbE
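"Innocent until proven guilty" means traffic flows unimpeded by default; only when the IDS (here, a Bro cluster) flags a source is a block pushed out, which is why high-speed data flows are not slowed by a stateful firewall. The deck does not show the actual Bro policies or router interface, so this is only a hypothetical Python sketch of the reactive-blocking idea, not LBNL's implementation.

```python
# Hypothetical sketch of a reactive firewall driven by IDS alerts: traffic is
# not filtered up front; an address is dropped only after the IDS flags it.
# Illustration of the concept only, not the actual Bro/router integration.
import time

BLOCK_SECONDS = 3600            # how long a flagged address stays blocked (made-up value)
blocked: dict[str, float] = {}  # source IP -> time the block expires

def ids_alert(src_ip: str, reason: str) -> None:
    """Called when the IDS decides a source is 'guilty'; push a temporary block."""
    blocked[src_ip] = time.time() + BLOCK_SECONDS
    print(f"blocking {src_ip} for {BLOCK_SECONDS}s: {reason}")

def allow(src_ip: str) -> bool:
    """Default-allow: only previously flagged, unexpired sources are dropped."""
    expires = blocked.get(src_ip)
    if expires is None or time.time() >= expires:
        blocked.pop(src_ip, None)
        return True
    return False

ids_alert("198.51.100.7", "ssh brute force")
print(allow("198.51.100.7"))   # False while the block is active
print(allow("192.0.2.10"))     # True: unflagged traffic is never impeded
```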
Communications and Governance
• General announcements at IT Council
• Steering committees used for scientific computing
  o Small group of stakeholders, technical experts, and decision makers
  o Helps to validate and communicate decisions
  o Accountability
Challenges
• Funding (past)
  o Difficult for IT to shift funding from other areas of computing to support for science
  o Recharge can constrain adoption; full cost recovery definitely will
• New technology (ongoing)
• Facilities (current)
  o Computer room is approaching capacity despite upgrades
    ~ Environmental monitoring
    ~ Plenum in ceiling converted to a hot-air return
    ~ Tricks to boost underfloor pressure
    ~ Water-cooled doors
  o Underway
    ~ DCiE measurement in process (see the sketch below)
    ~ Tower and heat exchanger replacement
    ~ Data center container investigation
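DCiE (Data Center infrastructure Efficiency) is the ratio of IT equipment power to total facility power, usually quoted as a percentage (it is the reciprocal of PUE); the measurement mentioned on the slide produces exactly this number. A minimal Python sketch with made-up power readings:

```python
# DCiE = IT equipment power / total facility power (the reciprocal of PUE).
# The wattages below are placeholders, not LBNL measurements.
def dcie(it_power_kw: float, total_facility_power_kw: float) -> float:
    """Return DCiE as a percentage."""
    return 100.0 * it_power_kw / total_facility_power_kw

it_load = 450.0        # kW drawn by compute, storage, network gear (hypothetical)
facility_load = 750.0  # kW for the whole room: IT load + cooling, UPS losses, lighting
print(f"DCiE = {dcie(it_load, facility_load):.0f}%   (PUE = {facility_load / it_load:.2f})")
# -> DCiE = 60%   (PUE = 1.67)
```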
Next Steps
• Opportunities presented by cloud computing
  o Amazon investigation earlier this year; others ongoing
    ~ Latency-sensitive applications ran poorly, as expected
    ~ Performance depends on the specific use case
    ~ Data migration: the economics of storing vs. moving data (see the sketch below)
    ~ Certain LBNL factors favor the cost of building instead of buying
• Large storage and computation for data analysis
• GPU investigation
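The "economics of storing vs. moving" bullet is a break-even question: is it cheaper to keep a dataset in the cloud between analysis runs or to transfer it back and forth each time? The deck does not give the figures it used, so the rates below are purely hypothetical placeholders meant only to show the shape of the comparison.

```python
# Hypothetical break-even sketch for "store in the cloud" vs "move it each time".
# All prices are made-up placeholders, not 2009 Amazon rates or LBNL figures.
def monthly_storage_cost(dataset_gb: float, price_per_gb_month: float) -> float:
    return dataset_gb * price_per_gb_month

def monthly_transfer_cost(dataset_gb: float, runs_per_month: int,
                          price_per_gb_transferred: float) -> float:
    # Each analysis run moves the dataset in and back out again.
    return dataset_gb * 2 * runs_per_month * price_per_gb_transferred

dataset_gb = 5_000
store = monthly_storage_cost(dataset_gb, price_per_gb_month=0.15)
move = monthly_transfer_cost(dataset_gb, runs_per_month=4, price_per_gb_transferred=0.10)
print(f"store: ${store:,.0f}/month   move: ${move:,.0f}/month")
# With these placeholder rates, storing ($750) beats repeated migration ($4,000).
```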
Points of Collaboration
• UC Berkeley HPCC
  o Recent high-profile joint projects between UCB and LBNL encourage close collaboration
  o 25-30% of scientists have dual appointments
  o UC Berkeley's proximity to LBNL facilitates the use of cluster services
• University of California Shared Research Computing Services pilot (SRCS)
  o LBNL and SDSC joint pilot for the ten UC campuses
  o Two 272-node clusters located at UC Berkeley and SDSC
  o Shared computing is more cost-effective
  o Dedicated CENIC L3 network for integration
  o Pilot consists of 24 research projects