
bf690c685f1db874731fe9a6999a5427.ppt
- Slide count: 19
“The Pacific Research Platform: a Science-Driven Big-Data Freeway System.” Opening Presentation, Pacific Research Platform Workshop, Calit2’s Qualcomm Institute, University of California, San Diego, October 14, 2015. Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD. http://lsmarr.calit2.net
Vision: Creating a West Coast “Big Data Freeway” Connected by CENIC/Pacific Wave to Internet2 & GLIF. Use Lightpaths to Connect All Data Generators and Consumers, Creating a “Big Data” Freeway Integrated with High-Performance Global Networks: “The Bisection Bandwidth of a Cluster Interconnect, but Deployed on a 20-Campus Scale.” This Vision Has Been Building for Over a Decade.
NSF’s OptIPuter Project: Using Supernetworks to Meet the Needs of Data-Intensive Researchers (LS Slide 2005; 2003-2009, $13,500,000). OptIPortal: Termination Device for the OptIPuter Global Backplane. In August 2003, Jason Leigh and his students used RBUDP to blast data from NCSA to SDSC over the TeraGrid DTFnet, achieving 18 Gbps file transfer out of the available 20 Gbps. Calit2 (UCSD, UCI), SDSC, and UIC Leads; Larry Smarr PI. Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST. Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent.
Quartzite: The Optical Core of the UCSD Campus-Scale Testbed, Evaluating Packet Routing versus Lambda Switching. Funded by NSF MRI Grant. Goals by 2007: >= 50 endpoints at 10 GigE; >= 32 packet switched; >= 32 switched wavelengths; >= 300 connected endpoints. Approximately 0.5 Tbit/s arrive at the “optical” center of campus. Switching will be a hybrid combination of packet, lambda, and circuit; OOO and packet switches (Lucent, Glimmerglass, Chiaro Networks) already in place. LS Slide 2005. Source: Phil Papadopoulos, SDSC, Calit2.
Integrated “OptIPlatform” Cyberinfrastructure System: A 10 Gbps Lightpath Cloud (LS 2009 Slide). [Diagram labels: HD/4K Telepresence; End User OptIPortal; HD/4K Video Cams; Instruments; 10G Lightpath; HPC; National LambdaRail; Campus Optical Switch; Data Repositories & Clusters; HD/4K Video Images]
So Why Don’t We Have a National Big Data Cyberinfrastructure? “Research is being stalled by ‘information overload,’ Mr. Bement said, because data from digital instruments are piling up far faster than researchers can study. In particular, he said, campus networks need to be improved. High-speed data lines crossing the nation are the equivalent of six-lane superhighways, he said. But networks at colleges and universities are not so capable. ‘Those massive conduits are reduced to two-lane roads at most college and university campuses,’ he said. Improving cyberinfrastructure, he said, ‘will transform the capabilities of campus-based scientists.’” -- Arden Bement, Director of the National Science Foundation, May 2005
DOE ESnet’s Science DMZ: A Scalable Network Design Model for Optimizing Science Data Transfers. A Science DMZ integrates 4 key concepts into a unified whole:
– A network architecture designed for high-performance applications, with the science network distinct from the general-purpose network
– The use of dedicated systems for data transfer
– Performance measurement and network testing systems that are regularly used to characterize and troubleshoot the network
– Security policies and enforcement mechanisms that are tailored for high-performance science environments
The DOE ESnet Science DMZ and the NSF “Campus Bridging” Taskforce Report formed the basis for the NSF Campus Cyberinfrastructure Network Infrastructure and Engineering (CC-NIE) Program. “Science DMZ” coined 2010. Greg Bell, Director, ESnet, on panel. http://fasterdata.es.net/science-dmz/
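The third concept above, routine performance measurement, is commonly implemented with tools such as perfSONAR and iperf3. A minimal sketch of trending DTN-to-DTN throughput by parsing an iperf3 `--json` report; the abbreviated JSON below follows iperf3’s report layout (`end.sum_received`), and the numbers are illustrative, not measurements from PRP:

```python
import json

# Abbreviated, illustrative iperf3 --json report from a DTN-to-DTN test
# (a real report would come from: iperf3 -c <remote-dtn> -P 8 -t 30 --json).
sample_report = '{"end": {"sum_received": {"seconds": 30.0, "bits_per_second": 18.2e9}}}'

def throughput_gbps(report_json: str) -> float:
    """Extract the aggregate receive-side throughput in Gb/s from an iperf3 report."""
    report = json.loads(report_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9

print(f"{throughput_gbps(sample_report):.1f} Gb/s")  # → 18.2 Gb/s
```

Logging such a value at regular intervals is what lets operators spot when a “big data freeway” path has silently degraded.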
Based on Community Input and on ESnet’s Science DMZ Concept, NSF Has Funded Over 100 Campuses to Build Local Big Data Freeways. [Map legend: Red, 2012 CC-NIE Awardees; Yellow, 2013 CC-NIE Awardees; Green, 2014 CC*IIE Awardees; Blue, 2015 CC*DNI Awardees; Purple, Multiple-Time Awardees]. Source: NSF
Creating a “Big Data” Freeway on Campus: NSF-Funded CC-NIE Grants Prism@UCSD and CHERuB. Prism@UCSD: Phil Papadopoulos, SDSC, Calit2, PI (2013-15). CHERuB: Mike Norman, SDSC, PI.
The Pacific Research Platform Creates a Regional End-to-End Science-Driven “Big Data Freeway System.” NSF CC*DNI, $5M, 10/2015-10/2020. PI: Larry Smarr, UC San Diego, Calit2. Co-PIs:
• Camille Crittenden, UC Berkeley, CITRIS
• Tom DeFanti, UC San Diego, Calit2
• Philip Papadopoulos, UC San Diego, SDSC
• Frank Wuerthwein, UC San Diego, Physics and SDSC
Amy Walton, PRP NSF Program Officer, on panel. CENIC/PW Backplane: Louis Fox, CEO, CENIC, on panel.
FIONA – Flash I/O Network Appliance: Linux PCs Optimized for Big Data. FIONAs Are Science DMZ Data Transfer Nodes & Optical Network Termination Devices. UCSD CC-NIE Prism Award & UCOP. Phil Papadopoulos & Tom DeFanti; Joe Keefe & John Graham.
UCOP Rack-Mount Build (two configurations):
  Cost:              $8,000 / $20,000
  CPU:               Intel Xeon Haswell E5-1650 v3 (6-core) / 2x E5-2697 v3 (14-core)
  RAM:               128 GB / 256 GB
  SSD:               SATA 3.8 TB
  Network Interface: 10/40 GbE Mellanox / 2x 40 GbE Chelsio+Mellanox
  GPU:               NVIDIA Tesla K80
  RAID Drives:       0 to 112 TB (add ~$100/TB)
A UCSD Integrated Digital Infrastructure Project for the Big Data Requirements of Rob Knight’s Lab; PRP Does This on a Sub-National Scale. [Diagram labels: Knight 1024 Cluster in SDSC Co-Lo; Gordon; Knight Lab; Data Oasis (7.5 PB, 200 GB/s); CHERuB (100 Gbps); 120 Gbps; 10 Gbps; FIONA (12 Cores/GPU, 128 GB RAM, 3.5 TB SSD, 48 TB Disk, 10 Gbps NIC); Emperor & Other Vis Tools (40 Gbps); Prism@UCSD; 64 Mpixel Data Analysis Wall]
FIONAs as Uniform DTN End Points, as of October 2015. [Map legend: FIONA DTNs; Existing DTNs]. UC FIONAs Funded by UCOP “Momentum” Grant. Tom Andriola, UCOP CIO, on panel.
Ten-Week Sprint to Demonstrate the West Coast Big Data Freeway System: PRPv0. FIONA DTNs Now Deployed to All UC Campuses and Most PRP Sites. Presented at CENIC 2015, March 9, 2015.
PRP First Application: Distributed IPython/Jupyter Notebooks, a Cross-Platform, Browser-Based Application That Interleaves Code, Text, & Images. Kernels include: IJulia, IHaskell, IFSharp, IRuby, IGo, IScala, IMathics, IAldor, LuaJIT/Torch, Lua Kernel, IRKernel (for the R language), IErlang, IOCaml, IForth, IPerl6, IOctave, Calico Project (kernels implemented in Mono, including Java, IronPython, Boo, Logo, BASIC, and many others), IScilab, IMatlab, ICSharp, Bash, Clojure Kernel, Hy Kernel, Redis Kernel, jove (a kernel for io.js), IJavascript, Calysto Scheme, Calysto Processing, idl_kernel, Mochi Kernel, Lua (used in Splash), Spark Kernel, Skulpt Python Kernel, MetaKernel Bash, MetaKernel Python, Brython Kernel, IVisual VPython Kernel. Source: John Graham, QI
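The interleaving of text and code that all of these kernels share comes from the notebook file format itself: an `.ipynb` file is JSON containing a list of markdown and code cells. A minimal sketch built with only the standard library; the cell contents and kernelspec metadata here are illustrative, not a complete real notebook:

```python
import json

# Minimal sketch of the .ipynb on-disk structure: markdown cells carry text,
# code cells carry source any compatible kernel can execute.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": "## Link utilization check"},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [],
         "source": "print(18 / 20)  # fraction of available bandwidth used"},
    ],
}

# Serializing this dict yields a file a Jupyter server can open in the browser.
serialized = json.dumps(notebook, indent=1)
print(serialized.splitlines()[0])
```

Because the format is plain JSON, notebooks travel easily between campuses; only the kernel named in the metadata needs to exist on the executing host.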
PRP Has Deployed Powerful FIONA Servers at UCSD and UC Berkeley to Create a UC-Jupyter Hub Backplane. FIONAs Have GPUs and Can Spawn Jobs to SDSC’s Comet Using the InCommon CILogon Authenticator Module for Jupyter. Deep Learning Libraries Have Been Installed. Source: John Graham, QI
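Wiring CILogon into JupyterHub is typically done through the `oauthenticator` package. A sketch of the relevant `jupyterhub_config.py` fragment, under the assumption that this is the mechanism in use; the client ID, secret, and hub hostname are placeholders, not PRP’s actual values:

```python
# Sketch of a jupyterhub_config.py fragment using CILogon federated login.
# get_config() is supplied by JupyterHub when it loads this file.
c = get_config()  # noqa: F821

from oauthenticator.cilogon import CILogonOAuthenticator

c.JupyterHub.authenticator_class = CILogonOAuthenticator
c.CILogonOAuthenticator.client_id = "cilogon:/client_id/placeholder"      # placeholder
c.CILogonOAuthenticator.client_secret = "placeholder-secret"              # placeholder
c.CILogonOAuthenticator.oauth_callback_url = (
    "https://hub.example.edu/hub/oauth_callback"  # hypothetical hub hostname
)
```

With federated identity handled by CILogon/InCommon, any campus user can log in with home-institution credentials, which is what makes a multi-campus Jupyter backplane practical.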
Pacific Research Platform Multi-Campus Science Driver Teams
• Particle Physics
• Astronomy and Astrophysics
  – Telescope Surveys
  – Galaxy Evolution
  – Gravitational Wave Astronomy
• Biomedical
  – Cancer Genomics Hub/Browser
  – Microbiome and Integrative ‘Omics
  – Integrative Structural Biology
• Earth Sciences
  – Data Analysis and Simulation for Earthquakes and Natural Disasters
  – Climate Modeling: NCAR/UCAR
  – California/Nevada Regional Climate Data Analysis
  – CO2 Subsurface Modeling
• Scalable Visualization, Virtual Reality, and Ultra-Resolution Video
Key Task for This Workshop: Determine the Big Data Needs of These Teams and Translate Them into PRP Cyberinfrastructure Requirements
Science Teams Require High Bandwidth Across Campus and Between Campuses and National Facilities. Big Data Flows Add to Commodity Internet to Fully Utilize CENIC’s 100G Campus Connection:
• Connecting Scientific Instrument Data Production to Remote Campus Compute & Storage Clusters
• Providing Access to Remote Data Repositories
• Bringing Supercomputer Data to Local Users
• Enabling Remote Collaborations
• MORE?
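To make the bandwidth stakes concrete, a back-of-the-envelope calculation of idealized transfer time over a fully utilized link; the 1 PB dataset size is illustrative, not a figure from any of the science teams above:

```python
def transfer_time_hours(data_bytes: float, link_gbps: float) -> float:
    """Idealized transfer time, ignoring protocol overhead and congestion."""
    bits = data_bytes * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600

# Moving a hypothetical 1 PB dataset over CENIC's 100 Gbps campus connection:
print(f"{transfer_time_hours(1e15, 100):.1f} hours")  # → 22.2 hours

# The same dataset over a 10 Gbps campus uplink takes 10x longer:
print(f"{transfer_time_hours(1e15, 10):.0f} hours")   # → 222 hours
```

The 10x gap between these two numbers is the practical difference between an overnight transfer and more than a week of waiting, which is why fully utilizing the 100G connection matters.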
PRP Timeline
• PRPv1
  – A Layer 3 System
  – Completed in 2 Years
  – Tested, Measured, Optimized, with Multi-Domain Science Data
  – Bring Many of Our Science Teams Up
  – Each Community Thus Will Have Its Own Certificate-Based Access to Its Specific Federated Data Infrastructure
• PRPv2
  – Advanced IPv6-Only Version with Robust Security Features, e.g. Trusted Platform Module Hardware and SDN/SDX Software
  – Support Rates up to 100 Gb/s in Bursts and Streams
  – Develop Means to Operate a Shared Federation of Caches