
Open Cirrus™ Cloud Computing Testbed
A joint initiative sponsored by HP, Intel, and Yahoo!
http://opencirrus.org
6/22/2009
For more info, contact Dave O'Hallaron, david.ohallaron@intel.com
Intel Open Cirrus team: David O'Hallaron, Michael Kozuch, Michael Ryan, Richard Gass, James Gurganus, Milan Milenkovic, Eric Gayles, Virginia Meade
Open Cirrus™ Cloud Computing Testbed
• Shared: research, applications, infrastructure (11K cores), data sets
• Global services: sign-on, monitoring, storage. Open-source stack (PRS, Tashi, Hadoop)
• Sponsored by HP, Intel, and Yahoo! (with additional support from NSF)
• 9 sites currently, target of around 20 in the next two years
Open Cirrus Goals
• Goals
— Foster new systems and services research around cloud computing
— Catalyze an open-source stack and APIs for the cloud
• How are we unique?
— Support for both systems research and applications research
— Federation of heterogeneous datacenters
Process
• Central Management Office oversees Open Cirrus
— Currently owned by HP
• Governance model
— Research team
— Technical team
— New site additions
— Support (legal (export, privacy), IT, etc.)
• Each site
— Runs its own research and technical teams
— Contributes individual technologies
— Operates some of the global services
• E.g.:
— HP site supports the portal and PRS
— Intel site is developing and supporting Tashi
— Yahoo! contributes to Hadoop
Intel BigData Open Cirrus site
http://opencirrus.intel-research.net
[Diagram: 45 Mb/s T3 link to the Internet; 48 Gb/s switches with 1 Gb/s point-to-point links to nodes and 1 Gb/s (x4) uplinks between switches.]
• 40 blade compute/storage nodes (1 rack): 10 nodes with 8 Core 2 cores (4x2), 8 GB RAM, 0.3 TB disk (2x150 GB); 20 nodes with 1 Xeon core, 6 GB RAM, 366 GB disk (36+300 GB); 10 nodes with 4 Xeon cores (2x2), 4 GB RAM, 150 GB disk (2x75 GB) — 140 cores, 240 GB RAM, 11 TB
• 40 compute/storage nodes, each with 8 Core 2 cores (4x2), 8 GB RAM, 0.3 TB disk (2x150 GB) — 320 cores, 320 GB RAM, 12 TB
• 30 1U compute/storage nodes (2 racks of 15), each with 8 Core 2 cores (4x2), 8 GB RAM, 2 TB disk (2x1 TB) — 240 cores, 240 GB RAM, 60 TB
• 45 2U compute/storage nodes (3 racks of 15), each with 8 Core 2 cores (2x4), 8 GB RAM, 6 TB disk (6x1 TB) — 360 cores, 360 GB RAM, 270 TB
• 5 3U storage nodes (1 rack), each with 12 TB disk (12x1 TB) — 60 TB
• Totals: 155 nodes / 1,060 cores, 1,160 GB RAM, 413 TB storage, 550 spindles
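As a sanity check, the bracketed site totals can be recomputed from the per-group figures above. This is just a verification sketch; the assumption that the 5 storage nodes are excluded from the 155-node count is an inference from the slide's numbers.

```python
# Per-group figures from the slide: (name, nodes, cores, ram_gb, storage_tb)
racks = [
    ("blades",     40, 140, 240,  11),
    ("1U 0.3TB",   40, 320, 320,  12),
    ("1U 2TB",     30, 240, 240,  60),
    ("2U 6TB",     45, 360, 360, 270),
    ("3U storage",  5,   0,   0,  60),  # storage-only rack
]

# Assumption: the storage-only rack is not counted in the node total.
nodes   = sum(r[1] for r in racks if r[0] != "3U storage")
cores   = sum(r[2] for r in racks)
ram_gb  = sum(r[3] for r in racks)
disk_tb = sum(r[4] for r in racks)
print(nodes, cores, ram_gb, disk_tb)  # -> 155 1060 1160 413, matching the slide
```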
Open Cirrus Sites Characteristics

| Site | #Cores | #Srvrs | Public | Memory | Storage | Spindles | Network | Focus |
|---|---|---|---|---|---|---|---|---|
| HP | 1,024 | 256 | 178 | 3.3 TB | 632 TB | 1152 | 10 G internal, 1 Gb/s x-rack | Hadoop, Cells, PRS, scheduling |
| IDA | 2,400 | 300 | 100 | 4.8 TB | 43 TB + 16 TB SAN | 600 | 1 Gb/s | Apps based on Hadoop, Pig |
| Intel | 1,060 | 155 | 145 | 1.16 TB | 353 TB local + 60 TB attach | 550 | 1 Gb/s | Tashi, PRS, MPI, Hadoop |
| KIT | 2,048 | 256 | 128 | 10 TB | 1 PB | 192 | 1 Gb/s | Apps with high throughput |
| UIUC | 1,024 | 128 | 64 | 2 TB | ~500 TB | 288 | 1 Gb/s | Datasets, cloud infrastructure |
| CMU | 1,024 | 128 | 64 | 2 TB | -- | -- | 1 Gb/s | Storage, Tashi |
| Yahoo (M45) | 3,200 | 480 | 400 | 2.4 TB | 1.2 PB | 1600 | 1 Gb/s | Hadoop on demand |
| Total | 11,780 | 1,703 | 1,029 | 25 TB | 2.6 PB | | | |
Testbed Comparison

| Testbed | Type of research | Approach | Participants | Distribution |
|---|---|---|---|---|
| Open Cirrus | Systems & services | Federation of heterogeneous data centers | HP, Intel, IDA, KIT, UIUC, Yahoo!, CMU | 6 sites: 1,703 nodes, 11,780 cores |
| IBM/Google | Data-intensive applications research | A cluster supported by Google and IBM | IBM, Google, Stanford, U. Wash, MIT | 1 site |
| TeraGrid | Scientific applications | Multi-site heterogeneous clusters, supercomputers | Many schools and orgs | 11 partners in the US |
| PlanetLab | Systems and services | A few 100 nodes hosted by research institutions | Many schools and orgs | >700 nodes, world-wide |
| EmuLab | Systems | A single-site cluster with flexible control | University of Utah | >300 nodes at Univ. of Utah |
| Open Cloud Consortium | Interoperability across clouds using open APIs | Multi-site heterogeneous clusters, focus on network | 4 centers | 480 cores distributed in four locations |
| Amazon EC2 | Commercial use | Raw access to virtual machines | Amazon | -- |
| LANL/NSF cluster | Systems | Re-use of LANL's retiring clusters | CMU, LANL, NSF | 1 site, 1000s of older, still-useful nodes |
Open Cirrus Stack
[Diagram: the Physical Resource Set (PRS) service manages the compute + network + storage resources and power + cooling through a management and control subsystem.]
Credit: John Wilkes (HP)
Open Cirrus Stack
[Diagram: PRS clients, each with their own "physical data center", sit on top of the PRS service: Research, Tashi, an NFS storage service, and an HDFS storage service.]
Open Cirrus Stack
[Diagram: virtual clusters (e.g., Tashi) are carved out on top of the PRS allocations, alongside the Research partition and the NFS and HDFS storage services.]
Open Cirrus Stack
[Diagram: a BigData application running on Hadoop, on a Tashi virtual cluster, on a PRS, on real hardware, alongside the Research virtual cluster and the NFS and HDFS storage services.]
Open Cirrus Stack
[Diagram sequence: the stack then gains experiment save/restore, platform services, and finally user services, completing the picture: user services and a BigData app on Hadoop, over Tashi virtual clusters and the Research partition, over the PRS, with the NFS and HDFS storage services alongside.]
Open Cirrus Stack - PRS
• PRS service goals
— Provide mini-datacenters to researchers
— Isolate experiments from each other
— Provide a stable base for other research
• PRS service approach
— Allocate sets of physically co-located nodes, isolated inside VLANs (a toy sketch follows below)
• PRS code from HP is being merged into the Tashi Apache project
— Running on the HP site
— Being ported to the Intel site
— Will eventually run on all sites
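The allocation idea can be pictured as carving a named set of co-located nodes out of a rack and giving the set its own VLAN. The sketch below is a minimal toy model, not PRS's actual interface (which this deck does not show); the class name, method, and VLAN id pool are all hypothetical.

```python
import itertools

class PRS:
    """Toy model of a Physical Resource Set service: hands out sets of
    co-located physical nodes and isolates each set in its own VLAN."""

    def __init__(self, nodes_by_rack):
        self.free = {rack: list(nodes) for rack, nodes in nodes_by_rack.items()}
        self.vlan_ids = itertools.count(100)   # hypothetical VLAN id pool

    def allocate(self, owner, count):
        # Prefer a single rack so the node set is physically co-located.
        for rack, nodes in self.free.items():
            if len(nodes) >= count:
                chosen, self.free[rack] = nodes[:count], nodes[count:]
                # Each allocation gets a fresh VLAN: the isolation boundary.
                return {"owner": owner, "rack": rack,
                        "nodes": chosen, "vlan": next(self.vlan_ids)}
        raise RuntimeError("no single rack has %d free nodes" % count)

prs = PRS({"rack1": ["n01", "n02", "n03"], "rack2": ["n04", "n05"]})
print(prs.allocate("tashi", 2))
# -> {'owner': 'tashi', 'rack': 'rack1', 'nodes': ['n01', 'n02'], 'vlan': 100}
```

Because each experiment lives in its own VLAN, traffic from one PRS client cannot interfere with another's, which is what makes the "mini-datacenter" abstraction stable enough for systems research.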
Open Cirrus Stack - Tashi
• An open-source Apache Software Foundation project sponsored by Intel (with CMU, Yahoo, HP)
— Infrastructure for cloud computing on Big Data
— http://incubator.apache.org/projects/tashi
• Research focus:
— Location-aware co-scheduling of VMs, storage, and power
— Seamless physical/virtual migration
• Joint with Greg Ganger (CMU), Mor Harchol-Balter (CMU), Milan Milenkovic (CTG)
Tashi High-Level Design
[Diagram: a Scheduler, Virtualization Service, and Storage Service coordinate through a Cluster Manager (CM) that controls the cluster nodes.]
• Most decisions happen in the Scheduler, which manages compute, storage, and power in concert
• Services are instantiated through virtual machines
• Data location and power information is exposed to the Scheduler and services
• The Storage Service aggregates the capacity of the commodity nodes to house Big Data repositories
• The CM maintains databases and routes messages; its decision logic is limited
• Cluster nodes are assumed to be commodity machines
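The "location-aware co-scheduling" idea can be sketched in a few lines: given the block locations the storage service exposes, prefer to place each VM on a node that already holds its input data. This is an illustrative sketch of the research idea, not Tashi's actual scheduler code; the data structures (`block_locations`, `free_cores`) are assumptions.

```python
def place_vms(vms, block_locations, free_cores):
    """Greedy location-aware VM placement (illustrative only).

    vms             -- list of (vm_id, input_blocks, cores_needed)
    block_locations -- dict: block id -> set of node names holding a replica
    free_cores      -- dict: node name -> cores currently available (mutated)

    Prefers nodes that already store the VM's input blocks, so reads stay
    local; falls back to the node with the most free cores otherwise.
    """
    placement = {}
    for vm_id, blocks, cores in vms:
        # Count how many of this VM's input blocks each node holds locally.
        local = {}
        for b in blocks:
            for node in block_locations.get(b, ()):
                local[node] = local.get(node, 0) + 1
        # Only nodes with enough spare cores are candidates.
        candidates = [n for n, c in free_cores.items() if c >= cores]
        if not candidates:
            continue   # defer this VM: no capacity anywhere right now
        best = max(candidates, key=lambda n: (local.get(n, 0), free_cores[n]))
        placement[vm_id] = best
        free_cores[best] -= cores
    return placement

# Example: vm1's blocks live on n2, so it lands there; vm2 falls back to n1.
blocks = {"b1": {"n2"}, "b2": {"n2", "n3"}}
print(place_vms([("vm1", ["b1", "b2"], 2), ("vm2", [], 4)],
                blocks, {"n1": 8, "n2": 4, "n3": 2}))
# -> {'vm1': 'n2', 'vm2': 'n1'}
```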
Open Cirrus Stack - Hadoop
• An open-source Apache Software Foundation project sponsored by Yahoo!
— http://wiki.apache.org/hadoop/ProjectDescription
• Provides a parallel programming model (MapReduce) and a distributed file system (HDFS)
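The MapReduce model is easiest to see in a tiny example. Below is a word count written for Hadoop Streaming in Python; it is an illustration, not part of the Open Cirrus stack itself, and the file names are placeholders.

```python
#!/usr/bin/env python
# mapper.py -- emit (word, 1) for every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t1" % word)
```

```python
#!/usr/bin/env python
# reducer.py -- sum the counts for each word. Hadoop Streaming delivers
# mapper output sorted by key, so equal words arrive consecutively.
import sys

current, count = None, 0
for line in sys.stdin:
    word, n = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print("%s\t%d" % (current, count))
        current, count = word, 0
    count += int(n)
if current is not None:
    print("%s\t%d" % (current, count))
```

A job like this would be launched via the streaming jar, e.g. `hadoop jar hadoop-streaming.jar -input <in> -output <out> -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py` (the jar's location varies by installation).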
How do users get access to Open Cirrus sites?
• Contact names, email addresses, and web links for applications to each site will be available on the Open Cirrus web site (which goes live Q2 2009)
— http://opencirrus.org
• Each Open Cirrus site decides which users and projects get access to its site
• Project PIs apply to each site separately
• Developing a global sign-on for all sites (Q2 2009)
— Users will be able to log in to each Open Cirrus site for which they are authorized using the same login and password
What kinds of research projects are Open Cirrus sites looking for?
• Open Cirrus is seeking research in the following areas (different centers will weight these differently):
— Datacenter federation
— Datacenter management
— Web services
— Data-intensive applications and systems
• The following kinds of projects are generally not of interest:
— Traditional HPC application development
— Production applications that just need lots of cycles
— Closed-source system development
Open Cirrus Systems Research at Intel
• Tashi
— Open-source software infrastructure for cloud computing on big data (IRP, with the Apache Software Foundation, CMU, Yahoo, HP)
— http://incubator.apache.org/projects/tashi
— Research focus: location-aware co-scheduling of VMs, storage, and power; seamless physical/virtual migration
— Joint with Greg Ganger (CMU), Mor Harchol-Balter (CMU), Milan Milenkovic (CTG)
• Sprout
— Software infrastructure for parallel video stream processing (IRP, IRS, ESP project)
— Central parallelization infrastructure for the ESP and SLIPStream projects
Open Cirrus Application Research at Intel
• ESP (Everyday Sensing and Perception)
— Detection of everyday activities from video
• SLIPStream (with CMU)
— Parallel event and object detection in massive videos
— Robot sensing and perception
— Gesture-based game and computer interfaces
— Food Recognizing Interactive Electronic Nutrition Display (FRIEND)
• NeuroSys: parallel brain activity analysis
— Interactive functional MRI (with CMU)
• Parallel machine learning (with CMU)
— Parallel belief propagation on massive graphical models
— Automatically converting movies from 2D to 3D
• Stem cell tracking (with UPitt/CMU)
• Parallel dynamic physical rendering (with CMU)
• Log-based architecture (with CMU)
• Autolab autograding handin service for the world (with CMU)
Summary
• Intel is collaborating with HP and Yahoo! to provide a cloud computing testbed for the research community
• Primary goals are to:
— Foster new systems research around cloud computing
— Catalyze an open-source stack and APIs for the cloud
• Opportunities for Intel and Intel customers