8d7a28cacb3dfebb52ba4117847beae2.ppt
- Количество слайдов: 54
Microsoft Large Databases and Grid Computing Jim Gray Microsoft Research Gray@Microsoft. com http: //research. Microsoft. com/~gray Presentation to Kaiser Information Management Briefing 21 May 2003
About me • in Microsoft research (located in San Francisco) • A database researcher – IBM, Tandem, DEC, Microsoft • Work on Scalable Systems – Building supercomputers from commodity components. • Do academic/government things too – PITAC, Gri. Phyn TAB, NSF/CISE, Library of Congress, … • For the last 4 years, been working with the astronomy community to build the World Wide Telescope.
Agenda • Terra. Server – What it is – What we learned – What we are doing now. • Sky. Server / WWT – What it is – What we learned – What we are doing now • Grid Computing – General comments – Build a web service
Terra. Server Terra. Service. net • A photo of the United States – – – 1 meter resolution (photographic/topographic) USGS data Some demographic data (Best. Places. net) Home sales data Linked to Encarta Encyclopedia • 15 TB raw, 6 TB cooked (grows 10 GB/w) • Point, Pan, zoom interface • Among top 1, 000 websites – 40 k visitors/day – 4 M queries/day – 3 B page views (in 5 years) • All in an SQL database
Terra. Server Statistics June ‘ 98 Unique Users Page Views Image Tiles Db Queries Bytes Xfered Jan ‘ 99 Jan ‘ 00 Daily Peak Average Day 40, 011 277, 292 1, 266, 838 2, 401, 209 3, 735, 789 10, 475, 674 4, 484, 089 12, 388, 104 70 gb 163 gb May ‘ 00 June 1998 Oct, 2002 63, 656, 904 2, 015, 539, 605 5, 943, 641, 024 7, 134, 186, 170 108 tb Sept ’ 01 900 m Rows 755 m Rows SQL 2000 298 m Rows 231 m Rows SQL 2000 217 m Rows 173 m Rows SQL 7. 0. 75 TB Db . 8 TB Db SQL 7. 0 1. 0 TB Db 1 Server / Win NT 4. 0 EE SQL 2000 1 Server. 8 TB Db SQL 7. 0 1 Server 1. 5 TB Db 2 nd Server / Win 2 k Data. Center Dec ‘ 02 1. 4 TB Db SQL 2000 2. 0 TB Db 1. 2 TB Db SQL 2000 1. 0 TB Db SQL 2000 2. 0 TB Db 4 Node / Win 2 k Datacenter Failover Cluster
Terra. Server Cluster 8 Compaq DL 360 “Photon” Web Servers One SQL database per rack Each rack contains 4. 5 TB 1 rack not in picture 18. 0 TB total 2200 Meta Data Stored on 101 GB “Fast, Small Disks” (18 x 18. 2 GB) Imagery Data Stored on 4 339 GB “Slow, Big Disks” (15 x 73. 8 GB) Added 90 72. 8 GB Disks in Feb 2001 to create 18 TB SAN O O J 2200 Fiber SAN Switches E E J 2200 P Q K L 2200 F SQLInst 1 G SQLInst 2 2200 R S 2200 M N 2200 SQLInst 3 H I Spare 4 Compaq Pro. Liant 8500 Db Servers
Cluster Configuration Terra. Server SAN Compaq Storage. Works 1 Database Internet G Et igi he bit rn et Cisco 12000 Internet Router ps Mb et 100 hern Et Compaq DL 360 (10) Cluster MA 8000/HSG 80 Controllers (3) 2 Summit 7 i Switch (2) Compaq SAN switch by Brocade Communications ADIC LTO Tape Library 3 Compaq Pro. Liant 8500 (4) Extreme Networks Summit 48 Switch Compaq DL 360 (6) (Windows 2000 Web Servers) Terra. Server. microsoft. com Internet Microsoft Corporat e LAN
Terra. Server Becomes a Web Service Terra. Server. net -> Terra. Service. Net • Web server is for people. • Web Service is for programs – The end of screen scraping – No faking a URL: pass real parameters. – No parsing the answer: data formatted into your address space. • Hundreds of users but a specific example: – US Department of Agriculture
And now. . 4 slides from the “customer” who built a portal using Terra. Service
Data Gateway Functional Overview ITC - Fort Collins, Colorado NCGC - Fort Worth, Texas Customer Orders Data Terra Service Navigation Service Soil Data Viewer XML Billing Services Rimage CD Service XML ASP Catalog Service FTP Services Ship Service <<Requests Products>> Send order info validate (dtd) Insert into SQL @@Identity / GUID to client return est time raise Order. Mgr. event Order Placer Package Service Product Catalog Updates Order Database Called by anyone rasies to stats svc' XML Request for data Logger Selects from Listen for Order. Placer Raised Event Select sequenced Item Output XML rasie event : stats. delivery start Data Services Item Broker Acknowledges item ready for delivery Geospatial Data
Custom End Product Soil XML Soil Report Web. Interpretation Map Data Viewer
Web Server - COM+ Applications Arc. IMS Connector Web. SDV Connects to Arc. IMS; communication is done through Arc. IMS XML (AXL) Retrieves and processes Soils Data from the NASIS relational Database IMSNavigator Generates maps (JPGs) using Arc. IMS Image Retriever Retrieves imagery from the Microsoft Terra. Server Database Server - ESRI Spatial Data Server ESRI Spatial Data Engine Database Server - Microsoft SQL Server Business Rules National Soils Data Geospatial Data Microsoft Terraserver
Brief tour of Terra. Service • Show map service • Show some methods • See Terra. Service. NET: An Introduction to Web Services Tom Barclay; Jim Gray; Eric Strand; Steve Ekblad; Jeffrey Richter, MSR TR 2002 -53, pp 13, June 2002
What We Learned • You can build and manage a very popular website with relatively little effort (if you do it right and have Tom Barclay) • Loading 20 TB takes a lot of energy • And you get to do it many times -- automate • Tape and tape software problematic • Triplex and snap-shot disks works (we have never had to use it, but. . ) • The internet gives you 2 -9’s Servers can run at 4 9’s easily, 5 9’s with effort.
What we are doing now. • • Building with 3 K$ 2 TB bricks 4 bricks = 1 backend Triplexing systems Duplexing sites. 4*3*2 = 24 k$ for Geoplex Very simple operations model See: • “Tera. Scale Sneaker. Net: Using Inexpensive Disks for Backup, Archiving, and Data Exchange, ” Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan Vandenberg, pp. 1 -8, May 2002
Agenda • Terra. Server – What it is – What we learned – What we are doing now. • Sky. Server / WWT – What it is – What we learned – What we are doing now • Grid Computing – General comments – Build a web service
Sky. Server. SDSS. org • Like the Terra. Server, but looking the other way: a picture of ¼ of the universe • Pixels + Data Mining • Astronomers get about 400 attributes for each “object” • Get Spectrograms for 1% of the objects
Why Astronomy Data? IRAS 25 m • It has no commercial value –No privacy concerns –Can freely share results with others –Great for experimenting with algorithms 2 MASS 2 m • It is real and well documented –High-dimensional data (with confidence intervals) –Spatial data –Temporal data • Many different instruments from many different places and many different times • Federation is a goal • The questions are interesting DSS Optica IRAS 100 m WENSS 92 cm NVSS 20 cm –How did the universe form? • There is a lot of it (petabytes) ROSAT ~ke. V GB 6 cm
Demo of Sky. Server • • • Shows standard web server Pixel/image data Point and click Explore one object Explore sets of objects (data mining)
Virtual Observatory http: //www. astro. caltech. edu/nvoconf/ http: //www. voforum. org/ Premise: Most data is (or could be online) So, the Internet is the world’s best telescope: – – It has data on every part of the sky In every measured spectral band: optical, x-ray, radio. . As deep as the best instruments (2 years ago). It is up when you are up. The “seeing” is always great (no working at night, no clouds no moons no. . ). – It’s a smart telescope: links objects and data to literature on them.
Time and Spectral Dimensions The Multiwavelength Crab Nebulae Crab star 1053 AD X-ray, optical, infrared, and radio views of the nearby Crab Nebula, which is now in a state of chaotic expansion after a supernova explosion first sighted in 1054 A. D. by Chinese Astronomers. Slide courtesy of Robert Brunner @ Cal. Tech.
Data Federations of Web Services • Massive datasets live near their owners: – – Near the instrument’s software pipeline Near the applications Near data knowledge and curation Super Computer centers become Super Data Centers • Each Archive publishes a web service – Schema: documents the data – Methods on objects (queries) • Scientists get “personalized” extracts • Uniform access to multiple Archives Federation – A common global schema
Grid and Web Services Synergy • I believe the Grid will be many web services share data (computrons are free) • IETF standards Provide – Naming – Authorization / Security / Privacy – Distributed Objects Discovery, Definition, Invocation, Object Model – Higher level services: workflow, transactions, DB, . . • Synergy: commercial Internet & Grid tools
Web Services: The Key? • Web SERVER: – Given a url + parameters – Returns a web page (often dynamic) Your h t program tp • Web SERVICE: – Given a XML document (soap msg) – Returns an XML document – Tools make this look like an RPC. • F(x, y, z) returns (u, v, w) – Distributed objects for the web. – + naming, discovery, security, . . • Internet-scale distributed computing b We e pag Your s o program ap Data In your address space Web Server t jec l ob m in x Web Service
Sky. Query: a prototype • Defining Astronomy Objects and Methods. • Federated 3 Web Services (fermilab/sdss, jhu/first, Cal Tech/dposs) multi-survey cross-match Distributed query optimization (T. Malik, T. Budavari, Alex Szalay @ JHU) http: //Sky. Query. net/ • My first web service (cutout + annotated SDSS images) online – http: //skyservice. pha. jhu. edu/devel/Img. Cutout/chart. asp • WWT is a great Web Services (. Net) application – Federating heterogeneous data sources. – Cooperating organizations – An Information At Your Fingertips challenge.
Demo of Image Cutout Service • • Shows image cutout Show project and debugging project Show hello World Show “the. Answer” method
Sky. Query (http: //skyquery. net/) • Distributed Query tool using a set of services • Feasibility study, built in 6 weeks from scratch – Tanu Malik (JHU CS grad student) – Tamas Budavari (JHU astro postdoc) • Implemented in C# and. NET • Allows queries like: SELECT o. obj. Id, o. r, o. type, t. obj. Id FROM SDSS: Photo. Primary o, TWOMASS: Photo. Primary t WHERE XMATCH(o, t)<3. 5 AND AREA(181. 3, -0. 76, 6. 5) AND o. type=3 and (o. I - t. m_j)>2
Sky. Node Basic Web Services • Metadata information about resources – Waveband – Sky coverage – Translation of names to universal dictionary (UCD) • Simple search patterns on the resources – Cone Search – Image mosaic – Unit conversions • Simple filtering, counting, histogramming • On-the-fly recalibrations
Portals: Higher Level Services • Built on Atomic Services • Perform more complex tasks • Examples – – – Automated resource discovery Cross-identifications Photometric redshifts Outlier detections Visualization facilities • Goal: – Build custom portals in days from existing building blocks (like today in IRAF or IDL)
Architecture Image cutout Sky. Node First Web Page Sky. Node 2 Mass Sky. Query Sky. Node SDSS
Summary So Far • • • Some real web services deployed today Easy to build & deploy Services publish data, Portals unify it Tools really work! I’m using C# and foundation classes of Visual. Studio, a great! Tool • A nice book explaining the ideas: (. Net Framework Essentials, Thai, Lam isbn 0 -596 -00302 -1)
Possible Relevance to You • This web service stuff is REAL • If you have a class, It is a way to publish data: Your Internet soap program Web Intranet Service obj • It is a way to find data Data in x ect ml data comes with schema In your no more screen scraping/parsing address space • Business model unclear – Your ideas go here.
What We Learned • Web services really are a breakthrough. • Data mining worked beautifully. See Data Mining the SDSS Sky. Server Database, ” J. Gray, D. Slutz, A. Szalay, A. Thakar, P. Kuntz, C. Stoughton, MSR TR 2002 -1, pp 1 -40, 2002. • You can operate a system in Chicago from San Francisco – Terminal Server is wonderful. • The Internet gives you 2 9’s of availability • Tera. Scale Sneaker. Net works well
What we are doing now. • • Loading more data (next data release) Preparing for the next generation Building the WWT Web Services for the Virtual Observatory, Alexander S. Szalay, Tamás Budavária, Tanu Malika, Jim Gray, and Ani Thakar, SPIE Astronomy Telescopes and Instruments, 22 -28 August 2002, Waikoloa, Hawaii, • Petabyte Scale Data Mining: Dream or Reality? , Alexander S. Szalay; Jim Gray; Jan vanden. Berg, SIPE Astronomy Telescopes and Instruments, 22 -28 August 2002, Waikoloa, Hawaii, • Online Scientific Data Curation, Publication, and Archiving Jim Gray; Alexander S. Szalay; Ani R. Thakar; Christopher Stoughton; Jan vanden. Berg, SPIE Astronomy Telescopes and Instruments, 22 -28 August 2002, Waikoloa, Hawaii,
Agenda • Terra. Server – What it is – What we learned – What we are doing now. • Sky. Server / WWT – What it is – What we learned – What we are doing now • Grid Computing – General comments – Build a web service
The Grid • • Computation Grid: harvest Internet cpus. Data Grid: Share files Application Grid: Web services Access Grid: teleconferencing
The Microsoft View • Web Services will subsume the Grid – The Grid will be data and services not renting cycles • OGSA: evolution of Globus Toolkit to Web services concepts and technologies… • Lots of encouragement from Microsoft, IBM, Oracle, Sun • GGF as forum for discussion
Engagement with Grid Community • Goal: GXA as infrastructure for Grids • Working with Globus & GGF – Funding work at Argonne National Lab (Globus) – Globus Toolkit 3, and Condor. G on Windows • http: //www. globus. org/win-alpha/ (we sponsored this) – OGSA for. NET (prototyping) • http: //www. globus. org/ogsa/ – Also OGSI. NET at U. VA is very interesting • http: //www. cs. virginia. edu/~gsw 2 c/ogsi. net. html – GGF • Active membershp • HPC. net kit – see http: //www. microsoft. com/HPC – Part of. net server scale out development – Includes MPI-CH 1. 2. 4, distributed job scheduler, … – Thomas Sterling, Beowulf on Windows, MIT Press 2001
What’s Microsoft Doing • Mostly. NET, W 3 C standards, web services, … • I think Sky. Query is the best web service (grid app) in Gri. Phy. N today. • My stuff is grid computing • But… • Globus (GT 3), OGSA, and Condor. G ported to Windows (we sponsored it) • We have a HPC toolkit: MPI-CH 1. 2. 4 • See http: //www. microsoft. com/windows 2000/hpc/ for many useful links
I Can Talk About Computing on Demand But… Best to read • Distributed Computing Economics, Jim Gray, MSR-TR-2003 -24, March 2003 • The slides that follow are based on that paper.
Distributed Computing Economics • • • Why is Seti@Home a great idea Why is Napster a great deal? Why is the Computational Grid uneconomic When does computing on demand work? What is the “right” level of abstraction Is the Access Grid the real killer app?
Computing is Free • Computers cost 1 k$ (if you shop right) • So 1 cpu day == 1$ • If you pay the phone bill (and I do) Internet bandwidth costs 50 … 500$/mbps/m (not including routers and management). • So 1 GB costs 1$ to send and 1$ to receive
Why is Seti@Home a Good Deal? • Send 300 KB for • User computes for ½ day: • ROI: 1500: 1 costs 3 e-4$ benefit. 5 e-1$
Why is Napster a Good Deal? • Send 5 MB costs 5 e-3$ • ½ a penny per song • Both sender and receiver can afford it. • Same logic powers web sites (Yahoo!. . . ): – 1 e-3$/page view advertising revenue – 1 e-5$/page view cost of serving web page – 100: 1 ROI
The Cost of Computing: Computers are NOT free! • Capital Cost of a Tpc. C system is mostly storage and storage software (database) • IBM 32 cpu, 512 GB ram 2, 500 disks, 43 TB (680, 613 tpm. C @ 11. 13 $/tpmc available 11/08/03) http: //www. tpc. org/results/individual_results/IBMp 690 es_05092003. pdf • A 7. 5 M$ super-computer • Total Data Center Cost: 40% capital &facilities 60% staff (includes app development)
Computing Equivalents 1 $ buys • • 1 day of cpu time 4 GB ram for a day 1 GB of network bandwidth 1 GB of disk storage 10 M database accesses 10 TB of disk access (sequential) 10 TB of LAN bandwidth (bulk)
Some consequences • Beowulf networking is 10, 000 x cheaper than WAN networking factors of 105 matter. • The cheapest and fastest way to move a Terabyte cross country is sneakernet. 24 hours = 4 MB/s 50$ shipping vs 1, 000$ wan cost. • Sending 10 PB CERN data via network is silly: buy disk bricks in Geneva, fill them, ship them – one way. Tera. Scale Sneaker. Net: Using Inexpensive Disks for Backup, Archiving, and Data Exchange Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan vanden. Berg Microsoft Technical Report may 2002, MSR-TR-2002 -54 http: //research. microsoft. com/research/pubs/view. aspx? tr_id=569
How Do You Move A Terabyte? Speed Mbps Rent $/month Home phone 0. 04 40 1, 000 3, 086 6 years Home DSL 0. 6 70 117 360 5 months T 1 1. 5 1, 200 800 2, 469 2 months T 3 43 28, 000 651 2, 010 2 days OC 3 155 49, 000 316 976 14 hours OC 192 9600 1, 920, 000 200 617 14 minutes 100 Mpbs 100 1 day Gbps 1000 2. 2 hours Context $/TB $/Mbps Sent Time/TB
Computational Grid Economics • To the extent that computational grid is like Seti@Home or Zeta. Net or Folding@home or… it is a great thing • The extent that the computational grid is MPI or data analysis, it fails on economic grounds: move the programs to the data, not the data to the programs. • The Internet is NOT the cpu backplane. • The USG should not hide this economic fact from the academic/scientific research community.
Computing on Demand • Was called outsourcing / service bureaus in my youth. CSC and IBM did it. • Payroll is standard outsource. • Now we have Hotmail, Salesforce. com, Oracle. com, …. • Works for standard apps. • Airlines outsource reservations. Banks outsource ATMs. • But Amazon, Amex, Wal-Mart, . . . Can’t outsource their core competence. • So, COD works for commoditized services. • It is not a new way of doing things: think payroll.
What’s the right abstraction level for Internet Scale Distributed Computing? • • Disk block? File? Database? Application? No too low. Yes, of course. – Blast search – Google search – Send/Get e. Mail – Portals that federate astronomy archives (http: //sky. Query. Net/) • Web Services (. NET, EJB, OGSA) give this abstraction level.
Access Grid • • • Q: What comes after the telephone? A: e. Mail? A: Instant messaging? Both seem retro technology: text & emotons. Access Grid could revolutionize human communication. • But, it needs a new idea. • Q: What comes after the telephone?
Distributed Computing Economics • • • Why is Seti@Home a great idea? Why is Napster a great deal? Why is the Computational Grid uneconomic When does computing on demand work? What is the “right” level of abstraction? Is the Access Grid the real killer app? Based on: Distributed Computing Economics, Jim Gray, Microsoft Tech report, March 2003, MSR-TR-2003 -24 http: //research. microsoft. com/research/pubs/view. aspx? tr_id=655
Agenda • Terra. Server – What it is – What we learned – What we are doing now. • Sky. Server / WWT – What it is – What we learned – What we are doing now • Grid Computing – General comments – Build a web service
8d7a28cacb3dfebb52ba4117847beae2.ppt