Managing Innovation: How Microsoft Research Works
Jim Gray, Distinguished Engineer, Microsoft Corporation

Actionable Ideas
Co-locate research with product groups if possible.
Adopt a “university model”.
Recruit from the top.
Recruit for passion and a desire to have impact.
Install a Research Program Management organization to orchestrate tech transfer.
Institute an annual TechFest.

Innovation: Build versus Buy versus Invest
Build: have in-house research (Bell Labs, IBM, GM, Pfizer, Merck, Microsoft…).
Buy: acquire startups or whole companies (IBM, Cisco, Intel, Microsoft, Pfizer, Merck…).
Invest: all boats rise with government research funding (IBM, Cisco, Intel, Microsoft, Pfizer, Merck…).
All three approaches are valid, and they complement one another.

Companies Are Different
[Pie charts: FY02 revenue breakdown into product cost, SG&A, R&D, gross margin, and other, for Intel, Accenture, Cisco, HP, Oracle, IBM, Microsoft, Dell, and EDS.]
Selected IT company FY02 R&D budgets:
Notice that R&D is correlated with margin.
IBM and HP have large service revenues, so their “real” R&D investment rate is higher.
Dell, Accenture, and EDS have modest R&D; they innovate in other ways.

Most R&D Is D
How to Do Basic Research in Industry?
Critical questions (from Rick Rashid):
How can I create and maintain a world-class research organization in an industrial setting?
How do I keep the lines of communication open between product teams and researchers?
How do I get new technology into products quickly?

Approach: Adapt the Academic Model
Organizational goal: advance the state of the art.
University organizational model: flat structure, critical-mass groups.
Open research environment: aggressive publication in peer-reviewed literature, frequent visitors, daily seminars.
Strong ties to university research: nearly 15% of the basic research budget is invested directly in universities (lab grants, research grants, fellowships, etc.); hundreds of interns and visitors.

Microsoft Research Today
Founded in 1991.
Staff of over 700 in over 55 areas.
Internationally recognized research teams.
Research lab locations: Redmond, Washington; San Francisco, California; Cambridge, United Kingdom; Beijing, People’s Republic of China; Mountain View, California.
[Pie chart of staff by lab location; slices labeled 75%, 10%, 5%, and 1%.]

Microsoft Research: Expanding the State of the Art
Thousands of peer-reviewed publications: 10% to 30% of the papers at our focus conferences (graphics, programming, systems, data management…).
Community leadership: professional societies, journals, conferences.
Mentoring: interns, hosting academic summers and sabbaticals, special workshops.

How To Build A Group
Identify a promising area.
Hire the leader (internal or external) and support her or him.
Build the team around the senior researcher.
Look for people who want to have impact and have passion for their ideas.
The same template works for whole labs: Cambridge, Beijing, Silicon Valley.

Keeping Open The Lines Of Communication To Product Teams
Co-location helps: 75% of staff are “on campus”.
A “How can I help?” attitude demonstrates willingness to “get dirty” to help a product succeed.
Product group spin-offs build strong ties: over time, a number of product groups evolved from research (e.g., Windows Media).
Researchers are involved in all corporate product reviews.

MSR Relationship To MS Products
Virtually every research group is actively engaged with product groups, e.g., Windows, Office, streaming media, SQL, Exchange, IIS, Commerce Server, Visual Studio, consumer products, MSN, etc.
Tech transfer covers ideas, code, people, contacts, and recruiting.

Focused Technology Transfer
Quickly getting technology into products:
A program management team with a sole focus on tech transfer.
Researchers on product “advisory” boards.
“Mind-swaps”: joint product/research off-sites.
Joint product/research teams, e.g., ClearType (Windows XP), Datamining (SQL 2000), Natural Language & Speech (Office), Tablet PC, Smart Personal Objects (SPOT).
Encourage and recognize contributions.

MSR TechFest
An internal open house for Microsoft Research, held annually since 2001.
~7,000 attendees; 170 demos; 26 lectures.
“Research in progress”: breadboard demos of research ideas and prototypes.
A great networking event: it breaks down barriers and creates serendipitous connections.

Examples Of Technology Transfer
Critical support technologies: memory optimization technology enabled the simultaneous ship of Win 95/Office 95; automated bug detection in Windows 2000.
Key technologies that drive products, e.g., MS Audio 4.0, ClearType, intelligent search, collaborative filtering, IntelliMirror, etc.
Incubated major products: Windows streaming media, Windows CE, Tablet PC, eBook, ecommerce, datamining, natural language and speech technologies, etc.

MSR Mission Statement
Expand the state of the art in each of the areas in which we do research.
Rapidly transfer innovative technologies into Microsoft products.
Ensure that Microsoft products have a future.

Personal Examples of R&D
Scalable servers: TerraServer, SkyServer.
Databases: Data Cube, Snapshot Isolation, SQL stress testing.
Reliable multicast.
Personal media management.

TerraServer & TerraService
TerraServer (http://terraserver-usa.com): USGS photo and topo maps; 16 TB of data; online since 1997; 7 billion pages served; 120 TB served. Shows scalability, availability, and manageability.
TerraService (http://terraservice.net): a .NET web service; OpenGIS; place search; TerraServer map server; landmarks & annotations; SQL + Windows layered on imagery. Used by thousands of real apps today. Shows web services performance.
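Serving map imagery from a database means every image tile needs a key. Here is a minimal sketch of one way to key tiled imagery, under several assumptions: fixed-size square tiles (200×200 pixels), projected coordinates in meters (such as UTM easting/northing), and a resolution pyramid where each level doubles the meters per pixel. The `TileKey` layout and the `tile_key`/`parent` helpers are hypothetical illustrations, not TerraServer's actual schema.

```python
from dataclasses import dataclass

TILE_PIXELS = 200            # assumed tile edge length in pixels
BASE_METERS_PER_PIXEL = 1.0  # assumed resolution of the finest pyramid level

@dataclass(frozen=True)
class TileKey:
    """Hypothetical primary key for one image tile stored as a database row."""
    theme: str  # e.g. "photo" or "topo"
    level: int  # 0 = finest; each level up doubles meters per pixel
    x: int      # tile column, counted from easting 0
    y: int      # tile row, counted from northing 0

def tile_key(theme: str, level: int, easting: float, northing: float) -> TileKey:
    """Map a projected coordinate (meters) to the tile that contains it."""
    meters_per_pixel = BASE_METERS_PER_PIXEL * (2 ** level)
    tile_meters = TILE_PIXELS * meters_per_pixel
    return TileKey(theme, level, int(easting // tile_meters), int(northing // tile_meters))

def parent(key: TileKey) -> TileKey:
    """The coarser tile covering this one: each parent covers 2x2 children."""
    return TileKey(key.theme, key.level + 1, key.x // 2, key.y // 2)
```

Because a tile key is just a few integers, a point lookup or a zoom-out is an indexed fetch of one row. No file-system path scheme is needed.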

TerraServer Today

TerraServer Tomorrow: Mirrored System versus SAN
3 mirrored DB servers + spare versus 4 DB servers.
Commodity versus enterprise: white-box dual Xeon versus 8-way branded.
DAS 250 GB SATA versus FC-SAN 73 GB SCSI.
No tape versus LTO tape robot.
$0.1M versus $1.8M.
Geoplex: 2 sites. You can afford 2!
KVM/IP.

World Wide Telescope (http://www.voforum.org/)
Premise: most astronomy data is online.
So the Internet is the world’s best telescope:
It has data on every part of the sky, in every measured spectral band, as deep as the best instruments.
It is up when you are up; the “seeing” is always great (no working at night, no clouds, no moons, no…).
It’s a smart telescope: it links objects and data to the literature on them.

Next-Generation Data Analysis
Looking for needles in haystacks: the Higgs particle.
Haystacks: dark matter, dark energy.
Needles are easier than haystacks.
Global statistics have poor scaling: correlation functions are N², likelihood techniques N³.
As data and computers grow at the same rate, we can only keep up with N log N.
A way out? Data is fuzzy, and answers are approximate.
This requires a combination of statistics and computer science.
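Here is the scaling point above made concrete. Counting pairs of points closer than a radius r is the core of a two-point correlation estimate, and done naively it takes N² comparisons. The minimal sketch below works in one dimension for simplicity and shows how sorting brings the same count down to N log N. Real 2-D or 3-D sky data would need a spatial index such as a k-d tree, which this sketch does not implement.

```python
import bisect

def count_close_pairs_naive(xs: list[float], r: float) -> int:
    """O(N^2): compare every pair directly."""
    n = len(xs)
    return sum(1 for i in range(n) for j in range(i + 1, n) if abs(xs[i] - xs[j]) < r)

def count_close_pairs_sorted(xs: list[float], r: float) -> int:
    """O(N log N): sort once, then binary-search each point's right-hand neighborhood."""
    s = sorted(xs)
    total = 0
    for i, x in enumerate(s):
        # Indices i+1 .. j-1 hold the points to the right of x with value < x + r.
        j = bisect.bisect_left(s, x + r, lo=i + 1)
        total += j - (i + 1)
    return total
```

For example, `count_close_pairs_sorted([0, 1, 3, 10], 2.5)` finds the two pairs (0, 1) and (1, 3). With floating-point inputs, the two functions can disagree on pairs whose separation is within rounding error of r.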

Data Federations Of Web Services
Massive datasets live near their owners: near the instrument’s software pipeline, near the applications, near data knowledge and curation.
Supercomputer centers become super data centers.
Each archive publishes a web service: the schema documents the data, and methods on objects are queries.
Scientists get “personalized” extracts.
Uniform access to multiple archives: a common global schema.
Challenge: federation. What is the object model for your science?

Web Services: The Key?
Web SERVER: given a URL + parameters, returns a web page (often dynamic).
Web SERVICE: given an XML document (SOAP message), returns an XML document.
Tools make this look like an RPC: F(x, y, z) returns (u, v, w).
Distributed objects for the web, plus naming, discovery, security…
Internet-scale distributed computing.
[Diagram: with a Web server, your program sends a request over http and gets back a web page; with a Web service, a SOAP message carries objects in XML, and the data arrives in your address space.]
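The claim that “tools make this look like an RPC” can be sketched without a network. A client stub serializes F(x, y, z) into an XML request document. The service parses it and returns an XML response, and the stub turns that response back into a tuple. The sketch uses Python's standard `xml.etree.ElementTree`. The element names (`call`, `arg`, `reply`, `result`) and the body of F are hypothetical. A real SOAP message would add an envelope, namespaces, and an HTTP transport.

```python
import xml.etree.ElementTree as ET

def make_request(method: str, *args: float) -> bytes:
    """Client stub: encode the call as an XML document."""
    call = ET.Element("call", name=method)
    for value in args:
        ET.SubElement(call, "arg").text = repr(value)
    return ET.tostring(call)

def service(request: bytes) -> bytes:
    """Service side: parse the XML request, run the method, reply with an XML document."""
    call = ET.fromstring(request)
    if call.get("name") != "F":
        raise ValueError(f"unknown method {call.get('name')!r}")
    x, y, z = (float(arg.text) for arg in call.findall("arg"))
    results = (x + y + z, x * y * z, max(x, y, z))  # hypothetical F returning (u, v, w)
    reply = ET.Element("reply")
    for value in results:
        ET.SubElement(reply, "result").text = repr(value)
    return ET.tostring(reply)

def call_remote(method: str, *args: float) -> tuple[float, ...]:
    """What the tools hide: marshal the arguments, 'send' the document, unmarshal the reply."""
    reply = ET.fromstring(service(make_request(method, *args)))
    return tuple(float(r.text) for r in reply.findall("result"))
```

To the caller, `call_remote("F", 1.0, 2.0, 3.0)` reads like a local function call that returns `(6.0, 6.0, 3.0)`. Only XML documents cross the boundary.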

Federating Astronomy Archives
A great test for data mining algorithms:
It is real and well-documented data.
High-dimensional data (with confidence intervals), spatial data, temporal data.
Many different instruments from many different places and many different times.
Federation is a goal.
There is a lot of it (petabytes).
It can be shared across companies and with university researchers.
[Figure: survey images at different wavelengths: IRAS 25 μm, 2MASS 2 μm, DSS optical, IRAS 100 μm, WENSS 92 cm, NVSS 20 cm, ROSAT ~keV, GB 6 cm.]

SkyServer: One Such Archive
SkyServer.SDSS.org: the Sloan Digital Sky Survey.
Pixels + data mining.
400 attributes per “object”.
Spectrograms for 1%.
Demo: pixel space, record space, set space, teaching.

SkyQuery: Federating Archives (http://skyquery.net/)
A distributed query tool using a set of web services.
Federates ten astronomy archives from Pasadena, Chicago, Baltimore, and Cambridge (England).
Implemented in C# and .NET.
Allows queries like:
  SELECT o.objId, o.r, o.type, t.objId
  FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
  WHERE XMATCH(o, t) < 3.5
    AND AREA(181.3, -0.76, 6.5)
    AND o.type = 3 AND (o.i - t.m_j) > 2
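The XMATCH clause pairs up detections of the same object in two archives. SkyQuery's actual XMATCH semantics are not reproduced here. As a simplified stand-in, this sketch matches objects whose great-circle separation is below a fixed tolerance, after restricting the first catalog to a circular AREA. Several details are assumptions for illustration: AREA's third argument is treated as a radius in arcminutes, XMATCH's threshold as arcseconds, and catalogs as lists of dicts keyed by the column names in the query.

```python
import math

def angular_sep_arcsec(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation of two (ra, dec) positions given in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def in_area(obj: dict, ra: float, dec: float, radius_arcmin: float) -> bool:
    """Stand-in for AREA(ra, dec, radius); the radius unit (arcminutes) is an assumption."""
    return angular_sep_arcsec(obj["ra"], obj["dec"], ra, dec) <= radius_arcmin * 60.0

def xmatch(cat_a: list[dict], cat_b: list[dict], tol_arcsec: float) -> list[tuple[dict, dict]]:
    """Naive positional cross-match: all pairs closer than tol_arcsec."""
    return [(a, b) for a in cat_a for b in cat_b
            if angular_sep_arcsec(a["ra"], a["dec"], b["ra"], b["dec"]) < tol_arcsec]

def example_query(sdss: list[dict], twomass: list[dict]) -> list[tuple]:
    """Hypothetical in-memory analogue of the SkyQuery example above."""
    area = [o for o in sdss if in_area(o, 181.3, -0.76, 6.5)]
    return [(o["objId"], o["r"], o["type"], t["objId"])
            for o, t in xmatch(area, twomass, 3.5)
            if o["type"] == 3 and o["i"] - t["m_j"] > 2]
```

The nested loop in `xmatch` is quadratic. A production matcher would first bucket objects by sky position, which is the kind of spatial indexing the archives themselves provide.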

SkyQuery Structure
Each SkyNode publishes a schema web service and a database web service.
The portal plans the query (2 phase), integrates the answers, and is itself a web service.
[Diagram: the SkyQuery Portal connected to SkyNodes INT, SDSS, 2MASS, and FIRST, plus an Image Cutout service.]
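The portal's two-phase plan can be sketched under one assumption. Phase 1 asks every node how many objects fall in the query region. Phase 2 then chains the cross-match from the node with the fewest candidates to the one with the most, so the intermediate result passed along stays small. The `SkyNode` class, its methods, and this ordering rule are hypothetical illustrations, not SkyQuery's documented algorithm. Positions are 1-D numbers to keep the sketch short.

```python
from dataclasses import dataclass

@dataclass
class SkyNode:
    """Hypothetical archive node: a name plus object positions (1-D for brevity)."""
    name: str
    positions: list[float]

    def in_region(self, lo: float, hi: float) -> list[float]:
        return [p for p in self.positions if lo <= p <= hi]

    def count(self, lo: float, hi: float) -> int:
        """Phase 1 (planning): report how many objects fall in the query region."""
        return len(self.in_region(lo, hi))

    def extend(self, candidates: list[tuple], lo: float, hi: float, tol: float) -> list[tuple]:
        """Phase 2 (execution): append a nearby local object to each candidate tuple."""
        local = self.in_region(lo, hi)
        return [c + (p,) for c in candidates for p in local if abs(p - c[0]) < tol]

def federated_query(nodes: list[SkyNode], lo: float, hi: float, tol: float):
    """Portal: plan with counts, then chain the cross-match starting at the smallest node."""
    plan = sorted(nodes, key=lambda n: n.count(lo, hi))   # phase 1
    first, *rest = plan
    candidates = [(p,) for p in first.in_region(lo, hi)]  # seed from the smallest node
    for node in rest:                                     # phase 2
        candidates = node.extend(candidates, lo, hi, tol)
    return [n.name for n in plan], candidates
```

Starting from the smallest node bounds the size of every intermediate candidate list by that node's count times the fan-out of each match.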

Databases: Theory to Practice
Data Cube: wrote the paper; the SQL Server product and the ISO standard adopted the idea.
Snapshot Isolation: paper in 1996, product in 2004.
[Diagram: a reader with old and new versions of the data.]
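The data cube generalizes GROUP BY. It computes the aggregate for every subset of the grouping columns and marks rolled-up columns with an “ALL” placeholder. SQL's GROUP BY CUBE returns the same groups, with NULL in place of ALL. Below is a minimal in-memory sketch over a hypothetical sales table. The `cube` helper is illustrative, not a SQL Server API.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # marker for a rolled-up dimension (SQL uses NULL instead)

def cube(rows: list[dict], dims: list[str], measure: str) -> dict[tuple, float]:
    """Sum `measure` over every subset of `dims`: 2**len(dims) group-bys in one result."""
    totals: dict[tuple, float] = defaultdict(float)
    for row in rows:
        for k in range(len(dims) + 1):
            for kept in combinations(dims, k):
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)

sales = [
    {"model": "Chevy", "year": 1990, "units": 5},
    {"model": "Chevy", "year": 1991, "units": 7},
    {"model": "Ford", "year": 1990, "units": 3},
]
```

`cube(sales, ["model", "year"], "units")` holds the grand total under `("ALL", "ALL")`, the per-model totals under `(model, "ALL")`, the per-year totals under `("ALL", year)`, and the detail rows. The grand total, subtotals, and details all come from one operator.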

Databases Stress Test
Generate millions of random SQL queries.
Send them to 4 different products.
Compare the answers: if all agree, good! If not, there is a bug somewhere.
Found many bugs in DB products.
Much appreciated by the MS DB group.
Tool cloned by other DB vendors.
[Diagram: answers from SQL Server, DB2, Oracle, and Informix compared for equality.]
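The stress test is an instance of differential testing. The same randomly generated query goes to several engines, and any engine that disagrees with the majority is suspect. The minimal sketch below assumes a tiny query language of integer arithmetic, so that SQLite (through Python's `sqlite3`) and Python's own evaluator can serve as two engines. The third engine is a deliberately broken, hypothetical one that ignores parentheses. The original tool generated full SQL statements for four commercial products; that generator is not reproduced here.

```python
import random
import sqlite3
from collections import Counter

def random_expr(rng: random.Random, depth: int = 0) -> str:
    """A random integer expression over +, -, * with small non-negative literals."""
    if depth >= 3 or rng.random() < 0.3:
        return str(rng.randint(0, 9))
    op = rng.choice(["+", "-", "*"])
    return f"({random_expr(rng, depth + 1)} {op} {random_expr(rng, depth + 1)})"

_conn = sqlite3.connect(":memory:")

def engine_sqlite(expr: str) -> int:
    return _conn.execute(f"SELECT {expr}").fetchone()[0]

def engine_python(expr: str) -> int:
    return eval(expr)  # acceptable here only because expr comes from random_expr

def engine_buggy(expr: str) -> int:
    # Simulated parser bug: parentheses are dropped, so operator precedence changes results.
    return eval(expr.replace("(", "").replace(")", ""))

def differential_test(engines: dict, n_queries: int, seed: int = 0) -> Counter:
    """Run every query on every engine; count how often each disagrees with the majority."""
    rng = random.Random(seed)
    suspects: Counter = Counter()
    for _ in range(n_queries):
        expr = random_expr(rng)
        answers = {name: run(expr) for name, run in engines.items()}
        majority = Counter(answers.values()).most_common(1)[0][0]
        for name, answer in answers.items():
            if answer != majority:
                suspects[name] += 1
    return suspects
```

With three engines, a single faulty one is outvoted. When answers split evenly, as can happen with four engines, voting alone cannot identify the faulty one, and each disagreement must be examined by hand.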

SQL Automated Test Example
Four SQL systems (W, X, Y, Z) on 2,000 statements.
[Venn diagram: number of statements on which each combination of W, X, Y, and Z agree, plus an Error region.]
All four agree: 84%.
W, X, and Y agree: 95%.
Problem with intermediate table.

PGM: Pretty Good Multicast
A reliable multicast protocol.
Scales using hierarchy, suppression, and “on-demand” forward error correction (FEC); FEC on-demand is our contribution.
Joint work with Cisco and others.
IETF standard.
Implemented a prototype (Multicast PowerPoint).
Shipped in Windows XP.
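Consider why on-demand FEC helps a multicast sender scale. Suppose receiver A lost packet 1 and receiver B lost packet 3 of the same transmission group. Retransmitting originals costs two packets, but one parity packet computed over the group repairs both losses. The minimal sketch below uses a single XOR parity packet, which can repair one loss per receiver per group. This is the simplest possible erasure code, not the code PGM itself specifies, and packets are assumed to be of equal length.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def parity(group: list[bytes]) -> bytes:
    """Sender: one parity packet = XOR of every packet in the transmission group."""
    return reduce(xor_bytes, group)

def repair(received: dict[int, bytes], group_size: int, parity_packet: bytes) -> dict[int, bytes]:
    """Receiver: rebuild the single missing packet from the survivors plus the parity."""
    missing = [i for i in range(group_size) if i not in received]
    if not missing:
        return dict(received)
    if len(missing) > 1:
        raise ValueError("one XOR parity packet can repair only one loss per group")
    rebuilt = reduce(xor_bytes, received.values(), parity_packet)
    return {**received, missing[0]: rebuilt}

group = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
p = parity(group)
receiver_a = {0: group[0], 2: group[2], 3: group[3]}  # lost packet 1
receiver_b = {0: group[0], 1: group[1], 2: group[2]}  # lost packet 3
```

Both `repair(receiver_a, 4, p)` and `repair(receiver_b, 4, p)` recover their missing packet from the same multicast parity packet. The sender answers NAKs from different receivers with one transmission instead of one per lost original.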

MyLifeBits: “A Lifetime Store of Everything”
The experiment: digitizing Gordon Bell’s life.
The software: based on SQL Server.
Tools to capture web pages, IM chats, TV, radio, and telephone.
Reports, links, full-text search, and pivoting by time or any other attribute.
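With every captured item stored as a database row, “pivot by time or any other attribute” becomes a GROUP BY. The minimal sketch below uses SQLite through Python's `sqlite3` in place of SQL Server. The `items` and `links` tables, their columns, and the sample rows are hypothetical. A LIKE filter stands in for real full-text search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id          INTEGER PRIMARY KEY,
        kind        TEXT NOT NULL,   -- e.g. 'web page', 'IM chat', 'phone call'
        captured_at TEXT NOT NULL,   -- ISO-8601 timestamp
        title       TEXT NOT NULL
    );
    CREATE TABLE links (             -- any item can point at any other item
        from_id INTEGER REFERENCES items(id),
        to_id   INTEGER REFERENCES items(id)
    );
""")
conn.executemany(
    "INSERT INTO items (kind, captured_at, title) VALUES (?, ?, ?)",
    [("web page", "2003-05-01T09:00", "SkyServer home page"),
     ("IM chat", "2003-05-01T10:30", "Chat about SkyServer"),
     ("phone call", "2003-06-12T15:00", "Call with a colleague")],
)
conn.execute("INSERT INTO links VALUES (2, 1)")  # the chat (id 2) refers to the web page (id 1)

def pivot(attribute_sql: str) -> list[tuple]:
    """Count items grouped by any SQL expression over the items table."""
    return conn.execute(
        f"SELECT {attribute_sql} AS bucket, COUNT(*) FROM items GROUP BY bucket ORDER BY bucket"
    ).fetchall()

def search(text: str) -> list[str]:
    """Stand-in for full-text search: substring match on titles, oldest first."""
    return [row[0] for row in conn.execute(
        "SELECT title FROM items WHERE title LIKE ? ORDER BY captured_at", (f"%{text}%",))]
```

`pivot("strftime('%Y-%m', captured_at)")` groups the items by month, and `pivot("kind")` groups them by capture type. The same helper works for any other column, which is the point of storing everything in one database.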

MyLifeBits Software
[Architecture diagram; components: radio capture tool, TV capture tool, Internet TV EPG download tool, browser tool, telephone capture tool, MyLifeBits store database, PocketPC transfer tool, radio EPG tool, MAPI interface, files, MyLifeBits Shell, legacy applications, voice annotation tool, text annotation tool, PocketRadio player, legacy email client.]

Research Failures
Not everything is a success.
We had technology transfer failures.
We had projects with little impact.
Success and failure depend on the environment: even if you have a GREAT! idea, there are many exogenous factors in technology transfer.
And sometimes the idea or focus is wrong.
Allow people to fail once or twice.

Summary: Actionable Ideas
Co-locate research with product groups if possible.
Adopt a “university model”.
Recruit from the top.
Recruit for passion and a desire to have impact.
Install a Research Program Management organization to orchestrate tech transfer.
Institute an annual TechFest.

