
- Number of slides: 60
The Future of Distributed Systems
Jim Gray, Researcher, Microsoft Corp.

Outline
- Global forces
  - Moore's, Metcalfe's, Bell's, Bill's, Andy's laws
- Micro dollars per transaction
  - Cyber-content is key value because distribution costs go to zero
- Distributed systems
  - Concepts and terms
  - Key software technologies: objects, transactions

Metcalfe's Law
Network utility = Users^2
- How many connections can it make?
  - 1 user: no utility
  - 100,000 users: a few contacts
  - 1 million users: many on Net
  - 1 billion users: everyone on Net
- That is why the Internet is so "hot"
  - Exponential benefit

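A minimal sketch of the arithmetic behind this slide, assuming utility is simply the square of the user count as stated. The helper `possible_connections` is an addition for illustration: it counts distinct user pairs, which grows at the same quadratic rate.

```python
def network_utility(users: int) -> int:
    """Utility as stated on the slide: the square of the user count."""
    return users ** 2


def possible_connections(users: int) -> int:
    """Distinct user-to-user pairs, n * (n - 1) / 2 (illustrative helper)."""
    return users * (users - 1) // 2


# User counts taken from the slide.
for n in (1, 100_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} users: utility {network_utility(n):.1e}, "
          f"pairs {possible_connections(n):.1e}")
```

A single user yields zero possible connections, which matches the "no utility" entry.
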
Moore's First Law
- XXX doubles every 18 months (60% increase per year)
  - Microprocessor speeds
  - Chip density
  - Magnetic disk density
  - Communications bandwidth (WAN bandwidth approaching LAN speeds)
- Exponential growth:
  - The past does not matter
  - 10x here, 10x there, soon you're talking REAL change
- PC costs decline faster than any other platform
  - Volume and learning curves
  - PCs will be the building bricks of all future systems
[Chart: one-chip memory size (2 MB to 32 MB), 1970 to 2000; bits per chip: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M; size axis from 128 KB to 1 GB]

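The slide equates "doubles every 18 months" with roughly 60% growth per year. The sketch below checks that conversion by compounding; the function names are illustrative and not from the slides.

```python
def annual_growth(doubling_months: float) -> float:
    """Yearly growth factor for a quantity that doubles every `doubling_months`."""
    return 2 ** (12 / doubling_months)


def growth_over(years: float, doubling_months: float = 18) -> float:
    """Total growth factor after `years` of compounding."""
    return annual_growth(doubling_months) ** years


rate = annual_growth(18)
print(f"annual factor: {rate:.3f} (about {100 * (rate - 1):.0f}% per year)")
print(f"growth over 10 years: about {growth_over(10):.0f}x")
```

The exact yearly factor is 2^(2/3), just under 1.6, so the slide's 60% is a rounded figure.
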
Bumps In The Moore's Law Road
- DRAM:
  - 1988: United States anti-dumping rules
  - 1993-1995: ? price flat
- Magnetic disk:
  - 1965-1989: 10x/decade
  - 1989-1996: 4x/3 years! 100x/decade
[Charts: $/MB of DRAM and $/MB of disk, 1970 to 2000]

Gordon Bell's Seven Price Tiers
- $10: wristwatch computers
- $100: pocket/palm computers
- $1,000: portable computers
- $10,000: personal computers (desktop)
- $100,000: departmental computers (closet)
- $1,000,000: site computers (glass house)
- $10,000,000: regional computers (glass castle)
Super server: costs more than $100,000
"Mainframe": costs more than $1 million
Must be an array of processors, disks, tapes, comm ports

Bell's Evolution Of Computer Classes
Technology enables two evolutionary paths:
1. constant performance, decreasing cost
2. constant price, increasing performance
[Figure: log price versus time for mainframes (central), minis (dep't.), WSs, and PCs (personals)]
- 1.26x per year = 2x/3 yrs = 10x/decade; 1/1.26 = 0.8
- 1.6x per year = 4x/3 yrs = 100x/decade; 1/1.6 = 0.62

Software Economics
- An engineer costs about $150,000/year
- R&D gets [5%…15%] of budget
- Need [$3 million…$1 million] revenue per engineer

Company    Revenue      Profit  Tax  R&D  SG&A  P&S (Product and Service)
Intel      $16 billion  22%     12%  8%   11%   47%
Microsoft  $9 billion   24%     13%  16%  34%   13%
IBM        $72 billion  6%      5%   8%   22%   59%
Oracle     $3 billion   15%     7%   9%   43%   26%

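The revenue-per-engineer range follows from dividing the engineer's cost by the R&D share of the budget. A short check, using only the figures on the slide (the function name is illustrative):

```python
def revenue_per_engineer(engineer_cost: float, rd_share: float) -> float:
    """Revenue needed so that `rd_share` of it pays for one engineer."""
    return engineer_cost / rd_share


ENGINEER_COST = 150_000  # dollars per year, from the slide

for share in (0.05, 0.15):  # R&D share of budget, from the slide
    needed = revenue_per_engineer(ENGINEER_COST, share)
    print(f"R&D at {share:.0%}: ${needed:,.0f} revenue per engineer")
```

A 5% R&D share needs about $3 million of revenue per engineer and a 15% share needs about $1 million, which is why the slide lists the revenue range in the opposite order from the percentage range.
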
Software Economics: Bill's Law
Price = Fixed_Cost / Units + Marginal_Cost
- Bill Joy's law (Sun): don't write software for less than 100,000 platforms @ $10 million engineering expense, $1,000 price
- Bill Gates's law: don't write software for less than 1,000,000 platforms @ $10 million engineering expense, $100 price
- Examples:
  - UNIX versus Windows NT: $3,500 versus $500
  - Oracle versus SQL-Server: $100,000 versus $6,000
  - No spreadsheet or presentation pack on UNIX/VMS/...
- Commoditization of base software and hardware

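A small sketch of the pricing formula, price = fixed cost / units + marginal cost. It uses the $10 million engineering expense and 100,000-platform volume quoted for Bill Joy's law. The other volumes and the zero marginal cost are assumptions chosen only to show how the fixed-cost share moves with volume.

```python
def unit_price(fixed_cost: float, units: int, marginal_cost: float = 0.0) -> float:
    """Break-even price: fixed cost spread over all units, plus per-unit cost."""
    return fixed_cost / units + marginal_cost


ENGINEERING_EXPENSE = 10_000_000  # dollars, from the slide

# 100,000 is the slide's volume; the other two are illustrative.
for units in (10_000, 100_000, 1_000_000):
    share = unit_price(ENGINEERING_EXPENSE, units)
    print(f"{units:>9,} units: ${share:,.0f} of engineering cost per unit")
```

Each tenfold increase in volume cuts the fixed-cost share of the price by ten, which is the commoditization argument in numeric form.
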
Gordon Bell's Platform Economics
- Traditional computers: custom or semi-custom, high-tech and high-touch
- New computers: high-tech and no-touch
[Chart: price (K$), volume (K), and application price by computer type (mainframe, WS, browser), log scale from 0.01 to 100,000]

Grove's Law: The New Computer Industry
- Horizontal integration is new structure
- Each layer picks best from lower layer
- Desktop (C/S) market
  - 1991: 50%
  - 1995: 75%

Function         Example
Operation        AT&T
Integration      EDS
Applications     SAP
Middleware       Oracle
Baseware         Microsoft
Systems          Compaq
Silicon & Oxide  Intel & Seagate

1987: 256 tps Benchmark
- 14 M$ computer (Tandem)
- A dozen people
- False floor, 2 rooms of machines
- A 32 node processor array
- A 40 GB disk array (80 drives)
- Simulate 25,600 clients
- Staff: admin expert, hardware experts, network expert, manager, performance expert, DB expert, auditor, OS expert

1997: 10 Years Later
1 person and 1 box = 1250 tps
- 1 breadbox ~ 5x 1987 machine room
- 23 GB is hand-held
- One person does all the work: hardware expert, OS expert, net expert, DB expert, app expert
- Cost/tps is 1,000x less: 1 micro dollar per transaction
- Hardware: 4 x 200 MHz CPU, 1/2 GB DRAM, 12 x 4 GB disk, 3 x 7 x 4 GB disk arrays

What Happened?
- Moore's law: things get 4x better every 3 years (applies to computers, storage, and networks)
- New economics: commodity

  Class          Price/mips ($/mips)  Software (k$/year)
  mainframe      10,000               100
  minicomputer   100                  10
  microcomputer  10                   1

- GUI: human-computer tradeoff; optimize for people, not computers
[Figure: price versus time for mainframe, mini, and micro]

What Happens Next
- Last 10 years: 1000x improvement
- Next 10 years: ??
- Today: text and image servers are free
  - 1 m$/hit cost; 70,000 m$/hit advertising revenue
  - Advertising pays for them
  - Content is only "real" expense
- "You ain't seen nothing yet!"
[Figure: performance versus time, 1985, 1995, 2005]

Kinds Of Information Processing

                         Point-to-point       Broadcast
Immediate (network)      Conversation, Money  Lecture, Concert
Time-shifted (database)  Mail                 Book, Newspaper

- It's ALL going electronic
- Immediate is being stored for analysis (so ALL database)
- Analysis and automatic processing are being added

Why Put Everything In Cyberspace?
- Low rent: min $/byte
- Shrinks time: now or later
- Shrinks space: here or there
- Automate processing: knowbots
[Figure: Network (point-to-point OR broadcast, immediate OR time-delayed) and Database (locate, process, analyze, summarize)]

Billions Of Clients
- Every device will be "intelligent"
- Doors, rooms, cars…
- Computing will be ubiquitous

Billions Of Clients Need Millions Of Servers
- All clients networked to servers
  - May be nomadic or on-demand
- Fast clients want faster servers
- Servers provide
  - Shared data
  - Control
  - Coordination
  - Communication
[Figure: mobile clients and fixed clients connected to a server and a super server]

Thesis: Many Little Beat Few Big
[Figure: processor classes by price ($1 million mainframe, $100 K mini, $10 K micro, nano, pico processor); storage capacities of 1 MB, 100 MB, 10 GB, 1 TB, 100 TB; access times of 10 pico-second RAM, 10 nano-second RAM, 10 microsecond RAM, 10 millisecond disc, 10 second tape archive; disk form factors 14", 9", 5.25", 3.5", 2.5", 1.8"]
- Smoking, hairy golf ball
- How to connect the many little parts?
- How to program the many little parts?
- Fault tolerance?
- 1 M SPECmarks, 1 TFLOP
- 10^6 clocks to bulk RAM
- Event-horizon on chip
- VM reincarnated
- Multiprogram cache, on-chip SMP

Future Super Server: 4T Machine
- Array of 1,000 4B machines
  - 1 Bips processors
  - 1 BB DRAM
  - 10 BB disks
  - 1 Bbps comm lines
  - 1 TB tape robot
- A few megabucks
- Challenge:
  - Manageability
  - Programmability
  - Security
  - Availability
  - Scaleability
  - Affordability
- As easy as a single system
[Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB disc]
Future servers are CLUSTERS of processors, discs
Distributed database techniques make clusters work

The Hardware Is In Place… And then a miracle occurs?
- SNAP: scaleable network and platforms
- Commodity-distributed OS built on:
  - Commodity platforms
  - Commodity network interconnect
- Enables parallel applications

Outline: Concepts and Terminology
- Why distributed
- Distributed data & objects
- Distributed execution
- Three tier architectures
- Transaction concepts

What's a Distributed System?
- Centralized:
  - everything in one place
  - stand-alone PC or mainframe
- Distributed:
  - some parts remote
  - distributed users
  - distributed execution
  - distributed data

Why Distribute?
- No best organization
- Companies constantly swing between
  - Centralized: focus, control, economy
  - Decentralized: adaptive, responsive, competitive
- Why distribute?
  - reflect organization or application structure
  - empower users / producers
  - improve service (response / availability)
  - distribute load
  - use PC technology (economics)

What Should Be Distributed?
- Users and user interface
  - Thin client
- Processing
  - Trim client
- Data
  - Fat client
- Will discuss tradeoffs later
[Figure: presentation, workflow, business objects, database]

Transparency in Distributed Systems
- Make distributed system as easy to use and manage as a centralized system
- Give a single-system image
- Location transparency:
  - hide fact that object is remote
  - hide fact that object has moved
  - hide fact that object is partitioned or replicated
- Name doesn't change if object is replicated, partitioned or moved.

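A minimal sketch of location transparency, assuming a hypothetical in-process `Directory` that stands in for a real name service. The class names, node ids, and the object name `orders/counter` are all invented for illustration. The point is that the caller uses the same name before and after the object moves.

```python
class Counter:
    """A trivial object whose location we want to hide."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value


class Directory:
    """Hypothetical name service: maps a stable name to the node holding the object."""

    def __init__(self):
        self.nodes = {}      # node id -> {name: object}
        self.location = {}   # name -> node id

    def place(self, name, obj, node):
        self.nodes.setdefault(node, {})[name] = obj
        self.location[name] = node

    def move(self, name, new_node):
        obj = self.nodes[self.location[name]].pop(name)
        self.place(name, obj, new_node)

    def call(self, name, method, *args):
        # The caller supplies only the name; the directory finds the node.
        obj = self.nodes[self.location[name]][name]
        return getattr(obj, method)(*args)


directory = Directory()
directory.place("orders/counter", Counter(), node="node-a")
first = directory.call("orders/counter", "increment")
directory.move("orders/counter", "node-b")   # the object moves
second = directory.call("orders/counter", "increment")  # same name still works
print(first, second, directory.location["orders/counter"])
```

Partitioning and replication would need more machinery than this, but the contract is the same: callers hold names, not locations.
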
Outline: Concepts and Terminology
- Why distribute
- Distributed data & objects
  - Partitioned
  - Replicated
- Distributed execution
  - remote procedure call
  - queues
- Three tier architectures
- Transaction concepts

Distributed Execution: Threads and Messages
- Thread is execution unit (software analog of cpu+memory)
- Threads execute at a node
- Threads communicate via
  - Shared memory (local)
  - Messages (local and remote)
[Figure: threads communicating through shared memory and messages]

Peer-to-Peer or Client-Server
- Peer-to-peer is symmetric:
  - Either side can send
- Client-server
  - client sends requests
  - server sends responses
  - simple subset of peer-to-peer
[Figure: request and response]

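A minimal sketch of the client-server pattern using two threads that communicate only by messages. Python's standard `queue.Queue` stands in for the network here. The upper-casing "service" and the `None` shutdown message are assumptions made for the example.

```python
import queue
import threading

requests = queue.Queue()  # messages travelling from clients to the server


def server():
    """Server thread: receive a request, send back a response."""
    while True:
        message = requests.get()
        if message is None:          # assumed shutdown signal
            break
        payload, reply_to = message
        reply_to.put(payload.upper())


def client(text):
    """Client: send one request and wait for its response."""
    reply_box = queue.Queue()
    requests.put((text, reply_box))
    return reply_box.get(timeout=5)


server_thread = threading.Thread(target=server)
server_thread.start()
replies = [client("hello"), client("world")]
requests.put(None)
server_thread.join()
print(replies)
```

Only the client initiates a send in this sketch, which is what makes it the "simple subset" of peer-to-peer.
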
Remote Procedure Call: The key to transparency
- Object may be local or remote
- Methods on object work wherever it is.
- Local invocation: y = pObj->f(x);
[Figure: the caller passes x to f(), f() executes "return val;", and the caller assigns y = val]

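Remote Procedure Call: The key to transparency
- Remote invocation: y = pObj->f(x);
[Figure: the caller's proxy checks whether Obj is local; if it is not, the proxy marshals x and sends it to the stub, which unmarshals x and calls pObj->f(x) on the real object; f() returns val, which is marshaled, sent back, and unmarshaled so the caller assigns y = val]
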
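A minimal sketch of the proxy and stub roles in a remote invocation. It makes several assumptions: JSON as the wire format, a direct function call standing in for the network, and a hypothetical `Calculator` object whose `f` doubles its argument. None of this reflects how DCOM or CORBA marshal data.

```python
import json


class Calculator:
    """The real object, living on the 'server' side."""

    def f(self, x):
        return x * 2


class Stub:
    """Server side: unmarshal the request, call the object, marshal the result."""

    def __init__(self, obj):
        self.obj = obj

    def handle(self, wire_request: bytes) -> bytes:
        request = json.loads(wire_request)
        result = getattr(self.obj, request["method"])(*request["args"])
        return json.dumps({"result": result}).encode()


class Proxy:
    """Client side: looks like the object, but marshals each call onto the wire."""

    def __init__(self, transport):
        self._transport = transport  # callable taking and returning bytes

    def __getattr__(self, method):
        def remote_call(*args):
            wire_request = json.dumps({"method": method, "args": args}).encode()
            wire_reply = self._transport(wire_request)
            return json.loads(wire_reply)["result"]
        return remote_call


stub = Stub(Calculator())
p_obj = Proxy(stub.handle)   # stand-in for a network connection
y = p_obj.f(21)              # reads like a local call
print(y)
```

The calling line has the same shape as the local case, which is the transparency the slide describes.
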
Object Request Broker (ORB) Orchestrates RPC
- Registers servers
- Manages pools of servers
- Connects clients to servers
- Does naming, request-level authorization
- Provides transaction coordination (new feature)
- Old names:
  - Transaction Processing Monitor
  - Web server
  - NetWare
[Figure: Transaction, Object-Request Broker]

History and Alphabet Soup
[Timeline figure, 1985 to 1995: X/Open, UNIX International, Open Software Foundation (OSF), Object Management Group (OMG), CORBA, Solaris, Open Group, OSF DCE; component labels: ODBC, XA/TX, DCE, RPC, GUIDs, IDL, DNS, Kerberos, COM, NT]
- Microsoft DCOM is based on OSF-DCE technology
- DCOM and ActiveX extend it

ActiveX and COM
- COM is Microsoft model, engine inside OLE
- ALL Microsoft software is based on COM (ActiveX)
- CORBA + OpenDoc is equivalent
- Heated debate over which is best
- Both share same key goals:
  - Encapsulation: hide implementation
  - Polymorphism: generic operations, key to GUI and reuse
  - Versioning: allow upgrades
  - Transparency: local/remote
  - Security: invocation can be remote
  - Shrink-wrap: minimal inheritance
  - Automation: easy
- COM now managed by the Open Group

Linking And Embedding
Objects are data modules; transactions are execution modules
- Link: pointer to object somewhere else
  - Think URL in Internet
- Embed: bytes are here
- Objects may be active; can callback to subscribers

Bottom Line Re ORBs
- Microsoft promises Cairo: distributed objects, secure, transparent, fast invocation
- Netscape promises CORBA
- Both will deliver
- Customers can pick the best one
[Figure: Transaction, Object-Request Broker]

Outline: Concepts and Terminology
- Why distributed
- Distributed data & objects
- Distributed execution
  - remote procedure call
  - queues
- Three tier architectures
  - what
  - why
- Transaction concepts

Work Distribution Spectrum
- Presentation and plug-ins
- Workflow manages session & invokes objects
- Business objects
- Database
[Figure: the presentation, workflow, business objects, and database layers split between client and server, on a spectrum from fat to thin]

Transaction Processing Evolution to Three Tier
Intelligence migrated to clients.
- Mainframe batch processing (centralized)
- Dumb terminals & remote job entry
- Intelligent terminals, database backends
- Workflow systems, Object Request Brokers, application generators
[Figure: mainframe with cards, green screen 3270, TP Monitor; server with ORB, Active]

Web Evolution to Three Tier
Intelligence migrated to clients (like TP)
- Character-mode clients, smart servers
- GUI browsers - Web file servers
- GUI plugins - Web dispatchers - CGI
- Smart clients - Web dispatcher (ORB), pools of app servers (ISAPI, Viper), workflow scripts at client & server
[Figure: Web server with WAIS, archie, gopher, green screen, Mosaic, NS & IE, Active]

PC Evolution to Three Tier
Intelligence migrated to server
- Stand-alone PC (centralized)
- PC + file & print server: message per I/O
- PC + database server: message per SQL statement
- PC + app server: message per transaction
- ActiveX client, ORB, ActiveX server, Xscript
[Figure: IO request and reply, disk I/O, SQL statement, transaction]

The Pattern: Three Tier Computing
- Clients do presentation, gather input
- Clients do some workflow (Xscript)
- Clients send high-level requests to ORB (Object Request Broker)
- ORB dispatches workflows and business objects -- proxies for client, orchestrate flows & queues
- Server-side workflow scripts call on distributed business objects to execute task
[Figure: presentation, workflow, business objects, database]

The Three Tiers
- Web client: HTML, VB and Java plug-ins, VBScript, JavaScript; VB or Java script engine; VB or Java virtual machine
- Middleware: ORB, TP Monitor, Web server...; object server pool; reached over the Internet via HTTP + DCOM
- Object & data server: DCOM (OLE DB, ODBC, ...); LU 6.2 gateways to IBM legacy systems

Why Did Everyone Go To Three-Tier?
- Manageability
  - Business rules must be with data
  - Middleware operations tools
- Performance (scaleability)
  - Server resources are precious
  - ORB dispatches requests to server pools
- Technology & physics
  - Put UI processing near user
  - Put shared data processing near shared data
[Figure: presentation, workflow, business objects, database]

Why Put Business Objects at Server?
DAD's Raw Data:
- Customer comes to store
- Takes what he wants
- Fills out invoice
- Leaves money for goods
- Easy to build
- No clerks
MOM's Business Objects:
- Customer comes to store with list
- Gives list to clerk
- Clerk gets goods, makes invoice
- Customer pays clerk, gets goods
- Easy to manage
- Clerks control access
- Encapsulation

What Middleware Does
ORB, TP Monitor, Workflow Mgr, Web Server
- Registers transaction programs: workflow and business objects (DLLs)
- Pre-allocates server pools
- Provides server execution environment
- Dynamically checks authority (request-level security)
- Does parameter binding
- Dispatches requests to servers
  - parameter binding
  - load balancing
- Provides queues
- Operator interface

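A minimal sketch of three of the middleware duties listed: registering programs, checking authority per request, and dispatching with parameter binding. The `Middleware` class, the `new_order` business object, and the user names are hypothetical. Server pools, queues, and load balancing are left out.

```python
class Middleware:
    """Hypothetical request broker: registers programs, checks authority, dispatches."""

    def __init__(self):
        self._programs = {}   # request name -> callable business object
        self._allowed = {}    # request name -> users authorized to call it

    def register(self, name, program, allowed_users):
        self._programs[name] = program
        self._allowed[name] = set(allowed_users)

    def dispatch(self, user, name, **params):
        # Request-level security: checked on every call, not once per session.
        if user not in self._allowed.get(name, ()):
            raise PermissionError(f"{user} may not call {name}")
        # Parameter binding: keyword parameters are bound to the program's arguments.
        return self._programs[name](**params)


def new_order(item, quantity):
    """Illustrative business object."""
    return {"item": item, "quantity": quantity, "status": "accepted"}


broker = Middleware()
broker.register("new_order", new_order, allowed_users={"clerk"})
order = broker.dispatch("clerk", "new_order", item="widget", quantity=3)

try:
    broker.dispatch("guest", "new_order", item="widget", quantity=1)
    denied = False
except PermissionError:
    denied = True

print(order, denied)
```

An unregistered request name is rejected by the same authority check, so callers cannot probe for programs they are not allowed to run.
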
Server Side Objects: Easy Server-Side Execution
- A server ORB gives simple execution environment
- Object gets
  - start
  - invoke
  - shutdown
- Everything else is automatic
- Drag & drop business objects
[Figure: services the ORB provides around the service logic: network, receiver, queue, connections, context, security, thread pool, configuration, management, synchronization, shared data]

A new programming paradigm
- Develop objects on the desktop
- Better yet: download them from the Net
- Script work flows as method invocations
- All on desktop
- Then, move work flows and objects to server(s)
- Gives
  - desktop development
  - three-tier deployment
  - Software Cyberbricks

Why Server Pools?
- Server resources are precious. Clients have 100x more power than server.
- Pre-allocate everything on server
  - preallocate memory
  - pre-open files
  - pre-allocate threads
  - pre-open and authenticate clients
- Keep high duty-cycle on objects (re-use them)
  - Pool threads, not one per client
- N clients x N servers x F files = N x N x F file opens!!!
- Classic example: TPC-C benchmark
  - 2 processes
  - everything pre-allocated
[Figure: 7,000 IE clients connect over HTTP to IIS, which uses a pool of DBC links to SQL]

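A minimal sketch of "pool threads, not one per client", using Python's standard `concurrent.futures.ThreadPoolExecutor`. The pool size and client count are illustrative. Unlike the pre-allocation the slide recommends, this executor creates its worker threads on demand up to the maximum, so it shows the reuse aspect only.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4   # illustrative: server threads available
CLIENTS = 100   # illustrative: client requests to serve


def handle(client_id):
    """Serve one request and report which pooled thread did the work."""
    return client_id, threading.current_thread().name


with ThreadPoolExecutor(max_workers=POOL_SIZE, thread_name_prefix="server") as pool:
    results = list(pool.map(handle, range(CLIENTS)))

threads_used = {name for _, name in results}
print(f"{len(results)} requests served by {len(threads_used)} threads")
```

However many clients arrive, the number of server threads never exceeds the pool size, so each thread stays busy and is reused.
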
Classic Three-Tier Example: TPC-C
- Transaction Processing Performance Council (TPC): standard performance benchmarks
- 5 transaction types
  - order entry, payment, status (OLTP)
  - delivery (mini-batch)
  - restock (mini-DSS)
- Metrics: throughput, price/performance
- Shows best practices:
  - everyone three tier
  - 2 processes at server
  - everything pre-allocated
[Figure: 7,000 Web clients connect over HTTP to IIS (Web), which uses a pool of DBC links over ODBC to SQL]

Outline
- Laws & micro$/transaction
- Distributed systems
  - Why distributed
  - Distributed data & objects
  - Distributed execution
  - Three tier architectures
    - why: manageability & performance
    - what: server side workflows & objects
  - Transaction concepts
    - Why transactions?
    - Using transactions

Thesis
- Transactions are key to structuring distributed applications
- ACID properties ease exception handling
  - Atomic: all or nothing
  - Consistent: state transformation
  - Isolated: no concurrency anomalies
  - Durable: committed transaction effects persist

What Is A Transaction?
- Programmer's view:
  - Bracket a collection of actions
- A simple failure model
  - Only two outcomes:
    - Begin(), actions, Commit(): Success!
    - Begin(), actions, Rollback(): Failure!
    - Begin(), actions, Fail!: Failure!

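A minimal sketch of the two outcomes, using Python's standard `sqlite3` module with an in-memory database. The `account` table, the names, and the `transfer` function are invented for illustration. The CHECK constraint makes an overdraft fail partway through, so the rollback has something to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Bracket two actions; either both take effect or neither does."""
    try:
        # sqlite3 opens a transaction implicitly before the first UPDATE.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        conn.commit()        # Success!
        return True
    except sqlite3.IntegrityError:
        conn.rollback()      # Failure! The credit to dst is undone as well.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
failed = transfer(conn, "alice", "bob", 500)   # would overdraw alice
after_failed = balances(conn)
print(ok, after_ok, failed, after_failed)
```

The failed transfer leaves both balances exactly as they were, so the caller handles one failure case rather than one per action.
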
Why ACID For Client/Server And Distributed
- ACID is important for centralized systems
- Failures in centralized systems are simpler
- In distributed systems:
  - More and more-independent failures
  - ACID is harder to implement
- That makes it even MORE IMPORTANT
  - Simple failure model
  - Simple repair model

Outline
- Why distributed
- Distributed data & objects
- Distributed execution
- Three tier architectures
- Transaction concepts
  - Why transactions?
  - Using transactions
    - programming
    - workflow

References
- Essential Client/Server Survival Guide, 2nd ed. Orfali, Harkey & Edwards, J. Wiley, 1996
- Client/Server Programming with Java and CORBA. Orfali, Harkey, J. Wiley, 1997
- Principles of Transaction Processing. Bernstein & Newcomer, Morgan Kaufmann, 1997
- Transaction Processing Concepts and Techniques. Gray & Reuter, Morgan Kaufmann, 1993