

  • Number of slides: 41

ADSaT
Scalability & Availability
Paul Greenfield
CSIRO
Advanced Distributed Software Architectures and Technology group

Building Real Systems
• Scalable
  – Fast enough to handle expected load
  – Grow easily when load grows
• Available
  – Available enough of the time
• Performance and availability cost
  – Aim for ‘enough’ of each but not more

Scalable
• Scale-up
  – Bigger and faster systems
• Scale-out
  – Systems working together to handle load
  – Server farms
  – Clusters
• Implications for application design

Available
• Goal is 100% availability
  – 24 x 7 operations
• Redundancy is the key
  – No single points of failure
  – Spare everything
    • Disks, disk channels, processors, power supplies, fans, memory, ...
• Automated fail-over and recovery

Performance
• How fast is this system?
  – Not the same as scalability but related
    • Scalability is concerned with the limits to possible performance
  – Measured by response time and throughput
  – Aim for enough performance
    • Have a performance target
    • Tune and add hardware until target hit
    • Then worry about tomorrow…

Performance Measures
• Response time
  – What delay does the user see?
  – Instantaneous is good but 95% under 2 seconds is acceptable
  – Response time varies with ‘heaviness’ of transactions
    • Fast read-only transactions
    • Slower update transactions
    • Effects of database contention
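The "95% under 2 seconds" rule can be checked mechanically once response times have been measured. The sketch below is only an illustration: the sample values are invented, the function names are hypothetical, and it uses the nearest-rank definition of a percentile, which is one of several definitions in use.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_target(samples, pct=95, limit_seconds=2.0):
    """True if pct percent of the samples finish within limit_seconds."""
    return percentile(samples, pct) <= limit_seconds


# Invented response times in seconds: many fast read-only transactions,
# some slower updates, and a few slowed down by database contention.
response_times = [0.2] * 90 + [1.5] * 6 + [3.0] * 4

print(percentile(response_times, 95))
print(meets_target(response_times))
```

With these invented samples the 95th percentile falls on one of the update transactions, so the target is met even though the slowest 4% of requests take 3 seconds.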

Response Times
[Three chart slides; figures not preserved]

Throughput
• How many transactions can be handled in some period of time
  – Transactions/second or tpm, tph or tpd
  – A measure of overall capacity
• Transaction Processing Performance Council (TPC)
  – Standard benchmarks for TP systems
  – TPC-C for typical transaction system
  – www.tpc.org
  – Current record is 227,000 tpmC

Throughput
• Throughput increases until some resource limit is hit
  – Adding more clients just increases the response time
  – Run out of processor, disk bandwidth, network bandwidth
  – Some resources overload badly
    • Ethernet network performance degrades

Throughput
[Chart slide; figure not preserved]

System Capacity
• How many clients can you support?
  – Name an acceptable response time
  – Average 95% under 2 secs is common
    • And what is ‘average’?
  – Plot response time vs # of clients
• Great if you can run benchmarks
  – Reason for prototyping and proving proposed architectures before leaping into full-scale implementation
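Given benchmark runs at several client counts, the capacity question reduces to reading the response-time curve against the chosen target. The following is a minimal sketch under stated assumptions: the measurements are invented, the function name is hypothetical, and "supported" is taken to mean that this load and every lighter measured load met the target.

```python
def max_clients_within_target(measurements, target_seconds=2.0):
    """measurements: (clients, p95_response_seconds) pairs from benchmark runs.

    Returns the largest measured client count for which that load and all
    lighter measured loads met the target, or None if even the lightest
    load missed it."""
    supported = None
    for clients, p95 in sorted(measurements):
        if p95 > target_seconds:
            break  # the curve has crossed the target; stop here
        supported = clients
    return supported


# Invented benchmark results: response time climbs steeply once some
# resource limit is hit.
benchmark = [(50, 0.4), (100, 0.5), (200, 0.9), (400, 1.8), (800, 4.6)]

print(max_clients_within_target(benchmark))
```

The answer is only as fine-grained as the benchmark points, so more runs around the knee of the curve give a tighter capacity estimate.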

System Capacity
[Chart slide; figure not preserved]

Load Balancing I
• A few different but related meanings
• 1. Balancing across server processes
  – CORBA-style where clients use objects that live inside server processes
  – Want all server processes to be busy
  – Client calls have to go to the process containing their object, even if this process is busy and others are idle

Load Balancing I
[Diagram slide; figure not preserved]

Load Balancing I
• Client calls on name server to find the location of a suitable server
• Name server can spread client objects across multiple servers
  – Often ‘round robin’
• Client is bound to server and stays bound forever
  – Can lead to performance problems

Load Balancing I

Initial

Server object reference   Client numbers                   Total clients per server object
1                         1-100                            100
2                         101-200                          100
3                         201-300                          100
4                         301-400                          100
5                         401-500                          100

Later

Server object reference   Client numbers                   Total clients per server object
1                         1-100, 201, 206, 211, …, 496     160
2                         101-200, 202, 207, 212, …, 497   160
3                         203, 208, 213, …, 498            60
4                         204, 209, 214, …, 499            60
5                         205, 210, 215, …, 500            60
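One scenario that reproduces the 160/160/60/60/60 split is sketched below. It is an assumption, not something the slide states: clients 1-200 stay bound to their original server objects, while clients 201-500 disconnect and reconnect one at a time, with the name server handing out references round-robin starting from server 1. The function name is hypothetical.

```python
from collections import Counter
from itertools import cycle

SERVERS = [1, 2, 3, 4, 5]


def rebind_round_robin():
    """Return the number of clients bound to each server object after
    clients 201-500 reconnect (assumed scenario, see the text above)."""
    # Initial state: blocks of 100 clients per server object.
    binding = {client: (client - 1) // 100 + 1 for client in range(1, 501)}

    # Clients 201-500 drop their references and ask the name server again.
    # Round-robin ignores the 100 clients still bound to servers 1 and 2.
    next_server = cycle(SERVERS)
    for client in range(201, 501):
        binding[client] = next(next_server)

    return Counter(binding.values())


load = rebind_round_robin()
print(sorted(load.items()))
```

Round-robin spreads new bindings evenly, but it takes no account of bindings that already exist, so the long-lived clients leave servers 1 and 2 permanently overloaded.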

Load Balancing I
• Solution to static allocation problem is for clients to throw away their server objects and get new ones every now and again
• Application coding problem
  – And can objects be discarded?
  – What kind of ‘objects’ are they if they can be discarded?

Name Servers
• Server processes call name server when they come up
  – Advertising their services
• Clients call name server to find the location of a server process
  – Up to the name server to match clients to servers
• Client calls server process to create objects
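The advertise-then-resolve interaction can be sketched with a toy in-memory registry. Everything here is hypothetical (class name, method names, service name, addresses); it does not reproduce the CORBA Naming Service API, and the round-robin matching policy is just one choice a name server could make.

```python
class NameServer:
    """Toy in-memory name server; not a real CORBA naming service."""

    def __init__(self):
        self._servers = {}  # service name -> list of advertised addresses
        self._cursor = {}   # service name -> number of references handed out

    def advertise(self, service, address):
        """Called by a server process when it comes up."""
        self._servers.setdefault(service, []).append(address)

    def resolve(self, service):
        """Called by a client; hands out server references round-robin."""
        servers = self._servers.get(service)
        if not servers:
            raise LookupError(f"no server process advertises {service!r}")
        count = self._cursor.get(service, 0)
        self._cursor[service] = count + 1
        return servers[count % len(servers)]


names = NameServer()
names.advertise("orders", "host-a:9000")
names.advertise("orders", "host-b:9000")

# Three clients ask for the same service in turn.
handed_out = [names.resolve("orders") for _ in range(3)]
print(handed_out)
```

After resolving, a client talks to the server process directly, so the name server is only on the path when a reference is handed out.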

Load Balancing I
[Diagram: Load balancing across processes within a server]
• Components: Name server, Clients, Server process
• Arrows: Advertise service; Request server reference; Return server reference; Get server object reference; Call server object’s methods

Load Balancing II
• What happens when our single system is full?
  – Use faster systems
    • Scale-up
  – Use additional systems
    • Scale-out
• Now load-balancing is used to spread load across systems

Load Balancing II
• CORBA world…
  – Name server can distribute across server processes running on different systems
  – Scales well…
    • Name server only involved when handing out a reference to a server, not on every method call

Load Balancing II
[Diagram: Load balancing across multiple systems]
• Components: Name server, Clients, Server processes
• Arrows: Advertise service; Request server reference; Return server reference; Get server object reference; Call server object’s methods

Load Balancing II
• COM+ world…
  – No need for load-balancing within a system
    • Multithreaded server process
    • All objects live in a single process space
  – Component load balancing across systems
    • Client calls router when creating object
    • Router returns reference to an object in a COM+ server process
    • Load balanced at time of object creation

Load Balancing II
[Diagram: COM+/MTS using thread pools rather than load balancing within a single system]
• Labels: Client, DCOM/MTS, MTS process, App DLL, Thread pool, Shared object space, Application code

COM+ Component Load Balancing
[Diagram: COM+ CLB balancing load across multiple systems]
• Components: Clients, Router, Response time tracker
• Arrows: Create object; Pass request to server; Create object and pass back reference; Call object’s methods

Load Balancing II
• COM+ scales well…
  – Router only involved when object is created
    • May change in later release to support dynamic re-balancing as server load changes
  – Method calls direct from client to server
  – Allocation based on response time rather than round-robin
    • Allocate to least-loaded server
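Allocation by response time rather than round-robin can be sketched as a router that tracks a smoothed response time per server and places each new object on the server that currently looks fastest. This is an illustrative sketch only: the class, method names and smoothing weight are hypothetical, and it makes no claim about how the COM+ router actually computes load.

```python
class Router:
    """Toy creation-time router; consulted only when an object is created."""

    def __init__(self, servers):
        # Smoothed response time per server, in seconds. Starts at zero,
        # so servers with no samples yet look idle.
        self.recent_response = {server: 0.0 for server in servers}

    def report(self, server, seconds, weight=0.2):
        """Fold a new response-time sample into an exponential moving average."""
        old = self.recent_response[server]
        self.recent_response[server] = (1 - weight) * old + weight * seconds

    def create_object(self):
        """Pick the server with the lowest smoothed response time.
        Ties go to the server listed first."""
        return min(self.recent_response, key=self.recent_response.get)


router = Router(["A", "B", "C"])
router.report("A", 0.9)  # A is struggling
router.report("B", 0.1)  # B is responding quickly
router.report("C", 0.5)

chosen = router.create_object()
print(chosen)
```

Because the decision is made once at object creation, later method calls go straight to the chosen server and the router stays off the main call path.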

Load Balancing II
• No name server in COM world?
  – COM/MTS clients ‘know’ the name of the server
    • Set at client installation time
    • Can change using GUI tools
    • Admin problem if server app is moved
  – COM+ uses Active Directory to find services

Load Balancing II
• Some systems involve the router in every method call/request
  – Request goes to router process, which then passes it on to a server process
  – Scales poorly as the router can be a major bottleneck
  – Some availability concerns as well
    • What happens if the router fails?

Load Balancing II
[Diagram: Load balancing with router in main call path]
• Components: Clients, Router, Server processes

Scale-up
• No need for load-balancing across systems
• Just use a bigger box
  – Add processors, memory, …
  – SMP (symmetric multiprocessing)
• Runs into limits eventually
• Could be less available

Scale-up
• Example from the Web
  – Large auction site
  – Server farm of NT boxes (scale-out)
  – Single database server (scale-up)
    • 64-processor Sun box
  – More capacity needed?
    • Add more NT boxes easily
    • Sun box is full so have to shift some databases to another box

Clusters
• A group of independent computers acting like a single system
  – Shared disks
  – Single IP address
  – Single set of services
  – Fail-over to other members of cluster
  – Load sharing within the cluster
  – DEC, IBM, MS, …

Clusters
[Diagram: two-node cluster]
• Labels: Client PCs, Server A, Server B, Heartbeat, Cluster management, Disk cabinet A, Disk cabinet B

Clusters
• Address scalability
  – Add more boxes to the cluster
• Address availability
  – Fail-over
  – Add & remove boxes from the cluster for upgrades and maintenance
• Can be used as one element of a highly-available system

Web Server Farms
• Web servers are highly scalable
  – Web applications are normally stateless
    • Next request can go to any Web server
    • State comes from client or database
  – Just need to spread incoming requests
    • IP sprayers (hardware, software)
    • >1 Web server looking at same IP address with some coordination (see MS WLB docs)
  – Same technique for other network apps

Available System
[Diagram: tiers of a highly available system]
• Web Clients
• Web Servers: load balanced using Convoy
• App Servers: use COM+ LB
• COM+ LBS router node
• Database: installed on Wolfpack cluster for high availability

Availability
• How much?
  – 99%: 87.6 hours a year
  – 99.9%: 8.76 hours a year
  – 99.99%: 0.876 hours a year
• Need to consider operations as well
  – Maintenance, software upgrades, backups, application changes
  – Not just faults and recovery time
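The downtime figures follow directly from the number of hours in a year. A short worked sketch, assuming a 365-day year (8,760 hours) and a hypothetical function name:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours per year a system may be unavailable at a given availability."""
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


for pct in (99, 99.9, 99.99):
    print(f"{pct}% -> {downtime_hours_per_year(pct):.3f} hours a year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, and that budget has to cover planned maintenance and upgrades as well as faults.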

Availability and Scalability
• Often a question of application design
  – Stateful vs stateless
    • What happens if a server fails?
    • Can requests go to any server?
  – What language and database API
    • Balance cost vs speed
    • VB/C++ - ODBC/ADO
  – Synchronous method calls or asynchronous messaging?
    • Reduce dependency between components
    • Failure tolerant designs

Next Week
• Distributed application architectures
  – How to design systems that will work, scale and be available
  – Web-based systems
  – Web technology