3cca96c025f97eae0b0d42a186a9c504.ppt
- Number of slides: 25
Team 2: The House Party Blackjack
Mohammad Ahmad, Jun Han, Joohoon Lee, Paul Cheong, Suk Chan Kang
Team Members
- Hwi Cheong (Paul): hcheong@andrew.cmu.edu
- Mohammad Ahmad: mohman@cmu.edu
- Jun Han: junhan@andrew.cmu.edu
- Joohoon Lee: jool@ece.cmu.edu
- Suk Chan Kang: sckang@andrew.cmu.edu
Baseline Application
- Blackjack game application
  - User can create tables and play Blackjack.
  - User can create/retrieve profiles.
- Configuration
  - Operating System: Linux
  - Middleware: Enterprise Java Beans (EJB)
  - Application Development Language: Java
  - Database: MySQL
  - Servers: JBoss J2EE 1.4
Baseline Architecture
- Three-tier system
- Server completely stateless
- Server name hard-coded into clients
- Every client talks to HostBean (session bean)
Fault-Tolerant Design
- Passive replication
  - Completely stateless servers
    - No need to transfer state from primary to backup
    - All state stored in the database
    - Only one instance of HostBean (session bean) is needed to handle multiple client invocations, which is efficient on the server side
  - Degree of replication depends on the number of available machines
- Sacred machines
  - Replication Manager (chess)
  - MySQL database (mahjongg)
  - Clients
Replication Manager
- Responsible for server availability notification and recovery
- Server availability notification
  - Server notifies Replication Manager during boot.
  - Replication Manager pings each available server periodically.
- Server recovery
  - Process fault: pinging fails; the server is restarted by sending a script to the machine.
  - Machine fault (crash fault): pinging fails; sending the script does nothing; the machine has to be booted and the server launched manually.
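
The register-then-ping cycle described on this slide can be sketched roughly as follows. This is a minimal illustration, not the team's code: the class name `ReplicationManagerSketch`, the `ping` predicate, and the `restart` callback are all hypothetical stand-ins for the real ping and the recovery script.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of availability tracking in a replication manager. */
class ReplicationManagerSketch {
    // server name -> whether the last ping succeeded
    private final Map<String, Boolean> servers = new ConcurrentHashMap<>();
    private final Predicate<String> ping;   // assumed: true if the server answers
    private final Consumer<String> restart; // assumed: sends the recovery script

    ReplicationManagerSketch(Predicate<String> ping, Consumer<String> restart) {
        this.ping = ping;
        this.restart = restart;
    }

    /** Called by a server when it boots. */
    void register(String server) {
        servers.put(server, true);
    }

    /** One ping round; intended to be scheduled periodically. */
    void pingAll() {
        for (String server : servers.keySet()) {
            boolean alive;
            try {
                alive = ping.test(server);
            } catch (RuntimeException e) {
                alive = false; // treat any ping error as a failed ping
            }
            servers.put(server, alive);
            if (!alive) {
                restart.accept(server); // has no effect if the machine itself is down
            }
        }
    }

    /** Servers whose last ping succeeded, sorted by name. */
    List<String> availableServers() {
        List<String> up = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : servers.entrySet()) {
            if (e.getValue()) {
                up.add(e.getKey());
            }
        }
        Collections.sort(up);
        return up;
    }
}
```

In a real deployment `pingAll()` would be driven by a timer such as a `ScheduledExecutorService`; a server that stays unreachable after the restart attempt corresponds to the machine-fault case that needs manual intervention.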
Replication Manager (cont’d)
- Client-RM communication
  - Client contacts Replication Manager each time it fails over.
  - Client quits when Replication Manager returns no server or Replication Manager can’t be reached.
Evaluation of Performance (without failover)
Observable Trend
Failover Mechanism
- Server process is killed.
- Client receives a RemoteException.
- Client contacts Replication Manager and asks for a new server.
- Replication Manager gives the client a new server.
- Client retries the invocation on the new server.
- Replication Manager sends a script to recover the crashed server.
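
On the client side, the steps above amount to a catch-and-retry loop. The sketch below assumes hypothetical `Host` and `ReplicationManager` interfaces; the method names `getPlayerName` and `getHost` appear later in the deck, but their signatures here are guesses, and returning `null` for "no server available" is an assumption made for illustration.

```java
import java.rmi.RemoteException;

/** Hypothetical remote interface; the signature is assumed. */
interface Host {
    String getPlayerName(int id) throws RemoteException;
}

/** Hypothetical view of the Replication Manager; null means no server. */
interface ReplicationManager {
    Host getHost();
}

class FailoverClient {
    private final ReplicationManager rm;
    private Host host;

    FailoverClient(ReplicationManager rm) {
        this.rm = rm;
        this.host = rm.getHost(); // initial server
    }

    /** Invokes the server, failing over on RemoteException. */
    String playerName(int id) {
        while (host != null) {
            try {
                return host.getPlayerName(id);
            } catch (RemoteException e) {
                // Server is presumed dead: ask for a replacement and retry.
                host = rm.getHost();
            }
        }
        // Mirrors "client quits" when no server is available.
        throw new IllegalStateException("no server available");
    }
}
```

Retrying blindly is only safe here because the servers are stateless and all state lives in the database; an invocation with side effects would also need to be idempotent.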
Failover Experiment Setup
- 3 servers initially available
- Replication Manager on chess
- 30 fault injections
- Client keeps making invocations until 30 failovers are complete.
- 4 probes on server, 3 probes on client to calculate latency
Failover Experiment Result
[Plot: latency (ms) vs. invocation #]
Failover Experiment Results
- Maximum jitter: ~700 ms
- Minimum jitter: ~300 ms
- Average failover time: ~404 ms
Failover Pie-chart
- Most of the latency comes from getting an exception from the server and connecting to the new server.
Real-time Fault-Tolerant Baseline Architecture Improvements
- Fail-over time improvements
  - Saving list of servers in client
    - Reduces time spent communicating with Replication Manager
  - Pre-creating host beans
    - Client creates host beans on all servers as soon as it receives the list from Replication Manager
- Runtime improvements
  - Caching on the server side
Client-RM and Client-Server Improvements
- Client-RM and Client-Server communication
  - Client contacts Replication Manager each time it runs out of servers, to receive a list of available servers.
  - Client connects to all servers in the list and creates a host bean on each, then starts the application with one server.
  - During each failover, client connects to the next server in the list. No looping inside the list.
  - Client quits when Replication Manager returns an empty list of servers or Replication Manager can’t be reached.
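
The "next server in the list, no looping" rule can be illustrated with a small cursor over the cached list. This is a sketch under stated assumptions: `ServerListCursor` is a hypothetical name, the `Supplier` stands in for the call to the Replication Manager, a `null` or empty list models both "no servers" and "manager unreachable", and pre-creation of host beans is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of a client-side cached server list. */
class ServerListCursor {
    private final Supplier<List<String>> replicationManager; // stand-in for the RM call
    private List<String> servers = new ArrayList<>();
    private int next = 0;

    ServerListCursor(Supplier<List<String>> replicationManager) {
        this.replicationManager = replicationManager;
    }

    /**
     * Returns the next server to use, or null when the client should quit.
     * The cached list is walked once; it is never restarted from the top.
     */
    String nextServer() {
        if (next >= servers.size()) {
            // Out of cached servers: only now contact the Replication Manager.
            List<String> fresh = replicationManager.get();
            if (fresh == null || fresh.isEmpty()) {
                return null;
            }
            servers = new ArrayList<>(fresh);
            next = 0;
        }
        return servers.get(next++);
    }
}
```

Compared with asking the manager on every failover, this moves the manager round trip off the failover path except when the cached list is exhausted.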
Real-time Server
- Caching in server
  - Saves commonly accessed database data in the server
  - Uses a HashMap to map each query to previously retrieved data
  - O(1) expected lookup time for cached queries
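
A query-keyed cache of this kind might look like the sketch below. The names `QueryCache`, `lookup`, and `invalidate` are hypothetical, and the `Function` stands in for the real database access; the slides do not say how the team handled invalidation, so that method is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a server-side query cache. */
class QueryCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> database; // stand-in for the real DB lookup

    QueryCache(Function<String, String> database) {
        this.database = database;
    }

    /** Returns the cached result, hitting the database only on a miss. */
    synchronized String lookup(String query) {
        String result = cache.get(query);
        if (result == null) {
            result = database.apply(query);
            if (result != null) {
                cache.put(query, result);
            }
        }
        return result;
    }

    /** Drops an entry; would be needed whenever the underlying row changes. */
    synchronized void invalidate(String query) {
        cache.remove(query);
    }
}
```

The methods are synchronized because a single bean instance may serve many clients at once. Note that the cache lives in one server process, so after a failover the new server starts with its own (possibly cold) cache.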
Real-time Failover Experiment Setup
- 3 servers initially available
- Replication Manager on chess
- 30 fault injections
- Client keeps making invocations until 30 failovers are complete.
- 4 probes on server, 5 probes on client to calculate latency and naming service time
- Client probes
  - Probes around getPlayerName() and getTableName()
  - Probes around getHost() (for failover)
- Server probes
  - Record source of invocation (name of method)
  - Record invocation arrival and result return times
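
A client-side probe "around" a call can be as simple as timestamping before and after it. The following is a hedged sketch, not the instrumentation the team used: `LatencyProbe`, `measure`, `count`, and `averageMillis` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of a timing probe wrapped around an invocation. */
class LatencyProbe {
    private static final class Sample {
        final String method;
        final long nanos;

        Sample(String method, long nanos) {
            this.method = method;
            this.nanos = nanos;
        }
    }

    private final List<Sample> samples = new ArrayList<>();

    /** Runs the call and records its duration, even if it throws. */
    <T> T measure(String method, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            samples.add(new Sample(method, System.nanoTime() - start));
        }
    }

    /** Number of recorded samples for the given method name. */
    int count(String method) {
        int n = 0;
        for (Sample s : samples) {
            if (s.method.equals(method)) {
                n++;
            }
        }
        return n;
    }

    /** Mean latency in milliseconds, or NaN if nothing was recorded. */
    double averageMillis(String method) {
        long total = 0;
        int n = 0;
        for (Sample s : samples) {
            if (s.method.equals(method)) {
                total += s.nanos;
                n++;
            }
        }
        return n == 0 ? Double.NaN : total / 1e6 / n;
    }
}
```

Wrapping `getHost()` separately from `getPlayerName()` is what lets failover time be separated from ordinary round-trip time in the results.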
Real-time Failover Experiment Results
[Plot: latency (ms) vs. invocation #]
Real-time Failover Experiment Results
- Average failover time: 217 ms
  - About half the failover time measured without improvements (404 ms)
- Non-failover RTT is visibly lower (compare the two graphs)
[Graphs: Before Real-Time Implementation; After Real-Time Implementation]
Real-time Failover Experiment Results
Open Issues
- Blackjack game GUI
- Load-balancing using Replication Manager
- Multiple clients per table (JMS)
- Profiling on JBoss to help improve performance
- Generating a more realistic workload
- TimeoutException
Conclusions
- What we have accomplished
  - Fault-tolerant system with automatic server detection and recovery
  - Our real-time improvements succeeded in reducing failover time and improving general performance.
- What we have learned
  - Merging code can be a pain.
  - A stateless bean can be accessed by multiple clients.
  - State can exist even in stateless beans and is useful if it is accessed by all clients (for example, a cache).
- What we would do differently
  - Start evaluation earlier.
  - Put more effort and time into implementing timeouts to enable bounded detection of server failure.