d52688436a7db64dbbc86212dc463863.ppt
- Number of slides: 25
All About Performance Testing Liu Bo (刘博) boliu@thoughtworks.com
WHERE WE ARE 1 BASIC CONCEPT 2 TROUBLESHOOTING
WHAT IS PERFORMANCE TESTING? Ø To determine how a system performs in terms of responsiveness and stability under a particular workload. Ø To investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource usage.
PERFORMANCE TESTING TYPES Ø Load testing Ø Configuration testing Ø Isolation testing Ø Capacity testing Ø Stress testing Ø Soak testing Ø Spike testing
PERFORMANCE TESTING TOOLS Ø NeoLoad, LoadRunner Ø Silk Performer, Rational Performance Tester Ø LoadUI, Gatling, Grinder, JMeter
PERFORMANCE TESTING S.O.P Ø Identify Performance Acceptance Criteria Ø Plan and Design Tests Ø Identify, Configure and Validate the Test Environment Ø Implement, Validate and Verify the Test Script Ø Execute the Test (Warm-up first) Ø Analyze Results, Tune, and Retest
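The "Execute the Test (Warm-up first)" step can be sketched in a few lines of Python. This is a minimal illustration, not how any of the tools listed earlier work internally: `run_load_test` and `fake_request` are hypothetical names, and the single-threaded loop stands in for what a real tool does with many concurrent virtual users.

```python
import statistics
import time
from typing import Callable, Dict, List


def run_load_test(request: Callable[[], object],
                  warmup_iterations: int = 20,
                  measured_iterations: int = 100) -> Dict[str, float]:
    """Call `request` repeatedly and summarize latency in milliseconds."""
    # Warm-up: exercise caches, connection pools and JIT; timings are discarded.
    for _ in range(warmup_iterations):
        request()

    samples: List[float] = []
    for _ in range(measured_iterations):
        start = time.perf_counter()
        request()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank 95th percentile: index = ceil(0.95 * n) - 1, in integer math.
    p95_index = (95 * len(samples) + 99) // 100 - 1
    return {
        "count": float(len(samples)),
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_request() -> None:
    """Hypothetical stand-in for one scripted user action."""
    time.sleep(0.001)


if __name__ == "__main__":
    print(run_load_test(fake_request, warmup_iterations=5, measured_iterations=20))
```

Reporting the median and a high percentile alongside the mean makes it easier to check results against the acceptance criteria identified in the first step.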
KEY MEASUREMENTS Ø Hardware Resources v CPU (Context Switches/sec, Processor Queue Length) v Memory (Pages/sec) v IO (Average Disk Queue Length, Network Usage) or IOPS Ø Software Resources v Web Server v Database v Customized Performance Counters v xVM v Logs
KEY MEASUREMENTS Ø Monitoring Tools v AppDynamics, Dynatrace, New Relic, OneAPM v Performance counter tools bundled with the testing tool or the OS v Zabbix, Nagios
KEY FACTORS FOR A PERFORMANCE ENGINEER Ø ALL-ROUND Ø For the target system v Architecture Design v Cluster Configuration v Network Topology v Capacity of Test Agents v Communication
PSEUDO PRODUCT 1
CASE 1 – PHONE INTERVIEW SLOWS DOWN Ø Key Measurements v Get Sample v Start Interview v Page to Page v End Interview
CASE 1 – PHONE INTERVIEW SLOWS DOWN Ø Performance degrades steadily by ~10% with build 0615, only in page-to-page time Ø CPU usage is ~10% higher in build 0615 Ø No such issue with build 0501 Ø No errors in logs Ø No such issue in Web Interview Ø ~150 bugs fixed between 0501 and 0615 Ø No performance bugs fixed between 0501 and 0615
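A build-to-build comparison like the one above (only page-to-page time regressing, by about 10%) can be automated per key measurement. The sketch below is an assumption about how such a check might look: `regressed_steps`, the 5% threshold and the timing values are all hypothetical and are not taken from the case.

```python
from statistics import median
from typing import Dict, List


def regressed_steps(baseline: Dict[str, List[float]],
                    candidate: Dict[str, List[float]],
                    threshold: float = 0.05) -> Dict[str, float]:
    """Return steps whose median time grew by more than `threshold` (a fraction)."""
    regressions: Dict[str, float] = {}
    for step, base_samples in baseline.items():
        if step not in candidate:
            continue  # step was not measured in the candidate build
        base = median(base_samples)
        new = median(candidate[step])
        change = (new - base) / base
        if change > threshold:
            regressions[step] = change
    return regressions


if __name__ == "__main__":
    # Hypothetical timings in milliseconds, for illustration only.
    build_0501 = {"Start Interview": [100, 102, 98], "Page to Page": [200, 210, 190]}
    build_0615 = {"Start Interview": [101, 100, 99], "Page to Page": [220, 225, 215]}
    for step, change in regressed_steps(build_0501, build_0615).items():
        print(f"{step}: +{change:.0%}")
```

Comparing each measurement separately is what narrows the search: a regression confined to one step points at code that only that step exercises.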
CASE 1 – ROOT CAUSE Ø One base class in the common framework was modified with extra features that are NOT supposed to be used by Phone Interview, causing unnecessary load/unload operations on every Next/Previous page operation v Simulated clicks on Next/Previous page are extremely frequent, especially under heavy load Ø Actions?
PSEUDO PRODUCT 1
CASE 2 – WEB INTERVIEW TIMES OUT Ø Lots of Web Interviews timed out randomly in production Ø After a restart everything’s fine, but the error recurs as time goes on Ø Error calling WS method 'Method'. URL 'URL', Error codes: Client 5, HTTP -1, SOAP 0, TCP 10048 Ø IIS works well Ø Load is sometimes heavy but does not exceed the upper limit Ø Cannot reproduce in house with the given load/scenario Ø Not related to anti-virus software or firewall
CASE 2 – CONTINUE INVESTIGATION Ø Increase load and monitor pages/sec from customized counters v Drops dramatically when the issue is reproduced Ø The web tier server can then handle interviews only at a slow rate Ø Drill down into the entire web interview process in the back end, i.e. from client, to web server, and then to interview server v Every request to the web server opens a new TCP port! v netstat -an
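The `netstat -an` check above can be summarized by counting TCP connections per state, which makes a pile-up of `TIME_WAIT` sockets obvious. The following is a rough sketch: `count_tcp_states` is a hypothetical helper, and the sample output is invented for illustration, not captured from the system in this case.

```python
from collections import Counter

# Illustrative sample in the layout printed by Windows `netstat -an`.
SAMPLE_NETSTAT_OUTPUT = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    10.0.0.5:49213         10.0.0.9:8080          TIME_WAIT
  TCP    10.0.0.5:49214         10.0.0.9:8080          TIME_WAIT
  TCP    10.0.0.5:49215         10.0.0.9:8080          ESTABLISHED
  UDP    0.0.0.0:123            *:*
"""


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections by state; the state is the last column of a TCP row."""
    states: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and UDP rows (which have no state column).
        if len(fields) >= 4 and fields[0].upper().startswith("TCP"):
            states[fields[-1]] += 1
    return states


if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE_NETSTAT_OUTPUT).most_common():
        print(f"{state:12s} {count}")
```

To run it against a live machine, the same function can be fed the text returned by `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`.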
CASE 2 – ROOT CAUSE Ø TCP port exhaustion on web tier server v Default release time for TCP TIME_WAIT is 4 minutes in Windows Ø Actions?
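A back-of-the-envelope calculation shows why a 4-minute `TIME_WAIT` matters: if every request consumes a fresh port, the number of usable ports divided by the `TIME_WAIT` duration is a rough ceiling on the sustainable request rate. The port ranges below are assumptions based on commonly documented Windows defaults (1025 to 5000 on older versions, 49152 to 65535 on newer ones); the slides do not say which range applied in this case.

```python
def max_sustained_connection_rate(usable_ports: int, time_wait_seconds: float) -> float:
    """Rough ceiling on new outbound connections per second before ports run out.

    Each closed connection holds its local port for `time_wait_seconds`, so at
    steady state at most `usable_ports` connections can start per such window.
    """
    if usable_ports <= 0 or time_wait_seconds <= 0:
        raise ValueError("usable_ports and time_wait_seconds must be positive")
    return usable_ports / time_wait_seconds


TIME_WAIT_SECONDS = 240  # 4 minutes, as stated on the slide

# Assumed ephemeral port ranges (not stated in the slides).
LEGACY_RANGE_PORTS = 5000 - 1025 + 1     # 3976 ports
MODERN_RANGE_PORTS = 65535 - 49152 + 1   # 16384 ports

if __name__ == "__main__":
    for label, ports in (("legacy range", LEGACY_RANGE_PORTS),
                         ("modern range", MODERN_RANGE_PORTS)):
        rate = max_sustained_connection_rate(ports, TIME_WAIT_SECONDS)
        print(f"{label}: {ports} ports -> about {rate:.1f} new connections/sec")
```

Under these assumptions the ceiling is on the order of tens of connections per second, which is consistent with the failure appearing only some time after a restart and only under heavier load.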
PSEUDO PRODUCT 2
CASE 3 – ERRORS IN MULTI-TENANT ONLY Ø Errors occur reliably at the 10-minute mark with multi-tenant Ø No such issue with single-tenant Ø Massive errors in logs, but not very helpful Ø CPU usage is higher than with single-tenant Ø GC activity is much higher (5% to 10% of CPU time) Ø Adjusting -Xmx does not help since physical memory is not the bottleneck
CASE 3 – CONTINUE INVESTIGATION
CASE 3 – CONTINUE INVESTIGATION Ø Architect team guarantees this issue is not related to single or multiple tenancy Ø System.gc() is called explicitly in code, but that call has existed for a long time Ø System.gc() is called only under a specific condition that is out of test scope Ø Checked the Oracle Java documentation on GC policy and confirmed the correct one is in use Ø Check JVM startup parameters with Ops
CASE 3 – CONTINUE INVESTIGATION
CASE 3 – CONTINUE INVESTIGATION
CASE 3 – ROOT CAUSE Ø JVM startup parameter configuration on multi-tenant v Add -XX:NewRatio to adjust the young generation and old generation sizes and avoid frequent GC Ø Actions?
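As a worked illustration of what `-XX:NewRatio` changes: for HotSpot collectors that honor the flag, `NewRatio=N` sets the old generation to N times the size of the young generation, so the young generation is heap / (N + 1). The heap size and ratios below are hypothetical; the slides do not state the values used in this case.

```python
from typing import Tuple


def generation_sizes_mb(heap_mb: float, new_ratio: int) -> Tuple[float, float]:
    """Return (young_mb, old_mb) for a heap split with old:young = new_ratio:1."""
    if heap_mb <= 0 or new_ratio < 1:
        raise ValueError("heap_mb must be positive and new_ratio at least 1")
    young_mb = heap_mb / (new_ratio + 1)
    old_mb = heap_mb - young_mb
    return young_mb, old_mb


if __name__ == "__main__":
    heap_mb = 4096  # hypothetical heap size, for illustration only
    for ratio in (1, 2, 3):
        young, old = generation_sizes_mb(heap_mb, ratio)
        print(f"-XX:NewRatio={ratio}: young ~{young:.0f} MB, old ~{old:.0f} MB")
```

A smaller ratio gives the young generation more room, so short-lived objects are collected less often; whether that is the right trade-off depends on the allocation pattern of the workload.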
THANK YOU Q&A