- Number of slides: 71
Advancing Medical Equipment Maintenance using RCM Methodology. Malcolm G. Ridgway, Ph.D., CCE, Senior Vice President, Technology Management, Masterplan, Inc., Chatsworth, California 1
How A Machine Fails Traditional / Classical Concept (Pre-1945) 2
First Generation Maintenance (Pre-1945): Was – like the machines – relatively simple. The primary maintenance strategy was “keep it looking sharp” and “Run To Failure”. The primary maintenance tool was an oily rag. 3
How A Machine Fails Second Generation Concept The “Bath Tub” Curve 4
Second Generation Maintenance (1945-60): Was – like the machines – a little more complex because the consequences of unreliable machines had become more serious (economically). Maintenance strategy: fixed-interval overhauls. PM was still relatively primitive – more of a craft than a science – and based on the manufacturer’s experience-based (?) recommendations. 5
Third Generation Maintenance (1960s): Became – like the machines – considerably more complex. The civil aviation industry became the driver of machine reliability because of the FAA’s concerns for public safety. 1960 – The FAA established a Task Force which became known as the Maintenance Steering Group (MSG). 1968 – A landmark document (MSG-1) revolutionized the maintenance business and made the 747 viable. 6
How Machines Really Fail Third Generation Concept Based on FAA data 7
In the case of aircraft components • Only 6% show a wear-out failure (Type B) pattern • And only 14% have a random failure (Type E) pattern Whereas • 72% show an infant mortality (Type F) characteristic 8
The Famous Moment of Enlightenment in the 1960s… About Scheduled Maintenance 9
More frequent PM can lead to lower reliability!! 10
How This New Approach To Maintenance Made Jumbo Jets Economically Feasible • DC-8 – Required the scheduled overhaul of 339 items and 4M man-hours of maintenance prior to its 20,000 hour inspection • DC-10 – Required the scheduled overhaul of 7 items and 66K man-hours of maintenance prior to its 20,000 hour inspection • The DC-10 is 3X larger, more complex, and 200X more reliable than the DC-8 • The “event” rate of the DC-8 is 60 per million takeoffs; the “event” rate of the DC-10 is 0.3 per million takeoffs. 11
The 1970s: Introduction of the systems approach to maintenance. 1974 – DOD contracted with United Airlines to document the maintenance processes being used by the civil aviation industry, and directed that the new approach embodied in the pioneering new concepts be labeled Reliability-Centered Maintenance (RCM). 1978 – Publication of the book “Reliability-Centered Maintenance” by Stanley Nowlan and Howard Heap. 12
Explosive growth of RCM during the 80s & 90s • The military adopts RCM for its ships (including its nuclear submarines) and its aircraft • NASA joins in with its Shuttle Program • The utility industry adopts RCM for many of its power stations, including its nuclear power plants. • 1982 – MSG-3 rev 2 Type Certification for the 757/767 13
What Exactly Is Reliability-Centered Maintenance? • Uses processes based on modern reliability analyses • Considers the entire system: equipment; accessories; user; maintainer; environment; utilities; & the patient • Focuses on maintaining the device’s function with minimum downtime and acceptable levels of safety • Uses FMEA to define what can go wrong and why • Uses precise effectiveness metrics and criteria for whether or not proactive maintenance is cost effective • If interval-based maintenance is feasible, it provides precise formulas for what the intervals should be 14
Benefits (claimed to result) from using RCM 1. Increased reliability – 50-70% reduction in repairs 2. Increased availability – 25-50% reduction in downtime 3. Greater maintenance cost effectiveness 4. Improved levels of safety 5. Longer useful life of maintained items 6. Creation of comprehensive maintenance databases 15
Current Joint Commission Standards. Standard EC.02.04.01: The hospital manages medical equipment risks. Elements of Performance for EC.02.04.01: 3. The hospital identifies the activities, in writing, for maintaining, inspecting, and testing for all medical equipment on the inventory. Note: Hospitals may use different strategies for different items, as appropriate. For example, strategies such as predictive maintenance, reliability-centered maintenance, interval-based inspections, corrective maintenance, or metered maintenance may be selected to ensure reliable performance. 16
Reality Check • Maintenance (particularly PM) is an issue of declining importance relative to several other equipment issues (such as use errors and network connectivity) • But we are still dedicating an estimated 3000 FTEs (costing about $300M/year) to our PM programs • We could (and should) be doing something more productive and more valuable with these resources! 17
Key PM Issues 1. We still do not have a good consensus on what we mean by the term “PM”, or even why we do it! 2. Although the Joint Commission has allowed us to exclude “non-critical” devices from our PM programs since 1989, we still don’t have a rational definition for a non-critical/non-life-support device. 3. We don’t have any good methods for justifying the PM intervals that we use. 4. The PM procedures that most of us use could be improved. 18
What Causes Equipment To Fail? (1) 1) Progressive wear or deterioration of a component part 2) Random failure of a component part 3) Poor fabrication or assembly of the hardware 4) Poor design of the system (hardware or processes) 5) Subjecting the device to physical stress outside its design tolerances 6) Exposing the device to environmental stress outside its design tolerances 19
What Causes Equipment To Fail? (2) 7) Incorrect set up or operation of the device by the user 8) The use of a wrong or defective accessory 9) Poor or incomplete initial set-up or installation, or a poor quality previous repair 10) Human interference with the device including (possibly) earlier intrusive PM. Note: Only the first (1) and possibly the last (10) of these causes could be classed as maintenance-related failures. 20
Hidden failures • Equipment failures are either likely to be noticed (they are evident, i.e. overt) or they are hidden. • Ideally, devices that are safety-critical or downtime-critical and that have hidden failure modes (i.e. failures that are unlikely to be noticed by the “operating crew”) should be provided with special protection mechanisms. • It is important to subject devices that are safety-critical or downtime-critical and that have hidden failure modes, without reliable special protection mechanisms, to appropriate performance and safety testing. 21
Special Protection Mechanisms 1) Operator warning devices 2) Automatic shut-down devices 3) Automatic relief devices 4) Dual components for functional redundancy 5) Guard mechanisms. Special concern = “multiple failures” = failure modes within the protection mechanisms 22
PM Basics – Why do we do it? • PM should address: 1. Failures that result from the degradation of the device’s non-durable parts, and 2. Detecting the presence of hidden failures. • PM cannot and does not prevent all types of equipment failures. • There are several other, more common, causes of device failure. • Very important PM issue = hidden failures of any special protection mechanisms 23
What does PM achieve? • PM prevents some equipment failures and the associated downtime. • It creates a certain (usually unspecified) level of confidence that the devices tested are safe (because they are not in a hidden failed state). 24
Indirect benefits of PM programs 1. Finding failed or damaged devices that have not been reported as needing to be repaired 2. Periodically confirming that the devices are actually still present in the facility 3. Providing some level of comfort and security that everything possible is being done to maximize the level of equipment safety. 25
What does PM not achieve? • PM cannot and does not prevent all equipment failures – only those that would have resulted from the degradation of the device’s non-durable parts. • PM cannot and does not mitigate the most common causes of adverse equipment-related accidents 26
The Bottom Line on PM • With respect to: • reducing the downtime of downtime-critical equipment, and • eliminating the most common causes of adverse equipment-related incidents and accidents… • …even a well-implemented PM program provides only a relatively limited value – and it also has a cost • The more we can optimize the program and quantify the benefits, the easier it will be to balance the value gained from a well-implemented PM program against its cost 27
Better PM terminology • True preventive maintenance (TPM) = inspecting, cleaning, lubricating, adjusting or replacing the device’s non-durable parts… (aka scheduled restoration, scheduled discard tasks or predictive maintenance - JIT remediation via Condition Monitoring) • Performance verification and/or safety testing (PVST) = functional testing to detect hidden failures … (aka failure-finding tasks) 28
TPM = True Preventive Maintenance …is the inspection, cleaning, lubricating, adjustment or replacement of a device’s non-durable parts. Non-durable parts are those components of the device that have been identified either by the device manufacturer or by general industry experience as needing periodic attention, or being subject to functional deterioration and having a useful lifetime less than that of the complete device. Examples include filters, batteries, cables, bearings, gaskets, and flexible tubing. 29
Predictive Maintenance… …involves direct monitoring of some variable that will provide a reliable early warning that a non-durable part is about to fail (aka Condition Monitoring). An example might be using an oil contaminant sensor in your car’s engine lubricant to turn on a dashboard warning light to tell you when it is time to change your oil. At the moment this particular PM strategy probably has more potential in the physical plant area than in the biomedical area. Physical plant examples include: using vibration analysis to warn of bearing wear, and using infrared scanning to detect overheating in electrical switchgear 30
PVST = Performance Verification and Safety Testing …is functional testing to detect hidden failures. Examples of hidden failures include: Defibrillators that are delivering significantly less energy than they are set to deliver; heart rate alarms that do not alarm at the set threshold, and protective power cut-offs on hypo-hyperthermia machines that do not operate at the pre-set cut-off temperature. 31
32
33
34
Special features of the ASHE format • The procedure number as a “universal product code” • Separation of the TPM and PVST tasks • Use of the Note box for concise reporting • User tasks disclaimer 35
36
37
38
Repair Call Cause Coding 39
Repair Call Cause Coding
Cat 1. Are the device and its accessories still working properly and safely? If yes, this is a Category 1 failure (aka: use error; “cannot duplicate”).
Cat 2. Is the device itself OK, with the problem due to use of a wrong or defective accessory or a problem in a connected network? If …
Cat 3. Is the problem due to physical stress? If …
Cat 4. Is there evidence that this problem could be the result of a poor initial installation or an incomplete repair of a previous problem (a “run on”)? If ….
Cat 5. Is there evidence that the failure was due to an out-of-tolerance ambient environmental condition?
40
Repair Call Cause Coding
Cat 6. Is there evidence that the failure is due to a battery problem? If yes, ….
Cat 7. Is there evidence that the failure was due to a lack of preventive maintenance? If yes, ….
Cat 8. Is there evidence that the failure was caused by human interference, e.g. earlier intrusive PM? If …
Cat 9. Is there any reason to believe that the failure was due to general wear and tear? If yes, ….
Cat 0. The cause of failure is unknown (cannot be categorized).
41
Typical Cause Coding Analysis
Code  Cause of repair call             Count  %age   Aust.
1     User-related                     54     10.2   14%
2     Accessory or connectivity        7      1.3    3%
3     Physical stress-related          120    22.8   25%
4     Run-on related                   11     2.1    1%
5     Environmental stress-related     13     2.5    1%
6     Battery-related                  32     6.1    -
7     Inadequate PM-related            17     3.2    1%
8     Human interference-related       0      0      0
9     Random, unpredictable failures   273    51.8   52%
0     Uncategorized repair calls
      Total                            527    100%
42
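The percentage column in the cause-coding table can be reproduced from the raw counts. The following is a minimal sketch, assuming the counts are keyed by cause code in a plain dictionary; the names `counts` and `percentages` are illustrative and not part of the original presentation.

```python
# Repair-call counts by cause code, copied from the cause-coding table.
# The dictionary layout and function name are illustrative assumptions.
counts = {1: 54, 2: 7, 3: 120, 4: 11, 5: 13, 6: 32, 7: 17, 8: 0, 9: 273}


def percentages(counts_by_code):
    """Return each cause code's share of all repair calls, in percent."""
    total = sum(counts_by_code.values())
    return {code: round(100 * n / total, 1) for code, n in counts_by_code.items()}


total_calls = sum(counts.values())
shares = percentages(counts)

for code in sorted(shares):
    print(f"Code {code}: {counts[code]:>3} calls, {shares[code]:>4}%")
print(f"Total: {total_calls} calls")
```

The share for code 7 (inadequate PM) is the first of the PM-effectiveness metrics discussed later in the deck, so a tally like this is one way to track it over time.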
Some types of devices will benefit more than others from receiving PM: (1) Those with non-durable parts
Perform a PM Risk Analysis:
1. Identify all possible PM-preventable failure modes by examining each TPM task listed in the PM procedure.
2. Rank each failure mode according to the Level of Severity of its potential adverse consequences (LOS score).
3. Estimate the MTBF (Likelihood of Failure, or LOF, score), i.e. how far out the knee is on the Type B failure curve.
4. Multiply the LOS score by the LOF score to determine the device’s PM Risk Score.
43
Classifying the Level of Severity (LOS) of any likely adverse consequences from (1) any non-durable parts-related failures
LOS 4: A PM-preventable failure mode that could be life-threatening or economically “catastrophic” ($$$$)
LOS 3: A PM-preventable failure mode that could cause an injury or have a major impact on patient care or facility economics ($$$)
LOS 2: A PM-preventable failure mode that could have some impact on patient care or facility economics ($$)
LOS 1: A PM-preventable failure mode that would have only a minor impact on patient care or facility economics ($)
44
Adverse consequences of (overt) equipment failures Three different kinds of consequences: 1. Adverse safety consequences • Life-threatening (LOS = 4), safety-major concern (LOS=3), safety-moderate concern (LOS=2), safety-only minor concern 2. Adverse operational consequences (uptime) • Uptime-critical (LOS = 4), uptime-major concern (LOS = 3), uptime-moderate concern (LOS=2), etc 3. Adverse non-operational consequences (cost of repair) • Very high cost of repair (LOS = 4), high cost of repair (LOS=3), moderate cost of repair (LOS=2), etc 45
Adverse consequences of (overt) equipment failures
Economic consequences:
• Uptime-critical devices (LOS = 4)
  • Sophisticated imaging devices, such as CT scanners
• Uptime-major concern devices (LOS = 3)
  • Key devices with little or no back-up, such as large central sterilizers and automated lab analyzers
• High and very high cost of repair devices (LOS = 3 and 4)
  • Specialized devices, such as lasers, some sterilizers, some ventilators, etc.
46
Classifying the Likelihood of Failure (LOF) of (1) any non-durable parts
LOF 4: Frequent. Wear-out type failure likely to occur within a one year period (MTBF of up to 1 year)
LOF 3: Occasional. Wear-out type failure likely to occur within a one to two year period (MTBF of between 1 and 2 years)
LOF 2: Uncommon. Wear-out type failure likely to occur within a two to five year period (MTBF of between 2 and 5 years)
LOF 1: Remote. Wear-out type failure not likely to occur within a five year period (MTBF of more than 5 years)
47
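The MTBF bands above translate directly into a small lookup. This is a sketch only: the function name `lof_from_mtbf` is hypothetical, and the treatment of values that fall exactly on a band edge (1, 2, or 5 years) is an assumption, since the slide does not say which band owns the boundary.

```python
def lof_from_mtbf(mtbf_years):
    """Map a non-durable part's estimated MTBF (in years) to an LOF score.

    Assumption: a value exactly on a band edge goes to the higher
    (more frequent) LOF band.
    """
    if mtbf_years <= 0:
        raise ValueError("MTBF must be positive")
    if mtbf_years <= 1:
        return 4  # Frequent
    if mtbf_years <= 2:
        return 3  # Occasional
    if mtbf_years <= 5:
        return 2  # Uncommon
    return 1      # Remote


print(lof_from_mtbf(0.75))  # a part expected to wear out within a year
print(lof_from_mtbf(3))     # a part expected to last two to five years
```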
RCM Risk Score: compounding Level of Severity (LOS) and Likelihood of Failure (LOF)
          LOF = 1     LOF = 2      LOF = 3        LOF = 4
          “Remote”    “Uncommon”   “Occasional”   “Frequent”
LOS = 4   4           8            12             16
LOS = 3   3           6            9              12
LOS = 2   2           4            6              8
LOS = 1   1           2            3              4
12-16 = Critical risk; 6-9 = “Worth doing”
48
Some types of devices will benefit more than others from receiving PM: (2) Those with hidden failure modes
Perform a PM Risk Analysis:
1. Identify all possible hidden failure modes by examining each PVST task listed in the PM procedure.
2. Rank each hidden failure mode according to the Level of Severity of its potential adverse consequences (LOS score).
3. Rank the Likelihood of Failure of each hidden failure (LOF score) by reviewing data on the “yield” of previous PVST testing (# of HFs/device-year).
4. Multiply the LOS score by the LOF score to determine the device’s PM Risk Score.
49
Classifying the Level of Severity (LOS) of any likely adverse consequences from (2) any hidden failures
LOS 4: A hidden failure mode that could be life-threatening or economically “catastrophic” ($$$$)
LOS 3: A hidden failure mode that could cause an injury or have a major impact on patient care (or $$$)
LOS 2: A hidden failure mode that could have some moderate impact on patient care (or $$)
LOS 1: A hidden failure mode that would have only a minor impact on patient care (or only $)
50
Adverse consequences of hidden equipment failures
Safety consequences:
• Safety-life-threatening devices (LOS = 4)
  • Defibrillator with zero or very low output
• Safety-major impact devices (LOS = 3)
  • Blood warmer with defective over-temp alarm
  • Hypo/hyperthermia machine with defective over-temp alarm or power cut-off mechanism
51
Classifying the Likelihood of Failure (LOF) of (2) any hidden failures
LOF 4: Frequent. “Yield” or hidden failure discovery rate of more than 1 per device-year
LOF 3: Occasional. “Yield” or hidden failure discovery rate of 0.5 – 1.0 per device-year
LOF 2: Uncommon. “Yield” or hidden failure discovery rate of 0.2 – 0.5 per device-year
LOF 1: Remote. “Yield” or hidden failure discovery rate of less than 0.2 per device-year
52
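For hidden failures, the LOF score comes from the “yield” of past PVST testing rather than from an MTBF estimate. The sketch below assumes the yield is computed as hidden failures found divided by device-years tested; the function name `lof_from_yield` is hypothetical, and assigning a rate that falls exactly on 0.2, 0.5, or 1.0 to a particular band is an assumption not settled by the slide.

```python
def lof_from_yield(hidden_failures_found, device_years_tested):
    """Map a PVST hidden-failure discovery rate to an LOF score.

    Assumption: a rate exactly on a lower band edge (0.2 or 0.5) goes to
    the higher band, and exactly 1.0 stays in the "Occasional" band.
    """
    if device_years_tested <= 0:
        raise ValueError("device-years tested must be positive")
    rate = hidden_failures_found / device_years_tested
    if rate > 1.0:
        return 4  # Frequent
    if rate >= 0.5:
        return 3  # Occasional
    if rate >= 0.2:
        return 2  # Uncommon
    return 1      # Remote


# 16 hidden failures found in 400 device-years of testing
print(lof_from_yield(16, 400))
```

With 16 hidden failures in 400 device-years, the rate is 0.04 per device-year, which falls in the “Remote” band.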
RCM Risk Score: compounding Level of Severity (LOS) and Likelihood of Failure (LOF)
          LOF = 1     LOF = 2      LOF = 3        LOF = 4
          “Remote”    “Uncommon”   “Occasional”   “Frequent”
LOS = 4   4           8            12             16
LOS = 3   3           6            9              12
LOS = 2   2           4            6              8
LOS = 1   1           2            3              4
12-16 = Critical risk; 6-9 = “Worth doing”
53
Classifying a device’s PM Priority according to its (worst-case) RCM Risk Score
Risk Score   PM Priority   Meaning
12-16        1             “Must-do PM” (PM-critical)
6-9          2             PM judged to be “worth doing”
3-4          3             PM worth doing if economics justify (3A); otherwise (3B) RTF
1-2          0             Do no PM = “Run to Failure”
54
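The scoring and priority tables above can be combined into a few lines of code. This is a minimal sketch under stated assumptions: the function names are hypothetical, a device's task list is assumed to be a list of (LOS, LOF) pairs, and the split of Priority 3 into 3A or 3B is left to a separate economic judgment that the code does not attempt.

```python
def rcm_risk_score(los, lof):
    """RCM Risk Score = Level of Severity x Likelihood of Failure (each 1 to 4)."""
    for name, value in (("LOS", los), ("LOF", lof)):
        if value not in (1, 2, 3, 4):
            raise ValueError(f"{name} must be 1, 2, 3 or 4")
    return los * lof


def pm_priority(score):
    """Map a risk score to the PM Priority bands in the table above."""
    if score >= 12:
        return "1"  # must-do PM (PM-critical)
    if score >= 6:
        return "2"  # PM judged worth doing
    if score >= 3:
        return "3"  # 3A if economics justify PM, otherwise 3B (run to failure)
    return "0"      # do no PM (run to failure)


def device_priority(task_scores):
    """Classify a device by its worst-case (highest) task risk score.

    task_scores is an assumed layout: a list of (LOS, LOF) pairs,
    one per TPM or PVST task.
    """
    worst = max(rcm_risk_score(los, lof) for los, lof in task_scores)
    return worst, pm_priority(worst)


# Hypothetical device with three tasks scored as (LOS, LOF)
print(device_priority([(2, 1), (4, 2), (1, 3)]))
```

Because LOS and LOF are both integers from 1 to 4, the only possible scores are 1, 2, 3, 4, 6, 8, 9, 12 and 16, so the four bands cover every case.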
Documenting the PM Risk Analysis (1) Note device type and PM procedure number For each TPM task statement • Describe briefly the severity of the consequence if this part degenerates either partially or totally • Is the LOS a 4, 3, 2 or 1? • Estimate the time lapse before this degeneration will occur. Is the LOF a 4, 3, 2, or 1? • What is the combined RCM Risk Score? • What is the corresponding PM Priority Level? 55
Documenting the PM Risk Analysis (2) For each PVST task statement • Describe briefly the hidden failure that this testing will detect and the severity of the consequences • Is the LOS a 4, 3, 2 or 1? • Consult database or estimate how often this failure is likely to occur. Is the LOF a 4, 3, 2, or 1? • What is the combined RCM Risk Score? • What is the corresponding PM Priority Level? If worst case is Priority 1, 2 or 3 A, which PM strategy will be implemented? If implementing fixed interval PM, what is the optimum? 56
Alternative PM strategies
1. Performing JIT TPM when indicated by direct condition monitoring (aka Predictive Maintenance)
   • Optimum approach, but techniques are scarce
2. Using JIT on-board automated or operator-implemented performance and safety testing
   • Optimum approach, but no techniques available (yet)
3. Using variable intervals based on usage (metered maintenance)
4. Using fixed intervals (prescriptive or optimized)
   • This is the traditional approach, favored by many regulators
5. Allowing the device to Run-to-Failure
   • Most cost-effective approach for PM Priority 3B and 0 devices
57
Selecting the most cost-effective PM strategy
• If device is PM Priority 3B or 0 – use RTF
• Otherwise – select in the following order:
  • JIT TPM / JIT PVST (Predictive Maintenance)
  • Metered maintenance
  • Fixed interval (optimized)
  • Fixed interval (prescriptive)
58
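The selection rule above amounts to an ordered preference list with a run-to-failure shortcut. The sketch below is illustrative only: the function name, the string labels, and the idea of passing in a set of strategies that are technically feasible for the device are all assumptions, as is the fallback to a prescriptive fixed interval when nothing else is feasible.

```python
# Preference order taken from the slide; the labels are illustrative strings.
STRATEGY_ORDER = [
    "JIT TPM / JIT PVST (predictive maintenance)",
    "Metered maintenance",
    "Fixed interval (optimized)",
    "Fixed interval (prescriptive)",
]
RUN_TO_FAILURE = "Run to failure (RTF)"


def select_strategy(pm_priority, feasible):
    """Pick the most cost-effective PM strategy for a device.

    pm_priority: one of "1", "2", "3A", "3B", "0".
    feasible: set of strategy labels that are technically available for
    this device (an assumed input; the slide does not define it).
    """
    if pm_priority in ("3B", "0"):
        return RUN_TO_FAILURE
    for strategy in STRATEGY_ORDER:
        if strategy in feasible:
            return strategy
    # Assumption: a prescriptive fixed interval is always possible.
    return STRATEGY_ORDER[-1]


print(select_strategy("0", set()))
print(select_strategy("1", {"Metered maintenance", "Fixed interval (prescriptive)"}))
```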
Infusion Pump Analysis
1. Using standard FMEA analysis from the classical RCM method, the Thorburn team from The Royal Adelaide Hospital in South Australia identified 145 potential failure modes.
2. But only six were judged to be addressable by some kind of PM task.
3. One had a risk score of 8 (PM Priority 2), which the team described as “worth doing”.
59
Metrics for Monitoring PM Effectiveness 1. What percentage of repair calls are caused by Category 7 failures (lack of PM) - and what percentage were considered to be in the highest Level of Severity? 2. The frequency of occurrence and level of potential severity of equipment-related patient incidents that were attributable to a hidden failure 60
Determining PM intervals: How we do it now • Based on the Fennigkoh-Smith EM number (No-no) • Whatever the manufacturer recommends (?) • Pursuant to the JC’s July 1, 2001 revision to EC.1.6.(f) and EC.2.10.3. permitting “maintenance strategies” other than the traditional time-based inspection intervals. Text change from “apply professional judgment” to “data-driven decisions” (But which data and how?) 61
Finding Optimum PM Intervals 1) For Predictive (On-Condition) Maintenance - this involves finding a condition monitoring technique with a long P – F (warning) interval 2) For TPM (aka scheduled restoration or scheduled discard tasks) – this requires knowledge of the device’s age-related failure pattern. 3) For PVST functional testing (aka failure-finding tasks) - this requires data on the device’s Mean Time Between Failures (MTBF). 62
Finding the Optimum PM Interval 2) For TPM (True Preventive Maintenance) • Requires knowledge of the device’s age-related failure pattern (interval exploration) • The period between being put into service and the “knee” is called the Economic Life Limit. • The most efficient interval is just less than 100% of the Economic Life Limit. 63
Age-related failure pattern • The period between being put into service and the “knee” is called the Economic Life Limit. • Most efficient interval is just less than 100% of the Economic Life Limit. 64
Finding the Optimum PM Interval 3) For PVST (functional testing) • Requires knowledge of the failure mode’s mean time between failures (MTBF) – from PM testing database • And what level of confidence (LOC) is desired that the device is in a “safe operating condition” (SOC)? • These two factors set the maximum testing interval. 65
Hypothetical data from 4 years of PM testing
• 100 devices were checked annually for 4 years
• Hidden failure (e.g. high leakage current) found 16 times
• MTBF = 400 (device-years) / 16 = 25 years
From this data we can establish a statistical probability (level of confidence) that, between the tests, one of these devices was actually in a (hidden) failed state:
• 16 devices were in a failed state for (on average) 6 months
• Total hidden downtime was therefore 8 device-years
• Probability that device is in (hidden) failed state = 8/400 = 2%
• Probability that device is in safe operating condition = 98%
66
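The arithmetic in this hypothetical example can be checked step by step. The sketch below simply restates the slide's numbers; the variable names are illustrative, and the six-month average time in the failed state is the slide's own assumption for annual testing.

```python
# Inputs from the hypothetical example
devices = 100
years_of_testing = 4
hidden_failures_found = 16
avg_years_in_failed_state = 0.5  # on average, half of the 1-year test interval

device_years = devices * years_of_testing                              # 400
mtbf_years = device_years / hidden_failures_found                      # 25
hidden_downtime = hidden_failures_found * avg_years_in_failed_state    # 8 device-years
p_hidden_failed = hidden_downtime / device_years                       # 0.02
p_safe = 1 - p_hidden_failed                                           # 0.98

print(f"MTBF: {mtbf_years:g} years")
print(f"Probability of hidden failed state: {p_hidden_failed:.0%}")
print(f"Probability of safe operating condition: {p_safe:.0%}")
```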
According to RCM theory, the relationship between the MTBF, the testing interval (TI), and the probability that the device is in a (hidden) failed state (HFS) is:
HFS (%) = 50 × TI (in years) / MTBF (in years)
And the level of confidence (LOC) that the device is in a safe operating condition is:
LOC (%) = 100 – HFS (%)
As the ratio of the test interval to the MTBF gets smaller, the probability that the device is in a (hidden) failed state also gets smaller.
67
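The factor of 50 can be motivated as follows. This is a sketch of the usual reasoning, assuming that a hidden failure is equally likely to occur at any point in the test interval (so it stays undetected for half an interval on average) and that TI is much shorter than the MTBF; the slide itself states only the final formula.

```latex
% Average undetected time per failure: TI/2.  Failures per device-year: 1/MTBF.
\[
  \text{fraction of time in a hidden failed state}
  \;\approx\; \frac{TI/2}{MTBF}
\]
\[
  HFS\,(\%) \;=\; 100 \times \frac{TI}{2\,MTBF} \;=\; \frac{50 \times TI}{MTBF},
  \qquad
  LOC\,(\%) \;=\; 100 - HFS\,(\%)
\]
% Check against the hypothetical 4-year data: TI = 1 year, MTBF = 25 years
\[
  HFS = \frac{50 \times 1}{25} = 2\%, \qquad LOC = 98\%
\]
```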
The ratio of the test interval (TI) to the MTBF determines the Level of Confidence (LOC) that the device is in a Safe Operating Condition (i.e. not in a HFS)
TI (yrs)   MTBF (yrs)   HFS (%)   LOC/SOC (%)
0.5        25           1%        99%
0.5        50           0.5%      99.5%
0.5        100          0.25%     99.75%
1          25           2%        98%
1          50           1%        99%
1          100          0.5%      99.5%
2          50           2%        98%
2          100          1%        99%
4          100          2%        98%
4          200          1%        99%
68
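The table follows directly from the HFS and LOC formulas, and the same formula can be inverted to give the longest testing interval that still meets a target level of confidence. The sketch below is a minimal illustration; the function names are hypothetical, and `max_test_interval` is just the algebraic rearrangement TI = MTBF × (100 − LOC) / 50, not a formula quoted from the slides.

```python
def hfs_percent(ti_years, mtbf_years):
    """Probability (%) that a device is in a hidden failed state."""
    return 50 * ti_years / mtbf_years


def loc_percent(ti_years, mtbf_years):
    """Level of confidence (%) that a device is in a safe operating condition."""
    return 100 - hfs_percent(ti_years, mtbf_years)


def max_test_interval(mtbf_years, target_loc_percent):
    """Longest testing interval (years) that still meets the target LOC.

    Rearranged from HFS (%) = 50 x TI / MTBF.
    """
    return mtbf_years * (100 - target_loc_percent) / 50


# Reproduce the rows of the table
rows = [(0.5, 25), (0.5, 50), (0.5, 100), (1, 25), (1, 50),
        (1, 100), (2, 50), (2, 100), (4, 100), (4, 200)]
for ti, mtbf in rows:
    print(f"TI={ti:<4} MTBF={mtbf:<4} HFS={hfs_percent(ti, mtbf):g}% "
          f"LOC={loc_percent(ti, mtbf):g}%")

# Longest interval that keeps LOC at 98% for an MTBF of 100 years
print(max_test_interval(100, 98))
```

Used this way, the MTBF from the PM testing database and the desired LOC together fix the maximum testing interval, as the PVST interval slide describes.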
Relationship between the LOC (that the device is not in a HFS), the Testing Interval (TI) and the MTBF
[Chart: LOC (98% to 99%) and HFS (1% to 2%) plotted against Testing Interval (TI) in years (1 to 4), with one line for each MTBF of 25, 50, 100 and 200 years]
69
Manufacturer-recommended maintenance intervals • Legal question: “Did you follow the manufacturer’s maintenance recommendations?” • Selection of the optimum interval requires knowledge of the non-durable part’s (NDP’s) age-related failure pattern • Extensive (pre-market) testing in a simulated environment is time consuming and costly. Therefore it is highly likely that the manufacturer’s recommendations are based more on “guesstimates” than on actual testing. 70
Questions ? 71


