__Reliability, Maintainability and Risk: Practical Methods for Engineers, Tenth Edition__ has, for over 40 years, taught reliability and safety engineers techniques to minimize defects and failures in process design and operation. For beginners, the book provides tactics for avoiding pitfalls in this wide and complex field. For experts, well-described, realistic and illustrative examples and case studies add new insights and assistance. The author draws on more than 40 years of experience to create a comprehensive and detailed guide to the field, while also providing an excellent description of reliability and risk computation concepts. The book is organized into many parts, covering reliability parameters and costs, the history of reliability and safety technology, a cost-effective approach to quality, reliability and safety, how to interpret failure rates, the prediction of reliability and risk, design and assurance techniques, and much more.

Front Cover
Reliability, Maintainability and Risk
Also by the same author
Reliability, Maintainability and Risk
Copyright
Contents
Preface
Acknowledgments
Part 1 - Understanding Reliability Parameters and Costs
1 - The History of Reliability and Safety Technology
1.1 Failure Data
1.2 Hazardous Failures
1.3 Predicting Reliability and Risk
1.4 Achieving Reliability and Safety-Integrity
1.5 The RAMS-Cycle
1.6 Contractual and Legal Pressures
1.7 Reliability versus Functional Safety
2 - Understanding Terms and Jargon
2.1 Defining Failure and Failure Modes
2.2 Failure Rate and Mean Time Between Failures
2.2.1 The Observed Failure Rate
2.2.2 The Observed Mean Time Between Failures
2.2.3 The Observed Mean Time to Fail
2.2.4 Mean Life
2.3 Interrelationships of Terms
2.3.1 Reliability and Failure Rate
2.3.2 Reliability and Failure Rate as an Approximation
2.3.3 Reliability and MTBF
2.4 The Bathtub Distribution
2.5 Down Time and Repair Time
2.6 Availability, Unavailability and Probability of Failure on Demand
2.7 Hazard and Risk-Related Terms
2.8 Choosing the Appropriate Parameter
3 - A Cost-Effective Approach to Quality, Reliability and Safety
3.1 Reliability and Optimum Cost
3.1.1 Optimum Reliability/Availability
3.1.2 Cost of Reliability Prediction/Assessment
3.1.3 Financial Justification for Further Reliability Improvement
3.2 Costs and Safety
3.2.1 The Need for Optimization
3.2.2 Costs and Savings Involved with Safety Engineering
3.3 The Cost of Quality
Prevention Costs
Appraisal Costs
Failure Costs
Part 2 - Interpreting Failure Rates
4 - Realistic Failure Rates and Prediction Confidence
4.1 Data Accuracy
4.2 Sources of Data
4.2.1 Electronic Failure Rates
4.2.1.1 US Military Handbook 217 (generic, no failure modes)
4.2.1.2 HRD5 Handbook of Reliability Data for Electronic Components used in Telecommunications Systems (industry specific, no failu ...
4.2.1.3 Recueil de Données de Fiabilité du CNET (industry specific, no failure modes)
4.2.1.4 Bellcore/Telcordia
4.2.1.5 Electronic Data NOT Available for Purchase
4.2.2 Other General Data Collections
4.2.2.1 Non-Electronic Parts Reliability Data Book – NPRD (generic, some failure modes)
4.2.2.2 OREDA – Offshore Reliability Data (industry specific, detailed failure modes, mean times to repair)
4.2.2.3 TECHNIS/FARADIP.THREE
4.2.2.4 HSE Failure Rate and Event Data
4.2.2.5 Sources of Nuclear Generation Data (industry specific)
4.2.2.6 US Sources of Power Generation Data (industry specific)
4.2.2.7 SINTEF (industry specific)
4.2.2.8 Data NOT Available for Purchase
4.2.3 Some Older Sources
4.3 Data Ranges
4.3.1 Using the Ranges
4.4 Confidence Limits of Prediction
4.5 Manufacturers’ Data (Warranty Claims)
4.6 Soft Errors/Failures
4.7 Overall Conclusions
5 - Interpreting Data and Demonstrating Reliability
5.1 The Four Cases
5.2 Inference and Confidence Levels
5.3 The Chi-Square Test
5.4 Understanding the Method in More Detail
5.5 Double-Sided Confidence Limits
5.6 Reliability Demonstration
5.7 Sequential Testing
5.8 Setting Up Demonstration Tests
6 - Variable Failure Rates and Probability Plotting
6.1 The Weibull Distribution
6.2 Using the Weibull Method
6.2.1 Curve Fitting to Interpret Failure Data
6.2.2 Manual Plotting
6.2.3 Using the COMPARE Computer Tool
6.2.4 Significance of the Result
6.2.5 Optimum Preventive Replacement
6.3 More Complex Cases of the Weibull Distribution
6.4 Continuous Processes
Part 3 - Predicting Reliability and Risk
7 - Basic Reliability Prediction Theory
7.1 Why Predict RAMS?
7.2 Probability Theory
7.2.1 The Multiplication Rule
7.2.2 The Addition Rule
7.2.3 The Binomial Theorem
7.2.4 Bayes Theorem
7.3 Reliability of Series Systems
7.4 Redundancy Rules
7.4.1 General Types of Redundant Configuration
7.4.2 Full Active Redundancy (Without Repair)
7.4.3 Partial Active Redundancy (Without Repair)
7.4.4 Conditional Active Redundancy
7.4.5 Standby Redundancy
7.4.6 Load Sharing
7.5 General Features of Redundancy
7.5.1 Incremental Improvement
7.5.2 Further Comparisons of Redundancy
7.5.3 Redundancy and Cost
Exercises
8 - Methods of Modeling
8.1 Block Diagrams and Repairable Systems
8.1.1 Reliability Block Diagrams
8.1.1.1 Establish failure criteria
8.1.1.2 Create a reliability block diagram
8.1.1.3 Failure mode analysis
8.1.1.4 Calculation of system reliability
8.1.1.5 Reliability allocation
8.1.2 Repairable Systems (Revealed Failures)
8.1.3 Repairable Systems (Unrevealed Failures)
8.1.4 Systems With Cold Standby Units and Repair
8.1.5 Modeling Repairable Systems with Both Revealed and Unrevealed Failures
8.1.6 Conventions for Labeling ‘Dangerous’, ‘Safe’, Revealed and Unrevealed Failures
8.2 Common Cause (Dependent) Failure
8.2.1 What is CCF?
8.2.2 Types of CCF Model
8.2.3 The BETAPLUS Model
8.2.3.1 Checklists and scoring of the (A) and (B) factors in the model
8.2.3.2 Assessment of the diagnostic interval factor (C)
8.2.3.3 ‘M out of N’ redundancy/voting
‘one out of six’ voting
‘five out of six’ voting
8.3 Fault Tree Analysis
8.3.1 The Fault Tree
8.3.2 Calculations
8.3.3 Cutsets
8.3.4 Computer Tools
8.3.5 Allowing for Common Cause Failure
8.3.6 Fault Tree Analysis in Design
8.3.7 A Cautionary Note (Illogical Trees)
8.4 Event Tree Diagrams
8.4.1 Why Use Event Trees?
8.4.2 The Event Tree Model
8.4.3 Quantification
8.4.4 Differences
8.4.5 Feedback Loops
9 - Quantifying the Reliability Models
9.1 The Reliability Prediction Method
9.2 Allowing for Diagnostics and Proof Tests
9.2.1 Establishing and Modelling Diagnostic Coverage
9.2.2 Assessing and Allowing for Imperfect Proof Test Coverage
9.2.2.1 Assessing the degree of coverage
9.2.2.2 Allowing for Imperfect Proof Test coverage
9.2.3 Partial Stroke Testing
9.2.4 Safe Failure Fraction
9.2.5 No Fault Found
9.3 FMEDA (Failure Mode, Effects and Diagnostic Analysis)
9.4 Human Factors
9.4.1 Background
9.4.2 Models
9.4.3 HEART (Human Error Assessment and Reduction Technique)
9.4.4 THERP (Technique for Human Error Rate Prediction)
9.4.5 TESEO (Empirical Technique to Estimate Operator Errors)
9.4.6 Other Methods
9.4.7 Human Error Probabilities
9.4.8 Trends in Rigor of Assessment
9.4.9 Some Human Error Data
9.5 Simulation
9.5.1 The Technique
9.5.2 Some Packages
9.5.2.1 DNV
9.5.2.1.1 OPTAGON and MAROS
9.5.2.1.2 TARO
9.5.2.2 Atkins (SNC-Lavalin group)
9.5.2.2.1 RAM4
9.5.2.2.2 RAMP
9.5.2.2.3 SAM
9.5.2.3 ITEM
9.5.2.3.1 ToolKit
9.5.2.4 ISOGRAPH
9.5.2.4.1 AVSIM
9.5.2.4.2 RCMCost
9.5.2.5 PELOTON
9.5.2.5.1 MIRIAM
9.6 Comparing Predictions with Targets
10 - Risk Assessment (QRA)
10.1 Frequency and Consequence
10.2 Perception of Risk, ALARP and Cost per Life Saved
10.2.1 Maximum Tolerable Risk (Individual Risk)
10.2.2 Maximum Tolerable Failure Rate
10.2.3 ALARP and Cost Per Life Saved
10.2.4 Societal Risk
10.2.5 Production/Damage Loss
10.2.6 Environmental Loss
10.3 Hazard Identification
10.3.1 HAZOP
10.3.2 HAZID
10.3.3 HAZAN (Consequence Analysis)
10.4 Factors to Quantify
10.4.1 Reliability
10.4.2 Lightning and Thunderstorms
10.4.3 Aircraft Impact
10.4.3.1 Background
10.4.3.2 Airfield Proximity
10.4.4 Earthquake
10.4.5 Meteorological Factors
10.4.6 Other Consequences
Part 4 - Achieving Reliability and Maintainability
11 - Design and Assurance Techniques
11.1 Specifying and Allocating the Requirement
11.2 Stress Analysis
11.3 Environmental Stress Protection
11.4 Failure Mechanisms
11.4.1 Types of Failure Mechanism
11.4.2 Failures in Semiconductor Components
11.4.3 Discrete Components
11.5 Complexity and Parts
11.5.1 Reduction of Complexity
11.5.2 Part Selection
11.5.3 Redundancy
11.6 Burn-In and Screening
11.7 Maintenance Strategies
12 - Design Review, Test and Reliability Growth
12.1 Review Techniques
12.2 Categories of Testing
12.2.1 Environmental Testing
12.2.2 Marginal Testing
12.2.3 High-Reliability Testing
12.2.4 Testing for Packaging and Transport
12.2.5 Multiparameter Testing
12.2.6 Step-Stress Testing
12.3 Reliability Growth Modeling
12.3.1 The CUSUM Technique
12.3.2 Duane Plots
13 - Field Data Collection and Feedback
13.1 Reasons for Data Collection
13.2 Information and Difficulties
13.3 Times to Failure
13.4 Spreadsheets and Databases
Equipment code
How found
Type of fault
Action taken
Discipline
Free text
13.5 Best Practice and Recommendations
13.6 Analysis and Presentation of Results
13.7 Manufacturers’ data
13.8 Anecdotal Data
13.9 No-Fault-Found
Causes of NFF
Extent/Cost of the Problem
Failure Reporting
Recommendations for Possible Actions
Relating to Failure Reporting
Relating to Training
Relating to Procedures
Allow for NFF in Reliability Prediction
14 - Factors Influencing Down Time
14.1 Key Design Areas
14.1.1 Access
14.1.2 Adjustment
14.1.3 Built-In Test Equipment
14.1.4 Circuit Layout and Hardware Partitioning
14.1.5 Connections
14.1.6 Displays and Indicators
14.1.7 Handling, Human and Ergonomic Factors
14.1.8 Identification
14.1.9 Interchangeability
14.1.10 Least Replaceable Assembly
14.1.11 Mounting
14.1.12 Component Part Selection
14.1.13 Redundancy
14.1.14 Safety
14.1.15 Software
14.1.16 Standardization
14.1.17 Test Points
14.2 Maintenance Strategies and Handbooks
14.2.1 Organization of Maintenance Resources
14.2.1.1 First-line maintenance – Corrective maintenance – Call – Field maintenance
14.2.1.2 Preventive maintenance – Routine maintenance
14.2.1.3 Second-line maintenance – Workshop – Overhaul shop – Repair depot
14.2.2 Maintenance Procedures
14.2.3 Tools and Test Equipment
14.2.4 Personnel Considerations
14.2.5 Maintenance Manuals
14.2.5.1 Requirements
14.2.5.2 Types of manual
14.2.6 Spares Provisioning
14.2.7 Logistics
14.2.8 The User and the Designer
14.2.9 Computer Aids to Maintenance
15 - Predicting and Demonstrating Repair Times
15.1 Prediction Methods
15.1.1 US Military Handbook 472 – Procedure 3
15.1.2 Checklist – Mil 472 – Procedure 3
Checklist A:
Checklist B:
Checklist C:
15.1.2.1 Checklist A – Scoring Physical Design Factors
Scores Scoring criteria
Scores Scoring criteria
Scores Scoring criteria
15.1.2.2 Checklist B – Scoring Design Dictates – Facilities
Scores Scoring criteria
Scores Scoring criteria
Scores Scoring criteria
15.1.2.3 Checklist C – Scoring Design Dictates – Maintenance Skills
Scores Scoring criteria
15.1.3 Using a Weighted Sample
15.2 Demonstration Plans
15.2.1 Demonstration Risks
15.2.2 US Military Standard 471A (1973)
Test Method 1
Test Method 2
Test Method 3
Test Method 4
Test Method 5
Test Method 6
Test Method 7
15.2.3 Data Collection
16 - Quantified Reliability Centered Maintenance
16.1 What is QRCM?
16.2 The QRCM Decision Process
16.3 Optimum Replacement (Discard)
16.4 Optimum Spares
16.5 Optimum Proof Test
16.6 Condition Monitoring
17 - Systematic Failures, Especially Software
17.1 Random versus Systematic Failures
17.2 Software-related Failures
17.3 Software Failure Modeling
17.4 Software Quality Assurance (Life Cycle Activities)
17.4.1 Organization of Software QA
17.4.2 Documentation Controls
17.4.2.1 Change Control
17.4.3 Programming (Coding) Standards
17.4.4 Fault-Tolerant Design Features
17.4.5 Reviews
17.4.6 Integration and Test
17.5 Modern/Formal Methods
17.5.1 Requirements Specification and Design
17.5.2 Static Analysis
17.5.3 Test Beds
17.6 Cyber Security
17.6.1 The Problem
17.6.2 Areas of Vulnerability
17.6.3 Types of Attack
Backdoor
Denial-of-Service Attacks
Direct-Access Attacks
Eavesdropping
Phishing
Privilege Escalation
Social Engineering
Spoofing
Tampering
17.6.4 Defenses
Risk Management
Addressing Cyber Security During Design
Control of Changes to the Configuration
Network Security
Limiting User Privileges
Malware Prevention
Monitoring and Detection
Removable Media Controls
Home and Mobile Working
Firewalls
17.6.5 Cyber Risk Assessment
17.7 Software Checklists
17.7.1 Organization of Software QA
17.7.2 Documentation Controls
17.7.3 Programming Standards
17.7.4 Design Features
17.7.5 Code Inspections and Walkthroughs
17.7.6 Integration and Test
Part 5 - Legal, Management and Safety Considerations
18 - Project Management and Competence
18.1 Setting Objectives and Making Specifications
18.2 Planning, Feasibility and Allocation
18.3 Program Activities
18.4 Responsibilities and Competence
18.5 Functional Safety Capability (Management)
19 - Contract Clauses and Their Pitfalls
19.1 Essential Areas
19.1.1 Definitions
19.1.2 Environment
19.1.3 Maintenance Support
19.1.4 Demonstration and Prediction
19.1.5 Liability
19.2 Other Areas
19.2.1 Reliability and Maintainability Program
19.2.2 Reliability and Maintainability Analysis
19.2.3 Storage
19.2.4 Design Standards
19.2.5 Safety-Related Equipment
19.3 Pitfalls
19.3.1 Definitions
19.3.2 Repair Time
19.3.3 Statistical Risks
19.3.4 Quoted Specifications
19.3.5 Environment
19.3.6 Liability
19.3.7 In Summary
19.4 Penalties
19.4.1 Apportionment of Costs During Guarantee
19.4.2 Payment According to Down Time
19.4.3 In Summary
19.5 Subcontracted Reliability Assessments
20 - Product Liability and Safety Legislation
20.1 The General Situation
20.1.1 Contract Law
20.1.2 Common Law
20.1.3 Statute Law
20.1.4 In Summary
20.2 Strict Liability
20.2.1 Concept
20.2.2 Defects
20.3 The Consumer Protection Act 1987
20.3.1 Background
20.3.2 Provisions of the Act
20.4 Health and Safety at Work Act 1974
20.4.1 Scope
20.4.2 Duties
20.4.3 Concessions
20.4.4 Responsibilities
20.4.5 European Community Legislation
20.4.6 Management of Health and Safety at Work Regulations 1992
20.4.7 COSHH
20.4.8 REACH
20.5 Insurance and Product Recall
20.5.1 The Effect of Product Liability Trends
20.5.2 Some Critical Areas
20.5.3 Areas of Cover
20.5.4 Product Recall
21 - Major Incident Legislation
21.1 History of Major Incidents
21.2 Development of major incident legislation
21.3 Safety reports
21.4 Offshore Safety Cases
21.5 Problem Areas
21.6 Rail
21.7 Corporate Manslaughter and Corporate Homicide
22 - Integrity of Safety-Related Systems
22.1 Safety-Related or Safety-Critical?
22.2 Safety-Integrity Levels (SILs)
22.2.1 Targets
Low demand
High demand
More complex example
22.2.2 Assessing Equipment Against the Targets
22.2.2.1 Quantitative versus qualitative features
22.2.2.2 Safe failure fraction (SFF)
22.2.2.3 Life-cycle activities
22.2.2.4 Functional safety capability (management)
22.3 Programable electronic systems (PESs)
22.4 Current Guidance
22.4.1 IEC International Standard 61508 (2010): Functional safety of electrical/electronic/programmable electronic safety–related ...
22.4.2 IEC International Standard 61511: Functional safety—safety instrumented systems for the process industry sector
22.4.3 Industry-Specific Documents
22.5 Framework for Certification
22.5.1 Self-certification
22.5.2 Third-party assessment
22.5.3 Use of a certifying body
23 - A Case Study: The Datamet Project
23.1 Introduction
23.2 The Datamet Concept
23.3 The Contract
23.4 Detailed Design
23.5 Syndicate Study
First Session
Second Session
23.6 Hints
Project
Contract
24 - A Case Study: Gas Detection System
24.1 Safety-Integrity Target
24.2 Random Hardware Failures
24.3 ALARP
24.4 Architectures
24.5 Life-Cycle Activities
24.6 Functional Safety Capability
25 - A Case Study: Pressure Control System
25.1 The Unprotected System
25.2 Protection System
25.3 Assumptions
25.4 Reliability Block Diagram
25.5 Failure Rate Data
25.6 Quantifying the Model
25.7 Proposed Design and Maintenance Modifications
25.8 Modeling Common Cause Failure (Pressure Transmitters)
25.9 Quantifying the Revised Model
25.10 ALARP
25.11 Architectural Constraints
26 - Helicopter Incidents and Risk Assessment
26.1 Helicopter Incidents
26.2 Risk Assessment - Floatation Equipment
26.2.1 Assessment of the Scenario
26.2.2 ALARP
26.3 Effect of Pilot Experience on Incident Rate
Glossary
A1.1 Terms Related to Failure
A1.1.1 Failure
A1.1.2 Failure Mode
A1.1.3 Failure Mechanism
A1.1.4 Failure Rate
A1.1.5 Mean Time Between Failures and Mean Time to Fail
A1.1.6 Common Cause Failure
A1.1.7 Common Mode Failure
A1.1.8 Dangerous Failure
A1.1.9 Safe Failure
A1.1.10 No-Fault-Found
A1.1.11 Soft Failure
A1.2 Reliability Terms
A1.2.1 Reliability
A1.2.2 Redundancy
A1.2.3 Diversity
A1.2.4 Failure Mode and Effect Analysis
A1.2.5 FMEDA (Failure Mode Effect and Diagnostic Analysis)
A1.2.6 Fault Tree Analysis
A1.2.7 Cause Consequence Analysis (Event Trees)
A1.2.8 Reliability Growth
A1.2.9 Reliability Centered Maintenance
A1.3 Maintainability Terms
A1.3.1 Maintainability
A1.3.2 Mean Time to Repair (MTTR)
A1.3.3 Repair Rate
A1.3.4 Repair Time
A1.3.5 Down Time
A1.3.6 Corrective Maintenance
A1.3.7 Preventive Maintenance
A1.3.8 Least Replaceable Assembly (LRA)
A1.3.9 Second-Line Maintenance
A1.3.10 Maximum Repair Time
A1.4 Terms Associated With Software
A1.4.1 Software
A1.4.2 Programable Device
A1.4.3 High-Level Language
A1.4.4 Assembler
A1.4.5 Compiler
A1.4.6 Diagnostic Software
A1.4.7 Simulation
A1.4.8 Emulation
A1.4.9 Load Test
A1.4.10 Functional Test
A1.4.11 Software Error
A1.4.12 Bit Error Rate
A1.4.13 Automatic Test Equipment (ATE)
A1.4.14 Data Corruption
A1.5 Terms Related to Safety
A1.5.1 Hazard
A1.5.2 Major Hazard
A1.5.3 Hazard Analysis
A1.5.4 HAZOP
A1.5.5 LOPA
A1.5.6 Risk
A1.5.7 Consequence Analysis
A1.5.8 Safe Failure Fraction
A1.5.9 Safety-Integrity
A1.5.10 Safety-Integrity level
A1.5.11 ALARP (As Low as Reasonably Practicable)
A1.5.12 Cost Per Life Saved
A1.5.13 GDF (Gross Disproportionality Factor)
A1.5.14 FAFR (Fatal Accident Frequency)
A1.5.15 Cyber Security
A1.6 General Terms
A1.6.1 Availability (Steady State)
A1.6.2 Unavailability (PFD)
A1.6.3 Burn-In
A1.6.4 Confidence Interval
A1.6.5 Consumer’s Risk
A1.6.6 Derating
A1.6.7 Ergonomics
A1.6.8 Mean
A1.6.9 Median
A1.6.10 PFD
A1.6.11 Producer’s Risk
A1.6.12 Quality
A1.6.13 Random
A1.6.14 FRACAS
A1.6.15 RAMS
Percentage Points of the Chi-Square Distribution
Microelectronic Failure Rates
General Failure Rates
Failure Mode Percentages
Human Error Probabilities
Fatality Rates
Answers to Exercises
Bibliography
Scoring Criteria for BETAPLUS Common Cause Model
Example of HAZOP
HAZID Checklist
Markov Analysis of Redundant Systems
Calculating the GDF
A Suggested “Standard” for Achieving Functional Safety
Index
Back Cover

Covers models for partial valve stroke test, fault tree logic and quantification difficulties
Includes more detail on the use of tools such as FMEDA and programming standards like MISRA
Presents case studies on the Datamet Project, Gas Detection System, Pressure Control System, and Helicopter Incidents and Risk Assessment
Provides user exercises and answers