A failure occurs when a device at some level (a system, a unit, a module, or a component) fails to perform its intended function. Each safety instrumented function (SIF) in a safety instrumented system must perform its protection function when required and must not falsely shut down the process.
Functional safety standards (IEC 61511 and IEC 61508) define two categories of failures: random failures and systematic failures. The two differ significantly in their causes, characteristics, predictability, and mitigation strategies. Here’s a detailed comparison:
- Cause
  - Random Failures:
    - Caused by physical degradation mechanisms in hardware, such as corrosion, thermal stress, and wear-out.
    - Typically result from external factors or inherent wear and tear over time.
    - Example: A capacitor losing its electrolyte over time, or a lightning strike causing an electrical surge that damages components.
  - Systematic Failures:
    - Caused by errors in processes, procedures, or design during the system’s life cycle, including specification, development, operation, and maintenance.
    - Often linked to human errors, inadequate procedures, or gaps in testing.
    - Example: A software crash caused by incorrect logic programming, or an untested data input that causes a safety system to fail.
- Nature
  - Random Failures:
    - Occur unpredictably and are not repeatable under the same conditions.
    - Typically isolated to specific components or devices within the system.
  - Systematic Failures:
    - Repeatable under identical circumstances since they stem from flaws in the system’s design or operation.
    - Can have widespread effects, impacting multiple devices, loops, or even entire systems.
- Predictability
  - Random Failures:
    - Can be analyzed statistically and characterized by failure rates and related metrics (e.g., Mean Time Between Failures, or MTBF).
    - The timing of any individual failure is random, but the probability of failure over a given period can be calculated.
  - Systematic Failures:
    - Cannot be statistically predicted as they are unique to the specific conditions and processes causing them.
    - Qualitative measures are used to anticipate and mitigate them.
- Scope
  - Random Failures:
    - Limited in scope, typically affecting a single device or component.
    - Example: A transistor failure due to electrical stress damages only the specific device.
  - Systematic Failures:
    - Broader in impact, potentially affecting multiple systems, devices, or loops across an organization.
    - Example: A programming error in safety logic affects all instances of the system.
- Examples
  - Random Failures:
    - Failure of a transistor due to an electrical surge caused by lightning.
    - A power supply failure due to the evaporation of electrolyte in a capacitor.
  - Systematic Failures:
    - A software crash caused by an untested data input.
    - Incorrect maintenance leading to a safety system’s inability to perform its function.
- Mitigation Strategies
  - Random Failures:
    - Use of high-integrity equipment and materials.
    - Addition of redundant or backup components.
    - Regular maintenance and timely replacement of aging hardware.
  - Systematic Failures:
    - Rigorous administrative controls and monitoring.
    - Improved training, testing, and documentation during system development.
    - Implementation of qualitative measures, such as safety life cycle activities, to address design and procedural flaws.
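To make the predictability and redundancy points concrete, here is a minimal sketch of how a random hardware failure can be treated statistically. It assumes a constant failure rate (exponential model), negligible repair time so that the failure rate is roughly 1/MTBF, and fully independent redundant components. The function names and the MTBF figure are hypothetical, chosen only for illustration; they do not come from IEC 61511 or IEC 61508.

```python
import math

HOURS_PER_YEAR = 8760


def failure_probability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability that one component fails within mission_hours.

    Assumes a constant failure rate of 1 / MTBF (exponential model).
    """
    failure_rate = 1.0 / mtbf_hours
    return 1.0 - math.exp(-failure_rate * mission_hours)


def all_redundant_fail_probability(p_single: float, n_components: int) -> float:
    """Probability that all n redundant components fail.

    Assumes the failures are independent, which holds only for random
    failures. A systematic fault (for example, the same logic error in
    every channel) would defeat this assumption.
    """
    return p_single ** n_components


# Hypothetical component with an MTBF of one million hours.
p_one = failure_probability(mtbf_hours=1_000_000, mission_hours=HOURS_PER_YEAR)
p_two = all_redundant_fail_probability(p_one, n_components=2)

print(f"Single component, one year: {p_one:.6f}")
print(f"Two redundant components, one year: {p_two:.8f}")
```

Under these assumptions, adding a second component lowers the probability of losing the function from `p_one` to `p_one` squared. No comparable calculation exists for systematic failures: a design or procedural flaw is shared by every copy of the device, which is why they are addressed with qualitative measures instead.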
Top References
- Safety Instrumented Systems Verification: Practical Probabilistic Calculations by William M. Goble and Harry Cheddie
- IEC 61511
- www.exida.com
- https://www.exida.com/Blog/random-versus-systematic-faults-whats-the-difference
- Guidelines for Safe Automation of Chemical Processes by Center for Chemical Process Safety
- Reliability, Maintainability and Risk by Dr David J Smith