Understanding Tolerable Risk
In functional safety, the primary objective is to keep a facility safe throughout its operational lifetime. This is achieved by implementing Safety Instrumented Functions (SIFs) that provide adequate risk reduction. But how much risk reduction is ‘enough’? The answer comes from setting a tolerable risk level: a level of risk that is deemed acceptable based on industry standards, societal norms, and, in some cases, government regulations.
Since achieving zero risk is impractical, organizations define acceptable risk thresholds that balance safety and feasibility. Tolerable risk is usually expressed as a maximum allowable frequency of an unwanted event, such as a fatality occurring no more than once in 10,000 years (10⁻⁴ per year).
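To make the link between a tolerable frequency and "enough" risk reduction concrete, here is a minimal Python sketch. It assumes the whole tolerable frequency is allocated to a single event and that the SIF is the only protection layer credited, which is a simplification. The unmitigated frequency of once in 20 years is a hypothetical figure, and the function names are illustrative. The SIL bands are the low-demand risk reduction factor (RRF) ranges from IEC 61508/61511.

```python
def frequency_per_year(events, years):
    """Convert 'events in a number of years' to an annual frequency."""
    return events / years


def required_rrf(unmitigated_per_year, tolerable_per_year):
    """Risk reduction factor needed to bring the event down to the tolerable frequency."""
    if unmitigated_per_year <= 0 or tolerable_per_year <= 0:
        raise ValueError("frequencies must be positive")
    return unmitigated_per_year / tolerable_per_year


def sil_for_rrf(rrf):
    """Map an RRF to a SIL (low-demand mode). Returns 0 when RRF <= 10 (no SIL needed)."""
    if rrf <= 10:
        return 0
    # SIL 1: RRF >10 to 100, SIL 2: >100 to 1,000, SIL 3: >1,000 to 10,000, SIL 4: >10,000 to 100,000
    for sil, upper in ((1, 1e2), (2, 1e3), (3, 1e4), (4, 1e5)):
        if rrf <= upper:
            return sil
    raise ValueError("an RRF above 100,000 is beyond SIL 4")


tolerable = frequency_per_year(1, 10_000)  # once in 10,000 years, as in the text
unmitigated = frequency_per_year(1, 20)    # hypothetical: once in 20 years without the SIF
rrf = required_rrf(unmitigated, tolerable)
target_sil = sil_for_rrf(rrf)
print(f"required RRF = {rrf:.0f}, target SIL = {target_sil}")
```

With these illustrative inputs the required RRF is about 500, which falls in the SIL 2 band. Crediting other independent protection layers would lower the RRF the SIF itself has to deliver.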
Methods of Defining Tolerable Risk
There are several approaches to defining tolerable risk:
- Frequency of an individual harmful event (e.g., overfilling a specific tank)
- Combined frequency of harm from multiple events affecting the same risk receptor
- F-N Curve: A graph plotting the cumulative frequency (F) of incidents against the number (N) of fatalities, where F is the frequency of incidents causing N or more fatalities
For Safety Integrity Level (SIL) assessments, the first approach, assessing the frequency of an individual event, is the most practical.
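The second and third approaches in the list above can be sketched in a few lines of Python. The sketch sums scenario frequencies into cumulative F-N points and compares them with a criterion line of the common form F = C / N^a. The anchor value C, the slope a, and the scenario list are all hypothetical; real criteria come from the operator or regulator.

```python
def cumulative_fn(scenarios):
    """Build F-N points from (frequency_per_year, fatalities) scenarios.

    For each distinct N, F is the summed frequency of all scenarios
    that cause N or more fatalities.
    """
    ns = sorted({n for _, n in scenarios})
    return [(n, sum(f for f, m in scenarios if m >= n)) for n in ns]


def fn_limit(n, anchor=1e-3, slope=1.0):
    """Tolerable cumulative frequency at N fatalities (assumed criterion line)."""
    return anchor / n ** slope


def points_above_criterion(scenarios, anchor=1e-3, slope=1.0):
    """Return the F-N points that lie above the criterion line."""
    return [(n, f) for n, f in cumulative_fn(scenarios)
            if f > fn_limit(n, anchor, slope)]


# Hypothetical scenarios affecting the same group of people.
scenarios = [(1e-4, 1), (1e-5, 5), (5e-4, 10), (1e-6, 50)]
for n, f in points_above_criterion(scenarios):
    print(f"N >= {n}: F = {f:.2e} per year exceeds limit {fn_limit(n):.2e}")
```

The first point of `cumulative_fn` (N = 1) is the combined frequency of harm from all scenarios, which is the quantity used by the second approach.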
Tolerable Risk Matrices and Severity Categories
A straightforward method of defining tolerable risk is to base it on the severity of an incident. This is commonly done by categorizing incidents based on their impact on personnel, property, or the environment. A typical severity classification for personnel injury includes:
- Category A: Negligible outcome or first-aid injury
- Category B: Lost time injury (up to 3 working days lost)
- Category C: Significant injury (more than 3 working days lost)
- Category D: Disabling injury or a single fatality
- Category E: Multiple fatalities
For each severity category, an associated tolerable frequency is defined, creating a tolerable risk matrix. This structured approach ensures that risk levels are managed systematically and transparently.
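A tolerable risk matrix of this kind is essentially a lookup table. The Python sketch below pairs each severity category with a tolerable frequency and checks an estimated event frequency against it. The frequencies are hypothetical placeholders, not values from any standard; only Category D (10⁻⁴ per year) echoes the single-fatality example given earlier.

```python
# Hypothetical tolerable frequencies (per year) for each severity category.
TOLERABLE_FREQUENCY_PER_YEAR = {
    "A": 1e-1,  # negligible outcome or first-aid injury
    "B": 1e-2,  # lost time injury, up to 3 working days
    "C": 1e-3,  # significant injury, more than 3 working days
    "D": 1e-4,  # disabling injury or a single fatality
    "E": 1e-5,  # multiple fatalities
}


def is_tolerable(category, estimated_frequency_per_year):
    """Return True if the estimated frequency is at or below the tolerable value."""
    try:
        limit = TOLERABLE_FREQUENCY_PER_YEAR[category]
    except KeyError:
        raise ValueError(f"unknown severity category: {category!r}") from None
    return estimated_frequency_per_year <= limit


# The same estimated frequency can be tolerable for one severity but not for a worse one.
print(is_tolerable("D", 5e-5))  # True with the placeholder values above
print(is_tolerable("E", 5e-5))  # False with the placeholder values above
```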
Financial Considerations in Risk Definition
In addition to safety concerns, defining tolerable risk must also consider financial implications, including:
- Cost of lost production due to downtime (a major factor in oil and gas industries)
- Deferred vs. lost production (e.g., whether lost output can be recovered later)
- Equipment repair costs
- Cost of destroyed products, especially in the chemical and pharmaceutical sectors
- Financial penalties, such as shipment delays or supply disruptions
To ensure accurate financial risk assessment, it is important to state the assumptions made about downtime, spare-parts availability, and plant restart times.
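One simple way to combine these cost items is an expected annual loss: event frequency multiplied by the cost per event. The Python sketch below is illustrative only. The field names and all figures are hypothetical, and the `recoverable_fraction` parameter is an assumed way of modelling deferred versus lost production.

```python
from dataclasses import dataclass


@dataclass
class FinancialConsequence:
    downtime_days: float             # assumed outage length, including restart time
    production_value_per_day: float  # value of one day of output
    recoverable_fraction: float      # share of output that is only deferred, 0.0 to 1.0
    repair_cost: float = 0.0
    destroyed_product_cost: float = 0.0
    penalties: float = 0.0

    def cost_per_event(self):
        """Total cost of one occurrence; deferred production is not counted as lost."""
        lost_production = (self.downtime_days
                           * self.production_value_per_day
                           * (1.0 - self.recoverable_fraction))
        return (lost_production + self.repair_cost
                + self.destroyed_product_cost + self.penalties)


def expected_annual_loss(frequency_per_year, consequence):
    """Annualised financial risk: frequency times cost per event."""
    return frequency_per_year * consequence.cost_per_event()


# Hypothetical event: once in 100 years, 10 days down, half the output recoverable.
event = FinancialConsequence(
    downtime_days=10,
    production_value_per_day=100_000,
    recoverable_fraction=0.5,
    repair_cost=200_000,
    destroyed_product_cost=50_000,
)
annual_loss = expected_annual_loss(0.01, event)
print(f"cost per event: {event.cost_per_event():,.0f}")
print(f"expected annual loss: {annual_loss:,.0f}")
```

Keeping the downtime and recovery assumptions as explicit fields makes them easy to review and to change when spares or restart times differ between units.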
Individual Risk vs. Event-Based Risk
Tolerable risk can also be defined in terms of total risk to an individual onsite, rather than per hazardous event. This is known as individual risk, and it is often stated as:
“The fatality risk to any individual onsite shall not exceed X per year.”
Typical values range from 10⁻³ to 10⁻⁴ per year, aligning with general fatality risks from all causes. This approach helps organizations assess cumulative risks from all potential hazards in a facility.
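A common way to estimate individual risk is to sum, over every hazard, the event frequency multiplied by the probability that the person is present and the probability that the event is fatal to someone present. The Python sketch below assumes that simplified model, with hypothetical hazards and a hypothetical limit of 10⁻⁴ per year.

```python
def individual_risk(hazards):
    """Sum fatality risk per year over all hazards an individual is exposed to.

    Each hazard is a tuple:
    (event_frequency_per_year, probability_present, probability_of_fatality_if_present)
    """
    total = 0.0
    for frequency, p_present, p_fatal in hazards:
        if not (0.0 <= p_present <= 1.0 and 0.0 <= p_fatal <= 1.0):
            raise ValueError("probabilities must lie between 0 and 1")
        total += frequency * p_present * p_fatal
    return total


# Hypothetical exposure profile for one worker.
hazards = [
    (1e-3, 0.10, 0.5),   # e.g. tank overfill, worker nearby 10% of the time
    (1e-4, 0.25, 1.0),   # e.g. vessel rupture, worker nearby 25% of the time
]
risk = individual_risk(hazards)
limit = 1e-4  # assumed value of X in the statement above
print(f"individual risk = {risk:.2e} per year, within limit: {risk <= limit}")
```

Because the contributions add up, a site with many hazards can exceed the individual risk limit even when every single event meets its own per-event target.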
Balancing Precision in Risk Quantification
A key challenge in defining tolerable risk is determining how much precision is necessary. Using low-precision estimates requires building in safety margins, often leading to over-designed SIFs with:
- Higher reliability components
- Additional diagnostics (e.g., smart sensors vs. mechanical switches)
- Increased redundancy
- More frequent or stringent testing
Over-design increases costs, but so does the opposite extreme: high precision demands expensive and time-consuming risk studies, such as Quantitative Risk Analysis (QRA). A balanced approach involves a two-step risk assessment:
- Initial qualitative study to estimate risk levels and target SILs.
- Detailed analysis for high-SIL risks (e.g., SIL 2 or greater).
This strategy ensures an efficient allocation of resources while preventing unnecessary over-design or excessive risk analysis costs.
The ALARP Concept in Tolerable Risk
Tolerable risk is not a single absolute threshold; it is applied within the ALARP (As Low As Reasonably Practicable) principle. ALARP uses two risk levels:
- Risk tolerability level: The upper bound. Risk above this level is intolerable; risk below it is tolerable only if further reduction is impractical or not cost-effective.
- Risk acceptance level: The lower bound. Risk below this level is considered acceptable without additional mitigation.
For instance, if a single fatality risk has a tolerability level of 10⁻³ per year and an acceptance level of 10⁻⁵ per year, then a risk of 10⁻⁴ per year falls between the two levels. It must therefore be reduced further until it is ALARP, that is, until additional reduction is impractical or not cost-effective.
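The two ALARP levels split the risk scale into three regions, which can be sketched as a small Python classifier. The default thresholds are the example values from the text. The function name and region labels are illustrative.

```python
def alarp_region(risk_per_year, tolerability=1e-3, acceptance=1e-5):
    """Classify a risk against the tolerability and acceptance levels."""
    if acceptance >= tolerability:
        raise ValueError("acceptance level must be below tolerability level")
    if risk_per_year > tolerability:
        return "intolerable"
    if risk_per_year > acceptance:
        return "tolerable if ALARP"
    return "acceptable"


for risk in (1e-2, 1e-4, 1e-6):
    print(f"{risk:.0e} per year: {alarp_region(risk)}")
```

A result of "tolerable if ALARP" does not settle the question on its own: it signals that further risk reduction measures must be considered and adopted unless they are impractical or not cost-effective.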
References:
- Peter Clarke, Functional Safety from Scratch
- CCPS, Layer of Protection Analysis: Simplified Process Risk Assessment