Demystifying SOTIF Acceptance Criteria and Validation Targets – Part 2
In the first entry of our blog series, we introduced the concepts of acceptance criteria and validation targets in the context of ISO 21448:2022 (Safety of the Intended Functionality, or SOTIF). We discussed how these elements help determine whether an Advanced Driver Assistance System (ADAS) or Automated Driving System (ADS) is “safe enough” for real-world deployment at scale.
In the next two blog entries, we will get into the building blocks that underpin these acceptance criteria and validation targets. In Part 2, we will explore the high-level frameworks for setting risk thresholds and acceptance criteria. In Part 3, we will work to translate the abstract objectives in the standard into concrete, testable validation targets.
From Residual Risk to Acceptance Criteria: Establishing “Safe Enough”
An acceptance criterion in SOTIF defines what level of residual risk is considered acceptable after all feasible SOTIF safety measures (i.e., functional modifications) have been applied. In other words, it sets the bar for how safe the system needs to be before it can be released. These criteria are crucial because they describe what it means to not pose an unreasonable risk. According to ISO 21448, formulating acceptance criteria should take into account several factors, including relevant existing regulations, the performance of similar existing functions, and even the performance of a very competent human driver.
For example, if an existing lane-keeping assist system (LKAS) is known to have a certain safety performance, an organization might set its next-generation system’s acceptance criteria to meet or exceed that existing benchmark. Likewise, if local traffic laws and/or regulations specify certain safety targets (e.g., a maximum allowed rate of undesired functionality), those become baseline considerations while specifying the acceptance criteria.
The ISO 21448 standard emphasizes that we must judge risk acceptability from the perspective of all who might be exposed to the risk – including vehicle occupants, other road users, and pedestrians. Ultimately, an acceptance criterion should reflect that the risk posed by the ADAS or ADS is not unreasonable for them. In practice, this often means aiming for a level of safety performance for the ADAS or ADS that is comparable to or better than that of human drivers under similar operational conditions. Of course, the automotive industry is not the first to grapple with this type of problem, right?
Risk Tolerability Frameworks
The SOTIF standard points to established risk tolerability principles to help justify and structure acceptance criteria. These principles come from decades of safety engineering experience and provide high-level yardsticks for “how safe is safe enough.” Four key frameworks highlighted in ISO 21448 are:
1. GAMAB (Globalement Au Moins Aussi Bon)
French for “globally at least as good.” This principle essentially dictates that a newly introduced system must not be more dangerous than the existing state of the art. In other words, the residual risk associated with the ADAS or ADS must be no higher than the risk from comparable human-driven vehicles or previous-generation systems. Simply put, introducing new technology shouldn’t make things worse. If human drivers average, for instance, one harmful accident per million miles of driving, an automated system following the GAMAB principle should target at most one (and preferably fewer) harmful accidents per million miles of operation.
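To make the GAMAB comparison concrete, here is a minimal sketch in Python of the “at least as good” check. All of the numbers are hypothetical – real acceptance criteria would be grounded in vetted accident statistics:

```python
# Illustrative GAMAB check (hypothetical figures, not real accident statistics).
# GAMAB: the new system's harmful-event rate must not exceed the existing
# baseline (e.g., comparable human-driven vehicles).

def meets_gamab(system_events: int, system_miles: float,
                baseline_rate_per_mile: float) -> bool:
    """Return True if the system's observed event rate is at or below the baseline."""
    system_rate = system_events / system_miles
    return system_rate <= baseline_rate_per_mile

# Hypothetical baseline: human drivers average 1 harmful accident per 1,000,000 miles.
baseline = 1 / 1_000_000

# Hypothetical fleet data: 3 harmful events over 5,000,000 miles of operation.
print(meets_gamab(3, 5_000_000, baseline))  # 0.6 per million <= 1 per million -> True
```

Note that a raw point estimate like this ignores statistical confidence – a topic we return to when discussing validation targets.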
2. ALARP (As Low As Reasonably Practicable)
ALARP is a classic risk management approach widely used in industries such as rail and oil & gas. It acknowledges that zero risk is unachievable and calls for reducing risk as much as is reasonably practicable – that is, to the point where further risk reduction would require effort or cost grossly disproportionate to the safety benefit gained. Under the ALARP principle, an acceptance criterion might be set at a level where any further improvement would impose unreasonable cost or technical burden relative to the incremental risk reduction achieved. This framework is particularly useful for novel technologies where clear regulatory limits may not exist – it forces us to carefully weigh the residual risk against the effort required to mitigate it.
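One common way to operationalize ALARP is a “gross disproportion” test: a further mitigation remains reasonably practicable unless its cost exceeds the value of the risk it removes by some disproportion factor. The sketch below uses hypothetical costs and an assumed factor of 3 purely for illustration – actual factors and valuations are a matter of policy and jurisdiction:

```python
# Illustrative ALARP "gross disproportion" test (all figures hypothetical).
# A mitigation is still "reasonably practicable" unless its cost grossly
# outweighs the value of the risk reduction it delivers.

def mitigation_is_practicable(mitigation_cost: float,
                              risk_reduction_value: float,
                              disproportion_factor: float = 3.0) -> bool:
    """True if the mitigation should still be implemented under ALARP."""
    return mitigation_cost <= disproportion_factor * risk_reduction_value

# Hypothetical: a sensor upgrade costs 2.0 (arbitrary units) and removes
# risk valued at 1.0 -- within the disproportion factor, so implement it.
print(mitigation_is_practicable(2.0, 1.0))   # True

# Hypothetical: a full redesign costs 10.0 for the same 1.0 risk reduction --
# grossly disproportionate, so residual risk may be accepted as ALARP.
print(mitigation_is_practicable(10.0, 1.0))  # False
```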
3. MEM (Minimum Endogenous Mortality)
The MEM principle says a new technology should not significantly increase the natural background risk of death in society. It’s a more abstract safety benchmark – it derives acceptable risk levels from the lowest levels of mortality that society experiences from natural causes, and insists that the technology-induced risk must be of the same order of magnitude or lower. For instance, if in a given society the chance of death from natural causes is X per hour of exposure, an automated driving system’s contribution to fatality risk should be well below X per hour. This principle sets a very stringent bar, effectively saying the electrical/electronic or software system should not make being alive noticeably riskier than it already is. This is the underpinning risk framework used in ISO 26262 for automotive functional safety.
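As an illustration, a MEM-style budget is often derived by taking a commonly cited minimum endogenous mortality figure (on the order of 2×10⁻⁴ fatalities per person-year), allocating only a fraction of it to any single technical system, and converting to an hourly rate. The figures below are illustrative only – the applicable standard and literature govern the actual values:

```python
# Illustrative MEM budget derivation (commonly cited figures, used here only
# as an example; consult the applicable standard for authoritative values).

HOURS_PER_YEAR = 8760.0

def mem_budget_per_hour(endogenous_mortality_per_year: float = 2e-4,
                        system_share: float = 1 / 20) -> float:
    """Fatality-rate budget per hour of exposure for one technical system."""
    yearly_budget = endogenous_mortality_per_year * system_share
    return yearly_budget / HOURS_PER_YEAR

budget = mem_budget_per_hour()
print(f"{budget:.2e} fatalities per hour of exposure")  # on the order of 1e-09
```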
4. Positive Risk Balance (PRB)
This principle takes a holistic view – a new autonomous system can be deemed acceptable even if it slightly increases certain specific risks, provided that it decreases other risks enough to yield an overall net safety benefit. In essence, all changes in risk are weighed together. For example, an ADS might introduce a new type of hazard, but if it greatly reduces the occurrence of common human driver errors, the total risk after the system’s introduction might still be lower. PRB allows such trade-offs, ensuring that the overall balance of risk is still “positive” (i.e., safety-improving).
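The bookkeeping behind PRB can be sketched as a simple tally of risks added versus risks removed. The hazard categories and rates below are entirely hypothetical:

```python
# Illustrative Positive Risk Balance tally (hypothetical categories and rates).
# PRB: new risks introduced by the ADS may be acceptable if the risks it
# removes are larger, i.e., the net change in total risk is negative.

def net_risk_change(risks_added: dict, risks_removed: dict) -> float:
    """Net change in expected harm per million miles (negative = net benefit)."""
    return sum(risks_added.values()) - sum(risks_removed.values())

# Hypothetical harm rates per million miles:
added = {"novel sensor-confusion crashes": 0.2}
removed = {"drowsy-driving crashes": 0.5, "distraction crashes": 0.4}

delta = net_risk_change(added, removed)
print(delta < 0)  # net reduction in total risk -> positive risk balance holds
```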
While they may seem intimidating at first, it’s important to remember that these risk frameworks are not mutually exclusive – they often converge to similar numeric targets for risk acceptance. In fact, standards from other domains (e.g., the railway standard EN 50126-2) discuss these principles in detail, and we at SRES have worked over the past 10+ years to interpret them for ADAS/ADS applications.
These risk frameworks serve as rationale to justify why a chosen acceptance criterion is defensible. For example, a company might claim: “Our virtual driver is at least as safe as a human driver (GAMAB) and we’ve reduced risk to ALARP; therefore, our residual risk is acceptable.” Using a combination of these frameworks helps make the case that the acceptance criteria aren’t simply arbitrary, but grounded in generally accepted safety thinking.
Good Data is Everything
In addition to these risk tolerability principles, ISO 21448 pushes development teams to consider credible and concrete data sources while specifying acceptance criteria. These may include but are not limited to real-world traffic accident statistics and performance data from similar systems. If adequate field data exists for how often a certain hazardous behavior occurs in today’s human-operated vehicles, that data can be used to directly inform what “acceptable” looks like for the new system.
For example, suppose field data shows that in a given country, human-driven vehicles experience on average one unintended hard-braking incident per 100,000 miles – the kind of event that can trigger a rear-end crash from a following vehicle. Applying the GAMAB principle, that statistic becomes a baseline acceptance criterion for an automatic emergency braking (AEB) system: “The AEB-equipped vehicle should have no more than one unintended emergency braking incident per 100,000 miles.” This can also be phrased as a probability or rate (e.g., probability of an unintended braking event < 1 per 1E5 miles). In SOTIF terms, this is an acceptance criterion on residual risk – it quantifies the maximum risk we are willing to accept as a society for that hazard. It’s important to remember that SOTIF focuses on hazardous behaviors arising from functional insufficiencies, not random hardware failures – so these criteria relate to things like the frequency of false-positive obstacle detections or inappropriate system responses in complex driving scenarios.
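This also hints at the evidence question behind validation targets: roughly how many event-free miles would it take to claim such a benchmark is met with statistical confidence? One well-known rough heuristic is the “rule of three,” which says that observing zero events over n independent trials bounds the event rate below approximately 3/n at 95% confidence. A minimal sketch, with the same hypothetical benchmark:

```python
# Illustrative mileage estimate via the "rule of three" (hypothetical target).
# Zero events observed over n miles -> approximate 95% upper confidence
# bound on the event rate of 3/n per mile.

def event_free_miles_needed(target_rate_per_mile: float,
                            rule_of_three_events: float = 3.0) -> float:
    """Event-free miles needed to bound the rate below the target at ~95%."""
    return rule_of_three_events / target_rate_per_mile

# Hypothetical acceptance criterion: < 1 unintended braking event per 100,000 miles.
target = 1 / 100_000
print(f"{event_free_miles_needed(target):,.0f} event-free miles")  # ~300,000
```

In other words, demonstrating a 1-per-100,000-mile criterion with reasonable confidence takes several times that mileage with zero events – a preview of the evidence arguments in Part 3.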
As we noted in Part 1 of this blog series, not all acceptance criteria need to be expressed purely as probabilities or rates. Some acceptance criteria might remain qualitative (e.g., “the vehicle’s responses should be perceived as intuitive by expert drivers”). However, even qualitative goals must be translated into quantitative proxies for SOTIF verification. For instance, “intuitive responses” might be partially verified by measuring the frequency of critical interventions by safety drivers or the frequency of disengagements during vehicle testing. The ISO 21448 standard encourages quantification wherever feasible, as quantitative targets enable clear evidence of achievement of safe performance. When quantitative acceptance criteria are chosen, they must come with a rationale – typically based on the principles and data sources we discussed (regulations, human performance, GAMAB/ALARP/MEM, field statistics).
At SRES, we use the following thought-starters while specifying acceptance criteria:
- Have we systematically identified all potential hazards arising from the intended functionality at the vehicle level?
- Have we considered both functional insufficiencies and reasonably foreseeable misuse?
- What parameters define hazardous behavior for the system, and how can we quantify them?
- What is an acceptable level of residual risk, considering regulations, market standards, and the potential for harm?
- What existing data or benchmarks can inform our acceptance criteria?
- Are there specific accident statistics or traffic analyses/data we should be referencing?
Setting SOTIF acceptance criteria is not guesswork, but rather a careful choice informed by established safety principles and lots and lots of real-world data. These criteria help us as an engineering community to draw a line in the sand for what is considered “safe enough” in terms of residual risk. Validation targets make these acceptance criteria actionable – they specify how much evidence (in terms of test miles, scenario coverage, statistical confidence, etc.) we need to collect to convince ourselves and our users that our ADAS/ADS system meets the safety benchmark.
In our next entry, we will perform a similar deep dive into the specification of validation targets.