07/25/25

Demystifying SOTIF Acceptance Criteria and Validation Targets – Part 4

This article offers an in-depth look at topics related to Autonomous Systems. It continues from Part 1, Part 2 and Part 3 of this series.


Mathematical Underpinnings

The acceptance criterion is one side of the coin: the target we set. The other side is figuring out how to measure or demonstrate that the system meets that target. This is where SOTIF’s validation targets and the associated mathematical frameworks come into play. They help bridge the gap between a top-level criterion (e.g., “no more than N accidents per hour”) and what needs to be verified or validated at the system level (e.g., how often a hazardous event may occur, or how much testing is needed to gain confidence in meeting the criterion).

One fundamental concept introduced in ISO 21448 (in Annex C) is the idea of breaking down an acceptable harm rate (A_H) into an acceptable hazardous-behavior rate (R_HB). Not every occurrence of a hazardous behavior leads to an accident or harm – sometimes the scenario conditions are such that no one gets hurt. For example, an unintended emergency brake application (hazardous behavior) might be benign if there is no trailing vehicle close behind. But the same hazardous behavior in heavier traffic could lead to an injury-causing crash. Therefore, if we allow ourselves a certain maximum rate of harm (accidents), the automated driving system could potentially exhibit a higher rate of the hazardous behavior itself, as long as most of those events do not result in harm.

Aligning on Terminologies

Before discussing the mathematical underpinnings of acceptance criteria and validation targets, we need a shared understanding of the following terms:

A_H: The acceptance criterion for the rate of harm (e.g., accidents or injuries per hour) related to a particular hazardous event. This value is derived from the original acceptance criterion in combination with the safety margin.
R_HB: The rate of the hazardous behavior (due to a functional insufficiency) that we can tolerate while still meeting A_H.
P_E|HB: The conditional probability that, if the hazardous behavior happens, the scenario is conducive to a bad outcome (e.g., if an unintended braking occurs, what fraction of those events happen with a trailing vehicle following close behind?).
P_C|E: The conditional probability that, given such a scenario, the hazardous situation is not controllable (i.e., the driver or system cannot mitigate the harm).
P_S|C: The conditional probability that, given loss of control in that scenario, a certain severity of harm occurs (e.g., X % of the involved persons are heavily injured and Y % of the involved persons are at least slightly injured).

The parameters P_E|HB, P_C|E, and P_S|C can be checked for consistency with the parameters Exposure (E), Controllability (C), and Severity (S).

Deriving Validation Targets and Validation Effort

The acceptable harm rate equals the hazardous behavior rate times the probability that, given the hazardous behavior occurs, the vehicle is in a situation where it is exposed to danger (E), times the probability that this situation is uncontrollable (C), times the probability that it leads to a severe outcome (S). We can represent this as:

A_H = R_HB x P_E|HB x P_C|E x P_S|C

Each conditional probability term represents a piece of the puzzle from hazard to harm.

Using this equation, we can solve for the allowable hazardous behavior rate R_HB that corresponds to a chosen harm criterion A_H.

R_HB = A_H / ( P_E|HB x P_C|E x P_S|C )

R_HB can be used to derive an applicable validation target. It translates the top-level acceptance criterion into a more measurable requirement.

Let’s plug in some numbers to see how this works, shall we?

Suppose our acceptance criterion for a particular harm is A_H = 2 x 10^-7 per hour, meaning we aim for at most one harmful event in 5 million hours of ADS fleet operation. Say we also make the following assumptions:

P_E|HB is 0.1 (dimensionless), i.e., in 10% of the cases the hazardous behavior occurs in a scenario where it can lead to harm.

P_C|E is 0.2 (dimensionless), i.e., the hazardous behavior leading to this harm is not controllable in 20% of those cases.

P_S|C is 0.04 (dimensionless), i.e., the severity addressed by the acceptance criterion is reached in 4% of the uncontrollable cases.

Plugging these in:

R_HB = A_H / ( P_E|HB x P_C|E x P_S|C ) = 2 x 10^-7 /h / ( 0.1 x 0.2 x 0.04 ) = 2.5 x 10^-4 /h

In more intuitive terms, this R_HB equates to one occurrence every 4,000 hours on average. So, based on those inputs, our ADS would be allowed to exhibit the hazardous behavior about once in 4,000 hours and still satisfy the top-level acceptance criterion. This hazardous behavior rate (about 1 in 4,000 hours) becomes a target for validation: we would need to verify, with high statistical confidence, that the ADS’s actual hazardous behavior rate is below this number.
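The arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name and argument names are hypothetical (they are not defined by ISO 21448), and the inputs are the example values assumed in this article.

```python
import math

def allowable_hazardous_behavior_rate(a_h, p_e_hb, p_c_e, p_s_c):
    """Return R_HB = A_H / (P_E|HB * P_C|E * P_S|C), in the same unit as a_h.

    Hypothetical helper for illustration; names are not taken from the standard.
    """
    if a_h <= 0.0:
        raise ValueError("a_h must be a positive rate")
    for name, p in (("p_e_hb", p_e_hb), ("p_c_e", p_c_e), ("p_s_c", p_s_c)):
        # Each factor is a conditional probability and must be non-zero
        # so that the division is defined.
        if not 0.0 < p <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1], got {p}")
    return a_h / (p_e_hb * p_c_e * p_s_c)

# Example values from the text: A_H = 2e-7 per hour, 0.1, 0.2, 0.04.
r_hb = allowable_hazardous_behavior_rate(a_h=2e-7, p_e_hb=0.1, p_c_e=0.2, p_s_c=0.04)
hours_between_events = 1.0 / r_hb

print(f"R_HB = {r_hb:.2e} per hour")
print(f"about one hazardous behavior every {hours_between_events:,.0f} hours")
```

Because each probability is at most 1, R_HB can never be smaller than A_H; the smaller the conditional probabilities, the more hazardous behavior the harm criterion tolerates.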

It is generally much easier to observe hazardous behavior (like an unnecessary emergency brake) than to wait for an actual accident to happen, especially if we deliberately create challenging scenarios during testing.

For this particular use case, let’s assume that human drivers experience an average of X miles between incidents – which is our benchmark (B). For safety, an additional margin Y>1 is specified.

The acceptance criterion selected for the SOTIF-relevant system is then B × Y average miles between potentially hazardous behaviours, or equivalently a target incident rate of A_H = 1 / (B × Y). Assuming that incidents follow a Poisson distribution, the validation target (T) for a statistical confidence level α can be written as:

T = – ln(1 – α) / A_H

Using the validation target T, the system can be shown to have an incident rate lower than or equal to A_H with a confidence α if T units of driving (miles or hours, matching the unit of A_H) are accumulated with no potentially hazardous behaviour.
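The expression for T follows from the zero-event case of the Poisson distribution. A brief derivation, where N (the number of incidents observed during T) and λ (the true, unknown incident rate) are symbols introduced here for illustration:

```latex
\begin{align}
P(N = 0 \mid \lambda, T) &= e^{-\lambda T} \\
\lambda \ge A_H \;&\Rightarrow\; P(N = 0 \mid \lambda, T) \le e^{-A_H T} \\
e^{-A_H T} \le 1 - \alpha \;&\Longleftrightarrow\; T \ge \frac{-\ln(1 - \alpha)}{A_H}
\end{align}
```

In words: if the true rate were at or above A_H, the chance of seeing no incident at all during T would be at most 1 – α, so an incident-free exposure of T supports the claim that the rate is below A_H with confidence α.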

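Applying the same formula to the hazardous behavior rate from the worked example (R_HB = 2.5 x 10^-4 per hour, i.e., one occurrence per 4,000 hours), 0 occurrences in 4,000 hours corresponds to only about 63% confidence (α ≈ 0.63, since 1 – e^-1 ≈ 0.632) that the rate is under the target.

At a 90% confidence level, i.e., α = 0.9, the validation effort is –ln(0.1) x 4,000 ≈ 2.3 x 4,000 ≈ 9,200 hours, i.e., 0 occurrences in roughly 9,200 hours of operation.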
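The two confidence figures can be checked numerically. The following is a minimal sketch under the zero-occurrence Poisson assumption used in this article; both function names are hypothetical, and the rate is the example value R_HB = 2.5 x 10^-4 per hour.

```python
import math

def zero_event_exposure(max_rate, confidence):
    """Incident-free exposure T = -ln(1 - confidence) / max_rate.

    T is in the inverse unit of max_rate (hours if the rate is per hour).
    """
    if max_rate <= 0.0:
        raise ValueError("max_rate must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return -math.log(1.0 - confidence) / max_rate

def confidence_from_exposure(max_rate, exposure):
    """Confidence that the rate is below max_rate after an incident-free exposure."""
    if max_rate <= 0.0 or exposure < 0.0:
        raise ValueError("max_rate must be positive and exposure non-negative")
    return 1.0 - math.exp(-max_rate * exposure)

r_hb = 2.5e-4  # per hour, example value from the text

conf_4000h = confidence_from_exposure(r_hb, 4000.0)
hours_for_90 = zero_event_exposure(r_hb, 0.90)

print(f"confidence after 4,000 incident-free hours: {conf_4000h:.3f}")
print(f"incident-free hours needed for 90% confidence: {hours_for_90:,.0f}")
```

Without rounding –ln(0.1) to 2.3, the 90% figure comes out at roughly 9,210 hours, in line with the approximate 9,200 hours quoted above. Note that this sketch only covers the case of zero observed occurrences; if hazardous behavior is observed during testing, a different statistical bound is needed.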

How can we prove an ADS is safe if serious accidents are rare? We can’t just wait for a one-in-a-million event to happen. In this blog entry, we explored the mathematical framework for developing tangible validation targets and validation effort from a human-driven vehicle benchmark, described in Annex C of the SOTIF standard.

Refer to the previous blog entries below:

Part 1

Part 2

Part 3