Demystifying SOTIF Acceptance Criteria and Validation Targets – Part 3
In our previous blog entry, we explored the building blocks of acceptance criteria in detail. To make these actionable, we need to know how many hours or miles of virtual and/or real-world testing are needed before we can be confident that our Advanced Driver Assistance System (ADAS) or Automated Driving System (ADS) is “safe enough” for large-scale real-world deployment.
In this entry, we translate the vehicle-level SOTIF acceptance criteria into concrete, testable validation targets at lower levels of abstraction. In the upcoming Part 4, we will explore the underlying mathematical frameworks with the help of an example.
From Acceptance Criteria to Validation Targets: Designing Tests for Known and Unknown Scenarios
Once acceptance criteria are defined and broken down into specific quantitative targets, the next challenge is planning how to validate that the system meets them. This is where validation targets come into play. A validation target is essentially a concrete goal for testing or analysis that, if met, gives confidence that the acceptance criteria are satisfied. ISO 21448:2022 makes this very clear – validation targets are the bridge between theoretical criteria and real-world evidence.
Defining Validation Targets
The SOTIF standard does not prescribe one universal approach to setting validation targets – it varies depending on the validation method used. For each method (whether it’s virtual simulation, controlled track testing, real-world driving, analytical modeling, etc.), you should determine an appropriate scope and depth/effort of testing. For instance, if virtual simulation is a chosen method, a validation target might be running a certain number of randomized simulations covering diverse operational scenarios. If track testing is leveraged, the target might be a specific number of hours of testing under pre-defined scenarios/maneuvers. The key is that there must be a rationale connecting that effort to the acceptance criteria. It’s simply not enough to say “we’ll test our ADS-equipped vehicle for 100 hours” – you need to explain why 100 hours is sufficient given the acceptance criteria, for example based on a statistical confidence argument or on coverage of scenarios.
We must carefully consider factors like the number of relevant scenarios, the distribution of test cases, and the distance or time driven while testing when formulating validation targets. For instance, if our acceptance criterion concerns the rate of a certain false positive (FP) event, such as unintended AEB activation, our validation target might be something along the lines of “demonstrate fewer than X false positives in Y miles of driving”. This ties directly to evidence – drive Y miles (in real or simulated environments) and count the occurrences of that hazardous event/behavior. The sketch below illustrates the statistical sizing argument behind such a target.
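To illustrate the kind of statistical confidence argument we mean, here is a minimal Python sketch (our own illustration, not taken from the standard) that estimates how many failure-free miles a rate-based criterion demands at a chosen confidence level, assuming hazardous events follow a Poisson process:

```python
import math

def miles_required(criterion_rate_per_mile: float, confidence: float) -> float:
    """Failure-free miles needed to claim, at the given confidence, that the
    true event rate lies below the criterion rate.

    Assumes events follow a Poisson process: observing zero events over M
    miles rules out a true rate r with confidence 1 - exp(-r * M).
    """
    return -math.log(1.0 - confidence) / criterion_rate_per_mile

# Example: at most 1 unintended AEB activation per 100,000 miles,
# demonstrated at 95% confidence with zero observed events.
criterion = 1.0 / 100_000  # events per mile
print(f"{miles_required(criterion, 0.95):,.0f} miles")  # ~299,573 miles
```

Note the familiar “rule of three” hiding in this result: at 95% confidence with zero observed events, the required exposure is roughly 3 divided by the criterion rate.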
Known vs Unknown Hazard Scenarios
In Part 1, we introduced the concept of SOTIF “Scenario Area 2” (known hazardous scenarios) and “Area 3” (unknown hazardous scenarios). These areas influence how we approach validation targets:
- For known hazardous scenarios (Area 2), the acceptance criteria for those scenarios are explicitly defined. We know what can go wrong and how often we’ll accept it. Validation targets here typically involve designing specific test cases to replicate those hazardous scenarios and show that the system can handle them or that their occurrence is below an acceptable threshold. For instance, if a known hazardous scenario is “a pedestrian crossing at night on an unlit road in front of the ego vehicle” and our acceptance criterion is that our vehicle must avoid collision in 99 out of 100 such encounters, then our validation target might be to test 100 instances of that scenario (or, better, use appropriate statistical tests – see the sketch after this list) and confirm at most 1 collision. Essentially, you challenge the ADAS/ADS directly with the known challenging situations and see if it meets the acceptance criteria.
- For unknown hazardous scenarios (Area 3), by definition we don’t have a complete list – there will be edge-case situations or rare combinations of environmental/operational factors that weren’t identified during development. Here, acceptance criteria naturally have to be broader (e.g., “the residual risk from encountering unknown scenarios must be sufficiently low”) without a specific numeric target per scenario. Instead of a preset threshold per scenario, the validation target often takes the form of a testing-campaign objective. An approach described in Annex C of ISO 21448 is to iteratively expand the test space until the rate of finding new unique hazardous scenarios drops to an acceptably low level. The idea is that by aggressively testing varied scenarios (through virtual simulation, fuzzing sensor inputs, combinatorial variation of environmental factors, etc.), we gain confidence that if there were easy-to-trigger unknown hazards, we would have found them. If we don’t find any after extensive testing, we infer that the remaining unknown risks are very unlikely to occur in real-world operation.
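To see why the parenthetical “use appropriate statistical tests” in the Area 2 example matters, here is a minimal sketch (using scipy.stats; the numbers are the illustrative ones from the bullet above, not normative values) that computes a Clopper–Pearson upper confidence bound on the per-scenario collision probability:

```python
from scipy.stats import beta

def collision_prob_upper_bound(collisions: int, runs: int,
                               confidence: float = 0.95) -> float:
    """Clopper-Pearson upper confidence bound on the per-scenario
    collision probability, given the observed test outcome."""
    if collisions >= runs:
        return 1.0
    return float(beta.ppf(confidence, collisions + 1, runs - collisions))

# 1 collision observed in 100 staged pedestrian-crossing runs:
print(f"{collision_prob_upper_bound(1, 100):.3f}")  # ~0.047
```

In other words, 100 runs with one collision only demonstrates a collision probability below roughly 4.7% at 95% confidence; demonstrating the 1% criterion at that confidence would take on the order of 475 runs with at most one collision observed.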
In practice, a combination of structured (deterministic) tests and stochastic exploration is used. Structured tests target the known scenarios’ triggering conditions and associated functional insufficiencies, while stochastic or varied testing searches for unknown problems. For example, we could use a large-scale simulation setup to generate thousands of random traffic scenarios within our Operational Design Domain (ODD). A validation target might then be set as: “Run 1 million miles’ worth of randomized simulation covering varied ODD conditions and observe no hazardous events beyond a frequency of Z”.
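One way to operationalize the Annex C idea of expanding the test space until the discovery rate saturates is sketched below. This is a simplified illustration under our own assumptions: run_batch is a stand-in for a real simulation pipeline, and the stopping thresholds (max_new_per_batch, patience) are placeholders that a project would have to justify against its acceptance criteria.

```python
import random

def run_batch(batch_size: int) -> set:
    """Stand-in for one batch of randomized simulations: returns the
    identifiers of hazardous scenarios triggered in this batch. A real
    pipeline would drive the simulator and classify outcomes here."""
    hazards = [f"hazard_{i}" for i in range(50)]   # hypothetical hazard space
    weights = [1.0 / (i + 1) for i in range(50)]   # some hazards are much rarer
    return {random.choices(hazards, weights)[0] for _ in range(batch_size)}

def explore_until_saturated(batch_size: int = 1_000,
                            max_new_per_batch: int = 1,
                            patience: int = 3) -> set:
    """Expand the test space until `patience` consecutive batches each
    uncover at most `max_new_per_batch` previously unseen hazardous
    scenarios (i.e., the discovery rate has saturated)."""
    known, quiet_batches = set(), 0
    while quiet_batches < patience:
        new = run_batch(batch_size) - known
        known |= new
        quiet_batches = quiet_batches + 1 if len(new) <= max_new_per_batch else 0
    return known

print(f"Discovered {len(explore_until_saturated())} unique hazardous scenarios")
```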
Coming Back to Acceptance Criteria
No matter the approach we pursue, every validation target must ultimately support an argument that the acceptance criterion is met. Let’s say our acceptance criterion was the earlier example: “The AEB-equipped vehicle should have no more than one unintended emergency braking incident per 100,000 miles”. Then a corresponding validation strategy might be to use a combination of field testing and simulation to accumulate, say, 200,000 miles of driving exposure across diverse scenarios – including those most likely to provoke unintended braking and hence collisions with trailing vehicles – and observe zero such incidents. Statistically, this outcome would provide evidence that the true incident rate is below 1 per 100,000 miles, with a confidence level we can quantify (see the sketch below).
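To put a number on that confidence level: under the same Poisson assumption as before, observing zero incidents over M miles supports a claimed maximum rate r with confidence 1 − exp(−rM). Plugging in the example figures:

```python
import math

miles = 200_000
criterion_rate = 1.0 / 100_000  # at most 1 incident per 100,000 miles

# Probability of observing zero incidents over `miles` if the true rate sat
# exactly at the criterion limit; one minus this is our confidence that the
# true rate is in fact below the limit.
confidence = 1.0 - math.exp(-criterion_rate * miles)
print(f"{confidence:.1%}")  # ~86.5%
```

So 200,000 incident-free miles buys roughly 86.5% confidence; pushing to 95% would require roughly 300,000 incident-free miles, consistent with the sizing sketch earlier.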
Researchers from RAND have noted that demonstrating extremely low collision rates by driving alone requires enormous mileage – potentially hundreds of millions of miles – if we wait for actual crashes to measure safety. Therefore, validation targets often leverage accelerated testing in simulation or controlled proving grounds to accumulate the equivalent exposure needed. By focusing on the hazardous behaviors (like unintended braking or missed detections) rather than actual collisions, we can drastically reduce the testing time and mileage required.
It’s also common to break down validation targets by subsystem or component. For example, one acceptance criterion might be at the vehicle level (“fewer than X hazardous events per hour of driving”), but the validation targets supporting it could include items such as “the perception system must detect pedestrians with at least 99.9% accuracy” and “the braking system must respond within Y milliseconds 99.99% of the time”. Each of these component-level validation targets provides evidence that, in combination, the overall system meets the top-level acceptance criterion.
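As a final sketch, here is how a component-level success-rate target translates into test volume using a standard zero-failure demonstration test (the 99.9% figure is the illustrative placeholder from the example above, not a normative value):

```python
import math

def trials_needed(required_success_rate: float, confidence: float) -> int:
    """Consecutive failure-free trials needed to demonstrate the required
    success rate at the given confidence (zero-failure test): the smallest
    n with p**n <= 1 - confidence, where p is the required success rate."""
    return math.ceil(math.log(1.0 - confidence) / math.log(required_success_rate))

# Pedestrian detection at >= 99.9% success, demonstrated at 95% confidence:
print(trials_needed(0.999, 0.95))  # 2,995 failure-free detection trials
```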
Importance of Iterations
Safety assurance is always an iterative process. As product development progresses, new insights or incident data will emerge, requiring updates to acceptance criteria and/or validation targets. The SOTIF process workflow is meant to be repeated until the desired outcomes are achieved.
Per ISO 21448:2022, Clause 8, verification and validation results feed back into functional modifications – that is, design improvements. All the while, keeping solid documentation of the rationales, the achieved test coverage, and the results is crucial. In the end, the SOTIF safety argument for release will reference these acceptance criteria and validation outcomes to show that the residual risk is as low as required and that no unreasonable risk remains.
At SRES, we use the following thought-starters while specifying validation targets:
- What are specific and measurable objectives that will demonstrate our system meets defined acceptance criteria?
- What types of scenarios will we use for testing, and how can we ensure comprehensive coverage of the ODD?
- How can we define test scenarios that expose potential limitations of the system?
- How will we utilize both real-world and simulation environments for validation? What is an appropriate split?
- How will we allocate the test effort across different validation activities?
- For unknown scenarios, how can we ensure that the residual risk meets acceptance criteria with sufficient confidence?
In this and the previous blog entry, we have only briefly touched on the mathematical frameworks that underpin the correct specification of acceptance criteria and validation targets. In Part 4, we will discuss the mathematical approaches used in the standard for developing defensible validation targets that meet acceptance criteria, illustrated with a simplified example.
Want to learn more?
Register for our SOTIF (ISO 21448) training: https://sres.ai/training/iso-214482022-safety-of-the-intended-functionality-sotif-training-sgs-tuv-saar/
Reach out to us about our hands-on consulting offerings: https://sres.ai/consulting/automotive/autonomous-product-development/