
Building Defensible AI Assurance Arguments in ISO/PAS 8800 (Part 2): Dataset Governance and Validation
This article continues our three-part series summarizing key points from our recent Fireside Chat with SRES Partners Jody Nelson, Gokul Krithivasan, and Bill Taylor, along with Eduard Dojan of SGS-TÜV Saar, on ISO/PAS 8800 and early implementation experience across OEMs and suppliers deploying machine learning in safety-critical systems.
Watch the full discussion here: What Auditors Really Look For: A Fireside Chat on ISO/PAS 8800 and Its Next Evolution
Dataset Governance, Validation, and the Shift from R&D to Production
In Part 1, we examined how ISO/PAS 8800 extends ISO 26262 and ISO 21448 (SOTIF) by mapping system-level triggering conditions and functional insufficiencies into AI triggering conditions and integrating machine learning artifacts into the broader safety case.
In Part 2 we move from structural integration to implementation reality — particularly dataset governance, validation strategy, and the shift from deterministic decomposition toward statistical assurance that underpins a defensible AI assurance argument.
Data Is Central to AI Safety
Data is central to AI systems, and leaders in the AI space tend to have more of it. Safety, however, is not determined by quantity alone. It depends on:
- How representative the data is
- How accurate it is
- How complete it is
ISO/PAS 8800 reflects this through defined expectations around dataset lifecycle activities and dataset properties. In Clause 11, the standard defines a structured dataset lifecycle, including collection, augmentation, safety analysis, and validation activities. The standard identifies dataset properties such as:
- Accuracy
- Completeness
- Independence between training, validation, and test datasets
- Traceability to the source of the data
Maintaining real independence between datasets is challenging in practice. Preventing training data from leaking into validation or test sets requires deliberate controls. Weak independence can undermine confidence in performance results and weaken the overall assurance argument. Dataset characteristics directly influence model performance and, therefore, system behavior.
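One deliberate control mentioned above is an automated leakage check between splits. As an illustrative sketch (the function and its interface are our own, not anything defined by ISO/PAS 8800), exact-duplicate leakage can be detected by content-hashing every sample and intersecting the fingerprint sets:

```python
import hashlib

def fingerprint(sample: bytes) -> str:
    """Content hash used as a stable identity for a raw sample."""
    return hashlib.sha256(sample).hexdigest()

def check_split_independence(train, val, test):
    """Return any exact content overlap between the three splits.

    `train`, `val`, and `test` are iterables of raw sample bytes.
    An empty result is necessary but not sufficient evidence of
    independence: near-duplicates (e.g. re-encoded or cropped
    copies) require a separate perceptual or metadata check.
    """
    sets = {name: {fingerprint(s) for s in split}
            for name, split in (("train", train), ("val", val), ("test", test))}
    overlaps = {}
    for a, b in (("train", "val"), ("train", "test"), ("val", "test")):
        common = sets[a] & sets[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps
```

Running such a check on every dataset release, and archiving the empty (or triaged) result, turns an informal practice into auditable evidence.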
Representativeness and Coverage
Strong performance in nominal conditions does not guarantee adequate performance across the full operational context.
A model may perform well on straight highways in clear weather while failing to address rare or degraded conditions. Validation must demonstrate acceptable behavior across the defined Operational Design Domain (ODD), including:
- Environmental variation
- Rare scenarios
- Edge cases
- Adversarial examples
Dataset construction and validation strategy must connect back to identified hazards and AI triggering conditions. This maintains the traceability structure established in Part 1.
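A simple way to make ODD coverage checkable is to tag every validation sample with its ODD attributes and flag under-populated cells of the attribute grid. The dimensions below are purely illustrative; a real taxonomy comes from the system's ODD specification, and the minimum count per cell is a project-level decision, not a number fixed by the standard:

```python
from collections import Counter
from itertools import product

# Illustrative ODD dimensions (assumed for this sketch; a real
# project would derive these from its ODD specification).
ODD_DIMENSIONS = {
    "weather": ["clear", "rain", "fog"],
    "lighting": ["day", "night"],
    "road": ["highway", "urban"],
}

def coverage_gaps(samples, min_count=1):
    """Flag ODD cells with fewer than `min_count` validation samples.

    Each sample is a dict with one value per ODD dimension, e.g.
    {"weather": "rain", "lighting": "night", "road": "urban"}.
    Returns the list of under-covered attribute combinations.
    """
    keys = sorted(ODD_DIMENSIONS)
    counts = Counter(tuple(s[k] for k in keys) for s in samples)
    all_cells = product(*(ODD_DIMENSIONS[k] for k in keys))
    return [dict(zip(keys, cell)) for cell in all_cells
            if counts[cell] < min_count]
```

The returned gap list maps directly back to identified hazards and AI triggering conditions: each empty cell is either a data collection task or a documented rationale for why that combination is out of scope.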
Labeling and Process Controls
Labeling and data curation introduce additional risk. Machine learning systems may rely on human labeling, automated labeling pipelines, or hybrid approaches that combine both. Errors in annotation, systematic tool limitations, inconsistent labeling rules, or insufficient review mechanisms can directly affect model performance. We are seeing an accelerated shift toward automated labeling, which brings software tool evaluation and qualification into scope.
In R&D environments, labeling workflows may be informal. In certification-oriented environments, they require defined policies, quality criteria, and review mechanisms.
Concepts familiar to functional safety engineers reappear at the dataset level. Process-level analyses — including process FMEA-style approaches — are increasingly applied to data collection, labeling workflows, and dataset management activities as organizations transition from exploratory R&D to scalable production environments.
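One concrete review mechanism for human or hybrid labeling is measuring inter-annotator agreement on double-labeled items. As a sketch, Cohen's kappa (a standard agreement statistic, not a metric prescribed by ISO/PAS 8800) can be computed without external dependencies; acceptance thresholds would be set by the project's data quality criteria:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators over the same items.

    1.0 means perfect agreement; 0.0 means chance-level agreement.
    `labels_a` and `labels_b` are equal-length lists of class labels
    assigned independently to the same items.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same
    # class if each labeled at random per their own class frequencies.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)
```

Tracking this statistic per labeling batch, with a defined escalation path when it drops, is exactly the kind of documented process control that distinguishes a certification-oriented workflow from an informal R&D one.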
ISO/PAS 8800 does not invent these practices. Many experienced ML teams already perform robustness testing and edge-case evaluation. The standard formalizes expectations and integrates them into a lifecycle structure with defined objectives and work products.
From Deterministic Decomposition to Statistical Validation
Traditional functional safety relies on deterministic requirement refinement. High-level safety goals are translated into lower-level requirements and verified through defined implementation and testing activities.
For machine learning models, direct translation to model-level requirements is typically not feasible.
Assurance therefore becomes more validation-focused, scenario-based, and statistical in nature — relying on representative scenario coverage and aggregated validation evidence rather than strict requirement decomposition.
Instead of deriving low-level deterministic requirements from vehicle-level hazards, organizations must demonstrate acceptable behavior across representative datasets and scenarios.
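"Statistical in nature" can be made concrete with confidence bounds on observed failure rates. As an illustrative sketch under assumed i.i.d. scenario runs (the standard does not mandate this particular method), a one-sided Clopper-Pearson bound answers: given the failures observed in N validation runs, what true failure rate can we rule out at a chosen confidence level?

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def failure_rate_upper_bound(failures, trials, confidence=0.95):
    """One-sided Clopper-Pearson upper bound on the true failure rate.

    Given `failures` observed in `trials` independent scenario runs,
    returns p_hi such that a true rate above p_hi would have produced
    this few failures with probability < 1 - confidence. Solved by
    bisection, since the binomial CDF is decreasing in p.
    """
    if failures >= trials:
        return 1.0
    alpha = 1 - confidence
    lo, hi = failures / trials, 1.0
    for _ in range(60):  # bisection; 60 halvings is ample precision
        mid = (lo + hi) / 2
        if binom_cdf(failures, trials, mid) > alpha:
            lo = mid  # mid is still plausible; bound lies higher
        else:
            hi = mid
    return hi
```

For example, zero failures in 1,000 runs bounds the true rate at roughly 0.3% at 95% confidence (the familiar "rule of three"). The assurance argument then compares such bounds against the acceptance criteria derived from the hazard analysis, rather than against decomposed deterministic requirements.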
Traceability remains essential, but the connected artifacts differ. The assurance argument links:
- Hazards
- Triggering conditions
- Datasets
- Validation results
- Integration into the overall safety case
From an assessment perspective, the focus shifts from verifying requirement decomposition to evaluating whether validation evidence is sufficient and aligned with the defined scope and Operational Design Domain (ODD).
The R&D to Production Transition
A recurring challenge is the transition from research-oriented development to production-grade maturity.
R&D environments prioritize:
- Iteration speed
- Performance improvement
- Experimental flexibility
Certification environments require:
- Defined processes
- Documented work products
- Evidence that defined processes were followed
- Repeatability
Many organizations were already performing substantial portions of these activities before ISO/PAS 8800 was published. However, those activities were not always structured or documented in a manner suitable for formal assessment.
ISO/PAS 8800 introduces lifecycle structure. It aligns machine learning development with established functional safety process expectations and formalizes documentation and traceability requirements.
The transition is often less about introducing new technical methods and more about adopting a process-oriented approach that supports auditability and defensible assurance arguments. In practice, ISO/PAS 8800 formalizes what strong ML teams were already attempting — but it requires that those practices be structured, documented, and defensible under audit.
Next in the Series
In Part 3 of this series, we examine how these lifecycle elements translate into audit expectations, certification pathways, and the applicability of ISO/PAS 8800 beyond automotive systems. [Read Part 3]
Need support applying this in practice? Explore our ISO 8800 training or connect with us about consulting support.
Have insights or questions? Send us an email at info@sres.ai or leave a comment below—we welcome thoughtful discussion from our technical community.
Interested in learning more about our approach? Explore why teams choose SRES training and how we help automotive organizations with consulting support across functional safety, cybersecurity, autonomy safety, and EV development.



