
Building Defensible AI Assurance Arguments in ISO/PAS 8800 (Part 2): Dataset Governance and Validation
This article continues our three-part series summarizing key points from our recent Fireside Chat with SRES Partners Jody Nelson, Gokul Krithivasan, and Bill Taylor, along with Eduard Dojan of SGS-TÜV Saar, on ISO/PAS 8800 and early implementation experience across OEMs and suppliers deploying machine learning in safety-critical systems.
Watch the full discussion here: What Auditors Really Look For: A Fireside Chat on ISO/PAS 8800 and Its Next Evolution
Dataset Governance, Validation, and the Shift from R&D to Production
In Part 1, we examined how ISO/PAS 8800 extends ISO 26262 and ISO 21448 (SOTIF) by mapping system-level triggering conditions and functional insufficiencies into AI triggering conditions and integrating machine learning artifacts into the broader safety case.
In Part 2 we move from structural integration to implementation reality — particularly dataset governance, validation strategy, and the shift from deterministic decomposition toward statistical assurance that underpins a defensible AI assurance argument.
Data Is Central to AI Safety
Data is central to AI systems, and leaders in the AI space tend to have more of it. Safety, however, is not determined by quantity alone. It depends on:
- How representative the data is
- How accurate it is
- How complete it is
ISO/PAS 8800 reflects this through defined expectations around dataset lifecycle activities and dataset properties. In Clause 11, the standard defines a structured dataset lifecycle, including collection, augmentation, safety analysis, and validation activities. The standard identifies dataset properties such as:
- Accuracy
- Completeness
- Independence between training, validation, and test datasets
- Traceability to the source of the data
Maintaining real independence between datasets is challenging in practice. Preventing training data from leaking into validation or test sets requires deliberate controls. Weak independence can undermine confidence in performance results and weaken the overall assurance argument. Dataset characteristics directly influence model performance and, therefore, system behavior.
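One deliberate control mentioned above is an automated leakage check between splits. As an illustrative sketch (the function and its interface are our own, not anything defined by ISO/PAS 8800), exact-duplicate leakage can be detected by content-hashing every sample and intersecting the fingerprint sets:

```python
import hashlib

def fingerprint(sample: bytes) -> str:
    """Content hash used as a stable identity for a raw sample."""
    return hashlib.sha256(sample).hexdigest()

def check_split_independence(train, val, test):
    """Return any exact content overlap between the three splits.

    `train`, `val`, and `test` are iterables of raw sample bytes.
    An empty result is necessary but not sufficient evidence of
    independence: near-duplicates (e.g. re-encoded or cropped
    copies) require a separate perceptual or metadata check.
    """
    sets = {name: {fingerprint(s) for s in split}
            for name, split in (("train", train), ("val", val), ("test", test))}
    overlaps = {}
    for a, b in (("train", "val"), ("train", "test"), ("val", "test")):
        common = sets[a] & sets[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps
```

Running such a check on every dataset release, and archiving the empty (or triaged) result, turns an informal practice into auditable evidence.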
Representativeness and Coverage
Strong performance in nominal conditions does not guarantee adequate performance across the full operational context.
A model may perform well on straight highways in clear weather while failing to address rare or degraded conditions. Validation must demonstrate acceptable behavior across the defined Operational Design Domain (ODD), including:
- Environmental variation
- Rare scenarios
- Edge cases
- Adversarial examples
Dataset construction and validation strategy must connect back to identified hazards and AI triggering conditions. This maintains the traceability structure established in Part 1.
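A simple way to make ODD coverage checkable is to tag every validation sample with its ODD attributes and flag under-populated cells of the attribute grid. The dimensions below are purely illustrative; a real taxonomy comes from the system's ODD specification, and the minimum count per cell is a project-level decision, not a number fixed by the standard:

```python
from collections import Counter
from itertools import product

# Illustrative ODD dimensions (assumed for this sketch; a real
# project would derive these from its ODD specification).
ODD_DIMENSIONS = {
    "weather": ["clear", "rain", "fog"],
    "lighting": ["day", "night"],
    "road": ["highway", "urban"],
}

def coverage_gaps(samples, min_count=1):
    """Flag ODD cells with fewer than `min_count` validation samples.

    Each sample is a dict with one value per ODD dimension, e.g.
    {"weather": "rain", "lighting": "night", "road": "urban"}.
    Returns the list of under-covered attribute combinations.
    """
    keys = sorted(ODD_DIMENSIONS)
    counts = Counter(tuple(s[k] for k in keys) for s in samples)
    all_cells = product(*(ODD_DIMENSIONS[k] for k in keys))
    return [dict(zip(keys, cell)) for cell in all_cells
            if counts[cell] < min_count]
```

The returned gap list maps directly back to identified hazards and AI triggering conditions: each empty cell is either a data collection task or a documented rationale for why that combination is out of scope.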
Labeling and Process Controls
Labeling and data curation introduce additional risk. Machine learning systems may rely on human labeling, automated labeling pipelines, or hybrid approaches that combine both. Errors in annotation, systematic tool limitations, inconsistent labeling rules, or insufficient review mechanisms can directly affect model performance. We are seeing an accelerated shift toward automated labeling, which brings software tool evaluation and qualification into scope.
In R&D environments, labeling workflows may be informal. In certification-oriented environments, they require defined policies, quality criteria, and review mechanisms.
Concepts familiar to functional safety engineers reappear at the dataset level. Process-level analyses — including process FMEA-style approaches — are increasingly applied to data collection, labeling workflows, and dataset management activities as organizations transition from exploratory R&D to scalable production environments.
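One concrete review mechanism for human or hybrid labeling is measuring inter-annotator agreement on double-labeled items. As a sketch, Cohen's kappa (a standard agreement statistic, not a metric prescribed by ISO/PAS 8800) can be computed without external dependencies; acceptance thresholds would be set by the project's data quality criteria:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators over the same items.

    1.0 means perfect agreement; 0.0 means chance-level agreement.
    `labels_a` and `labels_b` are equal-length lists of class labels
    assigned independently to the same items.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same
    # class if each labeled at random per their own class frequencies.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)
```

Tracking this statistic per labeling batch, with a defined escalation path when it drops, is exactly the kind of documented process control that distinguishes a certification-oriented workflow from an informal R&D one.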
ISO/PAS 8800 does not invent these practices. Many experienced ML teams already perform robustness testing and edge-case evaluation. The standard formalizes expectations and integrates them into a lifecycle structure with defined objectives and work products.
From Deterministic Decomposition to Statistical Validation
Traditional functional safety relies on deterministic requirement refinement. High-level safety goals are translated into lower-level requirements and verified through defined implementation and testing activities.
For machine learning models, direct translation to model-level requirements is typically not feasible.
Assurance therefore becomes more validation-focused, scenario-based, and statistical in nature — relying on representative scenario coverage and aggregated validation evidence rather than strict requirement decomposition.
Instead of deriving low-level deterministic requirements from vehicle-level hazards, organizations must demonstrate acceptable behavior across representative datasets and scenarios.
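"Statistical in nature" can be made concrete with confidence bounds on observed failure rates. As an illustrative sketch under assumed i.i.d. scenario runs (the standard does not mandate this particular method), a one-sided Clopper-Pearson bound answers: given the failures observed in N validation runs, what true failure rate can we rule out at a chosen confidence level?

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def failure_rate_upper_bound(failures, trials, confidence=0.95):
    """One-sided Clopper-Pearson upper bound on the true failure rate.

    Given `failures` observed in `trials` independent scenario runs,
    returns p_hi such that a true rate above p_hi would have produced
    this few failures with probability < 1 - confidence. Solved by
    bisection, since the binomial CDF is decreasing in p.
    """
    if failures >= trials:
        return 1.0
    alpha = 1 - confidence
    lo, hi = failures / trials, 1.0
    for _ in range(60):  # bisection; 60 halvings is ample precision
        mid = (lo + hi) / 2
        if binom_cdf(failures, trials, mid) > alpha:
            lo = mid  # mid is still plausible; bound lies higher
        else:
            hi = mid
    return hi
```

For example, zero failures in 1,000 runs bounds the true rate at roughly 0.3% at 95% confidence (the familiar "rule of three"). The assurance argument then compares such bounds against the acceptance criteria derived from the hazard analysis, rather than against decomposed deterministic requirements.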
Traceability remains essential, but the connected artifacts differ. The assurance argument links:
- Hazards
- Triggering conditions
- Datasets
- Validation results
- Integration into the overall safety case
From an assessment perspective, the focus shifts from verifying requirement decomposition to evaluating whether validation evidence is sufficient and aligned with the defined scope and Operational Design Domain (ODD).
The R&D to Production Transition
A recurring challenge is the transition from research-oriented development to production-grade maturity.
R&D environments prioritize:
- Iteration speed
- Performance improvement
- Experimental flexibility
Certification environments require:
- Defined processes
- Documented work products
- Evidence that defined processes were followed
- Repeatability
Many organizations were already performing substantial portions of these activities before ISO/PAS 8800 was published. However, those activities were not always structured or documented in a manner suitable for formal assessment.
ISO/PAS 8800 introduces lifecycle structure. It aligns machine learning development with established functional safety process expectations and formalizes documentation and traceability requirements.
The transition is often less about introducing new technical methods and more about adopting a process-oriented approach that supports auditability and defensible assurance arguments. In practice, ISO/PAS 8800 formalizes what strong ML teams were already attempting — but it requires that those practices be structured, documented, and defensible under audit.
Next in the Series
In Part 3 of this series, we examine how these lifecycle elements translate into audit expectations, certification pathways, and the applicability of ISO/PAS 8800 beyond automotive systems. [Read Part 3]
Need support applying this in practice? Explore our ISO 8800 training or connect with us about consulting support.
Have insights or questions? Send us an email at info@sres.ai or leave a comment below—we welcome thoughtful discussion from our technical community.
Interested in learning more about our approach? Explore why teams choose SRES training and how we help automotive organizations with consulting support across functional safety, cybersecurity, autonomy safety, and EV development.



