
This post continues our previous post, “Is the SW-FMEA Busy-work? – A SW-FMEA guide”, and provides an SW-FMEA example. The previous post is referred to below as the SW-FMEA guide.
The purpose of the SW-FMEA is to systematically and comprehensively evaluate the safety-related software architecture and its functions.
Let’s reflect on the key differences between the commonly known FMEA (DFMEA, PFMEA) and the SW-FMEA.
SW-FMEA is a part of safety-oriented software analysis and interacts with software architecture and safety requirements, as illustrated in the figure below.
The primary input to the SW-FMEA is the software architecture, which includes static and dynamic aspects.
Static architecture
Dynamic architecture
A software block diagram is typically sufficient for the SW-FMEA. For complex software, multiple diagrams at different abstraction levels may be required.
Identified gaps lead to corrective actions such as enhancing safety requirements, modifying the software architecture, or adding safety measures.
Let’s assume an ASIL C safety goal of preventing unintended acceleration or deceleration for a traction inverter in an electric drive powertrain.
The defined safe state is active 3-phase short, which involves shorting the motor coils to each other to prevent overvoltage on the DC voltage rail.
Example of a Software Safety Requirement:
If the torque output exceeds TORQUE_REQUEST ± ACCEPTABLE_RANGE_THRESHOLD Nm for more than PERCEPTION_THRESHOLD ms, the system shall trigger a safe state request.
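To make the requirement concrete, here is a minimal sketch of a cyclic monitor that implements it. The threshold values, the cycle time, and the names `TorqueMonitor` and `step` are assumptions made for illustration; the requirement itself leaves the thresholds as named parameters.

```python
from dataclasses import dataclass

# Hypothetical calibration values; the requirement only names the parameters.
ACCEPTABLE_RANGE_THRESHOLD_NM = 10.0
PERCEPTION_THRESHOLD_MS = 50
CYCLE_TIME_MS = 10  # assumed period of the monitoring task


@dataclass
class TorqueMonitor:
    """Debounced torque deviation monitor (illustrative only)."""

    deviation_time_ms: int = 0

    def step(self, torque_request_nm: float, torque_output_nm: float) -> bool:
        """Run one monitoring cycle; return True if a safe state is requested."""
        deviation = abs(torque_output_nm - torque_request_nm)
        if deviation > ACCEPTABLE_RANGE_THRESHOLD_NM:
            # Deviation persists: accumulate the time it has been present.
            self.deviation_time_ms += CYCLE_TIME_MS
        else:
            # Back inside the acceptable range: restart the debounce timer.
            self.deviation_time_ms = 0
        return self.deviation_time_ms > PERCEPTION_THRESHOLD_MS
```

Resetting the timer as soon as the torque returns to the acceptable range is one possible debounce strategy. An up/down counter would be an alternative if intermittent deviations also need to be caught.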
The E-Gas structure is a well-established trusted design architecture for inverters. It separates nominal functions from safety functions, allowing for advantages such as easier updates to nominal functions without compromising safety mechanisms.
It is important to note that the E-Gas principle fundamentally applies to fail-safe systems – systems where turning off or ceasing operation results in a safe state. In contrast, fail-operational systems must continue functioning to maintain safety even in the presence of a failure.
A traction inverter is generally considered fail-safe, as shutting it down can prevent hazardous situations. However, certain fail-operational aspects exist, such as 3-phase short, which requires parts of the control logic and inverter power stage to remain at least partially operational to ensure system safety.
E-Gas consists of three levels:
The inverter software architecture is structured as follows:
As per ISO 26262, software can only exhibit systematic failures. The only failure possible in software is a “bug”, meaning a coding error or design insufficiency introduced during development. When a bug is found, the root cause must be addressed and corrective actions taken. However, speculating about root causes in the SW-FMEA is unnecessary. That said, design changes or additional safety measures resulting from the SW-FMEA may require a root cause investigation. For example, additional safety measures to ensure the transmission of a safety signal over a bus might require a root cause analysis of an existing high bus load.
A historical example is NASA’s Mars Climate Orbiter, which was lost in 1999 due to a software bug that failed to correctly convert imperial to metric units. This exemplifies how a small software error can lead to catastrophic results. For fail-operational systems, fault tolerance means continuing to deliver a function in the presence of faults. Even if the unit is wrong, a hazardous event needs to be prevented.
The SW-FMEA focuses on ensuring robustness against erroneous inputs and fault tolerance. Instead of analyzing why a bug exists, we evaluate how the system handles incorrect signals and whether redundancies or plausibility checks are needed. For example:
The following table proposes guidewords to be used for possible malfunctions. The essence of the guidewords is to provide a model of categorizing malfunctions irrespective of their actual root cause.
Timing-related aspects are allocated to function calls, while signal errors are analyzed using the following fault model inspired by hardware fault categories in ISO 26262 part 5.
This section contains multiple screenshots of an SW-FMEA against the software architecture shown in section 3.2.
The snippet below shows the first SW-FMEA columns for the functions ‘CAN E2E’ and ‘AC Current’. Both functions are part of the module ‘Sensor & Hw inputs’. These functions qualify signals coming from the hardware as ASIL C signals.
The column ‘Potential Malfunctions’ contains the elements from the mental model explained in the previous section.
The SW-FMEA is continued in the next screenshot showing potential malfunctions, effects, and safety mechanisms.
The program flow monitor ensures correct execution sequence and duration of the monitored functions. A minimum and maximum execution duration is defined for each function.
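A program flow monitor of this kind could be sketched as follows. The class names, the checkpoint list, and the microsecond budgets are hypothetical. The sketch only illustrates checking the execution sequence and the minimum and maximum duration of each monitored function within one cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    name: str
    min_duration_us: int
    max_duration_us: int


class ProgramFlowMonitor:
    """Checks call order and execution time per cycle (illustrative only)."""

    def __init__(self, expected: list[Checkpoint]) -> None:
        self._expected = expected
        self._index = 0
        self._fault = False

    def report(self, name: str, duration_us: int) -> None:
        """Called by each monitored function when it finishes."""
        if self._index >= len(self._expected):
            self._fault = True  # more reports than expected in this cycle
            return
        checkpoint = self._expected[self._index]
        if name != checkpoint.name:
            self._fault = True  # wrong execution sequence
        elif not (checkpoint.min_duration_us <= duration_us <= checkpoint.max_duration_us):
            self._fault = True  # function ran too short or too long
        self._index += 1

    def end_of_cycle(self) -> bool:
        """Return True if all functions ran in order and within their budget."""
        ok = not self._fault and self._index == len(self._expected)
        self._index = 0
        self._fault = False
        return ok
```

In a real inverter this logic would typically be paired with an independent time base or an external watchdog, since a monitor running on the same core shares failure causes with the functions it observes.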
CAN E2E refers to the end-to-end protection of the communication on the CAN bus. This includes a CAN message CRC (in principle similar to a checksum), a CAN sequence counter, a CAN timeout, and a CAN value range check.
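The four E2E checks can be illustrated with a small receiver-side sketch. Everything specific here is an assumption: the frame layout, the 4-bit counter, the timeout, the torque range, and the use of `zlib.crc32` as a stand-in for the CRC. Real E2E profiles define their own CRC polynomial and data layout.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

COUNTER_MODULO = 16  # assumed 4-bit sequence counter
TIMEOUT_MS = 30      # assumed maximum age of the last received frame
TORQUE_MIN_NM, TORQUE_MAX_NM = -300.0, 300.0  # assumed valid value range


@dataclass
class Frame:
    counter: int
    torque_request_nm: float
    crc: int
    rx_time_ms: int


def compute_crc(counter: int, torque_request_nm: float) -> int:
    # Stand-in CRC over a textual payload; not an automotive E2E profile.
    payload = f"{counter}:{torque_request_nm:.3f}".encode()
    return zlib.crc32(payload)


class E2EReceiver:
    def __init__(self) -> None:
        self._last_counter: Optional[int] = None

    def check(self, frame: Optional[Frame], now_ms: int) -> str:
        """Return 'OK' or the name of the first failed E2E check."""
        if frame is None or now_ms - frame.rx_time_ms > TIMEOUT_MS:
            return "TIMEOUT"
        if frame.crc != compute_crc(frame.counter, frame.torque_request_nm):
            return "CRC_ERROR"
        expected = (
            None if self._last_counter is None
            else (self._last_counter + 1) % COUNTER_MODULO
        )
        self._last_counter = frame.counter  # resynchronize on the received value
        if expected is not None and frame.counter != expected:
            return "SEQUENCE_ERROR"  # repeated, lost, or reordered frame
        if not (TORQUE_MIN_NM <= frame.torque_request_nm <= TORQUE_MAX_NM):
            return "RANGE_ERROR"
        return "OK"
```

Only a result of `"OK"` would let the signal be passed on as qualified. Any other result would feed the error reaction defined in the SW-FMEA, for example a substitute value or a safe state request.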
As an SW-FMEA example of another function, the next screenshot shows ‘calculate torque’ of the module ‘L2 Function Monitoring’.
Signals previously qualified to the appropriate ASIL can be used within the same software without another plausibility check. That is, if one function stores a value in memory, another function in the same software context can read it without a further range or plausibility check, as long as freedom from interference is ensured.
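One way to picture this is a signal that carries its qualification status with it. The sketch below is an assumption-laden illustration: the type names, the current limit, and the linear torque formula are placeholders rather than part of the analyzed architecture. Freedom from interference is assumed to be ensured by other means, such as memory protection.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Qualifier(Enum):
    VALID = "valid"
    INVALID = "invalid"


@dataclass(frozen=True)
class QualifiedSignal:
    value: float
    qualifier: Qualifier


PHASE_CURRENT_LIMIT_A = 600.0  # hypothetical plausible range for the AC current


def qualify_ac_current(raw_current_a: float) -> QualifiedSignal:
    """Input layer: the only place where the raw value is range-checked."""
    if -PHASE_CURRENT_LIMIT_A <= raw_current_a <= PHASE_CURRENT_LIMIT_A:
        return QualifiedSignal(raw_current_a, Qualifier.VALID)
    return QualifiedSignal(0.0, Qualifier.INVALID)


def calculate_torque(current: QualifiedSignal, nm_per_amp: float) -> Optional[float]:
    """Consumer: relies on the qualifier instead of repeating the range check."""
    if current.qualifier is not Qualifier.VALID:
        return None  # the caller decides on the reaction, e.g. a safe state request
    return nm_per_amp * current.value  # placeholder model, not a real motor equation
```

The consumer checks only the qualifier, not the value range, which mirrors the idea that qualification happens once at the input boundary.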
If you’re ready to conduct an SW-FMEA but need additional support, resources, or expert assistance, contact us at SRES!