In functional safety we consider risks stemming from two main categories of failures: systematic failures and random hardware failures. Systematic failures are generally introduced by human error, so we reduce their risk by improving processes and procedures, or by improving the design through various levels of review (verification). Standards like ISO 26262 give us no quantitative measures to signify the reduction of risk due to systematic failures. Random hardware failures, however, follow a probabilistic distribution, so we can apply quantitative safety analysis with target metrics to quantify our risk reduction. The most widely used quantitative safety analysis in the automotive industry is the Failure Modes, Effects and Diagnostic Analysis (FMEDA).
When the FMEDA is done well, it can be a very powerful tool for identifying the weaknesses in your hardware architecture and circuit. It also improves the safety mechanisms needed to protect against hardware failures, which often improves the software safety requirements, and it serves as a verification method for the overall completeness and correctness of your safety requirements. The FMEDA provides a clearer answer to the question, “have we done enough?”, since there is a prescribed method to calculate the random hardware failure metrics, and there are target values to design towards. This is not possible with systematic failures, where we need to provide a qualitative safety justification for when we’ve “done enough”.
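To make the idea of a prescribed calculation with target values more concrete, here is a minimal sketch of the two hardware architectural metrics, the single-point fault metric (SPFM) and the latent fault metric (LFM). It is a simplification, not a compliant FMEDA: the `FailureMode` fields and the sample rows are hypothetical, every fault that is not single-point or residual is conservatively treated as a potential multiple-point fault, and the target values are the ones commonly cited from ISO 26262-5, which you should check against your edition of the standard.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    fit: float           # failure rate in FIT (failures per 1e9 hours)
    violates_goal: bool  # can this failure mode directly violate the safety goal?
    dc_residual: float   # diagnostic coverage against residual faults, 0..1
    dc_latent: float     # diagnostic coverage against latent faults, 0..1


# (SPFM target, LFM target) per ASIL, as commonly cited from ISO 26262-5.
# Assumption: verify these numbers against the standard before relying on them.
TARGETS = {"B": (0.90, 0.60), "C": (0.97, 0.80), "D": (0.99, 0.90)}


def hardware_metrics(modes):
    """Return (SPFM, LFM) for a list of safety-related failure modes."""
    total = sum(m.fit for m in modes)
    spf_rf = 0.0   # single-point + residual fault rate
    latent = 0.0   # latent multiple-point fault rate
    for m in modes:
        # Only the uncovered share of a goal-violating fault is SPF/residual.
        rf = m.fit * (1.0 - m.dc_residual) if m.violates_goal else 0.0
        spf_rf += rf
        # Simplification: everything else is a potential multiple-point
        # fault, and the share not covered by diagnostics stays latent.
        latent += (m.fit - rf) * (1.0 - m.dc_latent)
    spfm = 1.0 - spf_rf / total
    lfm = 1.0 - latent / (total - spf_rf)
    return spfm, lfm


def meets_targets(spfm, lfm, asil):
    """Compare computed metrics with the target values for one ASIL."""
    spfm_target, lfm_target = TARGETS[asil]
    return spfm >= spfm_target and lfm >= lfm_target


# Hypothetical two-row example, for illustration only.
modes = [
    FailureMode("MCU core stuck", 100.0, True, 0.99, 0.90),
    FailureMode("Watchdog fails silently", 50.0, False, 0.0, 0.80),
]
spfm, lfm = hardware_metrics(modes)
print(f"SPFM={spfm:.4f} LFM={lfm:.4f}")
print("ASIL C ok:", meets_targets(spfm, lfm, "C"))
print("ASIL D ok:", meets_targets(spfm, lfm, "D"))
```

With these made-up numbers the SPFM is roughly 99.3% and the LFM roughly 86.6%, so the sketch reports that the ASIL C targets are met while the ASIL D latent fault target is not. That is exactly the kind of “have we done enough?” answer the metrics are meant to give.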
From our years of experience moderating, training, overseeing, and conducting FMEDAs, the following are the most important areas to understand, so that the FMEDA doesn’t go wrong:
- This is a team exercise: often organizations think this is done by just one team or person, but that is the wrong mindset. A functional safety team can’t do it without the expertise and knowledge of the hardware developers. Understanding failure modes and what can or cannot violate safety often requires isolated simulations and a detailed understanding of the hardware. Hardware developers can’t do it without understanding how the software is implemented. The best opportunity for doing an FMEDA well is having the functional safety team moderate and drive the FMEDA, but this requires getting sufficient support from both the hardware and software teams. Management needs to be involved so that all required team members can have time allocated to support the FMEDA activity.
- Keep it simple and conservative in the beginning: often organizations get discouraged by the complexity and effort required to conduct the FMEDA. It is better to take an iterative approach, starting out simple with conservative assumptions (e.g., handbooks with higher failure rates, lower assumed diagnostic coverages, etc.), and then adding the required details and fine tuning in later iterations. Use the conservative approach to identify the top 10 portions of the circuit that drive up the random hardware metrics. Spend time improving them with the various methods prescribed by the FMEDA (e.g., lower FIT components, increased diagnostics, strategic design for fault tolerance, etc.). After improving these top 10 weaknesses, go back for another iteration, continually improving and tuning the FMEDA.
- Intentionally create a hardware architecture: partitioning is not only a powerful tool for software architectures but can also be very productive for hardware designs. How do microcontroller manufacturers reduce their FMEDA effort in future generations? By partitioning and having a good architecture with modular blocks. We’ve been told by some semiconductor manufacturers that a full FMEDA in the first generation took two person-years of effort, but the following generation was reduced to six person-months of effort: 25% of the original effort. For items that have multiple safety goals, this also helps to identify the hardware blocks that can impact each specific safety goal, which can greatly reduce time when the FMEDA is needed for multiple safety goals. With a proper hardware architecture, each block can be calculated in detail at the hardware part level, but then the subsequent safety goals or hardware generations can perform the FMEDA at the block level, rather than at the hardware part level.
- Do it even if not required: the quantitative safety analysis is only highly recommended for ASIL C and D. However, Part 5 (Product development at the hardware level) of the ISO 26262 standard doesn’t provide specific guidance on the hardware design, outside of the FMEDA. For that reason, we highly recommend that organizations perform the FMEDA for ASIL A and B development as well, even if it is a reduced variant of a fully compliant quantitative safety analysis. Without the FMEDA, it is difficult to determine if enough has been done for safety.
- Get it reviewed: although the FMEDA is a quantitative analysis, we can still make systematic mistakes when conducting it. One of our tools for reducing the risk of systematic failures is the review, and specifically the independent review. If you conduct the FMEDA, plan to have it independently reviewed by a group or organization familiar with the detailed requirements and processes of performing the FMEDA. Confirmation reviews are a requirement for safety analysis from ASIL A to ASIL D.
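The iterative, conservative first pass and the block-level view described in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the parts list, block names, FIT values and diagnostic coverages are all hypothetical, and every failure of a part is conservatively assumed to be able to violate the safety goal.

```python
from collections import defaultdict

# Hypothetical parts list: (part, hardware block, FIT, assumed diagnostic coverage).
# In a first iteration the FIT values would come from a conservative handbook
# and the coverages would be deliberately pessimistic.
PARTS = [
    ("U1 MCU", "compute", 200.0, 0.90),
    ("U2 gate driver", "power_stage", 80.0, 0.60),
    ("Q1 MOSFET", "power_stage", 40.0, 0.60),
    ("R12 sense resistor", "sensing", 5.0, 0.00),
    ("U3 op-amp", "sensing", 30.0, 0.00),
]


def residual_fit(fit, coverage):
    """FIT that remains uncovered by diagnostics (conservative: all of the
    part's failure rate is assumed able to violate the safety goal)."""
    return fit * (1.0 - coverage)


def top_contributors(parts, n=10):
    """Return the n parts with the highest uncovered FIT, worst first."""
    ranked = sorted(parts, key=lambda p: residual_fit(p[2], p[3]), reverse=True)
    return [(p[0], residual_fit(p[2], p[3])) for p in ranked[:n]]


def residual_by_block(parts):
    """Roll the uncovered FIT up to the hardware block level."""
    totals = defaultdict(float)
    for _name, block, fit, coverage in parts:
        totals[block] += residual_fit(fit, coverage)
    return dict(totals)


for name, fit in top_contributors(PARTS, n=3):
    print(f"{name}: {fit:.1f} FIT uncovered")
print(residual_by_block(PARTS))
```

In this made-up example the gate driver and the undiagnosed op-amp rank above the microcontroller, even though the microcontroller has the highest raw failure rate. Surfacing that kind of result early is the point of the first pass, and the block totals are what a later iteration or a second safety goal could reuse instead of recalculating every part.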
Need more help or guidance? Contact us at SRES as we can support you in moderating, training, overseeing, and conducting the FMEDA. We have a two-day detailed FMEDA training with real automotive circuits that can be found on our website.