logotype
  • Consulting
    • Automotive
      • Functional Safety
      • Cybersecurity
      • Autonomous Product Development
      • Electric Vehicle (EV) Development
    • Physical AI
      • Robotics Safety
    • Responsible AI
      • Responsible Artificial Intelligence
  • Training
    • Functional Safety
    • Cybersecurity
    • ADS and Responsible AI
  • Company
    • Why SRES Training
    • Leadership
    • Partnerships
    • Careers
  • Insights
  • Contact
Let's Talk
Conversations on Randomness of Software
12/16/25



This article was written by an SRES functional safety expert and examines why software failures are treated as systematic—not random—under standards such as ISO 26262 and IEC 61508, and how this perspective changes when considering complex and ML-enabled software systems.

Looking to go deeper? SRES provides expert-led functional safety training, including certificate-based programs, as well as hands-on consulting support to help organizations implement ISO 26262 and related functional safety requirements across the product lifecycle.


Ask enough technology users and you will find stories of software UI glitches, websites showing one thing one moment and something different the next, and behavior that most users will simply call "random." The traditional engineering view is that software cannot fail "randomly" except, of course, when the hardware running it has a failure.

As modern software grows in complexity, especially with the rise of non-deterministic, ML-enabled components, there is a nascent movement to apply probabilistic analyses when building arguments for its safe operation. Is modern software now a blend of systematic and truly random behavior?

Standards such as IEC 61508 and ISO 26262, however, take the stance that software does not, and cannot, have “random” failures in the same sense as hardware.  These standards therefore treat all software issues as systematic issues which can be addressed only by design or process efforts and assign no failure rate to software.

What are the implications of considering software to have a “failure rate?”  For hardware systems, failure rates are rooted in physics and are ultimately due to quantum uncertainty.  But what about software? How does software “fail”?  For that matter – what does “random” mean?

What is Random?

Dice illustrating randomness and probability in software behavior

Without going into a treatise on probability: in engineering, probability is used to model our inability to perfectly measure every state in a system. We cannot feasibly measure the roughly 10²⁰ atoms in a resistor to establish its individual unique properties and predict the exact moment it might fail (and even if we could, pesky quantum mechanics rears its head). Instead we evaluate population characteristics and have robust predictive models that tell us, for a given manufacturing process and usage profile, how likely it is that the total population behaves in a certain way. We can then use statistical tools to make claims about a single member of such a population. In the case of failure analyses, the parameter of interest is "the portion of the population that will have failed (or survived) as a function of time after entry to service" which, when plotted, gives us the oft-cited bathtub curve of "failure rates" for hardware components.
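As a quick numeric sketch of this population view: in the flat, constant-failure-rate region of the bathtub curve, the surviving fraction of a hardware population follows R(t) = exp(-λt). The rate and time below are illustrative numbers, not values from any standard.

```python
import math

# Sketch of the hardware-style population model: with a constant failure
# rate lam (the flat bottom of the bathtub curve), the surviving fraction
# at time t is R(t) = exp(-lam * t). Numbers here are purely illustrative.
lam = 1e-6           # assumed failures per hour in the constant-rate region
t = 10_000           # hours in service
surviving_fraction = math.exp(-lam * t)
failed_fraction = 1 - surviving_fraction

print(round(failed_fraction, 5))  # ≈ 0.00995, i.e. about 1% failed
```

Nothing analogous exists for software: there is no physical population whose failure fraction evolves with time in service.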

Unlike hardware, software failures can’t be characterized by a bathtub-curve of failure rate over time: software doesn’t have infant mortality, doesn’t have a period of constant failure rate, and doesn’t wear out due to fatigue. There is no “population” of software components to sample in the same sense as hardware. Hardware failures arise from several sources: the inherent uncertainty in the properties of a macroscopic physical device, the inherent variability across a population of devices, and the variability in the environmental and usage profiles for each member of a population. Software has no “variability” or “uncertainty” in its implementation, nor is there any “part-to-part” variability in a software executable image — every copy can be verified to be digitally identical. The only remaining source of variability in software behavior, then, is the variation in environmental and usage profiles.

Common Examples

Let’s consider some examples: deadlocks, performance degradation, and ML statistical performance.

Consider a class of software behavior that is often described as “random”: deadlocks. Deadlocks may happen when two tasks in a multithreaded application “line up their requests just right.” We may not know the exact reasons why a particular deadlock occurred, and we might even be able to make measurements like “we see one deadlock per 100 hours of operation.” So we have assigned a “failure rate” to this concern — but why would we argue that this isn’t a random failure? Because this type of failure can be eliminated by design, discovered in testing by intentionally varying the relative timing between tasks, or avoided by choosing a deadlock-free algorithm. There are even programming languages which can be proven “deadlock free” — with zero uncertainty. Therefore, this type of failure is squarely in the scope of standards like ISO 26262: in theory, these failures can be detected and eliminated at design time.
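A minimal sketch of eliminating a deadlock "at design time" rather than measuring its rate: the classic two-lock deadlock (two threads acquiring the same pair of locks in opposite order) becomes impossible if every thread acquires locks in one fixed global order, because the circular-wait condition can never form. The transfer scenario and the ordering-by-`id()` convention below are illustrative, not from any standard.

```python
import threading

def acquire_in_order(lock_a, lock_b):
    # Design-time fix: always take locks in a fixed global order (here, by
    # id()), which makes the circular-wait condition for deadlock impossible.
    first, second = sorted((lock_a, lock_b), key=id)
    first.acquire()
    second.acquire()
    return first, second

def transfer(accounts, locks, src, dst, amount):
    first, second = acquire_in_order(locks[src], locks[dst])
    try:
        accounts[src] -= amount
        accounts[dst] += amount
    finally:
        second.release()
        first.release()

accounts = {"a": 100, "b": 100}
locks = {"a": threading.Lock(), "b": threading.Lock()}

def worker(src, dst):
    # Two threads transferring in opposite directions: with naive
    # argument-order locking this is the textbook deadlock; with ordered
    # acquisition it cannot hang, regardless of timing.
    for _ in range(1000):
        transfer(accounts, locks, src, dst, 1)

t1 = threading.Thread(target=worker, args=("a", "b"))
t2 = threading.Thread(target=worker, args=("b", "a"))
t1.start(); t2.start()
t1.join(); t2.join()
print(accounts["a"] + accounts["b"])  # total is conserved: 200
```

The point is that no amount of timing variation can produce the failure, so there is nothing left to assign a "rate" to.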

Another example is software appearing to degrade over time. We have all experienced our computers “getting slower as they age.” While performance changes over time, the software itself doesn’t degrade or wear out; any changes in performance over time are (we claim) exclusively due to algorithm scaling characteristics, not a “failure” of the software: as a data set increases in size, any algorithm that processes each element of that data will necessarily take longer. This behavior can be addressed by changing the algorithm to have different scaling characteristics. As an example, an O(n²) algorithm “randomly slowing down” is likely not a random slowdown at all, but merely an increase in n.

Software algorithms may also have performance characteristics which can be described in “failure rate” language. A machine learning algorithm for evaluating traffic lights may have a 1-in-10,000 false negative rate. But this is not due to a “malfunction” of the algorithm; rather, as defined in ISO 21448, it is an insufficiency in the algorithm or data. This is perhaps the closest a software system can come to having a “failure rate” in the sense of hardware failure rates. In these situations, the combinations of pixel values that result in unwanted behavior become so large (just like the combinations of atom states) that they are infeasible to analyze completely.
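Characterizing such a system is purely descriptive: run the fixed model over a labeled evaluation set and report the observed miss rate. A minimal sketch, with made-up labels and predictions standing in for a real detector and dataset:

```python
def false_negative_rate(predictions, labels):
    # labels: True where a traffic light is actually present (ground truth).
    # predictions: the detector's output for the same frames.
    # The false-negative rate is the fraction of true positives missed.
    positives = [p for p, y in zip(predictions, labels) if y]
    misses = sum(1 for p in positives if not p)
    return misses / len(positives)

# Illustrative evaluation set: 4 frames with a light, 2 without.
labels      = [True, True, True, True, False, False]
predictions = [True, True, True, False, False, True]

print(false_negative_rate(predictions, labels))  # 0.25
```

The resulting number is a statement about this model on this data set, not a prediction rooted in physics, which is exactly the distinction the article draws.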

Unlike hardware systems, where we know we can improve reliability by increasing the dimensions of a part or changing a manufacturing process, we do not yet have the science to know which parts of an ML system can be “increased” to reliably change performance. The state of the art proclaims “increase the training dataset,” but we do not know what training data we need, how many more images are required, or whether we need to add one or two or five convolution layers to our model. What we do have is a way to describe the performance of a system after we have built it: we can say “this model and data set has this false negative rate; this other model has that rate.”

Applied Philosophy

Street signs illustrating the contrast between theory and practice in engineering

Where does this leave us? Is there benefit in thinking about “software failure rate?” If we are writing new software systems (or guiding AI systems to write software for us), we should avoid the trap of accepting “random” software failures and instead strive for the ideals of deterministic processes like ISO 26262 and DO-178 to eliminate all the design-time software errors. If we are writing or training ML systems for object detection, we should be diligent in characterizing the performance of our systems, so that systems integrators can intelligently account for the known performance of those systems. We can also look for ways to improve our science for ML systems by identifying the “knobs” we can turn to improve that performance. A reasonable approach is summarized in the following table.

Software behavior and the approach to dealing with its “randomness”:

Sporadic behaviors (deadlocks, resource contention, etc.)
  • Modify algorithm structure to eliminate possible contention
  • Add runtime checks to monitor for contention as diagnostic and debug support

Performance degradation over time / data set size
  • Modify algorithm to address big-O scaling
  • Modify algorithm to partition or prioritize data to ensure bounded execution time
  • Runtime monitoring of execution time to support debugging

Statistical performance of complex features (e.g., ML-based performance)
  • Treat performance as a statistical failure rate in analyses such as FTA or FMEDA to evaluate acceptable risk and define mitigation measures
  • Ensure documentation and treatment are tied to a specific software version and data set; respect that this is descriptive, not predictive
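For the statistical-performance case, folding a measured ML miss rate into a fault-tree style analysis can be as simple as combining it with an independent mitigation. The event probabilities below are illustrative assumptions, not values from ISO 26262 or any real system:

```python
# Hypothetical FTA-style sketch: a hazard occurs only if the ML detector
# misses AND an independent backup monitor also fails (an AND gate).
P_ML_MISS = 1e-4      # measured false-negative rate of the detector (per demand)
P_BACKUP_FAIL = 1e-2  # assumed independent failure probability of the backup

# AND gate with independent inputs: multiply the probabilities.
p_hazard = P_ML_MISS * P_BACKUP_FAIL

print(p_hazard)  # 1e-06 per demand
```

The arithmetic is standard fault-tree practice; what the table adds is the caveat that P_ML_MISS is descriptive of one model and data set, and must be re-measured whenever either changes.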

So there is solace: many software problems, those related to “bugs” that can be weeded out in design and coding, can indeed be addressed through traditional testing. We can respect the present limits of our understanding, namely that we cannot predict the performance of our ML systems, while faithfully describing them once we have built them. And we can anticipate the time when we gain sufficient understanding to engineer, ahead of time, ML systems with targeted “reliability” characteristics.


Have insights or questions? Send us an email at info@sres.ai or leave a comment below—we welcome thoughtful discussion from our technical community.

Interested in learning more about our approach? Explore why teams choose SRES training and how we help automotive organizations with consulting support across functional safety, cybersecurity, autonomy safety, and EV development.



logotype
  • Company
  • Careers
  • Contact Us
  • info@sres.ai
  • 358 Blue River Pkwy Unit
    E-274 #2301 Silverthorne,
    CO 80498

Services

Automotive

Physical AI

Responsible AI

Training

Resources

Insights

Video

Legal

Privacy Policy
Cookie Policy
Terms & Conditions
Training Terms & Cancellation Policy
Accessibility
Consent Preferences

© Copyright 2025 SecuRESafe, LLC. All rights reserved.
