Approximately Equal: Drawbacks of Unit Testing
ARTICLE SERIES
A lot of companies preach the benefits of unit testing, although an alarming number of them do it in an utterly mindless way; i.e., for the sake of saying they do unit testing, ironically testing only the most harmless parts of their code.
But we are not here today to demonize unit testing: do it, and do it well. In general, though, unit testing seems to rest silently on a few simplistic assumptions. First, the belief that the outputs of test subjects—or units—are deterministic values: a solid 0, 1, 10, or 2,147,483,647. Second, and perhaps more importantly, that the arbitrary selection of testing units can be done without regard for the architecture.
The issue is semantic right from the naming. Thinking of “units” automatically conjures the mental image of an entity that is taken from somewhere and put under the magnifier to be scrutinized in isolation. Here’s an uncomfortable truth: most software out there can’t be split into parts—or discretized—for testing purposes. In other words, the fact that an arbitrarily defined set of chunks of a software system can pass a test does not necessarily mean the whole thing can. This is a typical characteristic of highly coupled systems.
Could a doctor examine the heart condition of a patient by taking it away and putting it on a table? Of course not. The only way to examine the condition of a living heart is by keeping it connected to the rest of the system, which in turn depends on said heart being connected. Our bodies can’t be partitioned in isolated testing units. We are walking “testing units” in the software sense of the word.
Imagine now that you go to the doctor with your hypothetical twin. The doctor runs a general examination, including weighing you both. You are, say, 65.2 kg, and your twin is 66.2 kg. Who’s healthier? Clearly, those numbers do not say much. Here’s the thing: you can’t test complex things only by observing numerical outputs and comparing them to arbitrary thresholds. Testing needs to be more, way more, nuanced than that.
When your software is, for instance, an inventory management system, where stock quantities are integers and software units mostly perform database queries, table refreshes, and routines based on Boolean states and integer values, you can more easily split the architecture into a set of relevant chunks to be put under the spotlight. You certainly want to prevent your database connector from failing inelegantly when the remote server is down.
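To make that concrete, here is a minimal sketch of such a chunk-level test in Python, assuming a hypothetical `InventoryConnector` class and simulating the downed server with a mock transport (all names are illustrative, not from any real codebase):

```python
from unittest import mock

# Hypothetical connector; class and method names are illustrative.
class ServerDownError(Exception):
    pass

class InventoryConnector:
    def __init__(self, transport):
        self.transport = transport

    def get_stock(self, item_id):
        # Degrade gracefully: report "unknown" instead of crashing.
        try:
            return self.transport.query(item_id)
        except ServerDownError:
            return None

# The unit test: a mock transport stands in for the unreachable server.
transport = mock.Mock()
transport.query.side_effect = ServerDownError("remote server is down")
connector = InventoryConnector(transport)
assert connector.get_stock(42) is None  # fails elegantly, no crash
```

For this kind of software, the "unit" really can be isolated, because the dependency (the server) is trivially replaceable by a stub.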
But when it comes to software for control systems, a routine may (and will) output slightly different numbers on every run, and still be correct. Moreover, numerically intensive routines are closed-loop; that is, they will only converge to correct ranges of values if they are fed the right ranges of values by routines which, in turn, depend on the output of your testing unit.
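This is where the title of this piece comes from: assertions on such routines have to be approximate, not exact. A minimal Python illustration using the standard library’s `math.isclose`, with made-up values:

```python
import math

# Two runs of the same numerical routine, differing in the last digits;
# both values are "correct" within the physics of the problem.
run_a = 9.80665001
run_b = 9.80664998

assert run_a != run_b                               # exact comparison fails
assert math.isclose(run_a, run_b, rel_tol=1e-6)     # range-based check passes
assert not math.isclose(run_a, 9.91, rel_tol=1e-6)  # a genuinely wrong value is still caught
```

The hard part, of course, is not the tolerance mechanism but choosing tolerances that are physically meaningful rather than arbitrary.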
Furthermore, when testing such systems, we are more interested in observing that the underlying dynamics of the process the function implements are sound and robust than in checking whether a switch/case is lacking a default clause, or whether some unsigned variable was supposed to be signed. For example, how do you set the pass/fail criteria when unit testing a function which implements a Finite Impulse Response (FIR) filter? The function is only the “vessel”, so to speak; the convolution sum is the real testing subject. Performance matters too: the filter may have roll-off, cut-off frequency, and passband ripple requirements. Does the implementation comply with those? The same goes for a PID controller: overshoot requirements, settling time, transients, stability, etc. There is no way of testing a PID controller in isolation: the amount of artificial stimuli to create around it in order to set up a sound “test bench” would be as complex as keeping the controller connected to the rest of the original system.
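As a sketch of what frequency-domain pass/fail criteria might look like, take a hypothetical 9-tap moving-average filter (a crude low-pass) as the unit under test; the tolerance figures are made up for illustration:

```python
import numpy as np

# Hypothetical unit under test: a 9-tap moving-average FIR filter.
taps = np.ones(9) / 9.0

def magnitude_response(taps, w):
    """|H(e^jw)| = |sum_k h[k] * e^(-jwk)|, the DTFT of the coefficients."""
    k = np.arange(len(taps))
    return abs(np.sum(taps * np.exp(-1j * w * k)))

# The pass/fail criteria live in the frequency domain, not in exact
# time-domain outputs. Thresholds below are illustrative assumptions.
assert abs(magnitude_response(taps, 0.0) - 1.0) < 1e-9  # unity gain at DC
assert magnitude_response(taps, np.pi) < 0.2            # attenuation near Nyquist
```

The assertions here interrogate the convolution sum itself—the filter’s behavior over frequency—rather than pinning the function’s output to a single deterministic number.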
To truly test highly coupled systems controlled by software, it is not enough to turn the knobs on an individual function’s input arguments and see what makes the function go NaN. Testing highly coupled control systems requires a good understanding of the underlying architecture and of the dynamics involved. A TDD purist would say: “write your code in testing units”; that is, design your software to be natively broken down into testable units instead of breaking it down later. That sounds nice, but you may easily lose focus of the primary objective of any software meant to deal with dynamical systems such as spacecraft, aircraft, drones, or power plants: devising an architecture which can serve well the—vintage-term alarm—business logic. Granted, the “I am done with the coding; it is up to the testing team to find any defects” mindset is not right either. Coding cannot be a mindless pursuit of compilation success, nor should the absence of segfaults be a measure of quality.
How can the drawbacks of unit testing be compensated for? The later stages of the verification process—integration testing and system testing—are the most relevant for verifying highly coupled systems, and they require industrial amounts of simulation capability to properly stimulate the system under test. The fidelity of the simulation environment is instrumental in creating the conditions for a bug to reveal itself. As Dijkstra famously put it, testing can show the presence of bugs, but never their absence.
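To close, here is a minimal Python sketch of requirement-driven, in-the-loop testing: a PI controller (the D term is omitted for brevity) kept connected to a simulated first-order plant, with pass/fail criteria expressed as overshoot and settling requirements. The gains, plant model, and thresholds are all illustrative assumptions, not from any real system:

```python
# A PI controller tested in closed loop with a simulated first-order
# plant, dy/dt = (u - y) / tau. All numbers below are illustrative.
kp, ki = 2.0, 1.0       # controller gains (assumption)
tau = 1.0               # plant time constant (assumption)
dt, steps = 0.01, 1000  # 10 s of simulated time
setpoint = 1.0

y, integral = 0.0, 0.0
history = []
for _ in range(steps):
    error = setpoint - y
    integral += error * dt
    u = kp * error + ki * integral  # PI control law
    y += dt * (u - y) / tau         # Euler step of the plant dynamics
    history.append(y)

# Requirement-style pass/fail criteria on the closed-loop response:
overshoot = max(history) - setpoint
assert overshoot < 0.25                    # overshoot requirement
assert abs(history[-1] - setpoint) < 0.02  # settled within 2 % at t = 10 s
```

Note that the test says nothing about any single exact output of the controller function; it judges the dynamics of the whole loop, which is precisely what a unit test of the controller in isolation cannot do.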
Follow ReOrbit on LinkedIn and Twitter for regular updates!
Contact us at info@reorbit.space.

Very true! I would add that what you explain is why many failures seem to be related to programming and software design being in the hands of people or teams lacking a good understanding of the complex systems in which software is only a component. In my own field of plant biology research, I see the same problem too frequently: a lack of awareness that, because organisms are complex systems, trying to study only small "units" within them independently of the whole introduces "bugs" in our knowledge and, too frequently, even catastrophic failures of envisioned practical applications. Provisionally ignoring complexity helps when we are stuck trying to solve a problem, but any answer we produce in this mode is also provisional, and needs validation with all the complexity in place. My own view is that this applies all the way from everyday problems to building satellites and beyond.