Testing Foundations
Note: This essay is a lightly revised version of one written in early 1992. These ideas have been developed in more detail in my book, The Craft of Software Testing, Prentice Hall, 1995, ISBN 0-13-177411-5.
Test design occurs in two stages, be they implicit or explicit. In the first stage, test requirements are created. A test requirement is some condition that at least one test case must satisfy.
Examples:

- The variable A is given a negative value.
- The variable B is given a negative value.
In the second stage, the test requirements are combined into test cases that precisely describe the inputs to the program and its expected results.
Example:
A = -1, B = -1. Expected value: 13
This test satisfies two of the test requirements. The rules by which test requirements are combined into test cases are irrelevant to the argument of this note.
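Written out in code, a test case is nothing more than precise inputs plus an expected result. Here's a minimal sketch; f() is only a stub standing in for the program under test (its real behavior is beside the point), and the expected value 13 echoes the example above.

    #include <assert.h>

    /* A stub standing in for the program under test, present only so the
       sketch compiles; the real expected value comes from the spec. */
    static int f(int a, int b) { return (a < 0 && b < 0) ? 13 : 0; }

    int main(void)
    {
        /* One test case: precise inputs and one expected result.  A single
           test case can satisfy more than one test requirement at once. */
        assert(f(-1, -1) == 13);
        return 0;
    }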
Where do the test requirements come from? Most obviously, they come from knowledge of what the program is supposed to do. For example, if we're testing the implementation of a memory allocator, malloc(amount), we might get these test requirements:

1. The amount is invalid (for example, zero).
2. malloc fails and returns a null pointer.
3. malloc succeeds and returns a pointer to a usable block of memory.
In this case, there's a test requirement to see if the program handles invalid inputs correctly and two test requirements for the two kinds of return values.
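To make those requirements concrete, here's a minimal sketch with one test per requirement. The details are assumptions, not part of any specification: I treat an amount of zero as the questionable input, and I force the failure case with an absurdly large request, which is crude but usually works.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Questionable input: an amount of 0 (the standard lets malloc
           return either NULL or a unique pointer here). */
        void *p0 = malloc(0);
        printf("malloc(0) returned %p\n", p0);
        free(p0);

        /* Normal return value: a usable block of memory. */
        char *p1 = malloc(64);
        if (p1 != NULL) {
            p1[0] = 'x';               /* the block should be writable */
            free(p1);
        }

        /* Failure return value: NULL when the request can't be satisfied. */
        void *p2 = malloc((size_t)-1);
        printf("huge request returned %s\n", p2 == NULL ? "NULL" : "a pointer");
        free(p2);

        return 0;
    }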
Test requirements also come from an understanding of the types of errors the programmer likely made while writing the program. Code is built from clichés [Rich90], such as "searching a list", "sorting a list", "decomposing a pathname into a directory and a file", "hash tables", "strings" (null terminated character arrays), and so on. Notice that both operations and data types can be clichés. Programmers often make clichéd errors when implementing or using clichés [Johnson83]. Thus, you can create a catalog of test requirements, indexed by cliché. One such catalog is given in my book.
There are two ways clichés may be manifest in programs:

1. The cliché may be implemented inline, as code written out in the program itself (for example, a hand-coded search loop).
2. The cliché may be reused, as a call to an already-implemented, presumably well-tested subroutine or datatype (for example, a call to bsearch).
The test requirements for a cliché-using program depend on the manifestation. If the cliché is implemented inline, you must test that implementation. For a vector search loop, you'll use a test requirement that probes for off-by-one errors: "element found in last position". But if the search is a call to a well-tested subroutine, that test requirement would likely be useless. Instead, you would restrict your attention to plausible errors in the program's use of the cliché -- such as faults where bsearch is called correctly but the caller fails to handle the result properly. Here, a test requirement might be "element not found", since programmers sometimes fail to think about that possibility. Another example would be testing uses of the write() routine with "write fails", since programs that assume writes always succeed abound.
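Here's a sketch of the contrast. The names (lookup_inline, lookup_bsearch, cmp_int) are invented for the illustration; the point is which test requirement matters for each manifestation.

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Inline cliché: the loop itself needs tests like "element found in
       the last position" to probe for off-by-one errors. */
    static int lookup_inline(const int *v, size_t n, int key)
    {
        for (size_t i = 0; i < n; i++)      /* a classic spot for < vs. <= bugs */
            if (v[i] == key)
                return 1;
        return 0;
    }

    /* Reused cliché: bsearch is assumed correct, so the interesting test
       requirement is "element not found" -- does the caller handle NULL? */
    static int lookup_bsearch(const int *v, size_t n, int key)
    {
        const int *hit = bsearch(&key, v, n, sizeof *v, cmp_int);
        return hit != NULL;                 /* the easily-forgotten case */
    }

    int main(void)
    {
        int v[] = {2, 4, 6, 8};
        size_t n = sizeof v / sizeof v[0];

        printf("%d\n", lookup_inline(v, n, 8));   /* found in last position */
        printf("%d\n", lookup_bsearch(v, n, 5));  /* element not found */
        return 0;
    }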
Of course, not all reusable subroutines implement clichés, but all reusable code can generate the same sort of test requirements, test requirements that probe likely misuses. In the absence of more information, some general rules are:
But there can also be test requirements peculiar to a reused routine or datatype. For example, suppose a collection that can grow and shrink is provided as a datatype. However, the user must reinitialize the collection whenever the last element is removed. Programmers will surely often forget the reinitialization; this likely error can be captured in a test requirement like "an element is removed from a single-element collection, then a new element is added".
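That test requirement can be written down directly as a test case. In the sketch below, the collection (the coll_* names and the reinitialization rule) is a toy I've invented to mirror the example; a real producer would supply the real datatype.

    #include <assert.h>
    #include <stddef.h>

    typedef struct { int items[8]; size_t count; int initialized; } coll;

    static void coll_init(coll *c)       { c->count = 0; c->initialized = 1; }
    static void coll_add(coll *c, int x) { assert(c->initialized); c->items[c->count++] = x; }

    static int coll_remove_last(coll *c)
    {
        assert(c->initialized && c->count > 0);
        int x = c->items[--c->count];
        if (c->count == 0)
            c->initialized = 0;     /* the user must call coll_init() again */
        return x;
    }

    int main(void)
    {
        coll c;
        coll_init(&c);

        /* The producer test requirement: remove the only element, then add
           a new one.  A consumer who forgets the reinitialization trips
           the assertion in coll_add(). */
        coll_add(&c, 1);
        (void)coll_remove_last(&c);
        coll_init(&c);              /* the step programmers tend to forget */
        coll_add(&c, 2);

        return 0;
    }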
Henceforth, these test requirements will be called producer test requirements. Consumers (users of the reusable code) discover them the hard way, since they are the ones making the common mistakes; but the people best suited to collect this information and circulate it to all consumers are the producers. (Of course, in this particular case, the producers would also want to rework the design of the datatype to eliminate the cause of the common error, but that cannot always be done. Some abstractions are simply inherently complicated.)
Along with reusable software, producers should provide catalogs of producer test requirements. These are used in the same way as, and in conjunction with, the general-purpose cliché catalog mentioned earlier. For example, if a consumer is writing software that uses a stream of input records to modify a collection, the "empty, then add" producer test requirement can be combined with generic stream test requirements to produce good test cases for that software.
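For instance, the combination might look like this test input. The record format and the particular generic stream requirements (first record, last record) are assumptions made up for the sketch.

    #include <stdio.h>

    typedef struct { char op; int key; } record;   /* invented record format */

    /* One stream of records chosen to satisfy several test requirements. */
    static const record test_stream[] = {
        {'A', 10},  /* generic stream requirement: the very first record        */
        {'D', 10},  /* producer requirement: this delete empties the collection  */
        {'A', 20},  /* ...and this add must follow the required reinitialization */
        {'D', 20},  /* generic stream requirement: the last record in the stream */
    };

    int main(void)
    {
        /* In a real test, each record would be fed to the software under
           test; here the stream is simply printed. */
        for (size_t i = 0; i < sizeof test_stream / sizeof test_stream[0]; i++)
            printf("%c %d\n", test_stream[i].op, test_stream[i].key);
        return 0;
    }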
I'll first skim the basics of coverage. As an example, I'll use output from my freeware test coverage tool GCT.
A coverage tool is used in three phases:

1. The source code is instrumented: the tool inserts extra code that records which coverage conditions the tests exercise.
2. The instrumented program is compiled and run against the test suite, producing a log.
3. The log is summarized into a report of what the tests missed, such as:

"lc.c", line 256: if was taken TRUE 0, FALSE 11 times.
"lc.c", line 397: operator < might be <=. (need left == right)
What good is that output? In some cases, it points to weaknesses in test design. For example, the second line may be a symptom of forgetting to test boundaries. The first line may point to an untested feature. In other cases, coverage points to mistakes in implementation, where your tests don't test what you thought they did. (This happens a lot.)
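To make the second line concrete: it means that no test ever made the two operands equal, so the tests cannot distinguish < from <=. The fragment below is an invented stand-in, not the real lc.c.

    #include <stdio.h>

    /* Invented stand-in for the comparison the tool flagged. */
    static const char *pick(int left, int right)
    {
        if (left < right)           /* might this need to be <= ? */
            return "left";
        return "right";
    }

    int main(void)
    {
        printf("%s\n", pick(3, 7)); /* left < right: exercised */
        printf("%s\n", pick(7, 3)); /* left > right: exercised */
        /* Missing so far: pick(5, 5), the left == right boundary that
           would tell < apart from <=. */
        return 0;
    }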
A tester of software that reuses code would find coverage more useful if some of it were derived from the reused code. Suppose the tester forgot about the "last element removed and new one added" test requirement. Or suppose it was tested, but the test's input was handled by special-case code that never exercised the reused code at all. Conventional coverage might well miss the omission. What the tester wants is a coverage report that looks like this:

"lc.c", line 218: collection.add never applied to reinitialized collection.
"lc.c", line 256: if was taken TRUE 0, FALSE 11 times.
"lc.c", line 397: operator < might be <=. (need left == right)
"lc.c", line 403: bsearch never failed.
The last line prompts us to write a test to detect omitted error-handling code. Branch coverage, for instance, would not tell us we need such a test: it can't generate a coverage condition for an IF statement that ought to be there but isn't. Such faults of omission are common in fielded systems [Glass81].
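Here's one hedged sketch of the sort of mechanism that could produce the "bsearch never failed" line: a logging wrapper shipped by the producer, with a report run after the test suite finishes. This is only an illustration of the idea, not how GCT actually instruments code.

    #include <stdio.h>
    #include <stdlib.h>

    static unsigned long bsearch_calls, bsearch_failures;

    /* A producer-supplied wrapper that records the coverage condition. */
    static void *bsearch_logged(const void *key, const void *base, size_t nmemb,
                                size_t size, int (*cmp)(const void *, const void *))
    {
        void *hit = bsearch(key, base, nmemb, size, cmp);
        bsearch_calls++;
        if (hit == NULL)
            bsearch_failures++;     /* the producer's coverage condition */
        return hit;
    }

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        int v[] = {1, 3, 5}, key = 3;

        /* Imagine this call buried inside the consumer's code under test. */
        bsearch_logged(&key, v, 3, sizeof v[0], cmp_int);

        /* Report the unmet condition after the tests run. */
        if (bsearch_calls > 0 && bsearch_failures == 0)
            printf("bsearch never failed.\n");
        return 0;
    }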
To allow such coverage, a producer must provide two things: