Testing Foundations
Note: This essay is a lightly revised version of one written in early 1992. These ideas have been developed in more detail in my book, The Craft of Software Testing, Prentice Hall, 1995, ISBN 0-13-177411-5.
Test design occurs in two stages, be they implicit or explicit. In the first stage, test requirements are created. A test requirement is some condition that at least one test case must satisfy.
Examples:

- The variable A is given a negative value.
- The variable B is given a negative value.
In the second stage, the test requirements are combined into test cases that precisely describe the inputs to the program and its expected results.
Example:
A = -1, B = -1. Expected value: 13
This test satisfies two of the test requirements. The rules by which test requirements are combined into test cases are irrelevant to the argument of this note.
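Written out in code, a test case is nothing more than precise inputs plus an expected result. Here's a minimal sketch; f() is only a stub standing in for the program under test (its real behavior is beside the point), and the expected value 13 echoes the example above.

    #include <assert.h>

    /* A stub standing in for the program under test, present only so the
       sketch compiles; the real expected value comes from the spec. */
    static int f(int a, int b) { return (a < 0 && b < 0) ? 13 : 0; }

    int main(void)
    {
        /* One test case: precise inputs and one expected result.  A single
           test case can satisfy more than one test requirement at once. */
        assert(f(-1, -1) == 13);
        return 0;
    }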
Where do the test requirements come from? Most obviously, they come from knowledge of what the program is supposed to do. For example, if we're testing the implementation of a memory allocator, malloc(amount), we might get these test requirements:

1. The amount is invalid (for example, zero).
2. malloc fails and returns a null pointer.
3. malloc succeeds and returns a pointer to a usable block of memory.
In this case, there's a test requirement to see if the program handles invalid inputs correctly and two test requirements for the two kinds of return values.
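To make those requirements concrete, here's a minimal sketch with one test per requirement. The details are assumptions, not part of any specification: I treat an amount of zero as the questionable input, and I force the failure case with an absurdly large request, which is crude but usually works.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Questionable input: an amount of 0 (the standard lets malloc
           return either NULL or a unique pointer here). */
        void *p0 = malloc(0);
        printf("malloc(0) returned %p\n", p0);
        free(p0);

        /* Normal return value: a usable block of memory. */
        char *p1 = malloc(64);
        if (p1 != NULL) {
            p1[0] = 'x';               /* the block should be writable */
            free(p1);
        }

        /* Failure return value: NULL when the request can't be satisfied. */
        void *p2 = malloc((size_t)-1);
        printf("huge request returned %s\n", p2 == NULL ? "NULL" : "a pointer");
        free(p2);

        return 0;
    }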
Test requirements also come from an understanding of the types of errors the programmer likely made while writing the program. Code is built from clichés [Rich90], such as "searching a list", "sorting a list", "decomposing a pathname into a directory and a file", "hash tables", "strings" (null terminated character arrays), and so on. Notice that both operations and data types can be clichés. Programmers often make clichéd errors when implementing or using clichés [Johnson83]. Thus, you can create a catalog of test requirements, indexed by cliché. One such catalog is given in my book.
There are two ways clichés may be manifest in programs:

1. The cliché may be implemented inline, as code written out in the program itself (for example, a hand-coded search loop).
2. The cliché may be reused, as a call to an already-implemented, presumably well-tested subroutine or datatype (for example, a call to bsearch).
The test requirements for a cliché-using program depend on the manifestation. If the cliché is implemented inline, you must test that implementation. For a vector search loop, you'll use a test requirement that probes for off-by-one errors: "element found in last position". But if the search is a call to a well-tested subroutine, that test requirement would likely be useless. Instead, you would restrict your attention to plausible errors in the program's use of the cliché -- such as faults where bsearch is called correctly but the caller fails to handle the result properly. Here, a test requirement might be "element not found", since programmers sometimes fail to think about that possibility. Another example would be testing uses of the write() routine with "write fails", since programs that assume writes always succeed abound.
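Here's a sketch of the contrast. The names (lookup_inline, lookup_bsearch, cmp_int) are invented for the illustration; the point is which test requirement matters for each manifestation.

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Inline cliché: the loop itself needs tests like "element found in
       the last position" to probe for off-by-one errors. */
    static int lookup_inline(const int *v, size_t n, int key)
    {
        for (size_t i = 0; i < n; i++)      /* a classic spot for < vs. <= bugs */
            if (v[i] == key)
                return 1;
        return 0;
    }

    /* Reused cliché: bsearch is assumed correct, so the interesting test
       requirement is "element not found" -- does the caller handle NULL? */
    static int lookup_bsearch(const int *v, size_t n, int key)
    {
        const int *hit = bsearch(&key, v, n, sizeof *v, cmp_int);
        return hit != NULL;                 /* the easily-forgotten case */
    }

    int main(void)
    {
        int v[] = {2, 4, 6, 8};
        size_t n = sizeof v / sizeof v[0];

        printf("%d\n", lookup_inline(v, n, 8));   /* found in last position */
        printf("%d\n", lookup_bsearch(v, n, 5));  /* element not found */
        return 0;
    }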
Of course, not all reusable subroutines implement clichés, but all reusable code can generate the same sort of test requirements, test requirements that probe likely misuses. In the absence of more information, some general rules are:
But there can also be test requirements peculiar to a reused routine or datatype. For example, suppose a collection that can grow and shrink is provided as a datatype. However, the user must reinitialize the collection whenever the last element is removed. Programmers will surely often forget the reinitialization; this likely error can be captured in a test requirement like "an element is removed from a single-element collection, then a new element is added".
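That test requirement can be written down directly as a test case. In the sketch below, the collection (the coll_* names and the reinitialization rule) is a toy I've invented to mirror the example; a real producer would supply the real datatype.

    #include <assert.h>
    #include <stddef.h>

    typedef struct { int items[8]; size_t count; int initialized; } coll;

    static void coll_init(coll *c)       { c->count = 0; c->initialized = 1; }
    static void coll_add(coll *c, int x) { assert(c->initialized); c->items[c->count++] = x; }

    static int coll_remove_last(coll *c)
    {
        assert(c->initialized && c->count > 0);
        int x = c->items[--c->count];
        if (c->count == 0)
            c->initialized = 0;     /* the user must call coll_init() again */
        return x;
    }

    int main(void)
    {
        coll c;
        coll_init(&c);

        /* The producer test requirement: remove the only element, then add
           a new one.  A consumer who forgets the reinitialization trips
           the assertion in coll_add(). */
        coll_add(&c, 1);
        (void)coll_remove_last(&c);
        coll_init(&c);              /* the step programmers tend to forget */
        coll_add(&c, 2);

        return 0;
    }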
Henceforth, these test requirements will be called producer test requirements. Consumers (users of the reusable code) discover them the hard way, since they are the ones making the common mistakes; but the people best suited to collect this information and circulate it to all consumers are the producers. (Of course, in this particular case, the producers would also want to rework the design of the datatype to eliminate the cause of the common error, but that cannot always be done. Some abstractions are simply inherently complicated.)
Along with reusable software, producers should provide catalogs of producer test requirements. These are used in the same way as, and in conjunction with, the general-purpose cliché catalog mentioned earlier. For example, if a consumer is writing software that uses a stream of input records to modify a collection, the "empty, then add" producer test requirement can be combined with generic stream test requirements to produce good test cases for that software.
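For instance, the combination might look like this test input. The record format and the particular generic stream requirements (first record, last record) are assumptions made up for the sketch.

    #include <stdio.h>

    typedef struct { char op; int key; } record;   /* invented record format */

    /* One stream of records chosen to satisfy several test requirements. */
    static const record test_stream[] = {
        {'A', 10},  /* generic stream requirement: the very first record        */
        {'D', 10},  /* producer requirement: this delete empties the collection  */
        {'A', 20},  /* ...and this add must follow the required reinitialization */
        {'D', 20},  /* generic stream requirement: the last record in the stream */
    };

    int main(void)
    {
        /* In a real test, each record would be fed to the software under
           test; here the stream is simply printed. */
        for (size_t i = 0; i < sizeof test_stream / sizeof test_stream[0]; i++)
            printf("%c %d\n", test_stream[i].op, test_stream[i].key);
        return 0;
    }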
I'll first skim the basics of coverage. As an example, I'll use output from my freeware test coverage tool GCT.
A coverage tool is used in three phases:

1. The source code is instrumented: the tool inserts extra code that records which coverage conditions the tests exercise.
2. The instrumented program is compiled and run against the test suite, producing a log.
3. The log is summarized into a report of what the tests missed, such as:

"lc.c", line 256: if was taken TRUE 0, FALSE 11 times.
"lc.c", line 397: operator < might be <=. (need left == right)
What good is that output? In some cases, it points to weaknesses in test design. For example, the second line may be a symptom of forgetting to test boundaries. The first line may point to an untested feature. In other cases, coverage points to mistakes in implementation, where your tests don't test what you thought they did. (This happens a lot.)
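To make the second line concrete: it means that no test ever made the two operands equal, so the tests cannot distinguish < from <=. The fragment below is an invented stand-in, not the real lc.c.

    #include <stdio.h>

    /* Invented stand-in for the comparison the tool flagged. */
    static const char *pick(int left, int right)
    {
        if (left < right)           /* might this need to be <= ? */
            return "left";
        return "right";
    }

    int main(void)
    {
        printf("%s\n", pick(3, 7)); /* left < right: exercised */
        printf("%s\n", pick(7, 3)); /* left > right: exercised */
        /* Missing so far: pick(5, 5), the left == right boundary that
           would tell < apart from <=. */
        return 0;
    }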
A tester of software that reuses code would find coverage more useful if some of it were derived from the reused code. Suppose the tester forgot about the "last element removed and new one added" test requirement. Or suppose it was tested, but the test's input was handled by special-case code that never exercised the reused code at all. Conventional coverage might well miss the omission. What the tester wants is a coverage report that looks like this:

"lc.c", line 218: collection.add never applied to reinitialized collection.
"lc.c", line 256: if was taken TRUE 0, FALSE 11 times.
"lc.c", line 397: operator < might be <=. (need left == right)
"lc.c", line 403: bsearch never failed.
The last line prompts us to write a test to detect omitted error-handling code. Branch coverage, for instance, would not tell us we need such a test: it can't generate a coverage condition for an IF statement that ought to be there but isn't. Such faults of omission are common in fielded systems [Glass81].
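Here's one hedged sketch of the sort of mechanism that could produce the "bsearch never failed" line: a logging wrapper shipped by the producer, with a report run after the test suite finishes. This is only an illustration of the idea, not how GCT actually instruments code.

    #include <stdio.h>
    #include <stdlib.h>

    static unsigned long bsearch_calls, bsearch_failures;

    /* A producer-supplied wrapper that records the coverage condition. */
    static void *bsearch_logged(const void *key, const void *base, size_t nmemb,
                                size_t size, int (*cmp)(const void *, const void *))
    {
        void *hit = bsearch(key, base, nmemb, size, cmp);
        bsearch_calls++;
        if (hit == NULL)
            bsearch_failures++;     /* the producer's coverage condition */
        return hit;
    }

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        int v[] = {1, 3, 5}, key = 3;

        /* Imagine this call buried inside the consumer's code under test. */
        bsearch_logged(&key, v, 3, sizeof v[0], cmp_int);

        /* Report the unmet condition after the tests run. */
        if (bsearch_calls > 0 && bsearch_failures == 0)
            printf("bsearch never failed.\n");
        return 0;
    }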
To allow such coverage, a producer must provide two things: