Here's my response, which I decided to preserve.
I have to disagree, and I do so with my classical tester hat
on. For two reasons.
First, let's suppose that TDD code is less well tested than the
classical ideal. Doesn't matter, since the classical tester almost
never sees code that comes close to that ideal. Testers spend much of
their day finding, laboriously, bugs that programmers could have found
easily. Typically, those are bugs where the programmer intended
something, but didn't check that she had in fact achieved her
intention (or didn't check that her intention stayed achieved as the
program changed). TDD certainly makes those checks.
Next, I argue that TDD code isn't all that far from the classical
ideal. There are really two kinds of classical ideal, plus the
question of where good tests come from.
1) The test suite meets some structural coverage criterion. Generally,
the minimal criterion considered acceptable is that all logical tests
evaluate both true and false. For if (a && b), that requires three
tests.
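
To spell that out, here's a minimal sketch (the Guard class and bothOk
method are made up for illustration; JUnit-style tests assumed). Three
tests make every logical test evaluate both true and false: the whole
expression, a, and b each go both ways somewhere in the suite.

    // Guard.java -- hypothetical code guarded by if (a && b)
    public class Guard {
        public static boolean bothOk(boolean a, boolean b) {
            if (a && b) {
                return true;
            }
            return false;
        }
    }

    // GuardTest.java -- three tests suffice because && short-circuits:
    // when a is false, b is never evaluated.
    import junit.framework.TestCase;

    public class GuardTest extends TestCase {
        public void testBothTrue()    { assertTrue(Guard.bothOk(true, true));   } // expression true; a, b true
        public void testFirstFalse()  { assertFalse(Guard.bothOk(false, true)); } // expression false; a false
        public void testSecondFalse() { assertFalse(Guard.bothOk(true, false)); } // expression false; b false
    }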
I believe a competent set of TDD tests will come awfully close to that
criterion. It might not reach more stringent ones, such as those based
on dataflow coverage, but I argue that those criteria don't have much
to recommend them (see The Craft of Software Testing).
2) The test suite meets a bug coverage criterion: some variant of
mutation testing. (Jester measures a kind of mutation coverage.)
Generally, mutation testing works by making one-token changes to the
program and checking whether any test notices the change. The idea is
that a test suite that can't tell whether the code should be a < b or
a > b must be missing lots of other bugs. That seems plausible.
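
As a concrete sketch (a toy of my own devising, not how Jester itself
is wired up), here is code under test, the kind of one-token mutant a
tool might generate, and tests that "kill" each mutant by failing
against it:

    // Comparison.java -- the code under test
    public class Comparison {
        public static boolean isSmaller(int a, int b) {
            return a < b;   // a mutation tool tries one-token variants: a > b, a <= b, ...
        }
    }

    // ComparisonTest.java -- a mutant is "killed" when some test fails against it
    import junit.framework.TestCase;

    public class ComparisonTest extends TestCase {
        public void testSmaller()       { assertTrue(Comparison.isSmaller(1, 2));  } // fails on the a > b mutant
        public void testNotSmaller()    { assertFalse(Comparison.isSmaller(2, 1)); } // also fails on a > b
        public void testEqualBoundary() { assertFalse(Comparison.isSmaller(2, 2)); } // fails on the a <= b mutant
    }

Without the boundary test, the a <= b mutant would survive and the
tool would flag the suite as inadequate.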
What doesn't seem plausible to me is the assumption that a test suite
that can detect one-token bugs is necessarily good at detecting more
complicated bugs. As far as I know, there's no hard evidence for that,
and I suspect it's false (because of faults of omission, illustrated
below). There are other problems with mutation testing: it can require
a lot of tests, and my (limited) personal experience is that it has a
low "Aha!" factor. That is, when I designed what I considered to be
good tests, then checked them with a mutation testing tool I'd
written, the work rarely resulted in my saying, "Oh, that's a good
test."
So I wouldn't worry about that criterion.
3) Note that I spoke of a "competent set of TDD tests" above. Where
does that competence come from? Experience. I have in my head (and,
sometimes, on paper) a catalog of plausible bugs and the kinds of test
cases that find them. I've built this by, first, standing on the
shoulders of others and, second, paying attention when I see bugs.
TDD asks programmers to pay attention. In notes they've written to this list, John Arrizza and Bill Wake have shown that they do. Ron Jeffries has spoken of learning how to test better by paying attention to which bugs slip past. As paying attention becomes part of the received wisdom, all TDD programmers will write tests that look pretty darn sufficient.