An alternative to business-facing TDD
The value of programmer TDD is well established. It’s natural to extrapolate that practice to business-facing tests, hoping to obtain similar value. We’ve been banging away at that for years, and the results disappoint me. Perhaps it would be better to invest heavily in unprecedented amounts of built-in support for manual exploratory testing.
In 1998, I wrote a paper, “When should a test be automated?”, that sketched some economics behind automation. Crucially, I took the value of a test to be the bugs it found, rather than (as was common at the time) how many times it could be run in the time needed to step through it manually.
My conclusions looked roughly like the following:
Scripted tests, be they automated or manual, are expensive to create (first column). Manual scripts are cheaper, but they still require someone to write steps down carefully, and they likely require polishing before they can truly be followed by someone else. (Note: height of bars not based on actual data.)
In the second column, I assume that a particular set of steps has roughly the same chance of finding a bug whether executed manually or by a computer, and whether the steps were planned or chosen on the fly. (I say “roughly” because computers don’t get bored and miss bugs, but they also don’t notice bugs they weren’t instructed to find.)
Therefore, if the immediate value of a test is all that matters, exploratory manual testing is the right choice. What about long-term value?
Assume that exploratory tests are never intentionally repeated. Both their long-term cost and value are zero. Both kinds of scripted tests have quite substantial maintenance costs (especially in that era, when testing was typically done through an unmodified GUI). So, to pull ahead of exploratory tests in the long term, scripted tests must have substantial bug-finding power. Many people at that time observed that, in fact, most tests either found a bug the first time they were run or never found a bug at all. You were more likely to fix a test because of an intentional GUI change than to fix the code because the test found a bug.
So the answer to “when should a test be automated?” was “not very often”.
Programmer TDD changes the balance in two ways:
- New sources of value are added. Extremely rapid feedback reduces the cost of debugging. (Most bugs strike while what you did to create them is fresh in your mind.) Many people find the steady pace of TDD allows them to go faster, and that the incremental growth of the code-under-test makes for easier design. And, most importantly as it turns out, the need to make tests run fast and reduce maintenance cost leads to designs with good properties like low coupling and high cohesion. (That is, properties that previously were considered good in the long term—but were routinely violated for short-term gain—now had powerful short-term benefits.)
- Good design and better programmer tools dramatically lowered the long-term cost of tests.
So, much to my surprise, the balance tipped in favor of automation—for programmer tests. It’s not surprising that many people, including me, hoped the balance could also tip for business-facing tests. Here are some of the hoped-for benefits:
- Tests might clarify communication and avoid some cases where the business asks for something, the team thinks they’ve delivered it, and the business says “that’s not what I wanted.”
- They might sharpen design thinking. The discipline of putting generalizations into concrete examples often does.
- Programmers have learned that TDD supports iterative design of interfaces and behavior. Since whole products are also made of interfaces and behavior, they might also benefit from designers who react to partially-finished products rather than having to get it right up front.
- Because businesses have learned to mistrust teams who show no visible progress for eight months (at which point, they ask for a slip), they might like to see evidence of continuous progress in the form of passing tests.
- People often need documentation. Documentation is often improved by examples. Executable tests are examples. Tests as executable documentation might get two benefits for less than their separate costs.
- And, oh yeah, tests could find regression bugs.
So a number of people launched off to explore this approach, most notably with Fit. But Fit hasn’t lived up to our hopes, I think. The things that particularly bother me about it are:
- It works well for business logic that’s naturally tabular. But tables have proven awkward for other kinds of tests.
- In part, the awkwardness is because there are no decent HTML table editors. That inhibits experimentation: if you don’t get a table format right the first time, you’re tempted to just leave it.
  Note: I haven’t tried ZiBreve. By now, I should have. I do include Word, Excel, and their OpenOffice equivalents among the ranks of the not-decent, at least if you want executable documentation. (I’ve never tried treating .doc files as the real tests that are “compiled” into HTML before they’re executed.)
- Fit is not integrated into programmer editors the way xUnit is. For example, you can’t jump from a column name to the Java method that defines it. Partly for this reason, programmers tend to get impatient with people who invent new table formats—can’t they just get along with the old one?
With my graphical tests, I took aim at those sources of friction. If I have a workflow test, I can express it as boxes and arrows:
I translate the graphical documents into ordinary xUnit tests so that I can use my familiar tools while coding. The graphical editor is pretty decent, so I can readily change tests when I get better ideas. (There are occasional quirks where test content has changed more than it looks like it has. That aspect of using Fit hasn’t gone away entirely.)
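To give a feel for that translation, here is a rough sketch of what one translated workflow test might come out as. Everything in it is invented for illustration (the fixture class, its methods, the certification names); the real generated code depends on my graphical editor and on the app under test.

```ruby
require 'test/unit'

# Stand-in fixture for illustration only; the real translated tests run
# against the application's actual objects.
class CertificationWorkflow
  attr_reader :promised, :accepted

  def initialize
    @promised = []
    @accepted = []
  end

  def member_promises(name)
    @promised << name
  end

  def leader_accepts(name)
    @accepted << @promised.delete(name) if @promised.include?(name)
  end
end

class PromiseCertificationWorkflowTest < Test::Unit::TestCase
  # Each box in the drawing becomes a step; each arrow fixes the order.
  def test_promise_then_accept
    flow = CertificationWorkflow.new
    flow.member_promises('First Aid')  # box: member promises a certification
    flow.leader_accepts('First Aid')   # box: a leader accepts it
    assert_equal ['First Aid'], flow.accepted
    assert_equal [], flow.promised
  end
end
```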
I’ve been using these tests, most recently on wevouchfor.org—and they don’t wow me. While I almost always use programmer TDD when coding (and often regret skipping it when I don’t), TDD with these kinds of tests is a chore. It doesn’t feel like enough of the potential value gets realized for the tests to be worth the cost.
- ↓ Writing the executable test doesn’t help clarify or communicate design. Let me be careful here. I’m a big fan of sketching things out on whiteboards or paper:
  That does clarify thinking and improve communication. But the subsequent typing of the examples into the computer is work that rarely leads to any more design benefits.
- ↔ Passing tests do continuously show progress to the business, but… Suppose you demonstrate each completed story anyway, at an end-of-iteration demo or (my preference) as soon as it’s finished. Given that, does seeing more tests pass every day really help?
- ↑ Tests do serve as documentation (at least when someone takes the time to surround them with explanatory text, and if the form and content of the test aren’t distorted to cram a new idea into existing test formats).
- ↑ The word I’m hearing is that these tests are finding bugs more often than I expected. I want to dig into that more: if they’re the sort of “I changed this thing over here and broke that supposedly unrelated thing over there” bugs that whole-product regression tests are traditionally supposed to find, that alone may justify the expense of test automation—unless I can find a way to blame it on inadequate unit tests or a need to rejigger the app.
- ↓ (This is the one that made me say “Eureka!”) Tests alone fail at iterative product design in an interesting way. Whenever I’ve made significant progress implementing the next chunk of workflow or other GUI-visible change, I just naturally check what I’ve done through the GUI. Why? This checking makes new bugs (ones the automated tests don’t check for) leap out at me. It also sometimes makes me slap my forehead and say, “What I intended here was stupid!”
But if I’m going to be looking at the page for both bugs and to change my intentions, I’m really edging into exploratory testing. Hmm… What if an app did whatever it could to aid exploratory testing? I don’t mean traditional testability features like, say, a scripting interface; I mean a concerted effort to let exploratory testers peek and poke at anything they want within the app. (That may not be different than my old motto “No bug should be hard to find the second time,” but it feels different.)
So, although features of Rails like not having to restart the server after most code changes are nice, I want more. Here’s an example.
The following page contains a bug:
Although you can’t see it, the bottom two links are wrong. They are links to /certifications/4 instead of /promised_certifications/4.
- Unit tests couldn’t catch that bug. (The two methods that create those types of links are tested and correct; I just used the wrong one.)
- One test of the action that created the page could have caught the bug, but did not. (To avoid maintenance problems, that test checked the minimum needed to convince me that the correct “certifications” had been displayed. I assumed that if they were displayed at all, the unit tests meant they were displayed correctly. That was actually almost right—every character outside the link’s href value was correct.)
- I missed the bug when I checked the page. (I suspect that I did click one of the links, but didn’t notice it went to the wrong place. If so, I bet I missed the wrongness because I didn’t have enough variety in the test data I set up—ironic, because I’ve been harping on the importance of “irrelevant” variety since 1994.)
- A user had no trouble finding the bug when he tried to edit one of his promised certifications and found himself with a form for someone else’s already-accepted certification. (Had he submitted the form, it would have been rejected, but still.)
That’s my bug: a small error in a big pile of HTML the app fired and forgot.
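To make the mistake concrete, here is a minimal reconstruction in Rails-flavored Ruby. The helper and route names are my invention (the real app’s methods may well be named differently, and resource routes for certifications and promised certifications are assumed); the point is that both helpers are individually correct and unit-tested, and the bug is simply calling the wrong one.

```ruby
# Hypothetical link helpers; each is correct in isolation and covered by
# unit tests. Assumes resource routes exist for both certifications and
# promised_certifications.
module CertificationLinksHelper
  # For a certification that has already been accepted: /certifications/:id
  def certification_link(cert)
    link_to(cert.name, certification_path(cert))
  end

  # For a certification someone has merely promised: /promised_certifications/:id
  def promised_certification_link(cert)
    link_to(cert.name, promised_certification_path(cert))
  end
end

# The buggy view fragment (ERB, shown as a comment). The page test checked
# only that the right certifications were listed, not their hrefs, so
# /certifications/4 slipped through:
#
#   <% @promised_certifications.each do |cert| %>
#     <li><%= certification_link(cert) %></li>  <!-- should be promised_certification_link -->
#   <% end %>
```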
Suppose, though, that the app created and retained an object representing the page. Suppose further that an exploration support app let you switch to another view of that object/page, one that highlights link structure and downplays text:
To the eyes of someone who just added promised certifications to that page, the wrong link targets ought to jump out.
There’s more that I’d like, though. The program knows more about those links than it included in the HTTP Response body. Specifically, it knows they link to a certain kind of object: a PromisedCertification. I should be able to get a view of that object (without committing to following the link). I should be able to get it in both HTML form and in some raw format. (And if the link-to-be-displayed were an object in its own right, I would have had a place to put my method, and I wouldn’t have used the wrong one. Testability changes often feed into error prevention.)
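Here’s one minimal sketch of the kind of thing I mean, with invented names throughout (no framework I know of provides this): a page object the app retains after rendering, remembering for each link not just the href it emitted but the domain object it points at. An exploration assistant could then show a links-only view of the page, or peek at the linked object without following the link.

```ruby
# Sketch only; every name here is hypothetical.
class ExplorablePage
  Link = Struct.new(:text, :href, :target)

  def initialize
    @links = []
  end

  # Called wherever the app renders a link, in addition to emitting HTML.
  def record_link(text, href, target)
    @links << Link.new(text, href, target)
  end

  # The alternate view: link structure highlighted, page text downplayed.
  # A PromisedCertification sitting behind /certifications/4 should leap out.
  def link_report
    @links.map { |l| "#{l.text.ljust(20)} -> #{l.href}  (#{l.target.class})" }
  end

  # Peek at the object behind a link without committing to following it.
  def target_of(href)
    link = @links.find { |l| l.href == href }
    link && link.target
  end
end
```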
And so on… It’s easy enough for me to come up with a list of ways I’d like the app to speak of its internal workings. So what I’m thinking of doing is grabbing some web framework, doing what’s required to make it explorable, using it to build an app, and also building an exploration assistant in RubyCocoa (allowing me to kill another bird with this stone).
To be explicit, here’s my hypothesis:
An application built with programmer TDD, whiteboard-style and example-heavy business-facing design, exploratory testing of its visible workings, and some small set of automated whole-system sanity tests will be cheaper to develop and no worse in quality than one that differs in having minimal exploratory testing, done through the GUI, plus a full set of business-facing TDD tests derived from the example-heavy design.
We shall see, I hope.
March 24th, 2008 at 11:07 pm
I feel less alone in my thinking about testing after reading this post! :) Very much looking forward to what you come up with.
March 29th, 2008 at 2:00 am
In our organization, Test Automation is quite a hot topic, and we are struggling with one item in the Done Definition - “Test case 100% automated (excluding exploratory testing, and testing requiring direct HW operation)” - for functional-level or business-facing tests.
In a department-wide retrospective, one recurring comment was “Test Automation is low valued”. The main complaint is exactly the maintenance effort. However, I still insist it is needed and valuable. My suspicion is that many testers do not have good automation skills; once we improve their skills and help them work in the right way, the situation will change.
April 1st, 2008 at 10:56 am
For the most part we don’t have time to make our “executable” graphical tests, well, err, actually executable, but the tests do serve nicely as scripts for developers on the team to use as a starting point for exploratory testing each others’ code.
Also, in keeping w/ convention over code, we’ve been able to come up w/ templates and idioms for workflows in our graphics so it is quite fast to develop a suite of graphical tests (although it took a while to get critical mass of templates.)
Management’s also been a lot more comfortable that our testing is validating what it’s supposed to, now that we have a graphical, common, (mostly) non-technical language to discuss testing.
June 15th, 2008 at 11:27 am
[…] A lot less mention of Fit this year. That’s due to my decision to tone down my emphasis on automated business-facing tests in favor of exploring other ideas. […]
June 18th, 2008 at 6:31 pm
We are developing an e-Commerce application and are using FIT to automate business logic that involves a lot of calculation. So far it is working very well for us. We are using FitPro instead of Fitnesse; it has an Eclipse plug-in and stores tests as files, not as Wiki pages. That makes integration with version control and automated builds very easy.
We started by building automated tests for existing code, so there is not much maintenance involved. This allowed us to cut down on regression test time significantly.
I still see FIT adding value that you cannot get with just unit tests.
August 10th, 2008 at 12:10 pm
Does your hypothesis assume there will be lots of exploratory testing involved? I do see that your approach could work, although I worry it will be used as an excuse for programmers to only worry about their unit level test automation and not help with automation that could speed up unit tests. In fact, I’m going to include your hypothesis in our book. It seems a reasonable alternative - would like to talk to teams who’ve tried it.
August 13th, 2008 at 11:07 am
Lisa: yes, I assume exploratory testing.
And I would be wise to wonder what could go wrong - what the most likely failure modes are - rather than just worry about what’s needed to make it go right.
August 24th, 2008 at 4:11 pm
Abstracting one layer up … it’s more critical that the feedback be given than the mode that it is given in?
Seems to bode well for the ‘micro-context’ that I believe each user story/feature/requirement has. The micro-context is what makes us decide the balance between exploratory/scripted-manual/scripted-automated/static testing that we do. As long as we provide/acquire the feedback, we can work with the economics and the micro-context to optimize.
Or something like that.
A.
October 13th, 2008 at 8:50 pm
Hi Brian,
I have several comments I want to make after reading this blog post:
1) Your indication that functional tests seem like too much work may be because you have become very unit test infected. Functional tests are there to test end-to-end behavior which unit tests can’t (shouldn’t be) catch(ing). Since they are “less efficient” at finding specific bugs, I think they are better written by testers than programmers. Testers that are trained to understand what the unit tests do and to craft tests that truly exercise the integration. Programmers still don’t seem to have the testing skills to do this effectively, so I’d propose this is still a good area where more traditional testers can add value to the test safety net.
2) During my tenure as a release engineer at Hitachi, I learned the value of exploratory tests, and I use this approach much as you are advocating: as a substitute for planning too early and creating test-cases that aren’t relevant. In recent agile test work, I’ve delayed developing my test procedure until I get enough of a feature to test. Then I’d exploratory test to learn what’s there. Then I’d write a procedure that defines appropriate test-cases for repeat and regression testing. I think you need to adapt your “minimal sanity testing combined with exploratory testing” into the same plus feeding the results of exploratory tests into regression tests. These regression tests might act as a substitute for customer acceptance tests.
3) Good exploratory testers are in short supply, so your approach of using exploratory tests in place of customer defined tests may not be “scalable” without training more people in exploratory testing.
4) I’ve concluded that much of my exploratory testing has been a coping strategy to deal with less than healthy communication between the business and programmers. I’d like to practice the same skills earlier to define test-cases for stories during an “example test-case writing workshop”. This is Gojko Adzic’s idea and it will be explained in his upcoming book (see http://www.acceptancetesting.info/). I get the feeling that I’ll still identify more end to end tests during my own exploration when testing each feature, but I believe having end-to-end customer tests in advance will allow me even more time for exploration. I believe I’ll want to keep my additional tests separate from the customer ones, and I believe I’ll want to refactor these down into unit tests where they truly don’t test integration of units.
5) I believe FIT will encourage programmers to write software that is more testable at integration points (much like xUnit encourages more testability in modules). I don’t think the approach in your hypothesis encourages the same. In my opinion, this may cause increased technical debt at the integration level. This would be a hidden cost that could result from the test methodology you are proposing.
Sincerely,
–
Bob Clancy
9 Lives Software Engineering
December 19th, 2008 at 10:03 am
Hi Brian,
Your experience mostly mirrors mine with large FIT suites. It did turn out to be valuable in finding regression bugs - but I don’t know if the maintenance cost justified that.
The other thing you hit on is editor support. That seamless transition from FIT to fixture code and back isn’t there yet. I’ve got some ideas to try the next time I run into Corey Haines and can pair with him, but I can’t wait to see where this leads.
December 19th, 2008 at 11:54 am
[…] development is asserted as potentially more valuable than automated acceptance testing in this article from Brian […]
July 25th, 2010 at 8:19 pm
[…] Brian Marick wrote a lovely essay on An Alternative to Business-Facing TDD. […]
January 1st, 2012 at 12:46 pm
[…] Yesterday Brian Marick made a blog entry with some thoughts on alternatives to business-facing TDD. Here is an abstract: The value of programmer TDD is well established. It’s natural to extrapolate that practice to business-facing tests, hoping to obtain similar value. We’ve been banging away at that for years, and the results disappoint me. Perhaps it would be better to invest heavily in unprecedented amounts of built-in support for manual exploratory testing. An Alternative To Business-Facing TDD […]
July 17th, 2013 at 12:29 pm
[…] Brian Marick pointed me in that direction, I started the investigation of Exploratory Testing driven […]