Archive for the 'TDD' Category

An alternative to business-facing TDD

The value of programmer TDD is well established. It’s natural to extrapolate that practice to business-facing tests, hoping to obtain similar value. We’ve been banging away at that for years, and the results disappoint me. Perhaps it would be better to invest heavily in unprecedented amounts of built-in support for manual exploratory testing.

In 1998, I wrote a paper, “When should a test be automated?”, that sketched some economics behind automation. Crucially, I took the value of a test to be the bugs it found, rather than (as was common at the time) how many times it could be run in the time needed to step through it manually.

My conclusions looked roughly like the following:

[Figure: test tradeoffs in general]

Scripted tests, be they automated or manual, are expensive to create (first column). Manual scripts are cheaper, but they still require someone to write steps down carefully, and they likely require polishing before they can truly be followed by someone else. (Note: height of bars not based on actual data.)

In the second column, I assume that a particular set of steps has roughly the same chance of finding a bug whether executed manually or by a computer, and whether the steps were planned or chosen on the fly. (I say “roughly” because computers don’t get bored and miss bugs, but they also don’t notice bugs they weren’t instructed to find.)

Therefore, if the immediate value of a test is all that matters, exploratory manual testing is the right choice. What about long-term value?

Assume that exploratory tests are never intentionally repeated. Both their long-term cost and value are zero. Both kinds of scripted tests have quite substantial maintenance costs (especially in that era, when testing was typically done through an unmodified GUI). So, to pull ahead of exploratory tests in the long term, scripted tests must have substantial bug-finding power. Many people at that time observed that, in fact, most tests either found a bug the first time they were run or never found a bug at all. You were more likely to fix a test because of an intentional GUI change than to fix the code because the test found a bug.

So the answer to “when should a test be automated?” was “not very often”.

Programmer TDD changes the balance in two ways:

[Figure: test tradeoffs for TDD]

  1. New sources of value are added. Extremely rapid feedback reduces the cost of debugging. (Most bugs strike while what you did to create them is fresh in your mind.) Many people find the steady pace of TDD allows them to go faster, and that the incremental growth of the code-under-test makes for easier design. And, most importantly as it turns out, the need to make tests run fast and reduce maintenance cost leads to designs with good properties like low coupling and high cohesion. (That is, properties that previously were considered good in the long term—but were routinely violated for short-term gain—now had powerful short-term benefits.)

  2. Good design and better programmer tools dramatically lowered the long-term cost of tests.

So, much to my surprise, the balance tipped in favor of automation—for programmer tests. It’s not surprising that many people, including me, hoped the balance could also tip for business-facing tests. Here are some of the hoped-for benefits:

  • Tests might clarify communication and avoid some cases where the business asks for something, the team thinks they’ve delivered it, and the business says “that’s not what I wanted.”

  • They might sharpen design thinking. The discipline of putting generalizations into concrete examples often does.

  • Programmers have learned that TDD supports iterative design of interfaces and behavior. Since whole products are also made of interfaces and behavior, they might also benefit from designers who react to partially-finished products rather than having to get it right up front.

  • Because businesses have learned to mistrust teams who show no visible progress for eight months (at which point, they ask for a slip), they might like to see evidence of continuous progress in the form of passing tests.

  • People often need documentation. Documentation is often improved by examples. Executable tests are examples. Tests as executable documentation might get two benefits for less than their separate costs.

  • And, oh yeah, tests could find regression bugs.

So a number of people launched off to explore this approach, most notably with Fit. But Fit hasn’t lived up to our hopes, I think. The things that particularly bother me about it are:

  • It works well for business logic that’s naturally tabular. But tables have proven awkward for other kinds of tests.

  • In part, the awkwardness is because there are no decent HTML table editors. That inhibits experimentation: if you don’t get a table format right the first time, you’re tempted to just leave it.

    Note: I haven’t tried ZiBreve. By now, I should have. I do include Word, Excel, and their OpenOffice equivalents among the ranks of the not-decent, at least if you want executable documentation. (I’ve never tried treating .doc files as the real tests that are “compiled” into HTML before they’re executed.)

  • Fit is not integrated into programmer editors the way xUnit is. For example, you can’t jump from a column name to the Java method that defines it. Partly for this reason, programmers tend to get impatient with people who invent new table formats—can’t they just get along with the old one?

With my graphical tests, I took aim at those sources of friction. If I have a workflow test, I can express it as boxes and arrows:

[Figure: a workflow test]

I translate the graphical documents into ordinary xUnit tests so that I can use my familiar tools while coding. The graphical editor is pretty decent, so I can readily change tests when I get better ideas. (There are occasional quirks where test content has changed more than it looks like it has. That aspect of using Fit hasn’t gone away entirely.)

I’ve been using these tests, most recently on wevouchfor.org—and they don’t wow me. While I almost always use programmer TDD when coding (and often regret skipping it when I don’t), TDD with these kinds of tests is a chore. It doesn’t feel like enough of the potential value gets realized for the tests to be worth the cost.

  • Writing the executable test doesn’t help clarify or communicate design. Let me be careful here. I’m a big fan of sketching things out on whiteboards or paper:

    [Photo: a whiteboard]

    That does clarify thinking and improve communication. But the subsequent typing of the examples into the computer is work that rarely leads to any more design benefits.

  • Passing tests do continuously show progress to the business, but… Suppose you demonstrate each completed story anyway, at an end-of-iteration demo or (my preference) as soon as it’s finished. Given that, does seeing more tests pass every day really help?

  • Tests do serve as documentation (at least when someone takes the time to surround them with explanatory text, and if the form and content of the test aren’t distorted to cram a new idea into existing test formats).

  • The word I’m hearing is that these tests are finding bugs more often than I expected. I want to dig into that more: if they’re the sort of “I changed this thing over here and broke that supposedly unrelated thing over there” bugs that whole-product regression tests are traditionally supposed to find, that alone may justify the expense of test automation—unless I can find a way to blame it on inadequate unit tests or a need to rejigger the app.

  • (This is the one that made me say “Eureka!”) Tests alone fail at iterative product design in an interesting way. Whenever I’ve made significant progress implementing the next chunk of workflow or other GUI-visible change, I just naturally check what I’ve done through the GUI. Why? This checking makes new bugs (ones the automated tests don’t check for) leap out at me. It also sometimes makes me slap my forehead and say, “What I intended here was stupid!”

But if I’m going to be looking at the page for both bugs and to change my intentions, I’m really edging into exploratory testing. Hmm… What if an app did whatever it could to aid exploratory testing? I don’t mean traditional testability features like, say, a scripting interface; I mean a concerted effort to let exploratory testers peek and poke at anything they want within the app. (That may not be different than my old motto “No bug should be hard to find the second time,” but it feels different.)

So, although features of Rails like not having to restart the server after most code changes are nice, I want more. Here’s an example.

The following page contains a bug:

[Screenshot: an ordinary web page]

Although you can’t see it, the bottom two links are wrong. They are links to /certifications/4 instead of /promised_certifications/4.
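To make the mistake concrete, here is a hypothetical sketch of the two kinds of link-building code involved (the names are my guesses for this post, not the app’s actual code). Both helpers are correct in isolation and both produce well-formed HTML; the only thing wrong is which one the view called for a promised certification.

  # Each helper is individually correct and unit-tested.
  def link_to_certification(cert)
    link_to(cert.name, certification_path(cert))             # => /certifications/4
  end

  def link_to_promised_certification(cert)
    link_to(cert.name, promised_certification_path(cert))    # => /promised_certifications/4
  end

  # The bug: the view called link_to_certification for a promised certification.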

  1. Unit tests couldn’t catch that bug. (The two methods that create those types of links are tested and correct; I just used the wrong one.)

  2. One test of the action that created the page could have caught the bug, but did not. (To avoid maintenance problems, that test checked the minimum needed to convince me that the correct “certifications” had been displayed. I assumed that if they were displayed at all, the unit tests meant they were displayed correctly. That was actually almost right—every character outside the link’s href value was correct.) A sketch of the kind of minimal check I mean appears just after this list.

  3. I missed the bug when I checked the page. (I suspect that I did click one of the links, but didn’t notice it went to the wrong place. If so, I bet I missed the wrongness because I didn’t have enough variety in the test data I set up—ironic, because I’ve been harping on the importance of “irrelevant” variety since 1994.)

  4. A user had no trouble finding the bug when he tried to edit one of his promised certifications and found himself with a form for someone else’s already-accepted certification. (Had he submitted the form, it would have been rejected, but still.)
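Here is a sketch of the sort of minimal controller-test assertion I mentioned in point 2 (the names and data are invented for this post, not taken from the app’s test suite). It checks that a promised certification shows up at all, but says nothing about where its link points, so the wrong href sails right through it.

  def test_show_lists_promised_certifications
    get :show, :id => users(:quentin).login
    assert_response :success
    # Checks the link text, not the href it points to.
    assert_select "a", :text => /First Aid/   # "First Aid" is a made-up certification name
  end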

That’s my bug: a small error in a big pile of HTML the app fired and forgot.

Suppose, though, that the app created and retained an object representing the page. Suppose further that an exploration support app let you switch to another view of that object/page, one that highlights link structure and downplays text:

[Screenshot: the same page, highlighting link hrefs]

To the eyes of someone who just added promised certifications to that page, the wrong link targets ought to jump out.

There’s more that I’d like, though. The program knows more about those links than it included in the HTTP Response body. Specifically, it knows they link to a certain kind of object: a PromisedCertification. I should be able to get a view of that object (without committing to following the link). I should be able to get it in both HTML form and in some raw format. (And if the link-to-be-displayed were an object in its own right, I would have had a place to put my method, and I wouldn’t have used the wrong one. Testability changes often feed into error prevention.)
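Here is a minimal sketch of what I mean by making the link an object in its own right (my names, not code from the app). The page retains structured Link objects instead of fire-and-forget HTML, so an exploration view can ask each link what it points at, and the choose-the-path decision lives in exactly one place.

  class Link
    attr_reader :text, :target    # target is a domain object, e.g. a PromisedCertification

    def initialize(text, target)
      @text, @target = text, target
    end

    def href
      # One place to derive the path from the target's class, instead of
      # scattered calls to easily-confused helpers.
      "/#{target.class.name.underscore.pluralize}/#{target.id}"
    end

    def to_html
      %Q{<a href="#{href}">#{text}</a>}
    end
  end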

And so on… It’s easy enough for me to come up with a list of ways I’d like the app to speak of its internal workings. So what I’m thinking of doing is grabbing some web framework, doing what’s required to make it explorable, using it to build an app, and also building an exploration assistant in RubyCocoa (allowing me to kill another bird with this stone).

To be explicit, here’s my hypothesis:

An application built with programmer TDD, whiteboard-style and example-heavy business-facing design, exploratory testing of its visible workings, and some small set of automated whole-system sanity tests will be cheaper to develop and no worse in quality than one that differs in having minimal exploratory testing, done through the GUI, plus a full set of business-facing TDD tests derived from the example-heavy design.

We shall see, I hope.

Google talk references

One thing I meant to say and forgot: Just as the evolution of amphibians didn’t mean that all the fish disappeared, the creation of a new kind of testing to fit a new niche doesn’t mean existing kinds are now obsolete.

Context-driven testing:

Testing Computer Software, Kaner, Falk, and Nguyen
Lessons Learned in Software Testing, Kaner, Bach, and Pettichord
http://www.context-driven-testing.com
“When Should a Test Be Automated?”, Marick

Exploratory testing:

James Bach
Michael Bolton
Elisabeth Hendrickson
Jonathan Kohl

Left out:

The undescribed fourth age

An occasional alternative to mocks?

I’m test-driving some Rails helpers. A helper is a method that runs in a context full of methods magically provided by Rails. Some of those methods are of the type that’s a classic motivation for mocks or stubs: if you don’t want them to blow up, you have to do some annoying behind-the-scenes setup. (And because Rails does so much magic for you, it can be hard for the novice to have a clue what that setup is for helpers.)

Let’s say I want a helper method named reference_to. Here’s a partial “specification”: it’s to generate a link to one of a Certification's associated users. The text of the link will be the full name of the user and the href will be the path to that user’s page. I found myself writing mocks along these lines:

mock.should_receive(:user_path).once.
     with(:id=>@originator.login).
     and_return("**the right path**")
mock.should_receive(:link_to).once.
     with(@originator.full_name, "**the right path**").
     and_return("**correct-text**")

But then it occurred to me: The structure I’m building is isomorphic to the call trace, so why not replace the real methods with recorders? Like this:

  def user_path(keys)
    "user_path to #{keys.canonicalize}"
  end

  def link_to(*args)
    "link to #{args.canonicalize}"
  end

  def test_a_reference_is_normally_a_link
    assert_equal(link_to(@originator.full_name, user_path(:id => @originator.login)),
                 reference_to(@cert, :originator))
  end
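For context, here is one plausible implementation that the test above drives toward (a sketch consistent with the partial specification, not necessarily the code I ended up with):

  def reference_to(certification, role)
    user = certification.send(role)    # e.g. the certification's originator
    link_to(user.full_name, user_path(:id => user.login))
  end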

This test determines that:

  • the methods called are the right ones to implement the specified behavior. There’s a clear correspondence between the text of the spec (“generate a link to”) and calls I know I made (link_to).

  • the methods were called in the right order (or in an order-irrelevant way).

  • they were called the right number of times.

  • the right arguments were given.

So, even though my fake methods are really stubs, they tell you the same things mocks would in this case. And I think the test is much easier to grok than code with mocks (especially if I aliased assert_equal to assert_behaves_like).

What I’m wondering is how often building a structure to capture the behavior of the thing-under-test will be roughly as confidence-building and design-guiding as mocks. The idea seems pretty obvious (even though it took me forever to think of it), so it’s probably either a bad idea or already widely known. Which?

Alternately, I’m still missing the point of mocks.

P.S. For tests to work, you have to deal with the age-old problems of transient values (like dates or object ids) and indeterminate values (like the order of elements in a printed hash). I’m fortunate in that I’m building HTML snippets out of simple objects, so this seems to suffice:

class Object
  def canonicalize; to_s; end
end

class Array
  def canonicalize
    collect { | e | e.canonicalize }
  end
end

class Hash
  def canonicalize
    to_a.sort_by { | a | a.first.object_id }.canonicalize
  end
end

Code emphasis for tests that teach

In product code, not repeating yourself is almost always a good idea. In tests, it’s not so clear-cut. Repeating yourself has the same maintenance dangers as it does for code, but not repeating yourself has two additional downsides:

  • A common knock against well-factored object-oriented code is that no object does anything; they all just forward work on to other objects. That structure turns out to be useful once you’re immersed in the system, but it does make systems harder to learn.

    One purpose of tests is to explain the code to the novice. Remove duplication too aggressively, and the tests do a poor job of that.

  • Another purpose of tests is to make yourself think. One way to do that is to force yourself to enumerate possibilities and ask “What should happen in this case?” That’s one of the reasons that I, when acting as a tester, will turn a state diagram into a state table. A state diagram doesn’t make it easy to see whether you’ve considered the effect of each possible event in each possible state; a state table does. A tiny example of such a table appears just after this list. (It’s not as simple as that, though: it’s hard to stay focused as you work through a lot of identical cases looking for the one that’s really different. It’s like the old joke that ends “1 is a prime, 3 is a prime, 5 is a prime, 7 is a prime, 9 is a prime…”)

    If you factor three distinct assertions into a single master assertion, it’s easy to overlook that the second shouldn’t apply in some particular case. When you factor three distinct setup steps into one method, you can more easily fail to ask what should happen when the second setup step is left out.
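Here is the kind of tiny, invented state table I mean (a turnstile, nothing from a real project). The cells a diagram lets you quietly skip are exactly the ones the table forces you to fill in:

                coin inserted             arm pushed
  locked        unlock, go to unlocked    stays locked (decided, or just assumed?)
  unlocked      ??? (a second coin?)      let through, go to locked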

So as I balance the different forces, I find myself writing test code like this:

  # Guard against manufactured URLs.
  def test_cannot_update_a_user_unless_logged_in
    new_profile = NewProfile.for('quentin').is_entirely_different_than(users(:quentin))

    put :update,
        {:id => users(:quentin).login, :user => new_profile.contents}
        # Nothing in session
    assert_redirected_to(home_path)
    assert_hackery_notice_delivered
    new_profile.assert_not_installed
  end

  def test_cannot_update_a_user_other_than_self
    new_profile = NewProfile.for('quentin').is_entirely_different_than(users(:quentin))

    put :update,
        {:id => users(:quentin).login, :user => new_profile.contents},
        {:user => users(:aaron).id}
    assert_redirected_to(home_path)
    assert_hackery_notice_delivered
    new_profile.assert_not_installed
  end

There’s duplication there. In an earlier version, I’d in fact reflexively factored it out, but then decided to put it back. I think the tests are better for that, and I’m willing to take the maintenance hit.

Nevertheless, there’s a problem. It’s not obvious enough what’s different about the two tests. What to do about that?

Consider explaining the evolution of a program over time in a book. Authors don’t usually show a minimal difference between before and after versions. Instead, they show both versions with a fair amount of context, somehow highlighting the differences. (When I write, I tend to bold changed words.) I wish I could highlight what’s special about each test in my IDE, so that it would look like this:

  # Guard against manufactured URLs.
  def test_cannot_update_a_user_unless_logged_in
    new_profile = NewProfile.for('quentin').is_entirely_different_than(users(:quentin))

    put :update,
        {:id => users(:quentin).login, :user => new_profile.contents}
        # Nothing in session                       <== the highlighted difference
    assert_redirected_to(home_path)
    assert_hackery_notice_delivered
    new_profile.assert_not_installed
  end

  def test_cannot_update_a_user_other_than_self
    new_profile = NewProfile.for('quentin').is_entirely_different_than(users(:quentin))

    put :update,
        {:id => users(:quentin).login, :user => new_profile.contents},
        {:user => users(:aaron).id}                # <== the highlighted difference (someone else's session)
    assert_redirected_to(home_path)
    assert_hackery_notice_delivered
    new_profile.assert_not_installed
  end

Something for IDE writers to implement.

More mock test samples

Here are two more examples of what might become my style for writing tests that use mocks. They add a “because” clause that separates what comes into the mock from what comes out of it.


  def test_checker_checks_again_after_grace_period_in_case_of_error
    checker = testable_checker(:grace_period => 32.minutes)

    during {
      checker.check("url")
    }.behold! {
      @network.will_receive(:ok?).with("url")
      and_then @timesink.will_receive(:sleep_for).with(32.minutes)
      and_then @network.will_receive(:ok?).with("url")
      but @listener.will_not_receive(:inform)
    }.because {
      @network.returned(false).from(:ok?)
      but_then @network.returned(true).from(:ok?)
    }
  end

As before, the test first describes the activity of interest in a “during” clause. The point of the test is how the object poked in the “during” clause pokes the rest of the system. That’s in the “behold!” clause. The next question is the connection between the two: why did the object do what it did? There’s a good chance you know why from the name of the test and from reading previous tests (which I tend to arrange in a tutorial sequence). If not, you can dive into the “because” clause, which details how the rest of the system responded to the poking.


  def test_checker_finally_tells_the_listener
    checker = testable_checker

    during {
      checker.check("url")
    }.behold! {
      @listener.will_receive(:inform).with(/url is down/)
    }.because {
      @network.returned(false).from(:ok?)
      and_then @network.returned(false).from(:ok?)
    }
  end

Talk you like mock

UPDATE: Judging from email, I wrote this badly. Ignore the bit about assertEquals. My point is that I want to write mock tests the way I talk, which is “when this method executes, it calls these other methods in such and so a way” rather than “expect these other methods to be called in such and so a way when this method executes”. Since, in my speech, I describe the action before its effects, I want mock tests to do that too. So I made up a notation and seek comments.

I think we all know that the order of arguments is wrong in JUnit’s assertEquals. If you talked like JUnit wants you to, you’d say things like “After the calculation I’ve just described, 23 is the resulting azimuth.”

It’s not so big a deal that your order of thinking about a truth statement doesn’t match the order in which you have to type it. Probably no more than one in a billion people is driven homicidally insane by the accumulated weight of such grating little annoyances. And there’s nothing we can do about it now.

Let’s move on to mocks. Here’s a test method (with some long lines edited to fit):

  def test_checker_normally_checks_network_once
    checker = testable_checker # creates checker
                               # using mocks.
    @network.should_receive(:ok?).once.
             with("url").
             and_return(true)
    checker.check("url")
  end

Reading that aloud would sound something like this:

“The network gets asked if some URL is OK once when…

… wait for it…

 

… wait for it…

 

… wait for it…

 

… the checker is invoked with that URL.”

This is, again, not the way people talk. You might not be able to do anything about the problem in Java, but in a Real Language (meaning: one with lambda), you can. So I’ve today started writing my mock tests like this:


  def test_checker_normally_checks_network_once
    checker = testable_checker
    during {
      checker.check("url")
    }.behold! {
      @network.should_receive(:ok?).once.
               with("url").
               and_return(true)
    }
  end

The “behold!” is a little hokey; I couldn’t think of a good connecting word. But I do think this reads better. Does it read better to you?
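For anyone wondering how the wording trick works, here is a minimal sketch of plumbing that would support it. This is my guess at an implementation, not the code behind the tests above; the lambda is what matters. during captures the action without running it, so behold! can establish the expectations first and only then fire the action. (Supporting a separate “because” clause, as in the examples earlier on this page, would just mean deferring the action until the end of the whole chain.)

  # Minimal sketch: assumes an underlying mock library whose expectation
  # calls (should_receive and friends) work inside the behold! block.
  class DuringClause
    def initialize(action)
      @action = action        # captured, not yet run
    end

    def behold!(&expectations)
      expectations.call       # set up the mock expectations first...
      @action.call            # ...then exercise the code under test
      self
    end
  end

  def during(&action)
    DuringClause.new(action)
  end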

Notes:

  • I don’t use mocks that much. This style may not work in complicated situations.

  • Surely someone’s done this before? Who?

X-driven design

From Keith Braithwaite: a discussion of gauges as a metaphor:

http://peripateticaxiom.blogspot.com/2007_07_01_archive.html
http://peripateticaxiom.blogspot.com/2007/09/gauges.html

It’s nice.

I still like examples as a metaphor, but if it hasn’t caught on in four years and a month, it’s not gonna.

Graphical workflow tests for Rails - alpha version released

For many web apps, it’s important to get the user’s workflow right. Consider a Rails app that uses some variant of acts_as_authenticated to handle user registration and login. Here’s one way of describing an example of normal registration:

John visits the home page.
John follows the “sign up” link.
(John is now on the signup page.)
John signs up as “john”, email “john@example.com”, password “sloop”.
(John is now on the welcome page.)
John waits for activation email…
John clicks on a URL ending in an activation code.
(John is now on the login page.)
John logs in as “john”, password “sloop”.
(John is now on the member page.)

Here’s another:

[Figure: a registration workflow]

Which is better? If I were trying to design the workflow—get it so that it’s right for the users—I’d much rather use a picture (whether drawn in OmniGraffle, as this was, or on a napkin or whiteboard). Why?

  1. It’s easier to “zoom out” from a picture, to ignore the details. When I do that, I’m more likely to notice missing workflows or stupidities in the ones that I have.

  2. As a designer, I’ll soon have to explain my thinking to others. It’s easier to explain a picture because it’s sitting right there in front of both of you, visible in two-dimensional space. It’s simple to refer to something you already mentioned: point at it. A wealth of context gets recalled by that simple gesture. If it’s not readily recalled, you can decorate the graphic with something that jogs the memory. Pictures make for a much more fluid way of communicating.

  3. Our minds are in bodies; and bodies have minds of their own. We think kinesthetically and visually, not (just) by banging propositions together according to the rules of some kind of logic. The more ways you think about something, the fewer mistakes you’ll make.

But there’s one big disadvantage of pictures from the point of view of the test-driven (behavior-driven, example-driven) programmer: they’re not executable.

Until now

I’ve released an alpha version of an open-source library that converts such pictures into Ruby tests. Below the fold, I show how the workflow picture becomes a Rails integration test.
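As a taste of what’s below the fold, here is roughly the shape of Rails integration test that a picture like the registration workflow above might turn into. This is an illustrative sketch with invented route names, model attributes, and assertions, not the library’s verbatim output:

  class RegistrationWorkflowTest < ActionController::IntegrationTest
    def test_normal_registration
      get "/"                                    # John visits the home page
      get "/signup"                              # John follows the "sign up" link
      post "/users", :login => "john", :email => "john@example.com",
           :password => "sloop", :password_confirmation => "sloop"
      follow_redirect!                           # John is now on the welcome page

      john = User.find_by_login("john")          # John waits for activation email...
      get "/activate/#{john.activation_code}"    # ...and clicks the activation URL

      post "/session", :login => "john", :password => "sloop"
      follow_redirect!
      assert_response :success                   # John is now on the member page
    end
  end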


A workbook for practicing test-driven design (draft)

Summary: I wrote a workbook to help people learn test-driven design. Want to try it out?

Need C code for demo

I’m going to give a TDD demo at a C shop. I’d rather do it in C than in Java or Ruby. It’d be best to build on an existing suite. Do you have any well-unit-tested, open source C code to suggest? To match what they do, a library would be better than an app; data-heavy would be better than processing-heavy. Objective C would be OK too; I can just ignore the objective part. C++, too. Thanks. Add a comment or mail me.