FlexMock and RubyCocoa

I struggled using mocks in RubyCocoa. Here’s what I learned, using FlexMock.

I’m writing some tests to understand and explain the RubyCocoa interface to key-value observing. Key-value observing is yet another way for one object to observe another. (I’ll call these two objects the @watcher and the @observed.)

If I were describing a key-value observing test over the phone, I’d describe it as having four steps:

  1. Set up the relationship between the two objects.

  2. Change an attribute of @observed.

  3. During that change, expect some @watcher method to be called. One of the arguments will be an NSDictionary that contains the values before and after the change.

  4. Also expect that the value really truly has been changed.

Here’s how I’d describe it in code:

  def test_setting_in_ruby_style
    newval = 5
    during {
      @observed.value = newval
    }.behold! {
      this_change(:old => @observed.value, :new => newval)
    }
    assert_equal(newval, @observed.value)
  end

I have an idiosyncratic way of using mocks in Ruby. Notice the order in my English description above. I say what the test is doing before I say what happens during and after that action. But mock expectations have to be set up before the action. In Java (etc.) that requires steps 2 and 3 to be reversed. In Ruby, I can use blocks to put the steps in the more normal order. (You can see the definitions of during and behold! in the complete code.)
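
In sketch form, the machinery can be tiny. (A minimal sketch: the class name is invented here, and the real definitions are in the complete code linked at the end of this post.) The essential trick is that behold! evaluates its expectation block before running the captured action:

  def during(&action)
    BeholdableAction.new(action)
  end

  class BeholdableAction # name invented for this sketch
    def initialize(action)
      @action = action
    end

    def behold!(&expectations)
      expectations.call   # prime the mock expectations first...
      @action.call        # ...then run the action that should satisfy them
    end
  end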

Since each of the tests sets up almost the same expectations, I hived that work off into this_change:

  def this_change(expected)
    @watcher.should_receive(:observeValueForKeyPath_ofObject_change_context).
             once.
             with('value',
                  @observed,
                  on { | actuals |
                       actuals[:old] == expected[:old] &&
                       actuals[:new] == expected[:new]
                  },
                  nil)
  end

That’s a fairly typical use of FlexMock. I specify three arguments exactly, and I use code to check the remaining one. (There are components in the actuals NSDictionary that I don’t care about.)

Now I have to start to do something special. Here’s setup:

  def setup
    super
    @observed = Observed.alloc.init
    @watcher = flexmock(Watcher.alloc.init)
    @observed.addObserver_forKeyPath_options_context(
                @watcher, 'value',
                OSX::NSKeyValueObservingOptionNew | OSX::NSKeyValueObservingOptionOld,
                nil)
  end

Normally, an argument to flexmock just provides a name for the mock, useful in debugging. However, key-value observing only works when the @watcher object inherits from NSObject. So I pass in an already-initialized object for FlexMock to “edit” and add mockish behavior to. It’s of class Watcher, which inherits from NSObject. But why not just use an NSObject?

I could, except for a helpful feature of key-value observing. Before each call into the @watcher, key-value observing checks if it responds to the method observeValueForKeyPath_ofObject_change_context. If not, it raises an exception.

If I just passed in an NSObject and gave FlexMock an expectation that it should_receive an observeValueForKeyPath_ofObject_change_context message, FlexMock would just attach that expectation to a list of expectations. It would not create any method of that name; rather, method_missing traps such calls to the method and then checks off the appropriate expectation.
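
A toy version of that trick (emphatically not FlexMock’s real code) shows the problem: because no real method is ever defined, an Objective-C respondsToSelector: check still reports that the object can’t handle the message.

  # Toy sketch of expectation-checking via method_missing.
  class ToyMock
    def initialize
      @expectations = Hash.new(0)
    end

    def should_receive(message)
      @expectations[message] += 1
      self
    end

    # Calls land here because no real method exists, which is also
    # why a respondsToSelector: check answers NO.
    def method_missing(message, *args)
      raise "unexpected message: #{message}" if @expectations[message].zero?
      @expectations[message] -= 1
    end
  end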

Therefore, I have to create a Watcher class for no reason other than to contain the method:

  class Watcher < OSX::NSObject
    def observeValueForKeyPath_ofObject_change_context(
               keyPath, object, change, context)
    end
  end

The method doesn’t have to actually do anything; it’s never called. Instead, FlexMock redefines it to trampoline calls to it over to method_missing.
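
For completeness, the Observed side can be just as minimal. Here’s a sketch that assumes RubyCocoa’s kvc_accessor; the real class is in the complete code:

  class Observed < OSX::NSObject
    # kvc_accessor declares a key-value-coding-compliant attribute,
    # which is what lets key-value observing notice changes to it.
    kvc_accessor :value
  end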

You can find the rest of the code here: mock-example.rb.

Hope this helps.

Everyone needs a cryptic slogan

I’ve been talking to Jason and Jonathan, the post-Agile guys, about post-Agile. I don’t get it, but that’s OK. What it made me realize is that a properly obscure name for the kind of software development I want to do is…

Retro-Futurist
Micro-Scale
Anarcho-Syndicalism

It is supposed to connote:

  • Back to giddy optimism!

  • Technology! Shiny!

  • Don’t fret about “Them”

  • Goals and preferences over rules

  • We’re all in this together

Perhaps there will be t-shirts and posters by Agile2008.

P.S. I use the term “anarcho-syndicalism” pretty loosely. Don’t forget the two lines of qualifiers.

Conference of the Association for Software Testing 2008

The third annual Conference of the Association for Software Testing (CAST) takes place in Toronto, July 14-16. Early bird registration ends May 30. Here’s what Michael Bolton has written about it:

A colleague recently pointed out that an important mission of our community is to remind people–and ourselves–that testing doesn’t have to suck.

Well, neither do testing conferences. CAST 2008 is the kind of conference that I’ve always wanted to attend. The theme is “Beyond the Boundaries: Interdisciplinary Approaches to Software Testing”, and the program is incredibly eclectic and diverse. Start with the keynotes: Jerry Weinberg on Lessons from the Past to Carry into the Future; Cem Kaner on The Value of Checklists and the Danger of Scripts: What Legal Training Suggests for Testers; Robert Sabourin (with Anne Sabourin) on Applied Testing Lessons from Delivery Room Labor Triage (there’s a related article in this month’s Better Software magazine); and Brian Fisher on The New Science of Visual Analytics. Track sessions include talks relating testing to improv theatre (Adam White), to music (yours truly and Jonathan Kohl), to finance and accounting (Doug Hoffman), to wargaming and Darwinian evolution (Bart Broekman, author of /Testing Embedded Software/ and one of the co-authors of the /TMap Next/ book), to civil engineering (Scott Barber), to scientific software (Diane Kelly and Rebecca Sanders), to magic (Jeremy Kominar), to file systems (Morven Gentleman), to data warehousing (Steve Richardson and Adam Geras), and to data visualization (Martin Taylor)… to four-year-olds playing lacrosse (Adam Goucher). There will be lightning talks and a tester competition. Jerry Weinberg will be doing a one-day tutorial workshop, as will Hung Nguyen, Scott Barber, and Julian Harty.

Yet another feature of the conference is that Jerry is launching his book on testing, /Perfect Software and Other Testing Myths/. I read an early version of it, and I’m waiting for it with bated breath. It’s a book that we’ll all want to read, and after we’re done, we’ll want to hand to people who are customers of testing. For some, we’ll want to tie them to a chair and /read it to them/.

The conference hotel is inexpensive, the food in Toronto is great, the nightlife is wonderful, the music is excellent…

More Information
================
You can find details on the program at http://www.cast2008.org/Program.

You can find information on the venue and logistics at http://www.cast2008.org/Venue.

Those from outside Canada should look at http://www.associationforsoftwaretesting.org/drupal/CAST2008/Venue#customs.

You can get registration information at http://www.cast2008.org/Registration.

Paying the Way
==============

If you need help persuading your company to send you to the conference, check out this: http://www.ayeconference.com/Articles/Mycompanywontpay.html.

And if all that fails, you can likely write off the cost of the conference against your taxes, even if you’re an employee. (I am not a tax professional, but INC magazine reports that you can write off expenses to “maintain or improve skills required in your present employment”. Americans should see IRS Publication 970 (http://www.irs.gov/publications/p970/ch12.html), Section 12, and ask your accountant!)

Come Along and Spread The Word!
===============================

So (if necessary) get your passports in order, take advantage of early bird registration (if you register in the next two weeks), and come join us. In addition (and I’m asking a favour here), please please /please/ tell your colleagues, both in your company and outside, about CAST. We want to share some great ideas on testing and other disciplines, and we want to make this the best CAST ever. And the event will only be improved by your presence.

So again, please spread the word, and come if you can.

First review draft of RubyCocoa book

I’m ready for people to review the first draft of my RubyCocoa book. Here’s the goal of the book:

With any complex framework, there’s a moment when you realize you finally have a feel for it—when faced with a new problem, you’re able to guess whether the framework addresses it, roughly how it will address it, and where to look to find the details. The aim of this book is to give you that feel for Cocoa and RubyCocoa. […]

You can fetch the most recent draft from http://www.exampler.com/tmp/drafts/current.dmg. It’s 80-some pages and contains the first six chapters:

  • Introduction

  • How Do We Get This Thing Started?: A fast introduction to RubyCocoa.

  • Part 1: Building a Useful App

    • Working with Interface Builder and Xcode: Drawing a user interface, putting code behind it, and debugging that code. Text fields and text views.

    • One Good App Observes Another: Extending the app to use interprocess communication to listen to another app. The Cocoa Notification system. Reopening Objective-C classes.

    • A Better User Interface: Buttons (and the default button), combo boxes (and first responders), and a highly decoupled app architecture.

    • Taking Advantage of Ruby: A DSL to make it easier to define actions, delegates, and methods that post or receive notifications.

Here’s what I’d like to hear about:

  • Any parts you have to struggle to understand.

  • Parts that move too slowly or too quickly.

  • Whether the book is taking the right approach, or started with the right approach and has veered off track.

  • Problems with the code samples. Don’t struggle to fix them—let me know and let me fix them.

  • Whether you’d buy the book if the rest is like what you see.

Don’t bother with typos, grammatical errors, or layout problems (like highlighted words that run together or long identifiers that stick past the margins); those will get fixed in due time.

You can send comments to me directly or via the book’s mailing list.

Thank you.

Education

When I was in middle school (~12 years old, around 1971), we did a murder mystery exercise in class. The teacher passed out slips of paper with clues and then shut up. The children milled around aimlessly for a while, comparing slips. Finally, I got fed up, got everyone’s attention, and said, “OK. Everyone with clues about the murder weapon, go over there. Everyone with clues about the victim, over there.” Then I went from group to group, and we quickly solved the mystery.

I learned three things that day.

  1. Sometimes people need external organization to get things done.

  2. I can do that organizing.

  3. People will get mad at me when I do.

The lesson has affected my consulting. I suspect that my daughter’s homework today—identifying the parts of a knight’s armor—will not affect her future in the slightest way. Social studies no longer appears to involve the study of society. Not surprising, I suppose: too much risk of children drawing conclusions that will get their parents mad at the school, and who needs the hassle?

And the New Math rocked, by the way. It was good that I learned simple set theory so young, and the idea that there could be number bases other than ten was a good lesson in questioning the verities.

Security mindset

A continual debate on the agile-testing mailing list is to what degree testers think differently than programmers and are therefore able to find bugs that programmers won’t. Without ever really resolving that question, the conversation usually moves on to whether the mental differences are innate or learnable.

I myself have no fixed opinion on the matter. That’s probably because, while vast numbers of testers are better than I am, I can imagine myself being like them and thinking like them. The same is true for programmers. In contrast, I simply can’t imagine being the sort of person who really cares who wins the World Cup or whether gay people I don’t know get married. (I’m not saying, “I couldn’t lower myself to their level” or anything stupid like that—I’m saying I can’t imagine what it would be like. It feels like trying to imagine what it is like to be a bat.)

However, I’ve long thought that security testers are a different breed, though I can’t articulate any way that they’re different in kind rather than degree. It’s just that the really good ones are transcendentally awesome at seeing how two disparate facts about a system can be combined and exploited. (A favorite example)

Bruce Schneier has an essay on security testers that I found interesting, though it doesn’t resolve any of my questions. Perhaps that’s because he said something I’ve been thinking for a while:

The designers are so busy making these systems work that they don’t stop to notice how they might fail or be made to fail, and then how those failures might be exploited. Teaching designers a security mindset will go a long way toward making future technological systems more secure.

The first sentence seems to make the second false. When I look back at the bugs I, acting as a programmer, fail to prevent and then fail to catch, an awful lot of the time the root cause isn’t my knowledge. It’s that I have a compulsive personality and also habitually overcommit. As a result, there’s a lot of pressure to get done. The problem isn’t that I can’t flip into an adequate tester mindset; it’s that I don’t step back and take the time.

So, I suspect the interminable and seemingly irresolvable agile-testing debate should be shelved until we solve a more pressing problem: few teams have the discipline to adopt a sustainable pace, so few teams are even in a position to know if programmers could do as well as dedicated testers.

Good customer test story

From Andy Pols: the customer who wouldn’t deploy without a test (via Keith Braithwaite).

Agile Alliance academic research programme

This program aims to encourage researchers to focus on research questions and issues concerned with agile software development. Researchers are encouraged to apply for small grants to support activities such as conducting a series of visits to practitioner sites, performing interviews, supporting a researcher for a short time to extend existing work into agile development, running workshops, and so on.

More here.

The next deadline for submissions is May 31.

An alternative to business-facing TDD

The value of programmer TDD is well established. It’s natural to extrapolate that practice to business-facing tests, hoping to obtain similar value. We’ve been banging away at that for years, and the results disappoint me. Perhaps it would be better to invest heavily in unprecedented amounts of built-in support for manual exploratory testing.

In 1998, I wrote a paper, “When should a test be automated?”, that sketched some economics behind automation. Crucially, I took the value of a test to be the bugs it found, rather than (as was common at the time) how many times it could be run in the time needed to step through it manually.

My conclusions looked roughly like the following:

test tradeoffs in general

Scripted tests, be they automated or manual, are expensive to create (first column). Manual scripts are cheaper, but they still require someone to write steps down carefully, and they likely require polishing before they can truly be followed by someone else. (Note: height of bars not based on actual data.)

In the second column, I assume that a particular set of steps has roughly the same chance of finding a bug whether executed manually or by a computer, and whether the steps were planned or chosen on the fly. (I say “roughly” because computers don’t get bored and miss bugs, but they also don’t notice bugs they weren’t instructed to find.)

Therefore, if the immediate value of a test is all that matters, exploratory manual testing is the right choice. What about long-term value?

Assume that exploratory tests are never intentionally repeated. Both their long-term cost and value are zero. Both kinds of scripted tests have quite substantial maintenance costs (especially in that era, when testing was typically done through an unmodified GUI). So, to pull ahead of exploratory tests in the long term, scripted tests must have substantial bug-finding power. Many people at that time observed that, in fact, most tests either found a bug the first time they were run or never found a bug at all. You were more likely to fix a test because of an intentional GUI change than to fix the code because the test found a bug.

So the answer to “when should a test be automated?” was “not very often”.

Programmer TDD changes the balance in two ways:

Test tradeoffs for TDD

  1. New sources of value are added. Extremely rapid feedback reduces the cost of debugging. (Most bugs strike while what you did to create them is fresh in your mind.) Many people find the steady pace of TDD allows them to go faster, and that the incremental growth of the code-under-test makes for easier design. And, most importantly as it turns out, the need to make tests run fast and reduce maintenance cost leads to designs with good properties like low coupling and high cohesion. (That is, properties that previously were considered good in the long term—but were routinely violated for short-term gain—now had powerful short-term benefits.)

  2. Good design and better programmer tools dramatically lowered the long-term cost of tests.

So, much to my surprise, the balance tipped in favor of automation—for programmer tests. It’s not surprising that many people, including me, hoped the balance could also tip for business-facing tests. Here are some of the hoped-for benefits:

  • Tests might clarify communication and avoid some cases where the business asks for something, the team thinks they’ve delivered it, and the business says “that’s not what I wanted.”

  • They might sharpen design thinking. The discipline of putting generalizations into concrete examples often does.

  • Programmers have learned that TDD supports iterative design of interfaces and behavior. Since whole products are also made of interfaces and behavior, they might also benefit from designers who react to partially-finished products rather than having to get it right up front.

  • Because businesses have learned to mistrust teams who show no visible progress for eight months (at which point, they ask for a slip), they might like to see evidence of continuous progress in the form of passing tests.

  • People often need documentation. Documentation is often improved by examples. Executable tests are examples. Tests as executable documentation might get two benefits for less than their separate costs.

  • And, oh yeah, tests could find regression bugs.

So a number of people launched off to explore this approach, most notably with Fit. But Fit hasn’t lived up to our hopes, I think. The things that particularly bother me about it are:

  • It works well for business logic that’s naturally tabular. But tables have proven awkward for other kinds of tests.

  • In part, the awkwardness is because there are no decent HTML table editors. That inhibits experimentation: if you don’t get a table format right the first time, you’re tempted to just leave it.

    Note: I haven’t tried ZiBreve. By now, I should have. I do include Word, Excel, and their OpenOffice equivalents among the ranks of the not-decent, at least if you want executable documentation. (I’ve never tried treating .doc files as the real tests that are “compiled” into HTML before they’re executed.)

  • Fit is not integrated into programmer editors the way xUnit is. For example, you can’t jump from a column name to the Java method that defines it. Partly for this reason, programmers tend to get impatient with people who invent new table formats—can’t they just get along with the old one?

With my graphical tests, I took aim at those sources of friction. If I have a workflow test, I can express it as boxes and arrows:

a workflow test

I translate the graphical documents into ordinary xUnit tests so that I can use my familiar tools while coding. The graphical editor is pretty decent, so I can readily change tests when I get better ideas. (There are occasional quirks where test content has changed more than it looks like it has. That aspect of using Fit hasn’t gone away entirely.)

I’ve been using these tests, most recently on wevouchfor.org—and they don’t wow me. While I almost always use programmer TDD when coding (and often regret skipping it when I don’t), TDD with these kinds of tests is a chore. It doesn’t feel like enough of the potential value gets realized for the tests to be worth the cost.

  • Writing the executable test doesn’t help clarify or communicate design. Let me be careful here. I’m a big fan of sketching things out on whiteboards or paper:

    A whiteboard

    That does clarify thinking and improve communication. But the subsequent typing of the examples into the computer is work that rarely leads to any more design benefits.

  • Passing tests do continuously show progress to the business, but… Suppose you demonstrate each completed story anyway, at an end-of-iteration demo or (my preference) as soon as it’s finished. Given that, does seeing more tests pass every day really help?

  • Tests do serve as documentation (at least when someone takes the time to surround them with explanatory text, and if the form and content of the test aren’t distorted to cram a new idea into existing test formats).

  • The word I’m hearing is that these tests are finding bugs more often than I expected. I want to dig into that more: if they’re the sort of “I changed this thing over here and broke that supposedly unrelated thing over there” bugs that whole-product regression tests are traditionally supposed to find, that alone may justify the expense of test automation—unless I can find a way to blame it on inadequate unit tests or a need to rejigger the app.

  • (This is the one that made me say “Eureka!”) Tests alone fail at iterative product design in an interesting way. Whenever I’ve made significant progress implementing the next chunk of workflow or other GUI-visible change, I just naturally check what I’ve done through the GUI. Why? This checking makes new bugs (ones the automated tests don’t check for) leap out at me. They also sometimes make me slap my forehead and say, “What I intended here was stupid!”

But if I’m going to be looking at the page both to find bugs and to revise my intentions, I’m really edging into exploratory testing. Hmm… What if an app did whatever it could to aid exploratory testing? I don’t mean traditional testability features like, say, a scripting interface; I mean a concerted effort to let exploratory testers peek and poke at anything they want within the app. (That may not be different than my old motto “No bug should be hard to find the second time,” but it feels different.)

So, although features of Rails like not having to restart the server after most code changes are nice, I want more. Here’s an example.

The following page contains a bug:

an ordinary web page

Although you can’t see it, the bottom two links are wrong. They are links to /certifications/4 instead of /promised_certifications/4.

  1. Unit tests couldn’t catch that bug. (The two methods that create those types of links are tested and correct; I just used the wrong one. There’s a sketch of the mistake after this list.)

  2. One test of the action that created the page could have caught the bug, but did not. (To avoid maintenance problems, that test checked the minimum needed to convince me that the correct “certifications” had been displayed. I assumed that if they were displayed at all, the unit tests meant they were displayed correctly. That was actually almost right—every character outside the link’s href value was correct.)

  3. I missed the bug when I checked the page. (I suspect that I did click one of the links, but didn’t notice it went to the wrong place. If so, I bet I missed the wrongness because I didn’t have enough variety in the test data I set up—ironic, because I’ve been harping on the importance of “irrelevant” variety since 1994.)

  4. A user had no trouble finding the bug when he tried to edit one of his promised certifications and found himself with a form for someone else’s already-accepted certification. (Had he submitted the form, it would have been rejected, but still.)
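
Here’s the sketch promised in point 1, with helper names invented for illustration. Each helper is individually correct; the bug lies only in which one the view calls:

  # Hypothetical helpers; each one is unit-tested and correct.
  def certification_link(cert)
    link_to(cert.name, certification_path(cert))            # /certifications/:id
  end

  def promised_certification_link(cert)
    link_to(cert.name, promised_certification_path(cert))   # /promised_certifications/:id
  end

  # The bug: the view called certification_link for a *promised*
  # certification. Every unit test still passes.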

That’s my bug: a small error in a big pile of HTML the app fired and forgot.

Suppose, though, that the app created and retained an object representing the page. Suppose further that an exploration support app let you switch to another view of that object/page, one that highlights link structure and downplays text:

The same page, highlighting link hrefs

To the eyes of someone who just added promised certifications to that page, the wrong link targets ought to jump out.

There’s more that I’d like, though. The program knows more about those links than it included in the HTTP Response body. Specifically, it knows they link to a certain kind of object: a PromisedCertification. I should be able to get a view of that object (without committing to following the link). I should be able to get it in both HTML form and in some raw format. (And if the link-to-be-displayed were an object in its own right, I would have had a place to put my method, and I wouldn’t have used the wrong one. Testability changes often feed into error prevention.)
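
As a speculative sketch (all names invented; the inflection methods are Rails’ ActiveSupport), a link-to-be-displayed object might carry its target around instead of flattening everything into an href string:

  class DisplayedLink
    attr_reader :text, :target   # target: a domain object, such as a
                                 # PromisedCertification

    def initialize(text, target)
      @text = text
      @target = target
    end

    # Exactly one place decides what the href is, so the wrong helper
    # can’t be picked, and an exploration assistant can ask the link
    # about its target without following it.
    def href
      "/#{target.class.name.underscore.pluralize}/#{target.id}"
    end
  end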

And so on… It’s easy enough for me to come up with a list of ways I’d like the app to speak of its internal workings. So what I’m thinking of doing is grabbing some web framework, doing what’s required to make it explorable, using it to build an app, and also building an exploration assistant in RubyCocoa (allowing me to kill another bird with this stone).

To be explicit, here’s my hypothesis:

An application built with programmer TDD, whiteboard-style and example-heavy business-facing design, exploratory testing of its visible workings, and some small set of automated whole-system sanity tests will be cheaper to develop and no worse in quality than one that differs in having minimal exploratory testing, done through the GUI, plus a full set of business-facing TDD tests derived from the example-heavy design.

We shall see, I hope.

Google talk references

One thing I meant to say and forgot: Just as the evolution of amphibians didn’t mean that all the fish disappeared, the creation of a new kind of testing to fit a new niche doesn’t mean existing kinds are now obsolete.

Context-driven testing:

Testing Computer Software, Kaner, Falk, and Nguyen
Lessons Learned in Software Testing, Kaner, Bach, and Pettichord
http://www.context-driven-testing.com
“When Should a Test Be Automated?”, Marick

Exploratory testing:

James Bach
Michael Bolton
Elisabeth Hendrickson
Jonathan Kohl

Left out:

The undescribed fourth age