Mocking and exploratory testing
Mock objects seem to require more whole-product (or otherwise large-scale) automated tests. I’m hoping not, especially if manual exploratory testing has some tool support. My reasons are below the jump.
I’m writing some RubyCocoa code to drag-and-drop a filename into a table cell. Drag and drop has its little complexities: the drop point is in the window’s coordinate system, so you have to transform it into the table’s; the information about the drop point is a dumb object that begs for Demeter violations; etc. So I’ve split the problem up into various objects.
Here’s a test that a table row is highlighted when the cursor is hovering over a cell that can be dropped on. The object under test is a PreferencesTableView. It is a client of an info object that appears to be something like the NSDraggingInfo objects that come with Cocoa.
context "checking a location where a drop is possible" do
  should "select the current row" do
    info = rubycocoa_flexmock("drop info")
    during {
      @table.evaluate_location_smartly(info)
    }.behold! {
      info.should_receive(:drop_would_work?, 0).at_least.once.
           and_return(true)
      info.should_receive(:row, 0).at_least.once.
           and_return(1)
    }
    assert { @table.selectedRow == 1 }
  end
end
(The actual test is slightly different, but not in a way that matters.)
I don’t bother to implement and wire together all the objects that normally surround the PreferencesTableView because that’s tedious and not relevant to this behavior. Similarly, I don’t construct a real NSDraggingInfo object because I’d have to make an NSPasteboard and put a pathname into it—not just a string, mind you, but a pathname to a real file, because the pasteboard will check. Instead, I’ve just invented the need for a PrefsTableDropInfo object that responds to the two methods drop_would_work? and row.
And here I am, starting to test drop_would_work? into existence:
should "reject when pathname doesn't name a Ruby file" do
  @info.should_receive(:pathname, 0).once.
        and_return("/path/to/foorb".to_ns)
  deny { @info.drop_would_work? }
end
… and so on.
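For concreteness, here’s a hedged sketch of where that test might drive PrefsTableDropInfo. The post never shows the real class, so the inheritance from CellOrientedDropInfo (mentioned below) and the extension check are guesses at one plausible implementation, not the actual code.

class PrefsTableDropInfo < CellOrientedDropInfo   # superclass mentioned later in the post
  # The test above only demands that drop_would_work? ask for the pathname
  # and reject anything that doesn't end in ".rb".
  def drop_would_work?
    File.extname(pathname.to_s) == ".rb"   # "/path/to/foorb" fails; "/path/to/foo.rb" passes
  end
end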
So what do I now believe?
- Assuming the helper class PrefsTableDropInfo works, and assuming that PreferencesTableView’s evaluate_location_smartly uses it correctly, evaluate_location_smartly highlights the row when it should.
- Assuming PrefsTableDropInfo’s superclass CellOrientedDropInfo (which defines pathname) works, and assuming that PrefsTableDropInfo’s drop_would_work? uses it correctly, drop_would_work? rejects non-Ruby files.
- Assuming…
There’s a chain of assumptions of the form “this class … if that one works.” That doesn’t worry me because I believe in inductive proofs. But there’s a chain of “this class … if it uses that class correctly” assumptions as well. Since there is no test in which a PreferencesTableView actually uses a PrefsTableDropInfo, there’s a weakness in the chain. After all, it’s hardly unknown for one side of an interface to be thinking, say, “metric units” while the other side is thinking “English units.” And there have been many bugs caused by one class failing to set up state that another class implicitly depends upon.
In RubyCocoa’s case, it’s possible for a Ruby method to return a true that another Ruby method ends up seeing as a 1, or for a client to be tested with a Ruby string but be called with a Cocoa NSString in the real app.
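To make the second mismatch concrete, here’s a small illustration, assuming RubyCocoa’s to_ns conversion (the same one used in the test above); the exact proxy class name depends on the RubyCocoa version.

require 'osx/cocoa'

ruby_path  = "/path/to/foo.rb"         # what a mock might be stubbed with
cocoa_path = "/path/to/foo.rb".to_ns   # what the real pasteboard code might hand over

ruby_path.class    # => String
cocoa_path.class   # => OSX::NSCFString (an NSString proxy; exact class may vary)

# Code written against a plain Ruby String is safer if it normalizes first:
File.extname(cocoa_path.to_s)   # => ".rb"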
What can be done about that? One answer is to have realistic tests that exercise the whole product or significant subsystems. Those tests will (you hope) exercise all the interfaces you mocked out. So mocking increases the need for business-facing-style automated tests, right?
Well, maybe. Many people, me among them, think it’s a good idea to demonstrate a feature to your client when you’ve finished it. Even if you correctly implemented exactly what she said she wanted, she’ll learn by seeing it in action. This manual demonstration is a good opportunity to do some exploratory testing, perhaps discovering that the feature behaves badly in cases no one anticipated.
Done well, this testing ought to exercise the interfaces as thoroughly as would automated tests (which, remember, are not magically endowed with an unerring ability to exercise even the most deeply-buried interfaces).
But what about repeatability? Yes, when you change the feature, you’ll exercise the change manually. But the entirety of the feature—and the unexercised-by-unit-tests interfaces—will almost certainly be exercised much less well than the first time, whereas whole-product automated tests will unfailingly do what they did before and therefore exercise the interfaces just as well as before. Right?
Actually, not right. Automated tests can decay. Even the most dutiful people will often update them only when they break (intentionally or no). And the changes to—especially—the unintentionally-broken tests are often minimalist—gotta get them working again. As a result, tests can drift from their purpose, no longer testing well what they were originally written for. And I suspect they get especially weak at testing what they weren’t written for, including those deep internal interfaces that weren’t specifically in mind when the tests were originally written.
Still, if I were forced to bet, I’d bet automated tests would do a better job of finding newly-induced interface bugs than manual exploratory testing would. But that just seems to me a reason to see what can be done to improve the exploratory testing.
Consider: what happens if I change the behavior of PrefsTableDropInfo? That’s at least a semantic change to the PreferencesTableView <-> PrefsTableDropInfo interface. That should direct me to concentrate some exploratory testing effort around what the change was and how it could affect use of that particular view. In effect, knowledge of the change is a hint to improve exploratory testing. (It could also be a hint to improve automated tests, but exploratory testing is cheaper: the more change, the bigger the cost-of-change advantage of exploratory testing. Here, I’m keeping that advantage constant while chipping away at automated tests’ thoroughness advantage.)
There are two objections:
- There’s no way the exploratory tester can know what a change to PrefsTableDropInfo could mean at the user interface; she could never realistically use “PrefsTableDropInfo changed” information to guide her tests.

I don’t think that’s so true any more. I am assuming the person with exploratory testing skills is working in the team bullpen with everyone else. A programmer and tester can collaborate on understanding the implications of an interface change.
- A change to some deep-buried class would affect its clients and then their clients and then their clients… There would be too much to test. You might have to test practically everything—which means you’re back to needing automated tests.
I wonder. First, I wonder how far the effect of a change does spread. Does it fan out to all paths through the called/caller tree? Or does it follow just a few paths to the “surface” of the app? And, given that the whole point of exploratory testing (and testing in general) is skillfully winnowing down possibilities to the highest-value ones, will that skill apply?
You do have to know what interfaces changed and why. I doubt programmers would remember well enough. But the unit tests and the version control system could. I imagine some tool noting changes to tests and putting up the changes as a checklist to inspire an exploratory testing session or three.
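As a thought experiment only (the post imagines the tool but doesn’t build it), here is a rough Ruby sketch. The test/ directory layout, the _test.rb naming convention, and the default revision are assumptions; the only real machinery is git diff --name-only.

#!/usr/bin/env ruby
# Print a checklist of test files that changed since a given revision,
# as raw material for planning an exploratory-testing session.
since = ARGV[0] || 'HEAD~20'    # assumed default: look back twenty commits

changed = `git diff --name-only #{since} -- test/`.split("\n")
changed = changed.select { |path| path.end_with?("_test.rb") }

puts "Tests (and so, probably, interfaces) that changed since #{since}:"
changed.each { |path| puts "  [ ] #{path}" }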
Further research is needed.
January 15th, 2009 at 7:58 pm
On certain software projects I have found that using mocks often leads to many tests that are fragile because they are overspecified or suffer from behavior sensitivity. This has led me to want to design more decoupled software, but that seems like overkill for something as simple as a CRUD web app. In those cases I tend to want to write more component-style tests. What are your feelings about that?
January 16th, 2009 at 7:29 am
Well presented!
In the scenario you describe, my guess is that the typical automated acceptance test would be written in terms of the customer’s language and goals — for example, “When I drag and drop ‘Buy groceries’ into the ‘Today’s Tasks’ list, it should appear at the top of the list.” This ends up exercising not only the drag and drop code, but other parts of the system — there exists a “Today’s Task” list, I’ve created a task called “Buy groceries”, both are visible on the screen, I have permissions to update this list, etc. The tests will probably have many concerns in the air.
What would happen if the only automated “acceptance” test for this scenario were not to prove that the customer’s story works, but to show that the framework behaves as expected? In other words, what if we wrote a test to prove that real PreferencesTableView and PrefsTableDropInfo objects act as our mocks expect?
These tests would still have the same maintenance and set up problems (including using ugly real file paths, etc.) but there would be fewer. They’d be there primarily to demonstrate how the framework behaves, and help sanity check the framework when upgrading. In fact, you’d never expect them to break — I’m not sure if that’s a smell. When they did break, it could help target the exploratory tests needed.
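A hedged sketch of what such a test might look like, purely to illustrate the comment’s idea: the constructor argument and the real_dragging_info helper below are invented for the example, since the post never shows how a real PrefsTableDropInfo is assembled.

should "have the real drop info honor what the mock assumed" do
  # real_dragging_info is a hypothetical helper that builds a real dragging-info
  # object whose pasteboard holds the pathname of an actual .rb file on disk.
  info = PrefsTableDropInfo.new(real_dragging_info)
  assert { info.drop_would_work? }
end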