Archive for the 'programming' Category

Changing basic assumptions in apps

One of the hardest things for me as an app developer is changing basic assumptions in a safe, gradual way. Here’s an example.

Critter4Us is an app that reserves animals for demonstrations and student practice in a veterinary college. There are business rules like “an animal can be used to practice injections no more than twice a week”.

In its original form, Critter4Us only made reservations for one point in time (the morning, afternoon, or evening of a particular day). I’m now making it take reservations for “timeslices”, where a timeslice has a start date, an end date, and any subset of {morning, afternoon, evening}. Doesn’t seem like a huge change, but it turns out to be pretty fundamental:

  • A lot of questions vaguely like “is this point within this range?” are now “do these two ranges overlap?” There’s an existing Timeslice object (which, despite the name, was about points in time). It has some time-related behavior, but responsibility for some other behavior leaked out of it because the data was so simple.

  • Some of the database operations were annoyingly slow after being hard to get right, quite likely because my SQL-fu is weak. Since the questions are becoming more complex, I want to do less calculating when questions are asked and more stashing partial answers in the database. So adding this feature requires more database work than just a simple schema migration.

    (This feels like premature optimization of a feature I don’t know wouldn’t be fast enough, but my main motivation is that I’m more confident of getting the right answers if I stash partial answers. This is the most important business logic in the app.)

  • The UI needs to change. Should that be done by upgrading the existing reserve-a-point-in-time page (adding clutter for a case that’s used seldom) or adding a new page?
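The overlap question in that first bullet is simple enough to sketch. This is an illustration, not the app's actual interface — the `Struct`-based `Timeslice` and `overlaps?` name are my inventions:

```ruby
require 'date'

# A sketch, not the app's real interface: a timeslice as a date range plus
# a set of times of day, with the "do these two ranges overlap?" question.
Timeslice = Struct.new(:first_date, :last_date, :times) do
  # Ranges overlap when each starts no later than the other ends,
  # and the two share at least one time of day.
  def overlaps?(other)
    first_date <= other.last_date &&
      other.first_date <= last_date &&
      !(times & other.times).empty?
  end
end

stay  = Timeslice.new(Date.new(2010, 9, 2), Date.new(2010, 9, 5), [:morning])
visit = Timeslice.new(Date.new(2010, 9, 5), Date.new(2010, 9, 6), [:morning, :afternoon])
stay.overlaps?(visit)  # => true: they share the morning of Sept 5
```

The point-in-time version of this question ("is this point within this range?") is just the degenerate case where `first_date == last_date` and `times` has one element.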

I want to make all these changes in such a way that (1) the tests are all passing most of the time and (2) the app is deployable most of the time. I devised the following strategy after writing a partial spike and throwing it away:

  1. DONE Change the current Cappuccino front end. It used to deliver

    ?date=2010-09-02&time=morning
    

    to the Sinatra backend. Now it delivers

    timeslice={'firstDate':'2010-09-02', 'lastDate':'2010-09-02', 'times':['morning']}
    

    This isn’t hard because the front end doesn’t do any calculations on the date.

  2. DONE Have the backend controller that receives the data quickly convert the new format into the old.

  3. DONE Move conversion of new-format-to-old into the Timeslice object. (Only two classes make Timeslice objects, so that’s easy.)

  4. DONE Change the Reservations table to add new columns. Change the Reservation object to allow it to be constructed using either the old or new format. Change all the non-test code to use the new constructor. (It’s convenient to keep the terser form around for the tests that use it.)

  5. IN PROGRESS
    Reservation and Timeslice aren’t completely dumb objects — they probably get told more than they get asked — but they do have accessors for date and time. Those accessors are still meaningful because (at this point) the first_date and last_date are always the same and the set of times can only ever contain one element.

    However, change their names to faked_date_TODO_replace_me and faked_time_TODO_replace_me. Run the tests. For each no-such-method failure,

    • If the purpose of the call can easily be expressed in terms of the new interface, rewrite the call to use it.

    • If not, use the convoluted, soon-to-be-replaced name.

  6. Replace the old code that answers the question “for which procedures may this animal be used at this moment?” with code that assumes cached partial answers. Mock out the cached partial answers and thereby design the partial answer table.

  7. Add the cached table to the database. Change the code that creates reservations to cache partial answers.

  8. Working from the controllers down, examine each method that uses faked_date_TODO_replace_me and faked_time_TODO_replace_me. What does the method do? What makes sense in a world where reservations and uses of animals are not for a single point in time? Where does the method really belong? Fix them.

  9. Now that all uses are fixed, delete the methods with silly names.

  10. Generalize the question-answering code to answer questions about timeslices more complicated than single points. Much careful testing here.

  11. Change the UI to collect more complicated timeslices.
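Steps 2 and 3 amount to a small adapter. Here's a guess at its shape (class and method names are illustrative, and I've used strict JSON quoting rather than the single-quoted wire format shown above):

```ruby
require 'json'
require 'date'

# A sketch of steps 2-3: Timeslice accepts the new wire format but can
# still answer in the old point-in-time vocabulary, for as long as every
# timeslice really is a single point.
class Timeslice
  def self.from_params(json)
    data = JSON.parse(json)
    new(Date.parse(data['firstDate']), Date.parse(data['lastDate']), data['times'])
  end

  def initialize(first_date, last_date, times)
    @first_date, @last_date, @times = first_date, last_date, times
  end

  # Old-format accessors: meaningful only so long as the two dates
  # coincide and there's exactly one time of day.
  def date
    raise 'not a single point in time' unless @first_date == @last_date
    @first_date
  end

  def time
    raise 'not a single point in time' unless @times.size == 1
    @times.first
  end
end

slice = Timeslice.from_params(
  '{"firstDate":"2010-09-02", "lastDate":"2010-09-02", "times":["morning"]}')
slice.date  # => Date.new(2010, 9, 2)
slice.time  # => "morning"
```

Step 5's renaming trick then makes every remaining caller of `date` and `time` announce itself as a loud test failure, one at a time.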

It’ll be interesting to see what the final structure of the code looks like. A lot of this code was written early in the project, so I’m sure it’ll improve a lot.

If you want to see the code, it’s on GitHub.

A parable about mocking frameworks

Somewhere around 1985, I introduced Ralph Johnson to a bigwig in the Motorola software research division. Object-oriented programming was around the beginning of its first hype phase, Smalltalk was the canonical example, and Ralph was heavily into Smalltalk, so I expected a good meeting.

The bigwig started by explaining how a team of his had done object-oriented programming 20 years before in assembly language. I slid under the table in shame. Now, it’s certainly technically possible that they’d implemented polymorphic function calls based on a class tag–after all, that’s what compilers do. Still, the setup required to do that was surely far greater than the burden Smalltalk and its environment put on the programmer. I immediately thought that the difference in the flexibility and ease that Smalltalk and its environment brought to OO programming made the two programming experiences completely incommensurable. (The later discussion confirmed that snap impression.)

I suspect the same is true of mocking frameworks. When you have to write test doubles by hand, doing so is an impediment to the steady cadence of TDD. When you write a statement in a mocking framework’s pseudo-language, doing so is part of the cadence. I bet the difference in experience turns into a difference in design, just as Smalltalk designs were different from even the most object-oriented assembler designs (though I expect not to the same extent).

Mocks, the removal of test detail, and dynamically-typed languages

Simplify, simplify, simplify!
Henry David Thoreau

(A billboard I saw once.)

Part 1: Mocking as a way of removing words

One of the benefits of mocks is that tests don’t have to build up complicated object structures that have nothing essential to do with the purpose of a test. For example, I have an entry point to a webapp that looks like this:

get '/json/animals_that_can_be_taken_out_of_service', :date => '2009-01-01'

It is to return a JSON version of something like this:

{ 'unused animals' => ['jake'] }

Jake can be taken out of service on Jan 1, 2009 because he is not reserved for that day or any following day.

In typical object-oriented fashion, the controller doesn’t do much except ask something else to do something. The code will look something like this:

  get '/json/animals_that_can_be_taken_out_of_service' do
    # Tell the “timeslice” we are concerned with the date given.

    # Ask the timeslice: What animals can be reserved on/after that date?
    # (That excludes the animals already taken out of service.) 

    # Those animals fall into two categories:
    # - some have reservations after the timeslice date. 
    # - some do not.
    # Ask the timeslice to create the two categories.

    # Return the list of animals without reservations. 
    # Those are the ones that can be taken out of service as of the given date. 
  end

If I were testing this without mocks, I’d be obliged to arrange things so that there would be examples of each of the categories. Here’s the creation of a minimal such structure:

  jake = Animal.random(:name => 'jake')
  brooke = Animal.random(:name => 'brooke')
  Reservation.random(:date => Date.new(2009, 1, 1)) do
    use brooke
    use Procedure.random
  end

The random methods save a good deal of setup by defaulting unmentioned parameters and by hiding the fact that Reservations have_many Groups, Groups have_many Uses, and each Use has an Animal and a Procedure. But they still distract the eye with irrelevant information. For example, the controller method we’re writing really cares nothing for the existence of Reservations or Procedures–but the test has to mention them. That sort of thing makes tests harder to read and more fragile.
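The `random` helpers themselves aren't shown in this post, but their shape is easy to guess at: default every attribute the test doesn't mention. (The real ones also build the Reservation/Group/Use structure; that part is elided here.)

```ruby
# A guess at an Animal.random-style factory. Attribute names and defaults
# are illustrative.
class Animal
  attr_reader :name, :kind

  def initialize(attrs)
    @name = attrs[:name]
    @kind = attrs[:kind]
  end

  # Callers override only the attributes their test cares about.
  def self.random(overrides = {})
    defaults = { :name => "animal-#{rand(10_000)}", :kind => 'bovine' }
    new(defaults.merge(overrides))
  end
end

Animal.random(:name => 'jake').name  # => "jake"
Animal.random.kind                   # => "bovine"
```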

In contrast to this style of TDD, mocking lets the test ignore everything that the code can. Here’s a mock test for this controller method:

    should 'return a list of animals with no pending reservations' do
      brooke = Animal.random(:name => 'brooke')
      jake = Animal.random(:name => 'jake')

      during {
        get '/json/animals_that_can_be_taken_out_of_service', :date => '2009-01-01'
      }.behold! {
        @timeslice.should_receive(:move_to).once.with(Date.new(2009,1,1))
        @timeslice.should_receive(:animals_that_can_be_reserved).once.
                   and_return([brooke, jake])
        @timeslice.should_receive(:hashes_from_animals_to_pending_dates).once.
                   with([brooke, jake]).
                   and_return([{brooke => [Date.new(2009,1,1), Date.new(2010,1,1)]},
                               {jake => []}])
      }
      assert_json_response
      assert_jsonification_of('unused animals' => ['jake'])
    end

There are no Reservations and no Procedures and no code-discussions of irrelevant connections amongst objects. The test is more terse and–I think–more understandable (once you understand my weird conventions and allow for my inability to choose good method names). That’s an advantage of mocks.

Part 2: Dynamic languages let you remove even more irrelevant detail

But I’m starting to think we can actually go a little further in languages like Ruby and Objective-J. I’ll use different code to show that.

When the client side of this app receives the list of animals that can be removed from service, it uses that to populate the GUI. The user chooses some animals and clicks a button. Various code ensues. Eventually, a PersistentStore object spawns off a Future that asynchronously sends a POST request and deals with the response. It does that by coordinating with two objects: one that knows about converting from the lingo of the program (model objects and so forth) into HTTP/JSON, and a FutureMaker that makes an appropriate future. The real code and its test are written in Objective-J, but here’s a version in Ruby:

should 'coordinate taking animals out of service' do
  during {
    @sut.remove_from_service('some animals', 'an effective date')
  }.behold! {
    @http_maker.should_receive(:take_animals_out_of_service_route).at_least.once.
                and_return('some route')
    @http_maker.should_receive(:POST_content_from).once.
                with(:date => 'an effective date',
                     :animals => 'some animals').
                and_return('post content')
    @future_maker.should_receive(:spawn_POST).once.
                  with('some route', 'post content')
  }
end

I’ve done something sneaky here. In real life, remove_from_service will take actual Animal objects. In Objective-J, they’d be created like this:

  betsy = [[Animal alloc] initWithName: @"betsy" kind: @"cow"];

But facts about Animals–that, say, they have names and kinds–are irrelevant to the purpose of this method. All it does is hand an incoming list of them to a converter method. So–in such a case–why not use strings that describe the arguments instead of the arguments themselves?

    @sut.remove_from_service('some animals', 'an effective date')

In Java, type safety rarely lets you do that, but why let the legacy of Java affect us in languages like Ruby?

Now, I’m not sure how often these descriptive arguments are a good idea. One could argue that integration errors are a danger with mocks anyway, and that not using real examples of what flows between objects only increases that danger. Or that the increase in clarity for some is outweighed by a decrease for others: if you don’t understand what’s meant by the strings, there’s nothing (like looking at how test data was constructed) to help you. I haven’t found either of those to be a problem yet, but it is my own code after all.

(I will note that I do add some type hints. For example, I’m increasingly likely to write this:

    @sut.remove_from_service(['some animals'], 'an effective date')

I’ve put “some animals” in brackets to emphasize that the argument is an array.)

If you’ve done something similar to this, let’s talk about it at a conference sometime. In the next few months, I’ll be at Speakerconf, the Scandinavian Developer Conference, Philly Emerging Tech, an Agile Day in Costa Rica, and possibly Scottish Ruby Conference.

Some preliminary thoughts on end-to-end testing in Growing Object-Oriented Software

I’ve been working through Growing Object-Oriented Software (henceforth #goos), translating it into Ruby. An annoyingly high percentage of my time has been spent messing with the end-to-end tests. Part of that is due to a cavalcade of incompatibilities that made me fake out an XMPP server within the same process as the app-under-test (named the Auction Sniper), the Swing GUI thread, and the GUI scraper. Threading hell.

But part of it is not. Part of it is because end-to-end tests just are awkward and fragile (which #goos is careful to point out). If such tests are worth it, it’s because some combination of these sources of value outweighs their cost:

  • They help clarify everyone’s understanding of the problem to be solved.

  • Trying to make the tests run fast, be less fragile, be easier to debug in the case of failure, etc. makes the app’s overall design better.

  • They detect incorrect changes (that is, changes in behavior that were not intended, as distinct from ones you did intend that will require the test to be changed to make it an example of the newly-correct behavior).

  • They provide a cadence to the programming, helping to break it up into nicely-sized chunks.

In working through #goos so far (chapter 16), the end-to-end tests have not found any bugs, so zero value there. I realized last night, though, that what most bugged me about them is that they made my programming “ragged”–that is, I kept microtesting away, changing classes, being happy, but when I popped up to run the end-to-end test I was working on, it or another one would break in a way that did not feel helpful. (However, I should note that it’s a different thing to try to mimic someone else’s solution than to conjure up your own, so some of the jerkiness is just inherent to learning from a book.)

I think part of the problem is the style of the tests. Here’s one of them, written with Cucumber:

   Scenario: Sniper makes a higher bid, but loses
       Given the sniper has joined an ongoing auction
       When the auction reports another bidder has bid 1000 (and that the next increment is 98)
       Then the sniper shows that it's bidding 1098 to top the previous price
           And the auction receives a bid of 1098 from the sniper

       When the auction closes
       Then the sniper shows that it's lost the auction

This test describes all the outwardly-observable behavior of the Sniper over time. Most importantly, at each point, it talks about two interfaces: the XMPP interface and the GUI. During coding, I found that context switching unsettling (possibly because I have an uncommonly bad short- and medium-term memory for a programmer). Worse, I don’t believe this style of test really helps to clarify the problem to be solved. There are two issues: what the Sniper does (bid in an auction) and what it shows (information about the known state of the auction). They can be talked about separately.

What the Sniper does is most clearly described by a state diagram (as on p. 85) or state table. A state diagram may not be the right thing to show a non-technical product owner, but the idea of the “state of the auction” is not conceptually very foreign (indeed, the imaginary product owner has asked for it to be shown in the user interface). So we could write something like this on a blackboard:

Just as in #goos, this is enough to get us started. We have an example of a single state transition, so let’s implement it! The blackboard text can be written down in whatever test format suits your fancy: Fit table, Cucumber text, programming language text, etc.

Where do we stand?

At this point, the single Cucumber test I showed above is breaking into at least three tests: the one on the blackboard, a similar one for the BIDDING to LOSING transition, and something as yet undescribed for the GUI. Two advantages to that: first, a correct change to the code should only break one of the tests. That breakage can’t be harder to figure out than breaking the single, more complicated test. Second, and maybe it’s just me, but I feel better getting triumphantly to the end of a medium-sized test than I do getting partway through a bigger end-to-end one.

The test on the blackboard is still a business-facing test; it’s written in the language of the business, not the language of the implementation, and it’s talking about the application, not pieces of it.

Here’s one implementation of the blackboard test. I’ve written it in my normal Ruby microtesting style because that shows more of the mechanism.

context 'pending state' do

  setup do
    start_app_at(AuctionSnapshot.new(:state => PENDING))
  end

  should 'respond to a new price by counter-bidding the minimum amount' do
    during {
      @app.receive_auction_event(AuctionEvent.price(:price => 1000,
                                                    :increment => 98,
                                                    :bidder => 'someone else'))
    }.behold! {
      @transport_translator.should_receive(:bid).once.with(1098)
      @anyone_who_cares.should_receive_notification(STATE_CHANGE).at_least.once.
                        with(AuctionSnapshot.new(:state => BIDDING,
                                                 :last_price => 1000,
                                                 :last_bid => 1098))
    }
  end
end

Here’s a picture of that test in action. It is not end-to-end because it doesn’t test the translation to-and-from XMPP.

In order to check that the Sniper has the right internal representation of what’s going on in the auction, I have it fling out (via the Observer or Publish/Subscribe pattern) information about that. That would seem to be an encapsulation violation, but this is only the information that we’ve determined (at the blackboard, above) to be relevant in/to the world outside the app. So it’s not like exposing whether internal data is stored in a dictionary, list, array, or tree.
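A hand-rolled sketch of that publish/subscribe arrangement (the real app's mechanism differs; the snapshot contents echo the test above):

```ruby
# Minimal publish/subscribe: the Sniper announces only the agreed-upon
# snapshot of the auction's externally-relevant state, never its internal
# representation. Names here are illustrative.
STATE_CHANGE = :state_change

class Sniper
  def initialize
    @subscribers = []
  end

  def subscribe(&block)
    @subscribers << block
  end

  def announce(event, snapshot)
    @subscribers.each { |subscriber| subscriber.call(event, snapshot) }
  end
end

sniper = Sniper.new
seen = nil
sniper.subscribe { |event, snapshot| seen = snapshot if event == STATE_CHANGE }
sniper.announce(STATE_CHANGE,
                :state => :bidding, :last_price => 1000, :last_bid => 1098)
seen  # => {:state=>:bidding, :last_price=>1000, :last_bid=>1098}
```

A test subscribes just like any other observer, which is what lets the transition test check the Sniper's understanding of the auction without reaching inside it.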

At this point, I’d build the code that passed this test and others like it in the normal #goos outside-in style. Then I’d microtest the translation layer into existence. And then I’d do an end-to-end test, but I’d do it manually. (Gasp!) That would involve building much the same fake auction server as in #goos, but with some sort of rudimentary user interface that’d let me send appropriately formatted XMPP to the Sniper. (Over the course of the project, this would grow into a more capable tool for manual exploratory testing.)

So the test would mean starting the XMPP server, starting the fake auction and having it log into the server, starting the Sniper, checking that the fake auction got a JOIN request, and sending back a PRICE event. This is just to see the individual pieces fitting together. Specifically:

  • Can the translation layer receive real XMPP messages?
  • Does it hand the Sniper what it expects?
  • Does the outgoing translation layer/object really translate into XMPP?

The final question–is the XMPP message’s payload in the right format for the auction server?–can’t really be tested until we have a real auction server to hook up to. As discussed in #goos, those servers aren’t readily available, which is why the book uses fake ones. So, in a real sense, my strategy is the same as #goos’s: test as end-to-end as you reasonably can and plug in fakes for the ends (or middle pieces) that are too hard to reach. We just have a different interpretation of “reasonably can” and “too hard to reach”.

Having done that for the first test, would I do it again for the BIDDING to LOSING transition test? Well, yeah, probably, just to see a two-step transition. But by the time I finished all the transitions, I suspect code to pass the next transition test would be so unlikely to affect integration of interfaces that I wouldn’t bother.

Moreover, having finished the Nth transition test, I would only exercise what I’d changed. I would not (not, not, not!) run all the previous tests as if I were a slow and error-prone automated test suite. (Most likely, though, I’d try to vary my manual test, making it different from both the transition test that prompted the code changes and from previous manual tests. Adding easy variety to tests can both help you stumble across bugs and–more importantly–make you realize new things about the problem you’re trying to solve and the product you’re trying to build.)

What about real automated end-to-end tests?

I’d let reality (like the reality of missed bugs or tests hard to do manually) force me to add end-to-end tests of the #goos sort, but I would probably never have anywhere near the number of end-to-end scenario/workflow tests that #goos recommends (as of chapter 16). While I think workflows are a nice way of fleshing out a story or feature, a good way to start generating tests, and a dandy conversation tool, none of those things require automation.

I could do any number of my state-transition tests, making the Sniper ever more competent at dealing with auctions, but I’d probably get to the GUI at about the same time as #goos.

What do we know of the GUI? We know it has to faithfully display the externally-relevant known state of the auction. That is, it has to subscribe to what the Sniper already publishes. I imagine I’d have the same microtests and implementation as #goos (except for having the Swing TableModel subscribe instead of being called directly).

Having developed the TableModel to match my tests, I’d still have to check whether it matches the real Swing implementation. I’d do that manually until I was dragged kicking and screaming into using some GUI scraping tool to automate it.

How do I feel?

Nervous. #goos has not changed my opinion about end-to-end tests. But its authors are smarter and more experienced than I am. So why do they love–or at least accept–end-to-end tests while I fear and avoid them?

Programming Cocoa with Ruby now shipping

You can get it from the publisher, from Amazon, from Powell’s, or from O’Reilly.

The publisher, by the way, says that probably the best marketing for the book is getting reviews up on Amazon.

NSF workshop: ideas wanted

I got invited to a National Science Foundation workshop whose…

goal broadly speaking will be to reflect on software engineering research and practice of the past, to determine some areas in which “rethinking” is needed, and to engage in some rethinking.

I have a position (strongly held, of course). But when I look at the list of attendees and their affiliations, I worry that I may be the only person there representing, well, you: the kind of person who reads this blog or (especially) follows me on twitter. So I’d like to hear what you think needs rethinking. (Please include a tiny bit about what kind of person you are. Something like “rails developer in boutique shop”.) You can reply here, via email, or by responding to my tweet on the subject.

My basic position goes something like this:

Twenty years ago, the self-conception of the software engineering researcher went something like this, I think:

I am a scientist, though partly of the applied sort, discovering important principles which engineers will then put into practice when building systems. I aspire to be something like the Charles Darwin of The Origin.

I think the world has moved on from that. What was once an intellectual structure that could be considered built has since exploded into something more like a genuine ecosystem. There are vast numbers of people, communicating in all kinds of strange ways, assembling practically innumerable pieces of software in scrapheap ways, collaborating—quite often across organizational boundaries—in a way more reminiscent of a rhizome than a command-and-control tree.

The appropriate response to that is to go back in time before The Origin to the Charles Darwin who spent eight years studying barnacles, grasping their differences, and publishing the definitive work on them. There is a lot to study in the software development ecosystem (which, note especially, will require collaboration with researchers who know more about people and groups than context-free grammars).

Time to switch from Alan Kay’s glamorous “the way to predict the future is to create it” to something more in keeping with Kuhn’s normal science: the world is out there; mine it for deep knowledge and extensive data (against, I hope, the background of the “paradigms” used by practitioners).

InfoQ interview up

There’s an interview with me at InfoQ. It covers micro-scale retro-futurist anarcho-syndicalism and my hypothesis that we could chisel value away from automating business-facing examples and add it to cheaper activities.

Mocking and exploratory testing

Mock objects seem to require more whole-product (or otherwise large-scale) automated tests. I’m hoping not, especially if manual exploratory testing has some tool support. My reasons are below the jump.

(more…)

Screen pairing

Corey Haines is making a tour of the US Midwest, pairing with people. I was happy to host him for the past two days. He introduced me to a style of pairing I’d never done before. What you see on the right is how we worked the whole time, each with our laptop, sharing the screen (via iChat).

I found myself preferring this style to pairing side-by-side in front of a screen. I even prefer it to sitting in front of two side-by-side screens with two keyboards. The work is more free-flowing and conversational. It’s easier to note the other person’s body language. It’s easier to stop coding, look up, and talk to each other. I found the switching between people more fluid, with fewer episodes where we were both going for the cursor at the same time.

On the downside: it may be a little harder to make sure that one of you is filling the “navigator” role—stepping back and keeping track of the big picture. But I find that’s hard to ensure in any case.

I recommend you give it a try.

(The chair on the left, by the way, is where I do most all of my work. I recommend big bay windows with smallish panes for all kinds of hands-at-keyboard work, especially if you live on a nice old brick street with big trees.)

Software Craftsmanship mini-conference (London, Feb 26, 2009)

I’m on the review committee of Jason Gorman’s Software Craftsmanship mini-conference. If I can overlap it with a trip to France, I’ll definitely be there. Here’s the blurb:

This is a conference about the “hard skills” that programmers and teams require to deliver high quality working software.

From writing effective unit tests to managing dependencies, and from writing reliable multi-threaded code to building robust and dependable service-oriented architectures.

This conference is all about the principles and practices, and the disciplines and habits, that distinguish the best 10% of software professionals from the 90% who are failing their customers and failing their profession by taking no care or pride in their work and delivering buggy, unreliable and unmaintainable code.

This conference aims to showcase and champion the “hard skills”, and champion the idea of software craftsmanship and the ways in which it can be encouraged and supported in the workplace, in schools and colleges, and among the wider software development community.

This is a conference about building it right.