Exploration Through Example

Archive for the 'clojure' Category

TDD in Clojure, part 3 (one wafer-thin function; conclusions)

Posted in TDD, programming, clojure | Thursday, June 17th, 2010 | 5 Comments »

All that remains is to add the locations of bordering cells to a given sequence of locations. A small wrinkle is that a border location may be next to more than one of the input locations, so duplicates need to be prevented. I could write the test this way:

I don’t like that. When the test fails, it’ll likely be hard to discover how the expected and actual values differ. And I bet that a failure would more likely be due to a typo in the expected value than to an actual bug. The test is just awkward.

I like this version better:

It is a clearer statement of the relationship between two concepts: a location’s neighborhood and the border of a set of locations. More pragmatically, it’s less typing. (When I first started coding in this style, it surprised me how much test setup clutter went away. I had much less need for “factories” or “fixtures” or “object mothers“.)

Given either test, the code is straightforward:

Onward

No matter how satisfying the individually-tested pieces, the whole has to work, which is why everything ends by running at least one end-to-end test. A test like the one we started with:

Because I’m making up my test notation as I go, I’ve been running all the tests manually in the REPL. Now I can run the whole file:

If you look carefully, you’ll see that the test would fail because the locations are in the wrong order. But that’s a quick fix:

Done. Ship it!

Development order - Test-down or REPL-up?

It’s often said that the Lisps are bottom-up languages, that you test out expressions in the REPL, discover good functions, and compose them into programs. A lot of people do work that way. A lot of people who use TDD to write object-oriented code also work that way: when implementing a new feature, they start at low-level objects, add whatever new code the top-level feature seems to demand of them, then use those augmented objects in the testing of next-higher-level objects.

For I guess about a year now, I’ve been experimenting with being strictly top-down in some projects. I find that leads to less churn. Too often, when I go bottom-up, I end up discovering that those low-level changes are not in fact what the feature needs, so I have to revisit and redo what I did.

I get less disrupted by rework when I go from the top down (or from user interface in). It’s not that I don’t blunder—we saw one of those in the previous installment—but those blunders seem easier to recover from.

That’s not to say that I don’t use the REPL. We saw that a little bit in this program, when I was writing neighbors. It’s perfectly sensible to do even more in the REPL. I think of the REPL as a handy tool for what XP calls a spike solution and the Pragmatic Programmers call tracer bullets. When I’m uncertain what to do next, the REPL is a tool to let me try out possibilities. So I might be stuck in a certain function and go to the REPL to see how it feels to build up parts of what might lie under it. After I’m more confident, I can continue on with the original function, test-driven, reusing REPL snippets when they seem useful.

I don’t claim everyone should work that way. I do claim it’s a valid style that you’d be wise to try.

Notation

I’m something of an obsessive about test notation, and I’ve been endlessly fiddling with a Clojure mock notation. I implemented its first version as a facade on top of clojure.contrib.mock. As I experimented, though, I found that keeping my facade up to date with my notational variants slowed me down too much, so I put the code aside until the notation settled down.

I’m pretty happy with what you’ve seen here. Are you? If so, I may start on another mock package. I’ve got the most important parts: a name and sketch of a logo. (”Midje” and someone flying safely between the sun [of abstraction without examples] and the sea [of overwhelming detail].)

Tests and code together

You can see the completed program here. It mixes up tests and code. I’ve tried that on-and-off over the years and always reverted to separate test files. This time, it’s seemed to work better. I probably want an Emacs keystroke that lets me hide all tests, though. I’d also want alternate definitions of the test macros so that I can compile them out of the production system

What next?

I’ll write a web app in this style, using Compojure.

TDD in Clojure, part 2 (in which I recover fairly gracefully from a stupid decision)

Posted in TDD, programming, clojure | Wednesday, June 16th, 2010 | 3 Comments »

Part 1
Part 3

I ended Part 1 saying that my next step would be to implement a function that counts the number of living neighbors a cell has. Given that we’re already pretending (through stubbing) that a living? function exists, living-neighbor-count is pretty trivial if we also pretend we’ve got a neighbors function:

Following my “mapping, like accessors, is too simple to test” guideline, I almost didn’t write a test. But what the heck:

Once the test passes, we need to write neighbors. To implement it, we’re going to have to take cells apart (to get x and y coordinates) and put them back together (to create neighbors). So I don’t see any point to using stubs and dummy variables like ...cell... in this test:

Boldly, I will here use one test to define both cell-at and neighbors (as well as the test helper have-coordinates that checks a list of cells against a list of coordinates).

(If I were more sensitive to that small voice in my head that warns I’m going astray, I would have heard something around now, but I ignored it. So we will too.)

Enter the REPL

My thought about how to implement neighbors has three steps, so I’ll try them out in the REPL. First, I’ll make (x,y) pairs to add and subtract from the original cell’s coordinates:

That’s good, except (0, 0) shouldn’t be in there. (A cell can’t be its own neighbor.) So I need to delete that:

(remove #{[0 0]} product) is a Clojure idiom. remove returns its second (sequence) argument, omitting any element that the first argument (a function) returns truthy for. #{x} is the set containing x. In Clojure, sets act as functions that return something truthy iff their single argument is in the set. That is:

Finally, I need a function that shifts a cell by an offset. For the REPL, I’ll pretend the cell is just an [x y] vector. (We have yet to define what it really is.)

I can build neighbors from what I’ve tried out. To make the test pass, I’ll continue to use vectors for cells, hiding them behind a simple functional interface of cell-at, x, and y.

The concrete representation of the cell — and disaster

Here are the functions as yet undefined:

There’s no more escaping it. I’m going to have to decide what kind of thing border produces. That thing has to be a sequence for tick to map over:

border's result is also stored in world where living? will use it to decide whether a given cell is alive or dead.

My first thought was that I could use the set idiom I used above—the bordered world could just be the set of all living coordinates. Sneakily, any location not in the set would represent a dead cell. That would be great for implementing living?, but it wouldn’t work for tick, which has to process not only living cells, but also the dead cells that make up the border.

So my fallback was for border to produce a map, something like this:

Maps are sequences, so you can map over them. But I don’t think I’ve ever actually tried it. What happens?…

OH GREAT. If I go down this route, we’ll have three different ways of representing cells:

as the original location in inputs like *vertical-blinker*: [0 1]
as part of a living/dead map: {... [0 1] :dead ...}
as a living/dead vector: [ [0 1] :dead ]

That’s intolerable. And yes, I bet at least half of my two readers thought I was mistaken not to think about data structures at the very beginning. However, my strategy with Clojure TDD has been to put off thinking about data structure as long as I can, and I’ve been surprised and pleased by how often I ended up with simpler data than it seemed I would. I’ve found that, given the use of globally-available immutable “background” data, much of what might have been explicit data structure–vectors of maps of vectors of…–ends up in the implicit structure of the computation. More about that, though, will have to wait for another post.

A recovery plan

The problem is here:

When I wrote that, I remember that the still small voice of conscience objected to the way I was both stashing the bordered-world away as background and simultaneously picking it apart with map. That just felt weird, but I argued myself into thinking it was harmless. It was not.

Really, since my whole program takes input [x y] pairs (such as *vertical-blinker*) and turns them into a different set of [x y] pairs, most of my work ought to be done with those pairs. I should be thinking about locations of cells, not cells themselves. In that way of thinking, border shouldn’t produce “cells”. It should take locations of living cells and produce locations that point to both living cells and adjacent dead cells.

Further, I shouldn’t repeat those locations in a world function. Instead, I need something that can answer questions about cells, given their locations. It should be a… (I’m bad with names)… an oracle about cells. I first imagined this:

using-cell-oracles-from should produce any wise and oracular functions we need. So far, that’s just living?.

I realized something more. Locations are flowing into the pipeline, locations are flowing out, and in this version, locations won’t be transformed into cells anywhere within the pipeline. That makes unborder, which was originally supposed to convert a mixture of living and dead cells into only living locations, seem kind of stupid. If tick produces only living locations, unborder can go away. (The name unborder always bugged me, because it didn’t really describe what the function would have to do. Once again, I should have paid attention.)

That leads to this top-level function:

That wasn’t so bad…

As it turns out, changing my mind about such a fundamental decision was easy.

What did I have to do to the code? I had to write using-cell-oracles-from. Here’s a test.

I won’t show the code that passes this test—it’s a somewhat grotty macro (but a simple transformation of the earlier against-background). You can see it in the complete source for this post.

I did a quick global-replace of “cell” with “location” and tweaked a couple of the resulting names. Although both you and I know that locations are just pairs, I retained the functions make-location (formerly cell-at), x, and y to keep the code insulated from the potential of another change of mind.

I had to convert the successor function to dead-in-next-generation?. That was pretty simple. I had to change two lines in the test. Here’s one:

To make that test pass, I had to rewrite successor. It used to be this:

Now it’s this:

That was just a matter of inverting the logic and deleting killed and vivified. (Before I ever got around to writing them!)

The ease of this change makes me happy. Even though I blundered at the very beginning of my design, the way stub-heavy TDD lets me defer decisions—and forces me to encapsulate them so that I have something to stub—made the blunder a not-catastrophe. I wish I could say that I blundered deliberately to demonstrate that property of this style of TDD, but that would be a lie.

Enough for today

Only one function remains: add-border-to. That’ll be pretty easy, but this post is already too long. The next one will finish up the implementation and add whatever grand summary I can come up with.

TDD in Clojure: a sketch (part 1)

Posted in TDD, programming, clojure | Thursday, June 10th, 2010 | 6 Comments »

Part 2

I continue to use little experiments to help me think through TDD in Clojure. (I plan to begin a realistic experiment soon.) Right now, I’m mainly focused on three questions:

What would mocking or stubbing mean in a strict(ish) functional language?
What’d be a good mocking notation for Clojure?
How do you balance the outside-in style associated with mocks and the bottom-up style that the REPL (interpreter) encourages?

Here’s an example from Conway’s Game of Life. It begins with an implementation suggestion from Paul Blair and Michael Nicholaides at the Philly Code Retreat. Instead of thinking of the board as a 2×2 array of cells, with some of them dead and some alive, think instead only of living cells, each of which knows its coordinates. Here’s an example that shows how “blinkers” blink from generation to generation.

A couple of things have happened here:

This is my notation for a straightforward non-stubbing test. The value on the left is executed and it’s compared (for equality) to the value on the right.
I’ve started coding outside-in, and I’ve named the first function I need: next-world.

The Blair/Nicholaides approach advances the “world” to the next generation by (conceptually) adding dead cells around the edge of all the living cells, running the normal life rules that govern how cells change because of their neighbors, and then throwing away all the cells that end up dead. In other words:

The pending bit is just there because (sadly) Clojure makes you declare functions before mentioning them. pending just creates functions that print that they’ve not yet been implemented.
The rest of the code flows the world argument through a pipeline of three functions. If you’re not familiar with the -> macro, the result is the same as this:

I don’t feel the need to test this code now because it’s really declarative—it says what it means to produce a next world under this approach. (It will be tested in the very end by the “integration test” that shows a blinker working.)

I can now implement any of the three new functions. I’ll pick tick because it seems to be the heart of the matter. Here’s a first implementation:

There are two odd things going on here.

I think of the Clojure programs I’ve written this way: there’s a big stateful universe out there. At some moment, a snapshot of relevant bits of that universe is presented to my pure functional program. It churns away, presents the results back to the big stateful universe, and then disappears. Given that bias, I find it convenient to think of the universe-bits not as data to be passed around, but as the unchanging background behind my code’s calculations. Any of that code should be able to reach out and grab bits of the relevant background by calling dynamically bound functions of no arguments.

Because of against-background, any code operating inside successor (or inside functions it calls) can call (world) and magically get the original bordered-world. As you’ll see, this lets me put off committing to a representation for cells. (The representation I used for the blinkers isn’t enough after the border’s been added, since there’s now a need to distinguish between living and dead cells.)
This essay is supposed to be a sketch of TDD in Clojure, but here’s the second function I’ve written without tests. What’s up with that? I’ve tentatively concluded that testing functions that map one function over a sequence is the moral equivalent of testing getters and setters: not worth the trouble. But don’t fear, successor is worth testing, and we’ll do that next. Almost next.

First, stubbing function calls.

In object-oriented languages, I think of mock-driven-design as a way of teasing out collaborators for the object I’m building. I push responsibilities for work onto objects that I’ll implement later. Mocking lets me defer the implementation of those objects until I’m ready, and creating some examples of the API teaches me the (implicit) specification for the new object.

I’ve found that with pure functional programs that don’t modify state, it makes more sense to think of a function like (f 2) => 4 as a fact. What I’m doing as I test-drive a function is describing how facts about its inputs and outputs depend on other facts, in an almost Prolog-like way. For example, consider this code:

That says that, for any cell you care to provide, f of that cell will be 10, provided g of that cell is true and h is 2. If either of those latter two facts don’t apply to the cell, I’m not saying what f’s value is.

I use the funny ...cell... notation in the way that mathematicians use n to talk about any integer. (They call that universal quantification.) I don’t want to create a particular cell because I might need to specify properties that have nothing to do with the function I’m working on. This notation says that nothing about the cell is relevant except for what comes after the provided.

Here’s one way to write a Life rule in this notation:

The falsey bit in the first line is because Clojure has two distinct values that can mean “false”. falsey is a function that takes the result of the left-hand side and fails the test if that result is anything other than one of the two false values. I’m using it because I don’t want to overspecify living?. There’s no reason to care which of the two “false” values it returns.

There’s a problem with this test, though. Remember what I said above: the left-hand side gets evaluated and handed to falsey. That means living? has to have a definition—which means I’d have to settle on how the code knows whether a cell is alive or dead. I like doing one thing at a time and putting off decisions as long as I can, and right now I’d rather be focused on successor instead of cell representations.

Here’s a way to defer that decision:

Here I’m saying something subtly different than before. I’m saying that the result of successor is specifically that cell produced by calling killed on the original cell. The =means=> notation tells the framework to create a mock instead of evaluating the right-hand side for its value. In a more familiar mocking syntax (for Ruby), the whole test is equivalent to:

OK. The next figure gives the whole set of Life rules, expressed as executable tests. (Well, executable as soon as I implement the testing framework.) Notice that I called the outer wrapper know (a fact) instead of example. know seems more appropriate for rules. The two forms mean the same thing.

Notice also that I implemented a notation for saying “run this test for each value in a sequence”. The use of commas, as in [4,,,8], indicates that—conceptually—the fact is true for all values four through eight. Only the ones listed are actually tried. (Commas count as >white space in Clojure.)

This isn’t the tersest possible format—a table would be better—but it’ll do. I think it’s reasonably readable. Do you?

Here, for reference, is code that passes the test:

We now have an expanded choice of functions to write:

I could go breadth-first—with border and unborder—or go depth-first with one of the functions on the second line. In this particular case, I’d rather go depth first. I’ve avoided deciding on a representation, so I don’t know yet what border should do.

If this installment meets your approval, I’ll add another one that begins work on—oh—probably living-neighbor-count is the most complicated, so it’s a good one to chip away at.

Exploration Through Example

Search

About Brian Marick

RSS

Categories

Archives

Blogroll

Meta

Archive for the 'clojure' Category

TDD in Clojure, part 3 (one wafer-thin function; conclusions)

TDD in Clojure, part 2 (in which I recover fairly gracefully from a stupid decision)

TDD in Clojure: a sketch (part 1)