Friday, March 26, 2010

How deep do you test?

I've been talking a lot recently about what the right level of testing is on my current project. Previously, on the Java projects I worked on, I was a full-blown mocker: anything but the class I was testing would likely be mocked, and I would try to do pure "unit tests", testing only the code in one class at a time. Mockito was my friend and the world was good.

Now I'm working on a WebOS app so I'm back in the land of JavaScript and dynamic languages. I've never done any real JS testing before (or any dynamic language for that matter), so I'm trying to figure this all out for the first time.

The question of how deep to test keeps coming up because we're running into some significant test complexity. We're not mocking many of our own classes, only the WebOS classes, so our tests are really halfway between integration tests and true unit tests.

This has its advantages, especially in a dynamic language where parameters are only checked at runtime. If we were doing pure unit tests and mocking the world away, we could be creating a fantasy where our class thinks it's running just fine, but is actually passing the wrong parameters, or even calling mocked functions that don't exist.
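To make that concrete, here's a sketch of how an over-eager fake can keep a broken class green (Foo, doWork, and the helper API here are all hypothetical):
var foo = new Foo();
foo.helper = {
  // The real helper's API might be load(options, callback), but if Foo
  // mistakenly calls loadAsync(callback), a fake shaped around Foo's
  // assumptions answers happily and the test stays green.
  loadAsync: function(callback) { callback('stub data'); }
};
foo.doWork(function() {}); // passes here, throws against the real HelperClass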

On the other hand, without mocking, our test complexity goes way up, especially since we're dealing with a lot of async behavior. Consider the following test:
it("should call the callback delegate", function() {
  var foo = new Foo();
  var myCallback = jasmine.createSpy('myCallback'); // must be a spy for wasCalled() to verify it

  foo.doWork(myCallback);

  expect(myCallback).wasCalled();
});
Here we have a simple case where we are testing a doWork function on the Foo class and expecting that it calls a delegate at some point in the execution.
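For reference, a minimal synchronous Foo that would satisfy this test might look like this (a hypothetical sketch, not our real class):
function Foo() {}
Foo.prototype.doWork = function(callback) {
  // ... do the actual work here ...
  callback(); // invoke the delegate once the work is done
};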

However, in reality, doWork might call some other class that relies on an XHR and asynchronously loading something from a datastore. Now our test looks something like:
it("should call the callback delegate", function() {
  var foo = new Foo();
  var myCallback = jasmine.createSpy('myCallback'); // must be a spy for wasCalled() to verify it

  foo.doWork(myCallback);

  Tests.AJAX.Requests.fakeResponseFor(SOME_XHR_REQUEST); // Succeed pending XHR, return stub data
  Tests.Datastore.get.succeedAll(); // Succeed pending async datastore get requests
  Tests.Datastore.add.succeedAll(); // Succeed pending async datastore add requests

  expect(myCallback).wasCalled();
});
As you can see, the test complexity goes up very fast. Almost half the lines of code have nothing to do with the class we're testing!

Mocking could solve this problem much more cleanly by mocking whatever helper Foo uses to do the async logic:
it("should call the callback delegate", function() {
  var foo = new Foo();
  foo.helper = mock('HelperClass'); // mock() stands in for whatever mocking helper you use
  var myCallback = jasmine.createSpy('myCallback'); // must be a spy for wasCalled() to verify it

  foo.doWork(myCallback);

  expect(myCallback).wasCalled();
});
The argument against mocking in the tests above is that it could hide some poor design decisions, especially in our app when dealing with async behavior. For example, you might need a whenReady delegate to execute some code when async data is ready; if you're mocking everything, that need might never become apparent. The tradeoff, however, is a lot of test code that distracts from what you are really trying to test.
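As a sketch of what I mean by a whenReady delegate (all names hypothetical), the consumer registers work to run once the async data has arrived:
function DataHolder() {
  this.isReady = false;
  this.pending = []; // callbacks waiting on the data
}
DataHolder.prototype.whenReady = function(callback) {
  if (this.isReady) {
    callback(this.data); // data already arrived, run immediately
  } else {
    this.pending.push(callback); // run later, when the data shows up
  }
};
DataHolder.prototype.onDataLoaded = function(data) {
  this.isReady = true;
  this.data = data;
  for (var i = 0; i < this.pending.length; i++) {
    this.pending[i](data);
  }
  this.pending = [];
};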

The option of creating a stub class that automatically succeeds or fails all async behavior was brought up, but the legitimate concern was that it could hide bigger problems. You could mistakenly code something to work synchronously, and it would pass in the stubbed world but not in reality.
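For example, an auto-succeed datastore stub (hypothetical shape) fires its callback before get() even returns, which is exactly the synchronous fantasy a real datastore never gives you:
var autoSucceedDatastore = {
  get: function(key, onSuccess) {
    // Fires synchronously; a real datastore would call back on a later tick.
    onSuccess('stubbed value for ' + key);
  }
};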

Also, being true to TDD, we could set up doWork to return a value instead of using a callback. This code would likely pass with auto-succeed stub classes, but would fail completely when run in the real world:
it("should get a value from #doWork", function() {
  var foo = new Foo();
  var result = foo.doWork(); // doWork will fail in an asynchronous setup

  expect(result).toEqual('myValue');
});
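By contrast, a doWork that honors the asynchrony (again, a hypothetical sketch of Foo's internals) can only hand the value to a callback, because it returns before the async load completes:
Foo.prototype.doWork = function(callback) {
  this.helper.load(function(value) {
    callback(value); // the value only exists once the async load finishes
  });
  // nothing useful to return here -- the XHR hasn't completed yet
};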
I could rename this post to "How do you test async behavior?", but I think my async example is just one of many cases where mocks/stubs either help or hurt for the reasons above.

So, how deep do you test your code? Does the language affect your decision? Do you have any general rules of thumb for when to mock or when to do integration tests?

2 comments:

  1. (tl;dr: Go with your second or third example depending on how worried you are about having to update the test if the AJAX call changes.)

    This is interesting, and for me very timely. I'm coming around to the idea that it's almost always best to call web services asynchronously even when the service API gives you a choice, and it's better to let that async-ness propagate through the program instead of covering it with a sync wrapper. The result is that I'm writing a lot of C# code that looks like asynchronous JavaScript.

    To your broader question, I prefer to test deeply except when I can point to a concrete problem with doing so. I don't have a whole lot of sympathy for the purity argument, especially if there isn't an excellent integration test suite. It's far, far better for a bug in module A to break tests for module B than for it to not break any tests at all. I tend to mock things in two situations:

    1. If I don't, the test can fail for reasons other than a bug.
    2. If I don't, the test will be unreasonably complex or hard to write.

    #1 means that I'll always mock web services and almost always mock things that do database access. Beyond that, I let #2 guide me. For instance, suppose that I'm testing A which calls B which calls C, and C talks directly to some external dependency like a web service. My first choice is to mock C, but if doing that means that the tests have to know a lot about the way B works then I might mock B instead. Or if B actually has a whole bunch of external dependencies, I'll mock B. In practice this means that I write a fair number of "pure" unit tests where everything else is mocked, but it's not my first choice.

    As for testing things that depend on async calls, I think the test should behave as similarly to the real world as possible even if that adds a bit of complexity. By that standard, I think the choice is between your second and third examples. The tradeoffs between the two are interesting. Simulating the AJAX response itself gives you better coverage, and I don't mind the little bit of extra complexity. I might mind having to update the test if the format of the AJAX request or response changes, though. That's a concrete problem that might lead me to test shallower.

    If you need multiple tests you could compromise by writing one test that verifies that async calls work and mocking the thing that does the async call, as in your third example, in any additional tests. That would reduce the maintenance burden while still giving good coverage.

    For environments like .NET where faking the request and response itself isn't so easy, I'd go with your third example every time.

  2. @Steve I think you make a lot of good points. We ended up going with style 2 for all of our tests, although arguably your suggestion of a few tests in style 2 and the rest in style 3 would have made things a bit simpler.

    I have a followup post about this, but the brief summary is that exposing that complexity ended up being a good thing after all.
