How to write robust tests

Many unit tests are brittle. As soon as the code is changed, the test breaks and has to be changed too. We don’t want that. We want robust tests.

A robust test is a test which does not have to change when the code it is testing is changed, as long as the change preserves the intended functionality.

A robust test does not break when the code is refactored. You don’t have to remove or change a robust unit test when you fix a bug. You just add a new one to cover the bug.

If you want to start writing more robust tests, here are a few things you can consider.

  • Test on a slightly higher level. Tests on a lower level often have to be removed or rewritten completely because low-level class design is volatile. They require more significant changes when a large refactoring comes around, while higher-level classes tend to get by with smaller changes.
  • Choose which classes to test. Not every class needs its own test class. In particular, consider not writing separate tests for small private helper classes which are tightly coupled to a larger public class. If a certain class is very complex, selectively target that class with tests, even if you don’t give its less complex sibling classes the same treatment.
  • Don’t fake or mock too much. Tests that fake or mock too much become less robust because they know too much about how the unit performs its work. If the unit finds another way to do the same work, the test will fail.
  • Focus on the important functionality. A robust test verifies functionality rather than implementation. It is focused on the parts of the unit’s interface which are truly important while it ignores the parts of the unit’s interface (or internals!) that should be allowed to change. Put differently, it knows the difference between “intentional” and “accidental” functionality.
  • Test in the language of the domain. By expressing your tests in the language of the domain, i.e. using concepts relevant to your business or application, you naturally create tests which depend on the wanted functionality but not on too many implementation details (see the sketch just after this list).
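As a minimal sketch of what that can look like (Account and all of its methods are invented for illustration, not taken from any real code base), a test written in the language of the domain talks about withdrawals and balances rather than about the internals of the class:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Invented domain class, kept in the same listing so the sketch compiles.
class Account {

    private int balanceInCents;

    Account(int openingBalanceInCents) {
        this.balanceInCents = openingBalanceInCents;
    }

    void withdraw(int amountInCents) {
        if (amountInCents > balanceInCents) {
            throw new IllegalArgumentException("insufficient funds");
        }
        balanceInCents -= amountInCents;
    }

    int balanceInCents() {
        return balanceInCents;
    }
}

public class AccountTest {

    // The test is phrased in terms of accounts and withdrawals (the domain),
    // not in terms of whatever helper classes Account might use internally.
    @Test
    public void aWithdrawalReducesTheBalanceByTheWithdrawnAmount() {
        Account account = new Account(10000);

        account.withdraw(2500);

        assertEquals(7500, account.balanceInCents());
    }
}

If Account is later refactored to delegate to helper classes, nothing in this test needs to change.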

Robust tests lead to the “functionality unit” pattern

All of these guidelines together favor a certain type of design pattern. We can call it “functionality unit”. It means that any piece of (non-trivial) low-level functionality is performed by a primary class, optionally supported by a few secondary helper classes. The primary class is often the only publicly visible one and acts as a façade for the functionality performed by the secondary classes. The tests focus their efforts on the primary class and seldom test the helper classes individually, unless there is a special reason such as high algorithmic complexity. They are expressed in the language of the business functionality the primary class is supposed to perform.

Designing and testing in this way makes robust unit tests possible because it:

  • Focuses on a level low enough to unit test effectively while high enough to be reasonably stable.
  • Doesn’t require mocking since unit tests see the helper classes as internals of the primary class.
  • Focuses on functionality performed by the primary class rather than the secondary ones.
  • Creates tests which “make sense” because they are expressed in domain language.

Let us look at an example

In this example the functionality in question is to parse a certain type of document. We have a primary class Parser which is quite big. It has over 1000 lines of code and is rather hard to understand, so we decide to split it up. The good part is that it is well unit tested with multiple test classes testing from different angles. To make the code clearer, we conclude that extracting the two secondary classes Foo and Bar would be a good idea. It looks like this.

Depending on how you structure your tests, they may be more or less robust.
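As a sketch, the extracted structure could look something like this. Only the names Parser, Foo and Bar come from the example; Document, the method names and the (trivial) method bodies are invented here, and everything is compressed into one listing for brevity.

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Parser stays the only public class and acts as a façade for the
// functionality; Foo and Bar are package-private helpers.
public class Parser {

    private final Foo foo = new Foo();
    private final Bar bar = new Bar();

    public Document parse(String input) {
        return bar.assemble(foo.tokenize(input));
    }
}

class Foo {
    List<String> tokenize(String input) {
        return input.trim().isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(input.trim().split("\\s+"));
    }
}

class Bar {
    Document assemble(List<String> tokens) {
        return new Document(tokens);
    }
}

class Document {

    private final List<String> tokens;

    Document(List<String> tokens) {
        this.tokens = tokens;
    }

    public boolean isEmpty() {
        return tokens.isEmpty();
    }
}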

The question then becomes, what do we do with the tests?

First, we should note that the existing tests help us make the refactoring safely. They will (hopefully) break if we actually change the functionality of the Parser class. But what about after the refactoring? Should we keep the tests as they are or should we split them up into separate unit tests for each class? As always, the answer is “it depends”.

The alternative to the left represents keeping the tests more or less as they are. We save time by reusing existing tests. We test in the language of the domain. We avoid mocking because ParserTest doesn’t try to isolate Parser from Foo or Bar. To the right we have the other alternative where we rewrite most of the tests to test each individual class. This also has benefits. We follow the very straightforward and intuitive pattern of having one test class per implementation class. Problems in the Foo or Bar classes might be even simpler to find with focused tests.

However, regarding robustness, we can ask one very important question. In which of the two alternatives would the tests survive a major implementation code refactoring? Say that we merge Bar back into Parser, or split Foo into Apple and Banana. Such a scenario would require much work with the tests in the right-hand alternative, while most likely none at all in the left-hand alternative. This is a major strength of the left-hand alternative, as well as of the “functionality unit” pattern outlined above. By sometimes viewing a small group of highly related classes as a unit rather than an individual class, we get more robust tests.
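As a sketch of the left-hand style (reusing the invented Parser and Document from the listing above), the test only touches Parser’s public API, so it is indifferent to how the helper classes are arranged:

import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class ParserTest {

    // The test knows nothing about Foo or Bar, so merging Bar back into
    // Parser or splitting Foo into Apple and Banana cannot break it.
    @Test
    public void parsingAnEmptyStringReturnsAnEmptyDocument() {
        Document document = new Parser().parse("");

        assertTrue(document.isEmpty());
    }
}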

Don’t test private methods

A common question when it comes to unit testing is:

How do I test private methods?

There are in fact a number of possible ways to do this.

  • Create focused tests for public methods which are customized to exercise the private method we’re interested in, even though this is hard and creates difficult-to-understand tests.
  • Use a tool that allows you to test private methods, such as PowerMock or Java’s reflection API, even though your tests become tightly coupled to the implementation.
  • Increase the method’s visibility to default or protected and call it from a normal unit test in the same package or through a subclass, even though this exposes the class’s internal workings.

As you might imagine from the descriptions above, I believe that these strategies in most cases are wrong (even though they all work and could be useful once in a blue moon).

You get less maintainable code

Testing should generally test behavior rather than implementation – what the result is rather than how it is done. A private method is by definition an implementation detail. It should be up to the implementor to rearrange the internals of the class in any way she sees fit, including having as many or few private methods as she wants. Therefore, we should not have a test which looks into the implementation and makes the existence of a private method into a requirement.

Doing this not only violates the privacy of the object under test, it also couples the test more tightly with the implementation. This leads to more brittle tests and code which is harder to refactor. All in all, testing private methods is unnecessarily invasive and leads to less maintainable code.

It should be noted that “white box testing” (writing tests with knowledge of how the code under test works internally) does not mean that you must tightly couple your tests to the implementation. It just means that you can write clever tests which precisely target critical points in the implementation code. You can (and should) still write your tests in terms of behavior.

An opportunity to improve the design

When you feel the need to test a private method, don’t ask “How do I test private methods?” Instead, ask “Why do I need to test this private method?” In many cases, wanting to test a private method indicates a design fault: a violation of the Single Responsibility Principle. The tests are often trying to tell you that the class under test is doing work enough for two, and that the private method is complex enough to be worthy of a separate class.

The need to test a private method often indicates a new class waiting to get out.

My suggestion when you feel the need to test a private method is therefore to see if you can move the private method out of the current class in a way that not only makes it testable, but also improves the design.

The simplest way to do this is often to move the private method to a new class, along with any other private methods it uses, and make it public. Then make the original code use this new class to do the work of the private method. The image below illustrates such a case.

Extract a private method in need of testing into a separate class.
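As a hedged sketch of such a move (Invoice, DiscountCalculator and the discount rule are all invented for illustration):

import java.util.ArrayList;
import java.util.List;

// Before: the logic we would like to test hides in a private method.
class InvoiceBefore {

    private final List<Double> lines = new ArrayList<Double>();

    public void addLine(double amount) {
        lines.add(amount);
    }

    public double total() {
        double sum = 0;
        for (double line : lines) {
            sum += line;
        }
        return applyDiscount(sum); // the part we actually want to test
    }

    private double applyDiscount(double amount) {
        return amount > 1000 ? amount * 0.9 : amount;
    }
}

// After: the private method has become a small public class of its own,
// which can be unit tested directly. The invoice simply delegates to it.
class DiscountCalculator {
    public double applyDiscount(double amount) {
        return amount > 1000 ? amount * 0.9 : amount;
    }
}

class Invoice {

    private final List<Double> lines = new ArrayList<Double>();
    private final DiscountCalculator discount = new DiscountCalculator();

    public void addLine(double amount) {
        lines.add(amount);
    }

    public double total() {
        double sum = 0;
        for (double line : lines) {
            sum += line;
        }
        return discount.applyDiscount(sum);
    }
}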

In some cases we don’t need to create a new class. Instead, we can sometimes make the private method into a public method on one of the classes it takes as arguments, especially if the method in question is static.
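A small sketch of that variant (OrderService, Customer and the eligibility rule are invented):

// Before: a private static helper inside a hypothetical OrderService class.
//
//     private static boolean isEligibleForDiscount(Customer customer) {
//         return customer.yearsAsMember() > 5;
//     }

// After: the logic lives on Customer itself and is trivially testable.
public class Customer {

    private final int yearsAsMember;

    public Customer(int yearsAsMember) {
        this.yearsAsMember = yearsAsMember;
    }

    public boolean isEligibleForDiscount() {
        return yearsAsMember > 5;
    }
}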

To summarize, if you have a hard time figuring out how to test some code (e.g. because it is private), it often means that the design is wrong. Fix the design issue rather than using brute force to test.

Don’t be afraid of long test names

When writing unit tests, I often write long test method names such as the following. The example is from a data access class.

aFooIsPersistedAsABar()

anExistingBazIsReusedWhenPersistingAFoo()

anExistingBarIsUpdatedWhenReceivingAnotherFooForSameQux()

anExistingBarIsRemovedWhenThereIsNoCorrespondingFoo()

To some people, method names as long as this may feel… wrong. Simply too long. I think this is taking a rule which is good in one context and treating it as an absolute law. Short method names are preferable in implementation code, because you typically make one or more calls to the method. If the method name is too long, the calling code becomes awkwardly word-wrapped and hard to read.

This is not the situation with tests, however. You typically never make a call to one of the test methods manually – the testing framework does this for you.

Some people don’t even seem to think very long method names are allowed. They are, and they can be as long as you want. (Almost.) The Java Language Specification puts it like this:

An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.

Of course, for most rules there is an equally valid rule saying the opposite. An unnecessarily long name just gets hard to read for no benefit. You don’t want your test names to extend into short stories, and you most certainly want to stay within the allowed line length you are using. But if you are used to writing test method names that are 5-15 characters long, go ahead and be a bit extravagant. Explain to me what the test actually does. Tell me both the input and the output. Use as many letters as you need (but no more).

Don’t prefix your test names with “test”

Please don’t start every JUnit test method name with the word “test”.

@Test
public void testParsingAnEmptyStringShouldReturnAnEmptyDocument();

I know JUnit used to require this convention in order to figure out which methods to run. That was before JUnit 4 came out… 6 years ago. There is no need to include it any more.

@Test
public void parsingAnEmptyStringShouldReturnAnEmptyDocument();

We know it is a test from the @Test annotation just before it. Having every test method start with “test” is about as useful as having every method start with “method”. If every method has it, it carries no information at all. The same goes for other prefixes, such as “verify”.

The same goes for the setUp() and tearDown() methods of JUnit 3. Now we use @Before and @After, so please name the method after what it actually does.

@Before
public void initializeObjectDatabase();

@After
public void releaseFileHandles();

How to unit test code calling a static method

Common questions from people trying to introduce unit tests in a legacy code base are

“How do I unit test code which calls a static method?”

or

“How can I mock a static method?”

The situation is often like the picture below. We have a class that we want to test. We try to isolate that class as much as possible, but fail to properly isolate it because it makes calls to static methods which have side effects in the system.

A test for a class which calls a static method.

Adapting the test leads to poor tests

While there are mocking tools which can fake these static methods, I feel that is the wrong way to go. This technique couples the test more tightly to the implementation, because a large part of the logic from the static method now gets copied into the test setup.

Another option is to initialize the data needed for that static method to run, removing the need to isolate the class under test from the static dependency. This often means setting system properties or, in an HTTP-based system, putting things on the session or request. Doing this ranges from inconvenient to virtually impossible, depending on how complex the required setup is.

All in all, I don’t believe these situations are best solved by being clever in the test.

Try turning the static method into a non-static one

The simple truth is that using static methods is basically not compatible with unit testing – static methods produce code with low “testability”. Therefore, the “obvious” and often best solution is to turn the static method into a non-static one. Let the class under test create its own instance of it, or provide an instance as a dependency. How to do this depends largely on what the method does. If it has no side effects this is often rather simple, and it may be the best way forward. In some cases, you can move the method to one of the classes it takes as parameters. However, many times this is very hard, especially when the method has important side effects.
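As a sketch (PriceCalculator, ReportGenerator and the VAT rate are invented), the change can be as small as dropping the static keyword and handing an instance to the class that needs it:

// Before: a static utility call buried inside the class under test.
//
//     double priceWithVat = PriceCalculator.withVat(netPrice);

// After: the calculator is an ordinary instance method...
public class PriceCalculator {
    public double withVat(double netPrice) {
        return netPrice * 1.25;
    }
}

// ...and the class under test receives the calculator as a dependency,
// so a unit test can pass in a stubbed or specially configured instance.
class ReportGenerator {

    private final PriceCalculator calculator;

    ReportGenerator(PriceCalculator calculator) {
        this.calculator = calculator;
    }

    String priceLine(double netPrice) {
        return "Total incl. VAT: " + calculator.withVat(netPrice);
    }
}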

Wrap the static method in a wrapper object

Often, these static methods are old, complex, used in many places, and, because of their static nature, they affect the whole system. Perhaps such a method is responsible for returning a singleton object, keeping track of whether we are in a certain mode or not, or something similar.

In these cases, a way forward is to create an intermediate object, a wrapper. This object takes the place of the class with the static method in the class under test, and in its first implementation it simply delegates to the static method in question. This wrapper object can then be provided as a dependency to the class under test. In the unit test, we now have the power to create a faked version of the wrapper which we provide to the class under test. This allows us to stub, spy or mock as much as we want. A setup like this for the above example could look as follows.

Using a wrapper class to separate the class under test from a static dependency.
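A sketch of such a setup, with all names invented (LegacySettings stands in for the class with the problematic static method) and everything compressed into one listing:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

// The legacy static method we cannot easily get rid of yet.
class LegacySettings {
    static boolean isFeatureEnabled(String feature) {
        // Global state; here it simply reads a system property.
        return Boolean.getBoolean("feature." + feature);
    }
}

// The wrapper: its first implementation just delegates to the static method.
class SettingsWrapper {
    boolean isFeatureEnabled(String feature) {
        return LegacySettings.isFeatureEnabled(feature);
    }
}

// The class under test depends on the wrapper instead of the static method.
class Exporter {

    private final SettingsWrapper settings;

    Exporter(SettingsWrapper settings) {
        this.settings = settings;
    }

    String export(String document) {
        return settings.isFeatureEnabled("compactExport") ? document.trim() : document;
    }
}

// In the unit test we hand the class under test a faked wrapper,
// so no static state is involved at all.
public class ExporterTest {

    @Test
    public void exportIsCompactedWhenTheCompactExportFeatureIsEnabled() {
        SettingsWrapper featureAlwaysOn = new SettingsWrapper() {
            @Override
            boolean isFeatureEnabled(String feature) {
                return true;
            }
        };

        assertEquals("data", new Exporter(featureAlwaysOn).export("  data  "));
    }
}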

When we have created the wrapper, we can deprecate the old method, encouraging everyone to use the wrapper instead. After a while, when all invocations of the static method have been changed to go through the wrapper, we can re-implement the wrapper to provide its own, real implementation. Then, finally, we can remove the static method.

Possible drawback

Perhaps the most obvious drawback is that if we have a large number of static methods, we tend to get a large number of wrapper classes. This should hopefully be a temporary state until all invocations have been updated, though in many systems I can see how this would become a rather permanent state.

Actually deprecating the static method in favor of the wrapper could at least help to alleviate this somewhat, by making other developers aware of the wrapper’s existence. Another way to reduce the problem is to let a single class act as wrapper for multiple static methods. This of course works best if the methods are logically related.

Tell me what to expect

Working with a large code base, you get to see many different styles of unit testing. One of the aspects that I find interesting is the naming of the test methods.

Below are some examples of styles that I often come across. For the example, I use a hypothetical parser which is given an empty string as input.

testParse()

testParse_EmptyString()

testParseEmptyString()

parseEmptyString()

While all of these have their pros and cons, my primary objection is one they all share – they don’t tell me what should happen! They are as informative as writing assertEquals(actual) rather than assertEquals(expected, actual). I want to see something along the lines of these.

parseEmptyStringShouldReturnEmptyDocument()

testParse_EmptyString_EmptyDocument()

parsingAnEmptyStringReturnsAnEmptyDocument()

These variants tell me not only what the stimulus is, but also the expected outcome. I want it both. Stimulus and response. Input and output. Initial state and resulting behavior.

Another way to see it is that the tests become a requirement specification for the implementation. One could actually write the corresponding implementation code based on the latter test names; try doing that from the first ones.
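For example, the name parsingAnEmptyStringReturnsAnEmptyDocument() dictates a first slice of the implementation almost by itself (Parser and Document are placeholder names, and Document.empty() is an invented factory method):

class Document {
    static Document empty() {
        return new Document();
    }
}

public class Parser {

    public Document parse(String input) {
        if (input.isEmpty()) {
            return Document.empty(); // exactly what the test name demands
        }
        // Parsing of non-empty input gets driven out by later tests.
        throw new UnsupportedOperationException("not implemented yet");
    }
}

Try deriving anything like that from a name such as testParse().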

The Test-Driven Development Duel

What I call the Test-Driven Development Duel (or test duel for short) is a simple and fun way to use test-driven development and pair programming together. It is all about challenging each other to implement tests with as little code as possible. The rules are rather simple, but surprisingly productive. It can be especially fun if you are somewhat competitive as a person. 🙂

  1. Alice writes a failing test, and hands the keyboard over to Bob.
  2. Bob tries to make the test pass with as little code as possible.
  3. Bob writes a new failing test, and hands the keyboard to Alice.
  4. Alice tries to make the test pass with as little code as possible.
  5. Restart from step 1.

Graphical representation of the TDD Duel rules.

  • Refactor now and then, as necessary.
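To illustrate step 2, one round of the duel could look like this (Calculator is an invented example). Bob gets away with the bare minimum, which forces Alice to write a sharper second test, such as onePlusTwoIsThree():

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class CalculatorTest {

    // Alice's first failing test.
    @Test
    public void twoPlusTwoIsFour() {
        assertEquals(4, new Calculator().add(2, 2));
    }
}

// Bob's "as little code as possible" answer: a hard-coded result.
class Calculator {
    int add(int a, int b) {
        return 4;
    }
}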

This is also a very good way to teach test-driven development.