A Refactoring Micro-Example

 

Dave Parsons – Valtech

david.parsons@valtech.co.uk

www.dparsons.co.uk

 

This article has two key objectives. First, to provide a fast track ‘micro-example’ of refactoring and second, to reflect on some issues raised by that example. If you’ve never quite got around to reading Martin Fowler’s ‘Refactoring’ book, the first part of this article will give you a quick overview. If you are already familiar with refactoring, you might find some of the questions raised at the end worth pondering.

 

What is refactoring?

 

Refactoring is the process of improving the design of existing code without changing its observable behaviour. It is an ongoing process of continual cleanup that hopefully mitigates the effect of software entropy, the decay caused by retrofitting new requirements and bug fixes to an existing design. It is an approach to development promoted by (among many others) Martin Fowler, Kent Beck and Ward Cunningham, based on well-established principles of code quality. The seminal publication on refactoring is Martin Fowler’s ‘Refactoring – improving the design of existing code’, a book that has a catalogue of 80 refactorings. You will find more on the dedicated web site, www.refactoring.com.

 

One key metaphor in refactoring is that of replacing code that ‘smells’. This is typically code that is suffering from rot, and may include signs of decay such as:

  - duplicated code
  - long methods and large classes
  - long parameter lists
  - switch statements based on type codes
  - comments used to excuse obscure code

Most or all of these will probably be familiar to you as signs of code that is not ideal. The key thing about refactoring is that we often already know what we should do. As developers we recognise when we write code that smells – when we are hacking rather than engineering – but in some cases we feel powerless to remedy this. We may find that what we write today does not always fit well with what we wrote yesterday, and we know we should restructure what we already have to provide an elegant solution, but the temptation is to simply bolt on something ugly that works for the moment – we need it now!

 

What we should understand is that refactoring is not a waste of time. It does not add business value immediately, because the functionality of the code does not change, but it pays off later in terms of maintainability and understandability. However, it must be disciplined, because changes to working code can introduce bugs. For refactoring to work, a test framework is essential. Every refactoring step must be tested before we move on to the next, and in a complex refactoring there may be many steps. We must also be careful to follow through the detail of each refactoring process – you can break things if you try to do too much at a time or ignore a crucial step in a particular refactoring process. In summary: work in small steps, recompile after every change, test after every change. To do this we need unit tests for every piece of code that we refactor. These need to be automatic, self-checking and run at every opportunity. To quote Martin Fowler, ‘If you want to refactor, the essential precondition is having solid tests.’

 

Refactoring improves the design of software, makes it easier to understand, helps you find bugs and helps you program faster. How is it faster? Well, as Ron Jeffries says, ‘The result of this is that the system is always as simple as we can make it, which means we can understand it better, which means we can change it more rapidly while keeping it reliable. Try it, you'll like it...’

 

When, then, should you refactor? A key indicator of smelly code is finding yourself writing something that looks remarkably like another piece of code you already wrote. Fowler’s book includes the ‘Rule of Three’ about duplicated code. This says that the first time you write something, you just do it. The second time you wince at the duplication, but do it anyway. The third time, you refactor. As well as refactoring when you spot duplicated code, you should also refactor when you add functionality, need to fix a bug or do a code review. That said, you don’t need a cue to refactor – you should do it all the time. There is one proviso, however, which is the ‘two hats’ rule. This addresses the important issue of separating your roles between developer and refactorer. When developing, you are adding functionality or fixing bugs. When you are refactoring you are reorganising what you already have. Don’t mix these activities. You have two different hats, developer and refactorer, and you can only wear one hat at a time. Since refactoring is a key component of extreme programming, you may be pair programming. Likewise you should do pair refactoring, always wearing matching hats.

 

The micro example

 

Having set the scene for refactoring, we now move on to the ‘refactoring micro-example’. This was originally developed in order to distil the example program from Fowler’s ‘Refactoring’ into a form suitable for slide presentation. To that end it covers pretty much the same refactoring ground that Fowler’s larger example covers, with one or two variations. Of course it is trivial, being so small, but condenses many refactorings into a few lines of code.

 

Here, then, is our (smelly) code to refactor, a method of some unspecified dice game class that ‘throws’ a couple of dice and returns the result. We assume here that ‘dice’ is an array of ‘Die’ objects that have access methods for a ‘faceValue’ property:

 

public int getScore()
{
    int result;
    result = (int)(Math.random() * 6) + 1;
    dice[0].setFaceValue(result);
    result = (int)(Math.random() * 6) + 1;
    dice[1].setFaceValue(result);
    int score = dice[0].getFaceValue() +
                dice[1].getFaceValue();
    return score;
}

 

Please bear with me here. I know you would never write this sort of thing (as if), but it is an example designed to show many refactorings in a small piece of code. A number of different refactorings can be applied to begin to sort out this mess, but we must avoid the temptation to fix everything at once. We must be disciplined and apply (and test) one at a time. Before reading on, it would be helpful for our later discussion to take a few moments to note what you would change and, perhaps more importantly, what you would change first (and why).

 

Before we touch the code, we must make sure we have a unit test in place. In this case we need to be sure that the values returned by ‘getScore’ fall in the acceptable range. For two standard dice, this would be an integer in the range 2 – 12. The compiler will do the type checking for us, and a test framework such as JUnit can check the values:

 

      assertTrue(diceValue >= 2 && diceValue <=12);

 

We should also check that the numbers do change each time. We could use a JUnit decorator (junit.extensions.RepeatedTest) to do this by calling the method several times, though confirming that the output varies would need a visual check of console output. Here we can already see that bad code is hard to test: since nothing is encapsulated, we cannot control the dice numbers and test against expected values. One advantage of refactoring is that it leads to smaller, more focused methods that are easier to write tests for.
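To make the testing point concrete, here is a minimal, self-contained sketch of such a test in plain Java. The Die and DiceGame class shapes are assumed for illustration (in practice this would be a JUnit test, and the variability check replaces the visual inspection with a count of distinct scores):

```java
import java.util.HashSet;
import java.util.Set;

// Minimal stand-ins for the article's classes (names assumed)
class Die {
    private int faceValue = 1;
    public int getFaceValue() { return faceValue; }
    public void setFaceValue(int v) { faceValue = v; }
}

class DiceGame {
    private Die[] dice = { new Die(), new Die() };

    // the smelly method under test, as in the article
    public int getScore() {
        int result;
        result = (int) (Math.random() * 6) + 1;
        dice[0].setFaceValue(result);
        result = (int) (Math.random() * 6) + 1;
        dice[1].setFaceValue(result);
        return dice[0].getFaceValue() + dice[1].getFaceValue();
    }
}

class GetScoreTest {
    public static void main(String[] args) {
        DiceGame game = new DiceGame();
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            int score = game.getScore();
            // every score must fall in the range 2..12
            if (score < 2 || score > 12)
                throw new AssertionError("score out of range: " + score);
            seen.add(score);
        }
        // over many throws we expect more than one distinct value
        if (seen.size() < 2)
            throw new AssertionError("dice never varied");
        System.out.println("all scores in range; distinct values: " + seen.size());
    }
}
```

Repeating the throw a thousand times makes the variability check automatic rather than visual, which is the kind of self-checking test the refactoring discipline requires.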

 

Now to apply some refactorings, one at a time. During this process we will apply a number of refactorings from the book, namely:

  1. Self Encapsulate Field
  2. Extract Method
  3. Rename Class / Method / Field
  4. Replace Temp With Query
  5. Move Method
  6. Replace Conditional With Polymorphism
  7. Replace Magic Number With Symbolic Constant

Refactoring number 1 - Self Encapsulate Field

 

This refactoring tells us not to directly access an object’s fields within its methods, but to use accessor methods. This can be useful for polymorphism, lazy initialisation and other aspects of encapsulation. It is not a refactoring that is always seen as essential (or even desirable) for simple fields, but it can enable changes to representation without breaking client code in the same class. What if, for example, we replace the dice array with a Collection? We will break the code that directly accesses the array field:

 

dice[0].setFaceValue(result);

 

If we encapsulate the references to the dice, the array could be replaced by a Collection without breaking our code:

 

getDice(0).setFaceValue(result);

 

The modified method with an encapsulated (array) field looks like this (the changed lines are the calls to getDice):

 

public int getScore()
{
    int result;
    result = (int)(Math.random() * 6) + 1;
    getDice(0).setFaceValue(result);
    result = (int)(Math.random() * 6) + 1;
    getDice(1).setFaceValue(result);
    int score = getDice(0).getFaceValue() +
                getDice(1).getFaceValue();
    return score;
}
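The accessor itself is not shown in the original example; a plausible sketch (field and method names assumed) makes clear why the change insulates clients from the representation:

```java
// Minimal stand-ins for the article's classes (names assumed)
class Die {
    private int faceValue = 1;
    public int getFaceValue() { return faceValue; }
    public void setFaceValue(int v) { faceValue = v; }
}

class DiceGame {
    // today the representation is an array...
    private Die[] dice = { new Die(), new Die() };

    // ...but all access goes through one method
    private Die getDice(int index) {
        return dice[index];
    }

    // If we later replace the array with a List, only getDice changes:
    //
    //   private List<Die> dice =
    //       new ArrayList<>(List.of(new Die(), new Die()));
    //   private Die getDice(int index) { return dice.get(index); }
    //
    // Every caller of getDice(i) compiles and runs unchanged.

    public int getScore() {
        int result;
        result = (int) (Math.random() * 6) + 1;
        getDice(0).setFaceValue(result);
        result = (int) (Math.random() * 6) + 1;
        getDice(1).setFaceValue(result);
        return getDice(0).getFaceValue() + getDice(1).getFaceValue();
    }
}

class Demo {
    public static void main(String[] args) {
        System.out.println("score: " + new DiceGame().getScore());
    }
}
```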

 

Refactoring number 2 - Extract Method

 

In this refactoring, we split up a long method into smaller methods. A good indicator of the need for this refactoring is a comment that describes what is happening. Code that is obscure enough to require a comment can be replaced by a method with a meaningful name, e.g., we might replace this

 

      // roll the die
      result = (int)(Math.random() * 6) + 1;

 

with this:

 

      result = rollDie();

 

This is the code after we have extracted the method:

public int getScore()
{
    int result;
    result = rollDie();
    getDice(0).setFaceValue(result);
    result = rollDie();
    getDice(1).setFaceValue(result);
    int score = getDice(0).getFaceValue() +
                getDice(1).getFaceValue();
    return score;
}

public int rollDie()
{
    return (int)(Math.random() * 6) + 1;
}

 

Refactoring number 3 - Rename Class / Method / Field etc.

 

Changing the names in code (of classes, methods, variables etc.) to be more meaningful can make a positive contribution to code readability. This can depend on the context as well as local naming. For example our getScore method might cause confusion if there is also a method to get player scores. We might choose to rename our method to throwDice to be more explicit about what it does:

 

      public int throwDice()
      {
          ...
      }

 

This is a rather subjective refactoring because developers might disagree about what is meaningful. However, it is an important refactoring and one that is widely supported by development tools.

 

Refactoring number 4 - Replace Temp With Query

 

This refactoring encourages us to use methods directly in code rather than storing their results in temporary variables. This has a negligible effect on performance while improving readability, and is a useful complement to extracting methods, so we often use these refactorings together.

 

If a temporary variable stores the result of an expression, we first extract the expression into a method, then replace all references to the temporary variable with calls to the method. This both removes the unnecessary temporary variable and gives us a reusable method. We can do this with the last part of our method, which uses a temporary score variable to store the result of adding the dice values together:

 

      int score = getDice(0).getFaceValue() +
                  getDice(1).getFaceValue();
      return score;

 

We first extract the method (in this example we will call it getDiceValue) and then replace the score variable with the method call.

 

The code after extracting the method and replacing the temp with a query looks like this:

public int throwDice()
{
    int result;
    result = rollDie();
    getDice(0).setFaceValue(result);
    result = rollDie();
    getDice(1).setFaceValue(result);
    return getDiceValue();
}

public int rollDie()
{
    return (int)(Math.random() * 6) + 1;
}

int getDiceValue()
{
    return getDice(0).getFaceValue() +
           getDice(1).getFaceValue();
}

 

Of course we could (should?) have done these refactorings one at a time, first replacing the temporary variable by returning the expression directly, then running our tests, then extracting the method before testing again.

 

Refactoring number 5 - Move Method

 

This refactoring involves moving a method from one class to another, so can potentially be quite difficult because of the possible side effects. Some of your unit tests will also need to be changed to match the updated client interfaces. For this reason your test code needs to be under the same version control as your application code.

 

Sometimes we see that code is using information that comes largely from other classes. Such code may be better moved to one or more of those other classes. As well as reducing coupling between objects, this can often overcome the ‘data object’ problem, where objects have no useful operations (the ‘value object’ pattern is an exception to this, but is a special case). In our example, the dice objects are data objects with accessors. It would be better to let each die manage its own state by moving rollDie to the Die class, changing it to set the state of the Die, and renaming it roll (the ‘Die’ part of the name seems rather redundant in the context of the Die class, so the ‘rename’ refactoring is worth doing here):

 

public void roll()
{
    setFaceValue((int)(Math.random() * 6) + 1);
}

 

The code after moving the method is beginning to look much cleaner. We no longer need the result variable or calls to the ‘set’ method, and the level of code duplication is significantly reduced.

 

public int throwDice()
{
    getDice(0).roll();
    getDice(1).roll();
    return getDiceValue();
}

int getDiceValue()
{
    return getDice(0).getFaceValue() +
           getDice(1).getFaceValue();
}

 

Refactoring number 6 - Replace Conditional With Polymorphism

 

There is perhaps more that we could do here, but first we will focus our attention on the Die class, which now contains our ‘roll’ code. We will assume here that the Die class contains some kind of type switch that allows the die to be loaded (i.e. to ‘fix’ its possible values). This refactoring says that switch statements that depend on the type of an object should be replaced with class hierarchies and polymorphic methods. This enables client code to remain the same even if new types are added to the system. For our example, we will assume that a Die has a type field that is set to either NORMAL or LOADED:

           

      static final int NORMAL = 1;
      static final int LOADED = 2;
      private int type;

 

The roll method uses different algorithms for different types, selecting them with a switch statement based on the type field:

 

public void roll()
{
    switch(getType())
    {
        case NORMAL:
            setFaceValue((int)(Math.random() * 6) + 1);
            break;
        case LOADED:
            // random is a static java.util.Random object – easier
            // to fix a range of integers (2 – 6) than Math.random
            setFaceValue(random.nextInt(5) + 2);
            break;
        default:
            setFaceValue(1);
    }
}

 

To replace this conditional with polymorphism we need to introduce a polymorphic classification hierarchy, where the Die class is an abstract superclass of ‘NormalDie’ and ‘LoadedDie’. The polymorphic subclass methods now look like this:

NormalDie implementation

public void roll()
{
    setFaceValue((int)(Math.random() * 6) + 1);
}

LoadedDie implementation

public void roll()
{
    setFaceValue(random.nextInt(5) + 2);
}
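Putting the pieces together, the hierarchy might look something like this (a sketch: the abstract superclass, the protected setter and the static Random field are assumed details not spelled out in the article):

```java
import java.util.Random;

abstract class Die {
    private int faceValue = 1;
    public int getFaceValue() { return faceValue; }
    protected void setFaceValue(int v) { faceValue = v; }
    // each subclass supplies its own rolling algorithm
    public abstract void roll();
}

class NormalDie extends Die {
    public void roll() {
        setFaceValue((int) (Math.random() * 6) + 1);
    }
}

class LoadedDie extends Die {
    private static final Random random = new Random();

    public void roll() {
        // nextInt(5) yields 0..4, so the face is fixed to the range 2..6
        setFaceValue(random.nextInt(5) + 2);
    }
}

class Demo {
    public static void main(String[] args) {
        // client code works through the Die type only; adding a new
        // die type needs no change here
        Die[] dice = { new NormalDie(), new LoadedDie() };
        for (Die d : dice) {
            d.roll();
            System.out.println(d.getClass().getSimpleName()
                    + ": " + d.getFaceValue());
        }
    }
}
```

Note that the type field, the NORMAL/LOADED constants and the switch have all disappeared: the type is now expressed by the class of the object itself.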

 

Refactoring number 7 – Replace Magic Number With Symbolic Constant

 

What further improvements might we make to this code? Well, those hard-coded values don't look too good:

 

      public void roll()
      {
          setFaceValue(random.nextInt(5) + 2);
      }

 

We should replace these with constants, both to improve the readability of the code and to guarantee that changes to these values only need to be made in one place. The Die class might usefully declare these fields:

 

public static final int MAX_RANDOM = 6;
public static final int MIN_DICE_VALUE = 2;

 

Why 6 rather than 5? Because the value we actually want to express is the maximum value of the die. Unfortunately the nextInt method doesn't quite play ball here: nextInt(n) returns a value from 0 (inclusive) up to n (exclusive), so we will have to adjust either the loading increment or the maximum value to get the result we want. Here's a solution that adjusts the maximum value:

           

public void roll()
{
    setFaceValue(random.nextInt(MAX_RANDOM - 1) + MIN_DICE_VALUE);
}

 

Refactoring number n? – what else could we do?

 

The more we refactor, the clearer our architecture becomes, and the more its shortcomings are exposed.

 

How about revisiting this method?

 

public int throwDice()
{
    getDice(0).roll();
    getDice(1).roll();
    return getDiceValue();
}

 

There is still something here that looks like unnecessary duplication. At first glance we might suggest a refactoring such as ‘replace indexed access with iteration’; after all, we can easily iterate over a collection of Die objects. You won't find this in the book, because actually it's a red herring. The key question here is not how to get rid of the indexed access but to ask ‘do I really need it?’. If we have a requirement for two dice, do we really need to be able to use a different number of dice? Or should we just have two named Die objects (with compile-time type checking – no danger of an invalid index here!):

 

getFirstDie().roll();
getSecondDie().roll();

 

Well, it's not a very exciting discussion point with two dice, but it underlines how important architectural decisions are. After all, we could have refactored our array into a collection for nothing, if we didn't even need the functionality of an array (let alone that of a collection).
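A sketch of what the two-named-dice version of the game class might look like (the field and accessor names are assumed for illustration):

```java
// Minimal Die stand-in (names assumed)
class Die {
    private int faceValue = 1;
    public int getFaceValue() { return faceValue; }
    public void roll() { faceValue = (int) (Math.random() * 6) + 1; }
}

class DiceGame {
    private final Die firstDie = new Die();
    private final Die secondDie = new Die();

    private Die getFirstDie() { return firstDie; }
    private Die getSecondDie() { return secondDie; }

    public int throwDice() {
        getFirstDie().roll();
        getSecondDie().roll();
        // no index, so no possibility of an invalid one
        return getFirstDie().getFaceValue()
             + getSecondDie().getFaceValue();
    }
}

class Demo {
    public static void main(String[] args) {
        System.out.println("throw: " + new DiceGame().throwDice());
    }
}
```

The array, the accessor-by-index and the associated bounds risk are all gone, because the requirement never called for a variable number of dice in the first place.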

 

While we are looking a bit deeper at our design, what about that polymorphic hierarchy we built? Again, do we really need it? At first glance we have two different algorithms, implying the need for a polymorphic method, but this is really another red herring. What we have are two different ways of generating a range of random integers, both unencapsulated. The apparent requirement for polymorphism disappears if we consider extracting the method, maybe to something like this:

 

int getRandomInt(int min, int max);

 

This has an interesting effect on how we perceive the code. As soon as we extract the method we have a single point for the algorithm: instead of using two different ways of generating random integers, we have one. Does this need to be polymorphic? Hopefully the answer is no – we probably used polymorphism out of laziness. The original ‘roll’ method looks like a case of writing the first algorithm quickly (Math.random out of the box), thinking a little more carefully about the second (the slightly more sophisticated Random class), but never rethinking the original algorithm. The comments give it away, as do the inconsistent magic numbers. Thus another possible refactoring: ‘replace polymorphism with parameterised algorithm’, i.e. if you have two different algorithms that do something similar, consider whether they can be combined into a single algorithm, encapsulated in a parameterised method. This is a very common concept in Java. Consider applet and servlet initialisation parameters, EJB environment variables, resource bundles etc. All of these are commonly used to parameterise the behaviour of a single piece of code.
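The parameterised alternative might be sketched like this (the constructor parameters and field names are assumptions, not part of the original example): a single Die class whose range is configured rather than subclassed.

```java
import java.util.Random;

class Die {
    private static final Random random = new Random();
    private final int minValue;
    private final int maxValue;
    private int faceValue;

    // the 'loading' is now a parameter, not a subclass
    public Die(int minValue, int maxValue) {
        this.minValue = minValue;
        this.maxValue = maxValue;
        this.faceValue = minValue;
    }

    // one encapsulated point for random-number generation
    private int getRandomInt(int min, int max) {
        // nextInt(n) returns 0..n-1, so shift into min..max
        return random.nextInt(max - min + 1) + min;
    }

    public void roll() {
        faceValue = getRandomInt(minValue, maxValue);
    }

    public int getFaceValue() { return faceValue; }
}

class Demo {
    public static void main(String[] args) {
        Die normal = new Die(1, 6);   // behaves like NormalDie
        Die loaded = new Die(2, 6);   // behaves like LoadedDie
        normal.roll();
        loaded.roll();
        System.out.println("normal: " + normal.getFaceValue()
                + ", loaded: " + loaded.getFaceValue());
    }
}
```

The class hierarchy collapses back to a single class, the two random-number idioms collapse to one, and the magic numbers move into one clearly named place per die.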

 

Reflecting on the refactoring process

 

Having performed a number of refactorings on our code, we have hopefully achieved two things:

 

  1. Much cleaner and more readable (reusable, maintainable) code
  2. A deeper understanding of our architecture and the implications of the design decisions we have made

 

We can also see that although some refactorings are very simple, it is not always a simple task to refactor in the best possible way. There are several questions that arise from this exercise:

 

  1. How ‘obvious’ does a refactoring need to be? Was your reaction to the original example method ‘Nobody writes code like that…’?

 

  2. Did we refactor in the ‘right’ order? Can refactorings be prioritised? Could we have made life easier by approaching the example in a different way? What would your first refactoring have been? What about the second? Do we start with the simplest and work towards the more complex, or begin by addressing core architectural issues? In our example, we could have started with a very simple Replace Magic Number With Symbolic Constant, or immediately addressed the issue of the indexed access to the array. Did we do the right thing by starting with ‘Self Encapsulate Field’?

 

  3. Are we adding or taking away? Refactorings are often reversible (e.g. ‘Replace Delegation with Inheritance’ versus ‘Replace Inheritance with Delegation’) because context and rationale are important. This means we may do a refactoring only to undo it again. Once more, perhaps the ordering of refactoring is important here.

 

  4. Many tools support some level of automated refactoring, including the Smalltalk refactoring browser, Bicycle Repair Man (for Python refactoring), IDEA, Eclipse etc., but there is little support for identifying the refactoring required. Some metric measures (such as those available in Together) can give us an indication of the ‘smells’ in code, but a more direct link between a failing metric and a suggested refactoring would be a useful tool. For example, applying ‘Move Method’ can mitigate high coupling.

 

Summary

 

In conclusion, refactoring speeds development and reduces code entropy by making code as simple as possible. When we find code that smells, we clean it up by refactoring in the context of rigorous testing. We do one refactoring at a time, even if we know there are several that need to be done, compiling and testing each step as we go. Fowler’s book is a key resource, but does not contain all possible refactorings, nor is it a simple task to apply every refactoring. There is increasing tool support for refactoring but there is still some way to go before tools can do all our refactoring for us.

 

References