Functions Should Be Short And Sweet, But Why?

In my early days of programming, I would write functions good and long. Screen after screen of complicated logic and control structures would fill my coding projects, with only a scant few mega-functions to break things up a bit. It was years before things improved and I learned the value of splitting up those long stretches of coding logic into more manageable pieces. As I continue to improve as a programmer, I find that the functions I write keep getting shorter, more focused, and more numerous, and I keep seeing more benefits to this style of programming.

At first, the main barrier to small functions was likely the difficulty of learning to program in general. (By 'functions' I'm referring to all manner of parametrized blocks of code: functions and routines in procedural languages, methods in object-oriented languages, and lambdas and procedures in functional languages.) Functions are a hard concept to grasp when they're first introduced, and on the surface it looks easier to write everything out in one long stream of logic. Getting the program to do what you want at all is hard enough, so why complicate things by having the execution path jump in and out of all kinds of functions as well? And once all of those chains of if-else statements and nested for loops are working, the last thing you want to do is change anything that could break it again.

Likely the most obvious value-add for functions becomes apparent when you start repeating code. Instead of copy-pasting blocks of code, you can parametrize them into functions and reuse one block of code many times. The long road to understanding DRY (Don't Repeat Yourself) has begun. Yet you may still resist breaking your code into many smaller functions because of the friction involved in doing so. After all, the code is working, and it seems like an awful lot of effort to refactor it all into functions of ten lines or less. Screen-length functions should be good enough, right?
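As a small, hypothetical illustration of that first step (the clamp logic, limits, and names below are invented for this sketch, not taken from a real project): two pasted copies of the same clamping code that differ only in their limits collapse into one function, with the limits turned into parameters.

```c
#include <assert.h>

/* Hypothetical example: instead of pasting this clamp logic once per
 * signal with different limits, the limits become parameters. */
static int ClampReading(int raw, int lo, int hi) {
  if (raw < lo) {
    return lo;
  }
  if (raw > hi) {
    return hi;
  }
  return raw;
}

/* Each former copy-paste site now reduces to a one-line call. */
static int ClampTemperature(int raw) { return ClampReading(raw, -40, 125); }
static int ClampVoltage(int raw)     { return ClampReading(raw, 0, 3300); }
```

A bug fix or limit change now happens in one place instead of in every pasted copy.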

Wrong. There are so many more benefits to writing short and sweet functions, but before getting into those benefits, let's look at an example that we can refer to.

An Example Command Interface


I've been writing a lot of embedded code in C/C++ over the last couple of years, so examples in that vein come easily to me right now. For this example, imagine you're implementing a command interface between a host processor and a slave processor that are connected over a serial interface such as SPI or UART. A command packet will consist of a command, any associated parameters, and an optional block of data. The slave will receive command packets, process them, and reply with a response packet, or with a simple acknowledgement if the host needs no response data. We'll keep it really simple and not allow the slave to initiate any commands.
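One way such a packet might be laid out in C is sketched below. The field names, field sizes, and the BuildPacket() helper are all assumptions made for illustration; a real protocol would fix these in its own specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet layout; field sizes are assumptions for illustration. */
#define MAX_PARAMS 4
#define MAX_DATA   32

typedef struct {
  uint8_t  command;             /* which operation the host is requesting */
  uint8_t  paramCount;          /* number of valid entries in params[] */
  uint32_t params[MAX_PARAMS];  /* command-specific parameters */
  uint16_t dataLength;          /* 0 when the optional data block is absent */
  uint8_t  data[MAX_DATA];      /* optional data block */
} CommandPacket;

/* Fill in a packet, rejecting anything that would overflow the fixed buffers.
 * Returns 1 on success, 0 if the parameters or data do not fit. */
static int BuildPacket(CommandPacket *pkt, uint8_t command,
                       const uint32_t *params, uint8_t paramCount,
                       const uint8_t *data, uint16_t dataLength) {
  if (paramCount > MAX_PARAMS || dataLength > MAX_DATA) {
    return 0;
  }
  memset(pkt, 0, sizeof *pkt);
  pkt->command = command;
  pkt->paramCount = paramCount;
  if (paramCount > 0) {
    memcpy(pkt->params, params, paramCount * sizeof params[0]);
  }
  pkt->dataLength = dataLength;
  if (dataLength > 0) {
    memcpy(pkt->data, data, dataLength);
  }
  return 1;
}
```

Fixed-size buffers are a common choice on small embedded targets because they avoid dynamic allocation; the trade-off is that oversized packets must be rejected explicitly, as BuildPacket() does.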

When I first started programming, I might have implemented the slave's command processing as one massive function, and maybe broken off some repeated code blocks into functions after it was working. But that method of coding imposes a huge amount of cognitive overhead. It would be much simpler to start out with something like this:

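void IntHandler(void) {
  unsigned long rxData = HOST_CMD_READ;  /* default command */
  HostCommand hostCmd;
  IntDisable(INT_SPI0);                  /* mask SPI interrupts while handling */
  SPIDataGet(SPI0_BASE, &rxData);        /* read into a full-width word, not the enum */
  hostCmd = (HostCommand)rxData;
  HandleHostCommand(hostCmd);            /* route to the matching handler */
  IntEnable(INT_SPI0);                   /* allow the next command to interrupt */
}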

This is the interrupt handler for when a SPI peripheral receives data over the serial interface and raises an interrupt in the slave processor. The function simply sets a default command, disables further SPI interrupts, reads the command from the SPI peripheral, handles the command, and re-enables interrupts. The bulk of the work is done in the HandleHostCommand(hostCmd) function call, which looks like this:

void HandleHostCommand(HostCommand hostCmd) {
  switch (hostCmd)
  { 
  case HOST_CMD_NOP:
    HandleNop();
    break;
  case HOST_CMD_READ:
    HandleRead();
    break;
  case HOST_CMD_SET_IP_ADDR:
    HandleSetIP(); 
    break;
  case HOST_CMD_SET_CHANNEL_CONFIG: 
    HandleSetChannelConfig();
    break;
  /*
   * Many more commands and function calls.
   */
  default:
    HandleError();
    break;
  }
}

Again, this is a fairly straightforward function that focuses on one task: routing each incoming command to the function that will handle it. The implementation of each HandleXXX() function is equally simple. Here's the one for a NOP command:

void HandleNop() {
  SPIDataPut(SPI0_BASE, HOST_REPLY_POSACK);
}

This function is especially simple because it implements the no-operation command, which just sends a positive acknowledgement back to the host processor. Other command handling functions will be more complicated, but not by much. I could go on with more functions, but I think you get the idea. So other than making the functions dead simple, what are the advantages to coding this way?
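For a sense of what "more complicated, but not by much" might look like, here is a hypothetical sketch of a set-IP-address handler. The SPI functions are desktop stand-ins backed by an array, and the reply code, the one-octet-per-word parameter format, and the ReadByteParam() helper are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SPI driver calls so the sketch runs on a
 * desktop; on the target these would be the real peripheral functions. */
#define SPI0_BASE 0u
#define HOST_REPLY_POSACK 0xA5u

static unsigned long rxFifo[8];
static int rxHead, rxCount;
static unsigned long lastReply;

static void SPIDataGet(unsigned long base, unsigned long *data) {
  (void)base;
  *data = (rxHead < rxCount) ? rxFifo[rxHead++] : 0;
}

static void SPIDataPut(unsigned long base, unsigned long data) {
  (void)base;
  lastReply = data;
}

/* Device state touched by the command (also an assumption). */
static uint8_t ipAddress[4];

/* Read one parameter word and keep its low byte. */
static uint8_t ReadByteParam(void) {
  unsigned long word = 0;
  SPIDataGet(SPI0_BASE, &word);
  return (uint8_t)(word & 0xFFu);
}

/* Hypothetical SET_IP_ADDR handler: read four octets, store them, acknowledge. */
void HandleSetIP(void) {
  for (int i = 0; i < 4; i++) {
    ipAddress[i] = ReadByteParam();
  }
  SPIDataPut(SPI0_BASE, HOST_REPLY_POSACK);
}
```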

Easier To Write


One of the incredible things about programming this way is that as you are writing, drilling down into each of these simple functions, you quickly realize where you can reuse one small function in another place. Even though it might seem like coding is a little slower to begin with, once you get a few levels deep in a complex task, all of a sudden you find that you're not writing as much logic. You're mostly calling functions you've already written, and the logic that you do still need to write is very straightforward.
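A hypothetical sketch of that effect: once a few tiny helpers exist for reading a parameter, replying, and validating a channel, each new handler is mostly calls to them. The fake SPI layer (arrays instead of a peripheral), the reply codes, and both channel-gain commands are invented for illustration.

```c
#include <assert.h>

/* Hypothetical fake SPI layer: parameters come from an array, replies are logged. */
static unsigned long params[8];
static int paramIndex;
static unsigned long replies[8];
static int replyCount;

static unsigned long ReadParam(void) { return params[paramIndex++]; }
static void SendReply(unsigned long word) { replies[replyCount++] = word; }

#define REPLY_POSACK 0xA5ul
#define REPLY_NEGACK 0x5Aul
#define NUM_CHANNELS 4ul

static unsigned long channelGain[NUM_CHANNELS];

/* Small helpers that every handler ends up reusing. */
static void SendAck(void)  { SendReply(REPLY_POSACK); }
static void SendNack(void) { SendReply(REPLY_NEGACK); }
static int IsValidChannel(unsigned long ch) { return ch < NUM_CHANNELS; }

/* Two hypothetical handlers built almost entirely from the helpers above. */
static void HandleSetChannelGain(void) {
  unsigned long ch = ReadParam();
  unsigned long gain = ReadParam();
  if (!IsValidChannel(ch)) {
    SendNack();
    return;
  }
  channelGain[ch] = gain;
  SendAck();
}

static void HandleGetChannelGain(void) {
  unsigned long ch = ReadParam();
  if (!IsValidChannel(ch)) {
    SendNack();
    return;
  }
  SendAck();
  SendReply(channelGain[ch]);
}
```

The second handler took almost no new logic to write; only the decision of what to send back is specific to it.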

Coding lots of smaller functions allows you to concentrate more easily on the task at hand instead of holding hundreds or thousands of lines of code in your head, and you can make good progress much more quickly and correctly. It also frees up mental resources for solving the more complicated problems in programming, which is what you really want to be spending your mental energy on.

Easier To Read


Besides making code easier to write, lots of short functions make code much easier to read. That may seem counterintuitive: you might expect that having so many functions with only a handful of lines in each would force you to jump around the code a lot more, searching for the implementation of this or that function. Most of the time that's not the case. More often than not, you're looking for a specific block of code to fix a bug or add a feature. Finding that block is actually easier when you don't have to sift through hundreds of lines of irrelevant code in the same function, trying to decide whether any of it influences the bug you're tracking down or whether it will break when you make a change. Once you find the right code, it's already isolated in a nice self-contained package that's easy to understand and change.

Reading and understanding code is also aided by the focused nature of small functions. In the command interface above, the code in each function stays firmly within one layer of abstraction. Low-level peripheral interface code is not mixed with higher-level decision or control structures, which in turn are not mixed with bookkeeping code. It's the Single Responsibility Principle applied to functions. The purpose of each function is immediately apparent, and that helps considerably when you have to come back to it in six months (or five years) to make a change. Small functions are also much more portable, which makes changes even easier.

Easier To Test


Long functions are notoriously hard to test. Making sure all code paths are covered and the code is functionally correct becomes a nightmare. Add interactions with hardware peripherals, as in the example above, and things get even harder, because simulating what the embedded hardware does for offline testing is a real challenge. Testing the code above, however, is quite easy. Because the functions that use the hardware interface are short and simple, the functions that talk directly to the hardware can be mocked without maintaining complicated state across long, involved tests.
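A minimal sketch of that kind of mock, assuming the test build links this file in place of the real SPI driver (here the mock and HandleNop() share one file so the sketch runs on its own). The register-style base address and reply value are placeholders, and the mock and test helper names are hypothetical.

```c
#include <assert.h>

/* Placeholder values; the real ones come from the project's headers. */
#define SPI0_BASE 0x40008000ul
#define HOST_REPLY_POSACK 0xA5ul

/* Mock of the hardware call: instead of touching a peripheral register it
 * records what would have been sent, so a test can inspect it. */
static unsigned long mockPutCount;
static unsigned long mockLastBase;
static unsigned long mockLastData;

void SPIDataPut(unsigned long base, unsigned long data) {
  mockPutCount++;
  mockLastBase = base;
  mockLastData = data;
}

static void MockReset(void) {
  mockPutCount = 0;
  mockLastBase = 0;
  mockLastData = 0;
}

/* Unit under test: same body as the firmware's HandleNop(). */
void HandleNop(void) {
  SPIDataPut(SPI0_BASE, HOST_REPLY_POSACK);
}

/* Returns 1 if HandleNop sent exactly one positive acknowledgement on SPI0. */
static int TestHandleNopSendsPosAck(void) {
  MockReset();
  HandleNop();
  return mockPutCount == 1 && mockLastBase == SPI0_BASE &&
         mockLastData == HOST_REPLY_POSACK;
}
```

The mock needs only three variables of state because HandleNop() makes a single call; a long handler touching the peripheral dozens of times would need a far more elaborate fake.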

Coming up with tests for each function is also straightforward because none of them do very much. For example, the tests for HandleHostCommand() just need to pass in each command and a few invalid commands and make sure the correct function is called each time. Nothing could be simpler.
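Such a test might look like the sketch below. The numeric command codes are assumptions, the real handlers are replaced by stubs that only record which one ran, and the dispatcher is trimmed to the four commands listed in the original switch.

```c
#include <assert.h>

/* Hypothetical command codes; the real enum lives in the project headers. */
typedef enum {
  HOST_CMD_NOP = 0,
  HOST_CMD_READ = 1,
  HOST_CMD_SET_IP_ADDR = 2,
  HOST_CMD_SET_CHANNEL_CONFIG = 3
} HostCommand;

/* Which stub handler ran last; the stubs replace the real handlers in the test build. */
typedef enum { CALLED_NONE, CALLED_NOP, CALLED_READ, CALLED_SET_IP,
               CALLED_SET_CHANNEL, CALLED_ERROR } Called;
static Called lastCalled = CALLED_NONE;

void HandleNop(void)              { lastCalled = CALLED_NOP; }
void HandleRead(void)             { lastCalled = CALLED_READ; }
void HandleSetIP(void)            { lastCalled = CALLED_SET_IP; }
void HandleSetChannelConfig(void) { lastCalled = CALLED_SET_CHANNEL; }
void HandleError(void)            { lastCalled = CALLED_ERROR; }

/* Dispatcher under test, trimmed to four commands. */
void HandleHostCommand(HostCommand hostCmd) {
  switch (hostCmd) {
  case HOST_CMD_NOP:                HandleNop();              break;
  case HOST_CMD_READ:               HandleRead();             break;
  case HOST_CMD_SET_IP_ADDR:        HandleSetIP();            break;
  case HOST_CMD_SET_CHANNEL_CONFIG: HandleSetChannelConfig(); break;
  default:                          HandleError();            break;
  }
}

/* Dispatch one raw command value and report which handler it reached. */
static Called Dispatch(int rawCmd) {
  lastCalled = CALLED_NONE;
  HandleHostCommand((HostCommand)rawCmd);
  return lastCalled;
}
```

Each valid command gets one check, and a couple of out-of-range values confirm the default branch reaches HandleError().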

Easier To Name Variables


One of the hardest problems in programming is naming variables, but when you drastically cut down the size of functions, naming becomes much easier because the amount of context that must be held in the variable name is smaller. If done right, variable names should get shorter as functions get shorter while their meaning becomes easier to understand and remember.

Imagine the complexity of the variable names that would be required if all of the command processing code were written in HandleHostCommand(). Command-specific variable names would quickly get out of hand, and there would be the unhealthy urge to reuse variables for multiple purposes. Both of those problems are neatly avoided by splitting the command processing into individual functions.

Easier To Self-Document


In Code Complete, Steve McConnell describes the Pseudocode Programming Process, where pseudocode is used to refine a routine until it reaches a point where it is easier to write the code than to write more pseudocode. Then the pseudocode is converted to comments and the code is written below each comment. Considering what I think of comments, McConnell didn't go far enough with his recommended process.

Each comment ends up redundantly describing the following code, making the comments essentially useless. Instead of writing code for each comment, coming up with a good function name and putting the code in the resulting functions would be a better approach. Then the code becomes more self-documenting, and the comments can either be removed to de-clutter the code or used as the function's documentation if necessary.
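A minimal sketch of that alternative, using a hypothetical channel-configuration routine (the struct, limits, and function names are invented for illustration): each line of pseudocode becomes a function name rather than a comment over a block of code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical channel configuration and limits, for illustration only. */
typedef struct {
  unsigned channel;
  unsigned sampleRateHz;
} ChannelConfig;

#define NUM_CHANNELS 4u
#define MAX_SAMPLE_RATE_HZ 10000u

static ChannelConfig activeConfig[NUM_CHANNELS];

/* In the Pseudocode Programming Process these would have stayed as comments:
 *   check that the channel exists
 *   check that the sample rate is supported
 *   store the new configuration
 * Here each pseudocode line became a function name instead. */
static bool ChannelExists(const ChannelConfig *cfg) {
  return cfg->channel < NUM_CHANNELS;
}

static bool SampleRateSupported(const ChannelConfig *cfg) {
  return cfg->sampleRateHz > 0 && cfg->sampleRateHz <= MAX_SAMPLE_RATE_HZ;
}

static void StoreConfig(const ChannelConfig *cfg) {
  activeConfig[cfg->channel] = *cfg;
}

/* The top-level routine now reads like the pseudocode it came from. */
static bool ApplyChannelConfig(const ChannelConfig *cfg) {
  if (!ChannelExists(cfg) || !SampleRateSupported(cfg)) {
    return false;
  }
  StoreConfig(cfg);
  return true;
}
```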

When code is structured with lots of small functions, the function names become a great way to describe what the code is doing directly in the code. Learning Ruby and Rails is making this advantage even more apparent to me. Because Ruby lets you omit most parentheses, the code reads so easily that small functions are much more common and comments are used far less frequently. Most of that advantage carries over to other languages, and at least in the example above, comments wouldn't add much value while making the code considerably more verbose. That's not to say that comments should never be used, but the need for them is greatly reduced.

Make Your Life Easier


Of all of the advantages of writing small functions, the most profound is probably how easy it makes testing. Without short functions, testing can be completely overwhelming and is likely to be resisted or delayed to the detriment of the project. With short functions, tests practically write themselves, and if TDD (Test-Driven Development) is practiced, short functions tend to be the natural result of the tests that are written first.

Making testing easier should be motivation enough to write smaller functions. Making code easier to read, write, change, and document as well is even better. It took me a long time to appreciate how beneficial it is to program this way because I couldn't get over the friction of creating all of those little functions. Try it. Write those short and sweet functions. It will free your mind.
