The Quandary of Working With Legacy Code

I have a dilemma. That dilemma involves choosing what to do with the legacy code I'm working on. I have a pretty loose definition of legacy code: basically any code that has been checked into the repository qualifies. If it's been committed, then it becomes someone's responsibility to maintain it, and that makes it legacy code for all intents and purposes. In my case, that maintainer is me alone. At the company I work for, code gets split pretty cleanly along microprocessor and microcontroller boundaries, and the pieces communicate through a variety of serial interfaces. That's not what defines this situation, though, and since dealing with legacy code has as many contexts as there are code bases, knowing the particulars of this situation is important for understanding the dilemma.

This particular code is a small web server that sits on an embedded processor and uses an on-board WiFi chip to connect to a client and serve a small set of dynamic web pages. Since the code base is only about 6 KLOC, it's easily manageable as a side project for one person. Most of the original C code came from TI's example web server app, and I tore out the functionality we didn't need and added other functionality that we did need. The issue I'm having now is deciding whether or not to do a more extensive refactoring of this code, and possibly convert it to a C++ class architecture in the process.

What, Exactly, Is The Problem Here?

There are a number of problems with the code at a number of levels that make me want to do this refactoring, so let's run through them briefly. First, there is no consistent naming convention. Sometimes Hungarian Notation is used, and sometimes not. When it is used, it's not the good kind of Apps Hungarian Notation, but the obnoxious kind of Systems Hungarian Notation. And even then it's only used partially, with numerous variables prefixed only with a 'u' for unsigned, but no other type definition. Function names are ridiculously long, like HttpDynamicHandler_GetBasicUnitInfo(). Many variable names are nearly as long, and end up being longer after the huge chains of struct member accesses are written out.
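To make the Hungarian Notation complaint concrete, here is a minimal sketch of the difference. The names and the 10 ms tick period are made up for illustration and do not come from the actual code base.

```cpp
#include <cstdint>

// Assumed tick period for the illustration; not a value from the real code.
constexpr std::uint32_t kMsPerTick = 10;

// Systems Hungarian would call every one of these parameters 'uTimeout':
// the prefix only repeats that the value is unsigned, which the declaration
// already says. Apps Hungarian puts the unit in the name instead, so mixing
// milliseconds and ticks looks wrong at the call site.
constexpr std::uint32_t TicksFromMs(std::uint32_t msTimeout) {
    return msTimeout / kMsPerTick;
}

// Passing an 'msTimeout' where 'tickTimeout' is expected is easy to spot.
constexpr bool HasExpired(std::uint32_t tickElapsed, std::uint32_t tickTimeout) {
    return tickElapsed >= tickTimeout;
}
```

With a bare 'u' prefix, HasExpired(uElapsed, uTimeout) compiles and reads fine whether or not the units agree.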

Then there's the formatting. Code and comments are laid out haphazardly, with many variations of tabs and spaces and of block comments and line comments, and most comments merely restate what the code does. Type declarations are just as inconsistent: int, uint32, and unsigned long are used for the same variable at different levels of function calls, and struct definitions sometimes use typedef struct {...} <name> and other times struct <name> {...}. This makes the variable typing look more than a little disorganized.
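The type mixing is more than cosmetic. A rough sketch of the problem and one possible cleanup follows; the function names, the SampleCount alias, and the UnitInfo fields are all hypothetical stand-ins.

```cpp
#include <cstdint>

// Before (hypothetical signatures): one value, three spellings on the way down.
//   int  ReadSampleCount(void);
//   void SendSamples(uint32 count);
//   void WriteHeader(unsigned long count);

// After: a single alias names the type once.
using SampleCount = std::uint32_t;

// Why the mix matters: a negative int (say, an error code) silently
// becomes a huge count once it crosses into an unsigned parameter.
constexpr SampleCount FromLegacyInt(int legacyCount) {
    return static_cast<SampleCount>(legacyCount);
}

// One struct style throughout; C++ needs no typedef to use the bare name.
struct UnitInfo {
    SampleCount   sampleCount;
    std::uint16_t firmwareVersion;
};
```

The cast is explicit here only to show the conversion; in the mixed-type C code the same thing happens implicitly at each call boundary.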

On top of the long names, the functions themselves are mostly long, untestable messes of for loops, while loops, and if-else chains running on for hundreds of lines. It is readily apparent that most of those useless comments are marking suitable places to break the code up into shorter, more manageable functions. Unit testing is further complicated by the many static functions, which cannot be called from outside their compilation unit.
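As an illustration of what breaking up a long function at its comments might look like, here is a sketch of one extracted helper. FindQueryValue and the query-string format are invented for the example, and it assumes a C++17 toolchain, which the embedded target may or may not have.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper pulled out of a long request handler, where it might
// have lived under a comment such as "/* find the requested parameter */".
// Returns the value for `key` in a query string like "unit=3&rate=10".
std::optional<std::string> FindQueryValue(std::string_view query,
                                          std::string_view key) {
    std::size_t pos = 0;
    while (pos <= query.size()) {
        // Each "key=value" pair ends at the next '&' or at the end of input.
        std::size_t end = query.find('&', pos);
        if (end == std::string_view::npos) end = query.size();

        std::string_view pair = query.substr(pos, end - pos);
        std::size_t eq = pair.find('=');
        if (eq != std::string_view::npos && pair.substr(0, eq) == key) {
            return std::string(pair.substr(eq + 1));
        }
        pos = end + 1;
    }
    return std::nullopt;  // key not present
}
```

Because the helper takes its input as a parameter and has no hidden state, it can be tested without a socket, a WiFi chip, or the rest of the handler.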

Finally, the overall architecture of the code is somewhat disorganized. After drawing up a quick UML class diagram and substituting structs and files for what would otherwise be classes, it became pretty obvious that some files should be broken up into smaller, more focused units, and some reorganization would make the architecture much cleaner and more maintainable. If the code were also converted to C++, it would easily transfer to classes, and many of the long names that were carrying the responsibility for defining the purpose of functions would be cleaned up in the process. I'm not saying that a conversion to C++ is necessary to make this code clean, but it was clearly written in a way that is more in line with the class structure of C++ than with the procedural structure of C. The conversion would be quite natural.
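A minimal sketch of how a prefixed C function and its struct might map onto a class is shown below. The C signature in the comment, the UnitInfo fields, and the class layout are assumptions for illustration; only the function name comes from the code discussed here.

```cpp
#include <cstdint>

// Hypothetical data; the real struct layout is not shown in this post.
struct UnitInfo {
    std::uint16_t id;
    std::uint16_t firmwareVersion;
};

// C style (assumed shape): the prefix acts as a namespace and the struct
// pointer acts as 'this'.
//   void HttpDynamicHandler_GetBasicUnitInfo(HttpRequest *req, UnitInfo *out);

// C++ style: the class name carries the prefix, and the data the function
// needs travels with the object instead of through a parameter.
class CDynamicRequest {
public:
    explicit CDynamicRequest(UnitInfo unit) : unit_(unit) {}

    UnitInfo GetBasicUnitInfo() const { return unit_; }

private:
    UnitInfo unit_;
};
```

The method name shrinks because the class already says which handler it belongs to.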

Well, Then, What To Do About It?

For any programmer who has worked with legacy code, these problems are very familiar. Maintaining consistency is hard, and over time functions seem to accrete more and more logic until they become unbearable rat's nests of code. Even though I try to leave things better than I found them, and every time I add features or fix bugs in this code I try to make it a little bit better, that won't be enough to even begin to address the problems with this code.

To put things in perspective, the main code base that I'm responsible for was another example of code with all of these problems, and it's about five times more code. I didn't hesitate to refactor all of that code and convert it to C++, so why would I be second-guessing that choice this time around when it amounts to a much smaller task?

Every situation is different, and it is important to weigh the pros and cons of a task as arduous as transforming a code base, no matter how small. From my ranting about the poor quality of the code, you may think that only good things could come from paying down this technical debt, but I'm not so sure.

The Pros And Cons of Refactoring

The biggest thing this code base has going for it right now is that it is working. By that I don't mean that I'm afraid to break something. Quite the opposite, in fact. Even though there are currently no unit tests, or automated tests of any kind really, the system is so focused and self-contained in what it does that a simple manual test is all that's needed to see if it's working. All I have to do is access the embedded device's IP address from a browser and make sure the web page graphs the real-time data it's acquiring correctly.

No, the advantage of the code already working is that nothing else has to be done to get it working. Most of the code hardly changed while I was adding and removing features, and the areas that need to change to add more features are well defined and quarantined. Even though the code is a mess, and it offends me to my programmer's soul when I have to look at it, it is fairly easy to change what I need to and move on.

But I have had to look at that dirty code a lot lately, and every time I try to ignore the mess, the pro-refactoring part of my brain cries out in agony. Cleaning up the code would make it so much more liveable, and that is worth something. I have a couple of young kids at home, and some days it seems like they make it their mission to destroy the house Tasmanian Devil-style. After my wife and I have finally gotten them to bed, we rarely have the energy to clean the house, too, and the mess will live on until the weekend. It shifts and changes like some slow-moving monster that's consuming every square inch of floor space in the house.

Looking at that kind of mess is mentally taxing, and it quite literally exhausts you. When the weekend finally rolls around, and if we happen to be home for a couple hours, we can buckle down and put everything back in its place. The feeling of a clean house after all of that chaos is like a dark cloud has been lifted from my mind, and it becomes much easier to think and more pleasant to be in the house. Cleaning up a mess of a code base can give you much the same feeling with the added benefit that it doesn't so easily revert to its chaotic state after another rousing day of playing princesses and soldiers (don't ask, kids are creative).

Another benefit of refactoring the code is that it would become much more testable, so unit testing could be drastically improved for much better peace-of-mind. This benefit is slightly circular because it would be a good idea to implement some amount of automated testing before doing the more extensive refactoring to make sure that all is still well with the code. The first tests would likely be integration tests because unit tests are currently so difficult to implement, but some amount of testing should be put in place to enable safer refactoring. Then more testing could be added in the form of unit tests as functions were split up and put into classes with the relevant data.
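One way to get that first safety net is a characterization test: record what the code outputs today and flag any change. The sketch below uses a stand-in RenderStatusPage function and made-up golden strings, since the real page-building code is not shown here.

```cpp
#include <string>

// Stand-in for a legacy page builder; the real function is not shown here.
std::string RenderStatusPage(int unitId, int sample) {
    return "<html><body>unit " + std::to_string(unitId) +
           ": " + std::to_string(sample) + "</body></html>";
}

struct GoldenCase {
    int unitId;
    int sample;
    const char* expected;
};

// Outputs captured from the code as it behaves today, before any refactoring.
const GoldenCase kGoldenCases[] = {
    {1, 42, "<html><body>unit 1: 42</body></html>"},
    {7, -3, "<html><body>unit 7: -3</body></html>"},
};

// Returns the number of cases whose output no longer matches the recording.
int CountMismatches() {
    int mismatches = 0;
    for (const GoldenCase& c : kGoldenCases) {
        if (RenderStatusPage(c.unitId, c.sample) != c.expected) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

The recorded strings are not claimed to be correct, only current. Any nonzero count after a refactoring step means behavior changed and deserves a look.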

That sounds like a lot of work, and it probably is. That raises the question of whether or not those benefits are worth the cost, and that is not at all clear to me in this case. With the other code base I maintain, it was obvious that I would be living with it for a long time since I started with it on a previous product, was able to migrate it to the current product we're working on, and plan to use it again on the next major product we do. I'm getting a lot of mileage out of the work I put into cleaning up that code, and I knew that I would, so it was clearly worth it before I started.

With this embedded web server code, it's possible that it will be a one-off application. I'm not sure yet, but if it is, then it may not be worth putting in all of that extra work for it to only sit in a microcontroller on the daughter board of this one product. That time and effort may be better spent elsewhere.

Now You See My Dilemma

This quandary of working with legacy code must be as common as sand on a beach. The code is a disorganized mess that could be easily improved with some concerted effort, but it's currently working. If the code still has a long life ahead of it, it may be worth it to clean it up and make it more liveable. Making it testable and adding automated tests has clear benefits, but what if all of that infrastructure and testing was put in place and never used? And then there's always the intangible benefit of having a well-engineered piece of software, the practice and learning that took place while building it, and the satisfaction that comes with finishing it.

Do the benefits of refactoring outweigh the costs in this situation? Can you even know with any certainty? I'm not at all sure, and I don't have to worry about the impact on other developers in this case. But I'm itching to rename that HttpDynamicHandler_GetBasicUnitInfo() function to CDynamicRequest::GetBasicUnitInfo() anyway.