The Cost of Abstraction

Programmers love abstractions, and they spend a significant amount of time thinking up and building new ones. Look at the everything-is-a-file abstraction in Unix, the IO stream abstraction in most languages' standard libraries, or the many types of abstractions that make up the various software patterns. Abstractions are everywhere in programming, and when they are useful they can really improve the utility of software. But abstractions don't come for free.

Abstraction, Yes, But at What Cost?


Abstractions have a number of costs during the course of their design and use. To make this discussion more concrete, let's assume we're talking about abstracting a communication interface with a higher-level set of messages on top of different lower-level protocols, like USB, SPI, or I2C. To use an abstraction, there is the initial cost of designing and building it to generalize multiple different protocols to use the same interface. There is the incremental cost of making each additional protocol fit the abstract interface and possibly extending the abstraction when a new protocol can't easily fit the mold. There are the maintenance costs of needing to understand the abstraction whenever new messages need to be added or things stop working the way they should.
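As a rough illustration of these costs, here is a minimal Python sketch of such a communication abstraction. Every name in it (Transport, UsbTransport, I2cTransport, send_message) and the one-byte id plus one-byte length framing are assumptions made up for this example, and the transports only record bytes in memory instead of talking to hardware.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical interface that every low-level protocol must fit."""

    @abstractmethod
    def write(self, payload: bytes) -> None:
        """Push raw bytes onto the wire."""


class UsbTransport(Transport):
    """Stand-in for a USB endpoint; it records bytes instead of sending them."""

    def __init__(self):
        self.sent = []

    def write(self, payload):
        self.sent.append(payload)


class I2cTransport(Transport):
    """I2C needs a device address, a detail the interface did not plan for."""

    def __init__(self, address):
        self.address = address
        self.sent = []

    def write(self, payload):
        # The address byte is squeezed in here so callers can keep using write().
        self.sent.append(bytes([self.address]) + payload)


def send_message(transport: Transport, msg_id: int, body: bytes) -> None:
    """Message layer: one id byte, one length byte (body under 256 bytes), then the body."""
    transport.write(bytes([msg_id, len(body)]) + body)
```

Even at this size, the design cost (the Transport interface), the incremental cost (making I2C's address fit) and the maintenance cost (a reader has to follow send_message through the interface to see what reaches the wire) are all visible.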

Then there's the hidden cost. What if the abstraction is never used more than once? What if you only ever use the interface over USB? The design would have most likely been much simpler without abstracting it to support multiple protocols, so every time you use it, you have to deal with the overhead of the abstraction without any of the benefits. If you design abstractions into your code by default, without thinking about the costs, you'll quickly build up an edifice of unnecessary overhead that will constantly slow you down. Uncle Bob talks about this drag on productivity when discussing the Factory pattern in Agile Principles, Patterns, and Practices in C#:
Factories are a complexity that can often be avoided, especially in the early phases of an evolving design. When they are used by default, factories dramatically increase the difficulty of extending the design. In order to create a new class, one may have to create as many as four new classes: the two interface classes that represent the new class and its factory and the two concrete classes that implement those interfaces.
Of course, the Factory pattern is an invaluable abstraction in the right context, just like a communication interface abstraction is invaluable if you need to send messages over multiple protocols. Useful abstractions make us much more efficient programmers. So how do we decide when the added flexibility and complexity of an abstraction outweigh the greater simplicity and rigidity of a direct approach?
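To see where the count of four classes in the quotation comes from, here is a small Python sketch built on the communication example. The names Link, LinkFactory, UsbLink, and UsbLinkFactory are hypothetical, and the "hardware" is just an in-memory buffer.

```python
from abc import ABC, abstractmethod


class Link(ABC):                      # 1. interface for the new class
    @abstractmethod
    def send(self, payload: bytes) -> int:
        """Send payload and return the number of bytes written."""


class LinkFactory(ABC):               # 2. interface for its factory
    @abstractmethod
    def create(self) -> Link:
        """Return a ready-to-use Link."""


class UsbLink(Link):                  # 3. concrete class
    def __init__(self):
        self.buffer = bytearray()     # stand-in for real hardware

    def send(self, payload):
        self.buffer += payload
        return len(payload)


class UsbLinkFactory(LinkFactory):    # 4. concrete factory
    def create(self):
        return UsbLink()


def send_via_factory(factory: LinkFactory, payload: bytes) -> int:
    """Caller that depends only on the two interfaces."""
    return factory.create().send(payload)


def send_directly(payload: bytes) -> int:
    """The direct approach: construct the one concrete class where it is needed."""
    return UsbLink().send(payload)
```

Both functions do the same job. The factory version only pays for itself once there is a second kind of Link to create.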

Abstract Vs. Concrete


First, we should understand what parts of a software system should be abstract and which parts should be concrete. Uncle Bob has some good insight on this distinction as well:
Some software in the system should not change very often. This software represents the high-level architecture and design decisions. We don't want these architectural decisions to be volatile. Thus, the software that encapsulates the high-level design of the system should be put into stable components… . The instable components…should contain only the software that is likely to change.
Not only do we want the high-level architecture of the system to not change very often, but we want it to be resistant to change. We want it to be flexible under duress, and abstractions provide that flexibility. The details of the system are necessarily more rigid. They will break and need to be rebuilt when their requirements change.

The natural tendency when faced with the prospect of change is to try to design the system upfront to handle as much change as possible. Software patterns and polished examples do a disservice here because they encourage the line of thinking that more abstractions will save the day. They are presented in a finished form and explained so clearly that you begin to believe that if you can abstract away every potential point of change in the system, making changes and adding new features will be easy. The problem is, understanding and working with an overly abstract system is not easy. It requires a huge amount of cognitive overhead to get anything done.

Books and web tutorials abound with examples of all kinds of abstractions, but they have a common drawback. It's very hard to show in an example the design path that led to using any particular abstraction. In a real system, a well-designed abstraction is there because the system needed it to be there, not because the programmer thought it would be cool to stick it in or was guarding against imagined future change. Abstractions only work well in the right context, and the right context develops as the system develops.

Example code is normally presented after the system has fully developed. The system is normally small by necessity, making most abstractions look rather silly with not much code to support them, but for the purpose of the example, the system is already finished. In real-life programming, the system is not finished, because you're still adding to it, and it is very unlikely that you can predict where the system will end up. Adding abstractions haphazardly ends up creating a lot of useless work.

Wait For It…Wait For It…NOW!


The right time to add an abstraction to a design is at the point when you start feeling the pain of not having it. Don't do it sooner because it's quite possible the extra work will be wasted and the extra complexity will be a burden. Don't wait too long because the whole time you're feeling the pain of not having the abstraction, more and more work is piling up that will have to be done to switch over to the abstraction.

Right when you start feeling pain is the perfect time to move to an abstraction. Both the risk of wasted effort and the amount of work to change the code will be minimized at this point. The risk of wasted effort will be small because now you are sure that you will actually use the abstraction. The amount of work will be small because so far you haven't duplicated any code and the code you do have should be easy to separate into the abstract and concrete parts. That is, of course, if you have been following good development practices.

In the case of our communication interface example, the right time to move to an abstract interface is when a second low-level protocol needs to be supported. If you only need to support USB, you don't need to have an abstract interface, but as soon as you also need to support SPI, an abstract interface will greatly reduce code duplication and make development easier. It will also be clear exactly what needs to be pulled into the abstract interface so it can be shared between the two protocols and what needs to be implemented separately in each protocol. That is the time when all of the relevant information is available and the need is most apparent.
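The move described here might look like the following Python sketch. It is only an illustration under assumed names (UsbMessenger, Messenger, UsbMessengerV2, SpiMessenger) with an assumed one-byte id and one-byte length frame format, and byte arrays stand in for the real buses.

```python
from abc import ABC, abstractmethod


# Before: only USB is needed, so everything stays concrete.
class UsbMessenger:
    def __init__(self):
        self.wire = bytearray()          # stand-in for the USB endpoint

    def send(self, msg_id, body):
        # Framing and the raw write live side by side (body under 256 bytes).
        self.wire += bytes([msg_id, len(body)]) + body


# After: SPI has to be supported too. The framing turns out to be the shared
# part, and the raw write is the part that differs per protocol.
class Messenger(ABC):
    def send(self, msg_id, body):
        self._write(bytes([msg_id, len(body)]) + body)

    @abstractmethod
    def _write(self, frame):
        """Protocol-specific raw write."""


class UsbMessengerV2(Messenger):
    def __init__(self):
        self.wire = bytearray()

    def _write(self, frame):
        self.wire += frame


class SpiMessenger(Messenger):
    def __init__(self):
        self.wire = bytearray()
        self.chip_select_log = []        # records the simulated chip-select line

    def _write(self, frame):
        self.chip_select_log.append("low")    # select the device
        self.wire += frame
        self.chip_select_log.append("high")   # release it
```

With the second protocol in hand, the split is easy to see: send() moves up into the shared class unchanged, and only _write() has to be written for each bus.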

Some people may balk at what appears to be extra work, changing code to design in an abstraction that arguably should have been designed in from the ground up. That extra work could have been avoided if the abstraction had been there from the start, they say. Well, no, not really. The code should have been much easier to write without the abstraction, so that was less work initially. And it wasn't clear until later that the abstraction was actually needed. Furthermore, the abstraction would have taken about as much work to design in the first place, but it would have been designed with less knowledge about the system, so it would likely have been done wrong. The abstraction would have to be fixed when the second protocol was added anyway. Which way is really more work? I bet the abstraction-up-front approach would be.

Adding abstractions only when and where they're necessary allows a software system to evolve naturally, becoming the solution it needs to be without adding a lot of extra cruft. If the team accepts this process and allows it to happen instead of fighting against it, development will have a pleasant flow to it. Progress will be faster and require less effort when the system isn't overloaded with unnecessary, costly abstractions.

1 comment:

  1. I concur with your conclusions. There is a tendency to over-abstract.

    Great article!
