
Exploring the Tension in Software Abstractions

Ever since we started making tools as a species, we've been dealing with abstractions. We've been adding layer upon layer of abstraction to every form of work imaginable since those first hand-made chisels and knives long ago. With every layer of abstraction that we add, we can do the same amount of work with less effort or more work with the same amount of effort. Those layers of abstraction don't come for free, though. They add complexity. Especially within software engineering, where we have fine control over abstractions and can add many layers of abstraction even within one program, a tension exists between the abstractions we create and the complexity that results. We have to make intelligent choices with those abstractions while being aware of the trade-offs.

Movement and Computation


To get an idea of how we use abstractions to increase the efficiency of doing work, we can look at something physical that we've abstracted over time, like movement. How we get from point A to point B has had a huge impact on the development of civilization, and we've been abstracting how we get around for ages. We started out on our own two feet, but once we figured out how to train other animals to let us ride them, we could move faster with less effort and carry more stuff with us. That abstraction didn't come for free, though. We had to invest time training horses, donkeys, and camels, caring for our animals to keep them alive and healthy, and making equipment like saddles, bridles, and stables to improve the animals' usefulness. The additional work we had to do to support the abstraction of using animals for movement was worth the effort, and we could move around much better and get much more done than what we gave up in time spent caring for horses.

Another abstraction we layered on top of horse riding was attachments: buggies, wagons, and carriages. We could attach essentially a platform on wheels (with many variations of platform) to one or more horses so that we could carry much more stuff for much longer distances in more relative comfort than we could on horseback alone. This abstraction added more work to build, maintain, and repair the wagons, but it was still a big win in efficiency.

From here we could look at all kinds of other abstractions like bicycles, trains, airplanes, and rockets. Every abstraction of movement has its own advantages, trade-offs, and applications, but the one we'll look at is the automobile. It has replaced the horse with horsepower in the form of an engine (or most recently an electric motor) and combined that with the wagon for transporting people and stuff. We have spent massive amounts of time and energy designing, manufacturing, and supporting this abstraction, and we've made great gains in efficiency and convenience because of it.

Anyone who owns a car knows that cars need to be maintained. Periodically, the oil and filters need to be changed, the brakes need to be replaced, and various components need to be inspected for wear. Most of the time you don't need to know much about how the car works. You just need to know how to drive it to get around, but every once in a while the abstraction leaks. The car breaks down, and if you don't know enough about the abstraction to go under the hood and fix it, you might have to hoof it without the actual abstraction of a horse. Then again, you likely have a modern abstraction of communication to help you line up a tow truck, a mechanic, and a ride to where you need to go.

Now compare the abstractions of movement we've developed with abstractions of computation. They have a lot in common. Computation has gone through its own series of abstractions, from teams of human "computers" that did calculations by hand to huge computing machines that read punch cards and used vacuum tubes for logic to the modern day computer with billions of transistors, terabytes of storage, and teraFLOPS of performance (at least). A modern software program is also built on many layers of abstraction, from a high-level programming language expression at the top to the silicon bandgap at the bottom.

Like transportation abstractions, our computation abstractions require huge amounts of design, manufacturing, and support effort. We have to maintain and repair our computers, and all of the abstractions that make them up can leak from time to time. They also allow us to do huge amounts of work with orders of magnitude less effort than we could manage without them. Computers enable us to do all kinds of things that would be impossible otherwise, like measuring the creation of previously undiscovered particles.

Layers of abstraction exist within a software program as well. Every class and function provides an abstraction of the computation that's done within it, and there are some trade-offs to consider when deciding when and where to add these abstractions. Software is quite flexible, and it's relatively easy to add or remove layers of abstraction to hide or expose more of a computation in a given function. That ability to finely layer abstractions for little cost changes the analysis a bit from the big abstractions of automobiles or microprocessors. In software design we are constantly doing small cost-benefit analyses, and that adds another dimension to the choices we make when writing code.

Queues and Logs


The trade-offs involved when choosing where to add abstractions to a program create a tension in the design of the program. There is a constant pull at every level of the design between hiding implementation details so that they can be ignored and showing those details so that they can be reasoned about in the context of where they are used. To explore this tension, let's look at an example.

Let's assume we're designing a logging feature for an embedded system. The embedded system could be anything, but let's say it's a robot. We want to keep track of vital information about the state of the robot while it's running, and be able to read back that state if something goes wrong to get clues about where the program messed up.

We don't have infinite space for this log, especially because it's an embedded system, so a circular queue would be a good data structure to use. We can add new log entries at the head of the queue and read them from the tail of the queue. Once the queue gets full, it will start overwriting the oldest entries at the tail of the queue with the newest entries that are being written to the head. Here is a basic implementation of a queue written in C:
/*
 * Queue.h
 */

#ifndef SOURCES_QUEUE_H_
#define SOURCES_QUEUE_H_

#include <stdint.h>

typedef uint8_t * QUEUE_ELEM_PTR;

typedef struct QUEUE {
  QUEUE_ELEM_PTR base;
  QUEUE_ELEM_PTR head;
  QUEUE_ELEM_PTR tail;
  uint32_t elem_size;
  uint32_t elem_count;
  uint32_t q_size;
} QUEUE, *QUEUE_PTR;

void Queue_init(QUEUE_PTR p_q, QUEUE_ELEM_PTR p_q_base,
                uint32_t elem_size, uint32_t q_size);
void Queue_enqueue(QUEUE_PTR p_q, QUEUE_ELEM_PTR p_new_elem);
int32_t Queue_dequeue(QUEUE_PTR p_q, QUEUE_ELEM_PTR p_elem);

#endif /* SOURCES_QUEUE_H_ */
/*
 * Queue.c
 */

#include <string.h>
#include "Queue.h"

void Queue_advance_ptr(QUEUE_PTR p_q, QUEUE_ELEM_PTR *p) {
  (*p) += p_q->elem_size;
  if (*p >= p_q->base + p_q->q_size) {
    (*p) = p_q->base;
  }
}

void Queue_init(QUEUE_PTR p_q, QUEUE_ELEM_PTR p_q_base, 
                uint32_t elem_size, uint32_t q_size) {
  p_q->base = p_q_base;
  p_q->head = p_q_base;
  p_q->tail = p_q_base;

  p_q->elem_size = elem_size;
  p_q->elem_count = 0;
  p_q->q_size = q_size;
}

void Queue_enqueue(QUEUE_PTR p_q, QUEUE_ELEM_PTR p_new_elem) {
  QUEUE_ELEM_PTR p_write = p_q->head;

  Queue_advance_ptr(p_q, &p_q->head);

  // Advance the tail if the head has caught it.
  if (p_q->head == p_q->tail) {
    Queue_advance_ptr(p_q, &p_q->tail);
    p_q->elem_count--;
  }

  // Now add event in the old location.
  memcpy(p_write, p_new_elem, p_q->elem_size);

  p_q->elem_count++;
}

int32_t Queue_dequeue(QUEUE_PTR p_q, QUEUE_ELEM_PTR p_elem) {
  if (p_q->tail == p_q->head) {
    return -1;
  } else {
    memcpy(p_elem, p_q->tail, p_q->elem_size);
    Queue_advance_ptr(p_q, &p_q->tail);
    p_q->elem_count--;
    return p_q->elem_count;
  }
}
This queue implementation is an abstraction, and we can already see some tension between it and the code that would use it. For instance, the QUEUE structure contains an element size and a queue size that are set by the caller, and the Queue_init routine doesn't check to make sure that the queue size is a multiple of the element size. It's the caller's responsibility. If this were a data structure that would be part of a library, maybe it would be appropriate to do the check for the caller. Then we would have to define an error code and make sure the caller checked the error code if there was a risk of misaligned elements. Since this is for an embedded project where all of the source code would be available to the developer and the number of developers that will be using the code is relatively small, it might be okay to forgo the check. There's no clear answer, and that creates tension.

A couple of other trade-offs were made as well: the queue doesn't allocate any memory—the caller is responsible for that, too—and the queue enforces that the head and tail only point to the same element when there are no elements in the queue. That means the queue can never be truly full. If the head catches the tail, it will always advance the tail and drop the oldest element.

With this queue, we could fairly easily create a log structure that uses the queue as an abstraction and offloads some of the complexity of logging so that we don't have to think about all of the details at once. It's clear that putting the code for handling the queue in one place is better than having it spread throughout the logging code, and it allows us to leverage the abstraction to get more work done, much like using a car to run errands.

The queue's functions also adhere to the Single Responsibility Principle (SRP), with each function doing one thing. The one possible exception is the Queue_dequeue function, which returns the number of elements left in the queue. It's a slight deviation from SRP, but it's very convenient when looping through the queue to know when the last element was just dequeued. Again, we see that there's tension about what the best design choice is.

What if there's a crash?


It's all well and good that we have logging to a circular buffer for our robot, but what if the code crashes and reboots? What if the robot starts running amok and we have to hit the shutdown switch? The queue only resides in memory, so in most failure cases, it's likely to be lost before we could read it. To make the log more permanent, we want to write it to non-volatile memory. Most embedded processors have internal flash, so we could use that flash or attach an external flash memory to store the log. Two problems still remain with this solution: our queue implementation doesn't know anything about writing and erasing flash, or about initializing from memory that may already contain valid elements.

Let's start with the problem of initializing the head and tail pointers for memory that might already have values in it. Every byte of erased flash memory reads back as 0xFF (hexadecimal notation), so we can use that value to figure out where the head and the tail of the queue are. Starting from the base of the queue (or anywhere within the queue, really), the tail will be the first non-erased element after an erased element, and the head will be the first erased element after a non-erased element. We can add this code to the Queue_init routine like so:
#include <stdbool.h>

bool Queue_element_is_blank(QUEUE_PTR p_q, QUEUE_ELEM_PTR p) {
  QUEUE_ELEM_PTR p_end = p + p_q->elem_size;
  for (; p < p_end && *p == 0xff; p++) ;
  return (p == p_end);
}

void Queue_init(QUEUE_PTR p_q, QUEUE_ELEM_PTR p_q_base,
                    uint32_t elem_size, uint32_t q_size) {
  p_q->base = p_q_base;
  p_q->head = p_q_base;
  p_q->tail = p_q_base;

  p_q->elem_size = elem_size;
  p_q->elem_count = 0;
  p_q->q_size = q_size;

  /*
   * head = 1st blank entry after a non-blank entry
   * tail = 1st non-blank entry after a blank entry
   */
  bool was_blank = true;
  for (QUEUE_ELEM_PTR p_elem = p_q->base;
      p_elem < p_q->base + p_q->q_size;
      p_elem += p_q->elem_size) {
    if (!Queue_element_is_blank(p_q, p_elem)) {
      if (was_blank) {
        p_q->tail = p_elem;
      }

      p_q->elem_count++;

      was_blank = false;
    } else {
      if (!was_blank) {
        p_q->head = p_elem;
      }
      was_blank = true;
    }
  }
}
Adding this code to Queue_init shows another form of tension in writing functions. The two parts of the function, assigning values to the structure and finding the head and tail, could have been pulled out into their own functions and called from Queue_init. That would make Queue_init simpler because it would only be two function calls, and it would be much easier to understand at a glance. It may also make bugs harder to introduce because each step of the initialization process is isolated in its own function. Plus, a developer who only wanted to check how the structure is initialized or to look at the head-and-tail finding algorithm could focus on that piece of the code because it's in its own function.

On the other hand, keeping all of the initialization steps together at the same level of abstraction has its own advantages. It may be more complex, but not substantially so. The whole function is only about 30 lines of code. Having all of the code in one place makes it easier to see the entire behavior of the code at once. We don't have to jump back and forth between three different functions to see everything that it's doing. Seeing all of the details at once, as long as it's not too many, can make it easier to change the function, if needed, and see bugs that might be hidden between layers of abstraction. It's a difficult design choice, and every case will have its own conditions that tilt the balance one way or the other. In this case, it's probably best to keep all of the details together.

For the case of writing and erasing the flash in the Queue_enqueue function, things are a little different. The queue doesn't really need to know how to write and erase the flash, so those operations can be separate functions that are called when needed. However, the queue does need to know a little about the structure of the flash because normally flash memory locations can only be erased a block at a time, with a block being a range of addresses. To keep track of the tail, the queue will have to know the block size and advance the tail by that much whenever it erases a block. The new Queue_enqueue code looks like this:
int32_t Queue_enqueue(QUEUE_PTR p_q, QUEUE_ELEM_PTR p_new_elem) {
  QUEUE_ELEM_PTR p_write = p_q->head;

  Queue_advance_ptr(p_q, &p_q->head);

  // check if we need to erase things
  QUEUE_ELEM_PTR next_block = (p_q->head + FLASH_BLOCK_SIZE);
  bool tail_in_block = p_q->head <= p_q->tail && p_q->tail < next_block;
  bool erase = !Queue_element_is_blank(p_q, p_q->head);

  if (erase) {
    Flash_erase((uint32_t)p_q->head);
    if (tail_in_block) {
      //need to move the tail to the next block
      p_q->tail = next_block - p_q->elem_size;
      Queue_advance_ptr(p_q, &p_q->tail);
    }

    uint32_t events_per_block = FLASH_BLOCK_SIZE/p_q->elem_size;
    p_q->elem_count -= events_per_block;
  }

  // now add element in the old location
  uint32_t ret = Flash_Write((uint32_t)p_write, p_q->elem_size, p_new_elem);

  if (ret) {
    p_q->head = p_write;
  } else {
    p_q->elem_count++;
  }

  return ret;
}
The function now returns an error code if the flash could not be written for some reason, and the FLASH_BLOCK_SIZE constant would be defined in the code for the flash. It would be nice if the queue didn't need to know about the flash because that's a different level of abstraction, and it's better to keep the entire function at the same level of abstraction. However, the head and tail management is bound up with the flash structure now, and trying to separate them would probably add more complications and be more confusing. The other alternative, moving some of the head and tail management into the Flash_erase code, is even worse because the flash will most likely be used to store other things as well, and mixing queue code into it would pollute the flash code.

In the end the compromise taken here isn't too bad because the function is still relatively short and understandable. Because it's an embedded system, not a general library, the implementation is already tied more closely with the hardware it's running on, and these kinds of compromises are made for the sake of code size and efficiency. A log in a different type of system would warrant a different choice, but then again, it may have a hard drive instead of flash, in which case the implementation would be totally different. Code should be designed for the context in which it is used.

This code is now a good starting point for implementing a logging function for our robot. The abstractions strike a good balance, and the benefits of using the queue with flash support likely outweigh the cost of understanding and maintaining it. It may not be as big of a jump as moving from a horse and buggy to a car, but programming is full of abstraction opportunities, and every well-designed abstraction helps get a little more work done for a little less effort.

Not Too Tight, Not Too Loose


We deal with abstractions all the time, and building abstractions on top of abstractions is how we make progress with technology. In software development it is especially apparent that abstractions are everywhere. The layers of abstraction within a program are thin, and they have a definite tension at the boundaries. Making abstractions tighter hides unnecessary details and improves decoupling, while making them looser exposes more interactions so the code's behavior can be seen in context. Good design balances the tension between those two concerns.
