Search This Blog

Let's Refactor Some Bad Code, Part 2

One reason to spend time refactoring code is to make it livable. If you think of your code as a workshop—a place where you get work done—you want your workshop to be clean and organized so that you can spend time efficiently getting stuff done. If you have tools and materials haphazardly strewn all over your workshop, it will take longer to find the things you need, and it will be difficult to clear out the space required to do the tasks that need to get done. A messy workshop negatively affects the quality of the workmanship that's done in it. So it is with code.

Last time we did the bare minimum cleaning necessary to get a rainflow counting program in working order by giving it consistent formatting and a small set of tests. Now the goal is to target those parts of the program that make it the most annoying to work with and fix them so that the program is more well-suited for its use cases. I'll show each change to the code as a git commit, and you can follow along with the diffs in my rainflow git repository in addition to the code snippets I'll show here.

Streamlining the UI


In looking for what to work on next, the most annoying thing is that the program asks too many questions before doing its thing. It asks for an input file name, an output file name, and a few other things that I'll never use. Every time I run the program, I have to type these things in, which is especially tedious when running the tests. It would be much better if the file names could be provided on the command line and the features that I don't use were just removed. Let's go through these changes one by one, starting with adding command line argument support.

The main function currently has no arguments:
int main() {
So we need to add in the standard arguments for passing command line arguments to the program:
int main(int argc, char * argv[]) {
What arguments should we pass in from the command line? An input file name would be good for a start, and we can also have an optional output file name. If it doesn't exist, we can use rainflow.out for the output file name. This code can be added right at the beginning of main:
  char * outf;
  if (argc == 1) {
    printf("Usage: rainflow <input file> [<output file>]\n");
    exit(0);
  } else if (argc == 2) {
    outf = (char *)"rainflow.out";
  } else {
    outf = argv[2];
  }
  char * inf = argv[1];
Next, we need to pass our input and output file names to the right functions by adding them as function parameters to the declarations and definitions:
  void read_data(const char * inf);
  void print_data(const char * outf);
And we need to add them to the function calls:
  rf.read_data(inf);
  rf.rainflow_engine();
  rf.print_data(outf);

Tearing Down read_data()


It's time to look inside read_data() and figure out how to use the input file we've sent to it. Right away we see that read_data() is a big function, nearly 100 lines. We can expect to cut that down by quite a bit because functions shouldn't be that long. There's a declaration for a variable aa right at the beginning that we don't know anything about, yet. Let's move past that. The next dozen lines ask about the format of the data. Instead of asking, let's assume it's always a list of amplitudes. All of our input data will have a regular sample rate, so the time and amplitude format is irrelevant. We can remove all of this code:
  printf("\n The input file must be a time history. \n");
  printf("\n Select format: ");
  printf("\n   1=amplitude ");
  printf("\n   2=time & amplitude \n");

  scanf("%d", &ic);

  if (ic == 1) {
    printf("\n\n The base input file must contain one column: unit \n");
  } else {
    printf("\n\n The base input file must contain two columns: \n");
    printf(" time & unit    \n");
  }
The next set of lines deals with opening and reading the input file:
  printf("\n Input filename \n");
  scanf("%s", filename[0]);

  pFile[0] = fopen(filename[0], "rb");

  while (pFile[0] == NULL) {

    printf("\n Failed to open file: %s \n", filename[0]);

    printf("\n Please enter the input filename: \n");

    scanf("%s", filename[0]);

    pFile[0] = fopen(filename[0], "rb");
  }
  printf("\n File: %s opened. \n", filename[0]);
Let's ignore for the moment where pFile comes from and why we're using an array of file pointers to manage files that shouldn't be opened for very long. We'll address that issue later. Right now we're focusing on getting the UI sorted out. We don't need filename[0] anymore (again with the nearly global variables. We really will take care of these, I promise.) because we passed in inf for the input file, and we don't need to repeatedly ask for a file name because if it fails to open for some reason, that problem can be taken care of at the command line instead of inside this program. The refactored code looks like this:
  if ((pFile[0] = fopen(inf, "rb")) == NULL) {
    printf("\n Failed to open file: %s \n", inf);
    exit(0);
  }
Of the next set of lines:
  printf("\n\n Enter the output table filename: \n");
  scanf("%s", filename[1]);

  pFile[1] = fopen(filename[1], "w");

  strcpy(filename[2], "rainflow_graph.out");
  pFile[2] = fopen(filename[2], "w");

  strcpy(filename[3], "range_cycles.out");
  pFile[3] = fopen(filename[3], "w");

  strcpy(filename[4], "amp_cycles.out");
  pFile[4] = fopen(filename[4], "w");

  //    strcpy(filename[5],"points.out");
  // pFile[5]=fopen(filename[5], "w");
The first couple aren't needed because the output file is now a command line option, the line that opens the output file doesn't belong in read_data() so we'll move it to print_data(), and the rest of the lines can be removed because we're not going to generate multiple output files. That means the following line should be stuck at the beginning of print_data():
  pFile[1] = fopen(outf, "w");
We've now broken some things by removing lines that open files, so in print_data() we need to remove
    fprintf(pFile[2], " %ld \t %g \n", i, C[i]);
and
  fclose(pFile[2]);
  fclose(pFile[3]);
  fclose(pFile[4]);
And we might as well remove these lines from the end because, who needs them:
  printf("\n\n The output files are: \n");

  printf(" %s \n", filename[1]);
  printf(" %s \n", filename[2]);
  printf(" %s \n", filename[3]);
  printf(" %s \n", filename[4]);
We also need to remove printing to the other file pointers in rainflow_engine():
    fprintf(pFile[3], " %10.4e \t %3.1f \n", Y, B[i][1]);
    fprintf(pFile[4], " %10.4e \t %3.1f \n", Y / 2., B[i][1]);
Back in read_data(), the next set of lines reads the data out of the input file:
  i = 0;

  if (ic == 1) {
    while (fscanf(pFile[0], "%f", &aa) > 0) {
      y.push_back(aa);

      i++;

      if (i == MAX) {
        printf("\n Warning:  input data limit reached \n.");
        break;
      }
    }
  } else {
    while (fscanf(pFile[0], "%f %f", &t, &aa) > 0) {
      y.push_back(aa);

      i++;

      if (i == MAX) {
        printf("\n Warning:  input data limit reached \n.");
        break;
      }
    }
  }
Since we're always using the same data format, we can remove the if condition and the entire else clause. I also want to get rid of the i variable because y is a vector, and its size can be tested directly. That leaves these lines:
  while (fscanf(pFile[0], "%f", &aa) > 0) {
    y.push_back(aa);

    if (y.size() == MAX) {
      printf("\n Warning:  input data limit reached \n.");
      break;
    }
  }
The next couple lines are redundant variables:
  ylast = y[i - 1];

  NP = i + 1;
Wherever ylast is used in the program, it should be replaced with y.back() and wherever NP is used, it should be replaced with y.size(). Using dedicated variables for these values is not saving any time. These variables, which are members of the Rainflow class, can then be removed. The rest of the lines in read_data() are also not needed because we're never going to scale the points:
  // printf("\n ref 1: last_a = %ld \n",last_a);

  printf("\n ");
  printf("\n Multiply data by scale factor?");
  printf("\n 1=yes  2=no \n");
  scanf("%d", &iscale);

  if (iscale == 1) {
    printf("\n Enter scale factor \n");
    scanf("%f", &scale);

    for (i = 0; i < NP; i++) {
      y[i] *= scale;
    }
  }
Okay, that was a lot of changes. We cut read_data() from nearly 100 lines to just 17. That's much more manageable, and the UI is greatly improved. Time to check that nothing broke with our tests and commit the changes before moving on to print_data().

Tackling print_data()


We already removed some stuff from print_data() while working on read_data() because the inter-dependencies in this program required it. That's a code smell that something else needs to be improved, but we'll save that for another time. As for the rest of print_data(), it's printing the same thing to both stdout and the output file. That seems like overkill, so let's remove the prints to stdout. Also, the function is printing to the output file at the end after the file's already been closed:
  fclose(pFile[0]);
  fclose(pFile[1]);

  printf("\n\n  Total Cycles = %g  hold=%ld  NP=%ld ymax=%g\n", sum, hold, y.size(),
         ymax);
  fprintf(pFile[1], "\n\n  Total Cycles = %g  hold=%ld  NP=%ld ymax=%g\n", sum,
          hold, y.size(), ymax);
Yikes! Let's fix that. Since the golden output files have already been generated, we can leave the line that prints to stdout as a nice summary on the command line and remove the line that prints to the output file because it never got there before anyway. That fix leaves the output files the same as they were before these changes while still showing this information for each run of the program. I also don't like that we're closing the input file in the function that's only printing to the output file. Let's move that line back into read_data(). What we're left with in print_data() is this:
void Rainflow::print_data(const char * outf) {
  pFile[1] = fopen(outf, "w");

  fprintf(pFile[1], "\n Amplitude = (peak-valley)/2 \n");

  fprintf(pFile[1], "\n ");
  fprintf(pFile[1], "\n          Range            Cycle       Ave     Max     "
                    "Ave     Min      Max");
  fprintf(pFile[1], "\n         (units)           Counts      Amp     Amp     "
                    "Mean    Valley   Peak \n");

  for (i = 13; i >= 1; i--) {
    fprintf(pFile[1],
            "  %8.4lf to %8.4lf\t%8.1lf\t%6.4g\t%6.4g\t%6.4g\t %6.4g\t %6.4g\n",
            L[i], L[i + 1], C[i], AverageAmp[i], MaxAmp[i], AverageMean[i],
            MinValley[i], MaxPeak[i]);
  }

  fclose(pFile[1]);

  printf("\n\n  Total Cycles = %g  hold=%ld  NP=%ld ymax=%g\n", sum, hold, y.size(),
         ymax);
}

Back to main()


Okay, that was quick. Now we're back in main() and we're almost done improving the UI. After the call to rf.print_data(), the program asks if we want to calculate the damage index. I never do, so we can remove this code:
  int icd;
  printf("\n\n Calculate relative damage index D?  1=yes 2=no \n");
  scanf("%d", &icd);

  if (icd == 1) {
    rf.damage_index();
  }
We can also blow away the damage_index() function declaration and definition. Then we're left with a run time printout and a "Press any key to exit" prompt:
  time_t end = time(0);
  double time = difftime(end, start);

  printf("\n Elapsed Time = %8.4g sec \n", time);

  printf("\n Press any key to exit.\n");

  getchar();

  exit(1);
Let's keep the former because, hey, why not? And remove the latter so we don't have to press a key to exit the program.

With that, we've dramatically improved the UI of this program so that it's easier to run tests and more in line with how short command line programs should be structured. After checking the tests and committing the code, things are looking pretty good. Be sure to take a look at the git diffs to see all of the changes in context. For next time we just have one ginormous function and a mess of member variables to address. How hard can that be?