Is Your Programming Language Hostile To Functions?

Last week I talked about why functions should be short and sweet. Several readers commented that some programming languages make it easier to write short functions while others make it harder, and that this disparity could influence the natural length of functions in programs written in those languages. I absolutely agree. If you write in a language that reduces the friction involved in writing functions, you're going to write more of them. If your language is hostile to functions, you're going to write fewer of them.

By hostile, I don't mean active aggression towards functions. I mean making the act of writing functions difficult enough that every time an opportunity arises where adding a function could clarify the code, you find yourself debating whether you actually want to spend the effort to do it. Let's look at a couple of examples from two languages at opposite ends of this spectrum. Ruby will take the role of the near-frictionless language, and C++ will take the role of the openly hostile language.

Now don't take this pairing to mean that I'm taking pot shots at C++. I honestly like C++ as a high-performance, flexible language where you can really program closer to the metal than almost any other language short of assembly. It just so happens that it also exemplifies a language that makes writing functions more difficult than necessary.

Starting With A Simple Example


Let's say you want to add all of the values in an array, and you want to wrap that functionality in a method so that you can call it on the object's data at will. In Ruby the class implementation would look something like this:

class BigData
  # ... other class stuff ...

  def sum
    @data.reduce(:+)
  end
end

Assume that @data is set somehow in some other part of the class definition. The important thing here is the method definition. I'm not sure how defining methods could get any easier in Ruby. Parentheses in the method signature are not required, the return value is simply the value of the last statement in the method, and no separate declaration is needed. It's simplicity itself. On the other hand, here's a C++ implementation of the same method:

// BigData.h
class BigData {
  // ... other class stuff ...

public:
  int sum();
};

// BigData.cpp
int BigData::sum() {
  int s = 0;
  for (std::size_t i = 0; i < _data.size(); i++) {
    s += _data[i];
  }
  return s;
}

Again assume that _data is set in another part of the class that's not shown. It's also assumed that _data is a vector so that the size() method is available. Look at how much more effort is involved in creating a method here. The method needs to be declared in the class definition in the header file, and then the body of the method is defined separately in the implementation file. In the header file, the declaration needs to go under the correct public, protected, or private heading. To be fair, Ruby has protected and private access specifiers as well (with slightly different behavior than C++), but methods are public by default in Ruby, which is generally the common case for methods. In C++ methods are private by default, so the public specifier is necessary if you want to call the method from outside the class.

Then in the implementation file, the containing class of the method must be specified in the method name, so the method declaration appears a second time in almost, but not quite, the same way. The body of the method is more verbose than in Ruby because Ruby has an incredible amount of support for looping over enumerable data structures, while C++ builds far less of it into the language. The standard <numeric> header does provide std::accumulate(), and third-party libraries like Boost improve on this shortcoming further, but nothing out of the box is as terse as Ruby's reduce. Then the method has to explicitly return the desired value, necessitating another minor statement.
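To be fair to C++, the hand-written loop is not the only option. Here is a minimal sketch of the same method using std::accumulate() from the standard <numeric> header. The constructor is my own assumption, added only so the snippet stands alone, since the article never shows how _data gets filled. The header and implementation parts are shown in one listing for brevity.

```cpp
#include <cassert>   // only needed for quick self-checks
#include <numeric>   // std::accumulate
#include <utility>   // std::move
#include <vector>

// BigData.h (sketch)
class BigData {
public:
  // Hypothetical constructor; the article does not show how _data is set.
  explicit BigData(std::vector<int> data) : _data(std::move(data)) {}

  // const because summing does not modify the object.
  int sum() const;

private:
  std::vector<int> _data;
};

// BigData.cpp (sketch)
int BigData::sum() const {
  // Adds every element to the initial value 0 and returns the total.
  return std::accumulate(_data.begin(), _data.end(), 0);
}
```

The body shrinks to a single statement, but the declaration still has to be written twice and the iterators still have to be passed by hand, so most of the friction described above remains.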

The definition of the method could have been written inside the class body in the header file, but that changes how the compiler treats it. A method defined inside the class definition is implicitly declared inline. That allows the definition to appear in every file that includes the header, and it hints that calls to the method may be inlined, although the compiler still makes the final decision. For short methods that may be desirable, but doing it for speed is certainly a premature optimization, and most of the time it's better to let the compiler make that choice for you with all of the information it has.

A More Complex Example


Okay, so Ruby makes adding short methods much easier than C++, but how do things look with something more complex? Instead of getting a simple summation of the data, what if we wanted to apply an arbitrary mathematical transformation to that data? The class doesn't know what the transformation will be ahead of time, but it takes a function, applies it to each element of its data set, and returns the transformed data set. Here's what that would look like in Ruby:

class BigData
  def transform
    @data.map { |element| yield element }
  end
end

Wow. One would think that was going to be a bit harder. Honestly, all I've done is take the map function that comes with Ruby's Enumerable mixin and expose it under a new name as part of the BigData class. Maybe that seems useless, but Ruby makes this kind of thing so easy that if it improves the program's readability, it can be worth doing even if it's just recycling functionality that's already available. That's partly why DSLs (Domain Specific Languages) are so common in Ruby.

The map function uses a code block to define what will be done to each element of the enumerable data that it was called on. In this case, the code block passes each element to the code block associated with the transform method, and the transformation is defined where transform is called, like so:

data = BigData.new
new_data = data.transform { |x| x*2 + 1 }

Assuming that the data is initialized in new, the data set will be scaled by two and offset by one. Nice and simple. If other transformations are necessary, you simply make more calls with the other transformations in the code block. So how would this look in C++? Here's one way to do it:

// BigData.h
#include <vector>
class BigData {
public:
  void transform(std::vector<int> *result, int (*f)(int));
};

// BigData.cpp
#include <algorithm>
void BigData::transform(std::vector<int> *result,
                        int (*f)(int)) {
  result->resize(_data.size());
  std::transform(_data.begin(), _data.end(),
                 result->begin(), f);
}

Once again, the declaration and initialization of _data is not shown, but assume it is a std::vector<int> and data is loaded into the vector by some other means. The new wrinkle in this example is the function pointer int (*f)(int). Since C++ doesn't have a yield mechanism like Ruby (and many other languages), the desired function for the data transformation has to be communicated to the transform() method some other way, and a function pointer will suffice. The syntax is not nearly as nice as the yield-to-code-block syntax, though.

The syntax for std::transform() is much uglier than map, too, with four parameters to pass instead of a single code block because it's not a method of std::vector. This design allows std::transform() to be used with any type of iterator, but then those iterators have to be passed into the function explicitly.

A vector for holding the results of the transform is also passed into the function because having the method return a vector would cause all kinds of sticky design decisions that would fill a blog article all by themselves. Let's just say that this is one way that the lack of garbage collection in C++ makes programming functions specifically, and the language in general, more difficult than garbage collected languages.

Another issue that should be apparent by now is that the C++ implementations only handle vectors of integers, while the Ruby implementations for both examples can work with collections of any type that supports the '+' operator (and any other operators used in the code block) and is enumerable. The added complexity and reduced flexibility in C++ is because of its static type system vs. Ruby's dynamic typing. The C++ code could be extended to support more types using templates, but that would add a great deal more complexity to the syntax jungle we already have so I won't get into it now.
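To give a rough idea of what the template route involves, here is a minimal sketch, not a recommendation. The element type T, the callable type F, and the constructor are all assumptions of mine and do not come from the original class. Note that a class template like this normally has to live entirely in the header, which is one more decision to make.

```cpp
#include <algorithm> // std::transform
#include <cassert>   // only needed for quick self-checks
#include <utility>   // std::move
#include <vector>

// Hypothetical generic variant of BigData; T is the element type.
template <typename T>
class BigData {
public:
  // Hypothetical constructor; the article does not show how _data is set.
  explicit BigData(std::vector<T> data) : _data(std::move(data)) {}

  // F is any callable that takes a T and returns something convertible
  // to T: a plain function, a function object, or a lambda.
  template <typename F>
  void transform(std::vector<T> *result, F f) const {
    result->resize(_data.size()); // requires T to be default-constructible
    std::transform(_data.begin(), _data.end(), result->begin(), f);
  }

private:
  std::vector<T> _data;
};

// The same scale-and-offset transformation, now on doubles.
inline double scale_and_offset(double x) { return 2 * x + 1; }
```

With this version, a BigData<double> can be transformed by calling data.transform(&result, scale_and_offset), at the cost of two template parameter lists and a class that has to live in the header.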

Getting back to the example at hand, I haven't yet shown how to use BigData::transform(). The function that will be passed to it needs to be defined, and the result vector needs to be declared.

// main.cpp
#include <vector>
#include "BigData.h"

int scale_and_offset(int x) {
  return 2*x + 1;
}

int main() {
  BigData data;
  std::vector<int> result;
  data.transform(&result, scale_and_offset);
}

The code is predictably much more verbose than the Ruby code, and you have to make sure to pass a pointer to the result vector, as is required by the BigData::transform() method declaration. Additional transforms will each need their own functions defined, and if you only need to reference them once, that's a fair amount of overhead. If the compiler you're using supports it, you could use a lambda in place of the scale_and_offset() function to simplify things a bit, but it's not the prettiest syntax you've ever seen:

data.transform(&result, [](int x) {
  return 2*x + 1;
});
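For anyone who wants to try the lambda form, here is a self-contained sketch under a couple of assumptions. The constructor and the scale_and_offset_all() helper are hypothetical names I've added so the snippet compiles on its own. It needs a C++11 or later compiler, and it relies on the fact that a lambda with an empty capture list converts implicitly to a plain function pointer. A lambda that captures variables would not match the int (*f)(int) parameter.

```cpp
#include <algorithm> // std::transform
#include <cassert>   // only needed for quick self-checks
#include <utility>   // std::move
#include <vector>

class BigData {
public:
  // Hypothetical constructor; the article does not show how _data is set.
  explicit BigData(std::vector<int> data) : _data(std::move(data)) {}

  // Same signature as in the article, with const added.
  void transform(std::vector<int> *result, int (*f)(int)) const {
    result->resize(_data.size());
    std::transform(_data.begin(), _data.end(), result->begin(), f);
  }

private:
  std::vector<int> _data;
};

// Hypothetical helper: applies 2*x + 1 to every element of data and
// writes the output into *result.
void scale_and_offset_all(const BigData &data, std::vector<int> *result) {
  // The capture-less lambda converts to int (*)(int) at the call site.
  data.transform(result, [](int x) { return 2 * x + 1; });
}
```

The transformation now sits right at the call site, which is about as close as this function-pointer interface gets to Ruby's code block.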

And Your Point Is...


My intention with the above examples was not to demonstrate a ground-breaking use case for functions, but to showcase the differences in how functions are declared and used in two very different languages. Ruby clearly makes writing short functions easy with things like dynamic typing, exceptionally clean syntax, and code blocks. C++ makes it more difficult at every turn, but why does it matter? All of that extra syntax, decision-making, and code-writing may seem insignificant on a small scale, but it quickly adds up to a lot of extra mental overhead when writing in languages that are hostile to functions. I'm certainly not advocating that we stop using those languages. Every language has its advantages and disadvantages, and even with all of C++'s complexity, ugliness, and sometimes insanity, it is still a great language in many domains.

The key thing is to recognize when you're programming in a language that creates a lot of friction and to be aware of all the ways that it makes using functions difficult. Keep that knowledge in mind, but don't shy away from writing small functions. That path leads to madness. Be prepared to put in the extra effort to make your functions short and sweet so that you can test them better now and understand them better later. The benefits still outweigh the additional effort. And if you're writing in a language that makes functions easy, take full advantage of it, and appreciate how good life is.
