May the FORs leave you alone

In his post may the FORs be with you, Michael mentioned that it’s best to avoid handwritten loops at all. This is not self explanatory for a lot of people, and while many arguments can be put forth to support the use of algorithms, in this post let’s stick with one: readabilty.

Take a look at the following example of a code using handwritten loops. It fits a line through a set of points using linear regression.

#include <vector>
#include <iostream>

int main()
{
	std::vector<double> X = {20, 16, 15, 16, 13, 10};
	std::vector<double> Y = {0,  3,  7,  4,  6,  10};

	//arithmetic mean
	auto mean_x = 0.0;
	for(auto x : X){
		mean_x += x;
	}
	mean_x /= X.size();

	auto mean_y = 0.0;
	for(auto y : Y){
		mean_y += y;
	}
	mean_y /= Y.size();

	//difference from mean
	for(auto& x : X){
		x -= mean_x;
	}
	for(auto& y : X){
		y -= mean_y;
	}

	//residuals
	auto variance   = 0.0;
	auto covariance = 0.0;
	for(auto index = 0; index != X.size(); ++index){
		variance   += X[index] * X[index];
		covariance += X[index] * Y[index];
	}

	auto slope = covariance / variance;
	auto y_intercept = mean_y - slope * mean_x;

	std::cout << "The slope of the regression line is " << slope << '\n'
		      << "and the y-intercept is " << y_intercept << ".\n";
	return 0;
}

Ok, C++11’s range-based for loops are already helping a lot to reduce the syntactic noise, but ‘for’ still only means ‘loop’. It is up to the reader to look into the body and come up with a better fitting word like ‘accumulate’, ‘transform’ or ‘product’ in his mind. The next example performs the same task, but makes use of algorithms. Since I prefer ranges to iterators in most situations, the examples uses the boost Range library instead of using the standard libraries algorithms.

#include <boost/range/algorithm.hpp>
#include <boost/range/numeric.hpp>
#include <boost/phoenix/phoenix.hpp>
#include <vector>
#include <iostream>

int main()
{
	std::vector<double> X = {20, 16, 15, 16, 13, 10};
	std::vector<double> Y = {0,  3,  7,  4,  6,  10};

	//arithmetic mean
	auto const mean_x = boost::accumulate(X, 0.0) / boost::size(X);
	auto const mean_y = boost::accumulate(Y, 0.0) / boost::size(Y);

	//difference from mean
	using boost::phoenix::arg_names::arg1;
	boost::transform(X, std::begin(X), arg1 - mean_x);
	boost::transform(Y, std::begin(Y), arg1 - mean_y);

	//residuals
	auto const variance   = boost::inner_product(X, X, 0.0);
	auto const covariance = boost::inner_product(X, Y, 0.0);

	auto const slope = covariance / variance;
	auto const intercept = mean_y - slope * mean_x;

	std::cout << "The slope of the regression line is " << slope << '\n'
	              << "and the y-intercept is " << y_intercept << ".\n";
	return 0;
}

Isn’t this more readable? Also notice how this programming style allows us to use const more offensively. However, this is not the last improvment we will apply to this example. This implementation of linear regression still changes its input data, but there is a way to get around this without copying using range adaptors. Read more on this in an upcoming post…

7 Replies to “May the FORs leave you alone”

  1. It is probably because of the triviality of the example but I would like to point out the obvious anyway: You are duplicating code! You have, in both examples, implemented the “calculate mean” and the “calculate differences” algorithm twice. I would have preferred factoring the user algorithms out before applying the range algorithms.

    1. Hello Linus,
      you are completly right. DRY ranks pretty high in my priorities, too. Not factoring these out has been a tribute to keeping the example code terse. I would never have a chance getting it like this through a code review with Michael. Another way to avoid the repetition is to actually only take one vector of (X, Y). The precondition, that there are just as many X as threre are Y, would then be visible in the signature.
      Thanks, for your feedback. In future articles I’ll try to keep it closer to production quality.

  2. Boost is a wonderfully elegant tool for things like this…until it isn’t, that is, until you have to burrow into illegible header files because you forgot a template argument or misused a non-const reference. The idea of replacing a simple for loop (which everyone can read and write) with a complicated boost template header (which nobody can read and which punishes you for attempting to understand it) is insane.

    IMO a much better way to iterate through such a vector, for performance reasons (since std::vector::size() causes problems with cache coherence):

    for(int index=0, end=X.size(); index<end; ++index){
    // do something here
    }

    This has several advantages: First, it's standard and idiomatic, and everyone can read it no matter what language they are used to or how little experience they have with the intricacies of one nonstandard library. Second, it is faster, because the compiler knows what to do with it and may more easily be able to perform optimizations. Third, X.size() only gets called once, which saves many cache misses (reading from main memory in an inner loop on PC hardware is far too slow for many applications). That only could ever cause a problem if you are modifying the size of the vector in a loop, but don't do that, because it will make your code confusing and impossible to debug.

    Another rule: code which isn't obvious should always be commented. for(int i=0; i<N; ++i) is obvious. What you are doing is not. It makes calls to several different boost headers which even I, as a fairly experienced C++ developer, have never heard of, to perform tasks which have historically been done another, much simpler, way. Why are there only three comments, none of which really provides any useful, detailed description of what is going on?

    1. What you mention is a perfect use of the 90s C++ (before I wrote a line in it :)). But the language evolves with increasing use of templates, boost stuffs and (finally!) lambdas.

      I think this is perfectly readable, with pheonix functor perhaps even more so than lambdas (e.g. [&](auto &x){ return x – mean_x; }).

      Even though transform allows the same input and destination ranges.

      Must add that for me transform better associates with something where boxes of apples come in cartons of juice come out – why not just use for_each (e.g. boost::for_each(X, arg1 -= mean_x) ;)?

      1. This Post has been part of a series. The ‘final form’ of the example can be found here.
        http://clean-cpp.org/adaptors/#more-189

        I enjoyed using boost::Phoenix a lot. Although sadly, I had to stop using it at work, because many of my colleagues are not familiar with it. In their defense: While working code, can look blindingly elegant, the compile time errors you get are a bit scary for the uninitiated.

        Regarding for_each: I use for each to express side effects. Any element wise transformation I use `transform` in some shape or form (algorithm or adapter). Since both, for_each and transform would show the same behavior at runtime the argument for one or the other has to stem from readability. Readability however ultimately depends on the reader is therefore, at least in part a subjective quality. An argument can be made however, that `transform` is the more specific, less general of the two algorithms and carries therefore more meaning.

Leave a Reply

Your email address will not be published. Required fields are marked *