In his post may the FORs be with you, Michael mentioned that it’s best to avoid handwritten loops at all. This is not self explanatory for a lot of people, and while many arguments can be put forth to support the use of algorithms, in this post let’s stick with one: readabilty.

Take a look at the following example of a code using handwritten loops. It fits a line through a set of points using linear regression.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
#include <vector> #include <iostream> int main() { std::vector<double> X = {20, 16, 15, 16, 13, 10}; std::vector<double> Y = {0, 3, 7, 4, 6, 10}; //arithmetic mean auto mean_x = 0.0; for(auto x : X){ mean_x += x; } mean_x /= X.size(); auto mean_y = 0.0; for(auto y : Y){ mean_y += y; } mean_y /= Y.size(); //difference from mean for(auto& x : X){ x -= mean_x; } for(auto& y : X){ y -= mean_y; } //residuals auto variance = 0.0; auto covariance = 0.0; for(auto index = 0; index != X.size(); ++index){ variance += X[index] * X[index]; covariance += X[index] * Y[index]; } auto slope = covariance / variance; auto y_intercept = mean_y - slope * mean_x; std::cout << "The slope of the regression line is " << slope << '\n' << "and the y-intercept is " << y_intercept << ".\n"; return 0; } |

Ok, C++11′s range-based for loops are already helping a lot to reduce the syntactic noise, but ‘for’ still only means ‘loop’. It is up to the reader to look into the body and come up with a better fitting word like ‘accumulate’, ‘transform’ or ‘product’ in his mind. The next example performs the same task, but makes use of algorithms. Since I prefer ranges to iterators in most situations, the examples uses the boost Range library instead of using the standard libraries algorithms.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
#include <boost/range/algorithm.hpp> #include <boost/range/numeric.hpp> #include <boost/phoenix/phoenix.hpp> #include <vector> #include <iostream> int main() { std::vector<double> X = {20, 16, 15, 16, 13, 10}; std::vector<double> Y = {0, 3, 7, 4, 6, 10}; //arithmetic mean auto const mean_x = boost::accumulate(X, 0.0) / boost::size(X); auto const mean_y = boost::accumulate(Y, 0.0) / boost::size(Y); //difference from mean using boost::phoenix::arg_names::arg1; boost::transform(X, std::begin(X), arg1 - mean_x); boost::transform(Y, std::begin(Y), arg1 - mean_y); //residuals auto const variance = boost::inner_product(X, X, 0.0); auto const covariance = boost::inner_product(X, Y, 0.0); auto const slope = covariance / variance; auto const intercept = mean_y - slope * mean_x; std::cout << "The slope of the regression line is " << slope << '\n' << "and the y-intercept is " << y_intercept << ".\n"; return 0; } |

Isn’t this more readable? Also notice how this programming style allows us to use const more offensively. However, this is not the last improvment we will apply to this example. This implementation of linear regression still changes its input data, but there is a way to get around this without copying using range adaptors. Read more on this in an upcoming post…

Pingback: Adaptors | Clean C++

It is probably because of the triviality of the example but I would like to point out the obvious anyway: You are duplicating code! You have, in both examples, implemented the “calculate mean” and the “calculate differences” algorithm twice. I would have preferred factoring the user algorithms out before applying the range algorithms.

Hello Linus,

you are completly right. DRY ranks pretty high in my priorities, too. Not factoring these out has been a tribute to keeping the example code terse. I would never have a chance getting it like this through a code review with Michael. Another way to avoid the repetition is to actually only take one vector of (X, Y). The precondition, that there are just as many X as threre are Y, would then be visible in the signature.

Thanks, for your feedback. In future articles I’ll try to keep it closer to production quality.

Boost is a wonderfully elegant tool for things like this…until it isn’t, that is, until you have to burrow into illegible header files because you forgot a template argument or misused a non-const reference. The idea of replacing a simple for loop (which everyone can read and write) with a complicated boost template header (which nobody can read and which punishes you for attempting to understand it) is insane.

IMO a much better way to iterate through such a vector, for performance reasons (since std::vector::size() causes problems with cache coherence):

for(int index=0, end=X.size(); index<end; ++index){

// do something here

}

This has several advantages: First, it's standard and idiomatic, and everyone can read it no matter what language they are used to or how little experience they have with the intricacies of one nonstandard library. Second, it is faster, because the compiler knows what to do with it and may more easily be able to perform optimizations. Third, X.size() only gets called once, which saves many cache misses (reading from main memory in an inner loop on PC hardware is far too slow for many applications). That only could ever cause a problem if you are modifying the size of the vector in a loop, but don't do that, because it will make your code confusing and impossible to debug.

Another rule: code which isn't obvious should always be commented. for(int i=0; i<N; ++i) is obvious. What you are doing is not. It makes calls to several different boost headers which even I, as a fairly experienced C++ developer, have never heard of, to perform tasks which have historically been done another, much simpler, way. Why are there only three comments, none of which really provides any useful, detailed description of what is going on?