Some [More] Notes on the Abstraction Penalty for IA86 C++ Compilers

Posted on February 25, 2008 by Tommy McGuire
Labels: c++, software development, programming language
I just happened to run across Dan Piponi's Some Notes on the Abstraction Penalty for IA86 C++ Compilers, which looks at the code generated for the function:

int test1(int z) {
Matrix<static<int,3,3> > m;
Vector<static<int,3> > x;
m[0][0] = 1;
m[0][1] = 0;
m[0][2] = 0;
m[1][0] = 0;
m[1][1] = z;
m[1][2] = 0;
m[2][0] = 0;
m[2][1] = z;
m[2][2] = z;
x[0] = 1;
x[1] = 2;
x[2] = 3;
return (m*x)[2];

This function (with the appropriate template definitions) multiplies its argument by five.

The Microsoft compiler (version 13.10.3052) generates three instructions: a mov, a lea, and a ret; the lea instruction multiplies the argument by five. GCC (version 3.3.1) produces about 150 instructions, one of which is a call to a matrix-times-vector function. The famed Intel compiler (version 8.1) produced something like 50 instructions.

Now, on the one hand, the Microsoft and Intel compilers have a significant optimization advantage (which is irrelevant in this case) in only compiling for one architecture. GCC's optimization capabilities are excellent in the dancing elephant sense: given everything GCC does, it optimizes amazingly well. On the other hand, this example involves template expansion and inlining, neither of which really involves the architecture.

However, GCC has significantly improved its C++ support in recent versions, so I wanted to find out what a more recent version (4.1.1) would produce using -O3. Here it is, in all its glory:

leal (%rdi,%rdi,4), %eax

nm reports that the function is 4 bytes, versus 853 unoptimized or 135 with -O2.

$ nm -t dec --demangle -S a.out | grep test1
0000000004197344 0000000000000012 t global constructors keyed to _Z5test1i
0000000004197376 0000000000000004 T test1(int)

Some progress has been made.
active directory applied formal logic ashurbanipal authentication books c c++ comics conference continuations coq data structure digital humanities Dijkstra eclipse virgo electronics emacs goodreads haskell http java job Knuth ldap link linux lisp math naming nimrod notation OpenAM osgi parsing pony programming language protocols python quote quotes R random REST ruby rust SAML scala scheme shell software development system administration theory tip toy problems unix vmware yeti
Member of The Internet Defense League
Site proudly generated by Hakyll.