Reducing C++ compilation time in Magnum: code optimizations

Large C++ projects often suffer from very long full and incremental compilation times, which severely hurts productivity. This series gives an overview of some techniques employed in Magnum to shorten iteration times; this article focuses on code-based optimizations.

To put things into perspective, the Magnum graphics engine has around 100k lines of templated C++ code, documentation and comments. Currently the unoptimized build with Clang, CMake and Ninja takes 2 minutes and 59 seconds with tests enabled; without tests it’s just 76 seconds. Times mentioned in the article were measured at various stages of development, so they may not always reflect the current state.

The pre­pro­cessor is­sue

The C preprocessor is the predecessor of all module systems and it’s showing its age. Simply put, instead of providing the compiler with only the essential module information, it just concatenates all the required code into one big file and passes it to the compiler. This isn’t much of an issue with C-based projects where the headers are small, but with C++ templates we need to put much more code into headers. When you include the right STL headers and proper OpenGL headers, the preprocessed source can have well over 100k lines, which then takes a significant amount of time to parse. It usually doesn’t matter whether the code is spread over one or one hundred files, as any sane system should already have all the headers in the disk cache.

To solve this and many oth­er pre­pro­cessor is­sues, Clang de­velopers are work­ing on a mod­ule sys­tem, but it is not us­able for C++ yet.

Dis­cov­er­ing prob­lem­at­ic in­cludes

If you are using CMake with the Makefile generator for your project, you can have it generate just the preprocessed file and examine the preprocessed line count for each source file: just append .i to the name of the source file. Then you can try removing some #includes to bisect the big ones.

[build/src/Magnum/GL]$ make Framebuffer.cpp.i
Preprocessing CXX source to CMakeFiles/MagnumGLObjects.dir/Framebuffer.cpp.i
[build/src/Magnum/GL]$ wc -l CMakeFiles/MagnumGLObjects.dir/Framebuffer.cpp.i
35695 CMakeFiles/MagnumGLObjects.dir/Framebuffer.cpp.i

Re­du­cing in­cludes in head­ers

The first and obvious trick is to remove #includes which are not needed anymore. This is a boring and time-consuming task if done by hand, but it helps a lot without even touching the actual code. There are also some semi-automated tools for this; the simplest and dumbest brute-force method is removing #includes one by one for as long as the code still compiles.

If a given type is used in the header only by name (as a pointer, a reference or in a function declaration), you can forward-declare it and move the #include from the header to the implementation file. Having the big #include in one *.cpp file as opposed to having it in a *.h file which is included in 150 other files helps a lot. If a given type is used only in some non-templated function, you can move that function’s definition into the source file. The problem remains only when the type is used as a class member or in a templated function.
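The split described above can be sketched in a single file as follows. This is only an illustration: `Renderer`, `drawFrame()` and the tiny `Timeline` class are hypothetical stand-ins, not Magnum's actual declarations, and the comments mark which part would live in which file.

```cpp
// --- Renderer.h (hypothetical header): no #include of Timeline.h needed ---
class Timeline; // a forward declaration is enough for references and pointers

class Renderer {
    public:
        // Only declared here; the body needs the complete Timeline type
        float drawFrame(const Timeline& timeline);

    private:
        float _elapsed = 0.0f;
};

// --- Timeline.h (stand-in for the heavy header) ---
class Timeline {
    public:
        explicit Timeline(float frameDuration): _frameDuration{frameDuration} {}
        float previousFrameDuration() const { return _frameDuration; }

    private:
        float _frameDuration;
};

// --- Renderer.cpp: the only place that would #include "Timeline.h" ---
float Renderer::drawFrame(const Timeline& timeline) {
    _elapsed += timeline.previousFrameDuration();
    return _elapsed;
}
```

Every file that includes only the hypothetical Renderer.h now pays for a one-line declaration instead of the full Timeline header.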

Forward-declaring non-templated classes and structs is trivial (and very common, for example with Qt). It gets more complicated when namespaces and typedefs are involved; with template classes you need to repeat the whole template parameter list and it can quickly get out of hand:

class Timeline; // easy

namespace Math { template<std::size_t, class> class Matrix; }
typedef float Float;
typedef Math::Matrix<3, Float> Matrix3x3; // ehh...

For user con­veni­ence Mag­num has for­ward de­clar­a­tion head­ers, which are avail­able for each namespace, so the users can just in­clude this tiny head­er and don’t need to write for­ward de­clar­a­tions on their own:

// forward-declares both Timeline and Matrix3x3
#include <Magnum/Magnum.h>

The problem is when you want to forward-declare a class with default template arguments. Similarly to default arguments in functions, C++ doesn’t allow you to repeat the default argument when defining the type. As we already have a forward declaration header, we can put the default arguments in that header and omit them in the actual definition. The definition still needs to see those default arguments, so the forward declaration header must be included in the type definition header.

// SceneGraph.h
template<UnsignedInt, class T, class TranslationType = T> class TranslationTransformation;
// TranslationTransformation.h
#include "SceneGraph.h"

template<UnsignedInt dimensions, class T, class TranslationType> class TranslationTransformation {
    // ...
};

With C++11 it is also possible to forward-declare enums that have a fixed underlying type. In Magnum some enums are used in many places (MeshPrimitive, GL::BufferUsage, …) and some are very large (PixelFormat, GL::TextureFormat, …), and the enum values often depend on OpenGL headers, which are also big. The compiler doesn’t care about the particular named values and needs to know only the type, thus you can pass the value around without having the full definition of the enum available:

// forward-declares the PixelFormat enum
#include <Magnum/Magnum.h>

// Don't need the header here
PixelFormat format = image.format();
// Need it here
#include <Magnum/PixelFormat.h>

format = PixelFormat::RGBA8Unorm;

Note that in C++ it is not pos­sible to for­ward de­clare class mem­bers. To re­duce head­er de­pend­en­cies I had to ex­tract some widely-used enums from their classes (thus GL::Buffer::Usage is now GL::Buf­fer­Us­age etc.), but the change res­ul­ted in im­proved com­pil­a­tion times of code where the enum for­ward-de­clar­a­tion is enough.
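For reference, a minimal sketch of what such an enum forward declaration looks like in C++11. The underlying type, the enumerator values and the `ImageView`, `formatOf()` and `isRgba()` helpers are assumptions made up for this illustration, not Magnum's actual definitions.

```cpp
#include <cstdint>

// --- forward declaration header: opaque declaration, no enumerator list ---
enum class PixelFormat: std::uint32_t;

// Code that only passes the value around needs nothing more than this
struct ImageView {
    PixelFormat format;
};

PixelFormat formatOf(const ImageView& image) { return image.format; }

// --- full definition header: has to repeat the same underlying type ---
enum class PixelFormat: std::uint32_t {
    R8Unorm = 1,     // hypothetical values
    RGBA8Unorm = 2
};

// Only code that names a particular value needs the full definition
bool isRgba(PixelFormat format) { return format == PixelFormat::RGBA8Unorm; }
```

The opaque declaration works because the fixed underlying type tells the compiler the size of the enum before any enumerator is known.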

STL in­cludes

The Stand­ard C++ lib­rary is a prob­lem on its own. It is no­tori­ous for its heavy head­ers, which got even big­ger with C++11. The STL types are heav­ily tem­plated with de­fault tem­plate ar­gu­ments and im­ple­ment­a­tion-defined tem­plate para­met­ers, thus, as said above, it’s im­possible to work around the is­sue and cre­ate our own for­ward de­clar­a­tion head­ers.

The table below lists the preprocessed line count of the largest STL headers. It was generated from GCC’s libstdc++ 4.8.2 and Clang’s libc++ 3.3 with the following command; headers which didn’t exceed 25k lines were omitted. For comparison, the whole <cmath> has just below 3k lines and <vector> is merely 11k lines in C++11 libstdc++.

echo "#include <iostream>" | g++ -std=c++11 -E -x c++ - | wc -l

| Header         | C++03 libstdc++ | C++11 libstdc++ | C++11 libc++ |
|----------------|-----------------|-----------------|--------------|
| <forward_list> |                 | 25927           | 18095        |
| <queue>        | 8749            | 13830           | 26309        |
| <algorithm>    | 9801            | 46279           | 16645        |
| <complex>      | 21160           | 28312           | 44507        |
| <valarray>     | 14671           | 49630           | 24949        |
| <random>       |                 | 36180           | 51187        |
| <ios>          | 15442           | 21561           | 29202        |
| <*stream>      | ~18000          | ~24000          | ~41000       |
| <iomanip>      | 11504           | 24296           | 40545        |
| <streambuf>    | 11839           | 17946           | 29652        |
| <locale>       | 17913           | 24027           | 35188        |
| <codecvt>      |                 | n/a             | 28922        |
| <regex>        |                 | 70409           | 41601        |
| <thread>       |                 | 27436           | 17155        |
| <future>       |                 | 32254           | 19618        |

Note how the line count var­ies wildly between GCC’s lib­stdc++ and Clang’s libc++. The num­bers aren’t ex­actly ab­so­lute, as many head­ers share com­mon code, but the main of­fend­ers are the vari­ous <*stream> head­ers and vari­ous al­gorithm head­ers. For­tu­nately for <*stream> there is a for­ward-de­clar­a­tion head­er <iosfwd> which was cre­ated in some older re­vi­sion of C++ for ex­actly this pur­pose, be­cause the stream im­ple­ment­a­tion was far big­ger than the oth­er head­ers. The situ­ation changed with C++11, but sadly there were no more for­ward-de­clar­a­tion head­ers ad­ded. The vari­ous con­tain­er classes are around 10-20k lines and thus can be used as class mem­bers without much im­pact on com­pil­a­tion time, but the oth­er #includes shouldn’t ap­pear in head­ers at all.
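A small sketch of how <iosfwd> keeps the stream headers out of a header file. The `Vector2` type and the `toString()` helper are hypothetical and only serve the illustration; the comments mark which part would live in which file.

```cpp
// --- Vector2.h (hypothetical header): only the tiny forward header ---
#include <iosfwd>

struct Vector2 {
    float x, y;
};

// std::ostream is only named by reference here, so <iosfwd> suffices
std::ostream& operator<<(std::ostream& out, const Vector2& value);

// --- Vector2.cpp: the heavy stream header stays in the source file ---
#include <ostream>

std::ostream& operator<<(std::ostream& out, const Vector2& value) {
    return out << "Vector(" << value.x << ", " << value.y << ")";
}

// --- user code: only files that actually print pull in a full stream ---
#include <sstream>
#include <string>

std::string toString(const Vector2& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```

Files that include the hypothetical Vector2.h without printing anything never see the full stream implementation.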

Removing all usage of <algorithm> from Magnum header files resulted in a significant compile time reduction (4:30 before, 4:10 after); removing stream usage or replacing all <*stream> occurrences with <iosfwd> saved another 20 seconds.

An­oth­er solu­tion is not to use STL at all and im­ple­ment everything from scratch. It’s then pos­sible to achieve very im­press­ive com­pil­a­tion times, but the re­sources re­quired to im­ple­ment the equi­val­ent of C++11 STL func­tion­al­ity are just too large.

Oth­er heavy in­cludes

The Boost library is also known for its header size, but it is not used in Magnum (and C++11 incorporates many useful things from this library, so the need for it is even smaller). The other heavy includes are the OpenGL headers. Originally Magnum used GLEW for OpenGL extension handling, but GLEW headers have about 18k lines and contain many functions the engine will never use. Recently I switched to glLoadGen, which generates the header with only the requested functions. The generated header has about 3k lines (which is roughly the size of the official gl.h) and compilation time was reduced from 5:00 to 4:45.

Re­du­cing in­cludes needed for class mem­bers

If you have some value type as a class member, you need to #include its definition so the compiler knows its size and can generate a proper constructor, assignment operator and destructor. You can circumvent this by making it a reference or pointer and then explicitly defining the constructor and other functions in the source file. The D-Pointer approach, which is very heavily used in Qt, is another solution for this and many other issues; however, the additional heap allocation and indirection have performance implications and thus it is not used in Magnum.
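The pointer-member variant might look like the sketch below. `Scene` and `PhysicsWorld` are hypothetical names, and `std::unique_ptr` is just one possible way to hold the pointer; note that it carries exactly the heap allocation and indirection cost mentioned above.

```cpp
#include <memory>

// --- Scene.h (hypothetical header) ---
class PhysicsWorld; // heavy type, only forward-declared here

class Scene {
    public:
        Scene();
        ~Scene(); // defined in the source file, where PhysicsWorld is complete
        int step();

    private:
        // A pointer has a known size even though the type is incomplete
        std::unique_ptr<PhysicsWorld> _world;
};

// --- PhysicsWorld.h (stand-in for the heavy header) ---
class PhysicsWorld {
    public:
        int advance() { return ++_steps; }

    private:
        int _steps = 0;
};

// --- Scene.cpp: the only file that would include PhysicsWorld.h ---
Scene::Scene(): _world{new PhysicsWorld} {}
Scene::~Scene() = default;
int Scene::step() { return _world->advance(); }
```

The constructor and destructor have to be defined out of line, because generating them requires the complete type.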

Re­du­cing tem­plated code in head­ers

If the templated code is used for a limited set of types (e.g. only floats and doubles), you can move the definition into a source file and explicitly instantiate the template for each type. This approach is used in Magnum’s scene graph. Additionally, Magnum provides a special template implementation header for each class, which contains the definitions of templated functions. If users want to use the template for e.g. ints (which isn’t provided by default), they can include this header in some source file and do the explicit instantiation themselves:

// instantiation.cpp
#include "SceneGraph/AbstractObject.hpp"

template class SceneGraph::AbstractBasicObject2D<Int>;

Bal­an­cing size and count of com­pil­a­tion units

For headers it’s often good to split the header into smaller ones with fewer dependencies, but for source files it’s better to combine several of them into one, as the compiler then needs to preprocess the included headers only once instead of multiple times. Be aware that this is a double-edged sword and it will hurt iteration times: recompiling the whole huge file after a small change takes much longer than rebuilding only a small one. Also, the compile time reduction is not as significant as when optimizing a widely-used header file. Magnum uses this approach for template instantiation files; the merging resulted in a 5 seconds shorter build time.

Re­du­cing amount of gen­er­ated code

The C++11 extern template keyword tells the compiler that the code is already compiled in some library, and thus the compiler can skip compiling and optimizing the given code fragment and leave it for the linker.
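A minimal sketch of the extern template syntax. The `Sum` template is a hypothetical example; in a real project the declaration would live in the header and the definition in exactly one source file of the library.

```cpp
// --- Sum.h (hypothetical header) ---
template<class T> struct Sum {
    T add(T a, T b) const { return a + b; }
};

// Explicit instantiation declaration: promises that Sum<float> is compiled
// elsewhere, so files including this header don't instantiate it again
extern template struct Sum<float>;

// --- Sum.cpp: explicit instantiation definition, compiled exactly once ---
template struct Sum<float>;
```

Other instantiations, such as `Sum<int>`, are unaffected and still get instantiated implicitly wherever they are used.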

Reducing the number of exported symbols helps the linker (and also the dynamic linker at runtime), as it doesn’t have to process a huge symbol table containing stuff that isn’t used outside the library. See GCC’s documentation about visibility.
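With GCC and Clang, symbol visibility is typically controlled with an attribute hidden behind a macro, usually together with building the library with -fvisibility=hidden. The sketch below assumes that setup; the `MYLIB_EXPORT` and `MYLIB_LOCAL` macros and both functions are hypothetical names, not Magnum's.

```cpp
// Hypothetical visibility macros; they expand to nothing on other compilers
#if defined(__GNUC__) && !defined(_WIN32)
    #define MYLIB_EXPORT __attribute__((visibility("default")))
    #define MYLIB_LOCAL __attribute__((visibility("hidden")))
#else
    #define MYLIB_EXPORT
    #define MYLIB_LOCAL
#endif

// Internal helper: kept out of the exported symbol table
MYLIB_LOCAL int clampToByte(int value) {
    return value < 0 ? 0 : (value > 255 ? 255 : value);
}

// Public API: the only symbol users of the library need to see
MYLIB_EXPORT int brighten(int channel, int amount) {
    return clampToByte(channel + amount);
}
```

On Windows the same macro would usually expand to the dllexport and dllimport attributes instead, which this sketch leaves out.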

The next part will be about op­tim­iz­ing the build sys­tem.