Reducing C++ compilation time in Magnum: code optimizations

Large C++ projects often suffer from very long full and incremental compilation times, which severely hurts productivity. This series gives an overview of some techniques employed in Magnum to make iteration times shorter; this article focuses on code-based optimizations.

To put things in­to per­spec­tive, the Mag­num graph­ics en­gine has around 100k lines of tem­plat­ed C++ code, doc­u­men­ta­tion and com­ments. Cur­rent­ly the un­op­ti­mized build with Clang, CMake and Nin­ja runs 2 min­utes and 59 sec­onds with tests en­abled, with­out tests it’s just 76 sec­onds. Times men­tioned in the ar­ti­cle were mea­sured in var­i­ous stages of de­vel­op­ment, thus they may not al­ways re­flect the cur­rent state.

The pre­pro­ces­sor is­sue

The C preprocessor is the predecessor of all module systems and it's showing its age. Simply put, instead of providing the compiler with only essential module information, it just concatenates all the required code into one big file and passes it to the compiler. This isn't much of an issue with C-based projects where the headers are small, but with C++ templates we need to put much more code into the headers. When you include the right STL headers and proper OpenGL headers, the preprocessed source can have well over 100k lines, which then takes a significant amount of time to parse. Usually it doesn't matter whether the code is spread over one or one hundred files, as any sane system should already have all the headers in disk cache.

To solve this and many oth­er pre­proces­sor is­sues, Clang de­vel­op­ers are work­ing on a mod­ule sys­tem, but it is not us­able for C++ yet.

Dis­cov­er­ing prob­lem­at­ic in­cludes

If you are using CMake with the Makefile generator for your project, you can have it generate just the preprocessed file and then examine the preprocessed line count for each source file: append .i to the name of the source file and use that as the make target. Then you can try removing some #includes to bisect the big ones.

[build/src/Magnum/GL]$ make Framebuffer.cpp.i
Preprocessing CXX source to CMakeFiles/MagnumGLObjects.dir/Framebuffer.cpp.i
[build/src/Magnum/GL]$ wc -l CMakeFiles/MagnumGLObjects.dir/Framebuffer.cpp.i
35695 CMakeFiles/MagnumGLObjects.dir/Framebuffer.cpp.i

Re­duc­ing in­cludes in head­ers

The first and obvious trick is to remove #includes which are not needed anymore. This is a boring and time-consuming task if done by hand, but it helps a lot without even touching the actual code. There are also some semi-automated tools for this; the simplest and dumbest brute-force method is removing #includes as long as the code still compiles.

If a given type is used in the header only by name (for example as a reference, pointer or parameter in a function declaration), you can forward-declare it and move the #include from the header to the implementation file. Having the big #include in one *.cpp file as opposed to having it in a *.h file which is included in 150 other files helps a lot. If a given type is used only in some non-templated function, you can move the definition of that function into the source file. The only problem remains when the type is used as a class member or in a templated function.
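
As a minimal sketch of that header/source split, the following puts three hypothetical files (Renderer.h, Mesh.h and Renderer.cpp, none of which are actual Magnum files) into a single listing. The class names and the function are made up for illustration; the point is only that the header gets by with a forward declaration, while the complete type is needed just where the function is defined.

```cpp
// ---- Renderer.h (hypothetical header) ----
// Mesh is only named in a function declaration and passed by reference,
// so a forward declaration is enough and Mesh.h is not included here.
class Mesh;

int vertexCount(const Mesh& mesh);

// ---- Mesh.h (hypothetical, stands in for a heavy header) ----
class Mesh {
    public:
        explicit Mesh(int vertices): _vertices{vertices} {}
        int vertices() const { return _vertices; }

    private:
        int _vertices;
};

// ---- Renderer.cpp ----
// Only the implementation file needs the complete type, so only this
// file would #include "Mesh.h" in a real project.
int vertexCount(const Mesh& mesh) { return mesh.vertices(); }
```

Every file that includes Renderer.h but never touches Mesh itself no longer pays for parsing Mesh.h.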

Forward-declaring non-templated classes and structs is trivial (and very common, for example with Qt). It gets more complicated when namespaces and typedefs are involved; with template classes you need to repeat the whole template parameter list and it can quickly get out of hand:

class Timeline; // easy

namespace Math { template<std::size_t, class> class Matrix; }
typedef float Float;
typedef Math::Matrix<3, Float> Matrix3x3; // ehh...

For us­er con­ve­nience Mag­num has for­ward dec­la­ra­tion head­ers, which are avail­able for each names­pace, so the users can just in­clude this tiny head­er and don’t need to write for­ward dec­la­ra­tions on their own:

// forward-declares both Timeline and Matrix3x3
#include <Magnum/Magnum.h>

The problem is when you want to forward-declare a class with default template arguments. Similarly to default arguments in functions, in C++ you can't repeat the default argument when defining the type. As we already have a forward declaration header, we can put the default arguments in that header and omit them in the actual definition. The type definition must be complete, so the forward declaration header must be included in the type definition header.

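// SceneGraph.h
// The default argument appears only here, in the forward declaration
template<UnsignedInt, class T, class TranslationType = T> class TranslationTransformation;

// TranslationTransformation.h
#include "SceneGraph.h"

// The definition repeats the parameter list without the default
template<UnsignedInt dimensions, class T, class TranslationType> class TranslationTransformation {
    // ...
};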

With C++11 it is also possible to forward-declare typed enums. In Magnum some enums are used in many places (MeshPrimitive, GL::BufferUsage, …) and some are very large (PixelFormat, GL::TextureFormat, …), and the enum values often depend on OpenGL headers, which are also big. The compiler doesn't care about particular named values and needs to know only the type, so you can pass the value around without having the full definition of the enum available:

// forward-declares the PixelFormat enum
#include <Magnum/Magnum.h>

// Don't need the header here
PixelFormat format = image.format();
// Need it here
#include <Magnum/PixelFormat.h>

format = PixelFormat::RGBA8Unorm;
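
To make the mechanism concrete, here is a self-contained sketch of what such a forward declaration could look like. The underlying type, the Image class and the enum values are assumptions made for illustration and do not mirror Magnum's actual definitions; the file names in the comments are hypothetical as well.

```cpp
#include <cstdint>

// ---- forward declaration header (hypothetical) ----
// An opaque enum declaration is allowed because the underlying type is fixed
enum class PixelFormat: std::uint32_t;

// ---- Image.h (hypothetical) ----
// Storing and returning the value needs only the declaration above
class Image {
    public:
        explicit Image(PixelFormat format): _format{format} {}
        PixelFormat format() const { return _format; }

    private:
        PixelFormat _format;
};

inline PixelFormat formatOf(const Image& image) { return image.format(); }

// ---- PixelFormat.h (hypothetical) ----
// The full definition, needed only by code that names a particular value.
// The numeric values are made up for this sketch.
enum class PixelFormat: std::uint32_t {
    R8Unorm = 1,
    RGBA8Unorm = 2
};
```

The definition has to repeat the same underlying type as the forward declaration, otherwise the compiler rejects it.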

Note that in C++ it is not pos­si­ble to for­ward de­clare class mem­bers. To re­duce head­er de­pen­den­cies I had to ex­tract some wide­ly-used enums from their class­es (thus GL::Buffer::Usage is now GL::BufferUsage etc.), but the change re­sult­ed in im­proved com­pi­la­tion times of code where the enum for­ward-dec­la­ra­tion is enough.

STL in­cludes

The Stan­dard C++ li­brary is a prob­lem on its own. It is no­to­ri­ous for its heavy head­ers, which got even big­ger with C++11. The STL types are heav­i­ly tem­plat­ed with de­fault tem­plate ar­gu­ments and im­ple­men­ta­tion-de­fined tem­plate pa­ram­e­ters, thus, as said above, it’s im­pos­si­ble to work around the is­sue and cre­ate our own for­ward dec­la­ra­tion head­ers.

The table below lists the preprocessed line count of the largest STL headers. It was generated from GCC's libstdc++ 4.8.2 and Clang's libc++ 3.3 with the following command; headers which didn't exceed 25k lines were omitted. In comparison, the whole <cmath> has just below 3k lines and <vector> is merely 11k lines in C++11 libstdc++.

echo "#include <iostream>" | g++ -std=c++11 -E -x c++ - | wc -l

Header           C++03 libstdc++   C++11 libstdc++   C++11 libc++
<forward_list>   -                 25927             18095
<queue>          8749              13830             26309
<algorithm>      9801              46279             16645
<complex>        21160             28312             44507
<valarray>       14671             49630             24949
<random>         -                 36180             51187
<ios>            15442             21561             29202
<*stream>        ~18000            ~24000            ~41000
<iomanip>        11504             24296             40545
<streambuf>      11839             17946             29652
<locale>         17913             24027             35188
<codecvt>        -                 n/a               28922
<regex>          -                 70409             41601
<thread>         -                 27436             17155
<future>         -                 32254             19618

Note how the line count varies wild­ly be­tween GCC’s lib­st­dc++ and Clang’s libc++. The num­bers aren’t ex­act­ly ab­so­lute, as many head­ers share com­mon code, but the main of­fend­ers are the var­i­ous <*stream> head­ers and var­i­ous al­go­rithm head­ers. For­tu­nate­ly for <*stream> there is a for­ward-dec­la­ra­tion head­er <iosfwd> which was cre­at­ed in some old­er re­vi­sion of C++ for ex­act­ly this pur­pose, be­cause the stream im­ple­men­ta­tion was far big­ger than the oth­er head­ers. The sit­u­a­tion changed with C++11, but sad­ly there were no more for­ward-dec­la­ra­tion head­ers added. The var­i­ous con­tain­er class­es are around 10-20k lines and thus can be used as class mem­bers with­out much im­pact on com­pi­la­tion time, but the oth­er #includes shouldn’t ap­pear in head­ers at all.

Removing all usage of <algorithm> from Magnum header files resulted in a significant compile time reduction (4:30 before, 4:10 after); removing stream usage or replacing all <*stream> occurrences with <iosfwd> saved another 20 seconds.

An­oth­er so­lu­tion is not to use STL at all and im­ple­ment ev­ery­thing from scratch. It’s then pos­si­ble to achieve very im­pres­sive com­pi­la­tion times, but the re­sources re­quired to im­ple­ment the equiv­a­lent of C++11 STL func­tion­al­i­ty are just too large.

Oth­er heavy in­cludes

The Boost library is also known for its header size, but it is not used in Magnum (and C++11 incorporates many useful things from this library, so the need for it is even smaller). The other heavy things are the OpenGL headers. Originally Magnum used GLEW for OpenGL extension handling, but the GLEW headers have about 18k lines and contain many functions the engine will never use. Recently I switched to glLoadGen, which generates a header with only the requested functions. The generated header has about 3k lines (which is roughly the size of the official gl.h) and compilation time was reduced from 5:00 to 4:45.

Re­duc­ing in­cludes need­ed for class mem­bers

If you have some value type as a class member, you need to #include its header so the compiler knows its size and can generate a proper constructor, assignment operator and destructor. You can circumvent this by making the member a reference or pointer and then explicitly defining the constructor and other functions in the source file. The D-Pointer approach, which is very heavily used in Qt, is another solution for this and many other issues; however, the additional heap allocation and indirection have performance implications and thus it is not used in Magnum.
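
The pointer variant can be sketched like this. The Scene and Camera classes are hypothetical, and std::unique_ptr is just one possible choice of pointer (a raw pointer or a reference would avoid the include in the same way). The important detail is that the constructor and destructor are only declared in the header and defined where the member type is complete.

```cpp
#include <memory>

// ---- Scene.h (hypothetical header) ----
class Camera; // forward declaration, Camera.h is not included

class Scene {
    public:
        explicit Scene(int fieldOfView);
        ~Scene(); // declared only, defined where Camera is complete

        int fieldOfView() const;

    private:
        std::unique_ptr<Camera> _camera;
};

// ---- Camera.h (hypothetical) ----
class Camera {
    public:
        explicit Camera(int fieldOfView): _fieldOfView{fieldOfView} {}
        int fieldOfView() const { return _fieldOfView; }

    private:
        int _fieldOfView;
};

// ---- Scene.cpp ----
// In a real project only this file would #include "Camera.h"
Scene::Scene(int fieldOfView): _camera{new Camera{fieldOfView}} {}
Scene::~Scene() = default;
int Scene::fieldOfView() const { return _camera->fieldOfView(); }
```

This sketch pays for the lighter header with one heap allocation per Scene and an extra indirection on every access.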

Re­duc­ing tem­plat­ed code in head­ers

If the templated code is used for a limited set of types (e.g. only floats and doubles), you can move the definition into a source file and explicitly instantiate the template for each type. This approach is used in Magnum's scene graph. Additionally, Magnum provides a special template implementation header for each class, which contains the definitions of templated functions. If users want to use the template for e.g. ints (which isn't provided by default), they can include this header in some source file and do the explicit instantiation themselves:

// instantiation.cpp
#include "SceneGraph/AbstractObject.hpp"

template class SceneGraph::AbstractBasicObject2D<Int>;

Bal­anc­ing size and count of com­pi­la­tion units

For headers it's often good to split the header into smaller ones with fewer dependencies, but for source files it's better to combine more of them into one, as the compiler then needs to preprocess the included headers only once instead of several times. Be aware that this is a double-edged sword and it will hurt iteration times: recompiling the whole huge file after a small change takes much longer than rebuilding only a small one. Also, the compile time reduction is not as significant as when optimizing a widely-used header file. Magnum uses this approach for template instantiation files; the merging resulted in a 5 seconds shorter build time.

Re­duc­ing amount of gen­er­at­ed code

The C++11 extern template keyword tells the compiler that the code is already compiled in some library, so the compiler can skip compiling and optimizing the given code fragment and leave it for the linker.
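
A small sketch of how the two halves fit together, assuming a hypothetical interpolate() function template (not part of Magnum). It is shown as a single listing, but the declaration would live in a header and the definition in exactly one source file of the library.

```cpp
// ---- Interpolation.h (hypothetical header) ----
template<class T> T interpolate(T a, T b, T t) {
    return a + (b - a)*t;
}

// Explicit instantiation declaration: tells every file including this
// header not to instantiate the float version on its own
extern template float interpolate<float>(float, float, float);

// ---- Interpolation.cpp ----
// Explicit instantiation definition: the float version is compiled
// exactly once, here
template float interpolate<float>(float, float, float);
```

Other types are unaffected by this and continue to be instantiated implicitly wherever they are used.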

Reducing the number of exported symbols helps the linker (and also the dynamic linker at runtime), as it doesn't have to process a huge symbol table containing stuff that isn't used outside the library. See GCC's documentation about visibility.
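
One common way to do this with GCC and Clang is to compile the library with -fvisibility=hidden and mark only the public API as visible. The sketch below assumes hypothetical macro and function names (MYLIB_EXPORT, MYLIB_LOCAL, publicAnswer(), internalHelper()); real projects usually define one such pair of macros per library.

```cpp
// Hypothetical export macros for a library called "MyLib"
#if defined(_WIN32)
    #define MYLIB_EXPORT __declspec(dllexport)
    #define MYLIB_LOCAL
#elif defined(__GNUC__)
    #define MYLIB_EXPORT __attribute__((visibility("default")))
    #define MYLIB_LOCAL __attribute__((visibility("hidden")))
#else
    #define MYLIB_EXPORT
    #define MYLIB_LOCAL
#endif

// Part of the public API, stays in the exported symbol table
MYLIB_EXPORT int publicAnswer();

// Implementation detail, not visible outside the shared library
MYLIB_LOCAL int internalHelper();

int internalHelper() { return 21; }
int publicAnswer() { return 2*internalHelper(); }
```

With -fvisibility=hidden in effect the MYLIB_LOCAL annotation is redundant, but it documents the intent and also works when the flag is missing.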

The next part will be about op­ti­miz­ing the build sys­tem.